IP Australia Metadata Extraction Project

The Client: IP Australia

IP Australia is the Australian Government agency responsible for administering intellectual property (IP) rights and legislation relating to patents, trademarks, designs and plant breeders’ rights.

The Patent Backcapture Challenge

IP Australia (IPA) held 390,000 historic patent documents, dating back to 1904, with little or no metadata, so they could not be searched effectively. IPA asked Semantic Sciences to extract items of metadata from them using the Sintelix extraction capabilities, so that these records would be accessible to clients.

Many of these documents were available only in hard copy, and some were over 100 years old, in black and white and of moderate quality. The documents were converted to PDF format using OCR, creating new opportunities for storage and analysis.

IP Australia Project Requirements

  1. Capture/extract bibliographic fields from OCRed patent records and specifications from 1904 to 1979.
  2. Provide IPA with the captured/extracted data in a specified structured XML format.
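To make the second requirement concrete, here is a minimal Python sketch of how one record's extracted fields might be serialized to XML. The case study does not publish the format IPA specified, so the `PatentRecord` class, the element names and the `record-id` attribute are all hypothetical, and the sample values in the usage note are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional
import xml.etree.ElementTree as ET


@dataclass
class PatentRecord:
    """A few bibliographic fields for one specification (hypothetical schema)."""
    record_id: str
    filing_date: Optional[str] = None  # ISO 8601 string, e.g. "1923-05-14"
    title: Optional[str] = None
    applicants: List[str] = field(default_factory=list)


def to_xml(record: PatentRecord) -> str:
    """Serialize a record to an XML string; element names are assumptions."""
    root = ET.Element("patent", {"record-id": record.record_id})
    # Emit optional single-valued fields only when a value was extracted.
    if record.filing_date:
        ET.SubElement(root, "filing-date").text = record.filing_date
    if record.title:
        ET.SubElement(root, "title").text = record.title
    # Repeatable fields become one element per value.
    for name in record.applicants:
        ET.SubElement(root, "applicant").text = name
    return ET.tostring(root, encoding="unicode")
```

For example, `to_xml(PatentRecord("R1", filing_date="1923-05-14", applicants=["J. Smith"]))` returns a single `<patent>` element with one `<filing-date>` child and one `<applicant>` child. Fields that were not extracted are simply omitted rather than written as empty elements, which is one possible design choice among several.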

The Backcapture Solution

As shown in the workflow diagram below, Sintelix provided a solution to IP Australia’s challenges within 2 months by:

  1. Extracting and transforming existing patent specification documents into 390,000 PDF documents.
  2. Loading those documents into Sintelix.
  3. Normalizing and extracting information from those documents, creating 390,000 XML files.
  4. Placing the metadata back into IP Australia databases in a searchable and easy-to-analyze format, making records accessible to clients.
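As an illustration of what the normalizing in step 3 can involve, the sketch below converts a date written out in OCR'd text into ISO 8601 form. It is not the Sintelix implementation, which the case study does not describe. The `normalize_date` function, the regular expression and the assumed "day month year" wording are hypothetical.

```python
import re
from datetime import date
from typing import Optional

# Map lower-case English month names to month numbers.
MONTHS = {
    name: number
    for number, name in enumerate(
        ["january", "february", "march", "april", "may", "june", "july",
         "august", "september", "october", "november", "december"],
        start=1,
    )
}

# Matches e.g. "14th May, 1923" or "3 March 1904" (assumed input style).
DATE_RE = re.compile(r"(\d{1,2})(?:st|nd|rd|th)?\s+([A-Za-z]+),?\s+(\d{4})")


def normalize_date(text: str) -> Optional[str]:
    """Return the first recognizable date in text as YYYY-MM-DD, else None."""
    match = DATE_RE.search(text)
    if match is None:
        return None
    day, month_name, year = match.groups()
    month = MONTHS.get(month_name.lower())
    if month is None:
        return None  # the word after the day was not a month name
    try:
        return date(int(year), month, int(day)).isoformat()
    except ValueError:
        return None  # impossible calendar date, e.g. 31 February
```

Returning `None` instead of guessing keeps doubtful values visible for review, which matters when the target is high field accuracy.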

With Sintelix, IP Australia were able to process all of these documents and extract a range of bibliographic fields from them, including:

  1. Filing date (lodging or lodged date) of patent specification
  2. Invention title
  3. Applicant(s) name
  4. Inventor(s) name
  5. Agent’s name
  6. OPI (open to public inspection) date
  7. Filing date of basic application/priority application
  8. IP Office of priority country
  9. Priority application number/number assigned to priority application
  10. Divisional application numbers (parent/child applications)

See examples below, showing the metadata extracted from historic patent specifications:

Data Backcapture Project Outcome

With Sintelix, IP Australia were able to successfully extract metadata from 390,000 patent specifications within 6 weeks, meeting the tight deadline and delivering the required level of accuracy.

The letter of recommendation below from IP Australia confirms the following project highlights:

  • High consistency
  • Excellent accuracy
  • Rapid execution
  • Low cost

Here are some of the comments from the letter of recommendation:

“The project was organised in two (2) stages: a proof of concept and a main delivery, with a decision gate in between. The results IPA received from the proof of concept were good and achieved within a very short period, so IPA authorised the main project to proceed. Its timelines were tight (6 weeks) and required high accuracy.

Semantic Sciences Research provided IPA with visibility of its progress via online access to progress reports with drill-down to the source and processed data provided from its Sintelix software platform.

Delivered results were excellent. A field accuracy of 99.7% was achieved, which is significantly greater than IPA would expect from human transcription. The project was performed on time and on budget.

IP Australia enjoyed a positive experience of working with Semantic Sciences Research and using Sintelix. The company met our procurement and performance expectations for service providers. We valued Semantic Sciences Research’s timeliness, responsiveness and proactivity.” Veena Bhat, Patent Search Capability Coordinator, IP Australia.
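The letter does not say how the 99.7% field accuracy was measured. One common reading, assumed here, is the share of extracted field values that exactly match a manually verified reference set. The Python sketch below computes that ratio; the `field_accuracy` function and the dictionary-per-record layout are hypothetical.

```python
from typing import Dict, List

Record = Dict[str, str]


def field_accuracy(extracted: List[Record], reference: List[Record]) -> float:
    """Fraction of reference field values reproduced exactly by extraction."""
    if len(extracted) != len(reference):
        raise ValueError("extracted and reference must contain the same records")
    total = 0
    correct = 0
    for ext, ref in zip(extracted, reference):
        # Score every field present in the verified reference record.
        for name, expected in ref.items():
            total += 1
            if ext.get(name) == expected:
                correct += 1
    if total == 0:
        raise ValueError("reference contains no fields to score")
    return correct / total
```

Under this definition a missing field counts as an error, and each field is weighted equally regardless of which record it belongs to.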


About Sintelix

Sintelix is a world-leading supplier of text intelligence solutions and analytical software for unstructured data.

Organizations use Sintelix to transform data complexity into real-time, actionable intelligence.
