Searching Chinese Patents: Challenges and Solutions When Building an Innovative Discovery Interface
Eric Pugh
OpenSource Connections


Wednesday, April 30, 2014
09:30 AM - 10:15 AM

Level:  Intermediate

The United States Patent and Trademark Office wanted a simple, lightweight, yet modern and rich discovery interface for Chinese patent data. This is the story of the Global Patent Search Network, the next generation multilingual search platform for the USPTO.

GPSN was the first public application deployed in the cloud, and it allowed a very small development team to build a discovery interface across millions of patents.

This case study will cover:

  • How we leveraged the Amazon Web Services platform for data ingestion, auto scaling, and deployment at a very low price compared to traditional data centers.
  • Some of the innovative methods we used to convert XML-formatted data into usable information.
  • How we parsed 5 TB of raw TIFF image data and converted it to a modern, web-friendly format.
  • Challenges in building a modern Single Page Application that provides a dynamic, rich user experience.
  • How we built “data sharing” features into the application to allow third-party systems to build additional functionality on top of GPSN.

Fascinated by the “craft” of software development, Eric Pugh has been heavily involved in the open source world as a developer, committer, and user for the past 5 years. He is an emeritus member of the Apache Software Foundation and lately has been mulling over how we move from the read/write web to the data web. In biotech, financial services, and defense IT, he has helped European and American companies develop coherent strategies for embracing open source software. Eric became involved in Solr when he submitted patch SOLR-284 for parsing rich document types such as PDF and MS Office formats, which became the single most popular patch as measured by votes!
