Making 10M government PDF documents searchable – FlowingData

Government organizations love to distribute documents as PDF files, which are easy to forward and print. The problem comes later, when you want to find a specific document among millions of others. GovScape, a research project from the University of Washington and Boston University, provides a search interface for the PDFs in the End of Term Web Archive’s 2020 crawl.

The code for GovScape is open source and available on GitHub. I have a feeling such a tool will grow more important going forward.
