Manuel Aristarán
A desktop app that liberates data tables trapped inside PDF files so they can be manipulated again.
If you’ve ever tried to use data from tables in a PDF and found there’s no easy way to copy and paste the rows into a spreadsheet, you’ll find Tabula very useful. It lets you extract that data into an editable format such as CSV or a Microsoft Excel spreadsheet through a simple interface.
Tabula was developed by Manuel Aristarán, Mike Tigas and Jeremy B. Merrill with support from ProPublica, La Nación DATA, Knight-Mozilla OpenNews and The New York Times, and was designed by Jason Das. Researchers from many fields use it to convert PDF documents into spreadsheets and other formats for analysis and for loading into databases.
Tabula is being used to empower investigative reporting at organizations of all sizes, including:
ProPublica, The Times of London, Foreign Policy, La Nación (Argentina), The New York Times and the St. Paul (MN) Pioneer Press.
Tabula can be used as a standalone app with its own interface, or you can reuse tabula-java, the extraction library behind it, to incorporate it into your own project: https://github.com/tabulapdf/tabula-java
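When embedding the library, a common last step is turning each extracted table into CSV. tabula-java can already produce CSV output itself, so treat the following only as an illustration of what that serialization involves.

The sketch below is a minimal, dependency-free example built on a few assumptions:

- The class `TableCsv` and its methods are hypothetical and are not part of Tabula or tabula-java.
- It assumes you have already converted each extracted row into a `List<String>` of cell text. With tabula-java this would presumably come from a table's `getRows()`; check the tabula-java README for the exact API.
- It quotes fields following the RFC 4180 conventions.

```java
import java.util.List;
import java.util.stream.Collectors;

/**
 * Hypothetical helper (not part of Tabula or tabula-java): serializes a table,
 * given as a list of rows of cell strings, into CSV text.
 */
public class TableCsv {

    // Quote a field only if it contains a comma, double quote, CR or LF;
    // embedded double quotes are escaped by doubling them (RFC 4180 style).
    static String escape(String field) {
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\r") || field.contains("\n");
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Join each row's escaped cells with commas and end every row with CRLF.
    static String toCsv(List<List<String>> rows) {
        StringBuilder out = new StringBuilder();
        for (List<String> row : rows) {
            out.append(row.stream()
                    .map(TableCsv::escape)
                    .collect(Collectors.joining(",")));
            out.append("\r\n");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Made-up rows standing in for cell text extracted from a PDF table.
        List<List<String>> rows = List.of(
                List.of("Agency", "Budget, 2019", "Note"),
                List.of("Parks", "1200", "says \"pending\""));
        System.out.print(toCsv(rows));
    }
}
```

Running `main` prints `Agency,"Budget, 2019",Note` followed by `Parks,1200,"says ""pending"""`. Quoting only the fields that need it keeps ordinary cells readable while still letting commas and quotes from the PDF survive the round trip into a spreadsheet.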