Apache Tika works best as a library that you wrap your own code around. Its GUI application is a toy, and its command line version isn’t all that great either. The command line can be improved with a little scripting, though.
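As a rough illustration of the kind of scripting meant here, the sketch below walks a directory and asks the Tika command line app to identify each file. It assumes Java is on the path and that the Tika app jar has been saved as `tika-app.jar` in the working directory; that file name and the helper names `build_command` and `detect_types` are made up for this example. `--detect` is the Tika app's option for reporting a file's detected type.

```python
import subprocess
from pathlib import Path

# Assumption: path to the Tika app jar; change it to match your installation.
TIKA_JAR = "tika-app.jar"


def build_command(path, jar=TIKA_JAR):
    """Return the argument list that asks Tika to detect one file's type."""
    return ["java", "-jar", str(jar), "--detect", str(path)]


def detect_types(root, jar=TIKA_JAR, run=subprocess.run):
    """Yield (path, detected type) for every regular file under root.

    The `run` parameter exists only so the function can be exercised
    without Java installed; by default it is subprocess.run.
    """
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        result = run(build_command(path, jar), capture_output=True, text=True)
        # Tika prints the detected type on standard output.
        yield path, result.stdout.strip() or "(no output)"
```

Calling `for path, mime in detect_types("some/folder"): print(mime, path)` would list one detected type per file. Starting a new JVM for every file is slow, which is one more reason the library route tends to be the better choice for anything beyond small batches.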