Post Processing Applications

From this page, users can access several post-processing applications designed to check text or HTML prior to submission to Project Gutenberg. All are written in Python 3 and released as open source under the MIT license, so a user can download and run a local copy. Download links are on each project page.


ppgutc home page: here

This program borrows checks from many other scattered programs, including the gutcheck (Gutenberg check) macros. It runs about sixty checks with more to be added. Provide ppgutc a UTF-8 text file.

Discussion of this project takes place in the Distributed Proofreaders ppgutc thread.


ppscan home page: here

This program uses a state machine to search a text file for unexpected curly quote sequences. For example, it knows not to expect an open double quote immediately after another open double quote unless there is a paragraph break between them. Checks that look only at local context miss this type of error. The program currently generates many false positives, but the errors it does find are elusive and usually cannot be found by other methods. Provide ppscan a UTF-8 text file with curly quotes.
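To make the state-machine idea concrete, here is a minimal sketch. It is not ppscan's actual code: the function name `scan_double_quotes`, the two-state design, and the report messages are all assumptions made for illustration. It tracks only double quotes and treats a blank line as a paragraph break that resets the state.

```python
OPEN_DQ = "\u201c"   # left (opening) curly double quote
CLOSE_DQ = "\u201d"  # right (closing) curly double quote


def scan_double_quotes(text):
    """Return (line_number, message) pairs for unexpected double-quote sequences.

    Hypothetical illustration only; ppscan's real rules are not shown on this page.
    """
    reports = []
    inside = False  # state: True while an open double quote is unclosed
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            # A blank line is a paragraph break. Continued speech may
            # legitimately reopen the quote in the next paragraph, so reset.
            inside = False
            continue
        for ch in line:
            if ch == OPEN_DQ:
                if inside:
                    reports.append((lineno, "open quote inside an open quote"))
                inside = True
            elif ch == CLOSE_DQ:
                if not inside:
                    reports.append((lineno, "close quote without an open quote"))
                inside = False
    return reports


if __name__ == "__main__":
    sample = "\u201cHello, \u201cworld.\u201d\n\n\u201cFine.\u201d\n"
    for lineno, message in scan_double_quotes(sample):
        print(lineno, message)
```

Because the state carries across lines within a paragraph, this kind of scan can catch a stray quote many lines away from its partner, which a check limited to nearby characters cannot do. It also explains the false positives: any legitimate but unusual quoting pattern trips the same rules.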

Discussion of this project takes place in the Distributed Proofreaders ppscan thread.


ppppv home page: here

This program performs specific checks on an HTML file and images in an images folder related to post-processing verification. Provide ppppv a zip file containing the .html or .htm project file and the images folder.
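The specific checks ppppv runs are not listed here, so the following is only a sketch of one check a tool of this kind might plausibly make: confirming that every image referenced in the HTML is actually present in the zip file. The names `ImgCollector` and `missing_images` are hypothetical, and the sketch assumes the HTML file sits at the top level of the zip so that `src` paths such as `images/cover.jpg` match zip entry names directly.

```python
import zipfile
from html.parser import HTMLParser


class ImgCollector(HTMLParser):
    """Collect the src attribute of every <img> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def missing_images(zip_file):
    """Return a sorted list of image paths referenced in the HTML but absent from the zip.

    zip_file may be a path or a file-like object.
    Assumes the HTML file is at the top level of the archive.
    """
    missing = []
    with zipfile.ZipFile(zip_file) as zf:
        names = set(zf.namelist())
        for name in names:
            if name.lower().endswith((".html", ".htm")):
                parser = ImgCollector()
                parser.feed(zf.read(name).decode("utf-8", errors="replace"))
                missing.extend(s for s in parser.sources if s not in names)
    return sorted(missing)
```

Usage would look like `missing_images("project.zip")`, where `project.zip` is a placeholder file name; an empty list means every referenced image was found.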


ppspell home page: here

This is an online version of the ppspell program. It attempts an intelligent spell-check of a text file. For example, non-dictionary words that meet certain tests, such as frequency of occurrence, are accepted as good words. Provide ppspell a UTF-8 text file.
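As a rough illustration of the frequency test, the sketch below accepts a non-dictionary word once it appears often enough in the text, on the theory that a repeated word is more likely a proper name or a period spelling than a typo. The function name `suspect_words`, the `min_count` threshold of 3, and the word-splitting rule are assumptions for this example, not ppspell's actual settings.

```python
import re
from collections import Counter


def suspect_words(text, dictionary, min_count=3):
    """Return sorted lowercase words that are not in the dictionary and are rare.

    A non-dictionary word seen min_count or more times is accepted as good.
    Hypothetical illustration; ppspell's real tests are not shown on this page.
    """
    words = re.findall(r"[A-Za-z']+", text)
    counts = Counter(w.lower() for w in words)
    return sorted(
        w for w, n in counts.items()
        if w not in dictionary and n < min_count
    )


if __name__ == "__main__":
    good = {"met", "and", "saw", "ring", "the"}
    print(suspect_words("Frodo met Frodo and Frodo saw teh ring", good))
```

Here "frodo" occurs three times and is accepted, while the single "teh" is reported.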


jeebies home page: here

OCR scanning often confuses the letter "h" and the letter "b" when scanning "he" or "be" in the source text. Jeebies tries to find where this might have happened. Provide jeebies a UTF-8 text file.
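One simple way to hunt for he/be confusion is to look at the neighboring word. The sketch below is a toy version of that idea and makes no claim about how Jeebies itself works: the word lists `BEFORE_BE` and `AFTER_HE` and the function `suspect_he_be` are invented for illustration, and a real tool would need far more data.

```python
import re

# Hypothetical, deliberately tiny context lists; not Jeebies' data.
BEFORE_BE = {"to", "will", "would", "shall", "should", "can", "could", "may", "might", "must"}
AFTER_HE = {"said", "was", "had", "went", "is"}


def suspect_he_be(text):
    """Return (word_index, phrase) pairs where "he" or "be" looks out of place."""
    words = re.findall(r"[A-Za-z]+", text.lower())
    hits = []
    for i, word in enumerate(words):
        prev = words[i - 1] if i > 0 else ""
        nxt = words[i + 1] if i + 1 < len(words) else ""
        if word == "he" and prev in BEFORE_BE:
            # "to he", "will he": probably meant "be"
            hits.append((i, f"{prev} he"))
        elif word == "be" and nxt in AFTER_HE:
            # "be said", "be was": probably meant "he"
            hits.append((i, f"be {nxt}"))
    return hits


if __name__ == "__main__":
    print(suspect_he_be("It ought to he done, be said."))
```

A rule this crude will flag legitimate text too ("Will he go?"), so every hit still needs a human look.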


ppsmq home page: here

This program takes a user's source file and creates a new one with straight quotes converted to curly quotes, as far as is programmatically possible. A few minutes of user cleanup with an editor and it's done.
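The kind of conversion involved can be sketched with a few regular expressions. This is an assumed heuristic, not ppsmq's actual algorithm, and `curl_quotes` is a hypothetical name. It handles only the easy cases: a double quote at the start of the text or after whitespace or an opening bracket becomes an opening quote, every other double quote becomes a closing quote, and an apostrophe between two word characters becomes a right single quote. Other single quotes are left straight, which is the sort of thing the manual cleanup pass is for.

```python
import re


def curl_quotes(text):
    """Convert straight quotes to curly quotes using simple context rules."""
    # Opening double quote: at start of text, or after whitespace or an
    # opening bracket. The lookbehind rejects any other preceding character.
    text = re.sub(r'(?<![^\s(\[])"', "\u201c", text)
    # Every remaining straight double quote is treated as a closing quote.
    text = text.replace('"', "\u201d")
    # Apostrophe inside a word (don't, it's) becomes a right single quote.
    text = re.sub(r"(?<=\w)'(?=\w)", "\u2019", text)
    return text


if __name__ == "__main__":
    print(curl_quotes('"Don\'t go," she said. ("Why?")'))
```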

Other Tools

There are many other text/HTML/project-checking tools out there. Walt has upgraded and maintains pptxt, and he's written ppscannos1. Some people still use gutcheck, though I think it struggles with UTF-8. If you know of another tool that should be on this page for PPers/PPVers, let me know where it is and what it does. Thanks.