
Spelling Errors in Academic Texts

Minding Your Ps and Qs in Academic Papers

In my work, I frequently produce documents such as scientific papers in PDF format from typesetting systems such as LaTeX or DocBook.

The problem: Each of these documents needs a spell check before it can be released. With the typesetting systems I use, the source from which the PDF is created mixes the actual text content that I'd like to spell-check with directives or XML elements that control the typesetting system. Running a spell checker on the source would therefore give me tons of false positives in the non-text parts of the source.

So what I need is to spell-check the produced PDF rather than the source, because the PDF no longer contains the control directives. Unfortunately, most PDF tools, including Adobe's Acrobat, only spell-check annotations and comments added to the PDF, not the regular content of the document. This is probably because PDF files are commonly produced from tools that already offer spell checking, such as Microsoft Word.

The workaround: One common solution you find on the web is to convert the PDF back to a text format, for example a Word document, and then run Word or some other spell checker on that document. Adobe's Acrobat Pro or tools like pdftotext can do this conversion. However, the conversion rarely works flawlessly, especially when the text is mixed with figures, tables, and the like. The captions of these figures end up mixed into the surrounding text, which again produces lots of false positives in the spell checker.

The solution: Tired of wading through false positives, I searched the web again to see whether a solution had turned up in the meantime, and found Infix PDF Editor. It supports exactly what my workflow needs, namely spell checking a PDF file. It can even correct the errors it finds in place, although my workflow does not require that. Infix PDF Editor now saves me a lot of time, since I no longer have to convert the PDF back to text or deal with the false positives caused by flaws in that conversion.

– Michael Stilkerich, Germany

