#1 2014-12-11 21:33:12

felipecasta
Member
Registered: 2014-12-11
Posts: 1

OCR Pdf file exports translation xml with extra tags

I'm performing OCR with Omnipage and exporting a PDF. I then open the PDF in Infix and export an XML file for translation. The problem is that many extra tags appear, specifically at points where text meets text box borders. There are no visible line breaks in the PDF, or anything else I can see, yet the tags are there.
How can I get rid of these tags?

#2 2014-12-12 10:09:54

martin
Moderator
Registered: 2013-03-27
Posts: 61

Re: OCR Pdf file exports translation xml with extra tags

Hi,

A PDF document does not contain any carriage returns or paragraph marks. Our software uses heuristics to infer where carriage returns and paragraph marks should go in the PDF. If you feel that we are getting them "wrong", can you email a copy of the PDF to support@iceni.com and we'll have a look at it.
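
To give a rough idea of what such a heuristic can look like, here is a minimal Python sketch. It is not Infix's actual algorithm; the `Line` type, the `group_paragraphs` function, and the two thresholds are all hypothetical and chosen only for illustration. It assumes each text line is known only by its text and position, and starts a new paragraph when the previous line stops well short of the right margin or when the vertical gap is unusually large.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Line:
    """One positioned line of text (hypothetical structure for this sketch)."""
    text: str
    x0: float  # left edge
    x1: float  # right edge
    y: float   # baseline, measured downward from the top of the page


def group_paragraphs(lines, gap_factor=1.5, short_factor=0.85):
    """Join lines into paragraphs using two simple layout cues.

    The thresholds are illustrative assumptions, not values from any product.
    """
    if not lines:
        return []

    # Typical line spacing, used to spot unusually large vertical gaps.
    gaps = [b.y - a.y for a, b in zip(lines, lines[1:])]
    typical_gap = median(gaps) if gaps else 0.0

    # Column extent, used to spot lines that end well short of the margin.
    left = min(line.x0 for line in lines)
    right = max(line.x1 for line in lines)
    width = right - left

    paragraphs = [[lines[0].text]]
    for prev, cur in zip(lines, lines[1:]):
        big_gap = typical_gap > 0 and (cur.y - prev.y) > gap_factor * typical_gap
        short_line = width > 0 and (prev.x1 - left) < short_factor * width
        if big_gap or short_line:
            paragraphs.append([cur.text])    # inferred paragraph break
        else:
            paragraphs[-1].append(cur.text)  # same paragraph continues
    return [" ".join(parts) for parts in paragraphs]
```

With cues like these, OCR output that places text in many small boxes with slightly different edges can easily produce breaks that are not visible on the page, which is why looking at the actual file helps.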

regards,

Martin.
