#1 2014-07-31 07:44:30

Dmitry1940
Member
Registered: 2014-07-30
Posts: 2

PDF with encoded fonts

Hello everyone.

Please forgive me if this has been brought up before, but I didn’t find anything using the search.

I’ve been tasked with translating a number of similar PDF documents, such as the one available at http://glneurotech.com/docrepo/user-gui … _final.pdf, and found that I can neither copy and paste their contents nor export them for translation via the Export function: in both cases the resulting text is gibberish. After some googling I found out that this is because these PDFs use encoded fonts (the character codes don’t match the symbols shown). I’m aware of the Text > Remap fonts… function, but it works only for the given document, and for obvious reasons the remapping can’t be carried over to other similar documents. Is there any workaround for this?

I’m using Infix 6.30.

Thank you.

#2 2014-07-31 09:10:48

martin
Moderator
Registered: 2013-03-27
Posts: 61

Re: PDF with encoded fonts

Hi,

To make PDF files smaller, the fonts embedded in them can be subsetted: each font then contains only the characters needed to display the text used in that document. When PDF-producing software does this, the raw character codes in the PDF quite often do not match the Unicode values of the characters displayed.
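
To make the idea concrete, here is a hypothetical Python sketch of a producer that assigns raw codes to a font subset in order of first use. The numbering scheme and the helper names subset_encoding and encode are assumptions for illustration only; real PDF producers pick codes in different ways. The point is just that the resulting codes (1, 2, 3, ...) have nothing to do with the Unicode values of the letters.

```python
# Hypothetical sketch: a producer that subsets a font and numbers the
# characters in order of first use (an assumption; real producers vary).

def subset_encoding(text: str, first_code: int = 1) -> dict[str, int]:
    """Map each distinct character to a raw code, in order of first appearance."""
    encoding: dict[str, int] = {}
    for ch in text:
        if ch not in encoding:
            encoding[ch] = first_code + len(encoding)
    return encoding


def encode(text: str, encoding: dict[str, int]) -> bytes:
    """Raw one-byte codes as they would appear in the content stream.

    bytes() raises ValueError if a code exceeds 255, so this sketch only
    handles subsets of at most 255 distinct characters.
    """
    return bytes(encoding[ch] for ch in text)


enc = subset_encoding("Hello")
print(enc)                         # {'H': 1, 'e': 2, 'l': 3, 'o': 4}
print(list(encode("Hello", enc)))  # [1, 2, 3, 3, 4]
```

Read as plain character codes, bytes 1 to 4 are control characters rather than "Hello", which is the kind of gibberish you get when the text is copied without a correct mapping.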

Fonts in a PDF document can include a "ToUnicode" CMap that maps the raw character codes in the document to Unicode values, so that text can be copied and pasted from the PDF into other applications and document formats.
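
As a rough illustration of what a ToUnicode CMap does, here is a minimal Python sketch that reads the bfchar and bfrange entries of a CMap and uses them to turn raw codes back into text. It assumes one-byte codes, destinations that are single characters in the Basic Multilingual Plane, and only the hex-string form of bfrange (the array form is not handled). The CMap text is a made-up example, and parse_tounicode and decode are hypothetical helpers, not part of Infix.

```python
import re

# Made-up ToUnicode CMap: codes 1-4 spell "Helo", codes 5-7 map to "abc".
CMAP = """
/CIDInit /ProcSet findresource begin
12 dict begin
begincmap
1 begincodespacerange
<00> <FF>
endcodespacerange
4 beginbfchar
<01> <0048>
<02> <0065>
<03> <006C>
<04> <006F>
endbfchar
1 beginbfrange
<05> <07> <0061>
endbfrange
endcmap
"""

HEX = r"<([0-9A-Fa-f]+)>"


def parse_tounicode(cmap: str) -> dict[int, str]:
    """Build a raw-code -> Unicode table from bfchar and bfrange sections."""
    mapping: dict[int, str] = {}
    # bfchar: one <src> <dst> pair per entry; dst is UTF-16BE hex.
    for block in re.findall(r"beginbfchar(.*?)endbfchar", cmap, re.S):
        for src, dst in re.findall(HEX + r"\s*" + HEX, block):
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    # bfrange: <lo> <hi> <dst>; each code maps to dst + (code - lo).
    # Assumes dst is a single BMP code point (no surrogate pairs).
    for block in re.findall(r"beginbfrange(.*?)endbfrange", cmap, re.S):
        for lo, hi, dst in re.findall(HEX + r"\s*" + HEX + r"\s*" + HEX, block):
            start, first, last = int(dst, 16), int(lo, 16), int(hi, 16)
            for offset in range(last - first + 1):
                mapping[first + offset] = chr(start + offset)
    return mapping


def decode(raw: bytes, mapping: dict[int, str]) -> str:
    """Decode one-byte codes; codes missing from the table become U+FFFD."""
    return "".join(mapping.get(code, "\ufffd") for code in raw)


table = parse_tounicode(CMAP)
print(decode(bytes([1, 2, 3, 3, 4]), table))  # Hello
print(decode(bytes([5, 6, 7]), table))        # abc
```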

Some PDF-producing applications add ToUnicode CMaps that contain invalid mappings from raw codes to Unicode values. In that case, copying the text correctly to other applications or document formats is impossible, because nothing in the file tells you the correct Unicode values for the raw character codes.

This is why we added the "Remap Fonts..." functionality, which lets you correct the errors in the CMap of a given font in a document.
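
Conceptually, a remap can be thought of as user corrections layered over the font's embedded table. The following Python sketch is only an illustration of that idea under that assumption, not how Infix implements "Remap Fonts..."; the faulty table (with two entries swapped) and the helper names apply_remap and decode are hypothetical.

```python
def apply_remap(tounicode: dict[int, str], corrections: dict[int, str]) -> dict[int, str]:
    """Return a new table where user corrections override the embedded entries."""
    fixed = dict(tounicode)  # copy so the original table is left untouched
    fixed.update(corrections)
    return fixed


def decode(raw: bytes, mapping: dict[int, str]) -> str:
    """Decode one-byte codes; codes missing from the table become U+FFFD."""
    return "".join(mapping.get(code, "\ufffd") for code in raw)


# Hypothetical faulty ToUnicode table: the entries for codes 3 and 4 are swapped.
broken = {1: "H", 2: "e", 3: "o", 4: "l"}
raw = bytes([1, 2, 3, 3, 4])

print(decode(raw, broken))  # Heool

fixed = apply_remap(broken, {3: "l", 4: "o"})
print(decode(raw, fixed))   # Hello
```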

Because each subsetted font is unique to its document (every document uses a different subset of characters) and carries its own ToUnicode CMap, the fixes cannot be ported to other documents.
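
A small Python sketch shows why a fix made for one document goes wrong in another. It reuses the hypothetical "number characters in order of first use" model (an assumption; real producers differ) to build subsets for two documents, then applies the corrections worked out for the first document to the raw codes of the second.

```python
# Hypothetical model: each document's subset numbers its characters in
# order of first use (an assumption for illustration; producers differ).

def subset_encoding(text: str, first_code: int = 1) -> dict[str, int]:
    encoding: dict[str, int] = {}
    for ch in text:
        if ch not in encoding:
            encoding[ch] = first_code + len(encoding)
    return encoding


doc_a = subset_encoding("Hello")  # {'H': 1, 'e': 2, 'l': 3, 'o': 4}
doc_b = subset_encoding("World")  # {'W': 1, 'o': 2, 'r': 3, 'l': 4, 'd': 5}

# Correct raw-code -> Unicode fixes worked out for document A.
fixes_for_a = {code: ch for ch, code in doc_a.items()}

# Raw codes that document B writes for its own text.
raw_b = bytes(doc_b[ch] for ch in "World")

# Applying A's fixes to B's codes; codes A never used are shown as "?".
ported = "".join(fixes_for_a.get(code, "?") for code in raw_b)
print(ported)  # Helo?
```

The same code (for example 2) means "e" in one document and "o" in the other, so a correction table only makes sense for the font subset it was made for.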

We are working on an "Auto Correct font Mapping" feature that tries to correct a font’s mappings automatically, but it is not finished yet.

If you are creating the PDFs yourself, it may be worth trying a different application to produce them, one that adds a valid ToUnicode CMap when it subsets the fonts.

I hope this has answered your questions,

Martin.

#3 2014-08-01 10:48:37

Dmitry1940
Member
Registered: 2014-07-30
Posts: 2

Re: PDF with encoded fonts

Hi Martin,

Thank you for the exhaustive explanation and good luck in the future development of the program.
