PDF with encoded fonts

Dmitry1940 · 31 July 2014 07:44

Hello everyone.

Please forgive me if this has been brought up before but I didn’t manage to find anything using search.

I’ve been tasked with translating a number of similar PDF documents, such as the one available at http://glneurotech.com/docrepo/user-guides/Note_BioCaptureEEG_final.pdf, and stumbled onto the fact that I can neither copy-paste their contents nor export them for translation via the Export function: in both cases the resulting text is gibberish. After some googling I found out that this is because these PDFs use encoded fonts (the symbols shown don’t match their codes). I’m aware of the Text > Remap fonts… function, but it works only for the given document, and the results of the remapping can’t be carried over to other similar documents, for obvious reasons. Is there any workaround for this?

I’m using Infix 6.30.

Thank you.

martin · 31 July 2014 09:10

Hi,

To make PDF files smaller, the fonts embedded in a PDF can be subsetted: they then contain only the characters needed to represent the text used in the document. When PDF-producing software does this, quite often the raw character codes within the PDF do not match the Unicode values of the characters displayed.

Fonts in a pdf document can have a “To Unicode” CMap in them that maps raw character codes contained within the pdf document to Unicode values so that text can be copied and pasted from a pdf document to other applications and document formats.

Some PDF producing applications add To Unicode CMaps that contain invalid mappings from raw codes to unicode values. If this is the case then copying text to other applications or other document formats correctly is impossible as there is no way of telling what the correct Unicode values are for the raw character codes in the pdf.
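To illustrate the mapping described above, here is a minimal Python sketch. The CMap excerpt, the raw codes and the "broken" table are all made up for illustration; only the `beginbfchar` / `endbfchar` syntax is taken from the PDF format, and this says nothing about how Infix handles it internally.

```python
import re

# Hypothetical excerpt of a ToUnicode CMap: raw code -> UTF-16BE Unicode value.
CMAP = """
3 beginbfchar
<01> <0048>
<02> <0069>
<03> <0021>
endbfchar
"""

def parse_bfchar(cmap_text):
    """Return {raw_code: unicode_string} for every bfchar entry found."""
    mapping = {}
    for block in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        for src, dst in re.findall(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>", block):
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    return mapping

def extract_text(raw_codes, to_unicode):
    """Simulate copy/paste: translate raw codes through the map."""
    return "".join(to_unicode.get(code, "\ufffd") for code in raw_codes)

raw = [1, 2, 3]                    # codes as stored in the page content
good = parse_bfchar(CMAP)          # valid mapping
bad = {1: "#", 2: "K", 3: "7"}     # invented example of an invalid mapping

print(extract_text(raw, good))     # Hi!
print(extract_text(raw, bad))      # #K7
```

The glyphs drawn on the page are the same in both cases; only the text that comes out when copying or exporting differs.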

Because of this we added the “Remap Fonts…” functionality so that you can correct the errors in the CMap for a given font in a document.

As each subsetted font is unique to its document (every document uses a different subset of characters) and contains its own unique “To Unicode” CMap, it is not possible to port the fixes to other documents.
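A small Python sketch of why a fix does not carry over. It assumes, purely for illustration, that the producing software hands out raw codes in order of first use; real producers may number glyphs differently, and the function names are hypothetical.

```python
def subset_codes(text):
    """Assign raw codes 1, 2, 3, ... to characters in order of first use."""
    codes = {}
    for ch in text:
        codes.setdefault(ch, len(codes) + 1)
    return codes

def encode(text, codes):
    """Raw codes as they would be stored in the document."""
    return [codes[ch] for ch in text]

def decode(raw, to_unicode):
    """Text obtained by copying, given a code -> character map."""
    return "".join(to_unicode.get(code, "?") for code in raw)

doc_a, doc_b = "signal", "align"
codes_a = subset_codes(doc_a)
codes_b = subset_codes(doc_b)

# A corrected map built by hand for document A.
fix_a = {code: ch for ch, code in codes_a.items()}

print(decode(encode(doc_a, codes_a), fix_a))  # signal
print(decode(encode(doc_b, codes_b), fix_a))  # signa
```

Document B uses the same typeface, but its subset assigns the codes differently, so the map corrected for document A produces the wrong text there.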

We are working on an “Auto Correct font Mapping” functionality that tries to correct the mappings for a font automatically but this is not finished yet.

If you are creating the pdfs yourself it might be worth trying to find a different software application to create the pdfs that will add a valid CMap when subsetting the fonts in the pdf.

I hope this has answered your questions,

Martin.

Dmitry1940 · 1 August 2014 10:48

Hi Martin,

Thank you for the exhaustive explanation and good luck in the future development of the program.

science2002 · 21 December 2017 09:57

Hi there,
and thank you for the great PDF editor. I have the opposite problem to the one raised here.

Is it possible for Infix to obfuscate the text when one tries to copy it from a PDF, by remapping the fonts? In other words, can Infix produce documents similar to the one linked above (http://glneurotech.com/docrepo/user-guides/Note_BioCaptureEEG_final.pdf)?

It would be enough (in my case at least) to obfuscate a few letters. In fact I tried simply changing a letter, but (obviously) any change I made in “Remap fonts…” also became visible in the PDF document, with the exception of spaces. By remapping the space character to “0”, spaces were turned into “!” when copying the text, while the PDF itself was still displayed correctly. Can the same be done with “a”, “e”, “o”, etc.?

Thanks for any hint.

scrowfoot · 16 January 2018 09:35

Sorry for the delay in replying. In this case the best course of action would be to turn the text into outlines. This removes any “real” text from the document by converting it into vectors. To do this, please use the “Text->Create Outlines…” menu.

Simon.

science2002 · 20 January 2018 14:57

Many thanks Simon for the feedback.
I already tested what you suggested: great feature. Yet, there are two drawbacks as compared to the font remap method:

Text quality is noticeably degraded. At least in my case, it is sometimes barely readable.
PDF file size increases dramatically: a 200 KB file becomes almost 20 MB.

Instead, I was looking for a font remapping method that, using some ASCII input, would be able to defeat basic text searches in the PDF (for emails, etc.). For instance, by changing just the vowels, the visible PDF text John.Smith@gmail.com would become the unreadable hidden (searchable) text J201hn.Sm301th@gm104392l.c254m, or some similar disguised text. I found on the web that this is possible, though with some programming. See for instance here:
http://spivey.oriel.ox.ac.uk/corner/Obfuscated_PDF#Fonts_in_PostScript_and_PDF
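
To make the idea concrete, here is a rough Python sketch of what I mean. It is not an Infix feature and not the method from the page above; the code assignment, the disguise table and the function names are my own assumptions. It relies only on the fact that a ToUnicode entry may map one code to several Unicode characters (as is done for ligatures).

```python
def subset_codes(text):
    """Assumed code assignment: 1, 2, 3, ... in order of first use."""
    codes = {}
    for ch in text:
        codes.setdefault(ch, len(codes) + 1)
    return codes

def build_obfuscated_map(codes, disguise):
    """Map each raw code to its disguise string, or to the real character."""
    return {code: disguise.get(ch, ch) for ch, code in codes.items()}

def to_bfchar(mapping):
    """Render the map as a bfchar block with UTF-16BE hex values."""
    lines = [f"{len(mapping)} beginbfchar"]
    for code, text in sorted(mapping.items()):
        dst = text.encode("utf-16-be").hex().upper()
        lines.append(f"<{code:02X}> <{dst}>")
    lines.append("endbfchar")
    return "\n".join(lines)

visible = "John.Smith@gmail.com"
codes = subset_codes(visible)
disguise = {"o": "201", "i": "301", "a": "104"}   # hypothetical vowel table
hidden_map = build_obfuscated_map(codes, disguise)

# What copy/paste or a text search would see:
copied = "".join(hidden_map[codes[ch]] for ch in visible)
print(copied)                 # J201hn.Sm301th@gm104301l.c201m
print(to_bfchar(hidden_map))
```

The glyphs on the page would still show the real address; only the text behind them changes.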

a) Is anything like that possible in Infix Pro?
b) If the answer to a) is no: I notice that if I remap a letter, say “a”, to “b”, then before the PDF is saved Infix shows:
i) the text still with “a” in the foreground,
ii) while if I copy the text (i.e. the text in the background), “a” has already turned into “b”.
When I save the PDF, the foreground text becomes “b” as well. Is it possible to save the PDF file at stage i), without the foreground change?
Thanks