Feeding Infix XML into a CAT Tool

Hello Infix

As you know, I normally use Infix to translate PDF files inside CAT tools (memoQ).

Well, I’m sorry to post such a long question, but for each option that memoQ offers when I feed in your XML file, could you comment, where possible, on which setting I should choose, if you happen to know?

Here are some of the options (just the first tab in memoQ) that I’m asked to set but never quite know how to answer:

DTD or namespace URI: Specifies the name of the DTD file or the namespace in the XML schema that will be associated with the format. Associating a DTD or an XML namespace with the format allows memoQ to automatically select the format configuration when importing documents that contain a reference to the same DTD file or namespace.

Import XML comments: If you enable this option, XML comments will be imported from documents as translatable text.

Detect encoding if possible: If enabled, memoQ will attempt to detect the encoding based on information in the document to be imported.

Input encoding if not specified: Here you can define what encoding memoQ will use when importing a document without an encoding declaration.

Output encoding: Defines the encoding of the translated documents memoQ exports. You can choose a specific encoding or use the same encoding as the original.

Write BOM for Unicode-encoded files at export: Check this if you want memoQ to write a byte-order mark at the beginning of the exported file. Some content management systems require this.

Normalize whitespace by default: If enabled, memoQ will convert sequences of tab, space or newline characters into a single space character. In addition, sequences of whitespace at the beginning and end of elements will be trimmed. Normalization is recommended when the XML document uses whitespace characters for readability only. This setting adjusts the default behavior for the XML format, but normalization can also be adjusted on the tag level.

Observe xml:space attribute in file: XML documents can contain attributes that prescribe whether or not whitespace should be normalized in a specific element. If this option is enabled, memoQ will follow such instructions in the document. If it is disabled, memoQ will treat whitespace according to its global and tag level settings.

Break segments at newlines if whitespace is preserved: Check this check box if you want memoQ to treat newline characters as segment boundaries. Text in XML files can contain newline characters if you choose to preserve whitespace. memoQ preserves whitespace when the Normalize whitespace by default check box is turned off. In this case, newline characters supposedly have a meaning in the text, and most of the time each line should be translated as a separate segment. The general advice is, turn on Break segments at newlines… if you choose to turn off Normalize whitespace by default.

Import processing instructions as inline tags: Check this check box to import processing instructions as inline tags. Processing instructions like <$ are now represented with uninterpreted tags. This check box is enabled by default. XML processing instructions were previously handled as {} tags, now they have a special tag: mq:pi.

Restore custom entities in export: Check this check box to export all characters included in custom entity rules in their custom entity format. (In our example, ‘©’ entered into the target side will be exported as ‘&copyright;’.) Uncheck this check box to export all characters included in custom entity rules as Unicode characters. (In our example, ‘©’ entered into the target side will be exported as ‘©’.)

Log warnings during import: If this option is enabled, memoQ will create a list of technical irregularities encountered in the source document during import, and save that list into a text file.

XSLT file text box: For XML files, memoQ by default creates a preview that shows the XML code of the file. In such a preview, memoQ uses some colors and formatting to make the code easy to read, and to show which parts were imported for translation, which are marked as non-translatable, etc. In this text box, you can specify another XML style sheet that will be used to create the preview. To locate and specify the XSLT files, click the … button next to the text box.

I’m not that familiar with the memoQ options, but I think I can answer your queries based on the information you have provided:

DTD or namespace URI: Specifies the name of the DTD file or the namespace in the XML schema that will be associated with the format. Associating a DTD or an XML namespace with the format allows memoQ to automatically select the format configuration when importing documents that contain a reference to the same DTD file or namespace.

*We do not supply a DTD or namespace URI for the XML exported for translation from Infix.

Import XML comments: If you enable this option, XML comments will be imported from documents as translatable text.

*This should be turned off. There will be no text in comments that needs translating.

Detect encoding if possible: If enabled, memoQ will attempt to detect the encoding based on information in the document to be imported.

*This should be turned on. The XML header declares that the file is UTF-8 encoded.

Input encoding if not specified: Here you can define what encoding memoQ will use when importing a document without an encoding declaration.

*This should not be necessary, but set it to UTF-8.
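For anyone wondering what these two encoding options boil down to, here is a rough Python sketch, assuming (as above) that the Infix XML names its encoding in the XML declaration at the top of the file. It honours a UTF-8 byte-order mark, then looks for the declared encoding, and otherwise falls back to a default. It is not memoQ's actual detection code, and `detect_encoding` is a made-up name.

```python
import re

# Looks for encoding="..." inside an XML declaration at the start of the file.
_DECLARED = re.compile(rb'<\?xml[^>]*?encoding=["\']([A-Za-z0-9._-]+)["\']')


def detect_encoding(data: bytes, fallback: str = "utf-8") -> str:
    """Pick an encoding for raw XML bytes (sketch; UTF-16 is not handled).

    1. A UTF-8 byte-order mark means UTF-8.
    2. Otherwise use the encoding named in the XML declaration, if any.
    3. Otherwise use the fallback ("Input encoding if not specified").
    """
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    match = _DECLARED.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    return fallback


header = b'<?xml version="1.0" encoding="UTF-8"?>\n<doc/>'
print(detect_encoding(header))               # utf-8
print(detect_encoding(b"<doc/>", "cp1252"))  # cp1252
```

In other words, with a declaration present the fallback setting is never consulted, which is why it "should not be necessary".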

Output encoding: Defines the encoding of the translated documents memoQ exports. You can choose a specific encoding or use the same encoding as the original.

*Use the same encoding as the original.

Write BOM for Unicode-encoded files at export: Check this if you want memoQ to write a byte-order mark at the beginning of the exported file. Some content management systems require this.

*Not necessary, but it won’t cause a problem if turned on.

Normalize whitespace by default: If enabled, memoQ will convert sequences of tab, space or newline characters into a single space character. In addition, sequences of whitespace at the beginning and end of elements will be trimmed. Normalization is recommended when the XML document uses whitespace characters for readability only. This setting adjusts the default behavior for the XML format, but normalization can also be adjusted on the tag level.

*This should be turned off. Spacing can sometimes be important.
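To see why normalization can hurt, here is a minimal Python sketch of the behaviour the memoQ help text describes: each run of tabs, spaces and newlines becomes one space, and the ends are trimmed. It is an approximation for illustration, not memoQ's implementation, and `normalize_whitespace` is a hypothetical helper.

```python
import re


def normalize_whitespace(text: str) -> str:
    """Collapse each run of spaces, tabs and line breaks into one space,
    then trim spaces from both ends, as the memoQ help text describes."""
    return re.sub(r"[ \t\r\n]+", " ", text).strip(" ")


print(repr(normalize_whitespace("  Name\t\tValue\n  Total  ")))  # 'Name Value Total'
```

Any tabs, double spaces or line breaks that carry meaning in the extracted text could be flattened this way, which is the reason for leaving the option off.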

Observe xml:space attribute in file: XML documents can contain attributes that prescribe whether or not whitespace should be normalized in a specific element. If this option is enabled, memoQ will follow such instructions in the document. If it is disabled, memoQ will treat whitespace according to its global and tag level settings.

*Not needed. The xml:space attribute is not used in Infix-exported XML.

Break segments at newlines if whitespace is preserved: Check this check box if you want memoQ to treat newline characters as segment boundaries. Text in XML files can contain newline characters if you choose to preserve whitespace. memoQ preserves whitespace when the Normalize whitespace by default check box is turned off. In this case, newline characters supposedly have a meaning in the text, and most of the time each line should be translated as a separate segment. The general advice is, turn on Break segments at newlines… if you choose to turn off Normalize whitespace by default.

*Should be turned on.
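With normalization off, breaking segments at newlines roughly means splitting the preserved text line by line. A minimal Python sketch; skipping blank lines is an assumption made here, not something the memoQ help states.

```python
def split_at_newlines(text: str) -> list[str]:
    """Split whitespace-preserved text into one segment per line.
    Blank lines are skipped (an assumption: they hold nothing to translate)."""
    return [line for line in text.splitlines() if line.strip()]


print(split_at_newlines("Invoice no.\n  Date:\n\nTotal due"))
# ['Invoice no.', '  Date:', 'Total due']
```

Leading spaces inside a line are kept, since whitespace is being preserved; only the line breaks act as boundaries.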

Import processing instructions as inline tags: Check this check box to import processing instructions as inline tags. Processing instructions like <$ are now represented with uninterpreted tags. This check box is enabled by default. XML processing instructions were previously handled as {} tags, now they have a special tag: mq:pi.

*The Infix XML contains no processing instructions, so this setting isn’t required.

Restore custom entities in export: Check this check box to export all characters included in custom entity rules in their custom entity format. (In our example, ‘©’ entered into the target side will be exported as ‘&copyright;’.) Uncheck this check box to export all characters included in custom entity rules as Unicode characters. (In our example, ‘©’ entered into the target side will be exported as ‘©’.)

*Uncheck this. Infix does not understand all custom entities, so the Unicode character is better.
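Why the plain Unicode character is safer can be shown with any standard XML parser: a custom entity such as `&copyright;` only has a meaning if a DTD declares it, and no DTD is supplied with the Infix XML. This Python sketch uses the standard library's `xml.etree.ElementTree`; how Infix's own parser behaves isn't described in this thread, so treat it as an illustration of general XML behaviour only. `parses` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def parses(xml_text: str) -> bool:
    """Return True if a plain XML parser (no DTD) accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Without a DTD declaring it, the custom entity is undefined and rejected.
print(parses("<p>&copyright; Infix</p>"))  # False
# The literal character and a numeric reference need no declaration.
print(parses("<p>© Infix</p>"))            # True
print(parses("<p>&#169; Infix</p>"))       # True
```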

Log warnings during import: If this option is enabled, memoQ will create a list of technical irregularities encountered in the source document during import, and save that list into a text file.

*Up to you.

XSLT file text box: For XML files, memoQ by default creates a preview that shows the XML code of the file. In such a preview, memoQ uses some colors and formatting to make the code easy to read, and to show which parts were imported for translation, which are marked as non-translatable, etc. In this text box, you can specify another XML style sheet that will be used to create the preview. To locate and specify the XSLT files, click the … button next to the text box.

*We don’t supply an XSLT file for Infix-exported XML.

Simon

You are a miracle worker.

I may have a second batch of questions for you, but once that’s done, I will create an XML filter configuration in memoQ and send it to you.

That way, any new Infix user who happens to use memoQ as their CAT tool could import the filter and apply it to Infix XML without worrying about whether the settings are right.

I’ll also try to promote Infix again among memoQ users and upload the filter for anyone who wants to try out Infix with memoQ.

Regards

SafeTex

Hello

Here is another tab that needs to be configured in memoQ, and the answers may depend on the contents of the PDF to be translated.

Can you give me any pointers?

Handled tags: This list indicates all the tags you added to the XML format configuration with specified properties.

Note: The type and properties of handled tags are indicated by abbreviations in the Info column of the Handled tags list. Tag types are: Str stands for structural; In for inline; NT for non-translated; and Req for required. Whitespace handling options: Inh stands for inherit, Pres for preserve, and Norm for normalize. Context handling and commenting options: Ctxt signifies that content is imported as context ID, and Com signifies that content is imported as comment. All of these types and options are explained below.

Inline: Select this option to specify that the tag selected in the Handled tags list is inline. Inline tags represent markup that is imported inside segments, and is displayed as inline tags. (For more information on inline tags, refer to Formatting Tags. In other tools, inline tags are also referred to as internal.) If this option is not enabled, memoQ will handle the tag as structural. Structural tags mark elements that are blocks of content for translation. Being delimiters, structural tags never appear within text for translation after import. In other tools, structural tags are also referred to as external.

Note: In our example, it is recommended to specify the ref and img tags as inline (because they appear inside sentences), and all the others as structural.

Not translated: Select this option to specify that the tag selected in the Handled tags list represents non-translatable text. These portions of text will not be imported for translation.

Note: If you specify that an element is non-translated, the contents of its child elements will not be imported either. Therefore, make sure you do not set elements like Body or Main to be non-translated.

Note: If an inline tag is defined as non-translated, all of its content and children will be imported into a single inline tag.

Required: Check this option to specify that the tag selected in the Handled tags list or entered into the field under the Handled tags list is required. Required tags are special inline tags that must be kept in the translation if present in the source segment. memoQ enforces this condition and displays an error sign if a required inline tag is not copied to the target side.

Whitespace handling: Use this option to specify how whitespace will be handled in the text content of the element. Inherit means that the element will receive the same whitespace handling setting as the parent element. The root element receives the default setting specified in the General tab. Preserve means that all whitespace will be retained and imported into the translation document. Normalize means that sequences of whitespace characters will be replaced by a single space character.
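The Inherit / Preserve / Normalize rule amounts to walking down the element tree and passing each element's effective setting on to its children. A minimal Python sketch, where the tag names and the `settings` dictionary are made up for illustration and do not come from Infix or memoQ:

```python
import xml.etree.ElementTree as ET
from collections.abc import Iterator


def resolve_modes(root: ET.Element, settings: dict[str, str],
                  default: str) -> Iterator[tuple[str, str]]:
    """Yield (tag, effective whitespace mode) for each element in document order.

    settings maps a tag name to "inherit", "preserve" or "normalize";
    unlisted tags count as "inherit". The root inherits `default`,
    standing in for the setting on the General tab.
    """
    def walk(elem: ET.Element, parent_mode: str) -> Iterator[tuple[str, str]]:
        mode = settings.get(elem.tag, "inherit")
        effective = parent_mode if mode == "inherit" else mode
        yield elem.tag, effective
        for child in elem:
            yield from walk(child, effective)

    yield from walk(root, default)


doc = ET.fromstring("<doc><table><cell>a</cell></table><para>b</para></doc>")
for tag, mode in resolve_modes(doc, {"table": "preserve"}, default="normalize"):
    print(tag, mode)
```

Here only `table` is set explicitly, so `cell` picks up "preserve" from it while `doc` and `para` keep the General-tab default.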

Tag content is context ID for siblings: If this option is enabled, the content of the element will be used as the context identifier of the subsequent segment imported from the elements that are at the same level in the hierarchy.

This context ID will not be applied to all segments imported from the same level of the hierarchy. Instead, memoQ will use the context ID only for the next suitable segment. Other siblings will remain without a context ID.

Tag content is comment for siblings: If this option is enabled, the content of the element will be used as the comment of the segment(s) imported from the elements that are at the same level in the hierarchy.

As I mentioned, I’m not a memoQ expert, but I think the following tags need to be inline:

span
strong
em

In fact any tag that is a child of the

tag should be treated as inline, I think.
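Here is a rough Python sketch of what "inline" means for these tags: they stay inside the segment as placeholders instead of splitting it. The `{tag}` placeholder notation is made up, the `block` element merely stands in for whatever block-level tag Infix actually writes, and the set of inline tags is just the three listed above.

```python
import xml.etree.ElementTree as ET

# The three tags suggested above as inline.
INLINE_TAGS = {"span", "strong", "em"}


def extract_segment(block: ET.Element) -> str:
    """Flatten one structural element into a single segment string.

    Inline children stay inside the segment, shown as {tag}...{/tag}
    placeholders (a made-up notation). Any other child would be structural
    and become a separate segment, so this sketch leaves its text out but
    keeps the text that follows it (its tail).
    """
    parts = [block.text or ""]
    for child in block:
        if child.tag in INLINE_TAGS:
            inner = extract_segment(child)
            parts.append(f"{{{child.tag}}}{inner}{{/{child.tag}}}")
        parts.append(child.tail or "")
    return "".join(parts)


block = ET.fromstring(
    "<block>Click <strong>Save</strong> to keep <em>all</em> changes.</block>"
)
print(extract_segment(block))
# Click {strong}Save{/strong} to keep {em}all{/em} changes.
```

If `strong` or `em` were left structural instead, the sentence would be cut into three separate segments, which is why tags that appear inside sentences are better marked as inline.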

Thanks, Simon.