
Argus comes with a variety of example configurations covering most of the capabilities of the software. Useful in their own right, they also serve as a convenient starting point for creating your own configurations.
Applications of Argus
Argus is a powerful and flexible tool for converting PDF documents and extracting content from them. It is controlled through configuration files. The tables below describe a number of common applications of Argus.
For each type of conversion or extraction application, the tables identify the appropriate configuration file(s) with a brief description of each. Each configuration file path is also a link to a page with more detailed information: sample input and output data, the command line used to produce the output, and the config file settings you are most likely to want to modify.
These descriptions and configuration files are by no means a comprehensive list of what Argus can do; they serve as templates, so it should rarely be necessary to write a configuration file from scratch.
Whole Page Conversion
HTML

| Config Files | Description |
|---|---|
| htmlConfigs/articles.cfg | Outputs PDF articles as simple HTML, one file per article, with images. |
| htmlConfigs/css1.cfg | HTML using Cascading Style Sheets level 1 (CSS1). Fonts and sizes are retained using styles, but layout is not. |
| htmlConfigs/linkedPages.cfg | Generates simple HTML output with navigation links at the top and bottom of each page. |
| htmlConfigs/reflow.cfg | Generates simple HTML output as a single file without preserving layout. |
| htmlConfigs/super.cfg | Generates fairly complex HTML that retains the layout using Cascading Style Sheets level 2 (CSS2). |
| htmlConfigs/tabulated.cfg | Generates simple HTML that retains the page layout as far as possible by positioning the text in a table covering the whole page. |
| htmlConfigs/table.cfg | Example configuration for PDF documents containing tables of data that must be output as HTML. |

RTF

| Config Files | Description |
|---|---|
| rtfConfigs/layout.cfg | Converts to RTF readable by Microsoft Word and other word processors. Layout is preserved, but no images are included. |
| rtfConfigs/links.cfg | Converts to RTF readable by Microsoft Word and other word processors. Hyperlinks are preserved, but no images are included and layout is not preserved. |
| rtfConfigs/reflow.cfg | Converts to RTF readable by Microsoft Word and other word processors. No images are included and layout is not preserved. |
| rtfConfigs/table.cfg | Example configuration for PDF documents containing tables of data that must be output as RTF. |

Page Thumbnails
| Config Files | Description |
|---|---|
| imageConfigs/pageThumbs.cfg | Produces a fully rendered JPEG thumbnail for each page. |

Multi-page TIFF
| Config Files | Description |
|---|---|
| imageConfigs/renderMultiTiff.cfg | Outputs a multi-page TIFF, which is generally most useful for archive or fax applications. |

Structured PDF to XML/HTML
| Config Files | Description |
|---|---|
| structConfigs/structHTML.cfg | Outputs HTML based on the PDF structure tags. A single HTML file with images is produced. |
| structConfigs/structXML.cfg | Outputs XML based on the PDF structure tags. A single XML file without images is produced. |

Vector EPS
| Config Files | Description |
|---|---|
| imageConfigs/vectorEPS.cfg | Converts PDF to vector EPS, producing a single EPS file for each page. |
Image Extraction
| Config Files | Description |
|---|---|
| imageConfigs/tiff.cfg | Extracts images from the PDF and saves each image as an individual TIFF file. |
| imageConfigs/jpeg.cfg | Extracts images from the PDF and saves each image as an individual JPEG file. |
| imageConfigs/png.cfg | Extracts images from the PDF and saves each image as an individual PNG file. |
| imageConfigs/bmp.cfg | Extracts images from the PDF and saves each image as an individual BMP file. |
| imageConfigs/rasterEPS.cfg | Extracts images from the PDF and saves each image as an individual Photoshop-compatible raster EPS file. |
| imageConfigs/opitiff.cfg | Extracts images and OPI dictionary data from the PDF; writes the image information to a text document and the images as TIFF files. |
| imageConfigs/opihtml.cfg | Extracts images and OPI dictionary data from the PDF; writes the image information to an HTML document and the images as JPEG files. |
Text Extraction
| Config Files | Description |
|---|---|
| textConfigs/text.cfg | Extracts text from the PDF and saves it as plain (WinAnsi) text in a single file. |
| - | Extracts text from the PDF and saves it as HTML. See the HTML configurations above; set the Image Output config setting to false to produce an HTML file that contains only text. |
| - | Extracts text from the PDF and saves it as RTF. See the RTF configurations above. |
| - | Exports text using articles (threads) to dictate the text order. See the HTML article sample configuration (htmlConfigs/articles.cfg); set the Image Output config setting to false to produce an HTML file that contains only text. |
| textConfigs/gridText.cfg | Example configuration for PDF documents containing tables of data that must be output as plain text. |
| charMaps/htmlSimple.cmap | A config include file. It defines a CHAR MAP that maps Unicode character codes to HTML entities. Used by reflow.cfg and table.cfg. |
| fontMaps/uniSymbol.fmap | A config include file. It defines a FONT MAP that maps the Unicode character codes of Symbol font characters to Symbol font byte codes. Used by several HTML and RTF configs. |
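The syntax of a .cmap include file is not shown on this page, so the sketch below does not attempt to reproduce it. As a rough illustration of the kind of substitution a char map such as htmlSimple.cmap performs, the following Python function replaces every character that has a named HTML 4 entity with that entity. It uses the standard library's `html.entities.codepoint2name` table as a stand-in for Argus's own map; the function name `to_html_entities` is hypothetical and is not part of Argus.

```python
from html.entities import codepoint2name


def to_html_entities(text: str) -> str:
    """Replace each character that has a named HTML 4 entity with that entity.

    Hypothetical illustration only: Argus defines its char map in a .cmap
    include file; this sketch uses Python's built-in HTML 4 entity table
    (codepoint -> entity name) instead.
    """
    out = []
    for ch in text:
        # Look up the character's code point; plain ASCII letters, digits and
        # spaces have no entry, while '&', '<', '>' and '"' do.
        name = codepoint2name.get(ord(ch))
        out.append(f"&{name};" if name else ch)
    return "".join(out)


print(to_html_entities("café © 2003 <α>"))
# → caf&eacute; &copy; 2003 &lt;&alpha;&gt;
```

Because this uses the complete HTML 4 entity table, it may escape more (or different) characters than htmlSimple.cmap does; treat it as a picture of the mapping idea, not a reproduction of the Argus file.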
