Backed by 25 years of experience, Lexmark Document Filters is the engine that powers Perceptive Search and some of the leading Big Data, eDiscovery, DLP, email archival, content management, business intelligence and intelligent capture products on the market, offering the most advanced and proven alternative to other OEM and open-source solutions. Document Filters is an excellent alternative to generic file readers and file parsers and to commercial solutions like Oracle’s Outside In (fka Stellent) and HP/Autonomy’s KeyView (fka Verity), and it provides superior functionality to solutions like IFilters and the open-source Apache Tika libraries.
Because an estimated 80 percent of content lives outside a structured data environment, information-driven software solutions need a way to unlock unstructured files like Word, PowerPoint and PDF, pull out the hidden and visible content inside, and incorporate the data into their systems in a variety of ways and formats.
Document Filters uniquely fills that void, offering a document conversion SDK that enables deep inspection of unstructured content and data and transforms it into usable information, including output to whichever format is needed for near-native document viewing.
This front-end ingestion technology provides file identification, content and metadata extraction, and file conversion, including doc-to-HTML and doc-to-PDF conversion among a variety of other output formats, as well as content redaction in documents.
- OCR document images and text, producing searchable text and high-definition output files
- Identify, extract and transform every document, email, legacy, archive and container format you need — Word, Excel, PowerPoint, PDF, AutoCAD, ZIPs, MSGs, Visio and hundreds more
- Analyze all text and metadata in a file with deep-inspection capability that even uncovers previously hidden information, such as tracked changes, comments, notes, annotations and embedded web links
- Determine the true nature of content, ensuring that source information is accurately identified for filtering without relying on file-name extensions
- Seamlessly render, manipulate and view content in high definition (HD) without the need for additional components like ActiveX
- Easily export content for further usage elsewhere by converting files into text, HTML, structured XML, paginated HTML, multipage TIFFs, images (JPG, BMP, PNG), searchable PDFs, and custom formats
- Replicate original files through a Layout Engine that maps out near pixel-by-pixel coordinates of text, images and objects (instead of relying on simple character positioning)
- Eliminate the need for a third-party image manipulation package, applying precise redaction marks, annotations, Bates stamps and watermarks to content during output
- Render files at the page level and control the size of output and other variables, making it easy to create thumbnails or convert files with or without headers and footers
- Deploy across 20 platforms, including Windows, Mac OS X, Linux, Solaris, FreeBSD, HP-UX and AIX, with full support for character sets and encodings such as Unicode
- Benefit from industry-leading extraction and throughput speeds, processing content faster with greater stability
- Embed Document Filters quickly and cost-effectively into your product with our flexible APIs for C, C++, COM, .NET and Java
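
To make the idea of identifying a file by its content rather than its file-name extension more concrete, here is a minimal sketch in Python. It is not the Document Filters API and does not show how the product works internally: the names `SIGNATURES`, `sniff_bytes` and `sniff_type` are hypothetical, and the table covers only a handful of well-known leading-byte signatures. A production identification engine inspects far more than the first few bytes.

```python
# Illustrative only: a tiny content-based file identifier.
# All names here are hypothetical and are NOT part of Document Filters.

# Well-known leading-byte signatures mapped to a coarse type label.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"PK\x03\x04", "zip-container"),  # ZIP, also DOCX/XLSX/PPTX packages
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2-compound"),  # legacy DOC/XLS/PPT/MSG
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"II*\x00", "tiff"),  # little-endian TIFF
    (b"MM\x00*", "tiff"),  # big-endian TIFF
]


def sniff_bytes(header: bytes) -> str:
    """Return a coarse type label based on the leading bytes of a file."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return "unknown"


def sniff_type(path) -> str:
    """Identify a file on disk by its content, ignoring its extension."""
    with open(path, "rb") as fh:
        return sniff_bytes(fh.read(16))
```

With this sketch, a PDF saved under a misleading name such as `report.txt` is still labeled `pdf`, because only the bytes are examined. Container formats need a second pass: a `zip-container` result could be a plain ZIP archive or an Office package, which can only be told apart by looking inside the archive.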