Most online and offline documents can be categorized as semi-structured data, which machines cannot process directly. Initially, template-based software attempted to bridge this gap and allow companies to extract data from documents automatically. However, templates enable only limited automation and are hard to maintain. In recent years, vendors have built machine learning models trained on millions of sample documents. These models can extract data from documents automatically with high accuracy.

To be categorized as document capture software, a product must be able to:

  • automatically extract data from a specific type of document (e.g., invoices) or from several different document types.
  • provide a confidence score for the extracted data, so users can decide whether to process the output automatically or validate it manually.
  • provide a user interface (UI) for manually validating and correcting extracted data.
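The confidence requirement above is what lets a company choose between automatic processing and manual validation. The sketch below illustrates that decision under stated assumptions: the `ExtractedField` type, the `route_document` function, the 0.90 threshold and the invoice values are all hypothetical, not taken from any particular product. A document is auto-processed only if every field meets the threshold; otherwise the low-confidence fields are flagged for review in the validation UI.

```python
from dataclasses import dataclass


@dataclass
class ExtractedField:
    """One value extracted from a document, with a confidence score from 0.0 to 1.0."""
    name: str
    value: str
    confidence: float


def route_document(fields, threshold=0.90):
    """Decide how to handle a document (hypothetical helper; threshold is an assumed default).

    Returns ("auto", []) if every field meets the threshold, otherwise
    ("review", [names of fields that need manual validation]).
    """
    low_confidence = [f.name for f in fields if f.confidence < threshold]
    if low_confidence:
        return ("review", low_confidence)
    return ("auto", [])


# Hypothetical invoice fields, for illustration only.
invoice = [
    ExtractedField("invoice_number", "INV-1042", 0.98),
    ExtractedField("total_amount", "1,250.00", 0.71),
    ExtractedField("due_date", "2024-03-31", 0.95),
]
print(route_document(invoice))  # ('review', ['total_amount'])
```

In practice the threshold is a business trade-off: raising it sends more documents to manual validation but lets fewer extraction errors through automatically.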

