Nolan Business Solutions recently asked John Stinchcombe, a Business Development Director from our partner Spigraph UK, to write a blog piece around his view on how today's Automated Invoice Processing systems differ from Invoice OCR (Optical Character Recognition) systems of the past.

This blog piece goes into detail about the history of the AP landscape, the newer systems that are now in place, and the positives and negatives of these different systems depending on your business needs.




In a world of increasing automation and sophisticated supply chains as well as the inevitable pressure on cash and spend management, it’s no surprise that AP Invoice Processing falls under the spotlight. This is an area with proven benefits for improving the Finance team’s Operational Efficiency. It’s become clear that improved administration performance in this area delivers significant benefits to other groups across the organisation.

Automated AP Invoice Processing isn’t new as solutions of this type have existed for the past couple of decades. The purpose of this blog is to consider the evolution of these solutions and compare them to contemporary systems.

However, before diving into the subject, it’s worth noting that some early Invoice Automation solutions delivered less than anticipated benefits and as a result, invoice automation systems are sometimes viewed with caution or scepticism. This issue will hopefully be addressed in this document.

Additionally, this blog considers why differences in the way that invoices are submitted have guided the evolution of solutions as much as advances in technology.




If we consider the AP invoice landscape of the late 1990s, two dominant types of invoice submission mechanisms were in play:

  • “Machine readable” EDI (Electronic Data Interchange) high volume invoice scenarios
  • “Human readable” traditional invoices, typically in paper form

In the late 1990s, some organisations started to submit scanned and PDF invoices as email attachments as an alternative to sending paper. However, due to concerns over the reliability of the process, poor access to networks, and the risk of the customer denying receipt of the invoice, few suppliers adopted this approach.

EDI systems provided many AP teams with an insight into the future, but high cost, highly specialised interfaces, and relatively low adoption levels meant this approach remained a non-starter for most organisations. The nature of these solutions meant that EDI systems had high entry costs, and naturally suppliers would be reluctant to invest (irrespective of cost) unless their customers would participate and invest on their side too. It’s fair to summarise that most EDI users have little choice in the matter, as the sheer number of transactions would make manual processing uneconomic.

Putting EDI to one side, for the rest of us stuck with the need to process traditional invoices, early invoice “automation” systems virtualised some of the steps and were aimed at streamlining invoice processing on the customer’s side, i.e. recipients of invoices. Systems appeared that enabled customers to link an inbound paper supplier invoice with its corresponding purchase order via a uniquely coded bar-code label. Once the bar-code label was affixed, the paper invoice was scanned, and the extracted bar-code reference associated the scanned invoice with the purchase order. These systems were great for early invoice approval workflow and simplified the subsequent retrieval of the invoice.

However, these systems could not avoid the need for an AP team member to manually check the invoice for completeness and accuracy. Nevertheless, these early “imaging” systems can be considered an essential first digital step on the journey to automation and operational efficiency.

The next phase was to introduce OCR technology to attempt to read off structured data-fields from the invoice. Early systems required that each invoice layout was defined in the system so that the OCR process converted the appropriate “image snippets” to usable data. Because of the initial setup and on-going overhead of this so-called templating process, organisations restricted their scope to the largest group of suppliers by invoice volume. The size of this supplier group was typically in the range of 20-50 suppliers, and to ensure viability the aggregate number of invoices from this supplier group ideally needed to equate to over 50% of the total invoices received. So, the effectiveness of this approach depended on the profile of suppliers, i.e. a relatively small group of suppliers representing a large percentage of overall invoices worked well. The Achilles heel of the system was changing supplier dynamics (i.e. new suppliers and shifting invoice volumes from existing suppliers) and the inevitable changes to invoice layouts that rendered a template useless. In short, this approach only suited those organisations with an appropriate supplier distribution. The principle of optimising the setup for the largest suppliers (by total volume of invoices) was, however, widely understood.
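The viability test described above (a small group of suppliers accounting for over 50% of invoices) can be illustrated with a short calculation. This is a minimal sketch only: the function name `top_supplier_coverage` and the input shape (one supplier name per invoice received) are assumptions made for illustration and are not taken from any particular product.

```python
from collections import Counter
from typing import Iterable


def top_supplier_coverage(invoice_suppliers: Iterable[str], top_n: int) -> float:
    """Return the share of all invoices that come from the top_n suppliers.

    invoice_suppliers holds one supplier name per invoice received
    (a hypothetical input format chosen for this sketch).
    """
    counts = Counter(invoice_suppliers)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # most_common(top_n) yields the suppliers with the highest invoice counts.
    covered = sum(count for _, count in counts.most_common(top_n))
    return covered / total


# Illustrative data: ten invoices from four suppliers.
sample = ["A"] * 6 + ["B"] * 2 + ["C", "D"]
templating_looks_viable = top_supplier_coverage(sample, top_n=2) > 0.5
```

With a supplier profile like the sample, templating the two largest suppliers would cover most invoices; a flatter distribution would fall below the 50% guideline.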

Invoice processing solutions evolved to use OCR technology in a different way in order to address the overhead of one template per invoice layout. The next development phase was to abandon the approach of templates and to use OCR to create a full data-representation from the entire scanned invoice. The resultant full text layer included the so-called “static” background text along with the “dynamic” invoice data. New field locator algorithms would combine implicit invoice rules with meta-data from the OCR process, such as the x, y coordinates of the extracted entities, and attempt to “parse” the structured invoice data-set from the invoice.
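To give a flavour of how a field locator might use OCR coordinates, here is a minimal sketch of one simple rule: find a label such as “Total:” in the text layer and take the nearest word to its right on the same line. The `Word` structure, the function name, and the tolerance value are all hypothetical; real products use far richer rule sets.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    """One OCR-extracted word with the x, y position of its top-left corner."""
    text: str
    x: float
    y: float


def find_value_right_of(words: List[Word], label: str,
                        y_tolerance: float = 5.0) -> Optional[str]:
    """Return the text of the nearest word to the right of `label` on the same line."""
    anchors = [w for w in words if w.text.lower() == label.lower()]
    if not anchors:
        return None  # label not found: the field would be left blank
    anchor = anchors[0]
    # Candidates share roughly the same y position and sit to the right of the label.
    candidates = [w for w in words
                  if abs(w.y - anchor.y) <= y_tolerance and w.x > anchor.x]
    if not candidates:
        return None
    return min(candidates, key=lambda w: w.x).text


page = [
    Word("Invoice", 50, 100), Word("Date:", 120, 100), Word("12/03/2020", 180, 101),
    Word("Total:", 400, 700), Word("1,250.00", 480, 702),
]
total = find_value_right_of(page, "Total:")
```

When the label is missing or the layout is unusual, the function returns nothing, which mirrors the blank fields that AP teams had to complete by hand.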

These rules-based systems worked well but included a couple of pitfalls. Firstly, their dependence on OCR-generated data meant that they were susceptible to inaccurate OCR results from poor-quality original invoices. After all, these invoices were never intended for machine reading, and traditional AP teams simply carried on when presented with poor-quality, difficult-to-read, dot-matrix print invoices.

Fortunately, OCR systems do “complain”. Through their own qualitative assessment of the conversion process, OCR software can be configured to mark extracted individual characters and entire data-fields as “erroneous” or “suspect”. However, in some marginal cases, OCR engines deliver “false positive” results and that was another issue. Invariably, more unorthodox invoice layouts meant that the data-parsing algorithms were unable to identify and locate the data fields and so presented the AP teams with blank fields and “manual” intervention was required to complete the data-set.
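The way an extracted field might be marked “suspect” or “erroneous” can be sketched as follows. This assumes the OCR engine reports a confidence between 0 and 1 for each character; the thresholds and the function name are illustrative assumptions, not the behaviour of any specific OCR engine.

```python
from typing import Sequence


def classify_field(char_confidences: Sequence[float],
                   suspect_below: float = 0.90,
                   error_below: float = 0.50) -> str:
    """Classify a field from hypothetical per-character OCR confidences (0.0 to 1.0)."""
    if not char_confidences:
        return "blank"  # nothing was located, so a person must key the value
    lowest = min(char_confidences)
    if lowest < error_below:
        return "erroneous"
    if lowest < suspect_below:
        return "suspect"
    return "accepted"


status = classify_field([0.99, 0.97, 0.72, 0.98])
```

Note that a check of this kind cannot catch a “false positive”, where the engine reports high confidence in a character it has in fact misread.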

There’s a lot more detail to these topics, but the upshot is that solutions that relied upon OCR-generated data do have their limitations.

Advanced image processing solutions such as Kofax VRS went some way to mitigate poor invoice quality and significantly improved OCR results. However, even with the most sophisticated “image-perfection” algorithms, if the data is illegible, obliterated, or damaged to the extent a human operator would struggle to read it, then so will a machine. This combination of very poor-quality invoices and OCR was one of the largest problems for automation in the era of scanning paper invoices. There are many anecdotes about the levels of work that AP teams needed to perform in order to “help” Invoice Automation systems work. In some cases, this level of work was equal to or even greater than the work needed to process invoices manually.

There is an upside though. As the data-parsing and locator algorithms continued to be developed, it was an obvious step to apply these to “born digital” invoice formats, i.e. those invoices rendered as say, PDF, and that had never been printed and scanned. Many accounting packages began to permit “invoice to email” where the invoice was attached as a generic or native PDF or HTML document. As this approach eliminated all problems with OCR accuracy, the overall automation rates improved dramatically. In these cases, the critical aspect for effectiveness came down to the veracity of the automated data extraction and verification processes.

The next phase in the evolution was the somewhat inevitable introduction of machine learning. This was important as the only remaining problem with the largely rules-based data locator/parsing approach was the scenario of non-standard invoices that were not covered by the algorithms. Machine learning picked up these failures, which typically affected only a small subset of fields. In addition to improving user productivity (fewer exceptions to deal with), it also reduced system setup and maintenance time.



Changed Invoice Submission Practices

There are three main approaches that suppliers use today:

  • Traditional format invoices submitted as paper or “born digital” electronic formats such as PDF, HTML etc.
  • EDI format invoices that conform to original EDI standards (and many sub-standards and dialects) and that use EDI mechanisms. Although EDI is very much an exclusive “club”, these systems do persist due to the enormous investments made and the disruption of switching.
  • e-Invoice systems that typically operate via Invoice Portals and require that customers install software to receive invoices directly into their finance systems. Those customers who “won’t or can’t” can elect to receive an emailed link to the portal, from where invoices can be (manually) downloaded. As if to endorse the adage, “the good thing about standards is that there are so many to choose from”, there really are a lot of competing standards, which may be one of the reasons for relatively low adoption rates for e-Invoicing.

It’s worth stating the obvious: most, if not all, organisations receive traditional invoices. Even those invested in EDI and e-Invoicing are unlikely to have 100% coverage and so still have the problem of processing invoices designed for “human reading” and manual work-flow. In fact, organisations that have adopted EDI or e-Invoicing have seen the “promised land” and are usually pretty fed up with the notion of having to process traditional invoices by hand and effectively spoon feed invoices into their finance systems.



Processing Traditional Invoices – Remaining Core Problems

However, as good as this evolution is, there are four remaining challenges associated with traditional invoices, irrespective of whether they are submitted as paper or in native, born-digital formats.

Firstly, the originator of the invoice, the supplier, doesn’t provide a convenient, formatted, structured data-file in a readily ingestible form. Simply put, the invoice’s structured data is scattered and buried inside the invoice, and the customer needs either to read this data from the invoice manually or to first unravel it so that the process can be automated.

Secondly, suppliers make mistakes, errors, and omissions, and the onus is on the customer to check and cross-check that the invoice is an accurate representation of the goods or services ordered and supplied.
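A simple form of this cross-check compares each invoice line against the corresponding purchase order line. The sketch below assumes both documents have already been reduced to a mapping of item code to (quantity, unit price); the function name, data shape, and tolerance are hypothetical and chosen only for illustration.

```python
from decimal import Decimal
from typing import Dict, List, Tuple

Lines = Dict[str, Tuple[int, Decimal]]  # item code -> (quantity, unit price)


def find_discrepancies(invoice_lines: Lines, po_lines: Lines,
                       price_tolerance: Decimal = Decimal("0.01")) -> List[str]:
    """Compare invoice lines with purchase order lines and list any mismatches."""
    issues: List[str] = []
    for item, (qty, price) in invoice_lines.items():
        if item not in po_lines:
            issues.append(f"{item}: not on purchase order")
            continue
        po_qty, po_price = po_lines[item]
        if qty > po_qty:
            issues.append(f"{item}: invoiced quantity {qty} exceeds ordered {po_qty}")
        if abs(price - po_price) > price_tolerance:
            issues.append(f"{item}: unit price {price} differs from ordered {po_price}")
    return issues


purchase_order = {"WIDGET": (10, Decimal("2.50"))}
invoice = {"WIDGET": (12, Decimal("2.50")), "GADGET": (1, Decimal("9.99"))}
problems = find_discrepancies(invoice, purchase_order)
```

An empty result means the invoice agrees with the order on these points; anything else would be routed to the AP team as an exception.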

Thirdly, “loose” processes that are based on know-how and tacit information are open to short cuts and non-compliance that are excused on the basis of getting the job done. This ranges from using the first instance on the vendor master file for the legal entity of the supplier (because it’s “always” that legal entity), to OK’ing the fact that a GRN (goods received note) isn’t present (because that department takes time to book goods in and the supplier is generally reliable), or simply processing the same invoice twice: first in paper form and then a week later as an emailed attachment.

Finally, not all invoices are legitimate or are from legitimate suppliers. Although this should be picked up by the AP team, where sophisticated attempts to defraud are made while AP teams are heavily loaded, it’s not unknown for rogue suppliers’ invoices to pass through checks and be paid.
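The duplicate scenario (the same invoice arriving on paper and again by email) is one that software can check consistently. Below is a minimal sketch that assumes each invoice has been reduced to a supplier id, invoice number, gross amount, and arrival channel; these field names and the normalisation rules are illustrative assumptions only.

```python
from decimal import Decimal
from typing import Dict, List, Tuple


def invoice_key(supplier_id: str, invoice_number: str,
                gross_amount: str) -> Tuple[str, str, Decimal]:
    """Build a comparison key that ignores case, spacing, and punctuation."""
    normalised_number = "".join(ch for ch in invoice_number.upper() if ch.isalnum())
    amount = Decimal(gross_amount).quantize(Decimal("0.01"))
    return (supplier_id.strip().upper(), normalised_number, amount)


def find_duplicates(invoices: List[Dict[str, str]]) -> List[Tuple[str, str]]:
    """Return (first channel, repeat channel) pairs for invoices seen more than once."""
    seen: Dict[Tuple[str, str, Decimal], str] = {}
    duplicates: List[Tuple[str, str]] = []
    for inv in invoices:
        key = invoice_key(inv["supplier_id"], inv["invoice_number"], inv["gross_amount"])
        if key in seen:
            duplicates.append((seen[key], inv["channel"]))
        else:
            seen[key] = inv["channel"]
    return duplicates


received = [
    {"supplier_id": "S001", "invoice_number": "INV-1042",
     "gross_amount": "1200.00", "channel": "paper"},
    {"supplier_id": "s001 ", "invoice_number": "inv 1042",
     "gross_amount": "1200.0", "channel": "email"},
]
repeats = find_duplicates(received)
```

Normalising the invoice number matters because the same invoice is often keyed slightly differently depending on how it arrived.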



Current State of Play

Contemporary Automated Invoice Processing solutions such as Kofax Readsoft Online benefit from a perfect timing of factors:

  • Data extraction algorithms and methods have probably reached the pinnacle of performance. Today, very few invoice layouts cannot be processed semi-automatically. Full or 100% automation is possible, but this typically requires invoices to conform to strict guidelines regarding data accuracy.
  • Machine learning or “cognitive” capture capabilities complete the picture for those invoices that don’t adhere to a conventional layout that rules-based data-extraction algorithms can process.
  • There’s been a significant switch from paper (aka “snail mail”) to email as the invoice submission mechanism of choice; as long as these invoices are “born digital” rather than scanned copies, this eliminates the need for scanning and OCR
  • E-Invoicing has expanded into projects that would have been fulfilled by EDI as well as higher volume scenarios for traditional invoicing methods. Customers receiving e-Invoices through a traditional channel can process many of these using the automation techniques described above.
  • Traditional invoices are likely to conform to a ubiquitous “born-digital” format such as PDF or HTML; these provide all the same “clean” unstructured data without the need for OCR.
  • Finance and ERP systems now include more uniform and consistent interfaces that allow supplier master and PO line data to be readily exported for verification purposes. Similarly, these Finance/ERP systems include interfaces and mechanisms to allow the import or upload of verified invoice data.
  • Driven in part by the increase in awareness and attention to RPA aka “software robots”, the efficiency of knowledge workers such as AP teams has come into sharper focus. Contemporary solutions significantly reduce the dependency on hands-on keyboard work and process interaction overall; this is augmented with more streamlined ergonomic interfaces that eliminate the need to switch between multiple application screens at once. Good talent retention strategies require that highly skilled personnel are shielded from repetitive, non-value-add tasks.
  • Deployment without heavy implementation or setup via cost-effective SaaS means lower barrier to entry costs; this in turn will widen the appeal of these systems which drives further




By a happy coincidence, although originally designed to work on scanned documents, data extraction capabilities in solutions such as Readsoft Online work very well on electronic, “born-digital” documents. Invoice submission using born-digital invoices as attachments to email overtook traditional paper format invoices as the dominant method a couple of years ago. By necessity, the Covid-19 pandemic has further extended the popularity and ubiquity of this approach.

It’s likely that e-Invoice solutions will continue to be adopted, albeit at a relatively slow pace as currently e-Invoice systems aren’t the panacea for all organisations. Until some very large vendors collaborate on common standards that are easy and cost effective for all customers to use then e-Invoice methods will continue as niche solutions rather than mainstream.

As organisations continue to work towards improved Operational Efficiency and high Administration performance, they will adopt systems like Readsoft Online to handle traditional, “human readable” invoices, as well as the most prolific of e-Invoice formats too.

For this reason, combined with optional AP workflow capabilities, the use of these systems is expected to continue to grow significantly over the next few years.


For more information on this or any of our other services, contact our experts today via our website. Alternatively, email or call us on 01252 811 663.