Soldato (joined 22 Jan 2014, 3,878 posts)
Afternoon forumites,
Is anyone aware of software that can take a scanned copy of a document (or work directly from the physical document) and extract the text and numerical data from it into a database?
The biggest issue is that, whilst the fields on the form are all labelled the same way, their locations differ from form to form, so it's not like a machine-marked test or suchlike where every scanned copy has an identical layout. Unfortunately, the forms cannot be standardised because the processes used by those completing them are not standardised. So although the data are all typed using word-processing software or similar (no handwriting), the fonts, text sizes and locations vary.
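Since the labels are consistent even though their positions are not, one common approach is to anchor on the label text rather than on fixed coordinates. Below is a minimal sketch, assuming an OCR engine has already produced each word with a bounding box (Tesseract, via `pytesseract.image_to_data`, can report `text`, `left`, `top`, `width` and `height` per word) and assuming each value sits on the same line as, and to the right of, its label. The `Word` type and the helper functions are hypothetical; values printed below a label or wrapping over several lines would need extra rules.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One OCR'd word and its bounding box in pixels (hypothetical type)."""
    text: str
    left: int
    top: int
    width: int
    height: int


def group_lines(words, tolerance=0.5):
    """Group words into visual lines by comparing vertical centres."""
    lines = []
    for w in sorted(words, key=lambda w: (w.top + w.height / 2, w.left)):
        centre = w.top + w.height / 2
        if lines:
            first = lines[-1][0]
            first_centre = first.top + first.height / 2
            # Same line if the centres are within a fraction of the text height.
            if abs(centre - first_centre) <= tolerance * max(first.height, w.height):
                lines[-1].append(w)
                continue
        lines.append([w])
    # Order the words on each line left to right.
    return [sorted(line, key=lambda w: w.left) for line in lines]


def extract_fields(words, labels):
    """Return {label: value}, taking the text after each label on its line.

    A value stops where the next known label on the same line begins, so
    "Name: Jane Doe Date: 01/02/2024" yields two separate fields.
    """
    found = {}
    label_parts = {label: label.lower().split() for label in labels}
    for line in group_lines(words):
        tokens = [w.text for w in line]
        norm = [t.lower().rstrip(":") for t in tokens]
        matches = []  # (start index, end index, label)
        for label, parts in label_parts.items():
            n = len(parts)
            for i in range(len(norm) - n + 1):
                if norm[i:i + n] == parts:
                    matches.append((i, i + n, label))
        matches.sort()
        for k, (start, end, label) in enumerate(matches):
            stop = matches[k + 1][0] if k + 1 < len(matches) else len(tokens)
            value = " ".join(tokens[end:stop])
            # Keep the first non-empty value found for each label.
            if value and label not in found:
                found[label] = value
    return found
```

Because the lookup is driven by label text and by relative position on a line, the same call should return the same fields for two forms whose layouts differ, as long as each label and its value share a line.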
It essentially needs to find the named fields on each form, extract the data associated with each field, and enter that data into the corresponding field in the database. There will be a human check to ensure that what is being entered is correct, as we are not dealing with thousands a day; we're talking about maybe 20k pages per year.
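At this volume, a simple review queue may be enough for the human check: store every expected field with a status, let someone confirm or correct it, and only then treat it as final. Below is a minimal sketch using Python's built-in `sqlite3`, assuming the values for one scanned document have already been extracted into a dict keyed by field label. The table layout, status values and function names are hypothetical.

```python
import sqlite3


def init_db(conn):
    """Create a table holding one row per extracted field (hypothetical schema)."""
    conn.execute("""
        CREATE TABLE IF NOT EXISTS extracted_field (
            id INTEGER PRIMARY KEY,
            document TEXT NOT NULL,
            field TEXT NOT NULL,
            value TEXT,
            status TEXT NOT NULL DEFAULT 'pending'
        )
    """)


def queue_for_review(conn, document, fields, expected):
    """Store every expected field; missing ones are stored as NULL so the gap is visible."""
    rows = [(document, name, fields.get(name)) for name in expected]
    conn.executemany(
        "INSERT INTO extracted_field (document, field, value) VALUES (?, ?, ?)", rows
    )
    conn.commit()


def pending(conn):
    """List fields still awaiting a human check."""
    return conn.execute(
        "SELECT id, document, field, value FROM extracted_field "
        "WHERE status = 'pending' ORDER BY id"
    ).fetchall()


def approve(conn, row_id, corrected_value=None):
    """Mark a field as checked, optionally replacing the OCR value."""
    if corrected_value is None:
        conn.execute("UPDATE extracted_field SET status = 'approved' WHERE id = ?", (row_id,))
    else:
        conn.execute(
            "UPDATE extracted_field SET value = ?, status = 'corrected' WHERE id = ?",
            (corrected_value, row_id),
        )
    conn.commit()
```

Keeping the status per field, rather than per page, means a reviewer only has to look at the values that are missing or suspect instead of re-keying whole forms.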
Any suggestions at all would be very much appreciated.
Hugh