Google Expands OCR Capabilities for Document Scanning
Google Is Now Offering Free OCR Services for Scanned Documents
Google announced yesterday that a new feature in Google Docs lets users import scanned documents. The feature, described as "Convert Text from PDF or image files to Google Docs Documents," accepts a scanned PDF or image file (JPEG, GIF, or PNG).
The release raises questions about whether Google intends to broaden the scope of the Google Docs platform, how the new OCR functionality works, and what it can do. These questions include:
What OCR Engine Is Google Using in the Google Docs Platform?
It is not immediately clear which OCR engine Google is using. Google does sponsor an open source OCR engine and document analysis platform called OCRopus, but it hasn't publicly acknowledged that any of its services, including Google Books or the new Google Docs OCR functionality, use this technology.
Does Google Docs OCR Work with TIF Files?
During our testing, we noticed that the OCR functionality didn't work with TIF images, one of the most common image formats we see clients using. TIF, or TIFF (Tagged Image File Format), images are widely considered an industry standard for scanning paper documents, so I found the absence of this functionality surprising.
If you need to convert TIF images, you can use Adobe Acrobat or another utility to convert the TIF files to PDF, or check out ABBYY FineReader Online. For organizations looking to convert large volumes of documents, I would recommend using alternative document capture software to run OCR on your images.
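For anyone sorting through a folder of scans before importing, the format limits above can be sketched as a quick check. This is only an illustration: the function name `triage_scans` and the extension sets are our own assumptions, built from the formats named in the announcement (PDF, JPEG, GIF, PNG) and our TIF test results. It looks at file extensions only and does not talk to Google Docs at all.

```python
from pathlib import Path

# Assumption: formats the announcement lists as importable, matched by extension only.
DIRECT_UPLOAD = {".pdf", ".jpg", ".jpeg", ".gif", ".png"}
# Assumption: TIF/TIFF scans need converting first, based on our testing.
NEEDS_CONVERSION = {".tif", ".tiff"}


def triage_scans(filenames):
    """Group file names by how they would need to be handled before import."""
    groups = {"upload": [], "convert": [], "unknown": []}
    for name in filenames:
        ext = Path(name).suffix.lower()  # case-insensitive extension check
        if ext in DIRECT_UPLOAD:
            groups["upload"].append(name)
        elif ext in NEEDS_CONVERSION:
            groups["convert"].append(name)
        else:
            groups["unknown"].append(name)
    return groups


if __name__ == "__main__":
    scans = ["invoice.PDF", "contract.tif", "receipt.png", "notes.docx"]
    for group, names in triage_scans(scans).items():
        print(group, names)
```

Files in the "convert" group would go through Acrobat or a similar utility first. Anything in "unknown" is simply a format this article does not cover.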
How Well Does Google Docs OCR Work?
The feature is still new, as it was only released yesterday, but Ars Technica did some testing and was nice enough to write up their experiences. Their results were about the same as ours, and they summarized their findings this way: "There are still cases where this OCR would be better than nothing." Not quite the ringing endorsement that you'd hope to see attached to a Google service, but the offering is still new.
Because of the way the import mechanism is configured, Google Docs OCR may not be the best document scanning solution for every business case, especially if you're looking to convert a large volume of paper documents to digital images. For ad hoc, low-volume OCR requirements, however, the Google Docs OCR functionality serves as a solid utility for converting paper into usable text.
Have you tried the Google Docs OCR tool yet? What have your experiences been? Have you had better success with other services or software? Share your experiences in the Comments!