Testing Google’s OCR



Previously, I briefly mentioned the optical character recognition (OCR) technology within Google Docs. I decided to test it in the relatively challenging circumstance of converting photographs of pages from a book into text:

As you can see, the image-to-text conversion isn’t perfect. Indeed, it doesn’t work terribly well under the conditions to which I subjected it: substantial strings of text are missing, and there are many errors.

The system would probably work better if the pages had been perfectly flat and evenly illuminated, and if my camera had been held perfectly parallel to the page.


R.K. December 21, 2010 at 12:56 pm

I am surprised it works so poorly. How did they get so much into Google Books?
