Testing Google’s OCR

2010-12-19

Previously, I briefly mentioned the optical character recognition (OCR) technology within Google Docs. I decided to test it in the relatively challenging circumstance of converting photographs of pages from a book into text:

As you can see, the image-to-text conversion isn’t perfect. Indeed, it doesn’t work terribly well under the conditions to which I subjected it. Substantial strings of text are missing, and there are many errors.

The system would probably work better if the pages had been perfectly flat and evenly illuminated, and if my camera had been perfectly parallel to the page.

R.K. December 21, 2010 at 12:56 pm

I am surprised it works so poorly. How did they get so much into Google Books?
