Extracting Text from an Image

On numerous occasions I have needed to extract text from an image and have never found a way to do it. Typically I want the text of an error message and, while Ctrl+C will sometimes copy it to the clipboard, mostly it won’t.

I stumbled upon a neat solution the other day using a little-known (to me) tool that comes as part of Microsoft Office. Microsoft Office Document Imaging is a tool to bring the paper and electronic worlds together by managing the scanning of documents. It also has another trick up its sleeve in the form of optical character recognition (OCR). So with a few steps it is possible to take a picture and extract the text from it.

The process has four steps:

1. First, convert your image to TIFF format; you can do this in Paint.

2. Load the TIFF image in Document Imaging.

3. On the toolbar, click the icon with the eye to have the OCR read the text.

4. Select the text you want from the image and copy it to the clipboard.
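
If you have a lot of screenshots, step 1 could be scripted rather than done by hand in Paint. What follows is only a minimal Python sketch and not part of the workflow described above: it assumes the third-party Pillow library is installed, and the function names (convert_to_tiff, is_tiff_header, is_tiff) are made up for illustration. The header check relies on the fact that classic TIFF files start with a two-letter byte-order mark ("II" or "MM") followed by the number 42.

```python
from pathlib import Path

# Classic TIFF files begin with a byte-order mark followed by the number 42:
# b"II*\x00" (little-endian) or b"MM\x00*" (big-endian).
TIFF_SIGNATURES = (b"II*\x00", b"MM\x00*")


def is_tiff_header(data: bytes) -> bool:
    """Return True if the given bytes start with a classic TIFF signature."""
    return data[:4] in TIFF_SIGNATURES


def is_tiff(path: str) -> bool:
    """Return True if the file at `path` starts with a classic TIFF signature."""
    with open(path, "rb") as f:
        return is_tiff_header(f.read(4))


def convert_to_tiff(src: str) -> str:
    """Save a TIFF copy of the image at `src` next to it and return the new path.

    Assumes Pillow is installed (pip install Pillow). The import lives inside
    the function so the header helpers above work without it.
    """
    if is_tiff(src):
        return src  # already a TIFF, nothing to convert

    from PIL import Image  # third-party dependency (assumption)

    dst = str(Path(src).with_suffix(".tif"))
    with Image.open(src) as im:
        # Converting to RGB drops any transparency; an illustrative choice,
        # not a documented requirement of Document Imaging.
        im.convert("RGB").save(dst, format="TIFF")
    return dst
```

For example, convert_to_tiff("error.png") would write error.tif alongside the original, ready to load in Document Imaging at step 2. The file name is only an example.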

And that’s it. This is a really neat solution and, as far as I am concerned, a great find.

Does anyone else use Document Imaging and, if so, what do you use it for?

  • Thanks Neil, I didn’t know about it either, and it may come in handy.

    Burning Glass use document imaging to process text from images within CVs. Our software extracts and normalises the text from HR documents, but more and more people are creating images in their CVs that contain, for example, contact details. The best product we’ve found (we require enterprise OCR) is ABBYY FineReader.