ocrImage - OCR an image - with source code

The OCR library that ships as part of macOS is surprisingly good, and surprisingly easy to use.

Here is a simple command-line tool that, given an image of a page (screenshots also work), produces the text on that page in about one second.

To use:


ocrImage sample.png > sample.txt

Here sample.png is an image file. It's a .png, but a .jpg or any other format that NSImage knows how to read will also work. The sample is a page from a Piers Anthony novel.

sample.png looks like this: (It's actually 616 × 878 pixels.)

It takes about one second to run on my Mac mini.

And the output from running ocrImage is:



12

Juxtaposition

But later the situation eased. "They have saved him," Serrilryan reported. "He is weak, but survives."

Clef's own tension abated. ""T am exceedingly glad to hear that. He lent me the Platinum Flute, and for this marvelous instrument I would lay down my life. It was the sight of it that brought me here, though I am wary of the office it portends."

"Aye.'

In the afternoon they heard a sudden clamor. Something was fluttering, squawking, and screeching. The sounds were hideous, in sharp contrast to the pleasure of the terrain.

Serrilryan's canine lip curled. Quickly she shifted to human form. "Beast birds! Needs must we hide."

But it was not to be. The creatures had winded them, and the pursuit was on. "Let not their filthy claws touch thee," the werebitch warned. "The scratches will fester into gangrene.' She changed back to canine form and stood guarding him, teeth bared.

The horde burst upon them. They seemed to be large birds--but their faces were those of ferocious women. Clef's platinum rapier was in his hand, but he hesitated to use it against these part-human creatures. Harpies -that was what they were.

They gave him little opportunity to consider. Three of them flew at his head, discolored talons extended. "Kill! Kill!" they screamed. The smell was appalling.

Serrilryan leaped, her teeth catching the grimy underbelly of one bird. Greasy feathers fell out as the creature emitted a shriek of amazing ugliness. Immediately the other two pounced on the wolf, and two more swooped down from above.

Clef's misgivings were abruptly submerged by the need



It isn't perfect, but it's pretty good considering how little source code it takes to use it: roughly 200 lines.


The edge of the knife, the part of the source code that does the work, is the pair of lines:


     BKSOCRBoss *boss = [[BKSOCRBoss alloc] init];
     NSArray<BKSTextPiece *> *pieces = [boss recognizeImageURL:url error:&error];

where BKSOCRBoss is my wrapper for Apple's Vision framework, included in the source code as BKSOCRBoss.m; url is the file URL of the image file; and BKSTextPiece is a simple Objective-C object that holds a line of recognized text and the coordinates of where that line is located in the image.

There's a little bit of code in main to guess where the paragraphs start.
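As an illustration of the kind of guessing involved, here is a minimal sketch of one possible heuristic: treat a line as the start of a paragraph when its left edge is indented past the leftmost line on the page. This is an assumption for illustration, not necessarily what ocrImage's main does. SketchLine, SketchJoinParagraphs, and the indent threshold are all hypothetical names, and SketchLine merely stands in for whatever BKSTextPiece actually stores.

```objc
#import <Foundation/Foundation.h>
#include <math.h>

// Hypothetical stand-in for a recognized line: its text and its left edge,
// in normalized image coordinates (0.0 is the left side of the image).
@interface SketchLine : NSObject
@property(nonatomic, copy) NSString *text;
@property(nonatomic) double minX;
+ (instancetype)lineWithText:(NSString *)text minX:(double)minX;
@end

@implementation SketchLine
+ (instancetype)lineWithText:(NSString *)text minX:(double)minX {
  SketchLine *line = [[self alloc] init];
  line.text = text;
  line.minX = minX;
  return line;
}
@end

// Joins lines into paragraphs. A line whose left edge sits more than `indent`
// to the right of the leftmost line is assumed to start a new paragraph.
static NSString *SketchJoinParagraphs(NSArray<SketchLine *> *lines, double indent) {
  double leftMargin = INFINITY;
  for (SketchLine *line in lines) {
    leftMargin = MIN(leftMargin, line.minX);
  }
  NSMutableString *result = [NSMutableString string];
  for (SketchLine *line in lines) {
    BOOL startsParagraph = (line.minX - leftMargin) > indent;
    if (result.length > 0) {
      // Blank line before a new paragraph, a space between wrapped lines.
      [result appendString:startsParagraph ? @"\n\n" : @" "];
    }
    [result appendString:line.text];
  }
  return result;
}
```

A threshold of a few percent of the image width is a plausible starting point for book pages, but it would need tuning for other layouts, such as block paragraphs separated only by vertical space.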

BKSOCRBoss is also short. It creates a VNRecognizeTextRequest and tells a VNImageRequestHandler to run it with performRequests:error:. Both classes come from Apple's Vision framework. BKSOCRBoss does all of this on the main thread because ocrImage is a command-line tool; an interactive app would do this work off the main thread.
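For readers who have not used the Vision framework, the request-and-handler flow can be sketched roughly as follows. This is not the contents of BKSOCRBoss.m. SketchRecognizeLines is a hypothetical function, and the choice of the accurate recognition level with language correction is an assumption. It returns plain strings rather than BKSTextPiece objects.

```objc
#import <Foundation/Foundation.h>
#import <Vision/Vision.h>

// Returns the recognized lines of text in the image at `url`, or nil on failure.
static NSArray<NSString *> *SketchRecognizeLines(NSURL *url, NSError **error) {
  VNRecognizeTextRequest *request = [[VNRecognizeTextRequest alloc] init];
  // Assumed settings: favor accuracy over speed.
  request.recognitionLevel = VNRequestTextRecognitionLevelAccurate;
  request.usesLanguageCorrection = YES;

  VNImageRequestHandler *handler =
      [[VNImageRequestHandler alloc] initWithURL:url options:@{}];
  // Synchronous: blocks the calling thread until recognition finishes.
  if (![handler performRequests:@[ request ] error:error]) {
    return nil;
  }

  NSMutableArray<NSString *> *lines = [NSMutableArray array];
  for (VNRecognizedTextObservation *observation in request.results) {
    // Each observation carries ranked candidates; take the most likely one.
    VNRecognizedText *best = [observation topCandidates:1].firstObject;
    if (best) {
      [lines addObject:best.string];
      // observation.boundingBox is the line's normalized rectangle (origin at
      // the lower left), which is where per-line coordinates would come from.
    }
  }
  return lines;
}
```

Because performRequests:error: blocks until the work is done, calling it directly from main is fine for a command-line tool, while an interactive app would dispatch the same call to a background queue.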


In my interactive app, I experimented with OCRing multiple pages in parallel, using NSOperations in a concurrent queue. The fastest I could make BKSOCRBoss go was only about twice as fast as single-threaded. My guess is that the Vision framework uses resources on the GPU, so spawning more threads just makes them wait until the GPU is available.
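A parallel experiment of this kind might be set up roughly like the sketch below. It is an assumption about the general shape, not the code from the interactive app. SketchMapConcurrently is a hypothetical helper, and worker stands in for whatever per-page OCR call is used, so the sketch itself does not depend on Vision.

```objc
#import <Foundation/Foundation.h>

// Runs `worker` once per URL on a concurrent NSOperationQueue and returns the
// results in input order. `worker` is a hypothetical stand-in for the OCR call.
static NSArray<NSString *> *SketchMapConcurrently(NSArray<NSURL *> *urls,
                                                  NSInteger maxConcurrent,
                                                  NSString * (^worker)(NSURL *url)) {
  NSOperationQueue *queue = [[NSOperationQueue alloc] init];
  queue.maxConcurrentOperationCount = maxConcurrent;

  // Pre-fill the results so each operation can replace its own slot.
  NSMutableArray<NSString *> *results = [NSMutableArray array];
  for (NSUInteger i = 0; i < urls.count; i++) {
    [results addObject:@""];
  }

  NSLock *lock = [[NSLock alloc] init];
  [urls enumerateObjectsUsingBlock:^(NSURL *url, NSUInteger index, BOOL *stop) {
    [queue addOperationWithBlock:^{
      NSString *text = worker(url) ?: @"";
      [lock lock];  // NSMutableArray is not thread-safe.
      results[index] = text;
      [lock unlock];
    }];
  }];

  // Block the caller until every page has been processed.
  [queue waitUntilAllOperationsAreFinished];
  return results;
}
```

Varying maxConcurrent and timing the call is a simple way to see where the speedup levels off on a given machine.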

The OCR feature of the Vision framework is also available on iOS, from iOS 13 onward.

There's a free Mac app in the Apple App Store, OwlOCR, that does much of what ocrImage does, but with mine you get the source code to use in your own projects. For example, it would be easy to write an app that reads aloud what the camera sees.


ocrImage.zip - the compiled code for the command line app. [9 KB]

ocrImage_src.zip - the complete source code and Xcode project. [11 KB]

ocrImage_src - the complete Objective-C source code as a web page


ocrImage is open source under the Apache license.

Version 1.0 - initial release

Version 1.0.1 - added complete source as a web page

Requires macOS 10.15 (Catalina) or newer.

Other Source Code by David Phillip Oster

Page last modified 4/23/2020