The OCR library that ships as part of macOS is surprisingly good, and surprisingly easy to use.
Here is a simple command-line tool that, given an image of a page (screenshots also work), produces the text on that page in about one second.
To use:
ocrImage sample.png > sample.txt
Here, sample.png is an image file. This one is a .png, but a .jpg, or any other format that NSImage knows how to read, will also work. It's a page from a Piers Anthony novel.
sample.png looks like this: (It's actually 616 × 878 pixels.)
It takes about one second to run on my Mac mini.
And the output from running ocrImage is:
12
Juxtaposition
But later the situation eased. "They have saved him," Serrilryan reported. "He is weak, but survives."
Clef's own tension abated. ""T am exceedingly glad to hear that. He lent me the Platinum Flute, and for this marvelous instrument I would lay down my life. It was the sight of it that brought me here, though I am wary of the office it portends."
"Aye.'
In the afternoon they heard a sudden clamor. Something was fluttering, squawking, and screeching. The sounds were hideous, in sharp contrast to the pleasure of the terrain.
Serrilryan's canine lip curled. Quickly she shifted to human form. "Beast birds! Needs must we hide."
But it was not to be. The creatures had winded them, and the pursuit was on. "Let not their filthy claws touch thee," the werebitch warned. "The scratches will fester into gangrene.' She changed back to canine form and stood guarding him, teeth bared.
The horde burst upon them. They seemed to be large birds--but their faces were those of ferocious women. Clef's platinum rapier was in his hand, but he hesitated to use it against these part-human creatures. Harpies -that was what they were.
They gave him little opportunity to consider. Three of them flew at his head, discolored talons extended. "Kill! Kill!" they screamed. The smell was appalling.
Serrilryan leaped, her teeth catching the grimy underbelly of one bird. Greasy feathers fell out as the creature emitted a shriek of amazing ugliness. Immediately the other two pounced on the wolf, and two more swooped down from above.
Clef's misgivings were abruptly submerged by the need
It isn't perfect, but it's pretty good considering how little source code it takes to use it: roughly 200 lines.
The edge of the knife, the part of the source code that does the work, is the pair of lines:
BKSOCRBoss *boss = [[BKSOCRBoss alloc] init];
NSArray<BKSTextPiece *> *pieces = [boss recognizeImageURL:url error:&error];
where BKSOCRBoss is my wrapper for the Vision framework, included in the source code as BKSOCRBoss.m, url is the file URL of the image file, and BKSTextPiece is a simple Objective-C object that holds a line of recognized text and the graphic coordinates of where that line is located in the image.
There's a little bit of code in main to guess where the paragraphs start.
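The heuristic that main actually uses isn't described here. As an illustration only, one plausible guess is to treat any line whose left edge is indented past the page's left margin as the start of a new paragraph. In the sketch below, SketchPiece, SketchMakePiece, SketchParagraphs, and the indent threshold are all hypothetical names and values, not part of ocrImage; SketchPiece stands in for BKSTextPiece, and coordinates are assumed to be normalized to 0..1. It assumes ARC.

```objc
#import <Foundation/Foundation.h>

// Hypothetical stand-in for BKSTextPiece: one recognized line of text plus
// the x coordinate of its left edge (assumed normalized to 0..1).
@interface SketchPiece : NSObject
@property(nonatomic, copy) NSString *text;
@property(nonatomic) double left;
@end

@implementation SketchPiece
@end

// Convenience constructor for the sketch.
static SketchPiece *SketchMakePiece(NSString *text, double left) {
  SketchPiece *piece = [[SketchPiece alloc] init];
  piece.text = text;
  piece.left = left;
  return piece;
}

// Groups lines (given in reading order) into paragraphs. A line starts a new
// paragraph when it is indented more than `indent` past the left margin,
// where the margin is taken to be the smallest left edge on the page.
static NSArray<NSString *> *SketchParagraphs(NSArray<SketchPiece *> *pieces,
                                             double indent) {
  double margin = pieces.firstObject.left;
  for (SketchPiece *piece in pieces) {
    if (piece.left < margin) {
      margin = piece.left;
    }
  }
  NSMutableArray<NSString *> *paragraphs = [NSMutableArray array];
  NSMutableString *current = nil;
  for (SketchPiece *piece in pieces) {
    BOOL startsParagraph = (current == nil) || (piece.left - margin > indent);
    if (startsParagraph) {
      current = [piece.text mutableCopy];
      [paragraphs addObject:current];
    } else {
      // Continuation line: join it to the current paragraph with a space.
      [current appendFormat:@" %@", piece.text];
    }
  }
  return paragraphs;
}
```

With this rule, centered lines such as a page number or running title also come out as their own paragraphs, because they sit well to the right of the margin.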
BKSOCRBoss is also short. It creates a VNRecognizeTextRequest and tells a VNImageRequestHandler to performRequests:error:, both from Apple's Vision framework. BKSOCRBoss does all of this on the main thread because it is a command-line tool. An interactive app would do its work off the main thread.
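For readers who haven't used Vision, here is a minimal sketch of that request/handler pattern. It is not the code in BKSOCRBoss.m: SketchRecognizeLines is a hypothetical name, the recognition settings are my assumptions, and the sketch hands the file URL straight to Vision rather than loading the image some other way first. It assumes ARC and macOS 10.15 or newer.

```objc
#import <Foundation/Foundation.h>
#import <Vision/Vision.h>

// Sketch only: returns the recognized lines of text for the image at `url`,
// or nil (with *error set by Vision) if the request fails.
static NSArray<NSString *> *SketchRecognizeLines(NSURL *url, NSError **error) {
  VNRecognizeTextRequest *request = [[VNRecognizeTextRequest alloc] init];
  // Assumed settings: favor accuracy over speed and let Vision apply its
  // language-based correction.
  request.recognitionLevel = VNRequestTextRecognitionLevelAccurate;
  request.usesLanguageCorrection = YES;

  VNImageRequestHandler *handler =
      [[VNImageRequestHandler alloc] initWithURL:url options:@{}];
  // performRequests:error: is synchronous: it returns once the request has
  // finished, which is why a command-line tool can simply call it in line.
  if (![handler performRequests:@[ request ] error:error]) {
    return nil;
  }

  NSMutableArray<NSString *> *lines = [NSMutableArray array];
  for (VNRecognizedTextObservation *observation in request.results) {
    // Each observation offers ranked candidate strings; keep the best one.
    VNRecognizedText *best = [observation topCandidates:1].firstObject;
    if (best != nil) {
      [lines addObject:best.string];
      // observation.boundingBox gives the line's position in normalized
      // image coordinates (0..1, origin at the lower left).
    }
  }
  return lines;
}
```

A wrapper that also needs the coordinates, as BKSTextPiece does, could store observation.boundingBox alongside each string instead of discarding it.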
In my interactive app, I did experiment with OCRing multiple pages in parallel, using NSOperations in a concurrent queue. The fastest I could make BKSOCRBoss go was only about twice as fast as single-threaded. My guess is that the Vision framework is using resources on the GPU, and spawning more threads just makes them wait until the GPU is available.
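A rough sketch of that kind of experiment, using an NSOperationQueue with a capped width, might look like the following. SketchRecognizeAll and its parameters are hypothetical names; the per-page work is passed in as a block so the sketch stays independent of any particular OCR wrapper. It assumes ARC.

```objc
#import <Foundation/Foundation.h>

// Sketch only: runs `recognize` once per URL on a concurrent queue that
// allows at most `width` operations at a time, then returns the recognized
// text keyed by URL.
static NSDictionary<NSURL *, NSString *> *SketchRecognizeAll(
    NSArray<NSURL *> *urls, NSInteger width,
    NSString * (^recognize)(NSURL *url)) {
  NSOperationQueue *queue = [[NSOperationQueue alloc] init];
  queue.maxConcurrentOperationCount = width;

  NSMutableDictionary<NSURL *, NSString *> *results =
      [NSMutableDictionary dictionary];
  NSLock *lock = [[NSLock alloc] init];

  for (NSURL *url in urls) {
    [queue addOperationWithBlock:^{
      // The slow part runs outside the lock so pages can overlap.
      NSString *text = recognize(url) ?: @"";
      // NSMutableDictionary is not thread-safe, so guard the write.
      [lock lock];
      results[url] = text;
      [lock unlock];
    }];
  }
  // Blocks the calling thread, so an interactive app should call this
  // from a background thread rather than the main thread.
  [queue waitUntilAllOperationsAreFinished];
  return results;
}
```

Given the roughly 2x ceiling described above, a small width such as 2 is a reasonable starting point; timing a few values on the target machine is the only way to know for sure.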
The OCR feature of the Vision framework is also available on iOS, from iOS 13 up.
There's a free Mac app in the Apple App Store, OwlOCR, that does much of what my ocrImage does, but with mine you get the source code to use in your own projects. For example, it would be easy to write an app that reads aloud what the camera sees.
ocrImage.zip - the compiled code for the command line app. [9 KB]
ocrImage_src.zip - the complete source code and Xcode project. [11 KB]
ocrImage_src - the complete Objective-C source code as a web page
ocrImage is open source under the Apache license.
Version 1.0 - initial release
Version 1.0.1 - added complete source as a web page
Requires macOS 10.15 (Catalina) or newer.
Other Source Code by David Phillip Oster
Page last modified 4/23/2020