Select PDFs in Finder and perform OCR
This writeup lists what I did in order to
- select one or multiple files in Finder
- perform OCR
The reason for this is the acquisition of a new scanner.
I first found OCRmyPDF, which I installed via Homebrew. Then I looked into watching a folder for new files (an option from Stack Exchange, which I might still look into).
After this, I basically followed this solution:
- Easier OCR on macOS
- and added the ability to select and process multiple files via "Creating macOS Monterey Shortcuts for Multiple files" by Richard Moult
Find below the picture of the shortcut. Two important things:
- inside the loop you have to select the proper variable name that refers to the current file (Repeat Item)
- the shell script reads like this, where $1 and $2 refer to the input filename and output filename, respectively, set up in a list (the arguments are quoted so that filenames with spaces survive):

    export PATH=/opt/homebrew/bin:$PATH && ocrmypdf --force-ocr "$1" "$2"
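Outside of Shortcuts, the same "loop over the selected files" idea can be sketched as a plain shell script. This is only a sketch under assumptions: the function names `ocr_output_path` and `ocr_each` are made up for illustration, and writing the result next to the input with an `_ocr.pdf` suffix is an assumed naming scheme, not something the shortcut itself does.

```shell
#!/bin/bash
# Sketch: OCR every PDF given as an argument.
# Assumes ocrmypdf was installed via Homebrew on Apple Silicon.
export PATH="/opt/homebrew/bin:$PATH"

# Hypothetical helper: /path/to/scan.pdf -> /path/to/scan_ocr.pdf
ocr_output_path() {
    printf '%s_ocr.pdf\n' "${1%.pdf}"
}

# Hypothetical helper: run ocrmypdf once per input file.
# A failure on one file is reported and the loop continues.
ocr_each() {
    local input
    for input in "$@"; do
        ocrmypdf --force-ocr "$input" "$(ocr_output_path "$input")" \
            || echo "OCR failed for: $input" >&2
    done
}

# With no arguments this does nothing.
ocr_each "$@"
```

Saved as, say, ocr-all.sh (a hypothetical name), it could be run as `bash ocr-all.sh scan1.pdf "scan 2.pdf"`. Quoting every expansion is what keeps filenames with spaces intact.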
Update 2024-Oct-19
- I am using Hazel now to watch the FTP folder into which the scanner puts the PDFs
Hazel does basically two things:
- It checks whether a new PDF arrives in this folder, and if so performs OCR via the embedded script below. $1 refers to the filename, which is passed as a full path into the sh script, so the base name is extracted via basename. It is extracted without the extension, hence the two arguments in the round brackets:

    # /bin/bash
    base_name=$(basename "$1" .pdf)
    # echo ${base_name}
    # echo ${base_name}.pdf
    cd <<FullPathWherePdfsReside>>
    ocrmypdf "${base_name}.pdf" "${base_name}_ocr.pdf"

- It applies an extra rule for each kind of PDF, which renames the PDF, assigns a color label (because why not) and adds a task to my OmniFocus system via AppleScript. The task gets a due date and links to the PDF as a reference (this is wonderful and makes my life much easier):

    -- theFile is the file Hazel hands to the embedded script
    set theDate to current date
    set theTask to "NameOfTask"
    set theNote to "Scanned and processed on " & (theDate as string) & return
    tell application "OmniFocus"
        tell front document
            set theTag to first flattened tag where its name = "MyTag"
            set theProject to first flattened project where its name = "MyProject"
            set theDueDate to theDate + (24 * hours)
            tell theProject
                set theOtherTask to make new task with properties {name:theTask, note:theNote, primary tag:theTag, due date:theDueDate}
                tell the note of theOtherTask
                    make new file attachment with properties {file name:theFile, embedded:false}
                end tell
            end tell
        end tell
    end tell
This is just wonderful!
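As a footnote to the Hazel script, here is a small sketch of what the second argument to basename changes. The path and the two function names are hypothetical and only serve the illustration.

```shell
# Sketch: basename with one argument versus two.
scan="/Users/me/Scans/2024-10-19 invoice.pdf"   # hypothetical path

# One argument: only the directory part is stripped.
name_with_ext() {
    basename "$1"
}

# Two arguments: the directory and the given suffix are stripped.
name_without_ext() {
    basename "$1" .pdf
}

name_with_ext "$scan"      # prints: 2024-10-19 invoice.pdf
name_without_ext "$scan"   # prints: 2024-10-19 invoice
```

The quotes around "$1" matter here as well, because scanner filenames often contain spaces.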