planetwater

groundwater, geostatistics, environmental engineering, earth science

Select PDFs in Finder and perform OCR

This writeup lists what I did in order to

  • select one or more files in Finder
  • perform OCR

The reason for this is the acquisition of a new scanner.

I first found OCRmyPDF, which I installed via Homebrew. Then I looked into watching a folder (an option from Stack Exchange, which I might still pursue).

After this, I basically followed this solution.

Find below the picture of the shortcut. Two important things:

  1. Inside the loop, you have to select the proper variable name that refers to the current file (Repeat Item).
  2. The shell script reads like this, where $1 and $2 refer to the input filename and output filename, respectively, set up in a list:

    export PATH=/opt/homebrew/bin:$PATH && ocrmypdf --force-ocr "$1" "$2"
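If the Shortcut hands over several files at once instead of looping, the same call can be wrapped in a small loop. The following is only a sketch under that assumption: the function names and the `_ocr` suffix are made up for illustration and are not part of the Shortcut shown here.

```shell
#!/bin/bash
# Sketch: OCR every PDF passed as an argument and write <name>_ocr.pdf next to it.
# The "_ocr" suffix and both function names are illustrative assumptions.

export PATH=/opt/homebrew/bin:$PATH

# Derive the output path for one input PDF (pure string handling, no file access).
ocr_output_name() {
    local input="$1"
    printf '%s_ocr.pdf\n' "${input%.pdf}"
}

# Run OCRmyPDF on each argument; the quotes keep paths with spaces intact.
ocr_all() {
    local input
    for input in "$@"; do
        ocrmypdf --force-ocr "$input" "$(ocr_output_name "$input")"
    done
}

# With no arguments the loop simply does nothing.
ocr_all "$@"
```

Quoting matters here because scanner output and Finder selections often contain spaces in their paths.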

Picture of the Shortcut

Update 2024-Oct-19

  • I am using Hazel now to watch the FTP folder into which the scanner puts the PDFs
  • Hazel does basically two things:

    1. Check whether a new PDF arrives in this folder and, if so, perform OCR via this embedded script. Here $1 is the file, passed into the sh script as a full path. The base name without the extension is extracted via basename, which is why it gets two arguments inside the round brackets:

       #!/bin/bash
       # Strip the directory and the .pdf extension from the full path in $1
       base_name=$(basename "$1" .pdf)
       # echo ${base_name}
       # echo ${base_name}.pdf
       cd <<FullPathWherePdfsReside>>
       ocrmypdf "${base_name}.pdf" "${base_name}_ocr.pdf"
    2. Run an extra rule for each kind of PDF, which renames the PDF, assigns a color label (because why not), and adds a task to my OmniFocus system via AppleScript. The task gets a due date and links to the PDF as a reference (this is wonderful and makes my life much easier):

       set theDate to current date
       set theTask to "NameOfTask"
      
       set theNote to "Scanned and processed on " & (theDate as string) & return
      
       tell application "OmniFocus"
           tell front document
               set theTag to first flattened tag where its name = "MyTag"
               set theProject to first flattened project where its name = "MyProject"
               set theDueDate to theDate + (24 * hours)
               tell theProject
                   set theOtherTask to make new task with properties {name:theTask, note:theNote, primary tag:theTag, due date:theDueDate}
                   tell the note of theOtherTask
                       make new file attachment with properties {file name:theFile, embedded:false}
                   end tell
               end tell
           end tell
       end tell

This is just wonderful!
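The OCR step in the Hazel rule could be made a little more defensive. This is only a sketch, resting on two assumptions that may not apply to every setup: that the result is written into the same watched folder, so the rule could fire again on its own output, and that Hazel's shell does not already have Homebrew on its PATH. The function names are hypothetical.

```shell
#!/bin/bash
# Sketch of a more defensive variant of the Hazel OCR script.
# Function names and the "_ocr" suffix check are illustrative assumptions.

export PATH=/opt/homebrew/bin:$PATH

# Succeed if the path already looks like an OCR result (ends in _ocr.pdf).
is_ocr_output() {
    case "$1" in
        *_ocr.pdf) return 0 ;;
        *)         return 1 ;;
    esac
}

# OCR one file given as a full path and write the result into the same folder.
ocr_in_place() {
    local input="$1"
    local dir base
    if is_ocr_output "$input"; then
        return 0   # skip results so the watched folder does not retrigger
    fi
    dir=$(dirname "$input")
    base=$(basename "$input" .pdf)
    ocrmypdf "$input" "$dir/${base}_ocr.pdf"
}

# Hazel passes the matched file as $1; do nothing when called without arguments.
if [ "$#" -gt 0 ]; then
    ocr_in_place "$1"
fi
```

Using dirname on the incoming path avoids the hard-coded cd, so the same script would work for more than one watched folder.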

Written by Claus

October 18th, 2024 at 4:26 pm

Posted in Uncategorized
