planetwater

groundwater, geostatistics, environmental engineering, earth science

Select PDFs in Finder and perform OCR

This writeup lists what I did in order to

  • select one or more files in Finder
  • perform OCR

The reason for this is the acquisition of a new scanner.

I first found OCRmyPDF, which I installed via Homebrew. Then I looked into watching a folder for new files (an option from stackexchange, which I might still explore).

After this, I basically followed this solution.

Find below the picture of the shortcut. Two important things:

  1. inside the loop, you have to select the variable that refers to the current file (Repeat Item)
  2. the shell script reads like this, where $1 and $2 refer to the input filename and output filename, respectively, set up in a list:

    export PATH=/opt/homebrew/bin:$PATH && ocrmypdf --force-ocr "$1" "$2"

Picture of the Shortcut
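
When several PDFs are selected in Finder, the same command can also be wrapped in a loop inside a single script. The following is only a minimal sketch, not the Shortcut itself: the function names ocr_output_name and ocr_selected, and the _ocr suffix for the output file, are assumptions chosen for illustration. Quoting the variables keeps paths with spaces intact.

```shell
#!/bin/bash
# Hypothetical helper: build the output name by inserting _ocr before .pdf.
ocr_output_name() {
    local input="$1"
    printf '%s_ocr.pdf\n' "${input%.pdf}"
}

# Hypothetical helper: run OCR on every file passed as an argument.
ocr_selected() {
    # Homebrew on Apple Silicon installs into /opt/homebrew/bin.
    export PATH="/opt/homebrew/bin:$PATH"
    local input
    for input in "$@"; do
        ocrmypdf --force-ocr "$input" "$(ocr_output_name "$input")"
    done
}
```

A call such as ocr_selected "Scan 1.pdf" "Scan 2.pdf" would then write "Scan 1_ocr.pdf" and "Scan 2_ocr.pdf" next to the originals.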

Update 2024-Oct-19

  • I am using Hazel now to watch the FTP folder into which the scanner puts the PDFs
  • Hazel does basically two things:

    1. check whether a new PDF arrives in this folder, and if so, perform OCR via this embedded script. Here $1 is the file, passed into the sh script as a full path, so the name without the extension is extracted via basename (hence the two arguments inside the round brackets):

      #!/bin/bash
      # Strip the directory and the .pdf extension from the full path in $1
      base_name=$(basename "$1" .pdf)
      # echo ${base_name}
      # echo ${base_name}.pdf
      cd <<FullPathWherePdfsReside>>
      ocrmypdf "${base_name}.pdf" "${base_name}_ocr.pdf"
      
    2. for each kind of PDF, apply an extra rule that renames the PDF, assigns a color label (because why not), and adds a task to my OmniFocus system via AppleScript. The task gets a due date and links to this PDF as a reference (this is wonderful and makes my life much easier):

      set theDate to current date
      set theTask to "NameOfTask"
      
      set theNote to "Scanned and processed on " & (theDate as string) & return
      
      tell application "OmniFocus"
          tell front document
              set theTag to first flattened tag where its name = "MyTag"
              set theProject to first flattened project where its name = "MyProject"
              set theDueDate to theDate + (24 * hours)
              tell theProject
                  set theOtherTask to make new task with properties {name:theTask, note:theNote, primary tag:theTag, due date:theDueDate}
                  tell the note of theOtherTask
                      make new file attachment with properties {file name:theFile, embedded:false}
                  end tell
              end tell
          end tell
      end tell
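
The basename step in the first Hazel rule is easy to try out on its own. Here is a small sketch with a made-up path (the path and the variable names full_path and dir_name are assumptions for illustration only). It also shows that dirname returns the folder part, which could be used in place of a hard-coded folder.

```shell
#!/bin/bash
# Hypothetical full path, as a folder watcher might pass it in $1.
full_path="/Users/me/Scans/Invoice 2024.pdf"

# basename with a second argument strips the directory and that suffix.
base_name=$(basename "$full_path" .pdf)
# dirname keeps only the directory part.
dir_name=$(dirname "$full_path")

echo "$base_name"                       # prints: Invoice 2024
echo "$dir_name/${base_name}_ocr.pdf"   # prints: /Users/me/Scans/Invoice 2024_ocr.pdf
```

Without the quotes around "$full_path", the space in the file name would split it into two arguments, which is why the quoting matters for scanned files with free-form names.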
      

This is just wonderful!

Written by Claus

October 18th, 2024 at 4:26 pm
