Select PDFs in Finder and perform OCR
This writeup lists what I did in order to
- select one or multiple files in Finder
- perform OCR on them
The reason for this is the acquisition of a new scanner.
I first found OCRmyPDF, which I installed via Homebrew. Then I looked into watching a folder (an option from Stack Exchange that I might still pursue).
After this, I followed basically this solution
- Easier OCR on macOS
- and added the ability to select and process multiple files via Creating macOS Monterey Shortcuts for Multiple files | by richard moult
Find below the picture of the shortcut. Two important things:
- inside the loop you have to select the proper variable name that refers to the current file (Repeat Item)
- the shell script reads like this, where $1 and $2 refer to the input filename and output filename, respectively, set up in a list:

    export PATH=/opt/homebrew/bin:$PATH && ocrmypdf --force-ocr "$1" "$2"
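For readers who prefer a script to a Shortcut, the same loop can be sketched in Python. This is a minimal sketch, not the Shortcut itself: the helper names `build_ocr_command` and `ocr_many` are made up for illustration, and only the `ocrmypdf --force-ocr INPUT OUTPUT` call and the Homebrew path come from the text above.

```python
import os
import subprocess

HOMEBREW_BIN = "/opt/homebrew/bin"  # same directory the Shortcut prepends to PATH


def build_ocr_command(input_pdf, output_pdf):
    """Return the argument list for one ocrmypdf call (hypothetical helper)."""
    # Passing a list instead of a shell string keeps paths with spaces intact.
    return ["ocrmypdf", "--force-ocr", str(input_pdf), str(output_pdf)]


def ocr_many(pairs, run=subprocess.run):
    """Run OCR for each (input, output) pair, like the Repeat loop in the Shortcut."""
    env = dict(os.environ)
    env["PATH"] = HOMEBREW_BIN + os.pathsep + env.get("PATH", "")
    for input_pdf, output_pdf in pairs:
        # check=True stops the loop with an exception if one file fails.
        run(build_ocr_command(input_pdf, output_pdf), env=env, check=True)
```

Calling `ocr_many([("scan1.pdf", "scan1_ocr.pdf")])` requires OCRmyPDF to be installed; the file names are examples only.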
Update 2024-Oct-19
- I am using Hazel now to watch the FTP folder into which the scanner puts the PDFs
Hazel does basically two things:
It checks whether a new PDF arrives in this folder and, if so, performs OCR via the embedded script below. $1 refers to the file, which is passed into the sh script as a full path. The base name is therefore extracted via basename, without the extension, which is why basename receives two arguments inside the round brackets:

    #!/bin/bash
    base_name=$(basename "$1" .pdf)
    # echo ${base_name}
    # echo ${base_name}.pdf
    cd <<FullPathWherePdfsReside>>
    ocrmypdf "${base_name}.pdf" "${base_name}_ocr.pdf"
In addition, an extra rule for each kind of PDF renames the PDF, assigns a color label (because why not), and adds a task to my OmniFocus system via AppleScript. The task gets a due date and links to the PDF as a reference (this is wonderful and makes my life much easier):

    set theDate to current date
    set theTask to "NameOfTask"
    set theNote to "Scanned and processed on " & (theDate as string) & return
    tell application "OmniFocus"
        tell front document
            set theTag to first flattened tag where its name = "MyTag"
            set theProject to first flattened project where its name = "MyProject"
            set theDueDate to theDate + (24 * hours)
            tell theProject
                set theOtherTask to make new task with properties {name:theTask, note:theNote, primary tag:theTag, due date:theDueDate}
                tell the note of theOtherTask
                    make new file attachment with properties {file name:theFile, embedded:false}
                end tell
            end tell
        end tell
    end tell
This is just wonderful!
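The file-name handling in the Hazel OCR script can be sketched in Python as follows. `Path.stem` plays the role of `basename "$1" .pdf`, which strips both the directory and the `.pdf` extension. The function names `ocr_target` and `pending_pdfs` are hypothetical, and scanning a folder for PDFs without an `_ocr` counterpart is only an assumption about how one might mimic a watched folder, not a description of what Hazel does internally.

```python
from pathlib import Path


def ocr_target(pdf_path):
    """Return the sibling '<name>_ocr.pdf' path for a given PDF (hypothetical helper)."""
    pdf_path = Path(pdf_path)
    # Path.stem drops the directory and the final extension, like `basename file .pdf`.
    return pdf_path.with_name(pdf_path.stem + "_ocr.pdf")


def pending_pdfs(folder):
    """List PDFs in folder that have no '_ocr' counterpart yet."""
    pending = []
    for pdf in sorted(Path(folder).glob("*.pdf")):
        if pdf.stem.endswith("_ocr"):
            continue  # this file is itself an OCR result
        if not ocr_target(pdf).exists():
            pending.append(pdf)
    return pending
```

Each path returned by `pending_pdfs` could then be handed to `ocrmypdf` together with `ocr_target(path)` as the output name.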
VEGAS Workshop
Last week we had our two-day @VegasIws@mathstodon.xyz workshop in the Black Forest. We role-played group communication (the “NASA exercise”), discussed research data management and publications as relevant for @VegasIws@mathstodon.xyz, dreamed up projects, ate, drank, and experienced the firs of the Black Forest!
We think it was very useful for the team!
Talks at EGU24
I’m happy to be able to give two presentations at the #EGU24 meeting in Vienna on Monday, April 15, 2024:
Time | ID | Session | Title | Authors |
---|---|---|---|---|
08:55–09:05 | EGU24-7899 | HS8.1.7 | Immobilization of Per- and Polyfluoroalkyl Substances (PFAS) – Experimental and Model-based Analysis of Leaching Behavior | Claus Haslauer, Thomas Bierbaum, Simon Kleinknecht, and Tobias Junginger |
15:25–15:35 | EGU24-7820 | HS3.9 | Data-driven surrogate-based Bayesian model calibration for predicting vadose zone temperatures in drinking water supply pipes | Ilja Kröker, Elisabeth Nißler, Sergey Oladyshkin, Wolfgang Nowak, and Claus Haslauer |
Looking forward to meeting you in Vienna! The conference takes place in Vienna, Austria, and online, 14–19 April 2024.
PFClean Project in the news
Our PFAS-remediation project “PFClean” is in the news today:
- at U of Stuttgart news
- at LinkedIn of U of Stuttgart
New Water Blog “The Water Droplet”
Almost one year ago already, Steve Shikaze started a new blog, “The Water Droplet”.
Steve has a long career in hydrogeology, we “share” the same PhD advisor, and he almost hired me once. I guess it’s fair to assume that I am biased in his favour.
Steadily, Steve is covering important hydrogeological issues such as land subsidence, groundwater recharge, PFAS / forever chemicals, groundwater management issues in California and in the south/central US.
Since you are already on this site, there is a good chance that you will be interested in The Water Droplet. So I encourage you to head over there!
Stones Turned This Week
Net News Wire
NetNewsWire 6 is now on the (iOS, iPadOS) AppStore!
Together with NetNewsWire 6 on macOS, it is a wonderful open RSS solution. It offers iCloud sync that works well, which makes me think about my long-time and by now beloved Feed Wrangler subscription (a service that has not been mentioned on _DavidSmith’s blog in quite some time).
One feature that I use more than anticipated is NNW’s novel ability to subscribe to Twitter accounts (even searches) like a feed. It’s not surprising given Twitter’s gradual and steady deterioration of the timeline.
How else, other than NNW, do you keep track of RSS feeds?
Speaking of blogging, Macdrifter seems to be back from his hiatus. What a nice contrast to Twitter: few posts, a lot of content! Also, I completely stole the title of this post from him.
Marble Quarry
Admittedly, this video is another link from Jason Kottke, but it has a strong connection to fractured rock hydrogeology, and hence is relevant for this site. The combination of the visuals from the open pit mine with its bulldozers and the audio from an opera is more delightful than expected. Then again, I am not so sure what to expect from an ad for a quarry.
Drinking Water in Germany: The Quantity Problem
Although we (in Stuttgart) have had a year of roughly average wetness so far, the warnings about insufficient water availability are getting louder.
Dry spells: Is drinking water becoming scarce in Franconia? (Nürnberg, nordbayern.de)
Federal Office for Civil Protection warns of drinking water shortage
Thermally Enhanced Wall Vapour Extraction
It’s been awfully quiet around here. My delightful New Year’s resolutions are already in the can, and it’s only early March. As if there were not sufficient issues with a raging pandemic.
Maybe the picture below tells you a little bit about my state of mind. This is how my bedroom and office have been looking. Thermally enhanced wall vapour extraction. The person who built a wastewater pipe in the shape of a double-S (yes, literally) should be ashamed for a long time.
Too Many Meetings
It’s mid-January. I had great New Year’s resolutions. I have already downgraded to “blog more”. Brian Romans started what he calls “friday links” on Twitter. Let’s say that I aim for something similar, but here on my blog. Which I have been neglecting. Starting to write again on a neglected blog is in style.
The thing that has been bothering me substantially before Christmas: too many meetings. To some extent, the “too many meetings” problem had been going on before Corona. In the early days of Corona (spring 2020), there might actually have been fewer meetings. Currently, the situation seems as bad as it has ever been. One Webex or Zoom meeting after another. It’s difficult to find time to get anything done. This has become very clear over the Christmas holidays.
This article (via Rui Carmo) gets many things right. This I found particularly interesting:
It’s even worse when a worker has several meetings that are separated by 30 minutes. “Not enough time to transition in a non-MRS situation to get anything done, and in an MRS situation, not quite enough time to recover for the next meeting,”
One remedy sounds simple, but I am not sure how to achieve it: fewer meetings
Less time in meetings would ultimately lead to more employee engagement in the meetings they do attend, which experts agree is a proven remedy for future cases of MRS.
So, “just” saying “no” more often!(?) Maybe more tools and automation? Maybe this high-profile advice will help me decide whether I should accept a meeting. Also important: he emphasises that everybody should be prepared and everybody should speak. Breaks are sometimes not real breaks but give your brain time to digest thoughts. Even more, breaks are always necessary. And: everything is uncertain!
It is one thing to realise that everything is uncertain, but as @dougmcneall points out:
Making constant risk decisions is exhausting.
This brings us to Corona, during which the usual exhaustion seems to be amplified, e.g., by sub-optimal working conditions and by kids at home. As Hayley Fowler points out: I’m just tired of everything. As Brent Simmons points out:
“I’ve been haunted since hearing, in the early days of the pandemic, that if we all wore masks for six weeks this thing would be over. I was there. I’ve done that for six weeks, and another six weeks, and another. And now it’s worse than ever. It’s a challenge not to be angry. There are healthy, uninfected people right now, today, who are excited for the vaccine and who will die before they get it.”
Teaching Experiences
Webex (which we use at the University of Stuttgart) now has the ability to share the iPad’s screen! It might have been there before, but I only realised on Monday that it exists. Before, I knew that Zoom can do it. Anyway, this has proven to be a nice tool for teaching sequentially and more spontaneously than with an animated slideshow.
I’ve upgraded to JupyterLab 3 with its visual debugger. Very nice, also for teaching! This ranked list of awesome Jupyter Notebook, Hub and Lab projects (extensions, kernels, tools), which is updated weekly, also provides many useful hints!
Ending
France and Switzerland have a great free data policy
I re-built an old Dell Latitude 6410 with Ubuntu 20.10, and it works great (except for the screen after sleeping). It has been serving us (and my kid) well for instructional videos and Moodle!
With Input from
- Brian Romans (@clasticdetritus) / Twitter
- Don Melton (@donmelton) / Twitter
- Rui Carmo (@rcarmo) / Twitter
- Dr Doug McNeall (@dougmcneall) / Twitter
- Prof Hayley J. Fowler (@HayleyJFowler) / Twitter
- Barack Obama (@BarackObama) / Twitter
- Brent Simmons (@brentsimmons) / Twitter
- Massimiliano Zappa (@Hydrology_WSL) / Twitter
Pickup Basketball
Life under COVID is all organized virtual play dates and no unexpected pickup basketball.
I’ve never played much basketball, but there was always a game of soccer going on on the streets or on the field in the neighbourhood where I grew up. I’ve never even considered that as “important”. Still, Peter’s statement resonates very well with me. It’s likely the same reason why I miss having a “break” with colleagues.
What can we do about it? How can we keep up with colleagues as easily?
This won’t solve everything. In an attempt to at least get out more (turns out, sitting in front of a computer all day does not make me more productive, d’oh), I followed Sina Trinkwalder’s motivation #lockdownlaufen #movemeber.
Advent Calendar
Can you have too many Advent calendars? I don’t think so!
Here is a wonderful python related one!
PS: I already needed help from wtfpython
PPS: A colleague recommended this… interesting…
    # The function name is missing in the source; "ratio_or_a" is a placeholder.
    def ratio_or_a(a, b=None):
        if b is None or a / b < 0:
            return a
        return a / b