planetwater

ground- water, geo- statistics, environmental- engineering, earth- science

Select PDFs in Finder and perform OCR

This writeup lists what I did in order to

  • select one or multiple files in Finder
  • perform OCR on them

The reason for this is the acquisition of a new scanner.

I first found OCRmyPDF, which I installed via Homebrew. Then I looked into watching a folder for new files (an option from Stack Exchange, which I might still pursue).

After this, I basically followed this solution.

Find below the picture of the shortcut. Two important things:

  1. inside the loop, you have to select the proper variable name that refers to the current file (Repeat Item)
  2. the shell script reads like this (where $1 and $2 refer to the input filename and the output filename, respectively, set up in a list):

    export PATH=/opt/homebrew/bin:$PATH && ocrmypdf --force-ocr "$1" "$2"
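
The same call can also be made without a shell, which sidesteps quoting problems when file names contain spaces. What follows is a minimal sketch in Python, not what the Shortcut itself runs. The names `build_ocr_command` and `run_ocr` are made up for illustration, and the sketch assumes that `ocrmypdf` lives in `/opt/homebrew/bin`, as in the command above.

```python
import os
import subprocess

# Assumption: Homebrew's bin directory, taken from the shell command above.
HOMEBREW_BIN = "/opt/homebrew/bin"


def build_ocr_command(input_pdf, output_pdf):
    """Return the ocrmypdf call as an argument list (no shell quoting needed)."""
    return ["ocrmypdf", "--force-ocr", str(input_pdf), str(output_pdf)]


def run_ocr(input_pdf, output_pdf):
    """Run ocrmypdf with the Homebrew directory prepended to PATH."""
    env = dict(os.environ)
    env["PATH"] = HOMEBREW_BIN + os.pathsep + env.get("PATH", "")
    # check=True raises CalledProcessError if ocrmypdf exits with an error.
    subprocess.run(build_ocr_command(input_pdf, output_pdf), env=env, check=True)
```

Because the arguments are passed as a list, a file name such as `scan 01.pdf` reaches `ocrmypdf` as a single argument.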

Picture of the Shortcut

Update 2024-Oct-19

  • I am using Hazel now to watch the FTP folder into which the scanner puts the PDFs
  • Hazel does basically two things:

    1. check whether a new PDF arrives in this folder and, if so, perform OCR via this embedded script. Here $1 is the full path of the file that Hazel passes to the shell script; basename extracts the file name without the extension, hence the two arguments inside the round brackets:

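      #!/bin/bash
      # strip the directory and the .pdf extension from the full path in $1
      base_name=$(basename "$1" .pdf)
      # echo "${base_name}"
      # echo "${base_name}.pdf"
      cd <<FullPathWherePdfsReside>>
      # quotes keep file names with spaces intact
      ocrmypdf "${base_name}.pdf" "${base_name}_ocr.pdf"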
      
    2. run an extra rule for each kind of PDF, which renames the PDF, assigns a color label (because why not), and adds a task to my OmniFocus system via AppleScript. The task gets a due date and links to the PDF as a reference (this is wonderful and makes my life much easier):

      -- theFile is not defined in this snippet; Hazel provides it (the file being processed)
      set theDate to current date
      set theTask to "NameOfTask"
      
      set theNote to "Scanned and processed on " & (theDate as string) & return
      
      tell application "OmniFocus"
          tell front document
              set theTag to first flattened tag where its name = "MyTag"
              set theProject to first flattened project where its name = "MyProject"
              set theDueDate to theDate + (24 * hours)
              tell theProject
                  set theOtherTask to make new task with properties {name:theTask, note:theNote, primary tag:theTag, due date:theDueDate}
                  tell the note of theOtherTask
                      make new file attachment with properties {file name:theFile, embedded:false}
                  end tell
              end tell
          end tell
      end tell
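
The file-name handling in the OCR step of the first rule can be sketched in Python as well. This is only an illustration: `ocr_output_name` is a hypothetical helper, not part of Hazel or OCRmyPDF, and the path in the usage line is made up.

```python
from pathlib import PurePosixPath


def ocr_output_name(full_path, suffix=".pdf"):
    """Mimic `basename "$1" .pdf` and append `_ocr.pdf` to the result."""
    name = PurePosixPath(full_path).name      # drop the directory part
    if name.endswith(suffix) and name != suffix:
        name = name[: -len(suffix)]           # drop the extension, like basename
    return f"{name}_ocr{suffix}"


print(ocr_output_name("/some/folder/letter 01.pdf"))  # letter 01_ocr.pdf
```

Like `basename`, the helper only removes the extension when the name actually ends in `.pdf`, so other files keep their full name before `_ocr.pdf` is appended.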
      

This is just wonderful!

Written by Claus

October 18th, 2024 at 4:26 pm

Posted in Uncategorized

VEGAS Workshop

Last week we had our two-day @VegasIws@mathstodon.xyz workshop in the Black Forest. We role-played group communication (the “NASA-exercise”), discussed research data management and publications as relevant for @VegasIws@mathstodon.xyz, dreamed up projects, ate, drank and experienced the firs of the Black Forest!

We think it was very useful for the team! 😉

Written by Claus

March 12th, 2024 at 5:51 pm

Posted in Uncategorized

Talks at EGU24

I’m happy to be able to give two presentations at the #EGU24 meeting in Vienna on Monday, April 15, 2024:

Time | ID | Session | Title | Authors
08:55–09:05 | EGU24-7899 | HS8.1.7 | Immobilization of Per- and Polyfluoroalkyl Substances (PFAS) – Experimental and Model-based Analysis of Leaching Behavior | Claus Haslauer, Thomas Bierbaum, Simon Kleinknecht, and Tobias Junginger
15:25–15:35 | EGU24-7820 | HS3.9 | Data-driven surrogate-based Bayesian model calibration for predicting vadose zone temperatures in drinking water supply pipes | Ilja Kröker, Elisabeth Nißler, Sergey Oladyshkin, Wolfgang Nowak, and Claus Haslauer

Looking forward to meeting you in Vienna! The conference takes place in Vienna, Austria, and online, 14–19 April 2024.

Written by Claus

March 12th, 2024 at 5:38 pm

Posted in Uncategorized

PFClean Project in the news

Our PFAS-remediation project “PFClean” is in the news today:

– at U of Stuttgart news
– at LinkedIn of U of Stuttgart

Written by Claus

March 1st, 2024 at 10:19 am

New Water Blog “The Water Droplet”

Almost one year ago already, Steve Shikaze started a new blog, “The Water Droplet”.

Steve has a long career in hydrogeology, we “share” the same PhD advisor, and he almost hired me once. I guess it’s fair to assume that I am biased in his favour.

Steve steadily covers important hydrogeological issues such as land subsidence, groundwater recharge, PFAS / forever chemicals, and groundwater management issues in California and in the south/central US.

Since you are already on this site, there is a good chance that you will be interested in The Water Droplet. So I encourage you to head over there!

Written by Claus

January 9th, 2024 at 4:54 pm

Posted in Uncategorized

Stones Turned This Week

Net News Wire

NetNewsWire 6 is now on the (iOS, iPadOS) AppStore!

Together with NetNewsWire 6 on macOS, it is a wonderful open RSS solution. It offers iCloud sync that works well (and makes me reconsider my long-time and by now beloved Feed Wrangler subscription, which has not been mentioned on _DavidSmith’s blog in quite some time).

One feature that I use more than anticipated is NNW’s novel ability to subscribe to Twitter accounts (even searches) like a feed. That’s not surprising, given the gradual and steady deterioration of Twitter’s timeline.

How else, other than NNW, do you keep track of RSS feeds?

Speaking of blogging, Macdrifter seems to be back from his hiatus. What a nice contrast to Twitter: few posts, a lot of content! Also, I completely stole the title of this post from him.

Marble Quarry

Admittedly, this video is another link from Jason Kottke, but it has a strong connection to fractured rock hydrogeology, and hence is relevant for this site. The combination of the visuals from the open pit mine with its bulldozers and the audio from an opera is more delightful than expected. Then again, I am not so sure what to expect from an ad for a quarry.

Drinking Water in Germany: The Quantity Problem

Although we (in Stuttgart) have so far had a year of roughly average wetness, the warnings about insufficient water availability are getting louder.

Hydrometeorological variables for the first half of 2021 compared with the long-term average at the weather station of the University of Stuttgart.

Drought periods: Is drinking water running short in Franconia? – Nürnberg – nordbayern.de

Federal Office of Civil Protection warns of drinking water shortage

A new lake for the Franconian Lake District? – Gunzenhausen, Treuchtlingen, Weißenburg, Roth, Wassertrüdingen | Nordbayern

Written by Claus

June 28th, 2021 at 1:03 pm

Posted in Uncategorized

Thermally Enhanced Wall Vapour Extraction

It’s been awfully quiet around here. My delightful New Year’s resolutions are already in the can, and it’s only early March. As if there were not sufficient issues with a raging pandemic.

Maybe the picture below tells you a little bit about my state of mind. This is how my bedroom and office have been looking: thermally enhanced wall vapour extraction. The person who built a wastewater pipe in the shape of a double-S (yes, literally) should be ashamed for a long time.

Wall
Progress has been made!

Written by Claus

March 9th, 2021 at 4:47 pm

Posted in Uncategorized

Too Many Meetings

It’s mid-January. I had great New Year’s resolutions. I have already downgraded to “blog more”. Brian Romans started what he calls “friday links” on Twitter. Let’s say that I aim for something similar, but here on my blog, which I have been neglecting. Starting to write again on a neglected blog is in style.

The thing that bothered me substantially before Christmas: too many meetings. To some extent, the “too many meetings” problem had been going on before Corona. In the early days of Corona (spring 2020), there might actually have been fewer meetings. Currently, the situation seems as bad as it has ever been: one Webex or Zoom meeting after another. It’s difficult to find time to get anything done. This became very clear over the Christmas holidays.

This article (via Rui Carmo) gets many things right. This I found particularly interesting:

It’s even worse when a worker has several meetings that are separated by 30 minutes. “Not enough time to transition in a non-MRS situation to get anything done, and in an MRS situation, not quite enough time to recover for the next meeting,”

One remedy sounds simple, but I am not sure how to achieve it: fewer meetings.

Less time in meetings would ultimately lead to more employee engagement in the meetings they do attend, which experts agree is a proven remedy for future cases of MRS.

So, “just” saying “no” more often!(?) Maybe more tools and automation? Maybe this high-profile advice will help me decide whether I should accept a meeting. Also important: he emphasises that everybody should be prepared and everybody should speak. Breaks are sometimes no real breaks but give your brain time to digest thoughts. Even more, breaks are always necessary. And: everything is uncertain!

It is one thing to realise that everything is uncertain, but as @dougmcneall points out:

Making constant risk decisions is exhausting.

This brings us to Corona, during which the usual exhaustion seems to be amplified, e.g., by sub-optimal working conditions and by kids at home. As Hayley Fowler points out: I’m just tired of everything. As Brent Simmons points out:

“I’ve been haunted since hearing, in the early days of the pandemic, that if we all wore masks for six weeks this thing would be over. I was there. I’ve done that for six weeks, and another six weeks, and another. And now it’s worse than ever. It’s a challenge not to be angry. There are healthy, uninfected people right now, today, who are excited for the vaccine and who will die before they get it.”

Teaching Experiences

  • Webex (which we use at the University of Stuttgart) now has the ability to share the iPad’s screen! It might have been there before, but I only realised on Monday that it exists. Before that, I knew that Zoom can do it. Anyway, this has proven to be a nice tool for teaching sequentially and more spontaneously than with an animated slideshow.

  • I’ve upgraded to JupyterLab 3 with its visual debugger. Very nice, also for teaching! This ranked list of awesome Jupyter Notebook, Hub and Lab projects (extensions, kernels, tools), which is updated weekly, also provides many useful hints!

Ending

With Input from

Written by Claus

January 18th, 2021 at 5:03 pm

Posted in Uncategorized

Pickup Basketball

Life under COVID is all organized virtual play dates and no unexpected pickup basketball.

I’ve never played much basketball, but there was always a game of soccer going on on the streets or on the field in the neighbourhood where I grew up. I’ve never even considered that as “important”. Still, Peter’s statement resonates very well with me. It’s likely the same reason why I miss having a “break” with colleagues.

What can we do about it? How can we keep up with colleagues as easily?

This won’t solve everything. In an attempt to at least get out more (turns out, sitting in front of a computer all day does not make me more productive, d’oh), I followed Sina Trinkwalder’s motivation #lockdownlaufen #movemeber.

Walking

Written by Claus

December 3rd, 2020 at 5:11 pm

Posted in Uncategorized

Adventskalender

Can you have too many Advent Calendars? — I don’t think so!

Here is a wonderful Python-related one!

PS: I already needed help from wtfpython

PPS: A colleague recommended this… interesting…

def f(a, b=None):  # the function name is missing in the original; "f" is a placeholder
    if b is None or a / b < 0:
        return a
    return a / b

Written by Claus

December 3rd, 2020 at 4:07 pm

Posted in Uncategorized
