groundwater, geostatistics, environmental engineering, earth science

Come to AGU18 Session “H114: Space-Time Data and Models”

I would like to invite you to the following session at #AGU18:

The problem of estimating a variable at unobserved locations and/or times is important in many areas of research, including geosciences, civil and environmental engineering, soil sciences, agriculture, ecology, forestry, meteorology and climatology, oceanography, and health and epidemiology.

The amount of data gathered is increasing (e.g., through advances in measurement technologies, remote sensing, or citizen science). Challenges remain in the interplay between heterogeneous measurements and improvements in models that can make use of the various types of data. This session aims to bring together contributions that demonstrate how to improve datasets and maximise their use through measurement techniques, statistics, and modelling, e.g., via

  • innovative ways to measure data in the environment;
  • the incorporation of innovatively measured data into modelling (usefulness, relevant scale);
  • the inclusion of as much information as available to improve prediction (secondary / heterogeneous data, data on different scales);
  • the consideration of the variability in the quality of the measurements.

Please find a PDF about the session here.

(cross-posted from

Written by Claus

July 30th, 2018 at 4:41 pm

Posted in Uncategorized

How to Filter a List in Python; also: how to compare two results of %timeit

Written by Claus

May 16th, 2018 at 9:02 pm

Posted in Uncategorized

Breaking Twitter?

Twitter might soon be broken (#BreakingMyTwitter). More precisely, third-party clients might break due to API changes. More details are available on a website run by a group of third-party Twitter clients.

I am happy with Twitterrific, both on the Mac (before and after the revival) and on iOS. I have never used a native Twitter client on any OS. I am not sure how long it has been known that the end of third-party clients could be near. Version 5 of Twitterrific has been out since October 2017. Was it known then?

Now, as I have posted before, I appreciate the free web. The existence of this website is evidence of this. I guess a lot of things can happen until June. It would be nice if open alternatives (e.g., mathstodon) gained more users. On the other hand, on work-related topics, it seems like Twitter has recently crossed a critical-mass threshold, and I do enjoy the conversations there. Then again, I know people who are leaving Twitter because of trolling and because it is not open. As they say, the future remains interesting!

Written by Claus

April 9th, 2018 at 8:05 pm

Posted in Uncategorized

Getting Ready for #hymod18

To get ready for the “Integrated Hydrosystem Modelling 2018” conference, which is about to start in a couple of hours, I played with a watershed on my sofa and a bivariate Gaussian distribution on my coffee table.

Both apps are enabled via Apple’s ARKit:

  • GeoGebra has an app called “GeoGebra Augmented Reality” that allows you to plot functions of two variables on a surface that you can pick, like my coffee table. You can then rotate the plot, walk around it, look at it from above, and explore those functions in other ways. Great fun!
  • The WWF Free Rivers app puts a simple watershed on a surface you can define (like my sofa). Then clouds move in, and you can paddle down the river. Maybe more for kids. Still fun.

Great to see such nice use cases, and let’s get ready for integrated hydrosystem modelling #hymod18!

A watershed on my sofa.

A bivariate Gaussian density on my coffee table.

Written by Claus

April 3rd, 2018 at 6:48 am

Posted in Uncategorized

Crisp and Interval-Based Conditional Probabilities

“Censored data” is the common statistical term for values that are only known to lie within an interval. A typical example from environmental data sets is a measurement below a certain detection limit. If a measurement is below the detection limit, which depends on the analytical performance of the device that measures the concentration, we don’t know its precise value, but we do know that it lies somewhere in the interval $(0, \text{detection limit})$.

Application in Environmental Hydrology

A typical example where censored measurements play an important role is solute concentrations in groundwater. The measured concentration of a solute depends on the analytical method that was used to quantify it. Sometimes the concentration is so small that we cannot be certain about its value, and we assume that the true value is somewhere between zero and the analytical detection limit.
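
One common way to use such below-detection-limit values in a statistical model is to let each of them contribute the probability of falling below the detection limit to the likelihood, instead of a density value. The following is a minimal sketch of that idea, not the method of the publication mentioned in this post. It assumes, purely for illustration, that concentrations follow a lognormal distribution; the function names and the example numbers are hypothetical.

```python
import math


def norm_logpdf(z_value, mu, sigma):
    """Log-density of a normal distribution N(mu, sigma^2)."""
    z = (z_value - mu) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))


def norm_cdf(z_value, mu, sigma):
    """Cumulative distribution function of N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((z_value - mu) / (sigma * math.sqrt(2.0))))


def censored_loglik(mu, sigma, observed, n_censored, detection_limit):
    """Log-likelihood of an (assumed) lognormal concentration model.

    observed        -- concentrations measured above the detection limit
    n_censored      -- number of measurements below the detection limit
    detection_limit -- analytical detection limit (same unit as observed)
    mu, sigma       -- mean and standard deviation of log-concentration
    """
    # Quantified values contribute the lognormal density.
    loglik = sum(
        norm_logpdf(math.log(c), mu, sigma) - math.log(c) for c in observed
    )
    # Each censored value contributes P(concentration < detection limit).
    p_below = norm_cdf(math.log(detection_limit), mu, sigma)
    loglik += n_censored * math.log(p_below)
    return loglik


# Hypothetical example: three quantified values, twenty censored ones.
observed = [2.0, 5.0, 10.0]
low = censored_loglik(0.0, 1.0, observed, 20, detection_limit=1.0)
high = censored_loglik(2.0, 1.0, observed, 20, detection_limit=1.0)
print(low > high)
```

With many censored values, parameter sets that put more probability mass below the detection limit are favoured, which is how the censored measurements pull the estimate towards small concentrations without pretending to know their exact values.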

In a recent example, my coauthors and I demonstrated the importance of including censored measurements to derive a representative concentration of chlorinated solutes in a hydrogeological layer at two boreholes within a fractured sandstone. Due to the fractured nature of the sandstone, the concentrations at most depths were fairly small and frequently below the detection limit, whereas large concentrations were typically encountered in the fractures. Taking the censored measurements (the concentrations below the detection limit) into account in a statistically meaningful way led to an estimate of representative concentrations that corresponded to the conceptual hydrogeological site model at the upstream and downstream boreholes, and this can be important for site assessment.

Related to censored measurements, but different, are true zeros. An example of a true zero is the reading of a rain gauge when it does not rain. The distinction between a true zero and a measurement below the detection limit can be tricky, because both are small values. A truly zero measurement means that its value is zero and not somewhere in an interval between zero and the detection limit. If you’re interested in how to include true zeros in this approach, please continue to read here.

If you are interested in a statistically reasonable treatment of censored measurements, you can find the related publication in Environmental Science and Technology.

I’ll explain the basic underlying theory below.

Basic Statistics Example

I wrote about conditional probabilities quite some time ago. This post can be viewed as an extension.

A crisp condition is something like “what is the probability of event A occurring, given that event B has occurred”. This is how conditional probabilities are typically taught. Compared to a univariate density, a conditional density should have a smaller variance and is shifted towards the condition. So far so good.

It turns out that there is also a “non-crisp” condition. This is something like “the probability of event A given that ‘event’ B is somewhere between zero and b”. The funny thing is that the uncertainty associated with such an interval-based conditional is still smaller than that of the corresponding univariate normal distribution.

When looking at the figure below, this means:

  • the yellow line indicates a standard normal (Gaussian) density with variance 1;
  • two crisp conditional densities are shown by the solid line ($p(x \mid y = -2.0)$) and the dashed line ($p(x \mid y = +2.0)$). Both densities have a smaller uncertainty (variance) than the univariate standard normal;
  • two interval-based conditional densities are shown in red ($p(x \mid y \leq -2.0)$) and blue ($p(x \mid y \leq +2.0)$). The interval-based densities have the same location as the crisp conditionals. Their uncertainties are smaller than that of the corresponding univariate density, but larger than those of the crisp conditionals.

[Figure: standard normal density together with crisp and interval-based conditional densities]
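
The ordering of the variances described above can be checked with closed-form expressions. The sketch below assumes a standard bivariate Gaussian (zero means, unit variances) with a correlation of 0.7; that value and the function names are assumptions made for illustration only. The interval condition is taken as $y \leq c$, following the notation in the list above.

```python
import math


def std_normal_pdf(z):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def crisp_conditional(rho, y):
    """Mean and variance of X given Y = y."""
    return rho * y, 1.0 - rho ** 2


def interval_conditional(rho, c):
    """Mean and variance of X given Y <= c.

    Uses the moments of a standard normal truncated from above at c:
    E[Y | Y <= c] = -lam and Var[Y | Y <= c] = 1 - c * lam - lam^2,
    with lam = pdf(c) / cdf(c).
    """
    lam = std_normal_pdf(c) / std_normal_cdf(c)
    mean = -rho * lam
    variance = 1.0 - rho ** 2 * lam * (lam + c)
    return mean, variance


rho = 0.7  # assumed correlation, for illustration only
for c in (-2.0, 2.0):
    _, var_crisp = crisp_conditional(rho, c)
    _, var_interval = interval_conditional(rho, c)
    # crisp variance < interval-based variance < univariate variance (1.0)
    print(c, var_crisp, var_interval, var_crisp < var_interval < 1.0)
```

The crisp conditional variance does not depend on the conditioning value, whereas the interval-based variance does: the wider the admissible range of y, the closer it gets to the univariate variance of 1.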

Written by Claus

March 28th, 2018 at 8:45 pm

Posted in Uncategorized

Written by Claus

March 27th, 2018 at 4:43 pm

Hockey Stats (Nürnberg Plays Cologne Tonight)

I have been playing a bit with hockey data. There is some data wrangling, there are some interesting basic statistics, and there is some Bayes. As this has nothing to do directly with water (other than that hockey is played on frozen water), I posted it here.

tl;dr: The statistics for both teams seem to suggest that the series is very close. Guess what, this is also what I saw when I watched it. Despite this similarity, the numbers favour Cologne slightly but consistently. Granted, the analysis relies mostly on averages and does not discriminate deeply.

Written by Claus

March 23rd, 2018 at 12:01 pm

Digging Into My Research Database

A new version of Script Debugger was released recently, and I dug a bit into it, using my research database Papers.

For fun, I linked AppleScript, which digs into my database on macOS, with Python, which processes the data and creates a histogram.

The process worked nicely, and being able to debug AppleScript is wonderful.

More info at

Written by Claus

March 6th, 2018 at 9:22 am

Smartphones and Creative Ideas

At NASA, at least some people get rid of “smart phones” to get creative ideas back: Lynda Barry at NASA’s Goddard Space Flight Centre (via

Barry’s impact on the assembled Goddard employees was immediate; from the moment she arrived, she insisted on abandoning all electronic devices. “They were really flipped out about it,” says Barry. “The phone gives us a lot but it takes away three key elements of discovery: loneliness, uncertainty and boredom. Those have always been where creative ideas come from.”

At the time of writing this, the Süddeutsche Zeitung insists that social media (WhatsApp) “belong in classrooms”.

update 2017-Oct-11

  • Die Tagesschau reports that 14- to 29-year-old Germans are online for about 4.5 hours per day;
  • The Guardian has a longer report on how smartphones are hijacking our minds. The text warns about a much more severe consequence: “Drawing a straight line between addiction to social media and political earthquakes like Brexit and the rise of Donald Trump, they contend that digital forces have completely upended the political system and, left unchecked, could even render democracy as we know it obsolete.” The article goes on to explain how certain hooks are built into smartphone-related technology that are designed to keep you there and to earn advertising dollars for the companies.

Written by Claus

September 13th, 2017 at 10:56 am

Days 2&3 at #spatialstatistics2017

It became increasingly difficult to post updates on the spatial statistics conference: there was the icebreaker, another day full of diverse and interesting talks, the dinner, and a final day that ended the conference with an interesting session honouring the achievements of Peter Diggle. Former and current colleagues such as Paulo Ribeiro and Emanuele Giorgi gave enlightening talks that stressed both the scientific achievements and the great kindness and humanity of Peter Diggle. CHICAS, the Centre for Health Informatics, Computing, and Statistics, is the current culmination of his efforts.

It’s hard to pick topics that stood out during the last two days of the conference, just because there were many great talks on a large variety of topics. Here is an attempt.

Point Processes

There were a number of talks covering point processes, notably the keynotes by Thordis Thorarinsdottir and Rasmus Waagepetersen. Thordis had a variety of interesting quotes, including this one by Frank H. Bigelow from 1905:

There are three processes that are generally essential for the complete development of any branch of science, and they must be accurately applied before the subject can be considered to be satisfactorily explained. The first is the discovery of a mathematical analysis, the second is the discussion of numerous observations, and the third is a correct application of the mathematics to the observations, including a demonstration that these are in agreement.

Thordis stressed the need for more and better inference methods. It might be worth pointing out that Bigelow went on to state that

Often a good theory is misapplied to good observations, or good observations are explained by a poor theory.

In summary, these thoughts are not too far away from Peter Diggle’s triangle, pictured above.


There were two nice talks that employed copulas for multivariate spatial models and one that I missed, unfortunately:

  • Jonathan Tawn from the University of Lancaster presented on “Modelling Spatial Extreme Events”; he takes great care of marginal distributions and of how to reasonably include extremes there for a better joint representation in copula space;
  • Fakhereh Alidoost and Alfred Stein from the University of Twente presented on “Interpolation of Daily Mean Air Temperature Data via Spatial and Non-Spatial Copulas”;
  • the talk that I missed was entitled “Hierarchical Copula Regression Models for Areal Data”, presented by D. Musgrove, J. Hughes and L. Eberly.


Written by Claus

July 12th, 2017 at 3:48 pm
