## Archive for 2018

How to Filter a List in Python; also: how to compare two results of `%timeit`

## Breaking Twitter?

Twitter might soon be broken (#BreakingMyTwitter). More precisely, third-party clients might break due to API changes. More details are available at apps-of-a-feather.com, a website run by a group of third-party Twitter clients.

I am happy with Twitterrific, both on the Mac (before and after the revival) and on iOS. I have never used a native Twitter client on any OS. I am not sure how long it has been known that the end of third-party clients could be near. Version 5 of Twitterrific has been out since October 2017. Was it known then?

Now, as I have posted before, I appreciate the free web. The existence of this website is evidence of this. I guess a lot can happen until June. It would be nice if open alternatives (e.g., micro.blog, mathstodon) gained more users. On the other hand, on work-related topics, Twitter seems to have recently passed a critical mass threshold, and I do enjoy the conversations there. Then again, I know people who are leaving Twitter because of trolling and because it is not open. As they say, the future remains interesting!

## Getting Ready for #hymod18

To get ready for the “Integrated Hydrosystem Modelling 2018” conference, about to start in a couple of hours, I played with a watershed on my sofa and a bivariate Gaussian distribution on my coffee table.

Both apps are enabled via Apple’s ARKit:

- GeoGebra has an app called “GeoGebra Augmented Reality” that allows you to plot functions of two variables on a surface that you can pick, like my coffee table. You can then rotate the plot, walk around it, look at it from above, and explore those functions in other ways. Great fun!
- The WWF Free Rivers app puts a simple watershed on a surface you can define (like my sofa). Then clouds move in, and you can paddle down the river. Maybe more for kids. Still fun.

Great to see such nice use cases, and let’s get ready for integrated hydrosystem modelling **#hymod18**!

## Crisp and Interval-Based Conditional Probabilities

“Censored data” is the common statistical term for values that are only known to lie within an interval. A typical example from environmental data sets is a measurement below a certain detection limit. If a measurement is below the detection limit, due to the analytical performance of the device that measures the concentration, we don’t know its precise value, but we do know that it is somewhere in the interval between zero and the detection limit.

## Application in Environmental Hydrology

A typical example where censored measurements play an important role is solute concentrations in groundwater. The measured concentration of a solute depends on the analytical method that was used to quantify it. Sometimes, the concentration is so small that we cannot be certain about its value, and we assume that the true value is somewhere between zero and the analytical detection limit.
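As a rough illustration of what “taking censored values into account” can mean, here is a minimal sketch of a textbook censored likelihood. It is not the method of the publication mentioned in this post. It assumes that log-concentrations follow a normal distribution, and the concentrations, the detection limit, and all function names are hypothetical. Detected values contribute their density, and each censored value contributes the probability of falling below the detection limit.

```python
import math
from statistics import NormalDist


def censored_log_likelihood(mu, sigma, detected, n_censored, limit):
    """Log-likelihood of a normal model for log-concentrations.

    detected   : concentrations above the detection limit
    n_censored : number of measurements reported as below the limit
    limit      : detection limit (same unit as the concentrations)
    """
    # Detected values: log of the normal density at log(c)
    ll = 0.0
    for c in detected:
        z = (math.log(c) - mu) / sigma
        ll += -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)
    # Censored values: probability mass between zero and the limit,
    # which is the CDF at log(limit) on the log scale
    p_below = NormalDist(mu, sigma).cdf(math.log(limit))
    if p_below <= 0.0:
        return -math.inf
    return ll + n_censored * math.log(p_below)


def fit_by_grid(detected, n_censored, limit):
    """Crude grid search for the maximum-likelihood (mu, sigma)."""
    best_ll, best_mu, best_sigma = -math.inf, None, None
    for i in range(-100, 101):      # mu from -5 to 5 in steps of 0.05
        for j in range(2, 101):     # sigma from 0.1 to 5 in steps of 0.05
            mu, sigma = i / 20, j / 20
            ll = censored_log_likelihood(mu, sigma, detected, n_censored, limit)
            if ll > best_ll:
                best_ll, best_mu, best_sigma = ll, mu, sigma
    return best_mu, best_sigma


# Hypothetical data: five detected values, four values below the limit
detected = [0.6, 0.9, 1.4, 2.2, 5.0]
n_censored = 4
limit = 0.5

mu_hat, sigma_hat = fit_by_grid(detected, n_censored, limit)
naive_mu = sum(math.log(c) for c in detected) / len(detected)
print("mean of log-concentration, detected values only:", naive_mu)
print("mean of log-concentration, censored likelihood: ", mu_hat)
```

Ignoring the censored values biases the estimated mean upwards, because only the larger concentrations enter the calculation. The grid search is only there to keep the sketch free of dependencies; a proper optimiser would be the more usual choice.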

In a recent example, my coauthors and I demonstrated the importance of including censored measurements when deriving a representative concentration of chlorinated solutes in a hydrogeological layer at two boreholes within a fractured sandstone. Due to the fractured nature of the sandstone, the concentrations at most depths were fairly small and frequently below the detection limit, whereas large concentrations were typically encountered in the fractures. Taking the censored measurements (the concentrations below the detection limit) into account in a statistically meaningful way led to an estimate of representative concentrations that corresponded to the conceptual hydrogeological site model at the upstream and downstream boreholes. This can be important for site assessment.

Related to censored measurements, but different, are true zeros. An example of a true zero is the reading of a rain gauge when it does not rain. A true zero means that the value is exactly zero, not somewhere in the interval between zero and the detection limit. The distinction between a true zero and a measurement below the detection limit can be tricky, because both are small values. If you’re interested in how to include true zeros in this approach, please continue to read here.

If you are interested in a statistically reasonable treatment of censored measurements, you can find the related publication in Environmental Science and Technology.

I’ll explain the basic underlying theory below.

## Basic Statistics Example

I wrote about conditional probabilities quite some time ago. This post can be viewed as an extension.

A crisp condition is something like “what is the probability of event A occurring, given that event B has occurred”. This is how conditional probabilities are typically taught. Compared to a univariate density, a conditional density has a smaller variance and is shifted towards the condition. So far so good.

It turns out that there is also a “not-crisp” condition. This is something like “the probability of event A given that ‘event’ B is somewhere between zero and b”. The funny thing is that the uncertainty about A under such a condition is still smaller than that of the corresponding univariate, normally distributed variable.

When looking at the figure below, this means:

- the yellow line indicates a standard normal (Gaussian) density with variance 1
- two crisp conditional densities are shown by the solid and the dashed lines. Both densities have a smaller uncertainty (variance) than the univariate standard normal
- two interval-based conditional densities are shown in red and blue. The interval-based densities have the same location as the crisp conditionals. Their uncertainties are smaller than that of the corresponding univariate density, but larger than those of the crisp conditionals.
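The ordering of the variances can be checked with a small sketch. It assumes a bivariate standard normal pair (X, Y) with correlation `rho`; the values of `rho` and `b` are illustrative and are not the ones used for the figure. For a crisp condition Y = b, the conditional variance is 1 - rho². For an interval condition lo ≤ Y ≤ hi, the law of total variance gives 1 - rho² + rho² · Var(Y | lo ≤ Y ≤ hi), where the last term is the variance of a truncated standard normal.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal: mean 0, variance 1


def crisp_conditional(rho, b):
    """Mean and variance of X given Y = b."""
    return rho * b, 1.0 - rho**2


def truncated_moments(lo, hi):
    """Mean and variance of a standard normal restricted to [lo, hi]."""
    mass = STD.cdf(hi) - STD.cdf(lo)
    mean = (STD.pdf(lo) - STD.pdf(hi)) / mass
    var = 1.0 + (lo * STD.pdf(lo) - hi * STD.pdf(hi)) / mass - mean**2
    return mean, var


def interval_conditional(rho, lo, hi):
    """Mean and variance of X given lo <= Y <= hi."""
    y_mean, y_var = truncated_moments(lo, hi)
    # Law of total variance: within-condition part plus spread of Y in the interval
    return rho * y_mean, 1.0 - rho**2 + rho**2 * y_var


rho, b = 0.8, 1.0  # illustrative values, not taken from the figure
print("univariate (mean, variance):      ", (0.0, 1.0))
print("crisp, Y = b (mean, variance):    ", crisp_conditional(rho, b))
print("interval, 0 <= Y <= b (mean, var):", interval_conditional(rho, 0.0, b))
```

Because the variance of a truncated standard normal is always below 1, the interval-based variance lies between the crisp conditional variance and the univariate variance.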

Hello micro.blog

## Hockey Stats (Nürnberg Plays Cologne Tonight)

I have been playing a bit with hockey data. There is some data wrangling, there are some interesting basic statistics, and there is some Bayes. As this has nothing to do directly with water (other than that it’s played on frozen water), I posted it here.

tl;dr: The statistics for both teams seem to suggest that the series is very close. Guess what, this is also what I saw when I watched it. Despite this similarity, the numbers favour Cologne slightly but consistently. Granted, the analysis averages a lot and does not distinguish deeply.

## Digging Into My Research Database

A new version of Script Debugger was released recently, and I dug a bit into it, using my research database Papers.

For fun, I linked AppleScript (which digs into my database on macOS) with Python, which processes the data (creates a histogram).

The process worked nicely, and being able to debug AppleScript is wonderful.

More info at claus-haslauer.de