
The Wilcoxon Signed-Rank Test: A Geological Example

Geological data is often not normally distributed, and parametric methods that assume normality (such as the paired t-test) should not be applied to such data. In this case, I ran a paired experiment. This is a simple example of blocking, in which comparisons are made between similar experimental units. The blocking must be planned before the experiment is performed.

The Wilcoxon signed-rank test can be used in a paired experiment when we can’t assume a normal distribution.

We will use the fictional data below to determine whether arsenic (As) in soil samples increased over a one-year period.

Location ID | Feb 2018 (As ppm) | Feb 2019 (As ppm)
Figure 1: Fictional As soil sample data.

In this sample data set, I used a Shapiro-Wilk test to check for normality at α=0.05 (“Shapiro-Wilk Test Calculator”, 2020). For February 2018, the p-value is 0.457. This is greater than α, so we cannot reject the null hypothesis of normality, and we treat the data as normally distributed. For February 2019, the p-value is 0.047. This is less than α, so we reject the null hypothesis of normality and treat the data as not normally distributed.

As usual, we state our hypotheses:

We set our level of significance at α=0.05. Our number of observations is n=9, because we ignore pairs with no difference, such as location 4. First, we need to calculate some new information for the test: the difference, the absolute difference, and the rank of each absolute difference.
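The calculation above can be sketched in a few lines of Python. This is a minimal sketch, not the calculation behind Figure 2. The function name `signed_ranks` and the sample values are hypothetical, because the post's data table is not reproduced here. Differences are taken as Feb 2019 minus Feb 2018, and zero differences are dropped. Tied absolute differences receive the average of the ranks they span, which is the usual convention.

```python
def signed_ranks(before, after):
    """Return the signed ranks of the paired differences (after - before).

    Hypothetical helper for illustration. Zero differences are dropped, and
    tied absolute differences get the average of the ranks they occupy.
    """
    # Round to guard against floating-point noise so equal differences tie.
    diffs = [round(a - b, 6) for b, a in zip(before, after)]
    diffs = [d for d in diffs if d != 0]

    # Indices of the differences, ordered by absolute difference.
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        # Extend j over a run of tied absolute differences.
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j + 2) / 2  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # Reattach the sign of each difference to its rank.
    return [r if d > 0 else -r for r, d in zip(ranks, diffs)]


# Hypothetical paired values (not the post's data).
feb_2018 = [10, 12, 8, 15, 9]
feb_2019 = [12, 11, 8, 18, 11]
print(signed_ranks(feb_2018, feb_2019))  # [2.5, -1.0, 4.0, 2.5]
```

The third pair has no difference, so it is dropped and n falls from 5 to 4. This mirrors how location 4 is excluded above.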

Location ID | Feb 2018 (As ppm) | Feb 2019 (As ppm) | Difference | Absolute Difference | Rank
Figure 2: Calculation for the Wilcoxon signed rank test.

Because this is a Wilcoxon signed-rank test, we need two rank sums: the sum of the ranks of the negative differences and the sum of the ranks of the positive differences. Sample pairs with no difference are ignored.
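The two rank sums and the resulting statistic can be computed directly from a list of signed ranks. This is a minimal sketch. The function name `rank_sums` and the example ranks are hypothetical and are not the values from Figure 2. The sanity check relies on a property of any ranking of n items: the ranks 1 to n always add up to n(n+1)/2. That total still holds when tied values share averaged ranks.

```python
def rank_sums(signed_ranks):
    """Return (W_plus, W_minus, W) for a list of signed ranks.

    W_plus sums the ranks of positive differences, W_minus sums the ranks of
    negative differences, and the test statistic W is the smaller of the two.
    """
    w_plus = sum(r for r in signed_ranks if r > 0)
    w_minus = sum(-r for r in signed_ranks if r < 0)

    # Every rank 1..n falls in exactly one sum, so together they give n(n+1)/2.
    n = len(signed_ranks)
    if abs((w_plus + w_minus) - n * (n + 1) / 2) > 1e-9:
        raise ValueError("input is not a complete set of signed ranks 1..n")

    return w_plus, w_minus, min(w_plus, w_minus)


# Hypothetical signed ranks for n = 9 pairs.
print(rank_sums([1, -2, 3, 4, -5, 6, 7, 8, 9]))  # (38, 7, 7)
```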

Our Wilcoxon test statistic is the smaller of these two rank sums, so wstat=13. Using a table (“Wilcoxon Signed-Ranks Table”, 2020), we find that the critical value for n=9 and a one-tailed test at α=0.05 is wcrit=8. Because wstat>wcrit, we cannot reject the null hypothesis: the data do not provide statistically significant evidence that As levels increased between the two years, and it is possible there has been no change. If our test statistic had been less than or equal to the critical value, we would have rejected the null hypothesis and concluded that As levels increased significantly.
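A printed table is not the only option. The critical value can also be computed from the exact null distribution of the signed-rank statistic. Under the null hypothesis, each rank 1 to n is equally likely to carry a plus or a minus sign. The number of sign patterns that produce a rank sum of w is therefore a subset-sum count over {1, …, n}, out of 2^n equally likely patterns.

This is a minimal sketch with hypothetical function names. It assumes there are no tied absolute differences, since ties make the exact distribution only approximate. Pass α for a one-tailed test and α/2 for a two-tailed test. Reject the null hypothesis when the test statistic is less than or equal to the returned value.

```python
def null_distribution(n):
    """counts[w] = number of sign patterns whose positive-rank sum equals w."""
    max_sum = n * (n + 1) // 2
    counts = [1] + [0] * max_sum  # the empty subset sums to 0
    for rank in range(1, n + 1):
        # Iterate downward so each rank is used at most once (0/1 subset sum).
        for w in range(max_sum, rank - 1, -1):
            counts[w] += counts[w - rank]
    return counts


def critical_value(n, alpha):
    """Largest w with P(T <= w) <= alpha under H0, or None if none exists.

    T is the rank sum for one sign. Use alpha for a one-tailed test and
    alpha / 2 for a two-tailed test.
    """
    counts = null_distribution(n)
    total = 2 ** n
    cumulative = 0
    crit = None
    for w, c in enumerate(counts):
        cumulative += c
        if cumulative / total <= alpha:
            crit = w
        else:
            break
    return crit


w_stat = 13  # test statistic reported in the post
w_crit = critical_value(9, 0.05)  # one-tailed, n = 9
print("reject H0" if w_stat <= w_crit else "fail to reject H0")  # fail to reject H0
```

`critical_value(9, 0.05)` returns 8, which matches the tabulated one-tailed value for n = 9. `critical_value(10, 0.025)` is the two-tailed value for n = 10, and it also returns 8. Under either reading, W = 13 is larger than the critical value, so the null hypothesis is not rejected.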

Posted in geology, statistics, univariate

First, I would like to acknowledge and thank Michael Pyrcz, Associate Professor at the University of Texas at Austin. His lectures on YouTube are a great resource on geostatistics and subsurface modelling, and they are well worth checking out!

It probably comes as no surprise that I would say, “Yes! A geologist should learn to code.” With this post, however, I aim to de-stigmatize the fear of programming. Programming is nothing more than building algorithms, and algorithms are nothing more than processes and sets of rules used to solve problems. Being able to build these algorithms is the heart of good programming, and you’ve probably been doing it for years without realizing it. Learning the language itself is the easy part.

So how can I convince you that coding is a huge benefit to geological work? First, coding is transparent. When you solve a problem programmatically, your code shows everyone exactly how you solved it, so any mistakes can be caught quickly. That is a good thing.

Reproducibility is another benefit. Suppose you run your code to solve a problem, and someone else runs their own code on the same problem. When you compare answers, they should match. If they don’t, finding out why can lead to some interesting revelations.

Geology is a very data-driven science. We collect and analyze data in order to understand earth processes, so being able to quantify things is very important. Computers process numbers far faster and more reliably than humans can. Generally speaking, the things computers are bad at are the things humans are good at, and vice versa.

Better still, much of the heavy lifting has already been done, thanks to the many open-source packages and libraries available. Chances are that if you want to do something, someone else has already done it and shared their code. Once you know how to code, you will probably find that you spend more time reading other people’s code than writing your own.

Knowing how to code also improves communication between geologists and IT staff, which lets each person do their job better. When you bring domain knowledge to the table along with technical skill, you will come up with a better result every time. Even with only a basic understanding, you’ll know the capabilities and limits of coding, and you’ll be able to ask the right questions.

Sharing your code lets others use it on their own projects in ways you may never have imagined. This kind of altruism often pays for itself in the long run: the more people use your code, the better off everyone is, including you.

Being able to automate parts of your job helps in two ways. First, repetitive, boring tasks can run automatically, leaving you more time to actually do geology! Second, the time you invest pays for itself. A workflow may take longer to code the first time, but you save time on every reuse. Being more productive is always a good thing.

Most importantly, I promise you, when you finish your first block of code that does something you find very useful, you will feel like boundaries have been lifted. It empowers you, and you start to look at your computer as a lot more than a word processor.

I would suggest that anyone trying to pick up a language start with Python. Python is a powerful scientific programming language with strong community and library support. It is also very intuitive for first-time learners, and there are many good beginner tutorials available. If there is interest, I will write another blog post on getting started in Python from a geological perspective. R is another language to consider. It is less intuitive than Python but very powerful for statistical methods. I would suggest sticking with one language and learning it well. When you find that you need a new tool, library, or completely new language, that’s the time to acquire those skills. There is also no doubt that basic Python or R skills will look good on a resume.

Posted in programming