## The Wilcoxon Signed-Rank Test: A Geological Example

Geological data is often not normally distributed, and parametric methods that assume normality should not be used on such data. In this case, I did a paired experiment, a simple form of blocking in which comparisons are made between similar experimental units. This blocking needs to be done prior to performing the […]
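As a rough illustration of the paired comparison described in this excerpt, here is a minimal sketch of an exact Wilcoxon signed-rank test using only the Python standard library. The function name `signed_rank_test` and the `before`/`after` measurements are hypothetical and not taken from the post. The exact p-value is found by enumerating every sign pattern, so this is only practical for small samples.

```python
from itertools import product

def signed_rank_test(before, after):
    """Exact two-sided Wilcoxon signed-rank test for small paired samples."""
    # Paired differences; zero differences carry no sign information and are dropped.
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)

    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w = min(w_plus, total - w_plus)

    # Under the null hypothesis every assignment of signs is equally likely,
    # so count the sign patterns that are at least as extreme as the one observed.
    extreme = 0
    for signs in product((0, 1), repeat=n):
        wp = sum(r for r, s in zip(ranks, signs) if s)
        if min(wp, total - wp) <= w:
            extreme += 1
    return w, extreme / 2 ** n

# Hypothetical paired measurements on the same eight sampling sites.
before = [121, 98, 114, 102, 135, 91, 109, 127]
after = [129, 109, 111, 116, 145, 96, 126, 133]
w_stat, p_value = signed_rank_test(before, after)
print(f"W = {w_stat}, exact two-sided p = {p_value}")
```

For larger samples, or when a normal approximation is acceptable, `scipy.stats.wilcoxon` is the usual tool.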

## Missingness Analysis

Dealing with Missing Data Properly Using the missingno Package in Python

Recently, Kaggle started a playground competition, Categorical Feature Encoding Challenge II. This competition built on a previous competition by adding a twist: missing data. I did a short analysis of that missing data and built a notebook you can see here, but I thought […]

## Univariate Statistics Part 4, Methods to compare distributions

Comparing Distributions: Chi-square and other robust methods.

Most of the previous discussion assumed normal distributions and gave some options for nonparametric situations, but we need tests to determine what type of distribution we are dealing with.

The chi-square goodness-of-fit test is the first and simplest way to do this. It compares a […]
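To make the goodness-of-fit idea concrete, here is a minimal sketch that tests hypothetical die-roll counts against a uniform distribution. The data are invented for illustration, and the critical value is the standard tabulated one for a 5% significance level with 5 degrees of freedom. This is one simple instance of the test, not necessarily the example used in the full post.

```python
# Hypothetical counts from 120 rolls of a six-sided die (illustrative data only).
observed = [18, 22, 16, 25, 19, 20]
n_categories = len(observed)

# Under the null hypothesis of a fair die, every face is equally likely.
expected = [sum(observed) / n_categories] * n_categories

# Chi-square statistic: sum of (observed - expected)^2 / expected over categories.
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# No parameters were estimated from the data, so df = categories - 1.
df = n_categories - 1
critical_value = 11.070  # tabulated chi-square critical value, alpha = 0.05, df = 5

reject_uniform = chi_sq > critical_value
print(f"chi-square = {chi_sq:.3f}, df = {df}, reject uniform: {reject_uniform}")
```

When SciPy is available, `scipy.stats.chisquare(observed, expected)` returns the same statistic along with a p-value.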

## Univariate Statistics Part 3, How do you compare central tendency

These are tests used to compare the averages of two distributions. When comparing means, we could compare a sample mean to a predefined value (e.g., Are these sandwiches on average different from Subway’s 12” claim?). Or we could compare the means of two separate sample sets (e.g., Do the sandwiches at this Subway […]
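The first case, comparing a sample mean to a predefined value, can be sketched as a one-sample t-test. The sandwich lengths below are hypothetical, and the critical value is the standard tabulated two-sided value for a 5% significance level with 9 degrees of freedom. This is a minimal sketch using only the standard library, not the post's own worked example.

```python
import math
import statistics

# Hypothetical lengths (inches) of ten "footlong" sandwiches.
lengths = [11.8, 11.9, 12.1, 11.7, 11.6, 12.0, 11.8, 11.9, 11.7, 11.5]
claimed_mean = 12.0  # the value we are testing against

n = len(lengths)
sample_mean = statistics.mean(lengths)
sample_sd = statistics.stdev(lengths)  # sample standard deviation (n - 1 denominator)

# t statistic: distance of the sample mean from the claim, in standard errors.
standard_error = sample_sd / math.sqrt(n)
t_stat = (sample_mean - claimed_mean) / standard_error

critical_value = 2.262  # tabulated two-sided t critical value, alpha = 0.05, df = 9
reject_claim = abs(t_stat) > critical_value
print(f"mean = {sample_mean:.2f}, t = {t_stat:.3f}, reject claim: {reject_claim}")
```

With SciPy installed, `scipy.stats.ttest_1samp(lengths, 12.0)` gives the same statistic plus a p-value, and `scipy.stats.ttest_ind` covers the two-sample comparison.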

## Univariate Statistics Part 2, measuring scale, and the standard deviation

Often, we may want to compare our sample variance with a hypothetical population variance to ensure it doesn’t exceed a certain value. Back to our sandwiches: perhaps Subway is okay with footlongs varying in length by no more than 1”. The test statistic we use is […]

## Univariate Statistics Part 1, Formulating your hypothesis, and the p-value

We often would like to compare a population parameter to our sample. We may know this parameter in advance, or we may want to compare it to another sample. Suppose we want to know for sure Subway is giving us a real footlong sandwich when we ask for one. We buy 30 12″ […]

## Exploratory Data Analysis and data types in a geological context

Introduction

Most often in university statistics courses, parametric techniques are given the primary focus. These techniques involve summarizing distributions with typical parameters like mean, median, mode, variance, and standard deviation. The focus is on central tendency and the spread of subsets of the populations to make general conclusions and determine equivalence.

This has […]