
Geological data is often not normally distributed, and parametric methods should not be applied to such data. In this case, I did a paired experiment, which is a simplified example of blocking, where comparisons are made between similar experimental units. This blocking needs to be done prior to performing the […]
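The excerpt does not say which nonparametric test the post uses on the paired data. A common choice for paired measurements is the Wilcoxon signed-rank test, so the sketch below assumes that test. It computes the rank sums of the positive and negative paired differences in plain Python; the measurement values and the names `method_a`/`method_b` are hypothetical.

```python
# Wilcoxon signed-rank statistics for paired measurements (pure Python sketch).

def average_ranks(values):
    """Return 1-based ranks for values, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def wilcoxon_signed_rank(before, after):
    """Return (W_plus, W_minus): rank sums of positive and negative paired differences."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    # Each difference compares a unit with itself, which is what the pairing (blocking) buys us.
    diffs = [b - a for a, b in zip(before, after) if b != a]  # zero differences are dropped
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return w_plus, w_minus


# Hypothetical paired measurements on the same four samples, e.g. two lab methods.
method_a = [1.0, 2.0, 3.0, 4.0]
method_b = [2.0, 4.0, 6.0, 3.0]
w_plus, w_minus = wilcoxon_signed_rank(method_a, method_b)
print(w_plus, w_minus)  # 8.5 1.5
```

The two rank sums always add up to n(n + 1)/2 for the n nonzero differences, which is a quick sanity check. In practice, `scipy.stats.wilcoxon` computes the statistic together with a p-value.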
Dealing with Missing Data Properly Using the missingno Package in Python
Recently, Kaggle started a playground competition, Categorical Feature Encoding Challenge II. This competition builds on a previous competition by adding a twist: missing data. I did a short analysis of that missing data and built a notebook you can see here, but I thought […]
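Before plotting anything, it helps to know what a missing-data analysis actually measures. The sketch below is not the missingno API; it is a plain-Python illustration of two quantities that missingno visualizes: missing counts per column, and nullity correlation (the correlation between columns' "is missing" indicators). The records and column names are hypothetical.

```python
import math

# Hypothetical categorical records; None marks a missing value.
rows = [
    {"color": "red", "size": "S", "day": 1},
    {"color": None, "size": None, "day": 2},
    {"color": "blue", "size": "M", "day": None},
    {"color": None, "size": None, "day": 4},
]
columns = ["color", "size", "day"]


def missing_counts(rows, columns):
    """Count entries per column that are None or absent."""
    return {c: sum(1 for r in rows if r.get(c) is None) for c in columns}


def nullity(rows, column):
    """1 where the value is missing, 0 where it is present."""
    return [1 if r.get(column) is None else 0 for r in rows]


def nullity_correlation(rows, col_a, col_b):
    """Pearson correlation between two columns' missingness indicators."""
    x, y = nullity(rows, col_a), nullity(rows, col_b)
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined if a column is never (or always) missing
    return cov / (sx * sy)


print(missing_counts(rows, columns))               # {'color': 2, 'size': 2, 'day': 1}
print(nullity_correlation(rows, "color", "size"))  # 1.0
```

A correlation of 1.0 means `color` and `size` are always missing together, which hints that the values are not missing completely at random. With a pandas DataFrame, `df.isna().sum()` gives the counts, and missingno's `msno.matrix(df)` and `msno.heatmap(df)` draw the missingness pattern and the nullity-correlation heatmap.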
Comparing Distributions: Chi-Square and Other Robust Methods
Most of the previous discussion has assumed normal distributions and offered some options for nonparametric situations, but we also need tests to determine what type of distribution we are dealing with.
The chi-square goodness-of-fit test is the first, and simplest, way to do this. It compares a […]
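The goodness-of-fit statistic itself is short: for each category, square the gap between the observed and expected counts, divide by the expected count, and sum. The sketch below computes it in plain Python on a hypothetical die-rolling example tested against a uniform (fair) distribution.

```python
# Chi-square goodness-of-fit statistic: sum over categories of (O - E)^2 / E.

def chi_square_gof(observed, expected):
    """Return the chi-square statistic comparing observed and expected counts."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of categories")
    if any(e <= 0 for e in expected):
        raise ValueError("expected counts must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical example: a die rolled 60 times, tested against a fair (uniform) die.
observed = [5, 8, 9, 8, 10, 20]
expected = [sum(observed) / len(observed)] * len(observed)  # 10 per face
stat = chi_square_gof(observed, expected)
dof = len(observed) - 1  # k - 1 when no parameters are estimated from the data
print(round(stat, 2), dof)  # 13.4 5
```

With 5 degrees of freedom the 5% critical value is about 11.07, so 13.4 would reject the fair-die hypothesis. To test fit to a continuous distribution such as the normal, the data are binned first and the expected counts come from the fitted distribution; the degrees of freedom then drop by one for each estimated parameter, and a common rule of thumb asks for expected counts of at least 5 per bin. `scipy.stats.chisquare(f_obs, f_exp, ddof=...)` returns the statistic and a p-value.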
These are tests used to compare the averages of two distributions. When comparing means, we could compare a sample mean to a predefined value (e.g., are these sandwiches, on average, different from Subway’s 12” claim?). Or we could compare the means of two separate samples with each other (e.g., do the sandwiches at this Subway […]
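For the two-sample case, the excerpt does not say which test the post settles on. One common option that does not assume equal variances is Welch's t-test, sketched below with the standard library; the sandwich lengths for the two stores are hypothetical.

```python
import math
import statistics


def welch_t(sample_1, sample_2):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(sample_1), len(sample_2)
    v1 = statistics.variance(sample_1) / n1  # squared standard error of mean 1
    v2 = statistics.variance(sample_2) / n2  # squared standard error of mean 2
    t = (statistics.mean(sample_1) - statistics.mean(sample_2)) / math.sqrt(v1 + v2)
    dof = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, dof


# Hypothetical sandwich lengths (inches) from two Subway locations.
store_a = [11.0, 12.0, 13.0]
store_b = [10.0, 11.0, 12.0]
t, dof = welch_t(store_a, store_b)
print(round(t, 4), round(dof, 2))  # 1.2247 4.0
```

The degrees of freedom are usually fractional; here the two samples have equal size and variance, so they come out to exactly n1 + n2 - 2 = 4. `scipy.stats.ttest_ind(a, b, equal_var=False)` runs the same test and reports a p-value.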
Measures of spread and scale
Often, we may want to compare our sample variance with a hypothetical population variance to ensure it doesn’t exceed a certain value. Back to our sandwiches: perhaps Subway is okay with footlongs varying in length by no more than 1”. The test statistic we use is […]
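The excerpt cuts off before naming the statistic. Assuming the standard chi-square test for a single variance, the statistic is (n - 1)s²/σ₀², with n - 1 degrees of freedom. The sketch below also assumes "no more than 1” of variation" means σ₀ = 1 inch; the lengths are hypothetical.

```python
import statistics


def variance_chi_square(sample, sigma0_sq):
    """Chi-square statistic (n - 1) s^2 / sigma0^2 for testing a single variance."""
    n = len(sample)
    s_sq = statistics.variance(sample)  # sample variance with n - 1 denominator
    return (n - 1) * s_sq / sigma0_sq, n - 1


# Hypothetical footlong lengths (inches); assume the tolerance means sigma0 = 1 inch.
lengths = [11.0, 12.5, 13.0, 11.5, 12.0]
stat, dof = variance_chi_square(lengths, sigma0_sq=1.0 ** 2)
print(stat, dof)  # 2.5 4
```

Because the concern is that the variance is too large, this is an upper-tailed test. The 5% critical value for 4 degrees of freedom is about 9.49, so 2.5 gives no evidence that the lengths vary more than allowed. This test assumes normally distributed data and is sensitive to departures from normality.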
We often want to compare our sample to a population parameter. We may know this parameter in advance, or we may want to compare it to another sample. Suppose we want to check whether Subway is giving us a real footlong sandwich when we ask for one. We buy 30 12″ […]
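When the parameter is known in advance (12 inches here), a one-sample t-test is the usual parametric comparison. The sketch below uses a short hypothetical list of lengths to keep the arithmetic visible; the same function works unchanged on all 30 measurements.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """t statistic for H0: population mean == mu0, with n - 1 degrees of freedom."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean
    return (statistics.mean(sample) - mu0) / se, n - 1


# Hypothetical lengths (inches); a real test would use all 30 sandwiches.
lengths = [11.5, 11.8, 12.1, 11.6, 11.9]
t, dof = one_sample_t(lengths, mu0=12.0)
print(round(t, 2), dof)  # -2.06 4
```

For 4 degrees of freedom, the two-sided 5% critical value is about 2.776, so |t| = 2.06 does not reject the 12” claim with this small sample. `scipy.stats.ttest_1samp(lengths, 12.0)` performs the same test with a p-value.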
Introduction
Most often in university statistics courses, parametric techniques receive the primary focus. These techniques involve summarizing distributions with typical parameters like mean, median, mode, variance, and standard deviation. The focus is on the central tendency and spread of samples drawn from populations, which are then used to make general conclusions and determine equivalence.
This has […]
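The summaries listed above are all available in Python's standard `statistics` module. The sketch below computes them for a small, hypothetical right-skewed set of assay values, the kind of shape geological data often takes.

```python
import statistics

# Hypothetical, right-skewed assay values (e.g. g/t) with one high outlier.
assays = [2, 3, 3, 4, 18]

summary = {
    "mean": statistics.mean(assays),
    "median": statistics.median(assays),
    "mode": statistics.mode(assays),
    "variance": statistics.variance(assays),  # sample variance (n - 1 denominator)
    "stdev": statistics.stdev(assays),
}
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

The single high value pulls the mean to 6, above every other observation, while the median and mode stay at 3. That gap is one practical reason skewed data motivate the robust and nonparametric methods discussed in these posts.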

