PDAC 2019 – Concepts and Applications of Machine Learning to Mining Geoscience – Part 2

Orange: A Data Mining Tool

To continue from my previous post, I will introduce a great tool for basic data mining and machine learning that absolutely any geologist can use with no programming knowledge needed. That tool is Orange. It is free, open source, and intuitive. It can be used to simply visualize your data, and can even go as far as applying some of the most common machine learning algorithms that are used today.

Above we see the initial set-up of a project in Orange. On the left are the various widgets that are available to us. In the middle is our workflow. Typically we would start off with a File widget and link it to a data file on our PC (e.g. a CSV file). In this case, the data provided to us consisted of 1729 samples with rock types, location names, and a few geochemical assay results.
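For anyone curious what that File step looks like outside of Orange, here is a minimal Python sketch of reading a CSV and counting samples per rock type. The column names (`rock_type`, `location`, `Cu_ppm`, `Au_ppb`) and the four rows are made up for illustration; they are not the actual dataset, and the real file would be opened from disk instead of from an in-memory string.

```python
import csv
import io
from collections import Counter

# Made-up rows standing in for the real file; the column names are hypothetical.
CSV_TEXT = """rock_type,location,Cu_ppm,Au_ppb
basalt,Site A,120.5,3.1
granite,Site B,15.2,0.4
basalt,Site A,98.0,2.7
andesite,Site C,60.3,1.9
"""

ASSAY_COLUMNS = ("Cu_ppm", "Au_ppb")


def load_samples(handle):
    """Read CSV rows into dicts, converting the assay columns to floats."""
    rows = []
    for row in csv.DictReader(handle):
        for column in ASSAY_COLUMNS:
            row[column] = float(row[column])
        rows.append(row)
    return rows


# With a real file this would be: open("your_file.csv", newline="")
samples = load_samples(io.StringIO(CSV_TEXT))

# Quick summary: how many samples there are, and how many per rock type.
rock_counts = Counter(row["rock_type"] for row in samples)
print(len(samples), "samples")
print(rock_counts.most_common())
```

This is roughly the information the File and Data Table widgets give you at a glance: the number of rows, the column types, and the values themselves.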

We can also attach a Data Table to the file, to allow us to view the data in a familiar table format.

As with any exploratory data analysis, it is good to visualize the data first. Here we can just connect a Scatter Plot (under the Visualize tools) to our file.

With many machine learning algorithms, an important step is to normalize the data. In this case, we will center by mean, and scale by standard deviation.
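Centering by mean and scaling by standard deviation is often called a z-score: each value becomes (value - mean) / standard deviation. Below is a small sketch of that calculation in plain Python. The numbers are invented, and I am assuming the sample standard deviation (n - 1 in the denominator); Orange may use the population version, which changes the results only slightly for large datasets.

```python
from statistics import mean, stdev


def standardize(values):
    """Center values on their mean and scale by the sample standard deviation."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation (n - 1); an assumption
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Invented copper assay values, for illustration only.
cu_ppm = [120.5, 15.2, 98.0, 60.3]
cu_z = standardize(cu_ppm)

print([round(z, 3) for z in cu_z])
```

After this step every column has a mean of 0 and a standard deviation of 1, so an element measured in ppm no longer dominates one measured in ppb simply because its numbers are bigger.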

From here we continue on with our machine learning workflow. Below is an example of a basic, completed project that predicts the rock name for the remaining data. In this case Random Forest performed the best and was used in the prediction.
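For comparison, the same kind of workflow can be sketched in Python with scikit-learn. This is not what Orange runs internally and it is not the project shown here: the data is synthetic (generated by `make_classification`), the three candidate models and their settings are my own choices, and which model wins will depend on the data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the assay table: 300 "samples", 5 numeric features,
# and 3 classes playing the role of rock names. This is not real data.
X, y = make_classification(
    n_samples=300, n_features=5, n_informative=3, n_classes=3, random_state=0
)

# Hold back a quarter of the samples to play the part of the "remaining data".
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Candidate models (my own picks). The distance- and gradient-based ones get
# a standardization step; the random forest does not need one.
candidates = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
}

# 5-fold cross-validated accuracy, computed on the training portion only.
scores = {
    name: cross_val_score(model, X_train, y_train, cv=5).mean()
    for name, model in candidates.items()
}

# Refit the best-scoring model on all training data and predict the rest.
best_name = max(scores, key=scores.get)
best_model = candidates[best_name].fit(X_train, y_train)
predicted = best_model.predict(X_rest)
holdout_accuracy = (predicted == y_rest).mean()

print(scores)
print(best_name, round(holdout_accuracy, 3))
```

The point to notice is that the model is chosen using only the training portion, and the held-back samples are touched once at the end. Mixing the two makes the results look better than they really are.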

As you can see, this is just scratching the surface of Orange (or I suppose the peel!). There are numerous tutorials online that would do a much better job at getting into the nitty-gritty, as I myself am just starting to use it. Python itself is still more powerful and more flexible. In fact, Orange uses Python as its backend. However, I expect you can go very far with Orange. And although you can quickly start playing around with some machine learning, knowing how to set up training data and test data, and how to interpret the results, still requires careful thought.
