PDAC 2019 – Concepts and Applications of Machine Learning to Mining Geoscience – Part 1

Course Introduction and Reasons Why Machine Learning Projects Fail

I just had the privilege of attending the short course titled above at PDAC 2019. I would like to thank the course instructors:

First, I will give a quick overview of the first day, where we went into the history of machine learning and some of the basics. We began by clearly defining what Artificial Intelligence is versus Machine Learning. AI involves building machines that react like humans. As an example, the new “Turing Test” would be to ask a machine, “Can you go into the house and make me a cup of coffee?” True AI should be able to do this, and we are nowhere near that point.

Machine learning, on the other hand, is a subset of AI that uses algorithms to make predictions and classifications based on a large set of training data. A single algorithm can adapt and change its own parameters to solve a number of problems. Machine learning can be supervised, where we provide the labels for the data (e.g. rock names, ore, waste, etc.), or unsupervised, where data is clustered based on similarities. Reinforcement learning is another field; it is focused on performance and involves finding a balance between exploration and exploitation (e.g. multi-armed bandit problems).

A humorous quote captured the difference:

“If you’re seeing it in PowerPoint, it’s artificial intelligence. If you are seeing it in python, it’s machine learning.”
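To make the supervised/unsupervised distinction concrete, here is a minimal sketch in plain Python with no libraries. The six two-feature samples and their “ore”/“waste” labels are invented for illustration, and the two methods (a nearest-centroid classifier for the supervised case, a bare-bones k-means for the unsupervised case) are simply convenient representatives of each family, not necessarily the methods used in the course.

```python
import math

# Hypothetical 2-D samples (e.g. two normalized assay values); invented numbers.
samples = [(0.10, 0.20), (0.15, 0.25), (0.20, 0.10),
           (0.80, 0.90), (0.85, 0.75), (0.90, 0.85)]
labels = ["waste", "waste", "waste", "ore", "ore", "ore"]


def centroid(points):
    """Mean position of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


# --- Supervised: labels are given, so learn one centroid per class ---
def fit_nearest_centroid(xs, ys):
    return {c: centroid([x for x, y in zip(xs, ys) if y == c]) for c in sorted(set(ys))}


def predict(model, x):
    # Pick the class whose centroid is closest (Euclidean distance).
    return min(model, key=lambda c: math.dist(x, model[c]))


# --- Unsupervised: no labels, group points by similarity (k-means) ---
def kmeans(points, k, iterations=10):
    centers = list(points[:k])  # naive start: the first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Recompute each center; keep the old one if its cluster is empty.
        centers = [centroid(c) if c else centers[i] for i, c in enumerate(clusters)]
    assignments = [min(range(k), key=lambda i: math.dist(p, centers[i])) for p in points]
    return centers, assignments


model = fit_nearest_centroid(samples, labels)
print(predict(model, (0.82, 0.80)))  # -> ore

centers, groups = kmeans(samples, k=2)
print(groups)  # -> [0, 0, 0, 1, 1, 1]
```

Note that k-means only returns arbitrary cluster numbers, not rock names; attaching geological meaning to those groups is still the geologist's job. Library implementations such as scikit-learn's `KMeans` also default to a smarter initialization (k-means++) than the naive first-k-points start used here.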

The cases where we will see machine learning perform best are automating menial tasks (e.g. core logging, autonomous driving, and drilling), dealing with highly complex data in which humans are not capable of seeing trends (e.g. exploration using many layers of data, beyond what can be visualized in 3D), and situations where rapid reaction time is necessary (e.g. real-time geometallurgy).

One important thing to keep in mind: this will always be a tool for the geologist to use, not something that replaces the geologist entirely. Data must be collected and curated competently, and must be interpreted properly afterwards.

However, this tool has the potential to greatly enhance geologists' ability to do both of these things.

A number of other key terms were discussed, such as cost functions, precision, recall, F-scores, ROC curves, and overfitting and underfitting. Each of these deserves its own discussion, which I will cover in later posts.
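Since precision, recall, F-scores, and ROC curves come up constantly, here is a small plain-Python sketch of how they are computed. The ore/waste labels, the model scores, and the 0.5 decision threshold are all hypothetical, and “ore” is treated as the positive class. ROC AUC is computed through its rank interpretation: the probability that a randomly chosen ore sample scores higher than a randomly chosen waste sample, with ties counted as half.

```python
# Hypothetical true labels and model scores (probability of "ore"); invented data.
y_true = ["ore"] * 5 + ["waste"] * 5
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.6, 0.35, 0.2, 0.1, 0.05]


def classify(scores, threshold=0.5):
    """Turn scores into hard ore/waste calls at a chosen threshold."""
    return ["ore" if s >= threshold else "waste" for s in scores]


def precision_recall_f1(y_true, y_pred, positive="ore"):
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # ore called ore
    fp = sum(t != positive and p == positive for t, p in pairs)  # waste called ore
    fn = sum(t == positive and p != positive for t, p in pairs)  # ore called waste
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def roc_auc(y_true, scores, positive="ore"):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_pred = classify(scores)
p, r, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")  # precision=0.75 recall=0.60 F1=0.67
print(f"ROC AUC={roc_auc(y_true, scores):.2f}")  # ROC AUC=0.88
```

Raising the threshold generally trades recall for precision: fewer waste samples get called ore, but more ore gets missed. An ROC curve plots the true positive rate against the false positive rate as the threshold sweeps across all values, and the AUC summarizes that curve in a single number.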

We also went over reasons why machine learning projects fail, which I believe deserves some specific attention:

  • Asking the wrong questions: A specific goal should be delineated before the process begins. This allows you to focus resources on the kinds of data that need to be collected. Aimlessly looking through data is a dangerous endeavor as well; as humans, we are notorious for seeing patterns that don’t exist.
  • Lack of firm support by key stakeholders: Data science projects often have impacts across many departments in an organization. Defining the strategy keeps the project on track, and prevents stakeholder apathy.
  • Data problems: This is a problem I’m particularly familiar with. Poor quality, inconsistency, and incompleteness of data are frequently major problems (a PDF is not a geophysical survey). If there is not enough data, a data scientist should reserve the right to ask for more. Data collection and data wrangling are often going to be a large part of the job.
  • Lack of the right data science “team”: Even within pure data science teams, you are rarely going to find one person who does everything. There are data engineers, data scientists, and data analysts, with experience in Exploratory Data Analysis, Statistics, Coding, Feature Engineering, Visualization, and storytelling. This is on top of the absolutely essential domain knowledge that geologists provide. Relying on a single “unicorn” who can do it all can also set you up for a failed project should that person become unavailable partway through.
  • Overly complex models: As is often the case, keeping it simple can lead to better results.
  • Over-promising: Particularly with the increased interest in this area of research, keeping expectations reasonable is important. Improvements often don’t occur right away, as each project requires its own solutions and refinements over time.
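The warning under “Asking the wrong questions” about seeing patterns that don’t exist is easy to reproduce. The plain-Python sketch below correlates 200 layers of pure random noise against a random target measured at 20 locations; both counts are arbitrary assumptions chosen to mimic “many data layers, few samples”. Because roughly 5% of noise features are expected to clear a p < 0.05 threshold by chance alone, several will typically look like meaningful predictors even though, by construction, none of them is.

```python
import math
import random
import statistics

# Pure noise by construction: no feature is genuinely related to the target.
random.seed(42)
N_SAMPLES = 20      # hypothetical: e.g. a small number of drill holes
N_FEATURES = 200    # hypothetical: e.g. many stacked exploration data layers
R_CRITICAL = 0.444  # approx. |r| needed for p < 0.05 (two-sided) when n = 20


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


target = [random.gauss(0, 1) for _ in range(N_SAMPLES)]
features = [[random.gauss(0, 1) for _ in range(N_SAMPLES)] for _ in range(N_FEATURES)]

correlations = [pearson(f, target) for f in features]
strongest = max(abs(r) for r in correlations)
n_significant = sum(abs(r) > R_CRITICAL for r in correlations)

print(f"strongest |r| among {N_FEATURES} noise features: {strongest:.2f}")
print(f"noise features that look 'significant' at p < 0.05: {n_significant}")
```

This is why the question should be defined first, and any pattern found by searching should be checked against data that was not used to find it.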

That’s it for now, but I’ll post again shortly about a great new tool for geologists that requires no coding savvy at all… Orange!
