Archive for the ‘programming’ Category

Programming Environments and an intro to Anaconda

Most of my programming experience so far has been on my own, or at least as the lone programmer on site. Now I am about to start working more with people who are at (or, more likely, beyond) my level. Part of starting this journey requires everyone to work in the same programming environment.

[Image: a messy desk. Caption: Messy desks are not programming environments. No, this is not what I mean when I say programming environments.]

In some ways, programming is like building scientific knowledge. Everything makes sense when hypotheses are built on theories, which are built on laws, but everyone has to agree on the same basic concepts. Programming environments are similar. If you want to collaborate with someone who is using Python 3.6, it is best that you also work in 3.6, at least for that project. If you contribute code that relies on Python 3.8 features, it may not run in your collaborator’s environment, and some libraries may behave differently or stop working altogether.
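A minimal sketch of guarding against this kind of mismatch, assuming the project simply pins one major.minor Python version. The `REQUIRED_VERSION` value and the `matches_required_version` helper are hypothetical, not part of any tool. As a concrete case, the assignment expression (`:=`) only arrived in Python 3.8, so code that uses it will not even parse under 3.6.

```python
import sys

# Hypothetical project requirement: the version your collaborator uses.
REQUIRED_VERSION = (3, 6)


def matches_required_version(required=REQUIRED_VERSION, current=None):
    """Return True if the interpreter's (major, minor) equals `required`."""
    if current is None:
        current = sys.version_info[:2]
    return tuple(current[:2]) == tuple(required)


if __name__ == "__main__":
    running = "{}.{}".format(*sys.version_info[:2])
    if matches_required_version():
        print(f"OK: running Python {running}, as the project expects.")
    else:
        wanted = "{}.{}".format(*REQUIRED_VERSION)
        print(f"Warning: project expects Python {wanted}, "
              f"but this is Python {running}.")
```

A check like this at the top of a shared script fails loudly instead of letting version differences surface later as confusing library errors.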

So I made a video, my first, to show people how to set up Python programming environments on their computers using the Anaconda distribution. Anaconda is a very popular Python distribution, with a package repository where many Python libraries are maintained. The GUI used in the video, Anaconda Navigator, is easy and intuitive. And best of all, the Individual Edition is open source and free.
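The video uses Navigator, but the same environment can also be created from the command line with `conda create`. The sketch below only assembles that command in Python; the environment name `geo36`, the package list, and both helper functions are hypothetical. By default it does a dry run and prints the command instead of executing it.

```python
import shutil
import subprocess


def conda_create_command(env_name, python_version, packages=()):
    """Build the argument list for `conda create` (the command-line
    counterpart of creating an environment in Anaconda Navigator)."""
    return ["conda", "create", "--name", env_name,
            f"python={python_version}", *packages, "--yes"]


def create_environment(env_name, python_version, packages=(), dry_run=True):
    """Print the command (dry run) or run it if conda is on the PATH."""
    cmd = conda_create_command(env_name, python_version, packages)
    if dry_run or shutil.which("conda") is None:
        print("Would run:", " ".join(cmd))
    else:
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Dry run only: prints the command without creating anything.
    create_environment("geo36", "3.6", ["pandas", "openpyxl"])
```

Once an environment exists, `conda activate geo36` switches into it from Anaconda Prompt or a terminal.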

Here is a link to the video! I hope you use it and get programming!

Pause the video if you need time to catch up!

This project was also my first introduction to making videos. I’d like to make more, but as this is my first, the editing is abysmal; there are about 20 things I would have done differently. I definitely see the advantage of good video editing software and good audio recording gear. But every journey begins with a first step!

Posted in programming

DataCamp Python Projects

From Cholera to Kardashians

Here is a link to some of the projects I’ve done with DataCamp. These are guided projects on a variety of topics. Some of my favorites are extracting stock sentiment from the news, distinguishing honey bees from bumble bees in images using deep learning, the discovery of the importance of handwashing, and recreating John Snow’s map of the 1854 cholera outbreak in London.

DataCamp Python Projects

Posted in machine learning, programming

My First Jupyter Notebook! And some code for Excel users

This is my first Jupyter Notebook, and I thought I’d do something very basic: a few simple tools to help you start using pandas DataFrames with Excel files. The file is a heavily abbreviated Excel file of geochemical data (some trace elements) from Nancy Normore. It contains a few breccia samples from the Flin Flon area, with five trace elements, fragment types (mafic or felsic), areas, and UTM coordinates. The data are purposely a little messy (exploratory data analysis, or EDA, is important), and I plan to demonstrate how to clean data in Python in the future using this same dataset. If you have some basic Python skills (e.g., for loops, while loops, if statements), you’ll probably start to see ways to streamline your workflow.
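A minimal sketch of that kind of first pass with pandas. It assumes hypothetical column names (`Sample`, `Fragment`, `Cu_ppm`, `Zn_ppm`) and uses a tiny made-up table in place of the real spreadsheet, which is not reproduced here. It tidies the headers, standardizes the fragment labels, turns a below-detection entry into a missing value, and summarizes by fragment type.

```python
import pandas as pd

# Normally you would load the spreadsheet, e.g.:
#   df = pd.read_excel("breccia_geochem.xlsx")  # hypothetical file name
# (.xlsx files need an engine such as openpyxl installed).
# The tiny table below is made up for illustration; it is NOT the real data.
df = pd.DataFrame({
    "Sample ": ["S1", "S2", "S3", "S4"],           # note the stray space
    "Fragment": [" Mafic", "felsic", "MAFIC", "Felsic "],
    "Cu_ppm": ["120", "85", "<5", "40"],           # text, with a "<5" entry
    "Zn_ppm": [300, 150, 220, 90],
})

# 1. Strip stray spaces from column names.
df.columns = df.columns.str.strip()

# 2. Standardize labels so " Mafic", "MAFIC" and "mafic" group together.
df["Fragment"] = df["Fragment"].str.strip().str.lower()

# 3. Convert to numbers; "<5" (below detection) becomes NaN.
#    Substituting e.g. half the detection limit is a judgement call.
df["Cu_ppm"] = pd.to_numeric(df["Cu_ppm"], errors="coerce")

# 4. Quick exploratory summaries (mean skips NaN by default).
print(df.head())
print(df.describe())
means = df.groupby("Fragment")[["Cu_ppm", "Zn_ppm"]].mean()
print(means)
```

The same three cleaning steps (headers, labels, numeric conversion) cover a surprising share of the mess in hand-built spreadsheets.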

Just as I was finishing this, DataCamp posted a webinar on the same subject. I love their material and highly recommend a subscription for intermediate-level training.

With all that said, I’m going to have to switch gears for a while, as I have a Networks class to finish. However, if you’re curious about how to do something, feel free to ask! I may have a solution for you, or at least be able to find one.

The notebook you will want to open is PyForExcel.ipynb

Posted in programming

First, I would like to acknowledge and thank Michael Pyrcz, Associate Professor at the University of Texas at Austin. His many lectures on YouTube are a great resource for geostatistics and subsurface modelling. You can check them out here!

It will come as no surprise to anyone that I say, “Yes! A geologist should learn to code.” With this post, however, I aim to take some of the fear out of programming. At its core, programming is building algorithms, and algorithms are simply processes and sets of rules used to solve problems. Being able to build these algorithms is the heart of good programming, and you’ve probably been doing it for years without realizing it. Learning the syntax of a particular language is the easy part.
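To make the point concrete, here is a rule many geologists already apply by hand, classifying an igneous rock by its silica content, written as a Python function. The boundaries (45, 52 and 63 wt% SiO2) are the commonly cited ones; treat them as an assumption and swap in whatever scheme your project uses. The function name is illustrative.

```python
def classify_by_silica(sio2_wt_pct):
    """Classify an igneous rock by SiO2 content (wt%).

    Assumed boundaries: <45 ultramafic, 45-52 mafic,
    52-63 intermediate, >=63 felsic.
    """
    if not 0 <= sio2_wt_pct <= 100:
        raise ValueError("SiO2 must be between 0 and 100 wt%")
    if sio2_wt_pct < 45:
        return "ultramafic"
    if sio2_wt_pct < 52:
        return "mafic"
    if sio2_wt_pct < 63:
        return "intermediate"
    return "felsic"


if __name__ == "__main__":
    # Made-up analyses, for illustration only.
    for value in [42.0, 48.5, 57.0, 71.2]:
        print(f"{value:5.1f} wt% SiO2 -> {classify_by_silica(value)}")
```

Nothing here is beyond what you would do with a pencil and a classification chart; the code just states the rules explicitly enough that a computer (and a colleague) can follow them.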

So how can I convince you that coding is a huge benefit to geological work? For a start, coding is transparent. When you solve a problem programmatically, your code records exactly how you solved it, for everyone to see. Any mistakes can be caught quickly, and that is a good thing.

Reproducibility is another benefit. If someone else runs your code on the same data to answer the same question, they should get the same answer you did. If they don’t, finding out why can lead to some interesting revelations.
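Randomness is one place where reproducibility can quietly break. A minimal sketch, assuming a bootstrap estimate of a mean grade: fixing the random seed means anyone running the same code on the same data gets the same interval. The function name, seed and sample values are hypothetical, and the interval is only a rough percentile estimate. Python’s `random` documentation also notes that most of the module’s algorithms may change between Python versions, which is one more reason to share the environment along with the code.

```python
import random
import statistics


def bootstrap_mean_interval(values, n_resamples=1000, seed=42):
    """Rough 90% bootstrap interval for the mean of `values`.

    A fixed `seed` makes the resampling repeatable.
    """
    rng = random.Random(seed)  # private generator; leaves global state alone
    means = sorted(
        statistics.mean(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    )
    lower = means[int(0.05 * n_resamples)]
    upper = means[int(0.95 * n_resamples) - 1]
    return lower, upper


if __name__ == "__main__":
    grades = [1.2, 0.8, 2.5, 1.9, 0.4, 3.1]  # made-up grades (g/t)
    first = bootstrap_mean_interval(grades)
    second = bootstrap_mean_interval(grades)
    print(first, second, first == second)  # same seed, same data
```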

Geology is a very data-driven science. We collect and analyze data in order to understand Earth processes, so being able to quantify things is very important. Computers handle numbers far faster and more reliably than humans can. Generally speaking, humans and computers have complementary strengths: where computers are weak, humans tend to do well, and vice versa.
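A small example of the quantification computers do well: averaging assay grades over drill-core intervals of different lengths. A plain average gives every interval equal weight; weighting each grade by its interval length is the usual correction. The function name and the numbers in the usage line are hypothetical.

```python
def length_weighted_grade(intervals):
    """Length-weighted average grade over drill-core intervals.

    `intervals` is a list of (from_m, to_m, grade) tuples.
    """
    total_length = 0.0
    weighted_sum = 0.0
    for from_m, to_m, grade in intervals:
        length = to_m - from_m
        if length <= 0:
            raise ValueError(f"Bad interval {from_m}-{to_m} m")
        total_length += length
        weighted_sum += grade * length
    if total_length == 0:
        raise ValueError("No intervals given")
    return weighted_sum / total_length


if __name__ == "__main__":
    # 1 m at 2.0 g/t followed by 3 m at 4.0 g/t (made-up values).
    print(length_weighted_grade([(0.0, 1.0, 2.0), (1.0, 4.0, 4.0)]))  # 3.5
    # A plain average of the two grades would give 3.0.
```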

Not only that, but much of the heavy lifting has already been done, thanks to the many open source packages and libraries available. Chances are, if you want to do something, someone else has already done it and shared their code. Once you know how to code, you will probably find that you spend more time reading other people’s code than writing your own.
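As a tiny illustration of not reinventing the wheel, Python’s built-in `statistics` module already provides common summary statistics, so there is no need to write them yourself; for larger datasets, libraries such as NumPy and pandas fill the same role. The trace-element values below are made up.

```python
import statistics

# Made-up zinc values (ppm), for illustration only.
zn_ppm = [300, 150, 220, 90, 410, 175]

summary = {
    "mean": statistics.mean(zn_ppm),
    "median": statistics.median(zn_ppm),
    "stdev": statistics.stdev(zn_ppm),  # sample standard deviation
}

for name, value in summary.items():
    print(f"{name:>6}: {value:.1f}")
```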

Coding skills also improve communication between geologists and IT staff, which allows each person to do their job better. When you bring domain knowledge to the table along with technical skill, you will usually come up with a better result. Even if you have only a basic understanding, you’ll know the capabilities and limits of coding, and you’ll be able to ask the right questions.

Sharing your code lets others make an impact on their own projects in ways you may never have imagined. This kind of altruism often pays off in the long run: the more people use your code, the better off everyone is, including you.

Being able to automate part of your job helps in two ways. First, you spend less time on the boring parts: repetitive tasks run automatically, leaving you more time to actually do geology! Second, the time savings add up. It may take you longer to code a workflow the first time, but every later run costs almost nothing, so it pays for itself in the long run. Being more productive is always a good thing.
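A minimal sketch of that kind of automation, assuming a folder of CSV files that each contain a copper column. The file names, the column name and the `summarize_folder` helper are hypothetical. Instead of opening each file by hand, the loop reads every CSV in the folder, skips non-numeric entries such as `<5`, and reports a mean per file.

```python
import csv
import statistics
import tempfile
from pathlib import Path


def summarize_folder(folder, column):
    """Return {file name: mean of `column`} for every CSV in `folder`.

    Blank or non-numeric cells (e.g. "<5") are skipped.
    """
    results = {}
    for path in sorted(Path(folder).glob("*.csv")):
        values = []
        with path.open(newline="") as handle:
            for row in csv.DictReader(handle):
                try:
                    values.append(float(row[column]))
                except (KeyError, TypeError, ValueError):
                    continue  # missing column, short row, or text entry
        if values:
            results[path.name] = statistics.mean(values)
    return results


if __name__ == "__main__":
    # Demo with two throwaway files; names and numbers are made up.
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "hole_A.csv").write_text("sample,Cu_ppm\nA1,100\nA2,200\n")
        Path(tmp, "hole_B.csv").write_text("sample,Cu_ppm\nB1,50\nB2,<5\n")
        for name, mean in summarize_folder(tmp, "Cu_ppm").items():
            print(f"{name}: mean Cu = {mean:.1f} ppm")
```

Pointing the same function at a folder of hundreds of files takes no more effort than pointing it at two, which is where the long-run payoff comes from.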

Most importantly, I promise you that when you finish your first block of code that does something you find genuinely useful, you will feel like boundaries have been lifted. It is empowering, and you start to see your computer as a lot more than a word processor.

If you are trying to pick up a language, I suggest trying Python first. Python is a very powerful language for scientific programming, with a lot of community and library support. It is also very intuitive for first-time learners, and there are many very good beginner tutorials out there. If there is interest, I will write another blog post on getting started in Python from a geological perspective. R is another language to consider; it is less intuitive than Python but very powerful for statistical methods. I would suggest sticking with one and learning it well. When you find that you need a new tool, library, or completely new language, that’s the time to acquire those skills. And there is no doubt that basic Python or R skills look good on a resume.

Posted in programming