Data Science

Data science is a growing field with many different tasks and applications. Every day, more people change career paths and move into this relatively new and exciting area. Here at the CREWES Data Science Initiative, we are engaged in researching and sharing what is new in the data science world.

With the CREWES Data Science Learning Labs, we focus on the steps to becoming a data scientist and on how you can bring business value to your organization. The labs will show how a data science project is conducted: data reading, data cleaning and pre-processing, visualization, data transformation, machine learning modeling, and finally app development and deployment. Join us for bi-weekly webinars beginning July 2, 2020 to get access to code and "cookbooks."

Lab 0: July 2, 2020, Noon (MST): Introduction to R and Shiny

Marcelo Guarido

In our first lab we will set out our goals, define a learning path, and introduce both the R programming language and the building of apps with the Shiny library.

Data Science Lab 0 (video)

Lab 1: July 16, 2020, Noon (MST): WTI crude oil price forecasting with the Facebook Prophet algorithm

Marcelo Guarido

In this lab, we will present an R workflow for predicting the WTI crude oil price. The workflow includes an automated API request to the Quandl database and the univariate forecasting algorithm Facebook Prophet. We will end the session with a demonstration of an app built in Shiny.

Data Science Lab 1 (video)
Data Science Lab 1 (zip)

Lab 2: July 30, 2020, Noon (MST): Fundamentals of R, Flexdashboard, and Shiny

Marcelo Guarido

The next lab in the CREWES Data Science Initiative online series will introduce you to the fundamentals of the Flexdashboard and Shiny libraries. We will start a new R Markdown document from scratch and show you how to create a functional application with HTML features.

Data Science Lab 2 (video)
Data Science Lab 2 (zip)

Lab 3: August 13, 2020, Noon (MST): Introduction to HTML, CSS, and Chrome DevTools for Shiny Apps Layouts

Marcelo Guarido

For this lab, we will continue from where we stopped in Lab 2 (Fundamentals of R, Flexdashboard, and Shiny), where we built a Shiny app from scratch but did not modify its layout. The next step in creating a product that increases the business value of your organization is to give the app a layout that has the "face" of your research group, company, or organization. This requires basic skills in HTML and CSS, and a little help from Chrome DevTools (not mandatory, but quite powerful). We are going to show you how to change the app's fonts, colours, and behaviour by combining these tools. By the end of the session, you will be able to read Flexdashboard code, interpret its CSS and HTML layouts, and create your own app!

Data Science Lab 3 (video)
Data Science Lab 3 (zip)

Lab 4: August 27, 2020, Noon (MST): Natural Language Processing and Machine Learning to Classify Severe Injuries in the Oil and Gas Industry

Marcelo Guarido

For Learning Lab 4, we are going to use Natural Language Processing (NLP) methods, combined with machine learning algorithms, to classify severe injuries in the oil and gas industry from accident reports in the US. We are going to introduce you to some neat R packages for processing and preparing text data and, as a bonus, show how to use Python inside R with the Reticulate library!

Data Science Lab 4 (video)
Data Science Lab 4 (zip)

Lab 5: September 10, 2020, Noon (MST): Using Machine Learning for Lithology Classification from Wireline Logs

Marcelo Guarido

Facies classification is a common practice in the oil and gas industry, where rock types are interpreted from correlations between wireline logs and core analysis logs. However, it can be a long process, and each interpreter takes a different approach to the classification. The goal of automated machine learning facies classification is to help interpreters reach their conclusions, not to replace them. We will go through the whole data science process for facies classification: data cleaning, data analysis, data imputation, feature engineering, modeling, and interpretation. All in R.

Data Science Lab 5 (video)
Data Science Lab 5 (zip)

Lab 6: September 24, 2020, Noon (MST): Salt Identification in Seismic Sections using Tensorflow for Deep Learning Solutions

Marcelo Guarido

For this lab, we will present a deep learning solution for the TGS Salt Identification Challenge from the Kaggle website. We are going to demonstrate how to build an image segmentation model in Tensorflow 2 with the goal of classifying each pixel in the seismic section as salt or no salt. For this session, we will use Google Colab to run our notebook.

Data Science Lab 6 (video)
Data Science Lab 6 (zip)

Lab 7: October 8, 2020, Noon (MST): Unsupervised seismic facies classification using Python

Brian Russell

For this lab, we will be presenting clustering solutions for seismic facies classification.
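
As a rough illustration of what a clustering solution can look like, the sketch below implements a plain k-means in standard-library Python. It is not the lab's code: the choice of k-means, the farthest-point initialisation, the function names, and the two-attribute toy samples are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two attribute vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points: Sequence[Point], k: int) -> List[Point]:
    """Deterministic start: first point, then the point farthest from all chosen ones."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    return centroids


def kmeans(points: Sequence[Point], k: int, n_iter: int = 50):
    """Cluster points into k groups; returns (labels, centroids)."""
    centroids = farthest_point_init(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each sample joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return labels, centroids


# Hypothetical samples: two made-up seismic attributes per trace window.
samples = [(0.1, 0.2), (0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
labels, centroids = kmeans(samples, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Each label can be read as an unsupervised "facies" class. In practice the attributes would be scaled first, and the number of clusters would be chosen by inspection or with a quality metric.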

Data Science Lab 7 (video)
Data Science Lab 7 (zip)

Lab 8: November 5, 2020, Noon (MST): Time Series Forecasting with SARIMA - Application to COVID-19 Pandemic Data

Marcelo Guarido

This lab will be a technical presentation and demonstration of the R library Modeltime for time series forecasting, along with the theory behind the seasonal ARIMA (SARIMA) model. We are living through the historic moment of the COVID-19 pandemic, so it makes sense to use our analytical skills to better understand how the pandemic is evolving and how to forecast it. For that, we will use data from the COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University.
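
One piece of the theory behind SARIMA is differencing. Before the autoregressive and moving-average terms are fitted, the series is differenced d times at lag 1 and D times at the seasonal period s. The sketch below shows only that step in standard-library Python. It does not use Modeltime, and the made-up daily counts, the weekly period of 7, and the function name are assumptions for illustration.

```python
from typing import List, Sequence


def difference(series: Sequence[float], lag: int = 1) -> List[float]:
    """Return y[t] - y[t - lag] for every t where both values exist."""
    if lag < 1:
        raise ValueError("lag must be a positive integer")
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Hypothetical daily counts: a linear trend plus a repeating weekly pattern.
weekly_pattern = [0, 3, 5, 4, 2, -1, -4]
counts = [2 * t + weekly_pattern[t % 7] for t in range(28)]

seasonal = difference(counts, lag=7)      # seasonal differencing (D = 1, s = 7)
stationary = difference(seasonal, lag=1)  # ordinary differencing (d = 1)

print(seasonal[:3])    # [14, 14, 14]
print(stationary[:3])  # [0, 0, 0]
```

Seasonal differencing removes the weekly pattern and leaves only the constant weekly growth. The ordinary difference then removes that growth as well. Real data would leave a noisy residual series for the ARMA terms to model.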

Data Science Lab 8 (video)
Data Science Lab 8 (zip)

Lab 9: November 18, 2020, Noon (MST): Impact analysis in R - the effects of the COVID-19 pandemic on the oil industry

Marcelo Guarido

Building on our earlier forecasting work, we will analyze the impact of the COVID-19 pandemic on the oil and gas industry in the US. We will work with oil production and price data from before and during the pandemic and perform an impact analysis. In this lab, we will continue using R and the Modeltime library for time series forecasting, and we will see applications of different algorithms, such as ARIMA, Facebook Prophet, and XGBoost.

Data Science Lab 9 (video)
Data Science Lab 9 (zip)

Lab 10: To be announced (January 2021)