DIMACS
DIMACS REU 2016

General Information

Student: Christina Williamson
Office: CoRE 450
School: Pomona College
E-mail: clw02013[at]mymail[dot]pomona[dot]edu
Project: Bayesian Statistical Modeling: Gaussian Process Regression Applied to Sea-Level Data
Mentors:

Project Description

We will use Gaussian process (GP) regression, a type of machine-learning algorithm, to model relative sea level. (A Gaussian process is a collection of random variables, any finite number of which have a joint Gaussian distribution, so it defines a distribution over functions. In GP regression, rather than restricting the fit to one function class (for example, linear or quadratic), we assign a prior probability to every possible function.) Our goal is to use GP regression to emulate existing models based on scientific processes, so that the emulator runs in less time than those models and is consistent with both their output and new sea-level data.
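
As a rough illustration of the idea above, here is a minimal GP regression sketch in Python with NumPy. It is not the project's code and does not use the R stilt package; the squared-exponential kernel, the helper names (rbf_kernel, gp_posterior), the hyperparameter values, and the toy data are all assumptions chosen for illustration.

```python
import numpy as np

def rbf_kernel(xa, xb, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between two 1-D input arrays (assumed kernel)."""
    diff = xa[:, None] - xb[None, :]
    return variance * np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_test, noise_var=1e-6):
    """Posterior mean and covariance of a zero-mean GP evaluated at x_test."""
    k_tt = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    k_ts = rbf_kernel(x_train, x_test)
    k_ss = rbf_kernel(x_test, x_test)
    # A Cholesky factorization avoids forming an explicit matrix inverse.
    chol = np.linalg.cholesky(k_tt)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    v = np.linalg.solve(chol, k_ts)
    mean = k_ts.T @ alpha
    cov = k_ss - v.T @ v
    return mean, cov

# Toy data (hypothetical): a few samples of a smooth function.
x_train = np.array([0.0, 1.0, 2.0, 3.0])
y_train = np.sin(x_train)
x_test = np.array([0.0, 1.5, 3.0])

mean, cov = gp_posterior(x_train, y_train, x_test)
print(mean)          # posterior mean at the test inputs
print(np.diag(cov))  # posterior variance: near zero at training inputs, larger between them
```

The posterior mean passes (almost) through the training points, and the posterior variance grows away from them, which is what lets an emulator report how uncertain it is about inputs the original model was never run on.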


Weekly Log

Week 1:
I met with one of my mentors, Erica Ashe, and discussed the direction of my project. She gave me a data set to look at and a few papers detailing some of the methods we will be using. I began reading the papers (and some of the papers they reference), with a focus on learning the details of the model we will be emulating and getting a good idea of some of the underlying scientific concepts. I also looked into RCPs (Representative Concentration Pathways), which are used in making future sea-level predictions, and ran some preliminary analysis on the data.
Week 2:
I continued to look at the data, read more background on the physical processes involved in sea-level change, and learned more about Gaussian process regression, especially as it is used in the R stilt package. On Friday, after meeting with Erica and Dr. Kopp on Thursday afternoon, I began working on the emulator.
Week 3:
I continued to work on the emulator and step through the emulator function to understand the math in the stilt package. I met with my mentors as well as a few other people in Dr. Kopp's lab to hear about what they are working on and present what I have done so far. At the end of the week, I started looking into different ways to visualize the emulator output and compare it to our data.
Week 4:
The focus of this week was producing an informative visualization and thinking about diagnostics to test the emulator. I also started to look at a new, more expansive data set with predictions into the future. Now that I have a better idea of what I want to do, I have started generalizing my code as much as possible so that it can run with any similar data set as input.
Week 5:
This week I ran emulators, produced contour plots, and looked at some cross-validation plots for all of the data, just to learn that it is better to do some of the time periods separately! The next step will be to redo what I did this week on separated data sets, and also produce time series plots for GMSL according to the three future emissions scenarios.
Week 6:
I produced time series plots for many different combinations of parameters, looked at how the standard deviation of the predictions varies over time and parameter space, and looked at cross sections of my GMSL contour plots. I started investigating some trends that we saw and thinking about possible transformations to remedy the discrepancies between the predictions and the actual data. I also started working on my final presentation and paper.
Week 7:
This week, I continued to investigate trends in the time series after applying a matrix transformation to the predictive means and covariance matrices to ensure that all GMSL predictions for the year 2000 were 0 (change is measured relative to this year). My mentors and I want to figure out why these emulator predictions are not already 0, so I started going through the stilt source code more carefully to understand what it is doing and to think of possible changes I could make to specialize it to our project. I also compiled my final presentation and poster.
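
One common way to re-reference Gaussian predictions to a base year is a linear map applied to the mean vector and covariance matrix; the sketch below shows that idea in Python with NumPy. It is an assumption about the general technique, not necessarily the transformation used in this project, and the function name rebaseline and all numbers are hypothetical.

```python
import numpy as np

def rebaseline(mean, cov, ref_index):
    """Express Gaussian predictions relative to the entry at ref_index.

    If y ~ N(mean, cov), then z = y - y[ref_index] equals A @ y with
    A = I - 1 e_ref^T, so z ~ N(A @ mean, A @ cov @ A.T).
    """
    n = len(mean)
    transform = np.eye(n)
    transform[:, ref_index] -= 1.0  # subtract the reference entry from every row
    return transform @ mean, transform @ cov @ transform.T

# Hypothetical predictive mean and covariance for three years.
years = np.array([1950, 2000, 2050])
mean = np.array([-0.05, 0.02, 0.30])
cov = np.array([
    [0.010, 0.004, 0.002],
    [0.004, 0.008, 0.005],
    [0.002, 0.005, 0.020],
])

ref = int(np.where(years == 2000)[0][0])
new_mean, new_cov = rebaseline(mean, cov, ref)
print(new_mean)          # the entry for the year 2000 is 0
print(np.diag(new_cov))  # the variance for the year 2000 is 0
```

Because the map is linear, the transformed predictions stay Gaussian, and both the mean and the variance at the reference year become exactly 0 by construction.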
Week 8:
This week, we traveled to Prague! I have been working on my paper, learning a lot from some combinatorics-related lectures by faculty here at the university, and getting to experience some of the sights and taste lots of the food here in Prague.
Week 9:
During the last week of the REU, I attended the Mathematics of Jiri Matousek conference here in Prague and continued working on my final paper. This was my first conference, and I really enjoyed the experience. Many of the topics were completely new to me, and I often felt that I lacked the background needed to fully understand the material, but I still feel that I got something out of attending. I am very grateful to the NSF, DIMACS, and DIMATIA for making this experience possible!

Presentations


Research funded by the National Science Foundation.