Bayesian Statistical Modeling: Gaussian Process Regression Applied to Sea-Level Data
We will use Gaussian process (GP) regression, a machine-learning method, to model relative sea level. (A Gaussian process is a collection of random variables, any finite number of which have a joint Gaussian distribution; in other words, a Gaussian process is a distribution over functions. In GP regression, rather than restricting attention to one function class (for example, linear or quadratic), we assign a prior probability to every possible function.) Our goal is to use GP regression to emulate existing models based on scientific processes, so that the emulator runs in less time and is consistent with the given model output as well as new sea-level data.
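As a rough illustration of the idea above, here is a minimal GP regression sketch in plain Python. It is not the stilt emulator and not the project's code: the squared-exponential kernel, the zero prior mean, the `noise_var` value, and all function names are assumptions chosen for the example, and the training values are toy numbers rather than sea-level data.

```python
import math

def sq_exp_kernel(x1, x2, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between two scalar inputs (assumed kernel)."""
    return variance * math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)

def cholesky(a):
    """Return lower-triangular L with L L^T = a (a symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def solve_lower(low, b):
    """Solve low z = b by forward substitution."""
    z = []
    for i, row in enumerate(low):
        z.append((b[i] - sum(row[k] * z[k] for k in range(i))) / row[i])
    return z

def solve_upper_t(low, z):
    """Solve low^T x = z by back substitution."""
    n = len(low)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(low[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (z[i] - s) / low[i][i]
    return x

def gp_predict(x_train, y_train, x_new, noise_var=1e-6):
    """Posterior mean and variance of a zero-mean GP at one new input."""
    n = len(x_train)
    # Covariance of the training outputs, with a small noise term on the diagonal.
    k = [[sq_exp_kernel(x_train[i], x_train[j]) + (noise_var if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    low = cholesky(k)
    alpha = solve_upper_t(low, solve_lower(low, y_train))  # K^{-1} y
    k_star = [sq_exp_kernel(x_new, xi) for xi in x_train]
    mean = sum(ks * a for ks, a in zip(k_star, alpha))
    v = solve_lower(low, k_star)
    var = sq_exp_kernel(x_new, x_new) - sum(vi * vi for vi in v)
    return mean, var

# Toy inputs for illustration only (not real sea-level data).
x_train = [0.0, 1.0, 2.0]
y_train = [0.0, 1.0, 0.5]
mean_near, var_near = gp_predict(x_train, y_train, 1.0)    # at a training input
mean_far, var_far = gp_predict(x_train, y_train, 100.0)    # far from all data
```

At a training input the posterior mean stays close to the observed value and the variance is small; far from the data the prediction falls back to the prior (mean 0, variance equal to the kernel variance). An emulator relies on the same behavior: it is confident near model runs it has seen and uncertain elsewhere.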
- Week 1:
- I met with one of my mentors, Erica Ashe, and discussed the direction of my project. She gave me a data set to look at and a few papers detailing some of the methods we will be using. I began reading the papers (and some of the papers they reference), focusing on learning the details of the model we will be emulating and getting a good idea of some of the underlying scientific concepts. I also looked into RCPs (Representative Concentration Pathways), which are used in making future sea-level predictions, and ran some preliminary analysis on the data.
- Week 2:
- I continued to look at the data, read more background on the
physical processes involved in sea level change, and learned more
about Gaussian process regression, especially as it is used in the R
stilt package. On Friday, after meeting with Erica and Dr. Kopp on
Thursday afternoon, I began working on the emulator.
- Week 3:
- I continued to work on the emulator and step through the emulator
function to understand the math in the
stilt package. I met with my mentors as well as a few other people
in Dr. Kopp's lab to hear about what they are working on and present
what I have done so far. At the end of the week, I started looking
into different ways to visualize the emulator output and compare it
to our data.
- Week 4:
- The focus of this week was producing an informative visualization
and thinking about diagnostics to test the emulator. I also started
to look at a new, more expansive data set with predictions into the
future. Now that I have a better idea of what I want to do, I have
started generalizing my code as much as possible so that it can run
with any similar data set as input.
- Week 5:
- This week I ran emulators, produced contour plots, and looked at some
cross-validation plots for all of the data,
only to learn that it is better to do some of the time periods
separately! The next step will be to redo what I did this week on
separated data sets, and also produce time series plots for GMSL
according to the three future emissions scenarios.
- Week 6:
- I produced time series plots for many different combinations of
parameters, looked at how the standard deviation of the predictions varies over time and
parameter space, and looked at cross sections of my GMSL contour
plots. I started investigating some trends that we saw and thinking
about possible transformations to remedy the discrepancies between the
predictions and the actual data. I also started working on my final
presentation and paper.
- Week 7:
- This week, I continued to investigate trends in the time series
after doing a matrix transformation of the predictive means and
covariance matrices to ensure that all GMSL predictions for the year 2000
were 0 (change is measured relative to this year). My mentors and I
want to figure out why these emulator predictions are not already 0,
so I started going through the stilt source code a little more
carefully to really understand what it is doing and think of
possible changes I could make to specialize it to our project. I also compiled my
final presentation and poster.
- Week 8:
- This week, we traveled to Prague! I have been working on my paper,
learning a lot from some combinatorics-related lectures by faculty
here at the university, and getting to experience some of the sights
and taste lots of the food here in Prague.
- Week 9:
- During the last week of the REU, I attended the Mathematics of
Jiri Matousek conference here in Prague and continued working on my
final paper. This was my first conference, and I really enjoyed the
experience; while many of the topics were completely new to me, and I
often felt like I didn't have all the background information I
needed to really understand all of the material, I still feel like I
got something out of attending. I am very grateful to the NSF,
DIMACS, and DIMATIA for making this experience possible!
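The Week 7 step of forcing the year-2000 prediction to 0 can be sketched as a linear transformation of a Gaussian prediction. This is one plausible form of that transformation, not necessarily the one used in the project: with A = I - 1 e_ref^T (subtract the reference-year component from every component), the mean becomes A m and the covariance becomes A C A^T. The function name `rebaseline`, the index `ref`, and the numbers below are hypothetical.

```python
def rebaseline(mean, cov, ref):
    """Shift a Gaussian prediction so that component `ref` is exactly zero.

    Equivalent to applying A = I - 1 e_ref^T:
    mean -> A mean, cov -> A cov A^T.
    """
    n = len(mean)
    new_mean = [m - mean[ref] for m in mean]
    # (A C A^T)[i][j] = C[i][j] - C[i][ref] - C[ref][j] + C[ref][ref]
    new_cov = [[cov[i][j] - cov[i][ref] - cov[ref][j] + cov[ref][ref]
                for j in range(n)] for i in range(n)]
    return new_mean, new_cov

# Toy three-year prediction (illustrative values only); index 0 plays the
# role of the reference year.
mean = [0.1, 0.3, 0.7]
cov = [[0.04, 0.02, 0.01],
       [0.02, 0.09, 0.03],
       [0.01, 0.03, 0.16]]
new_mean, new_cov = rebaseline(mean, cov, ref=0)
```

After the transformation, the reference-year mean is 0 and its row and column of the covariance are 0, so the prediction for that year is 0 with no uncertainty, while the other years are expressed as changes relative to it.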
Research funded by the National Science Foundation.