DIMACS REU 2019

General Information

Student: Amin Fadel
Office: 434
School: Stockton University
E-mail: amin.fadel@rutgers.edu
Project: Investigating Sea Level Rise and Variability at Tide-Gauge Stations using Supervised Machine Learning

Project Description

In this project, I will be using Gaussian Process modeling to fit a temporal regression to data obtained from tide gauge stations. Gaussian Process modeling is a non-parametric form of statistical modeling that defines a distribution over functions mapping inputs to outputs. The advantage of using this method to model the data is that the resulting signal can be decomposed into seasonal, long-term, and interannual components. These decomposed signals will serve as a baseline for predictions of future sea level rise and flood risks.
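As a concrete illustration of the basic machinery (a minimal numpy sketch, not the project's actual code), the posterior mean of a Gaussian Process with a radial basis covariance can be computed in a few lines; the kernel choice, length scale, and noise level here are placeholder assumptions:

```python
import numpy as np

def rbf(x1, x2, length_scale=1.0):
    """Radial basis (squared-exponential) covariance between two input sets."""
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior_mean(x_train, y_train, x_test, noise=1e-4):
    """Posterior mean of a zero-mean GP: K(x*, X) @ (K(X, X) + s^2 I)^-1 y."""
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    K_star = rbf(x_test, x_train)
    alpha = np.linalg.solve(K, y_train)
    return K_star @ alpha

# toy example: the GP interpolates noiseless sine observations
x = np.linspace(0.0, 6.0, 30)
y = np.sin(x)
x_new = np.array([1.5, 3.0])
mean = gp_posterior_mean(x, y, x_new)
```

In the real project the covariance is a sum of several components rather than a single RBF, which is what makes the decomposition described above possible.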


Weekly Log

Week 1:
This week I arrived at Rutgers and started studying the theory behind the Gaussian Process, as well as research papers involving the earth science aspect of this project. I feel like I have already learned a lot, and I am looking forward to studying more and getting started on modeling.
Week 2:
This week I spent time reading a couple of articles. One of them was Global and Regional Sea Level Rise Scenarios for the United States, which discusses several sea-level rise scenarios, ranging from "low" to "extreme" based on the severity of the projected rise. The other was Geographic Variability of Sea-Level Change, which explains how sea levels in localized areas can differ greatly from the global mean sea level, as well as the different processes that create these differences. Both articles provided me with more context for this project, such as how the model I make could be used alongside other models to create more complete predictions of climate change. Besides the articles, I have begun working with the tide-gauge data. So far I have been looking at the data for the Boston tide gauge, and I have programmed a method that extracts the data from the text file and prepares it for use in my model. I have also begun experimenting with different covariance functions for the Gaussian Process, but I will definitely need to do more work on this aspect, considering that finding the proper covariance function is one of the main problems of creating a Gaussian Process.
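The extraction step could look something like this sketch (the semicolon-separated record layout and the -99999 missing-data sentinel are assumptions about the station files, not the actual format):

```python
import io

MISSING = -99999  # assumed missing-data sentinel; the real files may differ

def load_tide_gauge(fileobj):
    """Parse semicolon-separated (decimal year, sea level) records,
    skipping malformed lines and missing-data entries."""
    times, levels = [], []
    for line in fileobj:
        fields = line.split(";")
        if len(fields) < 2:
            continue
        t, h = float(fields[0]), float(fields[1])
        if h == MISSING:
            continue
        times.append(t)
        levels.append(h)
    return times, levels

# hypothetical monthly records: one good, one missing, one good
sample = "1990.0417; 7023; 0\n1990.1250; -99999; 0\n1990.2083; 7044; 0\n"
t, h = load_tide_gauge(io.StringIO(sample))
```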
Week 3:
This week I continued to refine the covariance function. The covariance function I had previously been using was rather complicated, which in turn made interpreting the model more complicated. So, I started focusing on creating a simpler version that better represented the knowledge we already have about sea level variability, including a seasonal periodic component and a linear trend component. This week I also started working with a different set of data, which includes hourly measurements, as opposed to the monthly averages I was working with before. As of right now I am not using the hourly data, since the increase in input size would make training the model too computationally expensive. It may be interesting in the future, however, to look at a subset of the hourly data and experiment with that.
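The simpler covariance could be sketched in numpy as a sum of a periodic term and a dot-product term (the kernel forms and hyperparameter values below are illustrative assumptions, not the model's fitted values):

```python
import numpy as np

def periodic_kernel(x1, x2, period=1.0, length_scale=0.5):
    """Seasonal covariance: correlations repeat every `period` years."""
    d = np.abs(x1[:, None] - x2[None, :])
    return np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / length_scale ** 2)

def linear_kernel(x1, x2, sigma_b=1.0):
    """Dot-product covariance: a distribution over straight-line trends."""
    return sigma_b ** 2 + np.outer(x1, x2)

def sea_level_kernel(x1, x2):
    """Composite covariance: seasonal cycle plus long-term linear trend."""
    return periodic_kernel(x1, x2) + linear_kernel(x1, x2)
```

Because the covariance is additive, each term can later be pulled out on its own, which is what makes the component-wise interpretation possible.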
Week 4:
This week I worked on reformatting the data files I have been working with. Previously, I had been scanning through a text file to extract the relevant data I needed, and then fitting the model to the data, all in one script. This was taking up too much time doing redundant work, so I created a script that iterated through all of the data files, extracted the data into a wrapper class, then serialized the data objects so they could be quickly loaded and used later on. Now I can easily run my model on any of the tide-gauge stations' data without having to waste time on formatting data. This will allow me to experiment with the other stations, and compare how the optimized hyperparameters differ between them.
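A serialization workflow along these lines could be sketched with a dataclass and pickle (the `StationData` class and file layout here are hypothetical stand-ins for the actual wrapper class):

```python
import os
import pickle
import tempfile
from dataclasses import dataclass

@dataclass
class StationData:
    """Hypothetical wrapper for one station's extracted records."""
    name: str
    times: list
    levels: list

def save_station(station, directory):
    """Serialize a station object to <directory>/<name>.pkl."""
    path = os.path.join(directory, station.name + ".pkl")
    with open(path, "wb") as f:
        pickle.dump(station, f)
    return path

def load_station(path):
    """Reload a previously serialized station object."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

With the expensive text parsing done once up front, each modeling run only pays the cost of a quick unpickle.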
Week 5:
This week I experimented some more with different covariance functions. In particular, I focused on different ways to represent the trend component. Initially, I used a dot-product covariance function for this purpose, which produces a distribution of straight lines. This worked relatively well, but an issue with using a straight line to represent the trend is that not all trends are perfectly linear. To fix this, I instead tried multiplying two dot-product kernels together (to achieve a quadratic fit), using a radial basis function (which produces smooth curves), and multiplying a radial basis function by a dot-product kernel. Of these three methods, using the radial basis function by itself produced the best fits.
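The four candidate trend covariances, and one standard way to compare them (the Gaussian log marginal likelihood), could be sketched as follows; the hyperparameter values are illustrative assumptions:

```python
import numpy as np

def dot(x1, x2):
    """Dot-product kernel: samples are straight lines."""
    return 1.0 + np.outer(x1, x2)

def rbf(x1, x2, ell=10.0):
    """Radial basis function kernel: samples are smooth curves."""
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

# the four trend representations described in the text
candidates = {
    "linear": dot,
    "quadratic": lambda a, b: dot(a, b) * dot(a, b),
    "smooth": rbf,
    "smooth_linear": lambda a, b: rbf(a, b) * dot(a, b),
}

def log_marginal_likelihood(kernel, x, y, noise=0.1):
    """Gaussian log marginal likelihood, a standard score for
    comparing covariance choices on the same data."""
    K = kernel(x, x) + noise ** 2 * np.eye(len(x))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha - np.log(np.diag(L)).sum()
            - 0.5 * len(x) * np.log(2.0 * np.pi))
```

Scoring each candidate on the same station this way gives a principled basis for the choice, alongside visual inspection of the fits.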
Week 6:
This week I worked on implementing a function that can decompose my model into its different components. One of the goals of this project was to analyze the different components of sea level variability, including the trend, the seasonal cycle, and the interannual components. This function allows me to take a model for any station and isolate the component that we are interested in looking at. For instance, if I am interested in looking at the trend component for a station, and I don't want to concern myself with the seasonal cycle, I can use this function to isolate and graph the trend by itself.
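The decomposition rests on the additivity of the covariance: if K = K_seasonal + K_trend, each addend's contribution to the posterior mean is K_component(x*, X) @ alpha, and the parts sum to the full prediction. A numpy sketch (kernel forms and hyperparameters are illustrative, not the fitted model):

```python
import numpy as np

def periodic(x1, x2, period=1.0, ell=0.5):
    """Seasonal covariance with a one-year period."""
    d = np.abs(x1[:, None] - x2[None, :])
    return np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / ell ** 2)

def rbf(x1, x2, ell=10.0):
    """Smooth long-term trend covariance."""
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

def decompose(x_train, y_train, x_test, noise=0.1):
    """Split the GP posterior mean into seasonal and trend components.

    alpha is shared; each component's mean is its own cross-covariance
    times alpha, so seasonal + trend equals the full posterior mean."""
    K = (periodic(x_train, x_train) + rbf(x_train, x_train)
         + noise ** 2 * np.eye(len(x_train)))
    alpha = np.linalg.solve(K, y_train)
    seasonal = periodic(x_test, x_train) @ alpha
    trend = rbf(x_test, x_train) @ alpha
    return seasonal, trend
```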
Week 7:
This week I implemented a function that calculates the trend at each tide gauge station. It works by first using the decomposition function to isolate the smooth trend prediction, which is then used to calculate the trend rate. Because the smooth trend component is represented by a radial basis function and is not perfectly linear, some extra work needed to go into calculating the trend. This was done by essentially taking the derivative at equally-spaced samples throughout the prediction and averaging over them all. Using this method, I was able to create a map of the stations' trends, giving a visual representation of sea level change along the U.S. coastlines.
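The sample-the-derivative-and-average step can be sketched with a finite-difference derivative (a simplified stand-in for the actual calculation; the example curve and units are hypothetical):

```python
import numpy as np

def mean_trend_rate(times, trend):
    """Average slope of a smooth trend curve.

    np.gradient takes a finite-difference derivative of the decomposed
    trend prediction at every sample; averaging those local slopes gives
    a single rate (e.g. mm/yr) even when the curve is not a straight line."""
    slopes = np.gradient(trend, times)
    return slopes.mean()

# gently accelerating rise: 2 mm/yr plus a small quadratic term
t = np.linspace(1950.0, 2000.0, 600)
curve = 2.0 * (t - 1950.0) + 0.01 * (t - 1950.0) ** 2
rate = mean_trend_rate(t, curve)
```

For this quadratic example the local slope runs from 2 to 3 mm/yr, so the averaged rate comes out near 2.5 mm/yr.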
Week 8:
This week I worked on creating a seasonal amplitude map. Like the trend map, this map uses the latitude and longitude of each station to map its location onto a plot of the United States. Instead of mapping trends, however, this map plots the seasonal amplitude, which is the difference between the lowest and highest sea levels of a seasonal cycle. This was done by first using the decomposition function to get the seasonal cycle prediction. Since this prediction is not influenced by the trend component, we can average over each month to get an average seasonal cycle, and then find the maximum and minimum values to get the amplitude.
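The month-binning step described above could be sketched like this (a simplified stand-in for the actual code, using decimal-year timestamps and a synthetic seasonal prediction):

```python
import numpy as np

def seasonal_amplitude(times, seasonal):
    """Peak-to-trough height of the average seasonal cycle.

    `times` are decimal years; each prediction is binned by calendar
    month, the bins are averaged into one mean cycle, and the amplitude
    is that cycle's maximum minus its minimum."""
    months = np.floor((times % 1.0) * 12.0).astype(int)
    cycle = np.array([seasonal[months == m].mean() for m in range(12)])
    return cycle.max() - cycle.min()

# ten years of mid-month samples of a pure annual sinusoid
t = np.arange(0.0, 10.0, 1.0 / 12.0) + 1.0 / 24.0
cycle_pred = np.sin(2.0 * np.pi * t)
amp = seasonal_amplitude(t, cycle_pred)
```

Averaging over all years first is what makes this robust: any interannual wiggles left in the prediction largely cancel out before the maximum and minimum are taken.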
Week 9:
This week was largely focused on writing my final paper for this project. In preparation for this, I worked on deciding which stations I would discuss in the paper, and reformatting some of my diagrams so that my predictions didn't rely on a datum. Besides this, I also created a table that includes all 121 stations, and shows their fitted hyperparameters.

Additional Information