Climatology and Cluster Analysis: Self-Organizing Maps (SOMs)
This summer, I carried out my research project under the supervision of Dr. Benjamin Lintner of the Department of Environmental Sciences and Natural Resources (ENR), with the assistance of graduate student Max Pike. The focus of our project was to identify the causes of biases associated with global climate models (GCMs). Additional objectives included improving our comprehension of climate, with a particular focus on precipitation patterns in the Intertropical Convergence Zone (ITCZ) and South Pacific Convergence Zone (SPCZ) and how the two could be related.
- Week 1:
- This week, most of my focus was on expanding my background knowledge of SOMs. I watched a couple of YouTube videos that really helped break down the concept and what goes into it. I'm also combing over the code that Max gave me in an effort to better understand exactly how it works in Matlab. If all goes well, I'll be able to tinker with the code a bit and examine some parameters of the SPCZ and ITCZ. This will help me get an idea of what I'd like to focus on for my research project and the remainder of my summer. I'm also looking over some journal articles concerning SOMs and the SOM toolbox.
- Week 2:
- While examining the code, I noticed that the comments reference a journal article titled "Performance Evaluation for Self-Organizing Map for Feature Extraction", published in JGR. I'm looking into it this week, and it is definitely shedding more light on how the parameters for the SOM code are chosen or set. The code also mentions K-means clustering, which I looked at in comparison to SOMs; it has given me a bit of perspective on how such concepts are used in self-training environments to make sense of multivariable processes and scale high-dimensional data down to lower dimensions. This is especially helpful since climate data has so many attributes, such as day, time, year, latitude and longitude coordinates, etc.
What I have learned so far: research is exactly that--research! To explore something, you first have to know what exactly you're looking at and how to look at it.
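To make the idea of scaling high-dimensional data down onto a small grid of map nodes concrete, here is a minimal sketch of SOM training in plain Python. It is not the Matlab SOM toolbox code used in this project; the function names, the linear decay schedules, and the default parameters are all assumptions chosen for illustration. Each sample is matched to its closest node (the best matching unit), and that node and its grid neighbors are nudged toward the sample.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def best_matching_unit(nodes, x):
    """Index of the node whose weight vector is closest to sample x."""
    return min(range(len(nodes)), key=lambda i: sq_dist(nodes[i][1], x))


def train_som(data, rows, cols, epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Train a small rectangular SOM.

    Returns a list of ((row, col), weights) pairs. The learning rate and
    neighborhood radius decay linearly, which is an illustrative choice.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    # Start each node at a distinct, randomly chosen sample.
    starts = rng.sample(data, rows * cols)
    positions = [(r, c) for r in range(rows) for c in range(cols)]
    nodes = [(pos, list(w)) for pos, w in zip(positions, starts)]

    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        samples = list(data)
        rng.shuffle(samples)
        for x in samples:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)
            sigma = max(sigma0 * (1.0 - frac), 0.01)
            bmu_pos = nodes[best_matching_unit(nodes, x)][0]
            for pos, w in nodes:
                grid_d2 = (pos[0] - bmu_pos[0]) ** 2 + (pos[1] - bmu_pos[1]) ** 2
                # Gaussian neighborhood: nodes near the BMU on the grid move more.
                h = math.exp(-grid_d2 / (2.0 * sigma * sigma))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return nodes
```

With climate data, each sample would be one day's precipitation field flattened into a long vector, and each trained node would be a representative pattern. Unlike K-means, the neighborhood term ties nodes together on the grid, so nearby nodes end up with similar patterns.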
- Week 3:
- This week, I have been working more closely with the SOM toolbox for Matlab. So far everything seems to be going well. I am tinkering with the code a bit to see what the different parameters do and how they affect the output of the SOM. In the future, I hope to use the SOM toolbox on another data set of particular interest.
- Below, I have posted some figures of output from an SOM analysis of observational data from NASA's Tropical Rainfall Measuring Mission (TRMM). As you can see, the more map nodes a user chooses for the output layer, the more detailed the individual map nodes are.
- Week 4:
- This week, I focused on comparing TRMM observational data with model data from the CMIP5 ensemble to begin determining how similar the two are. I'm expecting some correlation between the two, since model data attempts to simulate what occurs naturally, but I want to see where they differ as well. I'm hoping that the differences between the two will yield more insight into how exactly global climate model (GCM) parameters interact with one another. This approach is a bit different from what I had originally drafted in my head. Initially, I thought that the code for a GCM could simply be cracked open and its parameters manipulated, but realistically it doesn't work that way. GCMs are much more complex in terms of functionality.
- Week 5:
- Half of the distance home! This week, I finally got the CMIP5 data up and running with the help of Max. For the most part, the figures look good; however, there is an anomaly at about 10 degrees north to 20 degrees south (10 to -20) over a longitudinal range of 220 degrees east to 270 degrees east (220 to 270). I have to double-check that these coordinates are accurate. I believe the SOM toolbox code manipulates their degree representation, but I will consult with Max to find out and, if so, correct the coordinates given here. Anyway, I have posted the 2x2 SOMs done on the TRMM data and the CMIP5 data (specifically the CMCC model). The CMCC model is one of the climate models in the CMIP5 ensemble.
Right-click on the image you would like to see and select "View Image" to see the full picture. Notice that the pink-purple regions on both maps represent areas where rainfall is heavier. The figure on the left is the TRMM data and the figure on the right is the CMIP5 data. Additionally, note the anomaly at the aforementioned coordinates in the CMIP5 SOM. These two figures do not look exactly alike, but both do an accurate job of portraying the ITCZ and SPCZ (TRMM does a better job than CMIP5 in terms of resolution and anomalies because it is observational data, meaning data on what actually occurred). The ITCZ is the oval-like shape that spans from about 10 degrees north to the equator (0 degrees). The SPCZ is the diagonally oriented shape that spans from around the equator to about 20 degrees south.
Realistically, you will not see these two shapes in nature. The ITCZ and SPCZ are climate features: large-scale rainbands that occur during El Nino conditions in the Southern Hemisphere. El Nino conditions are signified by the warming of sea surface temperatures, which makes Max and me speculate that the anomaly in the CMIP5 SOM may be a result of the data trying to display some sort of feature or process related to El Nino. The next step for me will be to conduct an SOM analysis on the TRMM data and CMIP5 model data for the Southern Hemisphere winter (June through August). The two figures below represent the SOM analysis conducted on the years 1999 to 2005 for the Southern Hemisphere summer (December through February).
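Since the longitude range quoted in this entry (220 to 270 degrees east) may be stored differently by the plotting code, here is a small sketch of how the two common longitude conventions relate. The function names are hypothetical and are not taken from the SOM toolbox; whether the toolbox actually converts between these conventions is still something I have to confirm.

```python
def to_signed_longitude(lon_east):
    """Convert a 0..360 degrees-east longitude to the -180..180 convention."""
    return ((lon_east + 180.0) % 360.0) - 180.0


def to_east_longitude(lon_signed):
    """Convert a -180..180 longitude back to the 0..360 degrees-east convention."""
    return lon_signed % 360.0


def in_anomaly_box(lat, lon_east):
    """True if a point lies in the box described above:
    20 degrees south to 10 degrees north, 220 to 270 degrees east."""
    return -20.0 <= lat <= 10.0 and 220.0 <= lon_east <= 270.0


# The box edges expressed in the signed convention.
west_edge = to_signed_longitude(220.0)   # -140.0, i.e. 140 degrees west
east_edge = to_signed_longitude(270.0)   # -90.0, i.e. 90 degrees west
```

In the signed convention, the anomaly region therefore runs from 140 degrees west to 90 degrees west, which places it over the eastern Pacific.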
- Week 6:
- More progress has been made! Max has been working on plotting the difference between two map nodes from a given SOM, and he finally got some results. This week I reproduced the same results, plotting the differences between SOM map nodes from a 2x2 map. Below, I have posted the results. The colors do not indicate the amount of precipitation but rather showcase the differences between the two map nodes. For the sake of the image size and the length of this entry, I've only posted the SOM that contains all four map nodes and the comparison between map nodes three (bottom left of the left figure) and four (bottom right of the left figure). Notice how the figure on the right highlights the SPCZ feature for both map nodes and how they differ in orientation. Additionally, while the figure highlights some of the features that both map nodes have, not all features of each node are shown.
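The idea behind the difference plot can be sketched in a few lines of Python. This is an illustration only, not Max's Matlab code; the function name and the tiny 2x2 grids are made up for the example. Each map node is treated as a grid of values, and one grid is subtracted from the other cell by cell, so the sign of each cell shows which node has the stronger feature there.

```python
def node_difference(node_a, node_b):
    """Cell-by-cell difference between two map nodes on the same grid.

    Positive cells are where node_a is larger, negative cells are where
    node_b is larger, and zero cells are where the two nodes agree.
    """
    same_shape = len(node_a) == len(node_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(node_a, node_b)
    )
    if not same_shape:
        raise ValueError("both nodes must be defined on the same grid")
    return [
        [a - b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(node_a, node_b)
    ]


# Made-up values standing in for two map nodes (not real data).
node_three = [[1.0, 4.0], [0.0, 2.0]]
node_four = [[3.0, 1.0], [0.0, 2.0]]
difference = node_difference(node_three, node_four)
```

Features that the two nodes share cancel out to zero, which is why a difference plot only shows some of the features of each node.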
- Week 7:
- There isn't too much going on this week in terms of SOM research. However, Max has come across a program called GOAT that is pretty good at taking TRMM data and plotting it. It also comes with a nice GUI that allows you to specify the weather variable you would like to examine, as opposed to having to edit the code directly (which is what we have been doing for the SOM toolbox). Currently, he's working on a way to reformat the model data so that he can read it into GOAT. Other than that, I have two oral presentations to give this week, so I'm getting prepped for those.
- Week 8:
- Last week (this update is a week past due), our group generated figures to analyze the difference between the SOM analyses done on the TRMM and CMCC data sets. To do this, we generated a difference plot and a scatterplot that examine one SOM map node from each data set. We chose two SOM map nodes that looked visually similar and used the difference plot and scatterplot to analyze them from a qualitative and quantitative perspective. Below are the figures produced for each. At the top is the difference plot, which shows the differences in precipitation density between the two SOM map nodes. The colors in this plot do not represent amounts of precipitation but rather which features belong to which SOM map node. The scatterplot on the bottom shows the lack of correlation between daily precipitation amounts from the TRMM and CMCC data sets. The line of best fit tilts toward the CMCC axis because the larger CMCC precipitation amounts give it more weight.
- The two SOM analyses done on the TRMM data set (top) and CMCC data set (bottom). Each node has a percentage of days mapped to it.
- The difference plot (left) and scatterplot (right) that compare TRMM SOM map node 4 and CMCC SOM map node 3.
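For the quantitative side of the scatterplot, the correlation and the line of best fit can be computed as in the sketch below. This is a plain-Python illustration, not the Matlab code we used; the function names are hypothetical, and which data set sits on which axis is left to the caller.

```python
import math


def _centered_sums(xs, ys):
    """Means and centered sums shared by the two functions below."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return mean_x, mean_y, sxy, sxx, syy


def pearson_r(xs, ys):
    """Pearson correlation coefficient: near 0 means little linear relationship."""
    _, _, sxy, sxx, syy = _centered_sums(xs, ys)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def best_fit_line(xs, ys):
    """Least-squares line y = slope * x + intercept."""
    mean_x, mean_y, sxy, sxx, _ = _centered_sums(xs, ys)
    if sxx == 0.0:
        raise ValueError("cannot fit a line when all x values are equal")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

A correlation close to zero corresponds to the scattered cloud of points in the figure. A least-squares line is pulled most strongly by the largest values, which fits the way the larger precipitation amounts dominate the line of best fit in the scatterplot.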