About Me
Email: ihorng@sas.upenn.edu
Home Institution: University of Pennsylvania
Project: Identifying Motifs in Brain Connectome Data
Mentor: Dr. Jie Gao
Collaborator: Liron Karpati
I enjoyed meeting the other students at orientation and other events. I met my fellow REU student collaborator, Liron Karpati, in person for the first time. We also met with our mentor, Dr. Jie Gao, to discuss interesting problems and potential avenues to explore for our project, including social networks and network analysis, differential privacy, algorithms for scheduling and planning, and clustering. We talked about brain connectome data and how topological data analysis might be applied. We also met with the graduate students in Dr. Gao’s team to brainstorm potential questions that might be interesting to explore. When analyzing brain connectome data for global features, the Szemerédi regularity lemma might be useful because it would help us partition a large graph into a bounded number of parts; this would allow a large graph to be viewed as a collection of smaller, random-looking graphs. Ultimately, this approach would lead our project into the realm of graph theory. Additionally, we began working on our slides for the introductory presentation that all REU students will be giving on Monday. We decided to focus on identifying predictive features in brain connectome data. Specifically, we want to explore two avenues: 1) local connectome features using motifs and 2) global connectome features using persistent homology.
On Monday, we gave our 4-minute introductory presentation to the rest of the REU program,
and it went well. The slides are also listed in the "Projects" section.
On Tuesday, we attended an interesting seminar by Mikhail Khovanov on “Regular languages and cobordisms of decorated manifolds.”
Throughout the rest of the week, I continued reading through papers and sources to gain a better understanding of
barcodes and persistent homology, specifically on how topological data analysis (TDA) can be applied in analyzing
brain connectome data. A few notable papers I read this week included Robert Ghrist's "Barcodes: The persistent topology of data" [1],
"Brain network motifs are markers of loss and recovery of consciousness" [2],
"Topological data analysis of functional MRI connectivity in time and space domains" [3],
and "Topological data analysis of Human Brain Networks through order statistics" [4].
This week, we were also given the resting brain data that we will be using in our analysis.
The files to visualize the graph networks were given in .gpickle format, so we set up a GitHub
repository and explored Jupyter Notebook in order to visualize the data. During a meeting
with our mentor, Dr. Gao gave a short lecture about persistent homology and why it is used.
She reviewed simplicial complexes, filtrations, homology, barcodes, and how to interpret barcodes.
From this, we saw how the barcodes might be used to give us an idea about the global structure of
the brain if we are able to find a filtration that will give us meaningful barcodes. It was suggested
that “Introduction to Persistent Homology” by Ziga Virk might be a good textbook to read to review
ideas from persistent homology and TDA. Reading papers this week, I found that one of the authors
used a software package called Javaplex to calculate barcodes and other TDA features, so I am
excited to start exploring the software. At our mentor meeting, we also discussed motifs and how
following the methodology given in [2] with our data could be a good starting point to better
understand brain network motifs. It was suggested that the book “Fundamentals of Brain Network
Analysis” by Fornito, Zalesky, and Bullmore might be a good reference for our project.
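To make the idea of reading a barcode concrete, here is a minimal Python sketch of the simplest case, the dimension-0 barcode of a weighted graph. It is not the code used in this project. It assumes every vertex is born at filtration value 0 and each edge enters at its weight; the function name `h0_barcode` and the toy edges are made up for illustration. Each finite bar ends at the value where two connected components merge, and each infinite bar is a component that never merges.

```python
def h0_barcode(num_vertices, weighted_edges):
    """Dimension-0 barcode of a graph filtration.

    weighted_edges is a list of (weight, u, v) tuples. All vertices are
    assumed to be born at 0 (an illustrative choice).
    """
    parent = list(range(num_vertices))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    bars = []
    for weight, u, v in sorted(weighted_edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            # Two components merge, so one bar dies at this weight.
            parent[root_u] = root_v
            bars.append((0.0, weight))

    # Components that never merge give infinite bars.
    survivors = len({find(x) for x in range(num_vertices)})
    bars.extend((0.0, float("inf")) for _ in range(survivors))
    return bars


toy_edges = [(0.2, 0, 1), (0.5, 1, 2), (0.9, 0, 2), (0.7, 2, 3)]
print(h0_barcode(4, toy_edges))
```

In the toy example the edge of weight 0.9 closes a loop instead of merging components, so it does not appear in the dimension-0 barcode; it would show up as the birth of a dimension-1 feature.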
I started exploring the Javaplex software to become more comfortable with its features. After testing the software with the command prompt and Eclipse, I found that MATLAB was the better alternative. I downloaded MATLAB and followed the directions in javaplex_tutorial.pdf to start playing around with the different commands and functions that the software provides. I spent the rest of the week completing the exercises in javaplex_tutorial.pdf using MATLAB and reading chapters from the Persistent Homology book that Dr. Gao gave me in order to refresh my knowledge of persistent homology. After looking through all the examples in javaplex_tutorial.pdf, I started to code some of my own examples using the data we were given. We currently have a 1088x1088 correlation matrix, which details the distances between points/vertices. I was able to determine a set of points that satisfy the distances specified in the matrix, but this meant embedding the points in a 2D space, so I needed to determine a way to visualize our points in an abstract space. Since the goal is to create barcodes using the correlation matrix, I also thought about how to code the filtration that we want to use. With Dr. Gao’s advice, we came up with a high-level idea, and I’m working on turning it into more concrete code.
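A filtration built from pairwise values only needs a dissimilarity for each pair of vertices, not coordinates, so one way to avoid the 2D embedding is to work with the matrix directly. The Python sketch below is only an illustration: the conversion d = 1 - r is an assumption (the log does not say how correlations were turned into distances), and `correlation_to_distance` is a made-up name.

```python
def correlation_to_distance(corr):
    """Map correlations r in [-1, 1] to dissimilarities d = 1 - r in [0, 2].

    The formula is an assumed convention, not necessarily the one used
    in the project. The input is symmetrized by averaging.
    """
    n = len(corr)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            r = 0.5 * (corr[i][j] + corr[j][i])
            dist[i][j] = dist[j][i] = 1.0 - r
    return dist


toy_corr = [
    [1.0, 0.8, -0.2],
    [0.8, 1.0, 0.5],
    [-0.2, 0.5, 1.0],
]
toy_dist = correlation_to_distance(toy_corr)
```

Strongly correlated regions end up close together, so their edges would enter an increasing filtration early.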
I explored the data types in Javaplex, such as stream, in order to better understand how I can implement my filtration. Instead of using the standard Vietoris-Rips filtration, we hope to use the idea of clique simplices. Once a third 1-simplex (edge) closes a triangle with two already-connected 1-simplices, a solid 2-simplex (filled-in triangle) should form. Since it is harder to visualize simplices above dimension 3, we will test the code on simplices of dimension 0, 1, and 2 first. I coded a filtration by first creating a stream and then inputting the relevant vertices and edges, and I tested this on a small part of our correlation matrix. Once I could confirm that my code was properly creating simplices at the correct filtration values, I gradually tested it on larger parts of our correlation matrix. This resulted in some interesting barcodes for smaller matrices, but the code gave an OutOfMemoryError when I tried to use it for a matrix larger than 100x100, even though I had already specified the largest heap space for MATLAB. At our meeting with Dr. Gao, we discussed this error and thought of ways to make the code more efficient, such as including a threshold or sorting the correlation values.
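The clique rule described above can be sketched in a few lines of Python. This is not the MATLAB/Javaplex code from the project; `clique_filtration` is a made-up name, vertices are assumed to be born at 0, and a triangle is assigned the largest filtration value among its three edges, which is the moment the third edge arrives.

```python
from itertools import combinations


def clique_filtration(edge_values):
    """Build simplices of dimension 0, 1, and 2 from weighted edges.

    edge_values maps frozenset({u, v}) to the edge's filtration value.
    Returns (value, simplex) pairs sorted by value, then dimension.
    """
    vertices = sorted({v for edge in edge_values for v in edge})
    simplices = [(0.0, (v,)) for v in vertices]
    for edge, value in edge_values.items():
        simplices.append((value, tuple(sorted(edge))))
    # A triangle appears once all three of its edges are present.
    for a, b, c in combinations(vertices, 3):
        sides = [frozenset(pair) for pair in ((a, b), (a, c), (b, c))]
        if all(side in edge_values for side in sides):
            birth = max(edge_values[side] for side in sides)
            simplices.append((birth, (a, b, c)))
    simplices.sort(key=lambda item: (item[0], len(item[1]), item[1]))
    return simplices


toy_edges = {
    frozenset({0, 1}): 0.1,
    frozenset({1, 2}): 0.3,
    frozenset({0, 2}): 0.6,
    frozenset({2, 3}): 0.4,
}
for value, simplex in clique_filtration(toy_edges):
    print(value, simplex)
```

Checking every triple of vertices is cubic in the number of vertices, which is one reason a threshold on the edges helps before moving to larger matrices.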
On Monday, we attended a meeting with Dr. Stamoulis and her team of researchers who are also working with our ABCD dataset. We discussed papers and briefly presented our research projects. I continued working on the code for the filtration in order to make it more efficient to run on larger matrices. I created functions to apply a threshold to the matrix and store cell values with their positions in arrays that can be easily accessed. After a lot of trial and error testing the code on small data sets to ensure that it was working properly, I realized that I also needed to update my filtration definitions for how the 2-simplices (filled-in triangles) are formed. This was very successful, and I was able to create barcodes for the 500x500 matrix, but memory errors appeared again when I used a larger matrix. So, I tested different variations of my function to see if the run time could be reduced. At our meeting with Dr. Gao, she suggested another algorithm for forming the 2-simplices in my code, and we thought about approaches to looking at higher dimensional simplices. Additionally, I created a function to visualize the simplices being created as the filtration value increased, and this allowed us to see the edges forming, being connected, and forming triangles and larger shapes as more edges were added.
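The thresholding step can be pictured with the following Python sketch. It is an illustration only: `thresholded_edges` is a hypothetical name, and keeping values at or below the threshold is an assumed convention (for raw correlations one might instead keep values above it). Each kept entry is stored together with its position, and the list is sorted so that edges can be added to the filtration in order.

```python
def thresholded_edges(matrix, threshold):
    """Return sorted (value, row, column) triples from the upper triangle.

    Only entries with value <= threshold are kept (assumed convention),
    so far fewer edges have to be held in memory.
    """
    n = len(matrix)
    edges = [
        (matrix[i][j], i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if matrix[i][j] <= threshold
    ]
    edges.sort()
    return edges


toy_matrix = [
    [0.0, 0.2, 1.2, 0.9],
    [0.2, 0.0, 0.5, 0.7],
    [1.2, 0.5, 0.0, 0.4],
    [0.9, 0.7, 0.4, 0.0],
]
print(thresholded_edges(toy_matrix, 0.6))
```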
I continued rewriting the code to create 2-simplices more efficiently and tested it on smaller matrices and then the full 1088x1088 matrix. While this was more efficient, it was changing the filtration value that finally allowed the code to run and produce barcodes for the full matrix. I continued reading papers to see statistical methods that others have used in analyzing barcodes, such as area under the curve, and to understand what loops in the functional brain network might mean. Additionally, I explored another software package, GUDHI. After downloading Anaconda and playing around with Google Colab and JupyterLab, I generated barcodes using the Vietoris-Rips filtration on our data.
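One common reading of "area under the curve" for barcodes is the area under the Betti curve, which counts how many bars are alive at each filtration value. The Python sketch below assumes that reading; the papers may define the statistic differently, and the function names and toy bars are made up. Because each bar contributes its own length to the area, infinite bars are capped at a chosen maximum value.

```python
def betti_curve(bars, grid):
    """Number of bars alive at each grid value t (birth <= t < death)."""
    return [sum(1 for birth, death in bars if birth <= t < death) for t in grid]


def area_under_betti_curve(bars, max_value):
    """Area under the Betti curve up to max_value.

    Each bar contributes its length, with deaths capped at max_value so
    that infinite bars stay finite.
    """
    return sum(
        min(death, max_value) - birth
        for birth, death in bars
        if birth < max_value
    )


toy_bars = [(0.0, 0.4), (0.1, 0.3), (0.2, float("inf"))]
print(betti_curve(toy_bars, [0.0, 0.15, 0.25, 0.35, 0.5]))
print(area_under_betti_curve(toy_bars, max_value=1.0))
```

A single number like this makes it possible to compare barcodes across participants, at the cost of discarding where along the filtration the features occur.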
Liron and I attended Dr. Stamoulis’s reading group and learned more about methods for analyzing hub networks. I created randomized networks that preserve edge weight, degree, and strength distributions. Barcodes were generated from the random graphs in order to compare them to the barcodes generated from the data in our study. At the meeting with Professor Gao, we presented our observation that some participants’ brain networks seem to differ from others, and information from our barcodes and path analysis both support this finding. After downloading MRIcroGL and BrainNet Viewer, we also met with Callie from Dr. Catherine’s team in order to discuss how to visualize brain connectomes. On Thursday, everyone went on a field trip to Nokia Bell Labs. It was a very fun trip, and we even got to eat apples from a descendant of Newton’s apple tree!
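A standard way to build such null models is to swap the endpoints of randomly chosen pairs of edges. The Python sketch below shows that idea under stated assumptions and is not the procedure used in the project: `rewire_preserving_degree` is a made-up name, and this simple version keeps every node's degree and the overall set of edge weights exactly, while node strengths are generally only roughly preserved.

```python
import random


def rewire_preserving_degree(weighted_edges, num_swaps, seed=0):
    """Randomize an undirected weighted graph with double-edge swaps.

    Each swap turns edges (a, b) and (c, d) into (a, d) and (c, b).
    Weights stay attached to the edge slots, so the multiset of weights
    and every node's degree are unchanged.
    """
    rng = random.Random(seed)
    edges = list(weighted_edges)
    present = {frozenset((u, v)) for u, v, _ in edges}
    done = attempts = 0
    while done < num_swaps and attempts < 100 * num_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        a, b, w1 = edges[i]
        c, d, w2 = edges[j]
        if len({a, b, c, d}) < 4:
            continue  # would create a self-loop
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue  # would create a duplicate edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edges[i] = (a, d, w1)
        edges[j] = (c, b, w2)
        done += 1
    return edges


ring = [(i, (i + 1) % 8, 0.1 * (i + 1)) for i in range(8)]
print(rewire_preserving_degree(ring, num_swaps=20, seed=1))
```

Barcodes computed from many such randomized graphs give a baseline: features that also appear in the random graphs are explained by degrees and weights alone.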
Liron and I worked on our presentation. I discovered a paper (Sizemore et al.) which had a very interesting representative cycle visual that showed the most persistent features of a barcode using a mouse connectome. They used the Julia programming language to compute this, so I downloaded Julia to figure out how to create the representative cycles. At our meeting with Prof. Gao, I presented the Ricci curvature plots I coded using the open source brain data on NeuroData. We discussed using Ricci curvature as the filtration values for barcodes, which might reveal further patterns in our data. However, our brain connectome data was too large, so I decided that the C. elegans neural network would be a good starting point. Liron and I calculated the curvature values for the C. elegans network and processed the data into a proper adjacency matrix. Using this matrix, I then generated the dimension 0 and 1 barcodes based on the Ricci curvature values. We had another meeting with Prof. Gao, Prof. Luo, and their graduate students where we gave our presentation and took in their edits and suggestions. On Friday, we presented our slides! It went well, and it was so interesting to hear about all the projects that the other REU students have been working on all summer! We were sad to see some of the REU students leave to go to Prague, but we wished them the best.
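To show what "curvature as filtration values" can look like, here is a small Python sketch. It does not reproduce the curvature computed in the project, whose exact definition is not stated here. As a stand-in it uses the simplest combinatorial Forman-Ricci curvature of an unweighted edge, 4 - deg(u) - deg(v), because it needs only node degrees. The function name, the toy graph, and the choice to add edges from most negative to most positive curvature are all assumptions.

```python
from collections import defaultdict


def forman_curvature(edges):
    """Forman-Ricci curvature 4 - deg(u) - deg(v) for each unweighted edge.

    This is a simplified stand-in for whichever Ricci curvature is used
    in practice.
    """
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return {(u, v): 4 - degree[u] - degree[v] for u, v in edges}


toy_graph = [(0, 1), (1, 2), (2, 0), (2, 3)]
curvature = forman_curvature(toy_graph)

# Use curvature as the filtration value: edges enter in increasing order.
filtration_order = sorted((value, edge) for edge, value in curvature.items())
print(filtration_order)
```

Edges between high-degree nodes get the most negative values under this formula, so in this ordering they would enter the filtration first.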
On Monday, we met with Dr. Stamoulis’s team and presented our slides. We heard more about their projects, and they also shared their feedback and thoughts about our project. I continued checking my MATLAB code to make sure that it was accurately generating the barcodes. For the rest of the week, I worked on writing my final report.
I am extremely grateful to Dr. Jie Gao for her mentorship and useful discussions. This work was carried out while I was a participant in the 2022 DIMACS REU program at Rutgers University, supported by NSF HDR TRIPODS award CCF-1934924.