DIMACS
DIMACS REU 2014

General Information

Student: Eric Brugel
Office: Hill 270
School: Rutgers University
E-mail: brugel18@gmail.com
Project: Bayesian inference on SNP arrays.

Project Description

Create an algorithm that uses Bayesian inference to analyze SNP arrays.


Weekly Log

Week 1:
This week I chose a topic and began reading background material. I also prepared the first presentation.
Week 2:
I continued reading background material to get an overview of the current algorithms that analyze SNP arrays. I searched for ways to get the raw data from .cel and .cdf files with R. I also looked into the GraphLab API as a possible platform on which to implement our Gibbs sampling algorithm.
Week 3:
This week I learned the basics of R and the Bioconductor packages affy and affxparser in order to read the raw data files. Once the raw data was extracted from the .cel and .cdf files, we joined it with an annotation file and created a CSV. I then wrote a script to make various plots. After plotting the data, there were discrepancies between our intensity plots and the plots in the paper "ALCHEMY: a reliable method for automated SNP genotype calling for small batch sizes and highly homozygous populations". After further investigation we found that Bioconductor was not outputting some values, perhaps as a preprocessing step.
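The join step described above could look roughly like the following minimal Python sketch. It is not the R script used in the project. The column names (probeset_id, allele, intensity, chromosome, position), the probe IDs, and the sample values are all hypothetical and chosen only for illustration. The sketch matches each intensity row to its annotation row by probe set ID, drops probes that have no annotation, and writes the result as CSV.

```python
import csv
import io

# Hypothetical extracted intensities; the column names and values are assumptions.
INTENSITIES = """probeset_id,allele,intensity
SNP_A-0001,A,1520.5
SNP_A-0001,B,310.0
SNP_A-0002,A,275.25
SNP_A-0002,B,1890.0
SNP_A-0003,A,990.0
"""

# Hypothetical annotation table; SNP_A-0003 is deliberately left out.
ANNOTATION = """probeset_id,chromosome,position
SNP_A-0001,1,10542
SNP_A-0002,1,20877
"""


def join_with_annotation(intensity_file, annotation_file):
    """Inner-join intensity rows to annotation rows on probeset_id."""
    # Index the annotation by probe set ID for constant-time lookups.
    annotation = {row["probeset_id"]: row for row in csv.DictReader(annotation_file)}
    joined = []
    for row in csv.DictReader(intensity_file):
        info = annotation.get(row["probeset_id"])
        if info is None:
            continue  # no annotation for this probe set, so drop the row
        joined.append(
            {**row, "chromosome": info["chromosome"], "position": info["position"]}
        )
    return joined


def write_csv(rows, out_file):
    """Write the joined rows as CSV with a fixed column order."""
    fields = ["probeset_id", "allele", "intensity", "chromosome", "position"]
    writer = csv.DictWriter(out_file, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


rows = join_with_annotation(io.StringIO(INTENSITIES), io.StringIO(ANNOTATION))
buffer = io.StringIO()
write_csv(rows, buffer)
print(buffer.getvalue())
```

With real data, the in-memory strings would be replaced by open file handles. Whether unannotated probes should be dropped or kept with empty fields is a design choice; the sketch drops them.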
Week 4:
This week we used Affymetrix Power Tools instead of the Bioconductor packages to extract the data, both because of last week's difficulties in extracting data without preprocessing and because the extraction script wasn't efficient. I also had to rewrite the plotting script, since the data was in a different format. The plots matched what we expected, and we began discussing the model we are going to use to analyze the data and how to implement it. I also continued to do background reading.

Presentations


Additional Information