My project deals with DNA looping; specifically, I am trying to devise refined potential functions that predict the relative likelihood that a given DNA molecule adopts a particular spatial configuration. My faculty adviser is Professor Wilma K. Olson. I am using the Olson group's database of DNA step parameters, obtained from protein-DNA structures in the Protein Data Bank and compiled by graduate student Guohui Zheng.
The database catalogues the measurements of thousands of base-pair "steps" in protein-DNA complexes; a "step" refers to two adjacent Watson-Crick base pairs in a double-stranded DNA molecule. Each base pair in the step can be approximated by a plane (called a "slab"), so six parameters are required to specify the position and orientation of one slab relative to the other: three translational parameters ("shift", "slide", and "rise") and three rotational parameters ("tilt", "roll", and "twist"), which describe translations along and rotations about the x, y, and z axes, respectively. A local coordinate frame is attached to each step, with its origin centered in the "middle plane" of the step, between the two slabs.
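As a minimal sketch, one step's six parameters can be held in a small record and flattened into the six-dimensional point on which the potential functions operate. The class name `StepParameters`, the field order, and the units (angstroms for translations, degrees for rotations) are assumptions for illustration, not the actual layout of the Olson group's database.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class StepParameters:
    """Six rigid-body parameters relating the two base pairs (slabs) of one step.

    Hypothetical record: units assumed to be angstroms for the translations
    and degrees for the rotations.
    """
    shift: float  # translation along x
    slide: float  # translation along y
    rise: float   # translation along z
    tilt: float   # rotation about x
    roll: float   # rotation about y
    twist: float  # rotation about z

    def as_vector(self) -> list[float]:
        """Return the step as a 6-D point: (shift, slide, rise, tilt, roll, twist)."""
        return list(astuple(self))


# Illustrative values only, roughly those of an idealized B-DNA step.
step = StepParameters(shift=0.0, slide=0.0, rise=3.4, tilt=0.0, roll=0.0, twist=36.0)
print(step.as_vector())
```

Keeping the field order fixed matters because the mean and covariance used by the potentials are only meaningful if every data point lists the parameters in the same order.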
Ten empirically derived potential functions, one for each of the ten unique dimer step types, have already been created. The "potential" of a step with a given set of six step parameters is defined as the Mahalanobis distance from that six-dimensional point to the mean of the data points for that step type; unlike a plain Euclidean distance, it accounts for the variances of, and correlations among, the six parameters. This measure is well suited to data that are unimodal and approximately normally distributed, but some step types exhibit bimodality, in which case the single mean can fall in a sparsely populated region between the two modes. I am therefore applying clustering techniques, such as hierarchical and K-means clustering, to classify the data and devise potential functions that more accurately reflect the data distribution.
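The following standard-library sketch illustrates both ideas: the single-Gaussian potential (Mahalanobis distance to the mean of all data for one step type) and one hypothetical refinement in which the data are first split by K-means and the potential becomes the distance to the nearest cluster, each cluster using its own mean and covariance. The refinement itself, the function names, the farthest-first initialization, and the synthetic two-dimensional data are assumptions made for illustration; the same functions accept six-dimensional step-parameter vectors unchanged.

```python
import math
import random

Point = list[float]


def mean(points: list[Point]) -> Point:
    """Component-wise mean of a list of d-dimensional points."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def covariance(points: list[Point], mu: Point) -> list[list[float]]:
    """Sample covariance matrix (divides by n - 1)."""
    d = len(mu)
    cov = [[0.0] * d for _ in range(d)]
    for p in points:
        diff = [p[i] - mu[i] for i in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j]
    return [[c / (len(points) - 1) for c in row] for row in cov]


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting.

    Assumes a is non-singular, i.e. the covariance was estimated from
    enough non-degenerate points (more points than dimensions).
    """
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def mahalanobis(x: Point, mu: Point, cov: list[list[float]]) -> float:
    """sqrt((x - mu)^T cov^-1 (x - mu)), computed without forming the inverse."""
    diff = [xi - mi for xi, mi in zip(x, mu)]
    return math.sqrt(sum(d * s for d, s in zip(diff, solve(cov, diff))))


def dist2(p: Point, q: Point) -> float:
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points: list[Point], k: int, iters: int = 100) -> list[int]:
    """Lloyd's K-means with deterministic farthest-first initialization."""
    centers = [points[0]]
    while len(centers) < k:  # next center: the point farthest from all chosen centers
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    labels: list[int] = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        new = [mean([p for p, lab in zip(points, labels) if lab == c] or [centers[c]])
               for c in range(k)]
        if new == centers:  # converged: assignments no longer move the centers
            break
        centers = new
    return labels


def single_gaussian_potential(data: list[Point]):
    """Existing scheme: distance to the mean of all data for one step type."""
    mu = mean(data)
    cov = covariance(data, mu)
    return lambda x: mahalanobis(x, mu, cov)


def clustered_potential(data: list[Point], k: int):
    """Hypothetical refinement: distance to the nearest cluster, each with its own covariance."""
    labels = kmeans(data, k)
    models = []
    for c in range(k):
        members = [p for p, lab in zip(data, labels) if lab == c]
        mu = mean(members)
        models.append((mu, covariance(members, mu)))
    return lambda x: min(mahalanobis(x, m, cv) for m, cv in models)


# Synthetic bimodal data standing in for one step type; not real step parameters.
rng = random.Random(42)
data = ([[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
        + [[rng.gauss(10, 1), rng.gauss(10, 1)] for _ in range(50)])
single = single_gaussian_potential(data)
clustered = clustered_potential(data, k=2)
gap = [5.0, 5.0]  # sparsely populated region between the two modes
print(f"single: {single(gap):.2f}, clustered: {clustered(gap):.2f}")
```

On data like these, the single-Gaussian potential rates the empty gap between the modes as nearly ideal (a distance close to zero), whereas the clustered potential assigns it a large distance. Taking the minimum over clusters is only one choice; a Gaussian mixture likelihood would be a more principled alternative, and hierarchical clustering could replace K-means for the assignment step.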