## REU 2007

My project deals with DNA looping; specifically, I am trying to
devise refined potential functions that can predict the relative
likelihood that a given DNA molecule will adopt a particular
configuration in space. My faculty adviser is Professor
Wilma K. Olson. I am using the Olson group's database of DNA step
parameters obtained from the National Protein Data Base, compiled
by graduate student Guohui Zheng.

The database catalogues the measurements of thousands of base-pair
"steps" in protein-DNA complexes; a "step" refers to two adjacent
Watson-Crick base pairs in a double-stranded DNA molecule. Each
pair in the step can be approximated by a plane (called a "slab"),
so six parameters are required to determine a step's position in
space: three translational parameters ("shift", "slide", and
"rise") and three rotational parameters ("tilt", "roll", and
"twist") that describe translations along and rotations about the
x, y, and z axes, respectively. A local coordinate frame is imposed
on each step, with its origin in the "middle plane" of the step,
between the two slabs.
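
As a rough illustration of how one step can be held as a single
six-dimensional point, here is a minimal Python sketch. The class name
`StepParameters`, the `as_vector` helper, the units in the comments, and
the numeric values are all assumptions made for this example; they are
not taken from the Olson group's database or software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepParameters:
    """One base-pair step described by six rigid-body parameters.

    Hypothetical container for illustration only. Units are assumed
    here to be angstroms for translations and degrees for rotations.
    """
    shift: float  # translation along x
    slide: float  # translation along y
    rise: float   # translation along z
    tilt: float   # rotation about x
    roll: float   # rotation about y
    twist: float  # rotation about z

    def as_vector(self):
        """Return the step as a six-dimensional point, in a fixed order."""
        return [self.shift, self.slide, self.rise,
                self.tilt, self.roll, self.twist]


# Made-up values, used only to show the layout of the vector.
example_step = StepParameters(shift=0.1, slide=-0.3, rise=3.3,
                              tilt=0.5, roll=2.0, twist=34.0)
example_point = example_step.as_vector()
```

Keeping the order fixed (translations first, then rotations) means every
step of a given type can be stacked into one table of six-dimensional
points.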

Ten empirically derived potential functions have already been
created. The "potential" of a step with a given set of six step
parameters is defined as the Mahalanobis distance from that
six-dimensional point to the mean of the data points corresponding
to that step type. This method is ideal for data sets that are
unimodal and normally distributed, but some of the step types
exhibit bimodality; I am working with various clustering
techniques, such as hierarchical and K-means algorithms, to
classify the data and devise potential functions that more
accurately reflect the data distribution.
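
The Mahalanobis distance described above can be sketched in a few lines
of plain Python. This is only a sketch under stated assumptions, not the
group's actual implementation: the function names are hypothetical, the
covariance is the ordinary sample covariance, and a small Gaussian
elimination routine stands in for the matrix inverse so that the example
needs nothing beyond the standard library. It computes
d = sqrt((x - m)^T S^-1 (x - m)), where m is the mean and S the
covariance matrix of the data points for one step type.

```python
from statistics import fmean


def mean_vector(points):
    """Component-wise mean of a list of equal-length points."""
    return [fmean(column) for column in zip(*points)]


def covariance_matrix(points, mean):
    """Sample covariance matrix (normalised by n - 1)."""
    n, dim = len(points), len(mean)
    cov = [[0.0] * dim for _ in range(dim)]
    for point in points:
        dev = [p - m for p, m in zip(point, mean)]
        for i in range(dim):
            for j in range(dim):
                cov[i][j] += dev[i] * dev[j] / (n - 1)
    return cov


def solve(a, b):
    """Solve a @ y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    y = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * y[j] for j in range(i + 1, n))
        y[i] = (m[i][n] - tail) / m[i][i]
    return y


def mahalanobis(x, mean, cov):
    """Mahalanobis distance from point x to the given mean."""
    dev = [xi - mi for xi, mi in zip(x, mean)]
    weighted = solve(cov, dev)  # equals cov^-1 @ dev
    return sum(d * w for d, w in zip(dev, weighted)) ** 0.5
```

The code works in any dimension, so the same functions apply to
six-dimensional step vectors. For a bimodal step type, one option
(again an assumption, not a description of the finished method) would
be to compute a separate mean and covariance for each cluster and
measure the distance to each cluster in turn.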