Basic Information:

Name: Melissa Mitchell

My School: University of Detroit-Mercy

Project Name: Author Identification

Faculty Advisor: Paul Kantor

Partner: Jordanna Chord


Project Description:

My project is on author identification, mainly the KDD (knowledge, discovery, dissemination) challenge. Authors can often be identified by studying their style of writing. However, that approach will not be helpful in this case, for three main reasons. First, most of the papers in the database have more than one author. Second, the database does not contain the complete paper, only the abstract. Finally, each author most likely writes about only one area of the biological sciences, so technical words pertaining to that area will provide more useful attribution information than style would.

For this project we are given a database (BioBase) with the following information for each paper: title, keywords, names of the author(s), addresses, and an abstract. For the first part of the project, we need to determine whether one person is the same as another. For instance, Li is a common Chinese name that appears in the database. We need to differentiate between Y.Li-3 and everyone who is not Y.Li-3. We will look at information that may separate one author from another, such as subject, collaborations, or interests. We will apply the BBR (Bayesian Binary Regression) software to this problem.
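To make the "Y.Li-3 versus not Y.Li-3" idea concrete, here is a minimal sketch of a binary classifier in plain Python. It is not the BBR software and does not use its interface; it only illustrates the general technique of logistic regression with a Gaussian prior on the weights, fit by gradient descent. The records, the `coauthor_` token convention, and all function names are hypothetical and made up for illustration; the real BioBase fields will differ.

```python
import math

# Hypothetical toy records (NOT real BioBase data). Each text combines
# title words, keywords, and co-author tokens; label 1 = the target author.
RECORDS = [
    ("protein folding kinetics chaperone coauthor_smith", 1),
    ("chaperone protein misfolding stress coauthor_smith", 1),
    ("protein aggregation chaperone coauthor_jones", 1),
    ("rice genome sequencing yield coauthor_wang", 0),
    ("wheat genome drought yield coauthor_wang", 0),
    ("soil bacteria nitrogen rice coauthor_chen", 0),
]


def featurize(text, vocab):
    """Binary bag-of-words vector: 1.0 if the vocabulary word occurs in text."""
    words = set(text.split())
    return [1.0 if w in words else 0.0 for w in vocab]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(records, prior_variance=1.0, step=0.1, iterations=2000):
    """Find MAP weights for logistic regression with a zero-mean Gaussian prior."""
    vocab = sorted({w for text, _ in records for w in text.split()})
    # Append a constant 1.0 so the last weight acts as a bias term.
    X = [featurize(text, vocab) + [1.0] for text, _ in records]
    y = [label for _, label in records]
    weights = [0.0] * (len(vocab) + 1)
    for _ in range(iterations):
        # Gradient of the negative log prior (the bias is also penalized here).
        grad = [w / prior_variance for w in weights]
        # Add the gradient of the negative log likelihood.
        for xi, yi in zip(X, y):
            error = sigmoid(sum(w * x for w, x in zip(weights, xi))) - yi
            for j, x in enumerate(xi):
                grad[j] += error * x
        weights = [w - step * g for w, g in zip(weights, grad)]
    return vocab, weights


def predict(text, vocab, weights):
    """Estimated probability that the record belongs to the target author."""
    x = featurize(text, vocab) + [1.0]
    return sigmoid(sum(w * xj for w, xj in zip(weights, x)))


vocab, weights = train(RECORDS)
print(predict("chaperone protein stress coauthor_smith", vocab, weights))
print(predict("rice genome yield coauthor_wang", vocab, weights))
```

Words that were never seen in training are simply ignored by `featurize`, and the prior pulls every weight toward zero, so a record with little evidence either way gets a probability near 0.5.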

Progress:

Check out our progress here

References:

Richard O. Duda, Peter E. Hart, and David G. Stork. Pattern Classification, second edition. John Wiley and Sons, Inc. 2001.

Useful Links:

2005 REU Participant Calendar: See what we are doing every week!

BBR information: Learn more about Bayesian Logistic Regression software