Basic Information
Project Title
Automated Entity Resolution
Research Student
Paul Bonamy
Home Institution
Lake Superior State University
Project Mentor
Dr. Paul Kantor
Project Description

Entity resolution is a common problem in everyday life. We are presented with a set of characteristics (shaggy brown hair, glasses, ugly tie, rumpled shirt) and wish to determine what we are looking at (hey, it's Bob from Accounting). The average person does this sort of thing literally hundreds of times a day, doing everything from identifying people to figuring out what’s being served for lunch, all without giving the problem much conscious thought.

This project focuses on a computerized form of entity resolution known as Author Identification. Given documents written by a known set of authors, we wish to match each document to the author who wrote it. Ideally, we want to identify an author’s work across a variety of topics, rather than only within a single field.
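To make the task concrete, here is a minimal Python sketch of one simple way to frame author identification as classification. Each document is reduced to the relative frequencies of a few common function words, each author is summarized by the average profile of their known documents (a centroid), and an unknown document is assigned to the author whose centroid is most similar by cosine similarity. The word list, the function names, and the nearest-centroid method itself are illustrative assumptions, not the method used in this project.

```python
import math
import re
from collections import Counter

# Hypothetical feature set: a handful of function words. A real study would
# choose (and likely enlarge) its own list.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is", "it", "upon"]

def profile(text):
    """Relative frequency of each function word among all words in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1
    return [counts[w] / total for w in FUNCTION_WORDS]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def train(labeled_docs):
    """Average the profiles of each author's known documents (one centroid per author)."""
    sums, n_docs = {}, Counter()
    for author, text in labeled_docs:
        acc = sums.setdefault(author, [0.0] * len(FUNCTION_WORDS))
        for i, x in enumerate(profile(text)):
            acc[i] += x
        n_docs[author] += 1
    return {a: [x / n_docs[a] for x in s] for a, s in sums.items()}

def identify(centroids, text):
    """Return the author whose centroid is most similar to the unknown document."""
    p = profile(text)
    return max(centroids, key=lambda a: cosine(centroids[a], p))

# Toy usage with made-up training text.
centroids = train([
    ("alice", "the cat sat upon the mat and the dog sat upon the rug"),
    ("bob", "it is a truth that a man is in need of a wife"),
])
print(identify(centroids, "upon the hill the cat sat upon the wall"))  # alice
```

Function words are a common choice in stylometry because they tend to reflect an author's habits more than the topic of the text, which fits the goal of identifying an author across fields.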

Author Identification has applications ranging from checking student assignments for plagiarism to tracking conspirators across message boards scattered around the Internet. Automated entity resolution has also been shown to be useful for related tasks, such as identifying the gender of a writer or distinguishing fiction from non-fiction.


Progress

The latest progress is posted on a separate page.

Resources

All of my scripts, as well as the Compass corpus, are available online.

Useful Links
Bayesian Logistic Regression
Software for performing 1-of-2 classification (the material belongs to one of two authors)
Bayesian Multinomial Regression
Software for performing 1-of-k classification (the material belongs to one of k authors)
2007 REU Calendar
The schedule of events for this summer's REU participants
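The two Bayesian regression packages listed above fit 1-of-2 and 1-of-k classifiers. As a rough illustration of the kind of model involved, and not of either package's interface or internals, the Python sketch below fits a multinomial (softmax) logistic regression by batch gradient ascent on the log-posterior under a zero-mean Gaussian prior on the weights, which acts as an L2 penalty. The function names, prior strength, learning rate, epoch count, and toy feature vectors are all hypothetical. Setting k = 2 gives a 1-of-2 classifier.

```python
import math

def softmax(values):
    """Convert raw class scores into probabilities (shifted by the max for stability)."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def class_scores(W, x):
    """Linear score of feature vector x for each class (one weight row per class)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def fit(X, y, k, prior_precision=0.1, lr=0.5, epochs=500):
    """Approximate MAP weights for k-class softmax regression with a Gaussian prior.

    Each row of X should start with a constant 1.0 so the first weight acts as
    an intercept. Labels in y are class indices in range(k).
    """
    n, d = len(X), len(X[0])
    W = [[0.0] * d for _ in range(k)]
    for _ in range(epochs):
        # Gradient of the Gaussian log-prior: -precision * w (an L2 penalty).
        grad = [[-prior_precision * w for w in row] for row in W]
        # Gradient of the log-likelihood: (indicator - predicted probability) * x.
        for x, label in zip(X, y):
            p = softmax(class_scores(W, x))
            for c in range(k):
                err = (1.0 if c == label else 0.0) - p[c]
                for j in range(d):
                    grad[c][j] += err * x[j]
        # Step uphill on the log-posterior, averaged over the n examples.
        for c in range(k):
            for j in range(d):
                W[c][j] += lr * grad[c][j] / n
    return W

def predict(W, x):
    """Index of the most probable class (author) for feature vector x."""
    s = class_scores(W, x)
    return max(range(len(s)), key=lambda c: s[c])

# Toy data: an intercept followed by three hypothetical stylistic features.
X = [[1, 1.0, 0.0, 0.0], [1, 0.8, 0.2, 0.0],
     [1, 0.0, 1.0, 0.0], [1, 0.1, 0.9, 0.0],
     [1, 0.0, 0.0, 1.0], [1, 0.0, 0.2, 0.8]]
y = [0, 0, 1, 1, 2, 2]
W = fit(X, y, k=3)
print(predict(W, [1, 0.9, 0.1, 0.0]))  # 0
```

A Gaussian prior is only one common choice; Bayesian regression of this kind can also use other priors (for example a Laplace prior, which favors sparse weights), so this sketch should not be read as describing how the listed software is configured.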