Project Description
Given a user query, search engines generally return a very large collection of possible answers. Clustering has been proposed as a tool for partitioning the answer set into more manageable subsets of related results, but there is no current agreement on the preferred mode of presenting these clusters. Most search engines currently display results in almost purely textual form, although there have recently been some tentative attempts to use graphical representations. This study is a first step toward elucidating when and why text appears to outperform graphics for certain fundamental clustering-related tasks. To this end, we designed three interfaces that display flat clusters of user queries. The interfaces are enhanced with mechanisms through which users provide feedback about the relevance of a cluster to a pre-specified input query. Users are then asked to provide a name that best describes the contents of a given cluster. In this project, we will analyze the results obtained from a web user study that is planned to run from February to May 2008.

Presentation 1 (ppt)

Presentation 2 (ppt)

Project Log

Week 1: Met mentor, read background information of project.

Week 2: Organized the data for each query to make statistical analysis possible. Prepared for the first presentation on Friday, June 11.

Week 3: Collected the data recorded from the 'Name That Cluster' web page. To run the tests on a per-query basis, we grouped the queries that had been evaluated with all three interfaces by different users, which gave triples for 16 queries.
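
The grouping step for Week 3 can be sketched roughly as follows. This is only an illustration: the record layout, the interface labels ("text", "graph_a", "graph_b"), the query strings and the user ids are all hypothetical, since the study's actual log format is not described here.

```python
from collections import defaultdict

# Hypothetical records: (query, interface, user). All values are made up
# for illustration and are not taken from the study's data.
records = [
    ("jaguar", "text", "u1"),
    ("jaguar", "graph_a", "u2"),
    ("jaguar", "graph_b", "u3"),
    ("apple", "text", "u1"),
    ("apple", "graph_a", "u4"),
]

# Assumed labels for the three interfaces.
INTERFACES = {"text", "graph_a", "graph_b"}


def complete_triples(rows):
    """Return {query: {interface: user}} for queries that were evaluated
    on all three interfaces, each by a different user."""
    by_query = defaultdict(dict)
    for query, interface, user in rows:
        # Simplification: keep the first evaluation seen per (query, interface).
        by_query[query].setdefault(interface, user)
    return {
        query: seen
        for query, seen in by_query.items()
        if set(seen) == INTERFACES and len(set(seen.values())) == len(INTERFACES)
    }


triples = complete_triples(records)
```

With the sample records above, only the query that appears under all three interfaces is kept; the other is dropped because one interface is missing.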

Week 4: Continued the data analysis on a per-query basis. Examined how users' behavior varied in terms of the Exploration, Evaluation and Naming times they spent per interface, and found a linear correlation between Exploration times and Evaluation times.
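
One common way to measure a linear correlation such as the one between Exploration and Evaluation times is the Pearson coefficient. The sketch below assumes that measure (the log does not say which statistic was used), and the timing values are invented for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical times in seconds, one pair per evaluation session.
exploration_times = [12.0, 30.5, 18.2, 45.0, 22.1]
evaluation_times = [8.0, 20.0, 11.5, 33.0, 14.0]

r = pearson_r(exploration_times, evaluation_times)
```

A value of r close to 1 or -1 suggests a strong linear relationship, while a value near 0 suggests little or none.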

Week 5: Dealt with the outliers on a per-cluster basis, and tested for a linear correlation between ClusterFit Ratings and Name Ratings.
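
The log does not record how outliers were handled. As one possible approach, the sketch below assumes the conventional 1.5 x IQR rule applied to the times recorded for a single cluster; the rule, the function name and the sample values are assumptions made for illustration, not the method used in the study.

```python
import statistics


def drop_iqr_outliers(values, k=1.5):
    """Keep only values within [Q1 - k*IQR, Q3 + k*IQR].

    Assumption: the 1.5 x IQR fence is used here as a stand-in; the
    project's real outlier criterion is not documented.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


# Hypothetical naming times (seconds) for one cluster.
naming_times = [10, 12, 11, 13, 12, 11, 95]
kept = drop_iqr_outliers(naming_times)
```

Here the single very long time falls outside the upper fence and is removed, while the remaining values are kept in their original order.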

Week 6: Tested whether there were any relationships between ClusterFit/Name Ratings and Evaluation/Naming times on a per-cluster basis.

Week 7: Prepared for the final presentation on Friday, July 16.

Week 8: Wrote the report.

Shuang Wu
About me:
My name is Shuang Wu.
I attend the University of Connecticut.
I major in Applied Math.
I minor in Statistics.

E-mail Address:


Name That Cluster: Text vs. Graphics