About Me / Program Description
My name is James Long. I am a rising senior majoring in mathematics and statistics at Columbia University. This summer, I am conducting statistical inference research through the DIMACS center at Rutgers University, New Brunswick. The program runs from June 4 to July 27, 2007. My research supervisor is statistics Professor John Kolassa. The research is funded by the NSF.

About the Project
Project Outline
I am conducting research in factor analysis. Specifically, I am assessing the consequences of violating certain model assumptions when using factor analysis. These assumptions are frequently violated in practice, and little research has been conducted on how violating them may alter type I error rates and power. Depending on time and initial results, I may develop procedures to make factor analysis more robust to deviations from its assumptions about the initial data.

More about Factor Analysis
Factor analysis seeks to explain the relationship between a large number of observed variables, usually called manifest variables, in a data set. With factor analysis, researchers can identify several factors, often called latent variables, that account for most of the correlation in the observed data. Maximum likelihood factor analysis assumes multivariate normal manifest variables and continuous latent variables. I am testing how type I error rates and power change when these assumptions are violated.
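For reference, the model described above can be written compactly. This is the standard textbook form rather than notation taken from this project; the symbols (p manifest variables x, k latent factors f, loading matrix Lambda, diagonal unique-variance matrix Psi) are assumptions introduced here for illustration.

```latex
x = \mu + \Lambda f + \varepsilon, \qquad
f \sim N_k(0, I_k), \quad
\varepsilon \sim N_p(0, \Psi), \quad
\Psi \text{ diagonal}

\Sigma = \operatorname{Cov}(x) = \Lambda \Lambda^{\top} + \Psi
```

With no common factors (k = 0) the covariance matrix reduces to the diagonal matrix Psi, so the manifest variables are uncorrelated.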

Progress / Results

Week 1:   I conducted background reading on the factor analysis model as well as other latent class models. I wrote a SAS macro to compute type I error rates for factor analysis with dichotomous manifest variables and no latent variables. The hypotheses are:
H0: No common factors
Ha: At least 1 common factor
The macro revealed that factor analysis yields correct type I error rates for manifest variables with certain non-normal distributions (in this case, Bernoulli random variables). See below for the macro.
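The idea behind such a simulation can be sketched in a few lines of Python. This is not the SAS macro itself, only a minimal illustration under stated assumptions: it assumes the test of "no common factors" is the likelihood ratio test that the correlation matrix is the identity, with Bartlett's correction, on p(p-1)/2 degrees of freedom. The function names, the sample size, the number of replications, and the success probability are all hypothetical choices, and the chi-squared critical value is taken from a standard table rather than computed.

```python
import math
import random


def correlation_matrix(rows):
    """Sample correlation matrix of a list of equal-length observation rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    cov = [[sum((r[j] - means[j]) * (r[k] - means[k]) for r in rows) / (n - 1)
            for k in range(p)] for j in range(p)]
    # Assumes no column is constant (true with overwhelming probability here).
    sd = [math.sqrt(cov[j][j]) for j in range(p)]
    return [[cov[j][k] / (sd[j] * sd[k]) for k in range(p)] for j in range(p)]


def log_det(matrix):
    """Log-determinant of a symmetric positive definite matrix via Cholesky."""
    p = len(matrix)
    chol = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = matrix[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            chol[i][j] = math.sqrt(s) if i == j else s / chol[j][j]
    return 2.0 * sum(math.log(chol[i][i]) for i in range(p))


def no_factor_statistic(rows):
    """Bartlett-corrected LR statistic for H0: no common factors (assumed form)."""
    n, p = len(rows), len(rows[0])
    return -(n - 1 - (2 * p + 5) / 6.0) * log_det(correlation_matrix(rows))


def type1_error_rate(n, p, reps, critical_value, prob=0.5, seed=1):
    """Share of null data sets (independent Bernoulli items) that reject H0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        rows = [[1.0 if rng.random() < prob else 0.0 for _ in range(p)]
                for _ in range(n)]
        if no_factor_statistic(rows) > critical_value:
            rejections += 1
    return rejections / reps


# 0.95 quantile of chi-squared with p(p-1)/2 = 6 degrees of freedom (p = 4),
# copied from a standard table.
CHI2_95_DF6 = 12.592
print(type1_error_rate(n=200, p=4, reps=500, critical_value=CHI2_95_DF6))
```

If the chi-squared approximation holds for Bernoulli data, the printed rate should land near the nominal 0.05, up to simulation noise.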

Week 2:   I developed code to test for type II error (see below for macro) with the same hypotheses as above. I also did more background reading.

Week 3:   I improved both macros. The type I error macro is available here and the type II error macro is available here. Also, after conducting more simulations with the type I error macro, I noticed that type I error rates rose above the specified level as the number of manifest variables increased. This indicates that the factor analysis model may produce erroneous results when applied to certain sets of data from the latent class model. In week 4 I will investigate this phenomenon further.

Week 4:   I am now testing for type I error when there are two latent classes or 1 latent factor. The hypotheses are:
H0: No common factors
Ha: At least 1 common factor
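A short calculation shows why data from a two-class latent class model can look like data with a common factor. The notation is assumed here for illustration and does not come from the macros: C is the latent class indicator, pi is the probability of class 1, and p_{cj} is the probability that item j equals 1 in class c. Items are taken to be independent within each class.

```latex
X_j \mid C = c \sim \mathrm{Bernoulli}(p_{cj}), \quad c \in \{0, 1\}, \qquad P(C = 1) = \pi

\operatorname{Cov}(X_j, X_k)
  = \pi (1 - \pi) \, (p_{1j} - p_{0j}) \, (p_{1k} - p_{0k}), \qquad j \neq k
```

The covariance follows from the law of total covariance: the within-class term is zero by conditional independence, and the between-class term gives the product above. It is nonzero whenever both items differ between the classes, so the manifest variables are marginally correlated even though they are independent within each class.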

Week 5:   I was concerned that the likelihood ratio statistic, even when generated by a data set that satisfies the model assumptions, may not follow a chi-squared distribution. I found critical values for the first set of hypotheses using Monte Carlo simulation. For the most part, the MC critical values agree with the chi-squared distribution. An exception occurs with large numbers of manifest variables and small sample sizes. This is most likely what caused the inflated type I error rates in Week 3. I also downloaded a latent class analysis procedure for SAS from The Methodology Center at Penn State University. I will use this software to compare the power of factor analysis and latent class analysis for data that come from the latent class model.
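A Monte Carlo critical value of this kind can be sketched as follows. This is an illustration in Python, not the SAS code used in the project. It assumes the statistic is the Bartlett-corrected likelihood ratio statistic for "no common factors", simulates data that satisfy the model assumptions under H0 (independent standard normal manifest variables), and takes the empirical 0.95 quantile. The function names, sample sizes, and replication counts are hypothetical, and the chi-squared quantiles are copied from a standard table.

```python
import math
import random


def log_det_corr(rows):
    """Log-determinant of the sample correlation matrix of the rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    # Sums of cross-products; the scaling cancels in the correlations.
    cov = [[sum((r[j] - means[j]) * (r[k] - means[k]) for r in rows)
            for k in range(p)] for j in range(p)]
    corr = [[cov[j][k] / math.sqrt(cov[j][j] * cov[k][k]) for k in range(p)]
            for j in range(p)]
    # Cholesky factorisation: log|R| = 2 * sum of log diagonal entries.
    chol = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = corr[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            chol[i][j] = math.sqrt(s) if i == j else s / chol[j][j]
    return 2.0 * sum(math.log(chol[i][i]) for i in range(p))


def null_statistic(n, p, rng):
    """One draw of the test statistic from independent normal data (H0 true)."""
    rows = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    return -(n - 1 - (2 * p + 5) / 6.0) * log_det_corr(rows)


def mc_critical_value(n, p, reps=1000, level=0.95, seed=1):
    """Empirical `level` quantile of the null distribution of the statistic."""
    rng = random.Random(seed)
    stats = sorted(null_statistic(n, p, rng) for _ in range(reps))
    index = min(reps - 1, math.ceil(level * reps) - 1)
    return stats[index]


# Table values: 0.95 quantiles of chi-squared with p(p-1)/2 degrees of freedom.
print("n=100, p=4:", mc_critical_value(100, 4, reps=500), "vs chi-squared 12.592")
print("n=15,  p=8:", mc_critical_value(15, 8, reps=500), "vs chi-squared 41.337")
```

The second call uses a small sample with more manifest variables, which is the kind of setting where the two critical values may drift apart.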

Week 6:   I conducted power comparisons between latent class analysis (proc lca) and factor analysis (proc factor). Factor analysis hypothesis tests have roughly the same power as latent class analysis tests. This result, combined with the results above showing that type I error rates are close to nominal levels, indicates that researchers may not need to be too concerned about the assumption of multivariate normality for manifest variables. I started writing up the results.
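The factor analysis side of such a power comparison can be sketched in Python. This is only an illustration and covers neither proc lca nor proc factor. It assumes data from a two-class latent class model with dichotomous items, and it assumes the factor analysis test is the Bartlett-corrected likelihood ratio test of "no common factors". The item probabilities, class probability, sample size, and function names are hypothetical, and the critical value comes from a chi-squared table.

```python
import math
import random


def no_factor_statistic(rows):
    """Bartlett-corrected LR statistic for H0: no common factors (assumed form)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    cov = [[sum((r[j] - means[j]) * (r[k] - means[k]) for r in rows)
            for k in range(p)] for j in range(p)]
    corr = [[cov[j][k] / math.sqrt(cov[j][j] * cov[k][k]) for k in range(p)]
            for j in range(p)]
    # log|R| from the Cholesky factor of the correlation matrix.
    chol = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = corr[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            chol[i][j] = math.sqrt(s) if i == j else s / chol[j][j]
    log_det = 2.0 * sum(math.log(chol[i][i]) for i in range(p))
    return -(n - 1 - (2 * p + 5) / 6.0) * log_det


def latent_class_sample(n, item_probs, class_prob, rng):
    """Draw n rows of 0/1 items from a two-class latent class model."""
    rows = []
    for _ in range(n):
        # Pick the latent class, then draw items independently within it.
        probs = item_probs[1] if rng.random() < class_prob else item_probs[0]
        rows.append([1.0 if rng.random() < q else 0.0 for q in probs])
    return rows


def estimated_power(n, item_probs, class_prob, critical_value, reps, seed=1):
    """Share of simulated latent class data sets that reject H0."""
    rng = random.Random(seed)
    rejections = sum(
        no_factor_statistic(latent_class_sample(n, item_probs, class_prob, rng))
        > critical_value
        for _ in range(reps)
    )
    return rejections / reps


# Hypothetical setting: 4 items, P(item = 1) is 0.3 in class 0 and 0.7 in class 1.
ITEM_PROBS = ([0.3] * 4, [0.7] * 4)
power = estimated_power(200, ITEM_PROBS, class_prob=0.5,
                        critical_value=12.592, reps=300)
print("power:", power, "type II error rate:", 1.0 - power)
```

Setting both classes to the same item probabilities removes the class effect, in which case the same function estimates the type I error rate instead.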

Week 7:   I continued writing up results and improved many of the macros. They no longer take their random seeds from the clock, so the results are now reproducible.
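The difference between a fixed seed and a clock seed can be illustrated with a short Python sketch. It is not taken from the SAS macros; the function name and the seed value are hypothetical.

```python
import random
import time


def simulate(seed, size=5):
    """Return `size` uniform draws from a generator started at `seed`."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(size)]


# A fixed seed gives the same draws on every run, so results can be reproduced.
first_run = simulate(seed=20070727)
second_run = simulate(seed=20070727)
print(first_run == second_run)

# A clock-based seed changes from run to run, so the draws cannot be recovered
# later unless the seed itself is recorded.
clock_seed = time.time_ns()
print(clock_seed, simulate(clock_seed))
```

Recording the seed alongside each table of simulation results makes it possible to regenerate exactly the same data sets later.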

Week 8:   I ran more simulations for power and type I error. I worked on the paper. Professor Kolassa said we should submit it to the Journal of Multivariate Analysis. I will continue working on the paper through August.

Note: I am no longer updating this webpage. Please see http://www.columbia.edu/~jpl2116/factor_analysis.html for updates.

Last updated: August 16, 2007