Procedures of Exploratory Factor Analysis

Let's work our way up the right-hand side of the "SEM Pyramid of Success," examining how the Pearson correlation gives rise to exploratory factor analysis (EFA).

Starting from a fairly large set of variables (usually single items), EFA arranges the variables into subsets, such that the variables within each subset are strongly correlated with each other. These subsets are organized along axes (the plural of "axis," not of "axe," as in a hatchet).

You could have a one-factor (one-dimensional) solution, in which case all the variables can be located along a single line (e.g., running from left to right, with low scores to the left and high scores to the right). Or there could be a two-factor (two-dimensional) solution, where the axes run across and up-and-down. Three-factor (three-dimensional) solutions are harder to describe verbally, so let's look at a picture. These examples hold only as long as the axes are orthogonal (at 90-degree angles) to each other (which denotes completely uncorrelated factors), an issue to which we'll return. Solutions can also exceed three factors, but we cannot visualize four spatial dimensions (at least I can't).

In conducting factor analyses with a program such as SPSS, there are three main steps, at each of which a decision has to be made:

(1) One must first decide what extraction method to use (i.e., how to "pull out" the dimensions). The two best-known approaches are Principal Axis Factoring (PAF; also known as common factor analysis) and Principal Components Analysis (PCA). There's only one difference, computationally, between PAF and PCA, as described in this document, yet some authors portray the two techniques as being very different (further, PCA is technically not a form of factor analysis, but many researchers treat it as such).

(ADDED 9/11/18). This EFA tutorial from Columbia University's Mailman School of Public Health provides an intuitive illustration of the distinction between PAF and PCA. As shown in Figure 4 of the document, there are three potential sources of variation on a variable (or, more loosely, three reasons why someone obtains his or her total score on a variable). Let's use an example from the music-liking items in our SPSS practice dataset. Each of the 11 items lists a music style (e.g., big band, bluegrass, classical, jazz) and asks the respondent how much he/she likes it. Let's look specifically at liking for classical music. Someone's liking score for classical music will emerge from some combination of: (a) his or her liking of music in general (corresponding to "common variance" in Figure 4 of the Columbia document); (b) reasons the person likes classical music that don't pertain to other musical styles such as jazz, blues, etc. (e.g., he or she studied great composers in European history, corresponding to unique or "specific variance" in Figure 4); and (c) any kind of random measurement error, such as the person misunderstanding the survey item or accidentally selecting an unintended answer choice (corresponding to "error variance" in Figure 4). As shown by the faint red and purple ovals in Figure 4, PCA seeks to explain variance from all three boxes, whereas PAF seeks to explain only the common variance (the first box). Hence, PAF begins with R-squares from a series of multiple-regression analyses predicting liking for each style of music, one at a time, from the remaining styles of music. Doing so reveals the amount of variance common to the music styles. PCA, on the other hand, takes it as a given that it is "trying" to explain 100% of the variance in all of the variables.
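To make the computational distinction concrete, here is a minimal numpy sketch (the data are randomly generated stand-ins for the 11 music-liking items, so the numbers themselves are not meaningful). Both approaches eigen-decompose the items' correlation matrix; the one difference described above is what sits on that matrix's diagonal:

```python
import numpy as np

# Stand-in data: 200 hypothetical respondents x 11 music-liking items
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 11))
R = np.corrcoef(data, rowvar=False)   # correlation matrix of the items

# PCA: decompose R as-is; the 1's on the diagonal mean PCA "tries" to
# explain 100% of each variable's variance (common + specific + error)
pca_eigenvalues = np.linalg.eigvalsh(R)[::-1]

# PAF: first replace each diagonal 1 with the variable's squared multiple
# correlation (the R-squared from regressing it on the other 10 items),
# so that only the common variance is analyzed
smc = 1 - 1 / np.diag(np.linalg.inv(R))
R_reduced = R.copy()
np.fill_diagonal(R_reduced, smc)
paf_eigenvalues = np.linalg.eigvalsh(R_reduced)[::-1]

print("PCA eigenvalues:", np.round(pca_eigenvalues[:4], 2))
print("PAF eigenvalues:", np.round(paf_eigenvalues[:4], 2))
```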

(2) Second, one must decide how many factors to retain. There is no absolute, definitive answer to this question. There are various tests, including the Kaiser Criterion (how many factors or components have eigenvalues greater than or equal to 1.00) and the Scree Test (an "elbow curve," in which one looks for a drop-off in the sizes of the eigenvalues).
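Continuing the hypothetical numpy example, both heuristics can be read off the eigenvalues of the correlation matrix (again, the data here are invented):

```python
import numpy as np

rng = np.random.default_rng(1)
R = np.corrcoef(rng.normal(size=(200, 11)), rowvar=False)
eigenvalues = np.linalg.eigvalsh(R)[::-1]   # sorted in descending order

# Kaiser Criterion: retain factors/components with eigenvalues >= 1.00
print("Kaiser suggests retaining:", int((eigenvalues >= 1.0).sum()))

# Scree Test: list (or plot) the eigenvalues and look for the "elbow,"
# i.e., the point where the drop-off levels out
for i, ev in enumerate(eigenvalues, start=1):
    print(f"Factor {i}: {ev:.2f}")
```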

The book Does Measurement Measure Up?, by John Henshaw, addresses the indeterminacy of factor analysis in the context of intelligence testing as follows: "Statistical analyses of intelligence test data... have been performed for a long time. Given the same set of data, one can make a convincing, statistically sound argument for a single, overriding intelligence (sometimes called the g factor) or an equally sound argument for multiple intelligences. In Frames of Mind, Howard Gardner argues that 'when it comes to the interpretation of intelligence testing, we are faced with an issue of taste or preference rather than one on which scientific closure is likely to be reached' " (p. 95).

(3) The axes from the initial solution will not necessarily pass close to the clusters of data points (loosely speaking, the aim is like finding the best-fitting line in a correlational plot). The axes can be rotated to put them into better alignment with the data points. The third decision, therefore, involves the choice of rotation method. Two classes of rotation methods are orthogonal (as described above) and oblique (in which the axes are free to intersect at other than 90-degree angles, which allows the factors to be correlated with each other). Mathworks has a web document on factor rotation, including a nice color-coded depiction of orthogonal and oblique rotation. (As of January 2015, the graphics do not show up in the Mathworks document; however, I had previously saved a copy of the factor-rotation diagram, which I reproduce below.)


[Figure: color-coded depiction of orthogonal and oblique factor rotation. From Mathworks, Factor Analysis]
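For readers who want to see rotation in code, below is a compact numpy sketch of varimax (orthogonal) rotation, based on the standard SVD-based algorithm; oblique methods such as promax then start from a varimax solution and let the axes correlate. This is an illustrative implementation with invented loadings, not SPSS's exact routine.

```python
import numpy as np

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Orthogonally rotate a loading matrix to (approximately) maximize
    the variance of the squared loadings within each column (varimax)."""
    p, k = loadings.shape
    T = np.eye(k)   # rotation matrix, refined iteratively
    d = 0.0
    for _ in range(max_iter):
        L = loadings @ T
        u, s, vt = np.linalg.svd(
            loadings.T @ (L**3 - (gamma / p) * L @ np.diag((L**2).sum(axis=0)))
        )
        T = u @ vt
        d_new = s.sum()
        if d_new < d * (1 + tol):   # stop when the criterion stops improving
            break
        d = d_new
    return loadings @ T

# Hypothetical unrotated loadings: 6 items on 2 factors
unrotated = np.array([[.7, .3], [.6, .4], [.8, .2],
                      [.3, .7], [.2, .6], [.4, .8]])
print(np.round(varimax(unrotated), 2))
```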

***

The particular combination of Principal Components Analysis for extraction, the Kaiser Criterion to determine the number of factors, and orthogonal rotation (specifically one called Varimax) is known as the "Little Jiffy" routine, presumably because it works quickly. I've always been a Little Jiffy guy myself (and have written a song about it, below), but in recent years, Little Jiffy has been criticized, both collectively and in terms of its individual steps.

An article by K.J. Preacher and R.C. MacCallum (2003) entitled "Repairing Tom Swift’s Electric Factor Analysis Machine" (explanation of "Tom Swift" reference) gives the following pieces of advice (shown in italics, with my comments inserted in between):

Three recommendations are made regarding the use of exploratory techniques like EFA and PCA. First, it is strongly recommended that PCA be avoided unless the researcher is specifically interested in data reduction... If the researcher wishes to identify factors that account for correlations among [measured variables], it is generally more appropriate to use EFA than PCA...

Another article we'll discuss (Russell, 2002, Personality and Social Psychology Bulletin) concurs that PAF is preferable to PCA, although it acknowledges that the solutions produced by the two extraction techniques are sometimes very similar. Also, data reduction (i.e., wanting to present results in terms of, say, three factor-based subscales instead of 30 original items) seems to be a respectable goal, for which PCA appears appropriate.

Second, it is recommended that a combination of criteria be used to determine the appropriate number of factors to retain... Use of the Kaiser criterion as the sole decision rule should be avoided altogether, although this criterion may be used as one piece of information in conjunction with other means of determining the number of factors to retain.

I concur with this advice, and Russell's recommendation seems consistent with it.

Third, it is recommended that the mechanical use of orthogonal varimax rotation be avoided... The use of orthogonal rotation methods, in general, is rarely defensible because factors are rarely if ever uncorrelated in empirical studies. Rather, researchers should use oblique rotation methods.

As we'll see, Russell has some interesting suggestions in this area.

One final area, discussed in the Russell article, concerns how to create subscales or indices based on your factor analysis. Knowing that certain items align well with a particular factor (i.e., they have high factor loadings), we can either multiply each item by its factor loading before summing the items (hypothetically, e.g., [.35 X Item 1] + [.42 X Item 2] + [.50 X Item 3]...) or just add the items up with equal (or "unit") weighting (Item 1 + Item 2 + Item 3). Russell recommends the latter. It should be noted that, if one obtains a varimax-rotated solution, the newly created subscales will have zero correlations with each other (independence or orthogonality) only if the items are weighted by exact factor scores in creating the subscales.
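A toy numpy sketch of the two scoring approaches, using the hypothetical loadings from the example above (all numbers invented):

```python
import numpy as np

# Rows = respondents, columns = Items 1-3 (hypothetical 1-5 ratings)
items = np.array([[4, 5, 3],
                  [2, 1, 2],
                  [5, 4, 4]])
loadings = np.array([0.35, 0.42, 0.50])   # hypothetical factor loadings

# Loading-weighted scores: (.35 x Item 1) + (.42 x Item 2) + (.50 x Item 3)
weighted = items @ loadings

# Unit-weighted scores: Item 1 + Item 2 + Item 3 (Russell's recommendation)
unit_weighted = items.sum(axis=1)

print("Weighted:     ", weighted)
print("Unit-weighted:", unit_weighted)
```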

I have created a diagram to explicate factor scoring in greater detail. Here's another perspective on the issue of factor scores (see the heading "Factor Scores/Scale Scores" when the new page opens).

Here's the Little Jiffy song.

Little Jiffy
Lyrics by Alan Reifman
(May be sung to the tune of “Desperado,” Frey/Henley)

“Little Jiffy,” you know your status is iffy,
Some top statisticians, think that you’re no good,
You are so simple, the users just take the defaults,
And thus halts the process, of finding structure,

(Bridge)
You are a three-stage, procedure,
For making, your data concise,
And upon some simple guidelines, you do rest,

But for each step, in the routine,
Little Jiffy’s not so precise,
And the experts say, other choices are best…

For extraction, you use Principal Components,
While all your opponents, advocate P-A-F,
You use Kaiser’s test, to tell the number of factors,
While all your detractors, support the Scree test,

On your behalf, some researchers claim,
Components and factors, yield almost the same,
But computers give, several more options today,
With “Little Jiffy,” you don’t have to stay,
You can experiment, with different ways…

Varimax is used, to implement your rotation,
There’s no correlation, among your axes,
If one goes oblique, like critics urge that you ought to,
The items you brought, ooh (yes, the items that you brought, ooh),
The items you brought, ooh... (Pause)
Will fall close to the lines…

***

Finally, here are some references for students wishing to pursue EFA in greater detail.

Conway, J.M., & Huffcutt, A.I. (2003). A review and evaluation of exploratory factor analysis practices in organizational research. Organizational Research Methods, 6, 147-168.

Henson, R.K., & Roberts, J.K. (2006). Use of exploratory factor analysis in published research: Common errors and some comment on improved practice. Educational and Psychological Measurement, 66, 393-416.

Path Analysis: Tracing Rules and Re-Deriving Correlation Coefficients

Below are two photos from the recent lectures on path analysis (thanks again to Sothy). The first photo is more conceptual, on how to identify the relevant sequences for multiplying path coefficients.


One of our 2015 students (BL) came up with a great analogy for understanding correlated cause (lower-right corner of photo). W and Y can be thought of as the two parents in a family, and X and Z as the two children. If parent W influences child X, and parent Y influences child Z, then X and Z will receive some of the same influence, because the two parents' childrearing practices are likely correlated (path q).
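A tiny numeric sketch of this correlated-cause tracing (all path values invented for illustration): the implied correlation between the two "children" is the product of the coefficients along the trace X <- W <-> Y -> Z.

```python
# Hypothetical standardized paths: W -> X, Y -> Z, and the W <-> Y correlation
a = 0.6   # W -> X
b = 0.5   # Y -> Z
q = 0.4   # correlation between parents W and Y

# Tracing rule: multiply the coefficients along the connecting route
implied_r_xz = a * q * b
print(f"Implied r(X, Z) = {implied_r_xz:.2f}")   # 0.12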

The second photo shows an actual example.


Because the model was saturated (every possible linkage that could have been included was included), the correlation between Age and Income implied by the tracings in the model is identical (within rounding) to the known, input correlation between Age and Income.
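To see numerically why a saturated model reproduces an input correlation exactly, here is a hypothetical two-predictor sketch (the second predictor and all values are invented for illustration, not read off the photo):

```python
# Hypothetical saturated model: Age -> Income (p1), Height -> Income (p2),
# with Age and Height allowed to correlate (r12). All values invented.
p1, p2, r12 = 0.35, 0.20, 0.10

# Tracing rules: direct path from Age, plus the route through Height
implied_r_age_income = p1 + r12 * p2
print(f"Implied r(Age, Income) = {implied_r_age_income:.3f}")
# In a saturated model, this equals the input correlation (within rounding)
```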

MARCH 2013 UPDATE: This manuscript provides an overview of tracing rules (see Section 2.4).

MAY 2009 UPDATE: A newly released Australian study indeed shows a positive relationship between height and earnings.

Least Squares Principle

(Updated March 1, 2016)

Here is a photo from a previous class session, showing what I wrote on the board regarding the least-squares principle. In the illustration below, we're using SAT scores (x-axis) to predict students' first-year college grades (y-axis). Wikipedia's entries on least squares and residuals may also be helpful.
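To make the principle concrete in code, here's a minimal numpy sketch with invented SAT/grade values: the fitted line is the one that minimizes the sum of squared vertical residuals.

```python
import numpy as np

# Invented data: SAT scores (x) and first-year college GPAs (y)
sat = np.array([900, 1050, 1100, 1200, 1350, 1450])
gpa = np.array([2.4,  2.8,  2.7,  3.1,  3.4,  3.7])

slope, intercept = np.polyfit(sat, gpa, deg=1)   # least-squares line
predicted = intercept + slope * sat
residuals = gpa - predicted   # vertical distances from points to the line

print(f"GPA = {intercept:.3f} + {slope:.5f} * SAT")
print("Sum of squared residuals:", round(float(np.sum(residuals**2)), 4))
```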


I continue to find websites that offer useful explanations of these concepts:

Multiple Regression, Standardized/Unstandardized Coefficients

Today, we’ll go over the left side of the SEM Pyramid of Success, from the correlation to multiple regression to path analysis, up to the brink of SEM. An important distinction applicable to all of these techniques is between standardized and unstandardized relationships.

The distinction is probably best illustrated, at this point, with multiple regression. Just to remind everyone, in multiple regression we test how well a number of predictor (independent) variables relate to an outcome (dependent) variable. For example, we could use (a) educational attainment, (b) experience on the job, and (c) performance evaluation as predictors of past-year earnings (outcome). The relationship between each predictor and earnings is computed holding constant the effect of the other predictors (e.g., assuming all respondents were equal in their educational attainment and experience on the job, are higher performance evaluations associated with higher earnings?).

[ADDED December 28, 2007: The following PowerPoint slide show provides an extensive review of multiple regression. I noticed an apparent error on the slide entitled "The Overall Test..." (among the slides numbered in the high teens to 20), so for the discussion of null hypotheses, you should focus on the slide, numbered in the 40's, titled "Test for Individual Terms."]

For each predictor variable in a multiple-regression analysis, the output will provide an unstandardized regression coefficient (usually depicted with the letter B) and a standardized coefficient (usually depicted with the Greek letter Beta, β). Unstandardized results are probably more straightforward to understand, so let’s discuss them first.

Unstandardized relationships are expressed in terms of the variables' original, raw units. Educational attainment would probably be measured in years of education, whereas earnings would probably be measured in dollars. Thus, the unstandardized (B) coefficient for educational attainment could be something like 2000. This would tell us that, for each increment of one raw unit (year) of education, projected earnings would increase by 2000 raw units of income (dollars).

Standardized results represent what happens after all of the variables (predictors and outcome) have initially been converted into z-scores (formula). As you'll recall from your earlier stat classes, z scores convey information in standard-deviation (SD) units; for example, someone who has a z score of +1 on a variable is one SD above the sample mean on that variable (to review SD's, see here and here). If we were measuring respondents' number of miles run per week in an athlete sample, the mean might be, say, 50 miles/week, with an SD of 10. Therefore, an athlete who ran 60 miles/week in training would be at z = +1, or 1 SD above the mean.
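In code, the conversion for the running example is just:

```python
mean_miles, sd_miles = 50, 10   # sample mean and SD from the example
z = (60 - mean_miles) / sd_miles
print(z)   # 1.0 -> one SD above the mean
```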

Another nice feature of z scores is that, if the data are distributed normally, you can relate them to a person's percentile ranking in the distribution. For example, someone with a z score of +1 on a given variable (84th percentile) is 34 percentile points ahead of someone who has a z score of 0 (50th percentile).

Going back to our example of predicting people's earnings, years of experience may have a standardized regression coefficient (β) of .40. This finding would tell us that, for each increment of one SD of years of experience, projected earnings would increase by .40 SD's of income.

To recap to this point:

Unstandardized relationships say that, for a one-raw-unit increment on a predictor, the outcome variable increases (or, if B is negative, decreases) by the number of its own raw units given by the B coefficient.

Standardized relationships say that, for a one-standard-deviation increment on a predictor, the outcome variable increases (or decreases) by the number of SD's given by the β coefficient.
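Here is a hypothetical sketch of both solutions for the earnings example, using Python's statsmodels package (all data values below are invented): fitting on the raw variables yields the B coefficients, and refitting after z-scoring everything yields the β's.

```python
import numpy as np
import statsmodels.api as sm

# Invented data for the earnings example
rng = np.random.default_rng(42)
n = 200
education  = rng.normal(14, 2, n)     # years
experience = rng.normal(10, 5, n)     # years
evaluation = rng.normal(3.5, 0.8, n)  # rating-scale points
earnings   = (2000 * education + 1500 * experience
              + 4000 * evaluation + rng.normal(0, 8000, n))  # dollars

X = np.column_stack([education, experience, evaluation])

def zscore(a):
    return (a - a.mean(axis=0)) / a.std(axis=0)

# Unstandardized solution: B coefficients in raw units (dollars per year, etc.)
fit_raw = sm.OLS(earnings, sm.add_constant(X)).fit()

# Standardized solution: z-score everything first, then refit -> the betas
fit_std = sm.OLS(zscore(earnings), sm.add_constant(zscore(X))).fit()

print("B:   ", np.round(fit_raw.params[1:], 1))    # raw-unit coefficients
print("beta:", np.round(fit_std.params[1:], 3))    # SD-unit coefficients
print("t:   ", np.round(fit_raw.tvalues[1:], 2))   # B/SE, for significance
```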

When should you use the unstandardized solution and when should you use the standardized one? My own view is as follows: If the raw units are generally familiar (e.g., years, dollars, inches, miles, pounds), I'd go with the unstandardized solution. However, if the variables' raw units are not well-known in everyday usage (e.g., on a marital-satisfaction inventory with a maximum score of 50, what does one point really convey?), then I'd use the standardized solution.

This framework for unstandardized and standardized solutions applies not only to multiple regression, but also to path analysis and SEM. What is not widely known is that the Pearson r, itself, is a statistic based on standardized variables. The correlation has an unstandardized "cousin," the covariance. The formula for converting between correlations and covariances, which is pretty simple, is shown in this document.
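The conversion itself is simple enough to state here: cov(X, Y) = r(X, Y) × SD(X) × SD(Y). A quick numpy check with invented data:

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(size=100)
y = 0.5 * x + rng.normal(size=100)

r = np.corrcoef(x, y)[0, 1]
cov_from_r = r * x.std() * y.std()   # correlation scaled back to raw units
print(np.isclose(cov_from_r, np.cov(x, y, ddof=0)[0, 1]))   # True
```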

Update (1/19/07): Discussion during our previous class brought out an additional point that I didn't mention in my above write-up (thanks to Kristina).

Within the same regression equation, the different predictor variables' unstandardized B coefficients are not directly comparable to each other, because the raw units for each are (usually) different. In other words, the largest B coefficient will not necessarily be the most significant, as it must be judged in connection with its standard error (B/SE = t, which is used to test for statistical significance).

On the other hand, with standardized analyses, all variables have been converted to a common metric, namely standard-deviation (z-score) units, so the β coefficients can meaningfully be compared in magnitude. In this case, whichever predictor variable has the largest β (in absolute value) can be said to have the most potent relationship to the dependent variable, and this predictor will usually also have the greatest significance (smallest p value), although this is not guaranteed, because each coefficient is still judged against its own standard error.

Added 4/12/15: Phil Ender has a concise overview of key issues in multiple regression.

SEM Pyramid of Success

(Updated January 21, 2017)

What we'll be covering in the first few class sessions is how SEM represents a culmination of earlier statistical techniques, building from the very basic Pearson's correlation coefficient (r) on up through more elaborate techniques, finally ending at SEM.

John Wooden, who coached men's basketball at UCLA from 1948 to 1975, winning 10 NCAA championships and garnering accolades for his broader teachings, developed a "Pyramid of Success," which is a guide not only for athletics, but for living a good all-around life.

I grew up a huge UCLA sports fan and went there for undergraduate college (1980-1984). Inspired as I was by Coach Wooden (who died in 2010, just a few months short of his 100th birthday), I created what I call the "Structural Equation Modeling Pyramid of Success," which is shown below.

[Image: the SEM Pyramid of Success diagram]
As we'll see later in the course, as complex as some of our structural equation models can get, the results can always be traced back to simple Pearson correlations.