Hyeri Hong1 and Walter P Vispoel2*
Received: March 22, 2024; Published: April 04, 2024
*Corresponding author: Walter P Vispoel, University of Iowa, USA
DOI: 10.26717/BJSTR.2024.55.008778
In this brief article, we summarize the characteristics of three approaches to factor analyzing data: exploratory factor analysis, confirmatory factor analysis, and exploratory structural equation modeling, and describe results from an empirical study that favored the use of exploratory structural equation modeling over the other approaches. We also direct readers to resources with computer code and further details for conducting the illustrated analyses.
Abbreviations: EFA: Exploratory Factor Analysis; CFA: Confirmatory Factor Analysis; ESEM: Exploratory Structural Equation Modeling; CFI: Comparative Fit Index; TLI: Tucker-Lewis Index; RMSEA: Root Mean Square Error of Approximation
Since its inception in the early years of the 20th century (Spearman [1,2]), factor analysis has been used extensively in applied research across numerous disciplines. The fundamental purpose of factor analysis is to establish the number and nature of latent variables or factors that explain associations among observed scores. A factor is an unobservable variable that affects more than one observed score and accounts for correlations among those scores. Common applications of factor analysis are to determine whether interrelationships among observed indicators can be accounted for by a smaller number of underlying latent constructs (Brown [3]), investigate convergent and discriminant validity for such indicators and constructs (Millon [4]; Van de Vijver & Leung [5]), serve as a method of data reduction (Cox, et al. [6]), and provide a mechanism for investigating and testing theoretical models (Matsunaga [7]). Over the years, Thurstone’s [8] “simple structure” common factor model has attracted the most attention. Within this model, each observed score or indicator is represented by one or more common factors and an error term. The variability of each indicator score is divided into two parts: common variance shared among indicators (i.e., communality) and unique variance that is specific to the indicator or due to random measurement error (i.e., uniqueness). The common factor model encompasses two primary types of factor analyses: Exploratory Factor Analysis (EFA) and Confirmatory Factor Analysis (CFA; Joreskog [9,10]). The goal of both approaches is to use a reduced number of distinguishable latent variables or factors to explain observed associations among the indicators examined (Brown [3,11]).
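The variance decomposition described above can be written compactly. The following is a sketch in standard common factor notation, which is assumed here for illustration and does not appear in the original text: x_i is the i-th of p indicators, \xi_k the k-th of m common factors, \lambda_{ik} a factor loading, \phi_{kl} a factor variance or covariance, and \varepsilon_i an error term with variance \theta_{ii}. Errors are assumed to be uncorrelated with the factors and with one another.

```latex
% Notation assumed for illustration; not taken from the original article.
\begin{align}
x_i &= \sum_{k=1}^{m} \lambda_{ik}\,\xi_k + \varepsilon_i,
      \qquad i = 1, \dots, p \\
\operatorname{Var}(x_i) &=
      \underbrace{\sum_{k=1}^{m}\sum_{l=1}^{m}
        \lambda_{ik}\,\phi_{kl}\,\lambda_{il}}_{h_i^2\ \text{(communality)}}
      + \underbrace{\theta_{ii}}_{u_i^2\ \text{(uniqueness)}} \\
\Sigma &= \Lambda\,\Phi\,\Lambda^{\top} + \Theta
\end{align}
```

When indicators and factors are standardized, the second line reduces to 1 = h_i^2 + u_i^2, and with uncorrelated factors the communality is simply the sum of the squared loadings for that indicator.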
EFA is driven by data in the sense that it does not require that the number of factors or the pattern of relationships among latent factors and indicators be explicitly specified in advance. EFA is typically used in preliminary research to determine the number of common factors needed and identify which observed scores are the best indicators of the latent factors in relation to factor loadings that represent the relationships between observed scores and latent factors. The number of common factors can be determined based on a variety of methods (see, e.g., Brown [3,11]). Ideally, each observed score would be more
CFA is an extensively used structural equation method for testing theoretical models that represent relationships between observed scores and latent factors. In contrast to EFA, CFA is theory-driven and highly restrictive in that the researcher must specify the number of factors and how observed indicators relate to those factors. In most common applications of this method, researchers allow each indicator to load on one targeted latent factor but not on other factors to achieve the clearest simple structure. CFA is suitable for directly testing theoretical models and is more parsimonious than EFA (Brown [3,11]).
ESEM is a more recent approach to factor analysis (Asparouhov & Muthén, et al. [12-15]) intended to overcome the highly restrictive nature of CFAs by allowing indicators to have non-zero (i.e., weak or typically negligible) loadings on non-targeted factors, while still retaining the advantage of CFA in directly testing prespecified theoretical models. ESEM is more data-driven than CFA in that it allows indicator scores to load on all factors, while those scores are still expected to load noticeably higher on targeted than on non-targeted factors. Recent studies have revealed that, when samples are of adequate size, ESEMs produce better model fits and more precise parameter estimates than do CFAs (Asparouhov & Muthén [12, 16-22]).
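The difference between the CFA and ESEM loading patterns can be illustrated with a small sketch. It assumes a hypothetical layout of six factors with four items each, ordered in contiguous blocks (an assumption made for readability, not a description of any actual questionnaire's item order). The function name `target_pattern` is introduced here for illustration only. The counts refer to loadings that are estimated rather than fixed at zero; they do not account for the rotation constraints needed to identify an ESEM.

```python
import numpy as np


def target_pattern(n_factors: int, items_per_factor: int) -> np.ndarray:
    """Return a boolean items-by-factors matrix, True where an item targets a factor.

    Hypothetical helper: items are assumed to be ordered in contiguous blocks,
    one block per factor.
    """
    n_items = n_factors * items_per_factor
    pattern = np.zeros((n_items, n_factors), dtype=bool)
    for f in range(n_factors):
        rows = slice(f * items_per_factor, (f + 1) * items_per_factor)
        pattern[rows, f] = True
    return pattern


target = target_pattern(n_factors=6, items_per_factor=4)

# CFA: only target loadings are estimated; off-target loadings are fixed at zero.
cfa_estimated = target.copy()

# ESEM: every loading is estimated, with off-target loadings expected to be small.
esem_estimated = np.ones_like(target)

n_cfa = int(cfa_estimated.sum())
n_esem = int(esem_estimated.sum())
n_off_target = int((esem_estimated & ~target).sum())

print(f"CFA loadings estimated:  {n_cfa}")
print(f"ESEM loadings estimated: {n_esem}")
print(f"Off-target loadings freed by ESEM: {n_off_target}")
```

Under these assumptions, the CFA estimates 24 loadings and the ESEM estimates 144, of which 120 are off-target. This is one way to see why CFA is the more parsimonious and more restrictive of the two models.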
To illustrate advantages of ESEMs, we include selected results from a recent dissertation study by the first author (Hong [23]) in Table 1. The data represent responses from 447,500 residents of the United States (39% male, 61% female; mean age = 24.93) who completed the International Personality Item Pool NEO 120 questionnaire (IPIP-NEO-120). We obtained these data from a publicly accessible university website (https://osf.io/tbmh5/) created to enhance research into personality-related constructs (Johnson [24]). The IPIP-NEO-120 has 120 items that measure the Big Five personality domain constructs: Agreeableness, Conscientiousness, Extraversion, Neuroticism, and Openness to Experience. Each domain scale has 24 items divided into six nested 4-item facet subscales (see the note to Table 1 for names of all facets within each domain). We conducted separate correlated multifactor CFA and ESEM analyses for each personality domain with factors corresponding to the six facets included within a given domain. In Table 1, we report three model fit statistics for each analysis: the Comparative Fit Index (CFI), Tucker-Lewis Index (TLI), and Root Mean Square Error of Approximation (RMSEA). In keeping with guidelines suggested by Hu and Bentler [25], we considered fits acceptable when CFIs and TLIs were 0.90 or higher and RMSEAs were 0.08 or lower, and excellent when CFIs and TLIs were 0.95 or higher and RMSEAs were 0.06 or lower. As can be seen in Table 1, ESEMs provided noticeably better model fits than CFAs in all instances and met or exceeded all criteria for excellent fits with only one exception (the TLI = 0.947 for Extraversion).
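The fit criteria stated above can be expressed as a short sketch. The function names and the label "inadequate" for values that miss the acceptable cutoff are assumptions introduced here for illustration. The only value taken from the article is the TLI of 0.947 for Extraversion; the RMSEA value in the usage lines is hypothetical.

```python
def classify_incremental_fit(value: float) -> str:
    """Classify a CFI or TLI value (higher is better) using the cutoffs in the text."""
    if value >= 0.95:
        return "excellent"
    if value >= 0.90:
        return "acceptable"
    return "inadequate"  # label assumed here; the article does not name this category


def classify_rmsea(value: float) -> str:
    """Classify an RMSEA value (lower is better) using the cutoffs in the text."""
    if value <= 0.06:
        return "excellent"
    if value <= 0.08:
        return "acceptable"
    return "inadequate"  # label assumed here; the article does not name this category


# TLI reported in the article for the Extraversion ESEM.
tli_label = classify_incremental_fit(0.947)

# Hypothetical RMSEA value, used only to show the direction of the comparison.
rmsea_label = classify_rmsea(0.07)

print(tli_label, rmsea_label)
```

With these cutoffs, a TLI of 0.947 falls just short of the 0.95 threshold and is classified as acceptable rather than excellent, which matches the single exception noted in the text.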
Note: Subscale facets for the personality domains described in the table include: Trust, Morality, Altruism, Cooperation, Modesty, and Sympathy for Agreeableness; Self-Efficacy, Orderliness, Dutifulness, Achievement-Striving, Self-Discipline, and Cautiousness for Conscientiousness; Friendliness, Gregariousness, Assertiveness, Activity Level, Excitement-Seeking, and Cheerfulness for Extraversion; Anxiety, Anger, Depression, Self-Consciousness, Immoderation, and Vulnerability for Neuroticism; and Imagination, Artistic Interests, Emotionality, Adventurousness, Intellect, and Liberalism for Openness to Experience. CFA: confirmatory factor analysis; ESEM: exploratory structural equation modeling; CFI: comparative fit index; TLI: Tucker–Lewis index; RMSEA: root mean square error of approximation. All analyses were based on correlated multifactor models using maximum likelihood parameter estimation. *Within the CFAs, all off-target loadings are set equal to zero.
For CFAs, in contrast, CFIs or TLIs never reached levels for excellent fits and failed to achieve acceptable fits across fit indices within three of the five personality domains (Agreeableness, Extraversion, and Openness to Experience). As would be desired, factor loadings for non-targeted factors in the ESEMs, on average, were negligible in size, ranging from 0.014 to 0.042 over the five personality domains.
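As a rough illustration of how the size of off-target loadings might be summarized, the sketch below averages the absolute values of all loadings that fall outside the targeted item-factor pairs. The loading matrix is hypothetical, the function name is introduced here for illustration, and the use of absolute values is an assumption; the article does not state exactly how its averages were computed.

```python
import numpy as np


def mean_abs_off_target(loadings, target) -> float:
    """Average absolute loading over item-factor pairs that are not targeted.

    loadings: items-by-factors array of standardized loadings.
    target:   boolean array of the same shape, True for targeted pairs.
    """
    loadings = np.asarray(loadings, dtype=float)
    target = np.asarray(target, dtype=bool)
    if loadings.shape != target.shape:
        raise ValueError("loadings and target must have the same shape")
    return float(np.abs(loadings[~target]).mean())


# Hypothetical 4-item, 2-factor solution (not taken from the study).
loadings = np.array([
    [0.70, 0.05],
    [0.65, -0.03],
    [0.02, 0.80],
    [-0.04, 0.75],
])
target = np.array([
    [True, False],
    [True, False],
    [False, True],
    [False, True],
])

result = mean_abs_off_target(loadings, target)
print(f"Mean absolute off-target loading: {result:.3f}")
```

In this made-up example the off-target loadings are 0.05, -0.03, 0.02, and -0.04, so the summary value is about 0.035, which would count as negligible in the sense used above.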
Our intent in this brief article was to introduce readers to possible benefits of applying ESEMs, which combine the best aspects of EFAs and CFAs, within scientific research studies. To facilitate applications of ESEMs, routines for analyzing such models are now available in the computer packages Mplus (Muthén & Muthén [26]) and R (Prokofieva, et al. [27]). Examples and computer code for analyzing the multifactor ESEMs described here, as well as hierarchical and bifactor ESEMs, can be found in Hong [23] and Hong, et al. [28]. For more in-depth information about the nature and relative advantages of ESEMs over other procedures, we direct readers to the comprehensive treatment of such models in Marsh, et al. [13].