Mplus Latent Class Analysis Continuous Variables

Purpose: This page explains how to perform two latent class analyses in Mplus, one with categorical variables only and the other with a mix of categorical and continuous variables. A mixture model with categorical variables is called a latent class analysis, whereas a mixture model with only continuous variables is called a latent profile analysis (Oberski, 2016).

Note: Mplus version 8 was used for these examples. Download all the files for this portion of this seminar.

1.0 Basic latent class analysis model

Latent class analysis is used to classify individuals into homogeneous subgroups. Individual differences in observed item response patterns are explained by differences in latent class membership (Geiser, 2013). For the case with only dichotomous variables \(X \in \{0,1\}\), the latent class analysis (LCA) model for a single item can be written as:

$$ P(X_{vi} =1) = \sum_{g=1}^{G} \pi_{g} \pi_{ig} $$

where \(P(X_{vi}=1)\) denotes the unconditional probability that a randomly selected individual \(v\) obtained a score of \(X=1\) on item \(i\), \((i=1,\cdots,I)\) and the parameter

$$ \pi_{ig} = P(X_{vi} = 1 | G = g) $$

is the conditional solution probability. Since the sum of the two conditional probabilities equals one,

$$ P(X_{vi} = 0 | G = g) = 1-\pi_{ig}. $$

The class size parameter \(\pi_g\) indicates the unconditional probability of belonging to latent class \(g\), \((g = 1, \cdots, G)\), and the sum of all class-size parameters is 1, i.e.,

$$ \sum_{g=1}^{G} \pi_{g} = 1. $$
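
As a quick numerical check of the first formula, the sketch below plugs in the class sizes and conditional probabilities that Mplus reports for the two-class example further down this page. The variable and function names are illustrative and are not part of Mplus.

```python
# Estimates copied from the two-class Mplus output shown later on this page.
class_sizes = [0.27276, 0.72724]  # pi_g for latent classes 1 and 2

# pi_ig = P(X_i = 1 | class g), taken from "RESULTS IN PROBABILITY SCALE"
# (Category 2 corresponds to a value of 1 on the item).
conditional_probs = {
    "u1":  [0.887, 0.110],
    "u2":  [0.849, 0.113],
    "rc3": [0.911, 0.101],
    "rc4": [0.889, 0.126],
}

def marginal_probability(sizes, item_probs):
    """Unconditional P(X_i = 1) = sum over classes of pi_g * pi_ig."""
    return sum(pi_g * pi_ig for pi_g, pi_ig in zip(sizes, item_probs))

marginals = {item: marginal_probability(class_sizes, probs)
             for item, probs in conditional_probs.items()}

for item, p in marginals.items():
    print(f"{item}: {p:.3f}")
```

Rounded to three decimals, the results are 0.322, 0.314, 0.322 and 0.334, which agree with the Category 2 sample proportions listed under "SUMMARY OF CATEGORICAL DATA PROPORTIONS" in the output.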

We will illustrate a simple latent class analysis (LCA) using the mplus73recode.dat dataset and see if we can identify two classes based on four binary variables. The variable u1 indicates whether the student was in honors math in seventh grade (1=yes; 0=no); the variable u2 indicates whether the student was in honors math in eighth grade; rc3 indicates whether the student was in honors math in ninth grade; and rc4 indicates whether the student was in honors math in tenth grade. We specify that two latent classes should be extracted, and we expect that these classes will differentiate students who have a particularly high aptitude in math from those who do not.

In the syntax below, the title statement is used to remind us what analysis we are running. The data statement tells Mplus where the text data file is located. Within the variable statement, names tells Mplus the names of the variables in the text file (these names are not listed at the top of the text file); usevariables tells Mplus which variables we will be using in this analysis; classes names the latent class variable and gives the number of classes we wish to extract; and categorical tells Mplus which variables are categorical.

By specifying type=mixture on the analysis statement, we tell Mplus that our data are a mixture of subpopulations (here, two latent classes). We use the savedata statement to save class membership information to a text file called lca73classes.txt. We will save the class probabilities (cprob) in this file, and the file will be a free format text file. We can open this file in another program and look at the class membership probabilities and class assignment. The plot statement requests all available plots (type is plot3), and the series option requests graphs in which the values of the listed variables are connected by a line. The (*) at the end of the series statement requests integer x-axis values starting with 0 and increasing by 1.

    title:
      This is an example of LCA with binary latent class indicators
    data:
      file is mplus73recode.dat;
    variable:
      names are u1-u4 rc3 rc4 x1-x10;
      usevariables = u1 u2 rc3 rc4;
      classes = c (2);
      categorical = u1 u2 rc3 rc4;
    analysis:
      type=mixture;
    savedata:
      file is lca73classes.txt ;
      save is cprob;
      format is free;
    plot:
      type is plot3;
      series is u1 u2 rc3 rc4(*);

Below is the resulting output.

    SUMMARY OF ANALYSIS

    Number of groups                                                 1
    Number of observations                                         500

    Number of dependent variables                                    4
    Number of independent variables                                  0
    Number of continuous latent variables                            0
    Number of categorical latent variables                           1

    Observed dependent variables

      Binary and ordered categorical (ordinal)
       U1          U2          RC3         RC4

    Categorical latent variables
       C

    Estimator                                                      MLR
    Information matrix                                        OBSERVED
    Optimization Specifications for the Quasi-Newton Algorithm for
    Continuous Outcomes
      Maximum number of iterations                                 100
      Convergence criterion                                  0.100D-05
    Optimization Specifications for the EM Algorithm
      Maximum number of iterations                                 500
      Convergence criteria
        Loglikelihood change                                 0.100D-06
        Relative loglikelihood change                        0.100D-06
        Derivative                                           0.100D-05
    Optimization Specifications for the M step of the EM Algorithm for
    Categorical Latent variables
      Number of M step iterations                                    1
      M step convergence criterion                           0.100D-05
      Basis for M step termination                           ITERATION
    Optimization Specifications for the M step of the EM Algorithm for
    Censored, Binary or Ordered Categorical (Ordinal), Unordered
    Categorical (Nominal) and Count Outcomes
      Number of M step iterations                                    1
      M step convergence criterion                           0.100D-05
      Basis for M step termination                           ITERATION
      Maximum value for logit thresholds                            15
      Minimum value for logit thresholds                           -15
      Minimum expected cell size for chi-square              0.100D-01
    Optimization algorithm                                         EMA
    Random Starts Specifications
      Number of initial stage random starts                         10
      Number of final stage optimizations                            2
      Number of initial stage iterations                            10
      Initial stage convergence criterion                    0.100D+01
      Random starts scale                                    0.500D+01
      Random seed for generating random starts                       0
    Link                                                         LOGIT

    Input data file(s)
      d:datamplus73recode.dat
    Input data format  FREE

    SUMMARY OF CATEGORICAL DATA PROPORTIONS

        U1
          Category 1    0.678
          Category 2    0.322
        U2
          Category 1    0.686
          Category 2    0.314
        RC3
          Category 1    0.678
          Category 2    0.322
        RC4
          Category 1    0.666
          Category 2    0.334

    RANDOM STARTS RESULTS RANKED FROM THE BEST TO THE WORST LOGLIKELIHOOD VALUES

    Final stage loglikelihood values at local maxima, seeds, and initial stage start numbers:

                -965.244  253358           2
                -965.244  285380           1

    THE MODEL ESTIMATION TERMINATED NORMALLY

    TESTS OF MODEL FIT

    Loglikelihood

              H0 Value                        -965.244
              H0 Scaling Correction Factor       1.013
                for MLR

    Information Criteria

              Number of Free Parameters              9
              Akaike (AIC)                    1948.488
              Bayesian (BIC)                  1986.420
              Sample-Size Adjusted BIC        1957.853
                (n* = (n + 2) / 24)

    Chi-Square Test of Model Fit for the Binary and Ordered Categorical
    (Ordinal) Outcomes

              Pearson Chi-Square

              Value                              6.287
              Degrees of Freedom                     6
              P-Value                           0.3918

              Likelihood Ratio Chi-Square

              Value                              5.605
              Degrees of Freedom                     6
              P-Value                           0.4688

    FINAL CLASS COUNTS AND PROPORTIONS FOR THE LATENT CLASSES
    BASED ON THE ESTIMATED MODEL

        Latent
       Classes

           1        136.38034          0.27276
           2        363.61966          0.72724

    FINAL CLASS COUNTS AND PROPORTIONS FOR THE LATENT CLASS PATTERNS
    BASED ON ESTIMATED POSTERIOR PROBABILITIES

        Latent
       Classes

           1        136.38059          0.27276
           2        363.61941          0.72724

    CLASSIFICATION QUALITY

         Entropy                         0.904

    CLASSIFICATION OF INDIVIDUALS BASED ON THEIR MOST LIKELY LATENT CLASS MEMBERSHIP

    Class Counts and Proportions

        Latent
       Classes

           1              127          0.25400
           2              373          0.74600

    Average Latent Class Probabilities for Most Likely Latent Class Membership (Row)
    by Latent Class (Column)

               1        2

        1   0.986    0.014
        2   0.030    0.970

    MODEL RESULTS

                                                        Two-Tailed
                        Estimate       S.E.  Est./S.E.    P-Value

    Latent Class 1

     Thresholds
        U1$1              -2.063      0.373     -5.536      0.000
        U2$1              -1.724      0.300     -5.755      0.000
        RC3$1             -2.331      0.390     -5.985      0.000
        RC4$1             -2.078      0.320     -6.490      0.000

    Latent Class 2

     Thresholds
        U1$1               2.091      0.182     11.502      0.000
        U2$1               2.056      0.180     11.401      0.000
        RC3$1              2.187      0.203     10.760      0.000
        RC4$1              1.937      0.183     10.613      0.000

    Categorical Latent Variables

     Means
        C#1               -0.981      0.116     -8.468      0.000

    RESULTS IN PROBABILITY SCALE

    Latent Class 1

     U1
        Category 1         0.113      0.037      3.024      0.002
        Category 2         0.887      0.037     23.800      0.000
     U2
        Category 1         0.151      0.038      3.934      0.000
        Category 2         0.849      0.038     22.056      0.000
     RC3
        Category 1         0.089      0.031      2.817      0.005
        Category 2         0.911      0.031     28.987      0.000
     RC4
        Category 1         0.111      0.032      3.514      0.000
        Category 2         0.889      0.032     28.072      0.000

    Latent Class 2

     U1
        Category 1         0.890      0.018     50.016      0.000
        Category 2         0.110      0.018      6.181      0.000
     U2
        Category 1         0.887      0.018     48.873      0.000
        Category 2         0.113      0.018      6.256      0.000
     RC3
        Category 1         0.899      0.018     48.748      0.000
        Category 2         0.101      0.018      5.472      0.000
     RC4
        Category 1         0.874      0.020     43.498      0.000
        Category 2         0.126      0.020      6.267      0.000

    LATENT CLASS ODDS RATIO RESULTS

    Latent Class 1 Compared to Latent Class 2

     U1
        Category > 1      63.673     25.877      2.461      0.014
     U2
        Category > 1      43.796     14.941      2.931      0.003
     RC3
        Category > 1      91.672     38.990      2.351      0.019
     RC4
        Category > 1      55.439     20.032      2.768      0.006

    QUALITY OF NUMERICAL RESULTS

         Condition Number for the Information Matrix              0.600E-01
           (ratio of smallest to largest eigenvalue)

    PLOT INFORMATION

    The following plots are available:

      Histograms (sample values)
      Scatterplots (sample values)
      Sample proportions
      Estimated probabilities

    SAVEDATA INFORMATION

      Order of variables

        U1
        U2
        RC3
        RC4
        CPROB1
        CPROB2
        C

      Save file
        lca73classes.txt

      Save file format           Free
      Save file record length    5000
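
The information criteria and the chi-square degrees of freedom in this output can be reproduced by hand from the loglikelihood, the number of free parameters, and the sample size. The sketch below uses the standard AIC and BIC definitions together with the n* formula printed in the output; the function name is illustrative.

```python
import math

def information_criteria(loglik, n_params, n_obs):
    """Return (AIC, BIC, sample-size adjusted BIC) for a fitted model."""
    deviance = -2.0 * loglik
    aic = deviance + 2 * n_params
    bic = deviance + n_params * math.log(n_obs)
    n_star = (n_obs + 2) / 24          # as printed in the Mplus output
    sabic = deviance + n_params * math.log(n_star)
    return aic, bic, sabic

# Values taken from "TESTS OF MODEL FIT" for the two-class model.
aic, bic, sabic = information_criteria(loglik=-965.244, n_params=9, n_obs=500)
print(f"AIC={aic:.3f}  BIC={bic:.3f}  SABIC={sabic:.3f}")

# Four binary items give 2**4 = 16 response patterns, so the chi-square test
# has (16 - 1) - 9 degrees of freedom.
n_patterns = 2 ** 4
df = (n_patterns - 1) - 9
print("df =", df)
```

Because the loglikelihood is printed to only three decimals, the recomputed values can differ from the Mplus values in the last digit.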

To view the graphs, click on Graph and then View Graphs.  From the list, we selected Estimated Probabilities.

Image lca74_1

The graph above corresponds to the table in the output entitled "Results in Probability Scale".  As you can see in the title bar of the graph, the plotted points are for category 2.  The y-axis is the probability, and the x-axis gives the four binary indicator variables.  The variable u1 is called 0, the variable u2 is called 1, the variable rc3 is called 2, and the variable rc4 is called 3.  The labeling of the x-axis starts at 0 and increases in increments of 1 because of the way we specified the series statement.  We used the simplest form of the series statement, which labels the x-axis with integers rather than with the variable names.

We can see from the legend in the middle of the graph that 27.3% of this sample of students is in latent class 1, while 72.7% of the sample of students is in latent class 2.  This information can be found in the table in the output entitled "Final Class Counts and Proportions for the Latent Classes Based on the Estimated Model".
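
The same proportions can be recovered from the C#1 mean reported under "Categorical Latent Variables" in the output. The sketch below assumes the usual Mplus parameterization, in which class membership follows a multinomial logit with the last class as the reference category; the function name is illustrative.

```python
import math

def class_proportions(logit_means):
    """Convert logits for classes 1..G-1 into proportions for all G classes.

    The last class is the reference category, so its logit is fixed at 0.
    """
    logits = list(logit_means) + [0.0]
    denominator = sum(math.exp(value) for value in logits)
    return [math.exp(value) / denominator for value in logits]

# C#1 mean from the two-class output.
proportions = class_proportions([-0.981])
counts = [500 * p for p in proportions]  # 500 observations in the sample

for g, (p, n) in enumerate(zip(proportions, counts), start=1):
    print(f"class {g}: proportion {p:.3f}, estimated count {n:.1f}")
```

The result agrees with the reported 0.27276 and 0.72724 up to the rounding of the printed logit.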

The red line indicates latent class 1, which we believe is the class containing the gifted math students.  Students in latent class 1 have a probability of 0.887 of having a value of 1 on the variable u1 (being in honors math in seventh grade).  The green line indicates latent class 2, which we believe is the class containing the regular math students.  The probability that a student in latent class 2 has a value of 1 on the variable u1 is 0.110.  The probability that a student in latent class 1 has a value of 1 on the variable u2 (being in honors math in the eighth grade) is 0.849, while the probability that a student in latent class 2 has a value of 1 on the variable u2 is only 0.113.  As you can see from the graph, the students in latent class 1 have a high probability of having a value of 1 on all of the binary variables.  Remember that a value of 1 on these variables indicates that the student was in honors math in that grade.
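
These probabilities are transformations of the logit thresholds listed under "Model Results". A minimal sketch of the conversion, assuming the logit link shown in the output, under which the probability of category 2 is 1 / (1 + exp(threshold)); the helper names are illustrative.

```python
import math

def prob_category2(threshold):
    """P(item = 1 | class) for a binary item with a logit threshold."""
    return 1.0 / (1.0 + math.exp(threshold))

def odds_ratio(threshold_class1, threshold_class2):
    """Odds of category 2 in class 1 relative to class 2."""
    return math.exp(threshold_class2 - threshold_class1)

# (class 1 threshold, class 2 threshold) copied from "MODEL RESULTS".
thresholds = {
    "u1":  (-2.063, 2.091),
    "u2":  (-1.724, 2.056),
    "rc3": (-2.331, 2.187),
    "rc4": (-2.078, 1.937),
}

for item, (tau1, tau2) in thresholds.items():
    p1, p2 = prob_category2(tau1), prob_category2(tau2)
    print(f"{item}: class 1 {p1:.3f}, class 2 {p2:.3f}, "
          f"odds ratio {odds_ratio(tau1, tau2):.1f}")
```

A negative threshold therefore corresponds to a probability above 0.5 of being in honors math, which is why class 1 (all thresholds negative) is the high-probability class. The odds ratios match the "Latent Class Odds Ratio Results" table up to rounding of the printed thresholds.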

If we look at the first few cases in the saved file that we requested, we can see that the output and graph correspond to this file. The saved text file does not contain variable names, but you can find this information in the output in the table entitled "Savedata Information" (towards the end of the output).  This tells us that the first four variables are the observed binary variables from our mplus73recode data file, the next variable is class probability 1, then class probability 2, and the last variable (called c) is the assigned class membership.  The first two students have very high probabilities for class 1 and low probabilities for class 2, and they are assigned to class 1.  The last two students whose data are listed below were in no honors math classes; to three decimals they have a probability of 0 of being in class 1 and a probability of 1.0 of being in class 2, and they are assigned to class 2.

    1.000      1.000      1.000      0.000      0.963      0.037      1.000
    1.000      0.000      1.000      1.000      0.971      0.029      1.000
    0.000      0.000      0.000      0.000      0.000      1.000      2.000
    1.000      1.000      1.000      1.000      0.999      0.001      1.000
    1.000      1.000      1.000      1.000      0.999      0.001      1.000
    0.000      1.000      0.000      0.000      0.004      0.996      2.000
    1.000      1.000      1.000      1.000      0.999      0.001      1.000
    0.000      0.000      0.000      0.000      0.000      1.000      2.000
    0.000      0.000      0.000      1.000      0.006      0.994      2.000
    0.000      0.000      0.000      0.000      0.000      1.000      2.000
    0.000      0.000      0.000      1.000      0.006      0.994      2.000
    0.000      0.000      0.000      0.000      0.000      1.000      2.000
    0.000      0.000      0.000      0.000      0.000      1.000      2.000
    1.000      1.000      1.000      1.000      0.999      0.001      1.000
    0.000      0.000      0.000      0.000      0.000      1.000      2.000
    0.000      0.000      0.000      0.000      0.000      1.000      2.000
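
The class probabilities in this file are posterior probabilities, and they can be reproduced from the estimates in the output with Bayes' rule. The sketch below assumes that the four items are independent within each class, which is the standard LCA assumption, and uses the rounded estimates printed above. The function names are illustrative.

```python
# Estimated class sizes and P(item = 1 | class), copied from the output.
class_sizes = [0.27276, 0.72724]
item_probs = [
    [0.887, 0.849, 0.911, 0.889],  # class 1: u1, u2, rc3, rc4
    [0.110, 0.113, 0.101, 0.126],  # class 2: u1, u2, rc3, rc4
]

def posterior_probabilities(pattern):
    """P(class g | response pattern) for a tuple of 0/1 item responses."""
    joint = []
    for pi_g, probs in zip(class_sizes, item_probs):
        likelihood = 1.0
        for x, p in zip(pattern, probs):
            likelihood *= p if x == 1 else 1.0 - p
        joint.append(pi_g * likelihood)
    total = sum(joint)
    return [value / total for value in joint]

def most_likely_class(pattern):
    """Class (numbered from 1) with the largest posterior probability."""
    posterior = posterior_probabilities(pattern)
    return posterior.index(max(posterior)) + 1

for pattern in [(1, 1, 1, 0), (1, 0, 1, 1), (0, 1, 0, 0), (0, 0, 0, 0)]:
    post = posterior_probabilities(pattern)
    print(pattern, [round(p, 3) for p in post], most_likely_class(pattern))
```

For the first student in the file, with responses 1, 1, 1, 0, this gives a class 1 probability of about 0.963, as in the saved file. Small differences in the third decimal for other patterns come from using rounded estimates.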

2.0  Using both categorical and continuous predictor variables

When modeling latent variables, you can use any combination of categorical and continuous variables.  In this example, we will use both categorical and continuous variables.

    title:
      Both categorical and continuous variables
    data:
      file is mplus73recode.dat;
    variable:
      names are u1-u4 rc3 rc4 x1-x10;
      usevar are u1 u2 rc3 rc4 x1 - x5;
      categorical are u1 u2 rc3 rc4;
      classes = grp (2);
    analysis:
      type = mixture;
    plot:
      type is plot3;
      series is  x1-x3(*);

As you can see, the syntax is very similar to the previous example.  We have five continuous variables listed on the usevariables statement (which was shortened to usevar).  The name of the latent class variable was changed to grp (you can name it anything that you want), and we again asked for plots.  Please note that when you request plots, you can specify plots for either categorical or continuous variables, but not for both.  Also, the types of plots available depend on the model specified.  If you specify the model such that the latent classes are determined by one set of predictors and the class membership is determined by a different set of predictors, then you can get a larger variety of graphs.
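
Written out in the notation of section 1.0, the model that this syntax estimates can be sketched as follows. This assumes the Mplus defaults for mixture models, which match the output for this example: class-specific means \(\mu_{jg}\) and thresholds, variances \(\sigma_j^2\) held equal across classes, and all variables uncorrelated within class. The symbols \(u_{vi}\), \(x_{vj}\), \(\mu_{jg}\) and \(\sigma_j^2\) are introduced here for illustration.

$$ f(u_v, x_v) = \sum_{g=1}^{G} \pi_{g} \prod_{i=1}^{4} \pi_{ig}^{u_{vi}} (1-\pi_{ig})^{1-u_{vi}} \prod_{j=1}^{5} \mathcal{N}(x_{vj};\, \mu_{jg}, \sigma_{j}^{2}) $$

Here \(u_{vi}\) is the response of individual \(v\) on binary item \(i\), \(x_{vj}\) is the score on continuous variable \(j\), and \(\mathcal{N}(\cdot;\, \mu, \sigma^2)\) is the normal density with mean \(\mu\) and variance \(\sigma^2\).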

Below is the abbreviated output.

    *** WARNING in MODEL command
      All variables are uncorrelated with all other variables within class.
      Check that this is what is intended.
       1 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS

    Both categorical and continuous variables

    SUMMARY OF ANALYSIS

    Number of groups                                                 1
    Number of observations                                         500

    Number of dependent variables                                    9
    Number of independent variables                                  0
    Number of continuous latent variables                            0
    Number of categorical latent variables                           1

    Observed dependent variables

      Continuous
       X1          X2          X3          X4          X5

      Binary and ordered categorical (ordinal)
       U1          U2          RC3         RC4

    Categorical latent variables
       GRP

    THE MODEL ESTIMATION TERMINATED NORMALLY

    TESTS OF MODEL FIT

    Loglikelihood

              H0 Value                       -4567.250
              H0 Scaling Correction Factor       0.987
                for MLR

    Information Criteria

              Number of Free Parameters             24
              Akaike (AIC)                    9182.500
              Bayesian (BIC)                  9283.651
              Sample-Size Adjusted BIC        9207.474
                (n* = (n + 2) / 24)

    Chi-Square Test of Model Fit for the Binary and Ordered Categorical
    (Ordinal) Outcomes

              Pearson Chi-Square

              Value                              7.629
              Degrees of Freedom                     6
              P-Value                           0.2665

              Likelihood Ratio Chi-Square

              Value                              6.974
              Degrees of Freedom                     6
              P-Value                           0.3233

    FINAL CLASS COUNTS AND PROPORTIONS FOR THE LATENT CLASSES
    BASED ON THE ESTIMATED MODEL

        Latent
       Classes

           1        367.57723          0.73515
           2        132.42277          0.26485

    FINAL CLASS COUNTS AND PROPORTIONS FOR THE LATENT CLASS PATTERNS
    BASED ON ESTIMATED POSTERIOR PROBABILITIES

        Latent
       Classes

           1        367.57724          0.73515
           2        132.42276          0.26485

    CLASSIFICATION QUALITY

         Entropy                         0.998

    CLASSIFICATION OF INDIVIDUALS BASED ON THEIR MOST LIKELY LATENT CLASS MEMBERSHIP

    Class Counts and Proportions

        Latent
       Classes

           1              368          0.73600
           2              132          0.26400

    Average Latent Class Probabilities for Most Likely Latent Class Membership (Row)
    by Latent Class (Column)

               1        2

        1   0.999    0.001
        2   0.000    1.000

    MODEL RESULTS

                                                        Two-Tailed
                        Estimate       S.E.  Est./S.E.    P-Value

    Latent Class 1

     Means
        X1                -2.058      0.055    -37.120      0.000
        X2                -2.061      0.051    -40.653      0.000
        X3                -0.987      0.055    -18.069      0.000
        X4                -0.990      0.052    -19.020      0.000
        X5                -0.040      0.053     -0.759      0.448

     Thresholds
        U1$1               2.021      0.162     12.454      0.000
        U2$1               2.075      0.166     12.521      0.000
        RC3$1              2.075      0.166     12.526      0.000
        RC4$1              1.930      0.157     12.279      0.000

     Variances
        X1                 1.116      0.073     15.348      0.000
        X2                 0.956      0.058     16.600      0.000
        X3                 1.031      0.059     17.382      0.000
        X4                 0.946      0.060     15.722      0.000
        X5                 1.064      0.067     15.762      0.000

    Latent Class 2

     Means
        X1                 1.988      0.091     21.874      0.000
        X2                 1.971      0.087     22.659      0.000
        X3                 0.987      0.081     12.249      0.000
        X4                 0.829      0.080     10.424      0.000
        X5                 0.097      0.095      1.022      0.307

     Thresholds
        U1$1              -2.102      0.283     -7.440      0.000
        U2$1              -1.955      0.266     -7.353      0.000
        RC3$1             -2.268      0.302     -7.516      0.000
        RC4$1             -2.306      0.303     -7.617      0.000

     Variances
        X1                 1.116      0.073     15.348      0.000
        X2                 0.956      0.058     16.600      0.000
        X3                 1.031      0.059     17.382      0.000
        X4                 0.946      0.060     15.722      0.000
        X5                 1.064      0.067     15.762      0.000

    Categorical Latent Variables

     Means
        GRP#1              1.021      0.102     10.056      0.000

    RESULTS IN PROBABILITY SCALE

    Latent Class 1

     U1
        Category 1         0.883      0.017     52.667      0.000
        Category 2         0.117      0.017      6.977      0.000
     U2
        Category 1         0.888      0.016     54.098      0.000
        Category 2         0.112      0.016      6.792      0.000
     RC3
        Category 1         0.888      0.016     54.116      0.000
        Category 2         0.112      0.016      6.794      0.000
     RC4
        Category 1         0.873      0.017     50.202      0.000
        Category 2         0.127      0.017      7.284      0.000

    Latent Class 2

     U1
        Category 1         0.109      0.027      3.972      0.000
        Category 2         0.891      0.027     32.500      0.000
     U2
        Category 1         0.124      0.029      4.294      0.000
        Category 2         0.876      0.029     30.330      0.000
     RC3
        Category 1         0.094      0.026      3.657      0.000
        Category 2         0.906      0.026     35.325      0.000
     RC4
        Category 1         0.091      0.025      3.632      0.000
        Category 2         0.909      0.025     36.447      0.000

    LATENT CLASS ODDS RATIO RESULTS

    Latent Class 1 Compared to Latent Class 2

     U1
        Category > 1       0.016      0.005      3.066      0.002
     U2
        Category > 1       0.018      0.006      3.187      0.001
     RC3
        Category > 1       0.013      0.004      2.906      0.004
     RC4
        Category > 1       0.014      0.005      2.930      0.003
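
The number of free parameters reported for each model can be checked by counting. The sketch below assumes the structure visible in the output: thresholds and means are estimated separately in each class, the variances of the continuous variables are held equal across classes, and one logit is estimated per non-reference class. The function name is illustrative.

```python
import math

def count_free_parameters(n_classes, n_binary, n_continuous):
    """Free parameters of a mixture with binary and continuous indicators."""
    thresholds = n_classes * n_binary   # one threshold per item per class
    means = n_classes * n_continuous    # one mean per variable per class
    variances = n_continuous            # equal across classes
    class_logits = n_classes - 1        # last class is the reference
    return thresholds + means + variances + class_logits

params_model1 = count_free_parameters(n_classes=2, n_binary=4, n_continuous=0)
params_model2 = count_free_parameters(n_classes=2, n_binary=4, n_continuous=5)
print(params_model1, params_model2)

# BIC of the second model from its loglikelihood and parameter count.
bic_model2 = -2 * -4567.250 + params_model2 * math.log(500)
print(f"{bic_model2:.3f}")
```

The counts are 9 for the first model and 24 for the second, matching "Number of Free Parameters" in the two outputs. Note also that the class labels are arbitrary: in this second output, latent class 1 is the larger class with low probabilities of honors math, which is the reverse of the labeling in the first example.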

References

Geiser, C. (2013). Data analysis with Mplus (Methodology in the Social Sciences). New York: Guilford Press.

Oberski, D. (2016). Mixture models: Latent profile and latent class analysis. In Modern statistical methods for HCI (pp. 275-287). Springer, Cham.


Source: https://stats.oarc.ucla.edu/mplus/seminars/mplus-class-notes/lca/
