The 2014 Redesign of the Survey of Income and Program Participation: An Assessment (2018)

Chapter: Appendix A: Joint Distribution of Topic Flags

Suggested Citation: "Appendix A: Joint Distribution of Topic Flags." National Academies of Sciences, Engineering, and Medicine. 2018. The 2014 Redesign of the Survey of Income and Program Participation: An Assessment. Washington, DC: The National Academies Press. doi: 10.17226/24864.

Appendix A

Joint Distribution of Topic Flags

The first step in the multiple imputation process is to estimate the joint probability distribution of the full set of topic flags, conditioned on available survey and administrative data. Accounting for the uncertainty in the estimation of model parameters for the distribution is typically accomplished by alternating draws from the posterior distribution of the parameters $\theta$, given the fully observed data $X$, the observed data $Y_{\text{obs}}$, and the most recent draw of the missing data $Y_{\text{imp}}^{(t)}$:

$$
\theta^{(t)} \sim p(\theta \mid Y_{\text{obs}}, Y_{\text{imp}}^{(t)}, X)
$$

where a draw of the missing data, given the most recent parameter draw $\theta^{(t)}$, is

$$
Y_{\text{imp}}^{(t+1)} \sim p(Y_{\text{mis}} \mid Y_{\text{obs}}, X, \theta^{(t)})
$$

(see Rubin, 1987).
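The alternation between parameter draws and missing-data draws can be illustrated with a deliberately small sketch. It is not the SIPP model: it assumes a single binary flag, no covariates $X$, and a uniform Beta(1, 1) prior on $\theta$, so that the posterior in the first step is a Beta distribution. The function name `data_augmentation` and the use of `None` to mark missing values are choices made for this illustration only.

```python
import random


def data_augmentation(y, n_iter=100, seed=0):
    """Alternate parameter and missing-data draws for one binary flag.

    y: list of 0/1 values, with None marking missing entries (an assumption
    of this sketch). Returns the completed data after the last iteration
    and the list of theta draws.
    """
    rng = random.Random(seed)
    missing = [i for i, v in enumerate(y) if v is None]
    completed = [0 if v is None else v for v in y]  # crude starting values
    thetas = []
    for _ in range(n_iter):
        # theta(t) ~ p(theta | Y_obs, Y_imp(t)): Beta posterior, uniform prior
        ones = sum(completed)
        theta = rng.betavariate(1 + ones, 1 + len(completed) - ones)
        thetas.append(theta)
        # Y_imp(t+1) ~ p(Y_mis | Y_obs, theta(t)): independent Bernoulli draws
        for i in missing:
            completed[i] = int(rng.random() < theta)
    return completed, thetas
```

Calling `data_augmentation([1, 0, None, 1, None], n_iter=50)` returns a completed list in which the observed entries are unchanged and only the two missing entries have been drawn, together with 50 draws of $\theta$.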

In practice, difficulties can arise when $Y_{\text{mis}}$ is multidimensional and

$$
p(Y_{\text{mis}} \mid Y_{\text{obs}}, X, \theta) = p(Y_{1,\text{mis}},\ldots,Y_{k,\text{mis}} \mid Y_{\text{obs}}, X, \theta)
$$

is high-dimensional (large $k$) and/or not in a known form (e.g., if $Y_{1,\text{mis}},\ldots,Y_{k,\text{mis}}$ are a mix of different distributions without known joint form). If missingness is monotonic (i.e., if $Y_2$ is missing only when $Y_1$ is missing, $Y_3$ is missing only when $Y_1$ and $Y_2$ are missing, and so forth, through $Y_k$), the joint distribution can be decomposed as:

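$$
\begin{aligned}
p(Y_{1,\text{mis}},\ldots,Y_{k,\text{mis}} \mid Y_{\text{obs}}, X, \theta)
&= p(Y_{1,\text{mis}} \mid Y_{\text{obs}}, X, \theta)\, p(Y_{2,\text{mis}},\ldots,Y_{k,\text{mis}} \mid Y_{\text{obs}}, Y_{1,\text{mis}}, X, \theta) \\
&= p(Y_{1,\text{mis}} \mid Y_{\text{obs}}, X, \theta)\, p(Y_{2,\text{mis}} \mid Y_{\text{obs}}, Y_{1,\text{mis}}, X, \theta) \cdots p(Y_{k,\text{mis}} \mid Y_{\text{obs}}, Y_{1,\text{mis}},\ldots,Y_{k-1,\text{mis}}, X, \theta)
\end{aligned}
$$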

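and draws are obtained sequentially, each conditioning on the draws already made:

$$
Y_{1,\text{imp}}^{(t+1)} \sim p(Y_{1,\text{mis}} \mid Y_{\text{obs}}, X, \theta^{(t)}), \quad
Y_{2,\text{imp}}^{(t+1)} \sim p(Y_{2,\text{mis}} \mid Y_{\text{obs}}, Y_{1,\text{imp}}^{(t+1)}, X, \theta^{(t)}), \quad \text{etc.}
$$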
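Whether a data set satisfies the monotone condition as stated above can be checked mechanically. The following is a minimal sketch, assuming each record is a list ordered $Y_1,\ldots,Y_k$ with `None` marking a missing value; the function name `is_monotone` is hypothetical. Under the stated definition, the missing entries in every record must form a leading block: once an observed value appears, no later variable may be missing.

```python
def is_monotone(rows):
    """Return True if, in every row, Y_j is missing (None) only when
    Y_1, ..., Y_{j-1} are all missing, i.e. missing values form a prefix."""
    for row in rows:
        seen_observed = False
        for value in row:
            if value is None and seen_observed:
                # A missing value follows an observed one: not monotone
                return False
            if value is not None:
                seen_observed = True
    return True


print(is_monotone([[None, None, 1], [None, 0, 1], [1, 0, 1]]))  # True
print(is_monotone([[1, None, 0]]))                              # False
```

The check depends on the column order. A pattern that fails in one ordering may pass after the variables are reordered, so a full test would also search over orderings, which this sketch does not do.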

However, the missingness in SIPP is not monotonic because the 25 topic flags display many different patterns of missing data. Sequential regression multiple imputation (SRMI; see Raghunathan et al., 2001) therefore provides an alternative: it replaces the direct draw $Y_{\text{imp}}^{(t+1)} \sim p(Y_{\text{mis}} \mid Y_{\text{obs}}, X, \theta^{(t)})$ with a series of conditional imputations:

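$$
\begin{aligned}
Y_{1,\text{imp}}^{(t+1)} &\sim p(Y_{1,\text{mis}} \mid Y_{\text{obs}}, Y_{2,\text{imp}}^{(t)},\ldots,Y_{k,\text{imp}}^{(t)}, X, \theta^{(t)}) \\
&\;\;\vdots \\
Y_{k,\text{imp}}^{(t+1)} &\sim p(Y_{k,\text{mis}} \mid Y_{\text{obs}}, Y_{1,\text{imp}}^{(t+1)},\ldots,Y_{k-1,\text{imp}}^{(t+1)}, X, \theta^{(t)}).
\end{aligned}
$$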

The SIPP processing system implements SRMI using T = 5 iterations. Imputation for the topic flags is conducted using logistic regression models, stratified on demographic factors, where subject matter experts designed the details of the model for each content flag. An important point, discussed in Chapter 5, is that although SIPP documentation refers to SRMI (sequential regression multiple imputation), only a single imputation is provided.
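The cycling described above can be sketched in a few lines of standard-library Python. This is an illustration in the spirit of SRMI, not the SIPP production system: it assumes binary flags with `None` for missing values, uses only the other flags as predictors (no covariates $X$ and no demographic stratification), fits each logistic regression by plain gradient ascent, and approximates the parameter draw by refitting on a bootstrap resample of the observed cases rather than sampling from a posterior. All function names are hypothetical, and every flag is assumed to have at least one observed value.

```python
import math
import random


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def predict(beta, x):
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))


def fit_logistic(rows, targets, steps=200, lr=0.5):
    """Logistic regression by gradient ascent; returns [intercept, coefs...]."""
    beta = [0.0] * (len(rows[0]) + 1)
    for _ in range(steps):
        grad = [0.0] * len(beta)
        for x, y in zip(rows, targets):
            resid = y - predict(beta, x)
            grad[0] += resid
            for j, v in enumerate(x):
                grad[j + 1] += resid * v
        beta = [b + lr * g / len(rows) for b, g in zip(beta, grad)]
    return beta


def srmi(data, n_iter=5, seed=0):
    """One completed data set after n_iter cycles through all flags."""
    rng = random.Random(seed)
    k = len(data[0])
    missing = [[v is None for v in row] for row in data]
    filled = [row[:] for row in data]
    # Starting values: draw from each flag's observed marginal rate
    for j in range(k):
        obs = [row[j] for row in data if row[j] is not None]
        rate = sum(obs) / len(obs)
        for i, row in enumerate(filled):
            if missing[i][j]:
                row[j] = int(rng.random() < rate)
    for _ in range(n_iter):
        for j in range(k):
            # Flags before j already hold this cycle's draws; flags after j
            # still hold the previous cycle's draws.
            observed = [i for i in range(len(data)) if not missing[i][j]]
            boot = [rng.choice(observed) for _ in observed]  # stand-in for a theta draw
            beta = fit_logistic(
                [filled[i][:j] + filled[i][j + 1:] for i in boot],
                [filled[i][j] for i in boot],
            )
            for i, row in enumerate(filled):
                if missing[i][j]:
                    row[j] = int(rng.random() < predict(beta, row[:j] + row[j + 1:]))
    return filled
```

Running `srmi(data, n_iter=5)` once yields a single completed data set. Multiple imputation in the usual sense would repeat the whole procedure with different seeds and retain each completed data set.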
