Assignment Task
Overview
For the take-home project, you will use the provided simulated survey data to write a short paper that produces analyses similar to those in Dutz et al. (2021). The goal of your paper will be to test and account for nonresponse bias in a survey about labor market conditions. The survey randomized participation incentives and was also linked to full-population administrative data, and you will use these features to test for nonresponse bias and to correct for this bias under various approaches that point identify and bound population means for the outcomes of interest. In doing so, your task will be to clearly (and succinctly) develop your analysis and apply ideas closely related to those you’ve learned throughout the quarter. After first reading the project assignment, you are highly encouraged to read the paper.
The provided dataset consists of a random sample of US residents aged 18 or older who were invited to participate in a survey about labor market outcomes. Individuals in the sample were randomized into one of two incentive levels: low (Z = 0) and high (Z = 1). Participation in the survey is denoted by R. The survey was conducted over two time periods, T = 0, 1. For participants, the variable ‘time’ records the period T in which the individual participated; it is NA for non-participants.
The survey elicited two outcomes, observed only for participants: 1. an indicator for the individual being on welfare (denoted ‘s_welfare’), and 2. the individual’s weekly earnings (denoted ‘s_earnings’). Additionally, in part to assess the validity of the survey, the surveyors partnered with the US Department of Health and Human Services to link all individuals in the sample to administrative data, which includes actual welfare receipt (denoted ‘a_welfare’) and weekly earnings (denoted ‘a_earnings’), and a binary covariate which we will take to be education level (low or high, denoted ‘a_educ’).
Part I: Introduction and Data
A (very!) short introduction describing the data and your key findings.
Part II: Participation rates and incentives
In this section, you are asked to investigate the setting and examine participation rates. At a minimum, you should:
- Examine the validity of random assignment of incentives
- Examine the effect of incentives on participation rates
- Examine the potential for misreporting using the administrative data (can you also implement a test that uses only survey data and does not require administrative data?)
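The checks listed above can be sketched as simple comparisons of means. The following is a minimal illustration rather than a prescribed solution. It assumes each individual is stored as a dict with keys `Z`, `R`, `a_educ`, `s_welfare`, and `a_welfare`; the toy records and the helper names `diff_in_means` and `misreport_rate` are hypothetical and not part of the provided dataset.

```python
from math import sqrt
from statistics import mean, variance

# Toy records, made up for illustration only.
# Survey answers are None for non-participants (R == 0).
rows = [
    {"Z": 0, "R": 1, "a_educ": 1, "s_welfare": 0, "a_welfare": 0},
    {"Z": 0, "R": 0, "a_educ": 0, "s_welfare": None, "a_welfare": 1},
    {"Z": 0, "R": 0, "a_educ": 1, "s_welfare": None, "a_welfare": 0},
    {"Z": 0, "R": 1, "a_educ": 0, "s_welfare": 0, "a_welfare": 1},
    {"Z": 1, "R": 1, "a_educ": 1, "s_welfare": 0, "a_welfare": 0},
    {"Z": 1, "R": 1, "a_educ": 0, "s_welfare": 1, "a_welfare": 1},
    {"Z": 1, "R": 1, "a_educ": 1, "s_welfare": 0, "a_welfare": 0},
    {"Z": 1, "R": 0, "a_educ": 0, "s_welfare": None, "a_welfare": 1},
]


def diff_in_means(rows, var, group="Z"):
    """Mean of `var` in group 1 minus group 0, with an unequal-variance SE."""
    g1 = [r[var] for r in rows if r[group] == 1]
    g0 = [r[var] for r in rows if r[group] == 0]
    diff = mean(g1) - mean(g0)
    se = sqrt(variance(g1) / len(g1) + variance(g0) / len(g0))
    return diff, se


def misreport_rate(rows, survey_var, admin_var):
    """Share of participants whose survey answer differs from the admin record."""
    participants = [r for r in rows if r["R"] == 1]
    return mean([int(r[survey_var] != r[admin_var]) for r in participants])


# Balance check: a pre-determined covariate should not differ across arms.
balance_diff, balance_se = diff_in_means(rows, "a_educ")
# Effect of the high incentive on participation.
first_stage, first_stage_se = diff_in_means(rows, "R")
# Misreporting of welfare receipt among participants.
welfare_misreport = misreport_rate(rows, "s_welfare", "a_welfare")

print(balance_diff, first_stage, welfare_misreport)
```

With real data, the ratio of each difference to its standard error gives a rough t-statistic; a regression-based version would be an equally reasonable choice.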
Part III: Non-response bias and selection
In this section, you will investigate whether your survey may be contaminated with non-response bias, and will also examine the potential for selection. At a minimum, you should:
- Define a test for non-response bias and implement it
- Define a test of selection using only survey data and implement it
- Discuss in what sense the test for selection is also a test for non-response bias
- Investigate the extent to which non-response bias and selection may be due to selection on observables versus selection on unobservables
- Under a monotonicity assumption (which you should state), use survey data to identify mean outcomes for always-takers and compliers and discuss your findings
In the subsequent parts, you should restrict your analysis to the administrative outcomes: ‘a_welfare’ and ‘a_earnings’.
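The always-taker and complier means mentioned above can be sketched as follows. This is a sketch under stated assumptions, not the required solution. It assumes monotonicity in the form R(1) >= R(0), so that participants with Z = 0 are always-takers and participants with Z = 1 are a mix of always-takers and compliers. Under that assumption, E[Y | always-taker] = E[Y | R = 1, Z = 0] and E[Y | complier] = (E[YR | Z = 1] - E[YR | Z = 0]) / (E[R | Z = 1] - E[R | Z = 0]). The record layout, the toy values, and the function name are hypothetical.

```python
from statistics import mean

# Toy records, made up for illustration; earnings are None for non-participants.
rows = [
    {"Z": 0, "R": 1, "s_earnings": 600.0},
    {"Z": 0, "R": 1, "s_earnings": 400.0},
    {"Z": 0, "R": 0, "s_earnings": None},
    {"Z": 0, "R": 0, "s_earnings": None},
    {"Z": 1, "R": 1, "s_earnings": 500.0},
    {"Z": 1, "R": 1, "s_earnings": 500.0},
    {"Z": 1, "R": 1, "s_earnings": 200.0},
    {"Z": 1, "R": 0, "s_earnings": None},
]


def group_means_under_monotonicity(rows, outcome):
    """Return (always-taker mean, complier mean), assuming R(1) >= R(0)."""
    z1 = [r for r in rows if r["Z"] == 1]
    z0 = [r for r in rows if r["Z"] == 0]
    # Participation rates by incentive arm.
    p1 = mean([r["R"] for r in z1])
    p0 = mean([r["R"] for r in z0])
    # E[Y * R | Z = z]: non-participants contribute zero to the product.
    yr1 = mean([r[outcome] if r["R"] == 1 else 0.0 for r in z1])
    yr0 = mean([r[outcome] if r["R"] == 1 else 0.0 for r in z0])
    always_taker = yr0 / p0
    complier = (yr1 - yr0) / (p1 - p0)
    return always_taker, complier


at_mean, c_mean = group_means_under_monotonicity(rows, "s_earnings")
print(at_mean, c_mean)
```

In this toy sample the Z = 1 participant mean (400) is the share-weighted average of the two group means, which is a useful sanity check on real data as well.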
Part IV: Correcting for non-response bias
In this section, you will implement various approaches to correcting for non-response bias. You will consider both bounds and point estimates of the population mean under various assumptions, and are not required to perform inference. Your analysis should include both assumptions that do not require an explicit selection model and assumptions that do. In implementing these methods for the two administrative outcomes, you should use only data on participants. However, you should provide the true population mean as a reference to assess the performance of the methods.
Without an explicit selection model
Implement the following assumptions, making sure to discuss how you implement them and your findings:
- Worst-case bounds
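As an illustration of worst-case bounds, the sketch below replaces the unobserved outcomes of non-participants with the smallest and largest logically possible values. It is a minimal example, not the required implementation. The record layout, the toy values, and the function name are hypothetical. For a binary outcome such as welfare receipt the support [0, 1] is known; for earnings, the limits `y_min` and `y_max` would have to be assumed by the analyst.

```python
from statistics import mean

# Toy records, made up for illustration. The admin outcome is known for
# everyone, but the bounds below use it only for participants (R == 1).
rows = [
    {"R": 1, "a_welfare": 1},
    {"R": 1, "a_welfare": 0},
    {"R": 1, "a_welfare": 0},
    {"R": 0, "a_welfare": 1},
    {"R": 0, "a_welfare": 0},
]


def worst_case_bounds(rows, outcome, y_min, y_max):
    """Bounds on the population mean when non-participants' outcomes
    are only known to lie in [y_min, y_max]."""
    p = mean([r["R"] for r in rows])  # participation rate
    y_part = mean([r[outcome] for r in rows if r["R"] == 1])
    lower = p * y_part + (1 - p) * y_min
    upper = p * y_part + (1 - p) * y_max
    return lower, upper


lower, upper = worst_case_bounds(rows, "a_welfare", y_min=0.0, y_max=1.0)
# Reference value only: the true mean uses everyone's admin outcome.
true_mean = mean([r["a_welfare"] for r in rows])
print(lower, true_mean, upper)
```

The width of the interval equals (1 - p) * (y_max - y_min), so the bounds tighten only as participation rises or the assumed support narrows.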
Part V: Revisiting the selection model
Conclude by considering the double threshold model introduced in Section 6 of Dutz et al. (2021). In particular:
- Supposing selection follows the double threshold model, assess the validity of the various assumptions imposed in the previous section (your analysis may be analytical, similar in flavor to PSET3Q3)
- Using both the assignment of incentives and the timing of participation, show evidence in favor of the claim that individuals who participate later differ from individuals who participate with higher ince