Classes 8-9: Diff-in-diff

# In-person session 8

**October 11, 2021**

]

---

# Plan for today

.box-3.medium.sp-after-half[Econ Nobel!]

.box-6.medium.sp-after-half[Sensitivity analysis]

.box-5.medium.sp-after-half[Diff-in-diff FAQs]

---

layout: false
name: econ-nobel
class: center middle section-title section-title-3 animated fadeIn

# Econ Nobel

---

---

.center[
<figure>
 <img src="img/08-class/2021-nobel-winners.jpg" alt="2021 econ Nobel winners" title="2021 econ Nobel winners" width="55%">
</figure>
]

???

- Card (and Krueger): NJ/PA minimum wage + the beginning of this whole credibility revolution thing
- Angrist: MHE and MM and making causal inference accessible
- Imbens: A ton of CI stuff + attempting to bridge DAG world with situation-based world

- https://twitter.com/NobelPrize/status/1447502627114205187 - PA/NJ
- https://twitter.com/MaxCRoser/status/1447505582450151431
- https://twitter.com/Stanford/status/1447549033539637248

---

.center[
<figure>
 <img src="img/08-class/alan-krueger.jpg" alt="Alan Krueger" title="Alan Krueger" width="80%">
</figure>
]

---

.center[
<figure>
 <img src="img/08-class/pa-nj-nobel.jpg" alt="Nobel PA/NJ" title="Nobel PA/NJ" width="57%">
</figure>
]

---

.center[
<figure>
 <img src="img/08-class/blackwell-casual.png" alt="NPR casual inference" title="NPR casual inference" width="100%">
</figure>
]

???

https://twitter.com/matt_blackwell/status/1447518447882096642

---

layout: false
name: sensitivity
class: center middle section-title section-title-6 animated fadeIn

# Sensitivity analysis

---

---

.box-6.medium.sp-after[How do we know when we've got the right confounders in our DAG?]

.box-6.medium[How do we solve the fact that we have so many unknowns in our DAG?]

---

.center[
<figure>
 <img src="img/08-class/2020-2021-meme-garnick-1.jpg" alt="OVB" title="OVB" width="40%">
</figure>
]

???

https://owenozier.github.io/teaching/2020-2021-memes

---

layout: false
name: faqs
class: center middle section-title section-title-5 animated fadeIn

# Diff-in-diff FAQs

---

---

.box-5.large[Design-based vs. model-based inference]

---

---

# Identification strategies

.box-inv-5.small.sp-after[The goal of *all* these methods is to isolate (or **identify**) the arrow between treatment → outcome]

.box-inv-5.less-medium[Model-based identification]

.float-left.center[.box-5[DAGs] .box-5[Matching] .box-5[Inverse probability weighting]]

.box-inv-5.less-medium.sp-before[Design-based identification]

.float-left.center[.box-5[Randomized controlled trials] .box-5[Difference-in-differences]]

.float-left.center[.box-5[Regression discontinuity] .box-5[Instrumental variables]]

---

# Model-based identification

.pull-left[
<figure>
 <img src="04-slides_files/figure-html/edu-earn-adjust-1.png" alt="Education earnings DAG" title="Education earnings DAG" width="100%">
</figure>
]

.box-inv-5.small[Everything that needs to be adjusted is measurable; no unobserved confounding]

.box-inv-5.small[**Big assumption!**]

.box-inv-5.tiny[This is why lots of people don't like DAG-based adjustment]
]

---

# Design-based identification

.box-inv-5.small[Use randomization to remove confounding]

.center[
<figure>
 <img src="05-slides_files/figure-html/experimental-dag-1.png" alt="RCT DAG" title="RCT DAG" width="60%">
</figure>
]
]

.box-inv-5.small[Use before/after & treatment/control differences to remove confounding]

.center[
<figure>
 <img src="08-slides_files/figure-html/min-wage-dag-1.png" alt="Diff-in-diff DAG" title="Diff-in-diff DAG" width="90%">
</figure>
]
]

---

---

.box-5.large[Which is better or more credible? RCTs, quasi experiments, or DAG-based models?]

---

.center[
<figure>
 <img src="img/08-class/causality-continuum.png" alt="The (wrong!) causality continuum" title="The (wrong!) causality continuum" width="90%">
</figure>
]

---

.box-5.huge[There's no hierarchy!]

---

.box-5.large[Can we talk more about interaction terms and how to interpret them?]

.box-5[Are interaction effects in regression always more accurate of a difference than running a "regular" regression without them?]

---

.box-5.large[Can causal effects be negative or are they always positive?]

---

.center[
<figure>
 <img src="img/08-class/lambeth-southwark-vauxhall.jpg" alt="Lambeth and Southwark-Vauxhall" title="Lambeth and Southwark-Vauxhall" width="70%">
</figure>
]

---

]

---

.box-5.large[Multiple adjustment sets]

???

DAG from exam 1 and why there are multiple adjustment sets

---

.box-5.large[Where do we get all this data?]

---

.center[
<figure>
 <img src="img/08-class/file-not-found.png" alt="The Verge 'File Not Found'" title="The Verge 'File Not Found'" width="100%">
</figure>
]

???

https://www.theverge.com/22684730/students-file-folder-directory-structure-education-gen-z

- Absolute vs. relative paths
- Best practices for project structures
- File types - jpg/png/pdf, CSV vs. Excel, Word vs. txt vs. md vs. Rmd
- Downloading files from the internet and using them in Rmd - dagitty specifically

---

.box-5.large[Project structures]

.box-inv-5[[Another approach](https://datacarpentry.org/R-ecology-lesson/00-before-we-start.html#Organizing_your_working_directory)]

---

.box-5.large[File types]

.box-inv-5[[Image types slides](https://datavizs21.classes.andrewheiss.com/slides/02-slides.html#57)]

---

.center[
<figure>
 <img src="img/08-class/bedtime-math.png" alt="Bedtime math" title="Bedtime math" width="45%">
</figure>
]

---

.center[
<figure>
 <img src="img/08-class/bedtime-math-diff-diff.png" alt="Bedtime math diff-in-diff" title="Bedtime math diff-in-diff" width="100%">
</figure>
]

---

.box-5.medium[If the control group changes in the same way, and the causal effect was zero, would we say that the treatment didn't work?]

---

.box-5.medium[When doing your subtracting to get your differences in the matrix, is it better to do the vertical or horizontal subtractions?]

.box-5.medium[Are there situations where one is preferable to the other?]

---

.box-5.medium[Why are we learning two ways to do diff-in-diff? (2x2 matrix vs. `lm()`)]

---

.box-5.less-medium[What group level is best for comparison? For example, if we are looking at policy change in NJ, is it best to compare with just one or two similar states? How similar do the populations need to be?]

.box-5.medium.sp-after[Wouldn't matching be better?]

.box-5.less-medium[Do we have to think about balance when dealing with observational data in diff in diff?]

.box-inv-5[[Two-way fixed effects (TWFE)](https://www.andrewheiss.com/blog/2021/08/25/twfe-diagnostics/)]

???

- Multiple states/groups are possible - that's TWFE
- Wouldn't matching be better? Sure, if you're doing state-level stuff. But their data was restaurant level

- Balance: Maybe. With just two states/villages/countries/whatever, yes. With lots, the state/year fixed effects pick up those trends for you

---

.box-5.large[Minimum legal drinking age]

---

`$$\begin{aligned}
\text{Mortality}\ =&\ \beta_0 + \beta_1\ \text{Alabama} + \beta_2\ \text{After 1975}\ + \\
&\ \beta_3\ (\text{Alabama} \times \text{After 1975})
\end{aligned}$$`

---

`$$\text{Mortality}\ =\ \beta_0 + \beta_1\ \text{Treatment} + \beta_2\ \text{State} + \beta_3\ \text{Year}$$`

---

`$$\begin{aligned}
\text{Mortality}\ =&\ \beta_0 + \beta_1\ \text{Treatment} + \beta_2\ \text{State}\ + \\
&\ \beta_3\ \text{Year} + \beta_4\ (\text{State} \times \text{Year})
\end{aligned}$$`

---

.center[
<figure>
 <img src="img/08-class/mm-tbl-5-2.png" alt="Mastering Metrics Table 5.2" title="Mastering Metrics Table 5.2" width="55%">
</figure>
]

---

.center[
<figure>
 <img src="img/08-class/mm-fig-5-4.png" alt="Mastering Metrics Figure 5.4" title="Mastering Metrics Figure 5.4" width="65%">
</figure>
]

---

.center[
<figure>
 <img src="img/08-class/mm-fig-5-5.png" alt="Mastering Metrics Figure 5.5" title="Mastering Metrics Figure 5.5" width="65%">
</figure>
]

---

.center[
<figure>
 <img src="img/08-class/mm-fig-5-6.png" alt="Mastering Metrics Figure 5.6" title="Mastering Metrics Figure 5.6" width="65%">
</figure>
]

---

.box-5.large[What happened to confounding??]

.box-5.large[Now we're only looking at just two "confounders"?]

???

The parallel trends assumption takes care of that

---

.box-5.medium[Is it reasonable to conduct sensitivity analysis when working with diff in diff?]

---

.box-5.large[How do we play with time to check for parallel trends?]

---

.box-5.large[What about this staggered treatment stuff?]

???

This is good for ethical reasons!

Blog post