# 7 - Post-Estimation Diagnostics for 2SLS Regression

Some authors note that diagnostics for 2SLS are a commonly neglected topic. The book [Regression Diagnostics | Wiley Series in Probability and Statistics](https://onlinelibrary.wiley.com/doi/book/10.1002/0471725153) nevertheless describes some strategies.

### 7.1 - The Three Default ivreg Diagnostics

Since my own expertise is limited, I simply rely on the diagnostic tests that ivreg reports for the fitted model:

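* Weak instruments: tests how strongly the instrument is related to the endogenous regressor (the exposure X). It checks relevance only. The null hypothesis is that the instrument is weak, so we want a large test statistic and a small p-value, which lets us reject the null. In our results, nearcollege comes out as a strong instrument.
* Wu-Hausman test: tests for endogeneity, i.e. whether the regressor is correlated with the error term. If the regressor is exogenous, OLS and 2SLS are both consistent and should give similar estimates; if it is endogenous, OLS is inconsistent while 2SLS remains consistent. In this example the test is significant, so we reject the null of exogeneity and treat education as endogenous. The OLS estimate is 29.655 and the 2SLS estimate is 107.57; the two disagree, and we prefer the 2SLS result.
* Sargan test: only available when there are more instruments than endogenous regressors. A significant result suggests that at least one instrument is invalid. Our example has a single instrument, so the test does not apply.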
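To make the first two tests more concrete, here is a minimal base-R sketch of what they compute in the one-instrument, one-regressor case. It runs on simulated data rather than on `my_data`; the variable names (`z`, `x`, `y`, `u`) and all parameter values are assumptions chosen for illustration, and the Wu-Hausman test is shown in its regression form (adding the first-stage residuals to the structural equation).

```r
# Simulated data; every name and parameter value here is an illustrative assumption.
set.seed(1)
n <- 2000
z <- rbinom(n, 1, 0.5)                          # instrument (role of nearcollege)
u <- rnorm(n)                                   # unobserved confounder
x <- 12 + 1.5 * z + u + rnorm(n)                # endogenous regressor (role of education)
y <- 100 + 50 * x - 80 * u + rnorm(n, sd = 20)  # outcome (role of wage)

# Weak-instruments test: F test of the instrument in the first-stage regression
stage1 <- lm(x ~ z)
stage1_null <- lm(x ~ 1)
weak_F <- anova(stage1_null, stage1)$F[2]

# Wu-Hausman test (regression form): add the first-stage residuals to the
# structural equation; a significant coefficient indicates endogeneity
v_hat <- residuals(stage1)
aug <- lm(y ~ x + v_hat)
wu_hausman_p <- summary(aug)$coefficients["v_hat", "Pr(>|t|)"]

# OLS and 2SLS slopes for comparison
# (with a single instrument, the 2SLS slope is a ratio of covariances)
beta_ols <- unname(coef(lm(y ~ x))["x"])
beta_iv <- cov(z, y) / cov(z, x)

print(c(weak_F = weak_F, wu_hausman_p = wu_hausman_p,
        beta_ols = beta_ols, beta_iv = beta_iv))
```

In this simulation the confounder `u` pushes the OLS slope away from the true value of 50, while the instrumented slope stays close to it, which is the pattern the Wu-Hausman test is designed to detect.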

### 7.2 - Unusual-Data Diagnostics

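Unusual-data diagnostics look for observations with outsized influence, for example by deleting each observation in turn and checking how much the refitted regression differs from the original fit.

ivreg provides a convenient built-in method for this: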

```
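library(ivreg)

# Fit the 2SLS model: wage on education, instrumented by nearcollege
ivreg_iv <- ivreg(wage ~ education | nearcollege, data = my_data)

# Case-deletion diagnostics; influence() dispatches to the ivreg method
tmp <- influence(ivreg_iv)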

> names(tmp)
[1] "model"        "coefficients" "dfbeta"      
[4] "sigma"        "dffits"       "cookd"       
[7] "hatvalues"    "rstudent"     "df.residual" 
```

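It returns a list holding several case-deletion diagnostics, including dfbeta, dffits, Cook's distances, hat values, and studentized residuals.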

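Alternatively, we can look directly at how the fit changes when a single observation is dropped. Here `ols` and `tsls2` are the manual fits built in section 7.4, and `tsls2.20` is the second-stage fit without observation 20: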

```
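> # Refit the manual second stage without observation 20
> tsls2.20 <- lm(my_data$wage ~ d.hat, subset = -20)
> car::compareCoefs(ols, tsls2, tsls2.20)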
Calls:
1: lm(formula = wage ~ education, data = my_data)
2: lm(formula = my_data$wage ~ d.hat)
3: lm(formula = my_data$wage ~ d.hat, subset = -20)

            Model 1 Model 2 Model 3
(Intercept)   183.9  -849.5  -849.2
SE             23.1   162.7   162.7
                                   
education     29.66                
SE             1.71                
                                   
d.hat                 107.6   107.5
SE                     12.3    12.3
```
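The same idea can be applied to every observation rather than just one. Below is a minimal base-R sketch, assuming simulated data and the hypothetical helper `iv_slope()`, that refits the one-instrument 2SLS slope with each observation left out in turn. Unlike the `subset = -20` comparison, it recomputes both stages for every deletion.

```r
# Simulated data; names and parameter values are illustrative assumptions.
set.seed(2)
n <- 200
z <- rbinom(n, 1, 0.5)                          # instrument
u <- rnorm(n)                                   # unobserved confounder
x <- 12 + 1.5 * z + u + rnorm(n)                # endogenous regressor
y <- 100 + 50 * x - 80 * u + rnorm(n, sd = 20)  # outcome

# Hypothetical helper: 2SLS slope for one instrument and one regressor,
# which reduces to cov(z, y) / cov(z, x)
iv_slope <- function(y, x, z) cov(z, y) / cov(z, x)

beta_full <- iv_slope(y, x, z)

# Leave-one-out refits: drop observation i and recompute the slope
beta_loo <- vapply(
  seq_len(n),
  function(i) iv_slope(y[-i], x[-i], z[-i]),
  numeric(1)
)

# Change in the slope caused by dropping each observation
slope_change <- beta_loo - beta_full

# Indices of the five observations whose removal shifts the slope the most
most_influential <- head(order(abs(slope_change), decreasing = TRUE), 5)
print(most_influential)
print(summary(slope_change))
```

Observations at the top of `most_influential` are the ones worth inspecting by hand; a slope that barely moves under any single deletion suggests the estimate is not driven by one case.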

### 7.3 - Nonlinearity Diagnostics

For details, see: [Diagnostics for 2SLS Regression • ivreg (john-d-fox.github.io)](https://john-d-fox.github.io/ivreg/articles/Diagnostics-for-2SLS-Regression.html#nonlinearity-diagnostics-1)

Nonlinearity diagnostics go well beyond my expertise, so I only show the graphical output that R produces:

```
# Nonlinearity Diagnostics
car::crPlots(ivreg_iv, smooth=list(span=1))
```

![](https://gitee.com/mugpeng/my-gallery-01/raw/master/img01_picgo2/20220111091350.png)

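The plot above is a component-plus-residual plot from crPlots; car also provides ceresPlots as a related display.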

### 7.4 - Other Topics

We can also display some of the results graphically:

```
par(mfrow=c(2, 2))
plot(ivreg_iv)
```

![](https://gitee.com/mugpeng/my-gallery-01/raw/master/img01_picgo2/20220111084008.png)

Or compare the different fits side by side:

```
# ols
ols <- lm(formula = wage ~ education, data = my_data)

# stage 1

tsls1 <- lm(formula = education ~ nearcollege, data = my_data)
summary(tsls1)

d.hat <- fitted.values(tsls1) # fitted education value for each observation

# stage 2
tsls2 <- lm(formula = my_data$wage ~ d.hat)
summary(tsls2)

car::compareCoefs(ols, tsls2)

Calls:
1: lm(formula = wage ~ education, data = my_data)
2: lm(formula = my_data$wage ~ d.hat)

            Model 1 Model 2
(Intercept)   183.9  -849.5
SE             23.1   162.7
                           
education     29.66        
SE             1.71        
                           
d.hat                 107.6
SE                     12.3
```
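One caveat when reading this comparison: a second stage run by hand with `lm()` reproduces the 2SLS coefficients, but its reported standard errors are based on residuals computed from the fitted values `d.hat` rather than from the observed regressor, so they are not the usual 2SLS standard errors. The sketch below, on simulated data with assumed names and parameter values, shows how the conventional 2SLS standard error can be recovered from the manual fit.

```r
# Simulated data; names and parameter values are illustrative assumptions.
set.seed(3)
n <- 1000
z <- rbinom(n, 1, 0.5)                          # instrument
u <- rnorm(n)                                   # unobserved confounder
x <- 12 + 1.5 * z + u + rnorm(n)                # endogenous regressor
y <- 100 + 50 * x - 80 * u + rnorm(n, sd = 20)  # outcome

# Manual two-stage fit
d_hat <- fitted(lm(x ~ z))
stage2 <- lm(y ~ d_hat)
b <- coef(stage2)

# Naive standard error: lm() uses residuals y - b0 - b1 * d_hat
se_naive <- summary(stage2)$coefficients["d_hat", "Std. Error"]

# 2SLS standard error: same design matrix (1, d_hat), but the residual
# variance comes from y - b0 - b1 * x, using the observed regressor
res_2sls <- y - b[1] - b[2] * x
sigma2_2sls <- sum(res_2sls^2) / (n - 2)
X_hat <- cbind(1, d_hat)
vcov_2sls <- sigma2_2sls * solve(crossprod(X_hat))
se_2sls <- unname(sqrt(diag(vcov_2sls))[2])

print(c(slope = unname(b[2]), se_naive = se_naive, se_2sls = se_2sls))
```

Fitting the model with `ivreg()` directly avoids this bookkeeping, which is one reason to treat the manual two-stage fit as a teaching device rather than as the final estimate.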

