# 6-R语言最小二乘法实战

## 6-R语言最小二乘法实战

其实R 包`ivreg`已经提供了方法进行最小二乘法（2SLS），不过为了了解整个R 的操作过程，这里我还是使用最基本的lm 进行二次拟合，实现整个过程。

### 6.1-准备与获得数据

这里我使用ivreg 中的数据集：

```
my_packages<- c("ggplot2", "tidyverse", 
                "RColorBrewer", "paletteer", "ivreg")
tmp <- sapply(my_packages, function(x) library(x, character.only = T)); rm(tmp, my_packages)
data("SchoolingReturns", package = "ivreg")
my_data <- SchoolingReturns[, 1:8]
glimpse(my_data)

Rows: 3,010
Columns: 8
$ wage        <dbl> 548, 481, 721, 250, 729, 500, 565, 608…
$ education   <dbl> 7, 12, 12, 11, 12, 12, 18, 14, 12, 12,…
$ experience  <dbl> 16, 9, 16, 10, 16, 8, 9, 9, 10, 11, 13…
$ ethnicity   <fct> afam, other, other, other, other, othe…
$ smsa        <fct> yes, yes, yes, yes, yes, yes, yes, yes…
$ south       <fct> no, no, no, no, no, no, no, no, no, no…
$ age         <dbl> 29, 27, 34, 27, 34, 26, 33, 29, 28, 29…
$ nearcollege <fct> no, no, no, yes, yes, yes, yes, yes, y…
```

### 6.2-两次回归找拟合值

首先是第一阶段，通过工具变量，对暴露因素求回归：

```
# stage 1

tsls1 <- lm(formula = education ~ nearcollege, data = my_data)
summary(tsls1)

Call:
lm(formula = education ~ nearcollege, data = my_data)

Residuals:
    Min      1Q  Median      3Q     Max 
-11.698  -1.527  -0.527   2.473   5.302 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)    12.69801    0.08564 148.269  < 2e-16 ***
nearcollegeyes  0.82902    0.10370   7.994 1.84e-15 ***
---
Signif. codes:  
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.649 on 3008 degrees of freedom
Multiple R-squared:  0.02081,	Adjusted R-squared:  0.02048 
F-statistic: 63.91 on 1 and 3008 DF,  p-value: 1.838e-15
```

接下来获得每一项的预测值：

```
d.hat <- fitted.values(tsls1) # 获得每一项的预测值
```

利用第一阶段的预测值结果，进行第二阶段回归分析：

```
# stage 2
tsls2 <- lm(formula = my_data$wage ~ d.hat)
summary(tsls2)

Call:
lm(formula = my_data$wage ~ d.hat)

Residuals:
    Min      1Q  Median      3Q     Max 
-505.64 -180.64  -35.05  121.50 1798.36 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  -849.50     162.70  -5.221  1.9e-07 ***
d.hat         107.57      12.26   8.773  < 2e-16 ***
---
Signif. codes:  
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 259.7 on 3008 degrees of freedom
Multiple R-squared:  0.02495,	Adjusted R-squared:  0.02463 
F-statistic: 76.97 on 1 and 3008 DF,  p-value: < 2.2e-16
```

### 6.3-ivreg包实现

不难发现，ivreg 的拟合结果，实际上就是我们6.2 步骤里计算的结果：

```
# ivreg
ivreg_iv <- ivreg(wage ~ education | nearcollege, data = my_data)
summary(ivreg_iv)

> summary(ivreg_iv)

Call:
ivreg(formula = wage ~ education | nearcollege, data = my_data)

Residuals:
     Min       1Q   Median       3Q      Max 
-867.231 -205.763   -9.156  198.343 1673.630 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   -849.5      204.3  -4.157 3.31e-05 ***
education      107.6       15.4   6.985 3.48e-12 ***

Diagnostic tests:
                  df1  df2 statistic  p-value    
Weak instruments    1 3008     63.91 1.84e-15 ***
Wu-Hausman          1 3007     44.89 2.48e-11 ***
Sargan              0   NA        NA       NA    
---
Signif. codes:  
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 326.2 on 3008 degrees of freedom
Multiple R-Squared: -0.538,	Adjusted R-squared: -0.5385 
Wald test:  48.8 on 1 and 3008 DF,  p-value: 3.482e-12 
```

除此之外，你可能还会注意到ivreg 的summary 结果中，默认配置了三种Diagnostic tests，下面会进行简单介绍。


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://peng-6.gitbook.io/mendelian_randomization/shi-zhan/6r-yu-yan-zui-xiao-er-cheng-fa-shi-zhan.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
