6-R语言最小二乘法实战

6-R语言最小二乘法实战

其实R 包ivreg已经提供了方法进行最小二乘法(2SLS),不过为了了解整个R 的操作过程,这里我还是使用最基本的lm 进行二次拟合,实现整个过程。

6.1-准备与获得数据

这里我使用ivreg 中的数据集:

my_packages<- c("ggplot2", "tidyverse", 
                "RColorBrewer", "paletteer", "ivreg")
tmp <- sapply(my_packages, function(x) library(x, character.only = T)); rm(tmp, my_packages)
data("SchoolingReturns", package = "ivreg")
my_data <- SchoolingReturns[, 1:8]
glimpse(my_data)

Rows: 3,010
Columns: 8
$ wage        <dbl> 548, 481, 721, 250, 729, 500, 565, 608…
$ education   <dbl> 7, 12, 12, 11, 12, 12, 18, 14, 12, 12,…
$ experience  <dbl> 16, 9, 16, 10, 16, 8, 9, 9, 10, 11, 13…
$ ethnicity   <fct> afam, other, other, other, other, othe…
$ smsa        <fct> yes, yes, yes, yes, yes, yes, yes, yes…
$ south       <fct> no, no, no, no, no, no, no, no, no, no…
$ age         <dbl> 29, 27, 34, 27, 34, 26, 33, 29, 28, 29…
$ nearcollege <fct> no, no, no, yes, yes, yes, yes, yes, y…

6.2-两次回归找拟合值

首先是第一阶段,通过工具变量,对暴露因素求回归:

# stage 1

tsls1 <- lm(formula = education ~ nearcollege, data = my_data)
summary(tsls1)

Call:
lm(formula = education ~ nearcollege, data = my_data)

Residuals:
    Min      1Q  Median      3Q     Max 
-11.698  -1.527  -0.527   2.473   5.302 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)    12.69801    0.08564 148.269  < 2e-16 ***
nearcollegeyes  0.82902    0.10370   7.994 1.84e-15 ***
---
Signif. codes:  
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.649 on 3008 degrees of freedom
Multiple R-squared:  0.02081,	Adjusted R-squared:  0.02048 
F-statistic: 63.91 on 1 and 3008 DF,  p-value: 1.838e-15

接下来获得每一项的预测值:

d.hat <- fitted.values(tsls1) # 获得每一项的预测值

利用第一阶段的预测值结果,进行第二阶段回归分析:

# stage 2
tsls2 <- lm(formula = my_data$wage ~ d.hat)
summary(tsls2)

Call:
lm(formula = my_data$wage ~ d.hat)

Residuals:
    Min      1Q  Median      3Q     Max 
-505.64 -180.64  -35.05  121.50 1798.36 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  -849.50     162.70  -5.221  1.9e-07 ***
d.hat         107.57      12.26   8.773  < 2e-16 ***
---
Signif. codes:  
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 259.7 on 3008 degrees of freedom
Multiple R-squared:  0.02495,	Adjusted R-squared:  0.02463 
F-statistic: 76.97 on 1 and 3008 DF,  p-value: < 2.2e-16

6.3-ivreg包实现

不难发现,ivreg 的拟合结果,实际上就是我们6.2 步骤里计算的结果:

# ivreg
ivreg_iv <- ivreg(wage ~ education | nearcollege, data = my_data)
summary(ivreg_iv)

> summary(ivreg_iv)

Call:
ivreg(formula = wage ~ education | nearcollege, data = my_data)

Residuals:
     Min       1Q   Median       3Q      Max 
-867.231 -205.763   -9.156  198.343 1673.630 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   -849.5      204.3  -4.157 3.31e-05 ***
education      107.6       15.4   6.985 3.48e-12 ***

Diagnostic tests:
                  df1  df2 statistic  p-value    
Weak instruments    1 3008     63.91 1.84e-15 ***
Wu-Hausman          1 3007     44.89 2.48e-11 ***
Sargan              0   NA        NA       NA    
---
Signif. codes:  
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 326.2 on 3008 degrees of freedom
Multiple R-Squared: -0.538,	Adjusted R-squared: -0.5385 
Wald test:  48.8 on 1 and 3008 DF,  p-value: 3.482e-12 

除此之外,你可能还会注意到ivreg 的summary 结果中,默认配置了三种Diagnostic tests,下面会进行简单介绍。

Last updated