6-R语言最小二乘法实战
6-R语言最小二乘法实战
其实R 包ivreg
已经提供了方法进行最小二乘法(2SLS),不过为了了解整个R 的操作过程,这里我还是使用最基本的lm 进行二次拟合,实现整个过程。
6.1-准备与获得数据
这里我使用ivreg 中的数据集:
my_packages<- c("ggplot2", "tidyverse",
"RColorBrewer", "paletteer", "ivreg")
tmp <- sapply(my_packages, function(x) library(x, character.only = T)); rm(tmp, my_packages)
data("SchoolingReturns", package = "ivreg")
my_data <- SchoolingReturns[, 1:8]
glimpse(my_data)
Rows: 3,010
Columns: 8
$ wage <dbl> 548, 481, 721, 250, 729, 500, 565, 608…
$ education <dbl> 7, 12, 12, 11, 12, 12, 18, 14, 12, 12,…
$ experience <dbl> 16, 9, 16, 10, 16, 8, 9, 9, 10, 11, 13…
$ ethnicity <fct> afam, other, other, other, other, othe…
$ smsa <fct> yes, yes, yes, yes, yes, yes, yes, yes…
$ south <fct> no, no, no, no, no, no, no, no, no, no…
$ age <dbl> 29, 27, 34, 27, 34, 26, 33, 29, 28, 29…
$ nearcollege <fct> no, no, no, yes, yes, yes, yes, yes, y…
6.2-两次回归找拟合值
首先是第一阶段,通过工具变量,对暴露因素求回归:
# stage 1
tsls1 <- lm(formula = education ~ nearcollege, data = my_data)
summary(tsls1)
Call:
lm(formula = education ~ nearcollege, data = my_data)
Residuals:
Min 1Q Median 3Q Max
-11.698 -1.527 -0.527 2.473 5.302
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 12.69801 0.08564 148.269 < 2e-16 ***
nearcollegeyes 0.82902 0.10370 7.994 1.84e-15 ***
---
Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.649 on 3008 degrees of freedom
Multiple R-squared: 0.02081, Adjusted R-squared: 0.02048
F-statistic: 63.91 on 1 and 3008 DF, p-value: 1.838e-15
接下来获得每一项的预测值:
d.hat <- fitted.values(tsls1) # 获得每一项的预测值
利用第一阶段的预测值结果,进行第二阶段回归分析:
# stage 2
tsls2 <- lm(formula = my_data$wage ~ d.hat)
summary(tsls2)
Call:
lm(formula = my_data$wage ~ d.hat)
Residuals:
Min 1Q Median 3Q Max
-505.64 -180.64 -35.05 121.50 1798.36
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -849.50 162.70 -5.221 1.9e-07 ***
d.hat 107.57 12.26 8.773 < 2e-16 ***
---
Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 259.7 on 3008 degrees of freedom
Multiple R-squared: 0.02495, Adjusted R-squared: 0.02463
F-statistic: 76.97 on 1 and 3008 DF, p-value: < 2.2e-16
6.3-ivreg包实现
不难发现,ivreg 的拟合结果,实际上就是我们6.2 步骤里计算的结果:
# ivreg
ivreg_iv <- ivreg(wage ~ education | nearcollege, data = my_data)
summary(ivreg_iv)
> summary(ivreg_iv)
Call:
ivreg(formula = wage ~ education | nearcollege, data = my_data)
Residuals:
Min 1Q Median 3Q Max
-867.231 -205.763 -9.156 198.343 1673.630
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -849.5 204.3 -4.157 3.31e-05 ***
education 107.6 15.4 6.985 3.48e-12 ***
Diagnostic tests:
df1 df2 statistic p-value
Weak instruments 1 3008 63.91 1.84e-15 ***
Wu-Hausman 1 3007 44.89 2.48e-11 ***
Sargan 0 NA NA NA
---
Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 326.2 on 3008 degrees of freedom
Multiple R-Squared: -0.538, Adjusted R-squared: -0.5385
Wald test: 48.8 on 1 and 3008 DF, p-value: 3.482e-12
除此之外,你可能还会注意到ivreg 的summary 结果中,默认配置了三种Diagnostic tests,下面会进行简单介绍。
Last updated