方差膨胀因子 vif
方差膨胀因子 VIF:Variance inflation factor
Variance inflation factor
In statistics, the
variance inflation factor (VIF) quantifies the severity of
multicollinearity
(多重共线性)in an ordinary
least squares regression(普通最小二乘回归)
analysis. It provides an index that measures how much the variance
A measure of the amount of multicollinearity in a set of multiple regression variables. The presence of multicollinearity within the set of independent variables can cause a number of problems in the understanding the significance of individual independent variables in the regression model. Using variance inflation factors helps to identify multicollinearity issues so that the model can be adjusted.
Investopedia Says:
The variance inflation factor allows a quick
measure of how much a variable is contributing to the standard
error (回归参数的标准差)
Definition
Consider the following linear model with k independent variables:
Y = β0 + β1 X1 + β2 X 2 + ... + βk Xk + ε.
The standard error of the estimate of βj is the square root of the j+1, j+1 element of s2(X′X)−1, where s is the standard error of the estimate (SEE) (note that SEE2 is an unbiased estimator of the true variance of the error term, σ2); X is the regression design matrix — a matrix such that Xi, j+1 is the value of the jth independent variable for the ith case or observation, and such that Xi, 1 equals 1 for all i. It turns out that the square of this standard error, the estimated variance of the estimate of βj, can be equivalently expressed as
http://s9/middle/6e59e373gb3786ae6cfe8&690
where
Rj2 is the multiple
R2 for the regression of Xj
on the other covariates (a regression that does not involve the
response variable Y). This identity separates the influences
of several distinct factors on the variance of the coefficient
estimate:
·
·
·
The remaining term, 1
Calculation and analysis
The VIF can be calculated and analyzed in three steps:
Step one
Calculate k different VIFs, one for each
Xi by first running an ordinary least square
regression that has Xi as a function of all the
other explanatory variables in the first equation.
If i = 1, for example, the equation would be
http://s14/middle/6e59e373gb37873371b6d&690
where
c0 is a constant and e is the error
term
Step two
Then, calculate the VIF factor for
http://s9/middle/6e59e373g7858d88dbcd8&690
where
R2i is the coefficient
of determination (决定系数)of the regression equation in step
one.
Step three
Analyze the magnitude of multicollinearity
by considering the size of thehttp://s1/middle/6e59e373gb37877cc8c30&690
then
multicollinearity is high. Also 10 has been proposed (see Kutner
book referenced below) as a cut off value.
Some software calculates the tolerance which is just the reciprocal of the VIF. The choice of which to use is a matter of personal preference of the researcher.
Interpretation
The square root of the variance inflation factor tells you how much larger the standard error is, compared with what it would be if that variable were uncorrelated with the other independent variables in the equation.
Example
If the variance inflation factor of an independent variable
were
References
·
Longnecker, M.T & Ott, R.L
· Studenmund, A.H: Using Econometrics: A practical guide, 5th Edition, page 258–259. Pearson International Edition, 2006.
· Hair JF, Anderson R, Tatham RL, Black WC: Multivariate Data Analysis. Prentice Hall: Upper Saddle River, N.J. 2006.
· Marquardt, D.W. 1970 "Generalized Inverses, Ridge Regression, Biased Linear Estimation, and Nonlinear Estimation", Technometrics 12(3), 591, 605–07
· Allison, P.D. Multiple Regression: a primer, page 142. Pine Forge Press: Thousand Oaks, C.A. 1999.
· Kutner, Nachtsheim, Neter, Applied Linear Regression Models, 4th edition, McGraw-Hill Irwin, 2004.
ps:
PS:使用Eviews6不能直接计算VIF,可以分别首先计算出各个Rk2 ,再计算VIF值
ls LNINCOME LNPG LNPNC LNPUC C
genr VIF1=1/(1-.870007)
ls LNPG LNINCOME
genr VIF2=1/(1-.919054)
ls LNPNC LNPG LNINCOME
genr VIF3=1/(1-.986568)
ls LNPUC LNPNC LNPG LNINCOME
genr vif4=1/(1-.988127)
另一软件Stata提供了VIF的计算结果,所以尽量使用这种较容易的办法获得。
原文地址:http://blog.sina.com.cn/s/blog_6e59e3730100vvdh.html
补充:
1。计算每个自变量的vif时,都是将要计算的自变量x1作为因变量,其它自变量依然是自变量,做回归后,得到 multiple R-Squared ,也就是复相关系数平方R2i,
再得到x1对应的vif,以此类推,得到其它自变量作为的vif.若存在vif>10,则以此剔除最高vif对应的自变量,直到所有自变量的vif都小于10.
2. Rj2 is the multiple R2 for the regression of Xj on the other covariates,即Rj2是一元线性回归或多元线性回归后的结果中的 multiple R-Squared 。 实际上就是对应的coefficient of determination,用来评价数据对line或curve的拟合程度。在一元回归中,就是自变量x和预测的因变量y相关系数 ( correlation coefficient)的平方;在多元回归中,就是复相关系数(coefficient of multiple correlation)的平方。
3。相关系数计算:
http://upload.wikimedia.org/math/6/8/6/68654488b517714870216d44f1ce8459.pngvif" />
复相关系数计算公式:r=sqrt(SSR/SSY),其中SSR,SSY分别为回归平方和,总平方和。SSR=SSY+SSE,SSE为误差平方和。

加载中…