
Variance Inflation Factor (VIF)

(2014-01-04 17:50:52)


Variance inflation factor

In statistics, the variance inflation factor (VIF) quantifies the severity of multicollinearity in an ordinary least squares (OLS) regression analysis. It provides an index that measures how much the variance (the square of the estimate's standard deviation) of an estimated regression coefficient is increased because of collinearity.

A measure of the amount of multicollinearity in a set of multiple regression variables. The presence of multicollinearity within the set of independent variables can cause a number of problems in understanding the significance of individual independent variables in the regression model. Using variance inflation factors helps to identify multicollinearity issues so that the model can be adjusted.

Investopedia Says:

The variance inflation factor allows a quick measure of how much a variable is contributing to the standard error in the regression (that is, the standard error of the estimated regression coefficient). When significant multicollinearity issues exist, the variance inflation factor will be very large for the variables involved. After these variables are identified, there are several approaches that can be used to eliminate or combine collinear variables, resolving the multicollinearity issue.

Definition

Consider the following linear model with k independent variables:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon.$$

The standard error of the estimate of $\beta_j$ is the square root of the $(j+1, j+1)$ element of $s^2(X^\top X)^{-1}$, where s is the standard error of the estimate (SEE) (note that $\mathrm{SEE}^2$ is an unbiased estimator of the true variance of the error term, $\sigma^2$); X is the regression design matrix, i.e., a matrix such that $X_{i,j+1}$ is the value of the j-th independent variable for the i-th case or observation, and such that $X_{i,1}$ equals 1 for all i. It turns out that the square of this standard error, the estimated variance of the estimate of $\beta_j$, can be equivalently expressed as

$$\widehat{\operatorname{var}}(\hat\beta_j) = \frac{s^2}{(n-1)\,\widehat{\operatorname{var}}(X_j)} \cdot \frac{1}{1 - R_j^2},$$

where $R_j^2$ is the multiple $R^2$ for the regression of $X_j$ on the other covariates (a regression that does not involve the response variable Y). This identity separates the influences of several distinct factors on the variance of the coefficient estimate:

· $s^2$: greater scatter in the data around the regression surface leads to proportionately more variance in the coefficient estimates

· n: greater sample size results in proportionately less variance in the coefficient estimates

· $\widehat{\operatorname{var}}(X_j)$: greater variability in a particular covariate leads to proportionately less variance in the corresponding coefficient estimate
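The identity can be checked numerically. The sketch below is plain Python (standard library only, so it does not assume any particular statistics package); the data, the response `y`, and the helper names `inverse`, `ols`, `r_squared` and `sample_var` are made up for illustration. It computes the estimated variance of $\hat\beta_1$ twice: from the diagonal of $s^2(X^\top X)^{-1}$, and from $s^2/((n-1)\,\widehat{\operatorname{var}}(X_1))$ multiplied by the last factor $1/(1-R_1^2)$.

```python
def inverse(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        pivot = m[col][col]
        m[col] = [v / pivot for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m]


def ols(y, columns):
    """OLS of y on an intercept plus columns; returns (beta, residuals, (X'X)^-1)."""
    x = [[1.0] + [c[i] for c in columns] for i in range(len(y))]  # X[i][0] = 1
    p = len(x[0])
    xtx_inv = inverse([[sum(r[a] * r[b] for r in x) for b in range(p)] for a in range(p)])
    xty = [sum(r[a] * v for r, v in zip(x, y)) for a in range(p)]
    beta = [sum(xtx_inv[a][b] * xty[b] for b in range(p)) for a in range(p)]
    resid = [v - sum(bj * xj for bj, xj in zip(beta, r)) for r, v in zip(x, y)]
    return beta, resid, xtx_inv


def r_squared(y, columns):
    """Multiple R^2 of y regressed on an intercept plus columns."""
    _, resid, _ = ols(y, columns)
    ybar = sum(y) / len(y)
    return 1.0 - sum(e * e for e in resid) / sum((v - ybar) ** 2 for v in y)


def sample_var(v):
    """Sample variance with divisor n - 1."""
    mean = sum(v) / len(v)
    return sum((a - mean) ** 2 for a in v) / (len(v) - 1)


x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 4.0, 5.0, 4.0, 5.0, 7.0]
y = [3.1, 5.9, 8.2, 8.8, 11.1, 13.9]  # made-up response
n, k = len(y), 2

_, resid, xtx_inv = ols(y, [x1, x2])
s2 = sum(e * e for e in resid) / (n - k - 1)  # s^2 = SEE^2

# Route 1: 1-based element (j+1, j+1) = (2, 2) of s^2 (X'X)^-1, i.e. index [1][1].
var_direct = s2 * xtx_inv[1][1]

# Route 2: s^2 / ((n - 1) var(X_1)) times 1 / (1 - R_1^2).
vif_1 = 1.0 / (1.0 - r_squared(x1, [x2]))
var_decomposed = s2 / ((n - 1) * sample_var(x1)) * vif_1

print(var_direct, var_decomposed, vif_1)
```

The two variances agree up to floating-point rounding. For this made-up data $R_1^2 = 27/35$, so the last factor is $35/8 = 4.375$.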

The remaining term, $1/(1 - R_j^2)$, is the VIF. It reflects all other factors that influence the uncertainty in the coefficient estimates. The VIF equals 1 when the vector $X_j$ is orthogonal to each column of the design matrix for the regression of $X_j$ on the other covariates. By contrast, the VIF is greater than 1 when the vector $X_j$ is not orthogonal to all columns of the design matrix for the regression of $X_j$ on the other covariates. Finally, note that the VIF is invariant to the scaling of the variables (that is, we could scale each variable $X_j$ by a constant $c_j$ without changing the VIF).
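A minimal sketch of both properties, in standard-library Python with made-up data and hypothetical helper names (`solve`, `r_squared`, `vif`). Each VIF comes from the auxiliary regression of one column on the others plus an intercept. Two centred, mutually orthogonal columns give a VIF of 1, and rescaling the variables leaves the VIF unchanged.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(y, columns):
    """Multiple R^2 of y regressed on an intercept plus the given columns."""
    rows = [[1.0] + [c[i] for c in columns] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * v for r, v in zip(rows, y)) for a in range(p)]
    beta = solve(xtx, xty)
    sse = sum((v - sum(bj * xj for bj, xj in zip(beta, r))) ** 2 for r, v in zip(rows, y))
    ybar = sum(y) / len(y)
    return 1.0 - sse / sum((v - ybar) ** 2 for v in y)


def vif(columns, j):
    """VIF of column j: 1 / (1 - R_j^2), regressing column j on the other columns."""
    others = [c for i, c in enumerate(columns) if i != j]
    return 1.0 / (1.0 - r_squared(columns[j], others))


# Centred, mutually orthogonal columns (also orthogonal to the intercept).
a = [1.0, -1.0, 1.0, -1.0]
b = [1.0, 1.0, -1.0, -1.0]
vif_orth = vif([a, b], 0)

# Correlated columns: here R_1^2 = 0.6.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 4.0, 5.0, 4.0, 5.0]
vif_raw = vif([x1, x2], 0)

# Rescale X_1 by c_1 = 3 and X_2 by c_2 = 1000.
vif_scaled = vif([[3.0 * v for v in x1], [1000.0 * v for v in x2]], 0)

print(vif_orth, vif_raw, vif_scaled)
```

For the correlated pair both the raw and the rescaled VIF equal 2.5 up to rounding, since $R_1^2$ is unaffected by multiplying either variable by a constant.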

Calculation and analysis

The VIF can be calculated and analyzed in three steps:

Step one

Calculate k different VIFs, one for each $X_i$, by first running an ordinary least squares regression that has $X_i$ as a function of all the other explanatory variables in the first equation.
If i = 1, for example, the equation would be

$$X_1 = \alpha_2 X_2 + \alpha_3 X_3 + \cdots + \alpha_k X_k + c_0 + e,$$
where $c_0$ is a constant and e is the error term.
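Step one in code: a hedged sketch that fits the auxiliary regression of $X_1$ on $X_2$, $X_3$ and a constant by solving the normal equations, returning $c_0$, the $\alpha$ coefficients and $R_1^2$. The data and the function name `auxiliary_regression` are hypothetical; in practice one would use the regression routine of a statistics package.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def auxiliary_regression(target, others):
    """Step one: OLS of X_i on the other explanatory variables plus a constant.

    Fits target = c0 + sum(alpha * other) + e and returns (c0, alphas, R^2).
    """
    rows = [[1.0] + [c[t] for c in others] for t in range(len(target))]
    p = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * v for r, v in zip(rows, target)) for a in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, r)) for r in rows]
    mean = sum(target) / len(target)
    sse = sum((v - f) ** 2 for v, f in zip(target, fitted))
    sst = sum((v - mean) ** 2 for v in target)
    return beta[0], beta[1:], 1.0 - sse / sst


X1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
X2 = [2.0, 4.0, 5.0, 4.0, 5.0, 7.0]
X3 = [1.0, 0.0, 2.0, 1.0, 3.0, 2.0]

c0, alphas, r2_1 = auxiliary_regression(X1, [X2, X3])
print("c0 =", c0)
print("alpha_2, alpha_3 =", alphas)
print("R_1^2 =", r2_1)
```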

Step two

Then, calculate the VIF for $\hat\beta_i$ with the following formula:

$$\mathrm{VIF}_i = \frac{1}{1 - R_i^2},$$

where $R_i^2$ is the coefficient of determination of the regression equation in step one.
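Step two is a one-line formula. The sketch below (hypothetical function name `vif_from_r2`) also guards the edge case $R_i^2 = 1$, where the regressor is an exact linear combination of the others and the VIF is unbounded.

```python
import math


def vif_from_r2(r2):
    """Step two: VIF_i = 1 / (1 - R_i^2) from the auxiliary regression's R^2."""
    if not 0.0 <= r2 <= 1.0:
        raise ValueError("R^2 must lie in [0, 1]")
    if r2 == 1.0:
        return math.inf  # exact collinearity: the coefficient is not identified
    return 1.0 / (1.0 - r2)


for r2 in (0.0, 0.5, 0.6, 0.9, 0.99):
    print(f"R^2 = {r2:.2f} -> VIF = {vif_from_r2(r2):.2f}")
```

Note how quickly the VIF grows: an auxiliary $R^2$ of 0.9 already corresponds to a VIF of 10, and 0.99 to a VIF of 100.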

Step three

Analyze the magnitude of multicollinearity by considering the size of $\mathrm{VIF}(\hat\beta_i)$. A common rule of thumb is that if $\mathrm{VIF}(\hat\beta_i) > 5$, then multicollinearity is high. A cutoff value of 10 has also been proposed (see the Kutner book referenced below).
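Step three as a small filter, assuming the VIFs have already been computed; the variable names and VIF values below are hypothetical. The cutoff is a parameter because more than one value is in use (10 is the value attributed to Kutner et al.).

```python
def flag_high_vif(vifs, cutoff=5.0):
    """Return the names whose VIF exceeds the cutoff, largest VIF first."""
    high = [(name, v) for name, v in vifs.items() if v > cutoff]
    return [name for name, _ in sorted(high, key=lambda item: item[1], reverse=True)]


example = {"x1": 1.8, "x2": 6.4, "x3": 12.0}  # hypothetical VIFs

print(flag_high_vif(example))               # ['x3', 'x2'] with cutoff 5
print(flag_high_vif(example, cutoff=10.0))  # ['x3'] with cutoff 10
```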

Some software instead calculates the tolerance, which is simply the reciprocal of the VIF. The choice of which to use is a matter of the researcher's personal preference.
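Tolerance and VIF carry the same information. This sketch (hypothetical helper names) shows the conversion and what the usual VIF cutoffs look like on the tolerance scale.

```python
def vif_from_r2(r2):
    """VIF_i = 1 / (1 - R_i^2)."""
    return 1.0 / (1.0 - r2)


def tolerance_from_r2(r2):
    """Tolerance_i = 1 - R_i^2, the reciprocal of VIF_i."""
    return 1.0 - r2


# A VIF cutoff c corresponds to a tolerance cutoff of 1 / c.
for cutoff in (5.0, 10.0):
    print(f"VIF > {cutoff:g}  <=>  tolerance < {1.0 / cutoff:g}")

r2 = 0.75  # hypothetical auxiliary R^2
print(vif_from_r2(r2), tolerance_from_r2(r2))
```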

Interpretation

The square root of the variance inflation factor tells you how much larger the standard error is, compared with what it would be if that variable were uncorrelated with the other independent variables in the equation.

Example
If the variance inflation factor of an independent variable were 5.27 (√5.27 ≈ 2.3), this would mean that the standard error for the coefficient of that independent variable is 2.3 times as large as it would be if that independent variable were uncorrelated with the other independent variables.
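The arithmetic behind this example as a tiny sketch (the function name `se_inflation` is hypothetical): the standard error scales with the square root of the VIF.

```python
import math


def se_inflation(vif):
    """Factor by which collinearity multiplies a coefficient's standard error."""
    return math.sqrt(vif)


print(round(se_inflation(5.27), 1))  # 2.3, as in the example
print(round(se_inflation(10.0), 2))  # 3.16 at a VIF of 10
```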

References

· Longnecker, M.T. & Ott, R.L.: A First Course in Statistical Methods, page 615. Thomson Brooks/Cole, 2004.

· Studenmund, A.H.: Using Econometrics: A Practical Guide, 5th edition, pages 258–259. Pearson International Edition, 2006.

· Hair, J.F., Anderson, R., Tatham, R.L. & Black, W.C.: Multivariate Data Analysis. Prentice Hall: Upper Saddle River, N.J., 2006.

· Marquardt, D.W.: "Generalized Inverses, Ridge Regression, Biased Linear Estimation, and Nonlinear Estimation", Technometrics 12(3), 591, 605–07, 1970.

· Allison, P.D.: Multiple Regression: A Primer, page 142. Pine Forge Press: Thousand Oaks, C.A., 1999.

· Kutner, Nachtsheim & Neter: Applied Linear Regression Models, 4th edition. McGraw-Hill Irwin, 2004.

PS: EViews 6 cannot compute the VIF directly. Instead, first compute each auxiliary $R_k^2$ and then compute the VIF from it:

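' Auxiliary regression for LNINCOME (reported R-squared = 0.870007)
ls LNINCOME LNPG LNPNC LNPUC C
scalar VIF1 = 1/(1-0.870007)

' Auxiliary regression for LNPG (reported R-squared = 0.919054)
ls LNPG LNINCOME LNPNC LNPUC C
scalar VIF2 = 1/(1-0.919054)

' Auxiliary regression for LNPNC (reported R-squared = 0.986568)
ls LNPNC LNPG LNINCOME LNPUC C
scalar VIF3 = 1/(1-0.986568)

' Auxiliary regression for LNPUC (reported R-squared = 0.988127)
ls LNPUC LNPNC LNPG LNINCOME C
scalar VIF4 = 1/(1-0.988127)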
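As a cross-check of the EViews arithmetic, the four VIFs can be recomputed from the $R^2$ values reported in the auxiliary regressions above. This short Python snippet only reproduces `1/(1-R^2)` and flags values above the cutoff of 10 mentioned earlier; it does not re-run the regressions.

```python
# Auxiliary R-squared values reported by the four EViews regressions above.
aux_r2 = {
    "LNINCOME": 0.870007,
    "LNPG": 0.919054,
    "LNPNC": 0.986568,
    "LNPUC": 0.988127,
}

vifs = {name: 1.0 / (1.0 - r2) for name, r2 in aux_r2.items()}
above_10 = [name for name, v in vifs.items() if v > 10.0]

for name, v in vifs.items():
    print(f"{name:<8} VIF = {v:6.2f}")
print("VIF > 10:", above_10)
```

This gives VIFs of roughly 7.7, 12.4, 74.4 and 84.2, so three of the four regressors exceed 10.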

Another package, Stata, reports VIFs directly, so where possible use that easier route to obtain them.

Original post: http://blog.sina.com.cn/s/blog_6e59e3730100vvdh.html

Addendum:

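1. To compute the VIF of each independent variable, take that variable (say x1) as the dependent variable and keep the remaining independent variables as regressors. Running this regression gives the multiple R-squared, i.e., the square of the multiple correlation coefficient, $R_i^2$.

From it, obtain the VIF for x1, and repeat for each of the other independent variables. If any VIF exceeds 10, remove the variable with the highest VIF, recompute, and repeat until every remaining VIF is below 10.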
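The elimination procedure in item 1 can be sketched as a loop. This is a hedged illustration in standard-library Python: the data (including the nearly collinear `x3`, built as `x1 + x2` plus small made-up noise) and the helper names are hypothetical, and the stopping rule follows the text (drop the largest VIF while it exceeds 10, recomputing after each removal).

```python
import math


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(y, columns):
    """Multiple R^2 of y regressed on an intercept plus the given columns."""
    rows = [[1.0] + [c[i] for c in columns] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * v for r, v in zip(rows, y)) for a in range(p)]
    beta = solve(xtx, xty)
    sse = sum((v - sum(bj * xj for bj, xj in zip(beta, r))) ** 2 for r, v in zip(rows, y))
    ybar = sum(y) / len(y)
    return 1.0 - sse / sum((v - ybar) ** 2 for v in y)


def all_vifs(cols):
    """VIF of every variable in cols (a dict mapping name -> list of values)."""
    out = {}
    for name, values in cols.items():
        others = [v for other, v in cols.items() if other != name]
        r2 = r_squared(values, others)
        out[name] = math.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out


def eliminate_by_vif(cols, cutoff=10.0):
    """Drop the highest-VIF variable, recompute, and stop once no VIF exceeds cutoff."""
    cols = dict(cols)
    dropped = []
    while True:
        vifs = all_vifs(cols)
        worst = max(vifs, key=vifs.get)
        if vifs[worst] <= cutoff:
            return cols, vifs, dropped
        dropped.append(worst)
        del cols[worst]


x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0]
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.1]  # made-up noise
x3 = [a + b + e for a, b, e in zip(x1, x2, noise)]  # nearly x1 + x2
x4 = [2.0, 5.0, 1.0, 4.0, 3.0, 6.0, 2.0, 5.0]

kept, final_vifs, dropped = eliminate_by_vif({"x1": x1, "x2": x2, "x3": x3, "x4": x4})
print("dropped:", dropped)
print("remaining VIFs:", final_vifs)
```

Dropping variables one at a time (rather than all those above 10 at once) matters, because removing one member of a collinear group usually lowers the VIFs of the others.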

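2. "$R_j^2$ is the multiple $R^2$ for the regression of $X_j$ on the other covariates": that is, $R_j^2$ is the multiple R-squared shown in the output of the (simple or multiple) linear regression. It is simply the corresponding coefficient of determination, which measures how well the data fit the line or curve. In simple regression it is the square of the correlation coefficient between the independent variable x and the dependent variable y (equivalently, between y and its fitted values); in multiple regression it is the square of the coefficient of multiple correlation.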
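For simple regression this equality is easy to verify. The sketch below (standard-library Python, made-up data, hypothetical function names) fits y = b0 + b1·x by least squares, computes $R^2 = 1 - SSE/SST$, and compares it with the squared sample correlation of x and y and of y with the fitted values.

```python
import math


def pearson_r(x, y):
    """Sample correlation: S_xy / sqrt(S_xx * S_yy)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simple_regression_r2(x, y):
    """Fit y = b0 + b1 * x by least squares; return (R^2, fitted values)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    b0 = my - b1 * mx
    fitted = [b0 + b1 * a for a in x]
    sse = sum((b - f) ** 2 for b, f in zip(y, fitted))
    sst = sum((b - my) ** 2 for b in y)
    return 1.0 - sse / sst, fitted


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
r2, fitted = simple_regression_r2(x, y)
# All three equal 0.6 up to floating-point rounding.
print(r2, pearson_r(x, y) ** 2, pearson_r(y, fitted) ** 2)
```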

3. Computing the correlation coefficient:

$$r = \frac{\sum_{i=1}^{n}(x_i - \bar x)(y_i - \bar y)}{\sqrt{\sum_{i=1}^{n}(x_i - \bar x)^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar y)^2}}$$

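Multiple correlation coefficient: $R = \sqrt{SSR/SSY}$, where SSR and SSY are the regression sum of squares and the total sum of squares, respectively. They satisfy SSY = SSR + SSE, where SSE is the error (residual) sum of squares.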
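A quick numerical check of these relations for a two-regressor OLS fit with an intercept (standard-library Python; the data and helper names are hypothetical): the sums of squares add up, and $\sqrt{SSR/SSY}$ equals the ordinary correlation between y and the fitted values.

```python
import math


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit(y, columns):
    """OLS fitted values of y on an intercept plus the given columns."""
    rows = [[1.0] + [c[i] for c in columns] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * v for r, v in zip(rows, y)) for a in range(p)]
    beta = solve(xtx, xty)
    return [sum(bj * xj for bj, xj in zip(beta, r)) for r in rows]


def pearson_r(x, y):
    """Sample correlation: S_xy / sqrt(S_xx * S_yy)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 4.0, 5.0, 4.0, 5.0, 7.0]
y = [3.1, 5.9, 8.2, 8.8, 11.1, 13.9]  # made-up response

y_hat = fit(y, [x1, x2])
y_bar = sum(y) / len(y)
ssy = sum((v - y_bar) ** 2 for v in y)             # total sum of squares
ssr = sum((f - y_bar) ** 2 for f in y_hat)         # regression sum of squares
sse = sum((v - f) ** 2 for v, f in zip(y, y_hat))  # error sum of squares

R = math.sqrt(ssr / ssy)
print(ssy, ssr + sse)          # equal up to rounding
print(R, pearson_r(y, y_hat))  # equal up to rounding
```

Both identities rely on the model containing an intercept; without one, SSY = SSR + SSE no longer holds in this centred form.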

 

 
