
[Repost] Iteratively Reweighted Least Squares (IRLS)

(2014-04-18 10:33:03)
Tags: Repost

Category: CVPRML

1. Theory

http://en.wikipedia.org/wiki/Iteratively_reweighted_least_squares (Iteratively Reweighted Least Squares)

http://support.sas.com/documentation/cdl/en/statug/63347/HTML/default/viewer.htm#statug_logistic_sect033.htm (SAS Online Help)
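In brief (following the Wikipedia article above), IRLS fits a generalized linear model by repeatedly solving a weighted least-squares problem. With link function $g$, linear predictor $\eta_i = x_i^\top \beta$, mean $\mu_i = g^{-1}(\eta_i)$, and variance function $V$, each iteration forms a working response and working weights and refits:

```latex
z_i = \eta_i + g'(\mu_i)\,(y_i - \mu_i), \qquad
w_i = \frac{1}{g'(\mu_i)^2\, V(\mu_i)}, \qquad
\beta \leftarrow (X^\top W X)^{-1} X^\top W z .
```

For logistic regression, $g(\mu) = \log\big(\mu/(1-\mu)\big)$ and $V(\mu) = \mu(1-\mu)$, so $g'(\mu) = 1/\big(\mu(1-\mu)\big)$ and the weights simplify to $w_i = \mu_i(1-\mu_i)$. These are exactly the quantities `Z`, `W`, and the weighted `lm` step in the R code below.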

 

2. Implementation in R

The following R code uses the IRLS algorithm to estimate the parameters of a logistic regression model. The sample data are from Machine Learning Ex4 - Logistic Regression: a dataset of 40 students who were admitted to college and 40 who were not, together with their grades on two exams. Our mission is to build a binary classification model that estimates a student's chance of admission from the two exam scores (test1 and test2).

 

To run the code in R, follow these steps:

 

1. Download the csv file by clicking the URL:

http://spreadsheets.google.com/pub?key=0AnypY27pPCJydC1vRVEzM1VJQnNneFo5dWNzR1F5Umc&output=csv

 

2. Run the following R script to build the model:

# reading data
mydata = read.csv("c:/stat/ex4.csv", header = TRUE)


# plots
plot(mydata$test1[mydata$admitted == 0], mydata$test2[mydata$admitted == 0], xlab="test1", ylab="test2", col="red")
points(mydata$test1[mydata$admitted == 1], mydata$test2[mydata$admitted == 1], col="blue", pch=2)
legend("bottomright", c("not admitted", "admitted"), pch=c(1, 2), col=c("red", "blue") )

 

[Figure: scatter plot of the two exam scores; admitted students in blue triangles, not admitted in red circles]

# build the model
# link function: logit
g <- function(x) { return(log(x/(1 - x))) }

# inverse link: logistic
g.inv <- function(y) { return(exp(y)/(1 + exp(y))) }

# derivative of the link
g.prime = function(x) {
    return(1/(x * (1 - x)))
}

# variance function of the Bernoulli distribution
V = function(mu) {
    return(mu * (1 - mu))
}


niter = 10

# initialize the fitted means at the overall admission rate
mu = rep(mean(mydata$admitted), length(mydata$admitted))

for (i in 1:niter) {
    # working response
    Z = g(mu) + g.prime(mu) * (mydata$admitted - mu)
    # working weights
    W = 1/(g.prime(mu)^2 * V(mu))
    # weighted least-squares step
    Z.lm = lm(Z ~ test1 + test2, weights = W, data = mydata)
    eta = predict(Z.lm)
    mu = g.inv(eta)
    beta = Z.lm$coef
}

 

# print estimated parameters 

print(beta)


(Intercept) test1 test2
-16.3787434 0.1483408 0.1589085
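As a quick sanity check on the fitted model, the admission probability for a hypothetical student scoring 55 on both exams (my example, not from the original post) can be computed from the coefficients printed above:

```python
import math

# coefficients estimated by the IRLS loop above
b0, b1, b2 = -16.3787434, 0.1483408, 0.1589085

def admit_prob(test1, test2):
    """Logistic model: P(admitted) = sigmoid(b0 + b1*test1 + b2*test2)."""
    eta = b0 + b1 * test1 + b2 * test2
    return 1.0 / (1.0 + math.exp(-eta))

p = admit_prob(55, 55)
print(round(p, 3))  # 0.627 -- slightly better-than-even odds of admission
```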

 

# get 2 points (that will define a line)
x1 = c(min(mydata$test1), max(mydata$test1))

 

# the decision boundary is the line beta0 + beta1*x1 + beta2*x2 = 0,
# so given x1 from the sample data, solving for x2 gives:
x2 = (-1/beta[3]) * ((beta[2] * x1) + beta[1])


# plot
plot(x1,x2, type='l', xlab="test1", ylab="test2")
points(mydata$test1[mydata$admitted == 0], mydata$test2[mydata$admitted == 0], col="red")
points(mydata$test1[mydata$admitted == 1], mydata$test2[mydata$admitted == 1], col="blue", pch=2)
legend("bottomright", c("not admitted", "admitted"), pch=c(1, 2), col=c("red", "blue") )

 

[Figure: the fitted decision boundary line plotted over the scatter of the two exam scores]

 

The original R code is from http://www.stanford.edu/class/stats191/logistic.html; I changed it slightly.
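For readers without R, the same IRLS loop can be sketched directly with NumPy. A small hypothetical dataset stands in for the course CSV (which may no longer be online), so the coefficients will differ from those above; the structure of the loop, however, mirrors the R code exactly:

```python
import numpy as np

# hypothetical stand-in data: two exam scores per student, admission indicator
test1 = np.array([30, 35, 40, 45, 50, 55, 60, 65, 70, 75], dtype=float)
test2 = np.array([40, 60, 35, 55, 45, 65, 40, 70, 50, 60], dtype=float)
y = np.array([0, 0, 0, 1, 0, 1, 1, 0, 1, 1], dtype=float)

X = np.column_stack([np.ones_like(test1), test1, test2])  # design matrix

beta = np.zeros(3)
for _ in range(25):
    eta = X @ beta
    mu = 1.0 / (1.0 + np.exp(-eta))   # inverse logit, as g.inv in the R code
    w = mu * (1.0 - mu)               # working weights: 1/(g'(mu)^2 V(mu))
    z = eta + (y - mu) / w            # working response: g(mu) + g'(mu)(y - mu)
    # weighted least-squares step: solve (X' W X) beta = X' W z
    XtW = X.T * w
    beta = np.linalg.solve(XtW @ X, XtW @ z)

# at the maximum-likelihood estimate the score X'(y - mu) vanishes
mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
print(np.max(np.abs(X.T @ (y - mu))) < 1e-6)  # True
```

Note that `lm(Z ~ test1 + test2, weights = W)` in the R version solves the same weighted normal equations that `np.linalg.solve` handles here explicitly.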

 


3.  Implementation in SAS

proc import datafile="c:\stat\ex4.csv"
   out=ex4
   dbms=csv
   replace;
   getnames=yes;
run;

 

proc logistic data=ex4 des;
   model admitted=test1 test2/tech=fisher;
run;quit;

 

Analysis of Maximum Likelihood Estimates

Parameter   DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept    1   -16.3787           3.6559           20.0717       <.0001
test1        1     0.1483           0.0408           13.2009       0.0003
test2        1     0.1589           0.0416           14.5613       0.0001

 

 

Association of Predicted Probabilities and Observed Responses

Percent Concordant   89.5    Somers' D   0.791
Percent Discordant   10.4    Gamma       0.792
Percent Tied          0.1    Tau-a       0.401
Pairs                1600    c           0.896


  
