Classifying Data with a Support Vector Machine

Load the e1071 package:
> library(e1071)
Train an SVM with the svm function, using trainset as the input data set and churn as the class label:
> model=svm(churn~.,data=trainset,kernel="radial",cost=1,gamma=1/ncol(trainset))
Use summary to retrieve the details of the fitted model:
> summary(model)

Call:
svm(formula = churn ~ ., data = trainset, kernel = "radial", 
    cost = 1, gamma = 1/ncol(trainset))

Parameters:
   SVM-Type:  C-classification 
 SVM-Kernel:  radial 
       cost:  1 
      gamma:  0.05882353 

Number of Support Vectors:  348
 ( 140 208 )

Number of Classes:  2 
Levels: 
 yes no
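Where that gamma comes from: the call passed gamma=1/ncol(trainset), and the reported value 0.05882353 implies this trainset has 17 columns (a quick base-R check, assuming nothing beyond the output above):

```r
# gamma was set to 1/ncol(trainset); the summary above reports 0.05882353,
# which corresponds to a trainset with 17 columns
gamma <- 1 / 17
round(gamma, 8)  # 0.05882353
```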
Next, read the iris data set and call subset to keep the samples whose class is Iris-setosa or Iris-versicolor, projecting onto the petal.length, petal.width, and class columns:
> iris=read.csv("D://Rdata/iris.csv")
> str(iris)
'data.frame':   150 obs. of  5 variables:
 $ sepal.length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ sepal.width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ petal.length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ petal.width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ class       : Factor w/ 3 levels "Iris-setosa",..: 1 1 1 1 1 1 1 1 1 1 ...
> iris.subset=subset(iris,select=c("petal.length",
+     "petal.width",
+     "class"),
+     class %in% c("Iris-setosa","Iris-versicolor"))
> plot(x=iris.subset$petal.length,y=iris.subset$petal.width,
+     col=iris.subset$class,pch=19)
[Figure: scatter plot of petal.length vs. petal.width, colored by class]
Set the cost (penalty) parameter to 1 and train a linear SVM on iris.subset:
> svm.model=svm(class~.,data=iris.subset,kernel="linear",
+     cost=1,scale=F)
Mark the support vectors with blue circles:
> points(iris.subset[svm.model$index,c(1,2)],col="blue",cex=2)
[Figure: the scatter plot with the support vectors circled in blue]
Add the separating line:
> w=t(svm.model$coefs) %*% svm.model$SV
> b=-svm.model$rho
> abline(a=-b/w[1,2],b=-w[1,1]/w[1,2],col="red",lty=5)
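Why abline gets these coefficients: for a linear SVM the decision boundary satisfies w·x + b = 0, where w is recovered from the support vectors and their coefficients, and b = -rho in e1071's parameterization. Solving for the vertical coordinate gives:

```latex
w_1 x_1 + w_2 x_2 + b = 0
\quad\Longrightarrow\quad
x_2 = -\frac{b}{w_2} - \frac{w_1}{w_2}\, x_1
```

which matches the intercept a = -b/w[1,2] and slope = -w[1,1]/w[1,2] in the abline call.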
[Figure: the scatter plot with the separating line drawn in red]
Now set the cost to 10000 and retrain the SVM classifier:
> plot(x=iris.subset$petal.length,y=iris.subset$petal.width,
+     col=iris.subset$class,pch=19)
> svm.model=svm(class~.,data=iris.subset,kernel="linear",
+     cost=10000,scale=F)
> points(iris.subset[svm.model$index,c(1,2)],col="blue",cex=2)
> w=t(svm.model$coefs) %*% svm.model$SV
> b=-svm.model$rho
> abline(a=-b/w[1,2],b=-w[1,1]/w[1,2],col="red",lty=5)
Visualizing the SVM model
> data(iris)
> model.iris=svm(Species~.,iris)
> plot(model.iris,iris,Petal.Width~Petal.Length,
+     slice=list(Sepal.Width=3,Sepal.Length=4))
[Figure: SVM classification plot of Petal.Width vs. Petal.Length on the iris data]
Call plot on the SVM object model built on the churn data, with total_intl_charge on the x axis and total_day_minutes on the y axis:
> plot(model,trainset,total_day_minutes~total_intl_charge)
[Figure: SVM classification plot of total_day_minutes vs. total_intl_charge on trainset]
Class prediction on the churn data set with the trained SVM model. Drop the churn column from testset and call predict:
> svm.pred=predict(model,testset[,!names(testset) %in% c("churn")])
> svm.table=table(svm.pred,testset$churn)
> svm.table
        
svm.pred yes  no
     yes  24   5
     no   53 418
Call classAgreement to compute agreement coefficients from the classification table:
> classAgreement(svm.table)
$diag
[1] 0.884

$kappa
[1] 0.4024807

$rand
[1] 0.794501

$crand
[1] 0.3443065

Call confusionMatrix (from the caret package) to evaluate prediction performance based on the classification table:
> confusionMatrix(svm.table)
Confusion Matrix and Statistics

        
svm.pred yes  no
     yes  24   5
     no   53 418
                                          
               Accuracy : 0.884           
                 95% CI : (0.8526, 0.9107)
    No Information Rate : 0.846           
    P-Value [Acc > NIR] : 0.009104        
                                          
                  Kappa : 0.4025          
 Mcnemar's Test P-Value : 6.769e-10       
                                          
            Sensitivity : 0.3117          
            Specificity : 0.9882          
         Pos Pred Value : 0.8276          
         Neg Pred Value : 0.8875          
             Prevalence : 0.1540          
         Detection Rate : 0.0480          
   Detection Prevalence : 0.0580          
      Balanced Accuracy : 0.6499          
                                          
       'Positive' Class : yes             
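As a sanity check, the headline statistics in that output can be recomputed by hand from the 2x2 counts (a minimal base-R sketch using only the table printed above, with "yes" as the positive class):

```r
# 2x2 classification table from the output above
# (rows = predicted, columns = actual; "yes" is the positive class)
tab <- matrix(c(24, 53, 5, 418), nrow = 2,
              dimnames = list(pred = c("yes", "no"),
                              actual = c("yes", "no")))

accuracy    <- sum(diag(tab)) / sum(tab)              # (24 + 418) / 500
sensitivity <- tab["yes", "yes"] / sum(tab[, "yes"])  # 24 / 77
specificity <- tab["no", "no"]  / sum(tab[, "no"])    # 418 / 423

round(c(accuracy = accuracy,
        sensitivity = sensitivity,
        specificity = specificity), 4)
# accuracy 0.8840, sensitivity 0.3117, specificity 0.9882
```

The low sensitivity (0.3117) shows the untuned model misses most churners, which motivates the tuning step below.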
Tuning the SVM. Call tune.svm to grid-search over gamma and cost:
> tuned=tune.svm(churn~.,data=trainset,gamma=10^(-6:-1),
+     cost=10^(1:2))
Use summary to inspect the tuning results:
> summary(tuned)

Parameter tuning of ‘svm’:

- sampling method: 10-fold cross validation 

- best parameters:
 gamma cost
  0.01  100

- best performance: 0.09605051 

- Detailed performance results:
   gamma cost      error dispersion
1  1e-06   10 0.15610101 0.04512672
2  1e-05   10 0.15610101 0.04512672
3  1e-04   10 0.15610101 0.04512672
4  1e-03   10 0.15610101 0.04512672
5  1e-02   10 0.10207071 0.02648941
6  1e-01   10 0.11010101 0.01879928
7  1e-06  100 0.15610101 0.04512672
8  1e-05  100 0.15610101 0.04512672
9  1e-04  100 0.15610101 0.04512672
10 1e-03  100 0.12911111 0.03439782
11 1e-02  100 0.09605051 0.02623349
12 1e-01  100 0.12110101 0.01957463
Retrain the SVM with the best parameters found by tune.svm:
> model.tuned=svm(churn~.,data=trainset,gamma=tuned$best.parameters$gamma,
+     cost=tuned$best.parameters$cost)
> summary(model.tuned)

Call:
svm(formula = churn ~ ., data = trainset, gamma = tuned$best.parameters$gamma, 
    cost = tuned$best.parameters$cost)

Parameters:
   SVM-Type:  C-classification 
 SVM-Kernel:  radial 
       cost:  100 
      gamma:  0.01 

Number of Support Vectors:  254
 ( 108 146 )

Number of Classes:  2 
Levels: 
 yes no
Call predict with the retuned SVM model to predict the class labels:
> svm.tuned.pred=predict(model.tuned,testset[,!names(testset) %in% c("churn")])
Build the classification table from the predicted and actual classes of the test set:
> svm.tuned.table=table(svm.tuned.pred,testset$churn)
> svm.tuned.table
              
svm.tuned.pred yes  no
           yes  35  15
           no   42 408
Call classAgreement again to evaluate the tuned model:
> classAgreement(svm.tuned.table)
$diag
[1] 0.886

$kappa
[1] 0.4892473

$rand
[1] 0.7975872

$crand
[1] 0.4171314
> confusionMatrix(svm.tuned.table)
Confusion Matrix and Statistics

              
svm.tuned.pred yes  no
           yes  35  15
           no   42 408
                                          
               Accuracy : 0.886           
                 95% CI : (0.8548, 0.9125)
    No Information Rate : 0.846           
    P-Value [Acc > NIR] : 0.0062932       
                                          
                  Kappa : 0.4892          
 Mcnemar's Test P-Value : 0.0005736       
                                          
            Sensitivity : 0.4545          
            Specificity : 0.9645          
         Pos Pred Value : 0.7000          
         Neg Pred Value : 0.9067          
             Prevalence : 0.1540          
         Detection Rate : 0.0700          
   Detection Prevalence : 0.1000          
      Balanced Accuracy : 0.7095          
                                          
       'Positive' Class : yes             
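The kappa value 0.4892 that both classAgreement and confusionMatrix report for the tuned model can be reproduced from the 2x2 table above, using Cohen's formula kappa = (po - pe)/(1 - pe), where po is the observed agreement and pe the agreement expected by chance from the row and column margins (a base-R sketch):

```r
# 2x2 table for the tuned model (rows = predicted, columns = actual)
tab <- matrix(c(35, 42, 15, 408), nrow = 2)
n  <- sum(tab)                                # 500 test samples
po <- sum(diag(tab)) / n                      # observed agreement: 0.886
pe <- sum(rowSums(tab) * colSums(tab)) / n^2  # chance agreement from margins
kappa <- (po - pe) / (1 - pe)
round(kappa, 7)  # 0.4892473
```

The rise in kappa from 0.4025 to 0.4892 (and in sensitivity from 0.3117 to 0.4545) confirms that tuning gamma and cost improved the model's ability to detect churners.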