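Implementing Decision Trees and Random Forests in R
1. Building a decision tree with the party package
This walkthrough uses the iris dataset.
ctree() builds the decision tree, and predict() makes predictions on new data.
First, split the data into a training set and a test set: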
> str(iris)
'data.frame':   150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
> set.seed(1234)
> ind <- sample(2, nrow(iris), replace=TRUE, prob=c(0.7, 0.3))
> trainData <- iris[ind==1,]
> testData <- iris[ind==2,]
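Because sample() with prob assigns each row to a group independently, the split above is only approximately 70/30. The following is a minimal sketch for checking the realised group sizes, together with one possible alternative that yields an exact 70/30 split. The names n, trainIdx, trainExact and testExact are illustrative and do not appear in the original; the exact-split approach is an assumption offered for comparison, not the method used in this walkthrough.

```r
# Reproduce the random assignment used above (1 = training, 2 = test)
set.seed(1234)
ind <- sample(2, nrow(iris), replace = TRUE, prob = c(0.7, 0.3))

# Realised group sizes and proportions; these are only approximately 70/30
print(table(ind))
print(round(prop.table(table(ind)), 2))

# Alternative (not the original method): draw exactly 70% of the row
# indices without replacement, and use the remaining rows as the test set
n <- nrow(iris)
trainIdx <- sample(n, size = round(0.7 * n))
trainExact <- iris[trainIdx, ]
testExact <- iris[-trainIdx, ]
```

The rest of the walkthrough continues with trainData and testData as defined above.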
Build the decision tree with the default parameters:
> library(party)
> myFormula <- Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width
> iris_ctree <- ctree(myFormula, data=trainData)
> # check the prediction
> table(predict(iris_ctree), trainData$Species)
             setosa versicolor virginica
  setosa         40          0         0
  versicolor      0         37         3
  virginica       0          1        31
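In the table above, rows are the predicted classes and columns are the true classes, so the diagonal holds the correctly classified training observations. As a small sketch of how to read it, the block below re-enters the printed counts by hand and computes the training accuracy and per-class recall. The names classes, conf, accuracy and recall are illustrative, not part of the original.

```r
# Counts copied from the table above (rows = predicted, columns = actual)
classes <- c("setosa", "versicolor", "virginica")
conf <- matrix(c(40,  0,  0,
                  0, 37,  3,
                  0,  1, 31),
               nrow = 3, byrow = TRUE,
               dimnames = list(predicted = classes, actual = classes))

# Overall training accuracy: correct predictions over all observations
accuracy <- sum(diag(conf)) / sum(conf)   # 108 / 112

# Per-class recall: correct predictions over the true count of each class
recall <- diag(conf) / colSums(conf)

print(round(accuracy, 4))
print(round(recall, 4))
```

The same two calculations apply to a table built from held-out data, for example table(predict(iris_ctree, newdata = testData), testData$Species), which usually gives a fairer estimate than training accuracy.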
Print the rules and plot the fitted tree to inspect it.
> print(iris_ctree)
Conditional inference tree with 4 terminal nodes
Response: Species
Inputs: Sepal.Length, Sepal.Width, Petal.Length, Petal.Width
Number of observations: 112