Adv Quant: Decision Trees in R

Classification, Regression, and Conditional Tree Growth Algorithms

The variables used for tree growth algorithms are the log of benign prostatic hyperplasia amount (lbph), log of prostate-specific antigen (lpsa), Gleason score (gleason), log of capsular penetration (lcp) and log of the cancer volume (lcavol) to understand and predict tumor spread (seminal vesicle invasion=svi).

Results

5db3f1.PNG

Figure 1: Visualization of cross-validation results, for the classification tree (left) and regression tree (right).

5db3f2

Figure 2: Classification tree (left), regression tree (center), and conditional tree (right).

5db3f3.PNG

Figure 3: Summarization of tree data: (a) classification tree, (b) regression tree, and (c) conditional tree.

Discussion

For the classification tree growth algorithm, the head node is the seminal vesicle invasion which helps show the tumor spread in this dataset, and the cross-validation results show that there is only one split in the tree, with an x-value relative value for the first split of 0.71429 (Figure 1 & Figure 3a), and an x-value standard deviation of 0.16957 (Figure 3a).  The variable that was used to split the tree was the log of capsular penetration (Figure 2), when the log of capsular penetration at <1.791.

Next, for the regression tree growth algorithm, there are three leaf nodes, because the algorithm split the data three times.  In this case, the relative error for the first split is 1.00931, and a standard deviation of 0.18969 and at the second split the relative error is 0.69007 and a standard deviation of 0.15773 (Figure 1 & Figure 3b).  The tree was split at first at the log of capsular penetration at <1.791, and with the log of prostate specific antigen value at <2.993 (Figure 2).  It is interesting that the first split occurred at the same value for these two different tree growth algorithm, but that the relative errors and standard deviations were different and that the regression tree created one more level.

Finally, the conditional tree growth algorithm produced a split at <1.749 of the log capsular penetration at the 0.001 significance level and <2.973 for the log of prostate specific antigen also at the 0.001 significance level (Figure 2 & Figure 3c).  The results are similar to the regression tree, with the same number of leaf nodes and values in which they are split against, but more information is gained from the conditional tree growth algorithm than the classification and regression tree growth algorithm.

Code

#

### ———————————————————————————————————-

## Use the prostate cancer dataset available in R, in which biopsy results are given for 97 men.

## Goal:  Predict tumor spread in this dataset of 97 men who had undergone a biopsy.

## The measures to be used for prediction are BPH=lbhp, PSA=lpsa, Gleason Score=gleason, CP=lcp,

## and size of prostate=lcavol.

### ———————————————————————————————————-

##

install.packages(“lasso2”)

library(lasso2)

data(“Prostate”)

install.packages(“rpart”)

library(rpart)

## Grow a classification tree

classification = rpart(svi~lbph+lpsa+gleason+lcp+lcavol, data=Prostate, method=”class”)

printcp(classification) # display the results

plotcp(classification)  # visualization cross-validation results

plot(classification, uniform = T, main=”Classification Tree for prostate cancer”) # plot tree

text(classification, use.n = T, all = T, cex=.8)                                  # create text on the tree

## Grow a regression tree

Regression = rpart(svi~lbph+lpsa+gleason+lcp+lcavol, data=Prostate, method=”anova”)

printcp(Regression) # display the results

plotcp(Regression)  # visualization cross-validation results

plot(Regression, uniform = T, main=”Regression Tree for prostate cancer”) # plot tree

text(Regression, use.n = T, all = T, cex=.8)                              # create text on the tree

install.packages(“party”)

library(party)

## Grow a conditional inference tree

conditional = ctree(svi~lbph+lpsa+gleason+lcp+lcavol, data=Prostate)

conditional # display the results

plot(conditional, main=”Conditional inference tree for prostate cancer”)

References

Advertisements