r - rpart not splitting obvious nodes -


I am using a data set of 54k records with 5 classes (pop), one of which is insignificant. Using the caret package, I ran rpart as follows:

model <- train(pop ~ pe + chl_small, method = "rpart", data = training) 

and got the following tree:

n= 54259

node), split, n, loss, yval, (yprob)
      * denotes terminal node

1) root 54259 38614 pico (0.0014 0.18 0.29 0.25 0.28)
  2) pe< 5004 39537 23961 pico (0 0.22 0.39 2.5e-05 0.38)
    4) chl_small< 32070.5 16948  2900 pico (0 0.00012 0.83 5.9e-05 0.17) *
    5) chl_small>=32070.5 22589 10281 ultra (0 0.39 0.068 0 0.54) *
  3) pe>=5004 14722  1113 synecho (0.0052 0.052 0.0047 0.92 0.013) *

It is obvious that node 5 should be split further, but rpart is not doing it. I tried using cp = .001, cp = .1, and minbucket = 1000 as additional parameters, with no improvement.

I would appreciate any help on this.

Try running the model with a smaller cp, such as cp = 0.00001 or cp = -1. If it is still not splitting that node, it means the split does not improve the overall fit.
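As a rough sketch of what lowering cp does, the example below calls rpart directly rather than through caret. The built-in iris data is only a stand-in for the asker's training data (an assumption), and n_leaves is a hypothetical helper, not part of rpart.

```r
library(rpart)

# iris stands in for the asker's `training` data frame (assumption).
fit_default <- rpart(Species ~ ., data = iris, method = "class")

# cp = -1 turns off the complexity check, so every split allowed by
# minsplit / minbucket / maxdepth is kept.
fit_full <- rpart(Species ~ ., data = iris, method = "class",
                  control = rpart.control(cp = -1, minsplit = 2, minbucket = 1))

# Hypothetical helper: count the terminal nodes of a fitted tree.
n_leaves <- function(fit) sum(fit$frame$var == "<leaf>")

n_leaves(fit_default)
n_leaves(fit_full)
```

When going through caret, cp is the tuning parameter for method = "rpart", so it is normally supplied via tuneGrid (for example tuneGrid = data.frame(cp = 0.00001)) rather than as a bare argument.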

You can also try changing the splitting criterion from the default Gini impurity to the information gain criterion: parms = list(split = "information")
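A minimal sketch of switching the criterion, again calling rpart directly and using iris as a stand-in for the real data (an assumption). The two trees may or may not differ on a given data set.

```r
library(rpart)

# Default criterion: Gini impurity.
fit_gini <- rpart(Species ~ ., data = iris, method = "class",
                  parms = list(split = "gini"))

# Alternative criterion: information gain (entropy).
fit_info <- rpart(Species ~ ., data = iris, method = "class",
                  parms = list(split = "information"))

# Inspect both trees to see whether the chosen splits changed.
print(fit_gini)
print(fit_info)
```

With caret, extra arguments to train are forwarded to the underlying fitting function, so adding parms = list(split = "information") to the train call should have the same effect.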

If you do force the split, it might be a good idea to do a quick check: compare the accuracy on the training vs. testing set for the original model and for the model with the small cp.

If the difference between training and testing accuracy is smaller for the original model, then the other model is likely overfitting the data.
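The check described above could look roughly like this. The 70/30 split, the iris stand-in data, and the accuracy and gap helpers are all assumptions made for illustration.

```r
library(rpart)

# Hold out 30% of the rows as a test set (iris is a stand-in data set).
set.seed(1)
idx <- sample(nrow(iris), size = round(0.7 * nrow(iris)))
train_df <- iris[idx, ]
test_df <- iris[-idx, ]

# Hypothetical helper: share of correctly classified rows.
accuracy <- function(fit, df) {
  pred <- predict(fit, newdata = df, type = "class")
  mean(pred == df$Species)
}

# Original model (default cp) and a model with a very small cp.
fit_orig <- rpart(Species ~ ., data = train_df, method = "class")
fit_small_cp <- rpart(Species ~ ., data = train_df, method = "class",
                      control = rpart.control(cp = 0.00001))

# Training-minus-testing accuracy for each model; a clearly larger gap
# for the small-cp model suggests it is overfitting.
gap <- function(fit) accuracy(fit, train_df) - accuracy(fit, test_df)
gaps <- c(original = gap(fit_orig), small_cp = gap(fit_small_cp))
gaps
```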

