R - rpart not splitting obvious nodes
I am using a data set of 54k records with 5 classes (pop), one of which is insignificant. Using the caret package, I ran rpart as follows:

    model <- train(pop ~ pe + chl_small, method = "rpart", data = training)

and got the following tree:

    n= 54259

    node), split, n, loss, yval, (yprob)
          * denotes terminal node

    1) root 54259 38614 pico (0.0014 0.18 0.29 0.25 0.28)
      2) pe< 5004 39537 23961 pico (0 0.22 0.39 2.5e-05 0.38)
        4) chl_small< 32070.5 16948 2900 pico (0 0.00012 0.83 5.9e-05 0.17) *
        5) chl_small>=32070.5 22589 10281 ultra (0 0.39 0.068 0 0.54) *
      3) pe>=5004 14722 1113 synecho (0.0052 0.052 0.0047 0.92 0.013) *

It seems obvious that node 5 should be split further, but rpart is not doing it. I tried cp = .001, cp = .1, and minbucket = 1000 as additional parameters, with no improvement.

I would appreciate any help on this.
Try running the model with a smaller cp, e.g. cp = 0.00001 or cp = -1. If it still does not split that node, it means the split does not improve the overall fit.

You can also try changing the splitting criterion from the default Gini impurity to the information gain criterion: parms = list(split = "information")
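
A minimal sketch of both suggestions, calling rpart directly instead of going through caret. The built-in iris data stands in for the asker's data (an assumption, since the original set is not available), and `n_leaves` is a hypothetical helper added only for illustration. Note that in caret's `train()`, cp is the tuning parameter for method = "rpart", so it is normally supplied through `tuneGrid` rather than as a plain argument; calling rpart directly sidesteps that.

```r
library(rpart)

# Stand-in data: built-in iris, NOT the asker's data set.
data(iris)

# Default fit: cp = 0.01, Gini splitting.
fit_default <- rpart(Species ~ ., data = iris, method = "class")

# Same formula with complexity-based stopping effectively switched off
# (cp = -1) and the information criterion instead of Gini.
# minsplit / minbucket are lowered so they do not block small splits.
fit_deep <- rpart(
  Species ~ ., data = iris, method = "class",
  parms   = list(split = "information"),
  control = rpart.control(cp = -1, minsplit = 2, minbucket = 1)
)

# Hypothetical helper: count terminal nodes in a fitted tree.
n_leaves <- function(fit) sum(fit$frame$var == "<leaf>")

# If the second count is no larger than the first, the looser settings
# did not produce any additional splits.
print(c(default = n_leaves(fit_default), deep = n_leaves(fit_deep)))
```

If the deeper fit has more terminal nodes, the extra splits were being rejected by the complexity threshold or the minimum node sizes rather than being impossible.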
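If you do force the split, it might be a good idea to do a quick check: compare the accuracy on the training vs. the testing set for the original model and for the model with the small cp.

If the difference between training and testing accuracy is smaller for the original model, then the other model overfits the data.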
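
The check described above could look roughly like this. It is a sketch under assumptions: iris again stands in for the real data, the 100/50 split and the seed are arbitrary, and `accuracy`, `train_set`, and `test_set` are illustrative names that do not come from the question.

```r
library(rpart)

set.seed(1)                          # arbitrary seed, for a reproducible split
data(iris)                           # stand-in data, NOT the asker's data set
idx       <- sample(nrow(iris), 100) # 100 rows for training, the rest for testing
train_set <- iris[idx, ]
test_set  <- iris[-idx, ]

# Hypothetical helper: share of rows whose predicted class matches the label.
accuracy <- function(fit, data) {
  mean(predict(fit, data, type = "class") == data$Species)
}

# Original model (default cp) and a model with pruning effectively disabled.
fit_default  <- rpart(Species ~ ., data = train_set, method = "class")
fit_small_cp <- rpart(Species ~ ., data = train_set, method = "class",
                      control = rpart.control(cp = -1, minsplit = 2, minbucket = 1))

results <- data.frame(
  model = c("default", "small cp"),
  train = c(accuracy(fit_default, train_set), accuracy(fit_small_cp, train_set)),
  test  = c(accuracy(fit_default, test_set),  accuracy(fit_small_cp, test_set))
)

# A larger train-minus-test gap for the small-cp model suggests overfitting.
results$gap <- results$train - results$test
print(results)
```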