machine learning - Mixed predictor types for random forest

March 15, 2013

I am trying to build a classification model using random forest on a data set with 5 predictor variables. Two of the predictors are continuous: one takes real values in the interval [0, 1000], the other takes real values in [-10, 10]. One predictor takes integer values in [10000, 15000]. The remaining two predictors are categorical, with values {a, b, c, d, e, f} and {ny, la, chicago}. Are there any procedures required for pre-processing these different predictor types?

Many of the exhaustive search algorithms are biased towards variables with many values. Separating the variable selection and split selection processes seems to fix this, as described in this paper. They have a package implemented in R as well. I don't know of any way to avoid this with mixed-type data using the more common approaches. However, despite the fact that this problem results in bias, in my experience the predictive performance hasn't been hugely different, but your mileage may vary. It depends on what you're doing. I'd do simulations either way. The same group has 2 BMC Bioinformatics papers on conditional permutation importance that discuss these issues.
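
To get a feel for the bias towards variables with many values, here is a rough simulation sketch in plain Python. It is not the method from the paper or the R package mentioned above. It only illustrates exhaustive split search on pure-noise predictors, under these assumptions: the helper names (`gini`, `best_split_gain`, `simulate`) are made up for illustration, splits are scored by Gini decrease, and the six-level variable is treated as ordered for simplicity (a real categorical split search would consider subsets of levels).

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_split_gain(x, y):
    """Largest Gini decrease over every threshold split of x (exhaustive search)."""
    n = len(y)
    parent = gini(y)
    pairs = sorted(zip(x, y))
    total_pos = sum(y)
    left_n = 0
    left_pos = 0
    best = 0.0
    for i in range(n - 1):
        left_n += 1
        left_pos += pairs[i][1]
        if pairs[i][0] == pairs[i + 1][0]:
            continue  # no threshold fits between two equal values
        right_n = n - left_n
        right_pos = total_pos - left_pos
        p_left = left_pos / left_n
        p_right = right_pos / right_n
        # Weighted impurity of the two child nodes
        child = (left_n * 2.0 * p_left * (1.0 - p_left)
                 + right_n * 2.0 * p_right * (1.0 - p_right)) / n
        best = max(best, parent - child)
    return best


def simulate(n_trials=300, n_samples=100, seed=0):
    """Count how often each pure-noise predictor yields the best split."""
    rng = random.Random(seed)
    wins = {"binary": 0, "six_levels": 0, "continuous": 0}
    for _ in range(n_trials):
        # The response is independent of every predictor, so an unbiased
        # selection rule would pick each predictor about equally often.
        y = [rng.randint(0, 1) for _ in range(n_samples)]
        xs = {
            "binary": [rng.randint(0, 1) for _ in range(n_samples)],
            "six_levels": [rng.randint(0, 5) for _ in range(n_samples)],
            "continuous": [rng.uniform(0, 1000) for _ in range(n_samples)],
        }
        gains = {name: best_split_gain(x, y) for name, x in xs.items()}
        wins[max(gains, key=gains.get)] += 1
    return wins


if __name__ == "__main__":
    print(simulate())
```

Since none of the predictors carries any signal, any imbalance in the win counts comes from the number of candidate splits each variable offers. Swapping in your own value ranges and sample size is probably the quickest way to see how much this matters for your data.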
