Machine learning - Mixed predictor types for random forest
I am trying to build a classification model using a random forest on a data set with 5 predictor variables. Two predictors are continuous: one takes real values in the interval [0, 1000] and the other takes real values in [-10, 10]. One predictor takes integer values in [10000, 15000]. The remaining 2 predictors are categorical, with levels {a, b, c, d, e, f} and {ny, la, chicago}. Are there any pre-processing procedures required for these different predictor types?
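On the numeric side, tree-based forests split a numeric predictor at a threshold, so a monotone rescaling (for example mapping [0, 1000] onto [0, 1]) leaves the set of possible partitions unchanged. The same holds for the integer predictor in [10000, 15000]. Categorical predictors are a separate question: some implementations accept them directly, while others only take numeric input, in which case one-hot (indicator) encoding is a common choice. The sketch below is plain Python on simulated data; the helper names `gini`, `best_partition` and `one_hot` are hypothetical, and it assumes a binary label with Gini impurity as the split criterion. It checks the rescaling point on one predictor and shows the encoding for the city variable.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)

def best_partition(x, y):
    """Exhaustive threshold search on one numeric predictor.

    Returns (left_indices, weighted_gini) for the split x <= t that
    minimizes the size-weighted Gini impurity of the two children.
    """
    best_left, best_score = None, float("inf")
    for t in sorted(set(x))[:-1]:  # the largest value would leave the right child empty
        left = [i for i, xi in enumerate(x) if xi <= t]
        right = [i for i, xi in enumerate(x) if xi > t]
        score = (len(left) * gini([y[i] for i in left])
                 + len(right) * gini([y[i] for i in right])) / len(y)
        if score < best_score:
            best_left, best_score = frozenset(left), score
    return best_left, best_score

def one_hot(values, levels):
    """Expand a categorical column into one 0/1 indicator column per level."""
    return [[1 if v == level else 0 for level in levels] for v in values]

random.seed(0)
n = 200
x_raw = [random.uniform(0, 1000) for _ in range(n)]  # continuous predictor in [0, 1000]
y = [1 if xi + random.gauss(0, 150) > 500 else 0 for xi in x_raw]  # simulated label

# Min-max rescale to [0, 1]; this is monotone, so the order of x is unchanged.
lo, hi = min(x_raw), max(x_raw)
x_scaled = [(xi - lo) / (hi - lo) for xi in x_raw]

raw_left, raw_score = best_partition(x_raw, y)
scaled_left, scaled_score = best_partition(x_scaled, y)
print("same partition after rescaling:", raw_left == scaled_left)

# Indicator columns for the city predictor, one per level.
city = [random.choice(["ny", "la", "chicago"]) for _ in range(n)]
city_cols = one_hot(city, ["ny", "la", "chicago"])
print(city[:3], city_cols[:3])
```

Only the order of a numeric predictor matters to a threshold split, which is why differing ranges such as [0, 1000] and [-10, 10] need no normalization for the trees themselves.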
Many of the exhaustive-search split algorithms are biased towards variables with many possible values, i.e. many candidate split points. Separating the variable-selection step from the split-selection step seems to fix this, as described in this paper, and the authors have a package implementing it in R as well. I don't know of a way to avoid the problem with mixed-type data using the more common approaches. However, despite the fact that the problem results in bias, in my experience predictive performance hasn't been hugely different, though your mileage may vary; it depends on what you are doing. I'd run simulations either way. The same group has 2 BMC Bioinformatics papers on conditional permutation importance that discuss these issues.
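A small simulation can show this bias directly. The sketch below is plain Python under assumptions of my own: a binary class label that is pure noise, n = 100, Gini impurity as the split criterion, and hypothetical helper names. It uses ordinary CART-style exhaustive search, not the method from the paper. Three irrelevant predictors shaped like the ones in the question compete for the root split: a continuous one in [-10, 10] (up to 99 thresholds), the six-level categorical (2^5 - 1 = 31 candidate level groupings) and the three-level city variable (3 groupings). With no bias each would be chosen in roughly a third of the replicates.

```python
import random
from collections import Counter

def gini(pos, n):
    """Gini impurity of a node holding n samples, pos of them in class 1."""
    if n == 0:
        return 0.0
    p = pos / n
    return 2.0 * p * (1.0 - p)

def best_numeric_gain(x, y):
    """Largest Gini decrease over every threshold split x <= t (exhaustive search)."""
    pairs = sorted(zip(x, y))
    n, total_pos = len(pairs), sum(y)
    parent = gini(total_pos, n)
    best, left_n, left_pos = 0.0, 0, 0
    for i in range(n - 1):
        left_n += 1
        left_pos += pairs[i][1]
        if pairs[i][0] == pairs[i + 1][0]:
            continue  # no threshold separates tied values
        right_n, right_pos = n - left_n, total_pos - left_pos
        child = (left_n * gini(left_pos, left_n) + right_n * gini(right_pos, right_n)) / n
        best = max(best, parent - child)
    return best

def best_categorical_gain(x, y, levels):
    """Largest Gini decrease over every split of the levels into two non-empty groups."""
    n, total_pos = len(y), sum(y)
    parent = gini(total_pos, n)
    k, best = len(levels), 0.0
    # Keep the last level on the right so each partition is visited once,
    # giving 2**(k-1) - 1 candidate splits.
    for mask in range(1, 2 ** (k - 1)):
        left_levels = {levels[j] for j in range(k - 1) if (mask >> j) & 1}
        left = [yi for xi, yi in zip(x, y) if xi in left_levels]
        left_n, left_pos = len(left), sum(left)
        right_n, right_pos = n - left_n, total_pos - left_pos
        if left_n == 0 or right_n == 0:
            continue  # a level absent from the sample can leave a child empty
        child = (left_n * gini(left_pos, left_n) + right_n * gini(right_pos, right_n)) / n
        best = max(best, parent - child)
    return best

GROUPS = ["a", "b", "c", "d", "e", "f"]
CITIES = ["ny", "la", "chicago"]

random.seed(1)
n, reps = 100, 300
wins = Counter()
for _ in range(reps):
    # Null setting: the class label is independent of every predictor.
    y = [random.randint(0, 1) for _ in range(n)]
    gains = {
        "continuous": best_numeric_gain([random.uniform(-10, 10) for _ in range(n)], y),
        "6-level": best_categorical_gain([random.choice(GROUPS) for _ in range(n)], y, GROUPS),
        "3-level": best_categorical_gain([random.choice(CITIES) for _ in range(n)], y, CITIES),
    }
    wins[max(gains, key=gains.get)] += 1  # predictor that would be chosen for the root split

for name in ("continuous", "6-level", "3-level"):
    print(f"{name:>10}: chosen in {wins[name] / reps:.0%} of replicates")
```

Because the label is noise, any systematic preference among the three predictors is selection bias rather than signal; the three-level predictor, with the fewest candidate splits, tends to be chosen least often. Changing n, adding the integer predictor, or making the label depend on one predictor are easy variations.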