python - Feature counts don't match

August 15, 2010

I'm using scikit-learn for a simple classification task. I have a test and a train data set, with shapes as follows: train = (1000, 69917) and test = (1073, 49429). When I do something like:

clf.fit(x_train, y_train)
predicted = clf.predict(x_test)

I get the following error:

ValueError: X has 49429 features per sample; expecting 69917

Since x_train was used to train the model, during the prediction stage the model expects x_test to have exactly the same feature dimension (i.e. the same number of columns).
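A minimal sketch that reproduces this kind of mismatch with small random arrays. The array sizes and the choice of MultinomialNB are assumptions made for illustration and are not taken from the question, and the exact wording of the error message varies between scikit-learn versions.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

rng = np.random.default_rng(0)

# Hypothetical toy data: 20 training rows with 6 features,
# but 10 test rows with only 4 features.
X_train = rng.integers(0, 5, size=(20, 6))
y_train = np.arange(20) % 2
X_test = rng.integers(0, 5, size=(10, 4))

clf = MultinomialNB()
clf.fit(X_train, y_train)

try:
    clf.predict(X_test)
    mismatch_detected = False
except ValueError as err:
    # The model rejects input whose column count differs from training.
    mismatch_detected = True
    print("predict failed:", err)
```

Comparing X_train.shape[1] with X_test.shape[1] before calling predict is a cheap way to catch the problem early.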

You mentioned that x_train and x_test were produced using CountVectorizer. The cause of the problem is that you called fit (or fit_transform) twice, producing two different transformations. To prevent this from happening, ensure there is only one call to fit:

from sklearn.feature_extraction.text import CountVectorizer

vec = CountVectorizer()
x_train = vec.fit_transform(x_train_raw)  # learn the vocabulary from the training text
x_test = vec.transform(x_test_raw)        # not fit_transform!

This way, the test data is transformed using exactly the same vocabulary that was learnt from the training data.
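The difference can be seen on a tiny example. This is only a sketch: the two corpora below are made up for illustration, and the variable names (train_docs, test_docs, wrong_vec) are hypothetical rather than taken from the question.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy corpora standing in for the real raw text.
train_docs = ["the cat sat on the mat", "the dog chased the cat"]
test_docs = ["a bird sat on the fence", "the dog barked loudly"]

# Wrong: a second vectorizer fitted on the test data learns its own vocabulary,
# so its columns do not line up with the training matrix.
wrong_vec = CountVectorizer()
X_test_wrong = wrong_vec.fit_transform(test_docs)

# Right: fit once on the training data, then reuse the same fitted vectorizer.
vec = CountVectorizer()
X_train = vec.fit_transform(train_docs)
X_test = vec.transform(test_docs)

print("train:", X_train.shape)
print("test (shared vocabulary):", X_test.shape)
print("test (refitted vocabulary):", X_test_wrong.shape)
```

Note that words in the test text that never appeared in the training text (here, for example, "bird") are simply ignored by transform, since the fitted vocabulary has no column for them.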
