python - Feature counts don't match
I'm using scikit-learn for a simple classification task. I have a train and a test data set, with shapes as follows: train = (1000, 69917), test = (1073, 49429). When I do something like:

    clf.fit(x_train, y_train)
    predicted = clf.predict(x_test)

I get the following error:

    ValueError: X has 49429 features per sample; expecting 69917
Since x_train was used to train the model, during the prediction stage the model expects x_test to have the exact same feature dimension (i.e. the same number of columns).
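To make the column mismatch concrete, here is a toy sketch in plain Python, assuming the features are bag-of-words counts. The helpers `build_vocab` and `vectorize` are hypothetical stand-ins written for illustration, not scikit-learn functions. It shows that a vocabulary learnt separately from the test documents gives a different number of columns, while reusing the training vocabulary keeps the width identical.

```python
def build_vocab(docs):
    """Map each distinct token to a column index (hypothetical helper)."""
    tokens = sorted({tok for doc in docs for tok in doc.lower().split()})
    return {tok: i for i, tok in enumerate(tokens)}


def vectorize(docs, vocab):
    """Count tokens per document; tokens missing from vocab are ignored."""
    rows = []
    for doc in docs:
        row = [0] * len(vocab)
        for tok in doc.lower().split():
            if tok in vocab:
                row[vocab[tok]] += 1
        rows.append(row)
    return rows


train_docs = ["the cat sat", "the dog barked loudly"]
test_docs = ["a cat barked"]

train_vocab = build_vocab(train_docs)  # learnt from training text only
test_vocab = build_vocab(test_docs)    # learnt separately: the mistake

x_train = vectorize(train_docs, train_vocab)
x_test_wrong = vectorize(test_docs, test_vocab)   # width follows test_vocab
x_test_right = vectorize(test_docs, train_vocab)  # width follows train_vocab

print(len(x_train[0]), len(x_test_wrong[0]), len(x_test_right[0]))
```

A model trained on `x_train` can only consume `x_test_right`, because each of its columns refers to the same token as the corresponding training column.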
You mentioned that x_train and x_test were produced using CountVectorizer. The cause of the problem is that you called fit (or fit_transform) twice, producing two different transformations. To prevent this from happening, ensure there is only one call to fit:
    from sklearn.feature_extraction.text import CountVectorizer

    vec = CountVectorizer()
    x_train = vec.fit_transform(x_train_raw)
    x_test = vec.transform(x_test_raw)  # not fit_transform!

This way, the test data is transformed using the exact same vocabulary that was learnt from the training data.
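One way to make a second fit harder to do by accident is to bundle the vectorizer and the classifier into a single scikit-learn pipeline. The sketch below is only an illustration: the tiny documents and labels are made up, and `MultinomialNB` is an assumed classifier, since the question does not say which one `clf` is.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up raw text and labels, for illustration only.
x_train_raw = [
    "cheap pills buy now",
    "meeting at noon tomorrow",
    "buy cheap watches",
    "lunch meeting moved",
]
y_train = [1, 0, 1, 0]
x_test_raw = ["buy pills", "noon lunch"]

# The pipeline fits the vectorizer once, inside fit(), and only calls
# transform() on whatever is later passed to predict().
clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(x_train_raw, y_train)
predicted = clf.predict(x_test_raw)

# Number of columns the classifier was trained on.
n_features = len(clf.named_steps["countvectorizer"].vocabulary_)
print(n_features, predicted)
```

With this layout the raw text goes straight into `fit` and `predict`, so there is no separate `x_test` matrix whose width could drift away from the training matrix.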