regression - Regress categorical variables in MATLAB
I have a cell-type variable with 12 columns and 20000 rows. Call it atotal:
atotal = [aty1; aty2; aty3; aty4; aty5; aty6; aty7; aty8; aty9; aty10; aty11; aty12; aty13; aty14; aty15; aty16; aty17];
% A few sample rows of atotal:
atotal = {
     972  1  0  0  0  0  0  21  60   118  60110  2001
     973  0  0  1  0  0  0  15  46  1496  60110  2001
     980  0  0  0  0  1  0   4  68   142  40502  2001
     994  1  0  0  0  0  0  13  33    86  81101  2001
     995  0  0  0  1  0  0   9  55   183  31201  2001
    1024  1  0  0  0  0  0  10  26     3  80803  2001
}
My dependent and independent variables are in there:
y1 = cell2mat(atotal(:,2));
x1 = cell2mat(atotal(:,3));
I want to regress them. Considering that the dependent variable y1 is binary and the independent variable x1 is a categorical variable, I use the following code, but I am still not sure whether it is the correct one.
mdl1 = fitlm(x1, y1, 'CategoricalVars', logical([1]));
Then I add more dummies and try the same code:
x2 = cell2mat(atotal(:,4));
x3 = cell2mat(atotal(:,5));
x4 = cell2mat(atotal(:,6));
x5 = cell2mat(atotal(:,7));
mdl2 = fitlm(x1, x2, x3, x4, x5, y1, 'CategoricalVars', logical([1,2,3,4,5]));
But it gives me a lot of errors:
Error using internal.stats.parseArgs (line 42)
Parameter name must be text.
Error in LinearModel.fit (line 849)
[intercept,predictorVars,responseVar,weights,exclude, ...
Error in fitlm (line 117)
model = LinearModel.fit(X,varargin{:});
Could anyone help me? Thank you.
I think there are two problems with your code.
The first problem is that fitlm expects the following arguments:
mdl = fitlm(x,y,modelspec)
which means that you have to collect all the predictor variables into one matrix and use that as the first argument. So you should do the following:
x = [x1, x2, x3, x4, x5];
fitlm(x, y1, ...)
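As an aside, if every cell of atotal holds a numeric scalar (as in the sample rows), the per-column cell2mat calls can be collapsed into one conversion followed by column indexing. A minimal sketch under that assumption; the names A and X are only illustrative, and the six rows are the sample rows from the question:

```matlab
% Sample rows copied from the question; the real atotal has about 20000 rows.
atotal = {
     972  1  0  0  0  0  0  21  60   118  60110  2001
     973  0  0  1  0  0  0  15  46  1496  60110  2001
     980  0  0  0  0  1  0   4  68   142  40502  2001
     994  1  0  0  0  0  0  13  33    86  81101  2001
     995  0  0  0  1  0  0   9  55   183  31201  2001
    1024  1  0  0  0  0  0  10  26     3  80803  2001
};

A  = cell2mat(atotal);   % one conversion: 6-by-12 double matrix
y1 = A(:, 2);            % response, same column as in the question
X  = A(:, 3:7);          % predictors x1..x5 side by side in one matrix
```

X built this way is the same matrix as [x1, x2, x3, x4, x5], so it can be passed directly as the first argument of fitlm.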
The second problem is that the 'CategoricalVars' argument of fitlm expects either a logical vector (a vector with 1 where the variable is categorical and 0 where it is continuous) or a numeric index vector. So the correct usage is:
x = [x1, x2, x3, x4, x5];
fitlm(x, y1, 'CategoricalVars', logical([1,1,1,1,1]))
or
x = [x1, x2, x3, x4, x5];
fitlm(x, y1, 'CategoricalVars', [1,2,3,4,5])
The above code snippets should work properly.
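To see the two forms side by side without the original data, here is a small self-contained sketch. The data are synthetic and purely illustrative (they are not the asker's atotal, and the response is continuous just to keep the example simple); the variable names are my own.

```matlab
% Synthetic stand-in data (hypothetical, for illustration only).
rng(0);                                   % reproducible random numbers
n  = 200;
x1 = randi([0 1], n, 1);                  % 0/1 dummy
x2 = randi([1 3], n, 1);                  % three-level numeric code
y  = 0.5*x1 + 0.2*(x2 == 3) + 0.1*randn(n, 1);

X = [x1, x2];                             % all predictors in one matrix

% Logical mask: one entry per column of X, true means categorical.
mdlMask  = fitlm(X, y, 'CategoricalVars', logical([1 1]));

% Index vector: positions of the categorical columns of X.
mdlIndex = fitlm(X, y, 'CategoricalVars', [1 2]);

disp(mdlMask.Coefficients)                % intercept plus one dummy per non-reference level
```

Both calls describe the same model, so the estimated coefficients should match. A categorical predictor with k levels contributes k-1 dummy coefficients, which is why this fit has four coefficients in total.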
However, consider declaring the categorical variables as categorical (if you have MATLAB R2013b or above). In that case you would do the following:
x1 = categorical(cell2mat(atotal(:,3)));
x2 = categorical(cell2mat(atotal(:,4)));
x3 = categorical(cell2mat(atotal(:,5)));
x4 = categorical(cell2mat(atotal(:,6)));
x5 = categorical(cell2mat(atotal(:,7)));
x = [x1, x2, x3, x4, x5];
fitlm(x, y1)
The advantage of this approach is that MATLAB knows that the xi variables are categorical and treats them accordingly, so you do not have to specify the 'CategoricalVars' argument every time you want to run a regression.
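One caveat: the fitlm documentation describes the matrix input as numeric, so I am not certain that a categorical matrix is accepted as the first argument in every release. A table is the documented container for categorical predictors, so if the matrix form is rejected, something like the following sketch should work. The data are synthetic and the names (g, tbl) are my own, not from the question.

```matlab
% Synthetic stand-in data (hypothetical, for illustration only).
rng(1);
n  = 200;
g  = randi([1 3], n, 1);                  % three-level numeric code
x1 = categorical(g);                      % declared categorical once, up front
x2 = categorical(randi([0 1], n, 1));     % 0/1 dummy, also categorical
y  = double(g == 2) + 0.1*randn(n, 1);    % continuous response for simplicity

tbl = table(x1, x2, y);                   % fitlm uses the last variable as the response by default
mdl = fitlm(tbl);                         % no 'CategoricalVars' needed

disp(mdl.Coefficients)
```

Because the variables carry their type with them, the same table can be reused for other fits (for example with an explicit formula such as 'y ~ x1 + x2') without repeating which columns are categorical.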
Finally, the MATLAB documentation of the fitlm function has a lot of examples, so check that out too.
Note: as others have mentioned in the comments, you should consider running a logit regression, since your response variable is binary. In that case you can estimate the model in the following way:
x = [x1, x2, x3, x4, x5];
fitglm(x, y1, 'Distribution', 'binomial', 'Link', 'logit')
However, do this only if you are sure you understand what a logistic model is, its assumptions, and the interpretation of its coefficients.
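On the interpretation point: logit coefficients are on the log-odds scale, so exponentiating them gives odds ratios. A minimal sketch on synthetic data (the data-generating values -0.5 and 1.0 are arbitrary choices of mine, not anything from the question):

```matlab
% Synthetic binary-response data (hypothetical, for illustration only).
rng(2);
n  = 500;
x1 = randi([0 1], n, 1);                      % 0/1 dummy predictor
p  = 1 ./ (1 + exp(-(-0.5 + 1.0*x1)));        % true success probabilities
y  = double(rand(n, 1) < p);                  % binary 0/1 response

mdl = fitglm(x1, y, 'Distribution', 'binomial', 'Link', 'logit', ...
             'CategoricalVars', 1);

logOdds    = mdl.Coefficients.Estimate;       % intercept and dummy effect, log-odds scale
oddsRatios = exp(logOdds);                    % multiplicative effect on the odds

disp(mdl.Coefficients)
disp(oddsRatios)
```

The second odds ratio says how many times larger the odds of y = 1 are when the dummy is 1 rather than 0; it does not translate into a fixed change in probability, which is the main difference from reading a fitlm coefficient.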