regression - Regress categorical variables in Matlab


I have a cell array with 12 columns and about 20000 rows. Call it atotal:

atotal = [aty1;aty2;aty3;aty4;aty5;aty6;aty7;aty8;aty9;aty10;aty11;aty12;aty13;aty14;aty15;aty16;aty17];

atotal = {
    972   1  0 0 0 0 0  21   60  118   60110  2001
    973   0  0 1 0 0 0  15   46  1496  60110  2001
    980   0  0 0 0 1 0  4    68  142   40502  2001
    994   1  0 0 0 0 0  13   33  86    81101  2001
    995   0  0 0 1 0 0  9    55  183   31201  2001
    1024  1  0 0 0 0 0  10   26  3     80803  2001}

I take the dependent and independent variables from there:

y1 = cell2mat(atotal(:,2));
x1 = cell2mat(atotal(:,3));

and regress them. Considering that the dependent variable y1 is binary and the independent variable x1 is categorical, I use the following code, but I am still not sure whether it is the correct one.

mdl1 = fitlm(x1,y1,'categoricalvars',logical([1])); 

Then I add more dummies and try the same code:

x2 = cell2mat(atotal(:,4));
x3 = cell2mat(atotal(:,5));
x4 = cell2mat(atotal(:,6));
x5 = cell2mat(atotal(:,7));

mdl2 = fitlm(x1,x2,x3,x4,x5,y1,'categoricalvars',logical([1,2,3,4,5]));

But it gives me a lot of errors:

Error using internal.stats.parseargs (line 42)
Parameter name must be text.

Error in linearmodel.fit (line 849)
            [intercept,predictorvars,responsevar,weights,exclude, ...

Error in fitlm (line 117)
model = linearmodel.fit(x,varargin{:});

Could you help me? Thank you.

I think there are two problems with your code.

The first problem is that fitlm expects the following arguments:

mdl = fitlm(x,y,modelspec) 

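This means that you have to collect the predictor variables into one matrix and use it as the first argument. When the predictors are passed separately, fitlm tries to read the extra arrays as options, which is most likely where the "parameter name must be text" error comes from. So you should do the following: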

x = [x1, x2, x3, x4, x5];
fitlm(x, y1, ...)

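The second problem is that the categoricalvars argument of fitlm expects either a logical vector (with 1 where the variable is categorical and 0 where it is continuous) or a numeric index vector. Note that logical([1,2,3,4,5]) happens to evaluate to all ones, but it mixes the two forms. The correct usage is: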

x = [x1, x2, x3, x4, x5];
fitlm(x, y1, 'categoricalvars', logical([1,1,1,1,1]))

or

x = [x1, x2, x3, x4, x5];
fitlm(x, y1, 'categoricalvars', [1,2,3,4,5])

The above code snippets should work properly.
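To check that the two forms of the argument are interchangeable, here is a minimal sketch on synthetic data. The random dummies and the names mdlMask and mdlIndex are made up for illustration and are not your data; it assumes the Statistics Toolbox is installed.

```matlab
% Synthetic stand-in data (hypothetical, not the asker's atotal).
rng(0);                              % fixed seed so the run is repeatable
n  = 200;
x  = double(rand(n, 5) > 0.5);       % n-by-5 matrix of 0/1 dummies
y1 = double(rand(n, 1) > 0.5);       % binary response

% Logical mask: one entry per predictor column, true = categorical.
mdlMask  = fitlm(x, y1, 'CategoricalVars', true(1, 5));

% Numeric index vector: column numbers of the categorical predictors.
mdlIndex = fitlm(x, y1, 'CategoricalVars', 1:5);

% Both calls describe the same model, so the estimates should agree.
disp(max(abs(mdlMask.Coefficients.Estimate - mdlIndex.Coefficients.Estimate)));
```

If the displayed difference is zero (or at rounding level), the two calls fitted the same model.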

However, consider declaring the categorical variables as categorical (if you have MATLAB R2013b or above). In that case you can do the following:

x1 = categorical(cell2mat(atotal(:,3)));
x2 = categorical(cell2mat(atotal(:,4)));
x3 = categorical(cell2mat(atotal(:,5)));
x4 = categorical(cell2mat(atotal(:,6)));
x5 = categorical(cell2mat(atotal(:,7)));

x = [x1, x2, x3, x4, x5];
fitlm(x, y1)

The advantage of this approach is that MATLAB knows that your xi variables are categorical and treats them accordingly, so you do not have to specify the categoricalvars argument every time you want to run a regression.
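If concatenating categorical arrays into one matrix gives you trouble, a variant that I am more certain about is to put the variables into a table, since fitlm accepts a table and treats its categorical variables as categorical automatically. This is only a sketch on synthetic data; the table name tbl and the random values are assumptions for illustration.

```matlab
% Synthetic stand-in data (hypothetical, not the asker's atotal).
rng(0);                                        % repeatable random draws
n  = 200;
x1 = categorical(double(rand(n, 1) > 0.5));    % categorical predictor
x2 = categorical(double(rand(n, 1) > 0.5));    % categorical predictor
y1 = double(rand(n, 1) > 0.5);                 % binary response, kept numeric

% Each table variable keeps its own type, so x1 and x2 stay categorical.
tbl = table(x1, x2, y1);

% Formula syntax: response on the left, predictors on the right.
mdl = fitlm(tbl, 'y1 ~ x1 + x2');
disp(mdl.Coefficients);
```

With a table, the variable names also show up in the coefficient names, which makes the output easier to read than x1, x2, ... taken from matrix columns.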

Finally, the MATLAB documentation of the fitlm function has a lot of examples, so check it out too.

Note: as others have mentioned in the comments, you should consider running a logit regression, since your response variable is binary. In that case you can estimate the model in the following way:

x = [x1, x2, x3, x4, x5];
fitglm(x, y1, 'distribution', 'binomial', 'link', 'logit')

However, if you do that, make sure you understand what the logistic model is, what its assumptions are, and how to interpret the coefficients.
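On the interpretation point: logit coefficients are on the log-odds scale, so exponentiating them gives odds ratios. A minimal sketch on synthetic data follows; the data-generating coefficients (-0.5 and 1.0) and all variable names are assumptions chosen for illustration only.

```matlab
% Synthetic stand-in data (hypothetical, not the asker's atotal).
rng(0);                                   % repeatable random draws
n   = 500;
x   = double(rand(n, 2) > 0.5);           % two 0/1 dummy predictors
eta = -0.5 + 1.0 * x(:, 1);               % assumed true log-odds; x(:,2) has no effect
p   = 1 ./ (1 + exp(-eta));               % inverse logit gives P(y1 = 1)
y1  = double(rand(n, 1) < p);             % binary response drawn with probability p

% Logistic regression with both predictors declared categorical.
mdl = fitglm(x, y1, 'Distribution', 'binomial', 'Link', 'logit', ...
             'CategoricalVars', [1 2]);

% Coefficients are log-odds; exp() turns them into odds ratios.
oddsRatios = exp(mdl.Coefficients.Estimate);
disp(table(mdl.CoefficientNames', oddsRatios, ...
           'VariableNames', {'Term', 'OddsRatio'}));
```

An odds ratio above 1 means the odds of y1 = 1 are higher for that level than for the reference level; it is not a change in probability, which is the main difference from reading fitlm output.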

