python - Pandas series - recording numerical changes -

August 15, 2011

I have a panel DataFrame with many observations of individuals' locations over 10 years. It looks like this:

       personid  location_1991  location_1992  location_1993  location_1994
    0       111              1              1              2              2
    1       233              3              3              4            999
    2       332              1              3              3              3
    3       454              2              2              2              2
    4       567              2              1              1              1

I want to track each person's transitions by creating a variable for each type of transition. I'd like a column that marks whenever a person transitions into each location type. Ideally it would look like this:

       personid  transition_to_1  transition_to_2  transition_to_3  transition_to_4
    0       111                0                1                0                0
    1       233                0                0                0                1
    2       332                0                0                1                0
    3       454                0                0                0                0
    4       567                1                0                0                0

So far, I've tried iterating through each row and then looping through each element in the row to check whether it is the same as the previous one. That seems time-intensive. Is there a better way to track changes in values within each row of a DataFrame?
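For reference, the row-by-row approach described above might look like the following sketch. The function name `transitions_by_loop`, the `targets` tuple, and the inline sample frame are assumptions made for illustration; the data is copied from the table in the question.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "personid": [111, 233, 332, 454, 567],
    "location_1991": [1, 3, 1, 2, 2],
    "location_1992": [1, 3, 3, 2, 1],
    "location_1993": [2, 4, 3, 2, 1],
    "location_1994": [2, 999, 3, 2, 1],
})


def transitions_by_loop(frame, targets=(1, 2, 3, 4)):
    """Row-by-row baseline (hypothetical helper): flag each location a person moves into."""
    year_cols = [c for c in frame.columns if c.startswith("location_")]
    records = []
    for _, row in frame.iterrows():
        flags = {f"transition_to_{t}": 0 for t in targets}
        values = [row[c] for c in year_cols]
        # Compare every year with the year before it.
        for prev, curr in zip(values, values[1:]):
            if curr != prev and curr in targets:
                flags[f"transition_to_{curr}"] = 1
        records.append({"personid": row["personid"], **flags})
    return pd.DataFrame(records)


loop_result = transitions_by_loop(df)
print(loop_result)
```

This gives the desired columns for the sample, but it visits every cell in Python, which is why it gets slow on a large panel.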

I did it with a combination of first stacking the columns, then pivoting along them.

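    import pandas as pd

    # Read the table copied from the question (read_clipboard already returns a DataFrame)
    df = pd.read_clipboard()
    # Stack the yearly columns into one long 'location' column
    df2 = pd.DataFrame(df.set_index('personid').stack(), columns=['location'])
    df2.reset_index(inplace=True)  # move personid and the year label into columns
    df2.reset_index(inplace=True)  # add a running 'index' column to pivot on
    df3 = df2.pivot(index='index', columns='location', values='personid')
    df3 = df3.fillna(0)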

So far, it looks like this:

    location    1    2    3    4  999
    index
    0         111    0    0    0    0
    1         111    0    0    0    0
    2           0  111    0    0    0
    3           0  111    0    0    0
    4           0    0  233    0    0
    5           0    0  233    0    0
    6           0    0    0  233    0
    7           0    0    0    0  233
    8         332    0    0    0    0
    9           0    0  332    0    0
    10          0    0  332    0    0
    11          0    0  332    0    0
    12          0  454    0    0    0
    13          0  454    0    0    0
    14          0  454    0    0    0
    15          0  454    0    0    0
    16          0  567    0    0    0
    17        567    0    0    0    0
    18        567    0    0    0    0
    19        567    0    0    0    0

    # Label each row with its person: the row-wise max (axis=1) is the only non-zero entry
    df3['personid'] = df3.max(axis=1, skipna=True)
    df3 = df3.set_index('personid', drop=True)
    df3[df3 > 0] = 1  # turn the stored ids into 0/1 indicators

And here is the result:

    location    1    2    3    4  999
    personid
    111         1    0    0    0    0
    111         1    0    0    0    0
    111         0    1    0    0    0
    111         0    1    0    0    0
    233         0    0    1    0    0
    233         0    0    1    0    0
    233         0    0    0    1    0
    233         0    0    0    0    1
    332         1    0    0    0    0
    332         0    0    1    0    0
    332         0    0    1    0    0
    332         0    0    1    0    0
    454         0    1    0    0    0
    454         0    1    0    0    0
    454         0    1    0    0    0
    454         0    1    0    0    0
    567         0    1    0    0    0
    567         1    0    0    0    0
    567         1    0    0    0    0
    567         1    0    0    0    0
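The indicator frame above records where each person was in each year, not yet the transitions themselves. One possible way to get from the wide table to the `transition_to_*` columns without looping over rows is to compare each year's values with the previous year's as arrays. This is a sketch under assumptions, not the method used above: the inline sample frame, the `targets` tuple, and the choice to ignore moves into location 999 (which has no column in the desired output) are all illustrative.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "personid": [111, 233, 332, 454, 567],
    "location_1991": [1, 3, 1, 2, 2],
    "location_1992": [1, 3, 3, 2, 1],
    "location_1993": [2, 4, 3, 2, 1],
    "location_1994": [2, 999, 3, 2, 1],
})

# Year columns as a 2-D array: one row per person, one column per year.
values = df.filter(like="location_").to_numpy()
arrived = values[:, 1:]              # location in every year after the first
moved = arrived != values[:, :-1]    # True where it differs from the year before

targets = (1, 2, 3, 4)  # assumed set of location types to report
transitions = pd.DataFrame({"personid": df["personid"]})
for t in targets:
    # 1 if the person moved into location t in any year, else 0.
    transitions[f"transition_to_{t}"] = (moved & (arrived == t)).any(axis=1).astype(int)

print(transitions)
```

Because the comparison runs on whole arrays, the cost of the Python loop depends only on the number of location types, not on the number of people.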
