python - Pandas series - recording numerical changes
I have a panel DataFrame with many observations of individuals' location data over 10 years. It looks like this:
   personid  location_1991  location_1992  location_1993  location_1994
0       111              1              1              2              2
1       233              3              3              4            999
2       332              1              3              3              3
3       454              2              2              2              2
4       567              2              1              1              1
I want to track each person's transitions by creating a variable for each type of transition. I'd like a column that marks whenever a person transitions to each location type. Ideally, it would look like this:
   personid  transition_to_1  transition_to_2  transition_to_3  transition_to_4
0       111                0                1                0                0
1       233                0                0                0                1
2       332                0                0                1                0
3       454                0                0                0                0
4       567                1                0                0                0
So far, I've tried iterating through each row and then looping through each element in the row to check whether it is the same as the previous one. This seems time-intensive. Is there a better way to track changes in the values in each row of a DataFrame?
I did a combination of first stacking the columns, then pivoting along them.
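import pandas as pd

# Read the sample table from the clipboard
df = pd.DataFrame(pd.read_clipboard())
# Stack the year columns into one long 'location' column
df2 = pd.DataFrame(df.set_index('personid').stack(), columns=['location'])
# First reset turns personid and the year label into columns;
# second reset adds a running 'index' column used by the pivot
df2.reset_index(inplace=True)
df2.reset_index(inplace=True)
# One column per location, holding the personid where it occurs
df3 = df2.pivot(index='index', columns='location', values='personid')
df3 = df3.fillna(0)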
So far, it looks like this:
location    1    2    3    4  999
index
0         111    0    0    0    0
1         111    0    0    0    0
2           0  111    0    0    0
3           0  111    0    0    0
4           0    0  233    0    0
5           0    0  233    0    0
6           0    0    0  233    0
7           0    0    0    0  233
8         332    0    0    0    0
9           0    0  332    0    0
10          0    0  332    0    0
11          0    0  332    0    0
12          0  454    0    0    0
13          0  454    0    0    0
14          0  454    0    0    0
15          0  454    0    0    0
16          0  567    0    0    0
17        567    0    0    0    0
18        567    0    0    0    0
19        567    0    0    0    0

df3['personid'] = df3.max(axis=0, skipna=True)
df3 = df3.set_index('personid', drop=True)
df3[df3 > 0] = 1
And here is the result:
location  1  2  3  4  999
personid
111       1  0  0  0    0
567       1  0  0  0    0
567       0  1  0  0    0
332       0  1  0  0    0
233       0  0  1  0    0
233       0  0  1  0    0
233       0  0  0  1    0
233       0  0  0  0    1
332       1  0  0  0    0
332       0  0  1  0    0
332       0  0  1  0    0
332       0  0  1  0    0
454       0  1  0  0    0
454       0  1  0  0    0
454       0  1  0  0    0
454       0  1  0  0    0
567       0  1  0  0    0
567       1  0  0  0    0
567       1  0  0  0    0
567       1  0  0  0    0
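For comparison, here is a rough sketch of one way to get per-person transition flags without looping over rows. It compares each year's column with the previous year's, keeps the destination location wherever the value changed, and cross-tabulates the result. The sample frame is typed in by hand from the question, and the names `locs`, `changed`, `dest` and `flags` are just illustrative. It also assumes 999 counts as an ordinary location, which may not be what is wanted if 999 is really a missing-value code.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "personid": [111, 233, 332, 454, 567],
    "location_1991": [1, 3, 1, 2, 2],
    "location_1992": [1, 3, 3, 2, 1],
    "location_1993": [2, 4, 3, 2, 1],
    "location_1994": [2, 999, 3, 2, 1],
})

locs = df.set_index("personid")
vals = locs.to_numpy()

# True where a year's location differs from the previous year's
changed = vals[:, 1:] != vals[:, :-1]

# Keep only the destination of each change, in long form
dest = (
    locs.iloc[:, 1:]
    .where(changed)      # unchanged cells become NaN
    .stack()
    .dropna()            # explicit, since stack() may keep NaN in newer pandas
    .astype(int)
    .reset_index(name="to")
)

# One 0/1 column per destination, one row per person
flags = (
    pd.crosstab(dest["personid"], dest["to"])
    .clip(upper=1)                            # flag, not count
    .reindex(locs.index, fill_value=0)        # keep people who never moved
    .add_prefix("transition_to_")
)
print(flags)
```

Dropping the `.clip(upper=1)` step would give the number of transitions into each location instead of a 0/1 flag.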