python memory allocation when using chr() -


i new python , want have list 2 elements first 1 integer between 0 , 2 billion, other 1 number between 0 10. have large number of these lists (billions).

suppose use chr() function add second argument list. example:

first_number = 123456678 second_number = chr(1) mylist = [first_number,second_number] 

in case how python allocate memory? assume second argument char , give (1 byte + overheads) or assume second argument string? if thinks string there way can define , enforce char or make how more memory efficient?

edit --> added more information why need data structure

here more information want do:

i have sparse weighted graph 2 billion edges , 25 million nodes. represent graph tried create dictionary (because need fast lookup) in keys nodes (as integers). these nodes represented number between 0 2 billion (there no relation between , number of edges). edges represented this: each of nodes (or keys in dictionary ) keeping list of list. each element of list of list list have explained above. first 1 represent other node , second argument represents weight of edge between key , first argument. example, graph contain 5 nodes, if have

 {1: [[2, 1], [3, 1], [4, 2], [5, 1]], 2: [[5, 1]], 3: [[5, 2]], 4: [[6, 1]], 5: [[6, 1]]} 

it means node 1 has 4 edges: 1 goes node 2 weight 1, 1 goes node 3, weight 1, 1 goes node 4 weight 2, etc.

i looking see if make more memory efficient making second argument of edge smaller.

using single character string take same amount memory small integer because cpython create 1 object of each value, , use object every time needs string or integer of value. using strings take bit more space, it'll insignificant.

but lets answer real question, how can reduce amount of memory python program uses? first i'll calculate how memory objects want create use. i'm using 64-bit version of python 2.7 numbers other 64-bit versions of python should similar.

starting off have 1 dict object, has 25 million nodes. python use 2^26 hash buckets dict of size, , each bucket 24 bytes. comes 1.5 gb dict itself.

the dict have 25 million keys, of them int objects, , each of them 24 bytes. comes total of 570 mb integers represent nodes. have 25 million list objects values. each list take 72 bytes plus 8 bytes per element in list. these lists have total of 2 billion elements, they'll take total of 16.6 gb.

each of these 2 billion list elements refer list object that's 2 elements long. comes whopping 164 gb. each of 2 element lists refer 2 different int objects. news, while appears total of 4 billion integer objects, it's 2 billion different integer objects. there 1 object created each of small integer values used in second element. that's total 44.7 gb of memory used integer objects referred first element.

that comes @ least 227 gb of memory you'll need data structure plan implement it. working through list i'll explain how possible reduce memory you'll need more practical.

the 44.7 gb of memory used int objects represent nodes in 2 element edge lists easiest deal with. since there 25 million nodes, don't need 2 billion different objects, 1 each node value. since you're using node values keys can reuse objects. that's 44.7 gb gone right there, , depending on how build data structure might not take effort ensure no redudant node value objects created. brings total down 183 gb.

next lets tackle 164 gb needed 2 element edge list objects. it's possible can share list objects happen have same node value , weighting, can better. eliminate edges lists flatting lists of lists. you'll have bit arithmetic access correct elements, unless have system huge amount of memory you're going have make compromises. list objects used dict values have double in length, increasing total size 16.1 gb 31.5 gb. makes net savings flatting lists nice 149 gb, bringing total down more reasonable 33.5 gb.

going farther trickier. 1 possibility use arrays. unlike lists elements don't refer other objects, value stored in each element. array.array object 56 bytes long plus size of elements in case 32-bit integers. adds 16.2 gb net savings of 15.3 gb. total 18.3 gb.

it's possible squeeze little more space taking advantage of fact weights small integers fit in single byte characters. create 2 array.array objects each node, 1 32-bit integers node values, , other 8-bit integers weights. because there 2 array objects, use tuple object hold pair. total size of these objects 13.6 gb. not big savings on single array don't need arithmetic access elements, need switch how index them. total down 15.66 gb.

finally last thing can think of save memory have 2 array.array objects. dict values become tuple objects refer 2 int objects. first index 2 arrays, second length. representation takes 11.6 gb of memory, small net decrease, total becoming 13.6 gb.

that final total of 13.6 gb should work on machine 16 gb of ram without swapping, won't leave room else.


Comments

Popular posts from this blog

javascript - Jquery show_hide, what to add in order to make the page scroll to the bottom of the hidden field once button is clicked -

python - Django-cities exits with "killed" -

python - How to get a widget position inside it's layout in Kivy? -