python - pandas read_csv always crashes on small file


I am trying to import a rather small (217 rows, 87 columns, 15k) CSV file for analysis in Python using pandas. The file is rather poorly structured, but I would still like to import it, since it is raw data that I do not want to manipulate manually outside Python (e.g. in Excel). Unfortunately, this leads to a crash: "The kernel appears to have died. It will restart automatically."

https://www.wakari.io/sharing/bundle/uniquely/readcsv

I did some research, which indicated possible crashes with read_csv, but only for large files, so I do not understand the problem. The crash happens both with a local installation (Anaconda 64-bit, IPython (Py 2.7) Notebook) and on Wakari.

Can anybody help me? It would be much appreciated. Thanks a lot!

Code:

    # I have a somewhat ugly but illustrative CSV file; it is not big: 217 rows, 87 columns.
    # The file can be downloaded at http://www.win2day.at/download/lo_1986.csv

    # In[1]:

    file_csv = 'lo_1986.csv'
    f = open(file_csv, mode="r")
    x = 0
    for line in f:
        print x, ": ", line
        x = x + 1
    f.close()

    # I'd like to import the CSV into Python using pandas, but this leads to a crash:
    # "The kernel appears to have died. It will restart automatically."

    # In[ ]:

    import pandas as pd
    pd.read_csv(file_csv, delimiter=';')

    # What am I doing wrong?

It is because of an invalid character (e.g. 0xe0) in the file.

If you add the encoding parameter to the read_csv() call, you will see a stack trace instead of a segfault:

    >>> df = pandas.read_csv("/tmp/lo_1986.csv", delimiter=";", encoding="utf-8")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/users/antkong/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py", line 400, in parser_f
        return _read(filepath_or_buffer, kwds)
      File "/users/antkong/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py", line 205, in _read
        return parser.read()
      File "/users/antkong/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py", line 608, in read
        ret = self._engine.read(nrows)
      File "/users/antkong/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py", line 1028, in read
        data = self._reader.read(nrows)
      File "parser.pyx", line 706, in pandas.parser.TextReader.read (pandas/parser.c:6745)
      File "parser.pyx", line 728, in pandas.parser.TextReader._read_low_memory (pandas/parser.c:6964)
      File "parser.pyx", line 804, in pandas.parser.TextReader._read_rows (pandas/parser.c:7780)
      File "parser.pyx", line 890, in pandas.parser.TextReader._convert_column_data (pandas/parser.c:8793)
      File "parser.pyx", line 950, in pandas.parser.TextReader._convert_tokens (pandas/parser.c:9484)
      File "parser.pyx", line 1026, in pandas.parser.TextReader._convert_with_dtype (pandas/parser.c:10642)
      File "parser.pyx", line 1051, in pandas.parser.TextReader._string_convert (pandas/parser.c:10905)
      File "parser.pyx", line 1278, in pandas.parser._string_box_utf8 (pandas/parser.c:15657)
    UnicodeDecodeError: 'utf8' codec can't decode byte 0xe0 in position 0: unexpected end of data
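The last line of the trace can be reproduced without pandas. Here is a minimal sketch, written for Python 3 (the question itself uses Python 2.7): the byte 0xe0 is the lead byte of a three-byte UTF-8 sequence, so a field that contains only that byte ends before the sequence is complete. The Latin-1 line only illustrates one encoding in which the byte is valid; whether the file really uses that encoding is an assumption.

```python
# A field that contains only the byte 0xe0, like the one in the trace.
field = b"\xe0"

try:
    field.decode("utf-8")
    reason = None
    bad_byte = b""
except UnicodeDecodeError as exc:
    # 0xe0 announces a three-byte UTF-8 sequence, but no bytes follow it.
    reason = exc.reason
    bad_byte = exc.object[exc.start:exc.end]

print(reason)    # the decoder's explanation of the failure
print(bad_byte)  # the offending byte(s)

# In Latin-1 (ISO-8859-1) every byte maps to a character; 0xe0 is "à".
# Assumption: this shows the byte *can* be decoded under another
# encoding, not that the file is actually Latin-1.
print(field.decode("latin-1"))
```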

You can do some preprocessing to remove these characters before asking pandas to read in the file.
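One way to do that preprocessing, as a rough sketch in Python 3 (not the Python 2.7 of the question): read the file as raw bytes, decode it while dropping or marking whatever is not valid UTF-8, and hand the cleaned text to the CSV reader. The helper name `clean_bytes` is hypothetical, and the sample bytes are made up for illustration rather than taken from lo_1986.csv.

```python
import io


def clean_bytes(raw, encoding="utf-8", errors="ignore"):
    """Decode raw bytes, dropping (errors="ignore") or marking
    (errors="replace") every byte that is invalid in `encoding`."""
    return raw.decode(encoding, errors=errors)


# Made-up sample: a ';'-separated row with a stray 0xe0 byte in it.
sample = b"date;note\n1986-09-07;\xe0 conto\n"

cleaned = clean_bytes(sample)
print(cleaned)

# The cleaned text can be wrapped in a file-like object for a CSV reader.
buffer = io.StringIO(cleaned)
header = buffer.readline().rstrip("\n").split(";")
print(header)
```

With the real file, the same idea would look something like `pd.read_csv(io.StringIO(clean_bytes(open(file_csv, "rb").read())), delimiter=";")`. Dropping bytes loses information, so if the file's real encoding is known, passing it as `encoding=` to `read_csv` keeps the characters instead.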

The attached picture highlights the invalid characters in the file:

[Image: invalid characters highlighted in the file]
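The same positions can also be located programmatically. This is a rough sketch in Python 3 that assumes the file is meant to be UTF-8; `find_invalid_bytes` is a hypothetical helper, and the sample data is invented for illustration.

```python
def find_invalid_bytes(raw, encoding="utf-8"):
    """Return (line_number, byte_column, byte_value) for each byte at which
    decoding fails. Lines are 1-based, columns are 0-based byte offsets."""
    hits = []
    for line_number, line in enumerate(raw.split(b"\n"), start=1):
        pos = 0
        while pos < len(line):
            try:
                line[pos:].decode(encoding)
                break  # the rest of this line decodes cleanly
            except UnicodeDecodeError as exc:
                column = pos + exc.start
                hits.append((line_number, column, line[column]))
                pos = column + 1  # resume right after the offending byte
    return hits


# Invented sample with stray 0xe0 and 0xe9 bytes.
sample = b"a;b\nx;\xe0\ny;\xe9t\xe9\n"

for line_number, column, value in find_invalid_bytes(sample):
    print("line %d, byte column %d: 0x%02x" % (line_number, column, value))
```

For the real file, the input would be `open(file_csv, "rb").read()`.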

