python - pandas read_csv always crashes on a small file
I am trying to import a rather small (217 rows, 87 columns, 15k) csv file for analysis in Python using pandas. The file is rather poorly structured, but I would still like to import it, since it is raw data that I do not want to manipulate manually outside Python (e.g. in Excel). Unfortunately, this always leads to a crash: "The kernel appears to have died. It will restart automatically".
https://www.wakari.io/sharing/bundle/uniquely/readcsv
I did some research, which indicated possible crashes with read_csv, but only for large files, so I do not understand the problem. The crash happens both with a local installation (Anaconda 64-bit, IPython (Py 2.7) Notebook) and on Wakari.
Can anyone help me? It would be much appreciated. Thanks a lot!
Code:
# I have a somewhat ugly, but illustrative csv file. It is not big: 217 rows, 87 columns.
# The file can be downloaded at http://www.win2day.at/download/lo_1986.csv

# In[1]:
file_csv = 'lo_1986.csv'
f = open(file_csv, mode="r")
x = 0
for line in f:
    print x, ": ", line
    x = x + 1
f.close()

# I'd like to import this csv into Python using pandas, but this always leads to a crash:
# "The kernel appears to have died. It will restart automatically."

# In[ ]:
import pandas as pd
pd.read_csv(file_csv, delimiter=';')

# What am I doing wrong?
It is because of an invalid character (e.g. 0xe0) in the file.
If you add the encoding parameter to the read_csv() call, you will see a stack trace instead of a segfault:
>>> df = pandas.read_csv("/tmp/lo_1986.csv", delimiter=";", encoding="utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/users/antkong/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py", line 400, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/users/antkong/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py", line 205, in _read
    return parser.read()
  File "/users/antkong/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py", line 608, in read
    ret = self._engine.read(nrows)
  File "/users/antkong/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py", line 1028, in read
    data = self._reader.read(nrows)
  File "parser.pyx", line 706, in pandas.parser.TextReader.read (pandas/parser.c:6745)
  File "parser.pyx", line 728, in pandas.parser.TextReader._read_low_memory (pandas/parser.c:6964)
  File "parser.pyx", line 804, in pandas.parser.TextReader._read_rows (pandas/parser.c:7780)
  File "parser.pyx", line 890, in pandas.parser.TextReader._convert_column_data (pandas/parser.c:8793)
  File "parser.pyx", line 950, in pandas.parser.TextReader._convert_tokens (pandas/parser.c:9484)
  File "parser.pyx", line 1026, in pandas.parser.TextReader._convert_with_dtype (pandas/parser.c:10642)
  File "parser.pyx", line 1051, in pandas.parser.TextReader._string_convert (pandas/parser.c:10905)
  File "parser.pyx", line 1278, in pandas.parser._string_box_utf8 (pandas/parser.c:15657)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe0 in position 0: unexpected end of data

You can do some preprocessing to remove these characters before asking pandas to read in the file.
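The preprocessing step could look roughly like the sketch below. It is only one possible approach, and the helper names strip_invalid_utf8 and clean_csv_buffer are made up for illustration. It reads the file as raw bytes and silently drops every byte that is not valid UTF-8, so some characters are lost. If the file is really in another encoding (Latin-1 is a guess, not something confirmed here), passing that encoding to read_csv would keep the characters instead.

```python
import io


def strip_invalid_utf8(raw_bytes):
    """Decode bytes as UTF-8, dropping any bytes that cannot be decoded."""
    return raw_bytes.decode("utf-8", errors="ignore")


def clean_csv_buffer(path):
    """Hypothetical helper: read a file as bytes and return a cleaned text buffer."""
    with open(path, "rb") as f:
        raw = f.read()
    # A StringIO object is file-like, so it can be handed to a CSV reader.
    return io.StringIO(strip_invalid_utf8(raw))
```

The returned buffer can then be passed to pandas in place of the file name, for example pd.read_csv(clean_csv_buffer(file_csv), delimiter=';').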
The attached picture highlights the invalid characters in the file.
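To locate the invalid characters without an editor, a small script along the following lines may help. It is a sketch, not a tested tool for this particular file: find_invalid_utf8 is a hypothetical helper that tries to decode each line as UTF-8 and records where decoding fails.

```python
def find_invalid_utf8(raw_bytes):
    """Return (line_number, column, byte_value) for each undecodable byte.

    Line numbers start at 1; columns are 0-based byte offsets in the line.
    """
    problems = []
    for line_no, line in enumerate(raw_bytes.split(b"\n"), start=1):
        pos = 0
        while pos < len(line):
            try:
                line[pos:].decode("utf-8")
                break  # the rest of the line decodes cleanly
            except UnicodeDecodeError as err:
                bad = pos + err.start  # offset of the offending byte
                problems.append((line_no, bad, ord(line[bad:bad + 1])))
                pos = bad + 1  # continue scanning after that byte
    return problems
```

To use it, read the file in binary mode, e.g. open(file_csv, "rb").read(), and print each returned tuple with the byte value formatted via hex().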
