io - Reading a very large file word by word in Python
I have some pretty large text files (>2 GB) that I would like to process word by word. The files are space-delimited text files with no line breaks (all the words are on a single line). I want to take each word, test whether it is a dictionary word (using enchant), and if so, write it to a new file.
This is my code right now:
    # d is an enchant dictionary, for example d = enchant.Dict("en_US")
    with open('big_file_of_words', 'r') as in_file:
        with open('output_file', 'w') as out_file:
            # read() loads the whole file into memory at once
            words = in_file.read().split(' ')
            for word in words:
                if d.check(word):
                    out_file.write("%s " % word)

I looked at "Lazy Method for Reading Big File in Python", which suggests using yield to read in chunks, but I am concerned that chunks of a predetermined size will split words in the middle. Basically, I want the chunks to be as close to the specified size as possible while splitting only on spaces. Any suggestions?
Combine the last word of one chunk with the first word of the next:
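    def read_words(filename):
        last = ""
        with open(filename) as inp:
            while True:
                buf = inp.read(10240)
                if not buf:
                    break
                buf = last + buf
                words = buf.split()
                if buf[-1].isspace():
                    # The chunk ended on a separator, so every word is complete.
                    last = ""
                else:
                    # The final word may continue in the next chunk; hold it back.
                    last = words.pop()
                for word in words:
                    yield word
        if last:
            yield last

    with open('output.txt', 'w') as output:
        for word in read_words('input.txt'):
            # check() stands for the dictionary test, e.g. d.check
            if check(word):
                output.write("%s " % word)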
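To see how the carry-over behaves at chunk boundaries, here is a minimal, self-contained sketch of the same idea. It reads from any text stream instead of a filename so that it can be tried on an in-memory string with a deliberately tiny chunk size. The names `iter_words`, `filter_words` and `is_word` are hypothetical, and the `known` set is only a stand-in for a real dictionary check such as enchant's `d.check`.

```python
import io


def iter_words(stream, chunk_size=10240):
    """Yield whitespace-separated words from a text stream, chunk by chunk."""
    carry = ""  # possibly incomplete word left over from the previous chunk
    while True:
        buf = stream.read(chunk_size)
        if not buf:
            break
        buf = carry + buf
        words = buf.split()
        if buf[-1].isspace():
            # Chunk ended on a separator: nothing to carry over.
            carry = ""
        else:
            # Last word may be cut in half: keep it for the next round.
            carry = words.pop()
        yield from words
    if carry:
        yield carry


def filter_words(stream, out, is_word, chunk_size=10240):
    """Write every word accepted by is_word (a hypothetical predicate) to out."""
    for word in iter_words(stream, chunk_size):
        if is_word(word):
            out.write(word + " ")


if __name__ == "__main__":
    known = {"hello", "world"}  # stand-in for a real dictionary
    src = io.StringIO("hello wrld world  foo hello")
    dst = io.StringIO()
    # chunk_size=4 forces several words to be split across reads
    filter_words(src, dst, known.__contains__, chunk_size=4)
    print(dst.getvalue())
```

With real files, the same functions should work by passing the objects returned by `open(...)` in place of the `io.StringIO` streams. Memory use stays roughly proportional to `chunk_size` plus the length of the longest word, rather than to the size of the file.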