io - Reading a very large file word by word in Python


I have some pretty large text files (>2 GB) that I would like to process word by word. The files are space-delimited text files with no line breaks (all words are on a single line). I want to take each word, test whether it is a dictionary word (using enchant), and if so, write it to a new file.

This is my code right now:

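    # d is an enchant dictionary object created earlier (not shown)
    with open('big_file_of_words', 'r') as in_file:
        with open('output_file', 'w') as out_file:
            # read() pulls the whole file into memory at once
            words = in_file.read().split(' ')
            for word in words:
                if d.check(word) == True:
                    out_file.write("%s " % word)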

I looked at "Lazy Method for Reading Big File in Python", which suggests using yield to read in chunks, but I am concerned that using chunks of a predetermined size will split words in the middle. Basically, I want chunks to be as close to the specified size as possible while splitting only on spaces. Any suggestions?

Combine the last word of one chunk with the first word of the next:

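    import enchant

    # Assumption: the question does not say which dictionary is used.
    d = enchant.Dict("en_US")

    def read_words(filename):
        last = ""  # possibly incomplete word carried over from the previous chunk
        with open(filename) as inp:
            while True:
                buf = inp.read(10240)
                if not buf:
                    break
                words = (last + buf).split()
                # Hold back the final word only if the chunk ended mid-word;
                # otherwise it would be glued onto the next chunk's first word.
                last = "" if buf[-1].isspace() else words.pop()
                for word in words:
                    yield word
        if last:
            yield last

    # Open the output for writing; the default mode is read-only.
    with open('output.txt', 'w') as output:
        for word in read_words('input.txt'):
            if d.check(word):
                output.write("%s " % word)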
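
To check that carrying a partial word across chunk boundaries really keeps words intact, here is a minimal sketch that runs the same idea against an in-memory stream with a deliberately tiny chunk size. The name `iter_words` and the `chunk_size` parameter are hypothetical, chosen only for this illustration, and the sketch assumes words are separated by whitespace.

```python
import io


def iter_words(stream, chunk_size=10240):
    """Yield whitespace-separated words from a text stream, chunk by chunk."""
    carry = ""  # possibly incomplete word left over from the previous chunk
    while True:
        buf = stream.read(chunk_size)
        if not buf:
            break
        words = (carry + buf).split()
        # Hold back the final word only if the chunk stopped in the middle of it.
        carry = "" if buf[-1].isspace() else words.pop()
        yield from words
    if carry:
        yield carry


text = "alpha beta gamma delta epsilon"
# A 4-character chunk forces nearly every word to straddle a chunk boundary.
chunked = list(iter_words(io.StringIO(text), chunk_size=4))
assert chunked == text.split()
```

Because the function takes any object with a `read(n)` method, the same generator should work on a file opened with `open(...)`; only one chunk plus one carried word is held in memory at a time.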
