Saturday 15 February 2014

Simple Way of NOT reading last N lines of a file in Python




I'd like to read a file line by line, except for the last N lines. How do I know where to stop, without reaching the end of the file and backtracking / discarding the last N lines, in Python? Is asking for the number of lines = X, and then looping (X-N) times, a good way to go about this?

What's the simplest / most Pythonic way of doing this?

Three different solutions:

1) Quick and dirty, see John's answer:

    with open(file_name) as fid:
        lines = fid.readlines()
    for line in lines[:-n_skip]:
        do_something_with(line)

The disadvantage of this method is that you have to read all lines into memory first, which might be a problem for big files.
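One pitfall with the slice used here: if n_skip is 0, lines[:-0] is the same as lines[:0] and gives an empty list, so nothing would be processed at all. A minimal sketch of a guard follows; the helper name all_but_last is hypothetical and not part of the original answer.

```python
def all_but_last(lines, n_skip):
    """Return every item of `lines` except the last `n_skip` (hypothetical helper)."""
    if n_skip <= 0:
        # lines[:-0] would be lines[:0], i.e. an empty list, so handle 0 explicitly
        return list(lines)
    # if n_skip exceeds len(lines), the slice is simply empty
    return lines[:-n_skip]


sample = ["a\n", "b\n", "c\n", "d\n"]
print(all_but_last(sample, 1))
print(all_but_last(sample, 0))
```

With this guard, skipping zero lines returns the whole list, and skipping more lines than the file contains returns an empty list.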

2) Two passes

Process the file twice: the first time to count the number of lines n_lines, and in a second pass process only the first n_lines - n_skip lines:

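    # first pass: count the lines
    with open(file_name) as fid:
        n_lines = sum(1 for line in fid)

    # second pass: process all but the last n_skip lines
    with open(file_name) as fid:
        # range works on Python 2 and 3 (the original used Python 2's xrange);
        # the loop does nothing if n_lines <= n_skip
        for i_line in range(n_lines - n_skip):
            line = fid.readline()
            do_something_with(line)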

The disadvantage of this method is that you have to iterate over the file twice, which might be slower in some cases. The good thing, however, is that you never have more than one line in memory.
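The two-pass idea can also be wrapped in a generator, so the caller just loops over the result. The sketch below is one possible variant, not the original answer's code: the name iter_all_but_last is hypothetical, and it uses itertools.islice to stop the second pass early.

```python
from itertools import islice


def iter_all_but_last(file_name, n_skip):
    """Yield every line of the file except the last `n_skip` (hypothetical helper)."""
    # first pass: count the lines without storing them
    with open(file_name) as fid:
        n_lines = sum(1 for _ in fid)
    # second pass: islice stops after n_lines - n_skip lines;
    # max(..., 0) is needed because islice rejects a negative stop value
    with open(file_name) as fid:
        for line in islice(fid, max(n_lines - n_skip, 0)):
            yield line
```

Usage would then look like: for line in iter_all_but_last(file_name, n_skip): do_something_with(line). Like the version above, this assumes the file does not change between the two passes.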

3) Use a buffer, similar to Serge's solution

In case you want to iterate over the file only once, you only know for sure that you can process line i if you know that line i + n_skip exists. This means that you have to keep n_skip lines in a temporary buffer first. One way to do this is to implement some sort of FIFO buffer (e.g. with a generator function that implements a circular buffer):

    def fifo(it, n):
        buffer = [None] * n  # preallocate buffer
        i = 0
        full = False
        for item in it:  # leaves last n items in buffer when iterator is exhausted
            if full:
                yield buffer[i]  # yield old item before storing new item
            buffer[i] = item
            i = (i + 1) % n
            if i == 0:  # wrapped around at least once
                full = True

A quick test with a range of numbers:

    In [12]: for i in fifo(range(20), 5):
        ...:     print i,
    0 1 2 3 4 5 6 7 8 9 10 11 12 13 14

This is the way to use it with your file:

    with open(file_name) as fid:
        for line in fifo(fid, n_skip):
            do_something_with(line)

Note that this requires enough memory to temporarily store n_skip lines, but this is still better than reading all lines into memory as in the first solution.
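The same buffering idea can be written with collections.deque from the standard library instead of a hand-rolled circular buffer. This is a sketch of an alternative, not the code from the answer above; the name skip_last is hypothetical. It also covers n = 0, for which a circular buffer of length zero would fail on the modulo operation.

```python
from collections import deque


def skip_last(it, n):
    """Yield all items of `it` except the last `n` (hypothetical helper)."""
    if n <= 0:
        # nothing to hold back, so pass everything through
        for item in it:
            yield item
        return
    buffer = deque()
    for item in it:
        buffer.append(item)
        if len(buffer) > n:
            # the oldest item is now known not to be among the last n
            yield buffer.popleft()


print(list(skip_last(range(20), 5)))  # the numbers 0 through 14
```

As with the circular buffer, at most n items (plus the one just read) are held in memory at any time, and the input is read only once.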

Which one of these three methods is best is a trade-off between code complexity, memory and speed, and depends on your exact application.

python file-io
