Friday, 15 January 2010

Create new line based on each regex match in Python




I have an input file that contains data formatted as follows:

a; b, c| derp derp "x1234567, y1234567, z1234567" derp derp a; b, c|

I would like to use Python to parse this into multiple lines, one for each item that occurs between the double quotes.

The output for the above example should be:

a; b, c| derp derp x1234567 derp derp a; b, c|

a; b, c| derp derp y1234567 derp derp a; b, c|

a; b, c| derp derp z1234567 derp derp a; b, c|

So far I have this:

    import re
    prefix = re.compile ('^(.*?)"')
    pattern = re.compile('\"(.*?)([a-z]{1}[0-9]{7})(.*?)\"')
    suffix = re.compile ('"(.*?)$')
    for i, line in enumerate(open('myfile.txt')):
        for match in re.finditer(pattern, line):
            print prefix, match.group(), suffix

But it seems to return only the first match from each line's contents.

In this situation it's a lot more work (in my opinion) to use regex rather than simple string and list manipulations, such as:

    #!/usr/bin/env python
    with open('myfile.txt','r') as f:
        lines = f.readlines()
    for line in lines:
        line = line.strip()
        # locate the opening and closing double quotes
        start = line.find('"')
        end = line.find('"',start+1)
        # split the quoted section on commas and trim each item
        data = line[start+1:end].split(',')
        data = [x.strip() for x in data]
        # print one copy of the line per item
        for x in data:
            print line[:start],x,line[end+1:]
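The same idea can be sketched as a function that works on one string at a time, which makes it easy to try on the sample line without a file. This is only a sketch under a few assumptions: the name expand_quoted_items is made up for illustration, it is written with Python 3 syntax rather than the Python 2 print statement used in this post, and it joins the pieces with + instead of printing them with commas, so no extra spaces creep into the output.

```python
def expand_quoted_items(line):
    """Return one output line per comma-separated item found between
    the first pair of double quotes (hypothetical helper name)."""
    start = line.find('"')
    end = line.find('"', start + 1)
    if start == -1 or end == -1:
        # No complete quoted section: leave the line unchanged.
        return [line]
    items = [item.strip() for item in line[start + 1:end].split(',')]
    # Concatenate prefix, item and suffix; the quotes themselves are dropped.
    return [line[:start] + item + line[end + 1:] for item in items]


sample = 'a; b, c| derp derp "x1234567, y1234567, z1234567" derp derp a; b, c|'
for out in expand_quoted_items(sample):
    print(out)
```

On the sample line this prints the three lines asked for in the question.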

Here's what I found after taking a look at the code you posted:

You're printing the SRE_Pattern objects for prefix and suffix in your print line. You should record the matches for prefix and suffix on every iteration of the outer loop.

Calling match.group() will return the entire match, not just what's in the parentheses. I think you want match.group(1) in most cases.

Having the pattern defined as you do matches only one string, because it searches sequentially through the line starting at a quotation mark followed by the rest of the pattern. Hence it gets the index of the first quotation mark, checks once for the pattern, finds x1234567 and then moves on.

I'm not sure why you have backslashes before the quotation marks in your pattern; I don't think they are special characters.

In your suffix, it will match the first quotation mark and not the second, so the suffix will include the stuff between the quotation marks.

The print statement will insert spaces between items if you use commas, so you should concatenate them using + instead.
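A few of these points can be checked directly in an interpreter. The snippet below is a small sketch in Python 3 syntax; the variable names (matches, id_pattern, ids) are made up for illustration. It runs the pattern from the question against the sample line, then searches for the ID on its own.

```python
import re

line = 'a; b, c| derp derp "x1234567, y1234567, z1234567" derp derp a; b, c|'

# The pattern from the question: everything between the quotes, with one ID inside.
pattern = re.compile(r'"(.*?)([a-z]{1}[0-9]{7})(.*?)"')
matches = list(pattern.finditer(line))

# Only one match: it starts at the first quote and swallows the whole quoted section.
print(len(matches))         # 1
print(matches[0].group(0))  # "x1234567, y1234567, z1234567" (quotes included)
# Numbered groups return only the parenthesised parts; in this pattern the ID is group 2.
print(matches[0].group(2))  # x1234567

# Searching for the ID alone finds every item.
id_pattern = re.compile(r'[a-z][0-9]{7}')
ids = id_pattern.findall(line)
print(ids)                  # ['x1234567', 'y1234567', 'z1234567']

# A backslash before a double quote changes nothing in a single-quoted string.
print('\"' == '"')          # True
```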

And here is what I ended up with using regex:

    #!/usr/bin/env python
    import re
    prefix = re.compile('^(.*?)"')           # text before the first quote
    quotes = re.compile('".*?(.*).*?"')      # text between the quotes
    pattern = re.compile('[a-z]{1}[0-9]{7}') # a single item
    suffix = re.compile('".*"(.*?)$')        # text after the last quote
    for (i,line) in enumerate(open('myfile.txt')):
        pre = prefix.search(line).group(1)
        data = quotes.search(line).group(1)
        suf = suffix.search(line).group(1)
        # one output line per item found between the quotes
        for match in re.finditer(pattern,data):
            print pre+match.group(0)+suf

Hope this helps; if you have any questions, please ask. Regex is a tricky beast at the best of times.
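For comparison, the three separate searches could probably be folded into one pattern that captures the prefix, the quoted section and the suffix in a single pass. This is a sketch rather than a drop-in replacement: the names QUOTED, ITEM and expand_with_regex are made up for illustration, it uses Python 3 syntax, and it assumes each line has at most one quoted section.

```python
import re

# Prefix, quoted section and suffix captured in one pass (illustrative names).
QUOTED = re.compile(r'^(.*?)"(.*?)"(.*)$')
ITEM = re.compile(r'[a-z][0-9]{7}')


def expand_with_regex(line):
    """Return one output line per ID found between the double quotes."""
    m = QUOTED.match(line)
    if m is None:
        # No quoted section: leave the line unchanged.
        return [line]
    pre, quoted, suf = m.groups()
    # If the quotes contain no ID at all, this returns an empty list.
    return [pre + item + suf for item in ITEM.findall(quoted)]


sample = 'a; b, c| derp derp "x1234567, y1234567, z1234567" derp derp a; b, c|'
for out in expand_with_regex(sample):
    print(out)
```

Because . does not match a newline and $ can match just before a trailing one, a line read straight from a file gives the same result without calling strip() first.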

python regex
