Breedlove: Create new line based on each regex match in python -

Friday, 15 January 2010


I have an input file that contains data formatted as follows:

a; b, c| derp derp "x1234567, y1234567, z1234567" derp derp a; b, c|

I would like to use Python to split this into multiple lines, one for each item that occurs between the double quotes.

The output for the above example should be:

a; b, c| derp derp x1234567 derp derp a; b, c|

a; b, c| derp derp y1234567 derp derp a; b, c|

a; b, c| derp derp z1234567 derp derp a; b, c|

So far I have this:

    # Python 2 syntax, as posted
    import re
    prefix = re.compile ('^(.*?)"')
    pattern = re.compile('\"(.*?)([a-z]{1}[0-9]{7})(.*?)\"')
    suffix = re.compile ('"(.*?)$')
    for i, line in enumerate(open('myfile.txt')):
        for match in re.finditer(pattern, line):
            print prefix, match.group(), suffix

But it seems to return only the first match from each quoted section.

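In this situation it's a lot more work (in my opinion) to use regex rather than simple string and list manipulations, such as:

    #!/usr/bin/env python

    with open('myfile.txt', 'r') as f:
        lines = f.readlines()

    for line in lines:
        line = line.strip()
        # Locate the opening and closing double quotes.
        start = line.find('"')
        end = line.find('"', start + 1)
        # Split the quoted text on commas and trim each item.
        data = line[start + 1:end].split(',')
        data = [x.strip() for x in data]
        # Emit one line per item, with the quotes dropped.
        for x in data:
            print(line[:start] + x + line[end + 1:])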

Here's what I found after taking a look at the code you posted:

You're printing the compiled pattern objects (SRE_Pattern) for prefix and suffix in your print line. You should record the matches for prefix and suffix on every iteration of the outer loop.

Calling match.group() returns the entire match, not just what's in the parentheses. I think you want match.group(1) in most cases.

Having the pattern defined the way you do matches only one string per line, because it searches sequentially through the line for a quotation mark followed by the rest of the pattern. Hence it gets the index of the first quotation mark, checks once for the pattern, finds x1234567, consumes the rest of the quoted section and moves on.

I'm not sure why you have backslashes before the quotation marks in your pattern; I don't think they are special characters.

In suffix, you match the first quotation mark, not the second, so the suffix will include the stuff between the quotation marks.

The print statement inserts spaces between items if you use commas, so you should concatenate them using + instead.
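The single-match behaviour described above can be checked directly. The following is a minimal sketch, written for Python 3 and using the sample line from the question; the variable names are only illustrative and not from the original post.

```python
import re

line = 'a; b, c| derp derp "x1234567, y1234567, z1234567" derp derp a; b, c|'

# The pattern from the question, without the unnecessary backslashes.
original = re.compile(r'"(.*?)([a-z]{1}[0-9]{7})(.*?)"')
matches = list(original.finditer(line))

# One match covers the whole quoted section: group 2 holds the first
# item and group 3 lazily swallows everything up to the closing quote.
whole = matches[0].group()
first_item = matches[0].group(2)
leftover = matches[0].group(3)

# Searching only the quoted text with the bare item pattern finds
# every item, because nothing else consumes the characters between them.
quoted = whole.strip('"')
items = re.findall(r'[a-z][0-9]{7}', quoted)

print(len(matches), first_item, items)
```

Because finditer only yields non-overlapping matches, the second and third items are never tried as starting points once the first match has consumed them.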

And here's what I ended up with using regex:

    #!/usr/bin/env python

    import re

    prefix = re.compile('^(.*?)"')         # text before the first quote
    quotes = re.compile('".*?(.*).*?"')    # text between the quotes
    pattern = re.compile('[a-z]{1}[0-9]{7}')
    suffix = re.compile('".*"(.*?)$')      # text after the last quote

    for (i, line) in enumerate(open('myfile.txt')):
        pre = prefix.search(line).group(1)
        data = quotes.search(line).group(1)
        suf = suffix.search(line).group(1)
        # One output line per item found inside the quotes.
        for match in re.finditer(pattern, data):
            print(pre + match.group(0) + suf)

Hope this helps; if you have any questions, please ask. Regex is a tricky beast at the best of times.
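As a variation on the regex answer, the three separate searches can be folded into one pattern with three groups. This is only a sketch for Python 3: the names LINE_RE, ITEM_RE and expand_line are hypothetical, and it assumes each line contains at most one quoted section, as in the sample input.

```python
import re

# Prefix, quoted text and suffix captured in one pass.
LINE_RE = re.compile(r'^(.*?)"(.*?)"(.*)$')
# One lowercase letter followed by seven digits, as in the sample data.
ITEM_RE = re.compile(r'[a-z][0-9]{7}')


def expand_line(line):
    """Return one output line per item found between the double quotes."""
    line = line.rstrip("\n")
    m = LINE_RE.match(line)
    if m is None:
        # No quoted section: pass the line through unchanged.
        return [line]
    pre, quoted, suf = m.groups()
    return [pre + item + suf for item in ITEM_RE.findall(quoted)]


sample = 'a; b, c| derp derp "x1234567, y1234567, z1234567" derp derp a; b, c|\n'
for out in expand_line(sample):
    print(out)
```

Note that a quoted section containing no matching items yields an empty list here, so such a line would be dropped from the output; whether that is desirable depends on the input file.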

python regex
