Create new line based on each regex match in Python
I have an input file that contains info formatted as follows:
a; b, c| derp derp "x1234567, y1234567, z1234567" derp derp a; b, c|
I would like to use Python to parse this into multiple lines, one for each item that occurs between the double quotes.
The output for the above illustration should be:
a; b, c| derp derp x1234567 derp derp a; b, c|
a; b, c| derp derp y1234567 derp derp a; b, c|
a; b, c| derp derp z1234567 derp derp a; b, c|
So far I have this:

```python
import re

prefix = re.compile('^(.*?)"')
pattern = re.compile('\"(.*?)([a-z]{1}[0-9]{7})(.*?)\"')
suffix = re.compile('"(.*?)$')

for i, line in enumerate(open('myfile.txt')):
    for match in re.finditer(pattern, line):
        print(prefix, match.group(), suffix)
```
But it only seems to return the first match in each line.
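A quick diagnostic makes the behaviour visible. This is a sketch that runs the question's three patterns on the sample line as a string (instead of reading myfile.txt) and prints what each one actually captures:

```python
import re

# The sample line from the question (no trailing newline here).
line = 'a; b, c| derp derp "x1234567, y1234567, z1234567" derp derp a; b, c|'

# The three patterns exactly as written in the question.
prefix = re.compile('^(.*?)"')
pattern = re.compile('\"(.*?)([a-z]{1}[0-9]{7})(.*?)\"')
suffix = re.compile('"(.*?)$')

matches = list(pattern.finditer(line))
print(len(matches))                  # only one match: it consumes the whole quoted block
print(matches[0].group())            # group() is group(0), the entire match, quotes included
print(matches[0].group(2))           # the ID captured by the second parenthesised group
print(prefix.search(line).group(1))  # text before the first quote
print(suffix.search(line).group(1))  # starts after the FIRST quote, so it includes the IDs
```

The single match and the over-long suffix are the two symptoms the answers below address.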
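In this situation it's a lot more work (in my opinion) to use regex rather than simple string and list manipulations, such as:

```python
#!/usr/bin/env python
with open('myfile.txt', 'r') as f:
    lines = f.readlines()

for line in lines:
    line = line.strip()
    # locate the opening and closing double quotes
    start = line.find('"')
    end = line.find('"', start + 1)
    if start == -1 or end == -1:
        continue  # skip lines without a quoted section
    # split the quoted text on commas and trim each item
    info = [x.strip() for x in line[start + 1:end].split(',')]
    for x in info:
        # concatenate so no extra spaces are introduced
        print(line[:start] + x + line[end + 1:])
```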
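To try the find/split approach without a file, the same logic can be wrapped in a function that takes one line and returns the output lines. This is a sketch; the name expand_quoted is hypothetical, and passing lines without quotes through unchanged is an assumption, not something the question specifies:

```python
def expand_quoted(line):
    """Return one output line per comma-separated item inside the double quotes.

    Hypothetical helper using the same str.find / str.split logic as this answer.
    """
    line = line.strip()
    start = line.find('"')
    end = line.find('"', start + 1)
    if start == -1 or end == -1:
        return [line]  # assumption: no quoted section, so leave the line as-is
    items = [x.strip() for x in line[start + 1:end].split(',')]
    # Concatenate rather than using print's commas, so no extra spaces appear.
    return [line[:start] + x + line[end + 1:] for x in items]


sample = 'a; b, c| derp derp "x1234567, y1234567, z1234567" derp derp a; b, c|'
for out in expand_quoted(sample):
    print(out)
```

Run on the sample line, this prints the three lines from the desired output in the question.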
Here's what I found after taking a look at the code you posted:
You're printing the compiled pattern objects (SRE_Pattern) prefix and suffix in the print line. You should record the matches for prefix and suffix on every iteration of the outer loop.

Calling match.group() returns the entire match, not what's in the parentheses. I think you want match.group(1) in these cases.

With pattern defined as it is, it matches only one string, because it searches sequentially through the line for a quotation mark followed by the rest of the pattern. Hence it gets to the first quotation mark, checks the pattern once, finds x1234567, and moves on; the trailing (.*?)" then runs on to the closing quotation mark, so nothing is left for a second match.

I'm not sure why you have backslashes before the quotation marks in pattern; I don't think they're special characters. Inside a single-quoted Python string, \" is just ", so they are harmless but unnecessary.

In suffix, you match the first quotation mark, not the second, so suffix will include the stuff between the quotation marks.

print will insert spaces between items if you separate them with commas, so you should concatenate them using + instead.

And here is what I ended up with for the regex:
```python
#!/usr/bin/env python
import re

prefix = re.compile('^(.*?)"')
quotes = re.compile('".*?(.*).*?"')
pattern = re.compile('[a-z]{1}[0-9]{7}')
suffix = re.compile('".*"(.*?)$')

for i, line in enumerate(open('myfile.txt')):
    pre = prefix.search(line).group(1)   # text before the first quote
    info = quotes.search(line).group(1)  # text between the quotes
    suf = suffix.search(line).group(1)   # text after the last quote
    for match in re.finditer(pattern, info):
        print(pre + match.group(0) + suf)
```
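For comparison, a single regex can capture the prefix, the quoted text and the suffix in one pass, after which findall pulls out every ID. This is an alternative sketch, not the approach above, and it assumes each line contains at most one quoted block; expand_line, LINE_RE and ID_RE are hypothetical names, and [a-z]{1} is written as the equivalent [a-z]:

```python
import re

# prefix up to the first quote, the quoted text, then everything after the closing quote
LINE_RE = re.compile(r'^(.*?)"([^"]*)"(.*)$')
ID_RE = re.compile(r'[a-z][0-9]{7}')


def expand_line(line):
    line = line.rstrip('\n')
    m = LINE_RE.match(line)
    if m is None:
        return [line]  # assumption: lines without quotes pass through unchanged
    pre, quoted, suf = m.groups()
    # findall with no groups in the pattern returns the matched strings themselves
    return [pre + item + suf for item in ID_RE.findall(quoted)]


sample = 'a; b, c| derp derp "x1234567, y1234567, z1234567" derp derp a; b, c|\n'
for out in expand_line(sample):
    print(out)
```

Using [^"]* for the quoted part keeps the match from running past the closing quote, which sidesteps the suffix problem described above.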
Hope this helps; if you have any questions, please ask. Regex is a tricky beast at the best of times.