python - pyparsing whitespace match issues
I tried to use pyparsing to parse Robot Framework, which is a text-based DSL. The syntax is as follows (sorry, I think it's a little hard for me to describe it in BNF). A single line in Robot Framework may look like:

    library\tsshclient    with name\tnode

\t is a tab, and in Robot Framework it is transparently converted to two spaces (in fact, it just calls str.replace('\t', '  ') to replace the tab, which changes the length of each line, since len('\t') is 1 but len('  ') is 2). In Robot, two or more spaces or a '\t' are used to split tokens; if there is only one space between words, the words are considered a single token group.

    library\tsshclient    with name\tnode

is split into the following tokens if parsed correctly:
['library', 'sshclient', 'with name', 'node']
As there is only one space between "with" and "name", the parser considers them to belong to a single grouped token.

Here is my code:

    from pyparsing import *

    ParserElement.setDefaultWhitespaceChars('\r\n\t ')
    source = "library\tsshclient    with name\tnode"
    each_line = Optional(Word(" ")).leaveWhitespace().suppress() + \
        CaselessKeyword("library").suppress() + \
        OneOrMore((Word(alphas)) + White(max=1).setResultsName('myvalue')) + \
        SkipTo(LineEnd())

    res = each_line.parseString(source)
    print(res.myvalue)
Questions:

1) I already set the whitespace characters. If I want to match two or more spaces or one or more tabs, I thought the code would look like: White(ws=' ', min=2) | White(ws='\t', min=1), but this fails. Can I not specify the whitespace value?

2) Is there a way to get the index of a matched result? I tried setParseAction, but it seems I cannot get the index from this callback. I need both the start and end index to highlight the word.

3) What do LineStart and LineEnd mean? When I print these values they seem to be just normal strings. Do I have to write something at the front and end of a line, like: LineStart() + balabala... + LineEnd()?

Thanks. However, there is a restriction that I cannot replace '\t' with '  '.

    from pyparsing import *

    source = "library\tsshclient\t\t\twith name  s1"
    # here the whitespace seems to have been set to ' ', so why does the result still match '\t'?
    value = Combine(OneOrMore(Word(printables) | White(' ', max=1) + ~White()))
    linedefn = OneOrMore(value)
    res = linedefn.parseString(source)
    print(res)

I got
['library sshclient', 'with name', 's1']
but I expected ['library', 'sshclient', 'with name', 's1']
I always flinch when whitespace creeps into parsed tokens, but with your constraint that only single spaces are allowed, this should be workable. I used the following expression to define your values that can have embedded single spaces:

    # each value consists of printable words separated by at most a
    # single space (a space that is not followed by another space)
    value = Combine(OneOrMore(Word(printables) | White(' ', max=1) + ~White()))

With this done, a line is just one or more of these values:

    linedefn = OneOrMore(value)

Following your example, including calling str.replace to replace tabs with pairs of spaces, the code looks like:

    data = "library\tsshclient    with name\tnode"

    # replace tabs with 2 spaces
    data = data.replace('\t', '  ')

    print(linedefn.parseString(data))

Giving:
['library', 'sshclient', 'with name', 'node']
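As a cross-check on the expected tokens, the splitting rule from the question (a tab, or two or more spaces, separates cells) can also be sketched with only the standard `re` module. This is not pyparsing and not how Robot Framework itself is implemented; the helper name `split_cells_with_spans` is made up for illustration, and the sample line assumes four spaces between "sshclient" and "with name". Because the tabs are never replaced, the offsets refer to the original string.

```python
import re

# Separator rule as described in the question: one tab, or a run of 2+ spaces.
SEPARATOR = re.compile(r"\t| {2,}")

def split_cells_with_spans(line):
    """Return (start, text, end) for each cell; end is exclusive (hypothetical helper)."""
    cells = []
    start = 0
    for sep in SEPARATOR.finditer(line):
        if sep.start() > start:  # skip empty cells between adjacent separators
            cells.append((start, line[start:sep.start()], sep.start()))
        start = sep.end()
    if start < len(line):  # trailing cell after the last separator
        cells.append((start, line[start:], len(line)))
    return cells

line = "library\tsshclient    with name\tnode"
cells = split_cells_with_spans(line)
print([text for _, text, _ in cells])
print(cells)
```

A single space never matches the separator pattern, which is why "with name" stays together as one cell.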
To get the start and end locations of the values in the original string, wrap the expression in the new pyparsing helper method locatedExpr:

    # use the new locatedExpr to get the value, start, and end location
    # for each value
    linedefn = OneOrMore(locatedExpr(value))('values')

If we parse and dump the results:

    print(linedefn.parseString(data).dump())

We get:

    - values:
      [0]:
        [0, 'library', 7]
        - locn_end: 7
        - locn_start: 0
        - value: library
      [1]:
        [9, 'sshclient', 18]
        - locn_end: 18
        - locn_start: 9
        - value: sshclient
      [2]:
        [22, 'with name', 31]
        - locn_end: 31
        - locn_start: 22
        - value: with name
      [3]:
        [33, 'node', 37]
        - locn_end: 37
        - locn_start: 33
        - value: node
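For the highlighting use case, here is a rough sketch of how those start and end offsets could be applied. It uses plain Python only, copies the triples from the dump by hand, and assumes the sample line has four spaces between "sshclient" and "with name" (which is what the offsets imply). The `underline` helper is a made-up name, not part of pyparsing.

```python
# The tab-replaced line from the answer, and the [start, value, end] triples
# copied by hand from the dump output.
data = "library\tsshclient    with name\tnode".replace("\t", "  ")
located = [(0, "library", 7), (9, "sshclient", 18), (22, "with name", 31), (33, "node", 37)]

def underline(start, end):
    """Build a marker line with carets under columns start..end-1 (hypothetical helper)."""
    return " " * start + "^" * (end - start)

for start, value, end in located:
    # The end offset is exclusive, so it works directly as a Python slice bound.
    assert data[start:end] == value
    print(data)
    print(underline(start, end))
```

Note that these offsets refer to the string after the tabs were replaced, so they would not line up with the original tab-containing line.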
LineStart and LineEnd are pyparsing expression classes whose instances should match at the start and end of a line. LineStart has always been hard to work with, but LineEnd is fairly predictable. In your case, if you read and parse one line at a time, you shouldn't need them; just define the contents of the line that you expect. If you want to ensure that the parser has processed the entire string (and not stopped short of the end because of a non-matching character), add + LineEnd() or + StringEnd() to the end of your parser, or add the argument parseAll=True to your call to parseString().
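A small sketch of the difference parseAll makes, using a deliberately simple grammar (words made of letters) rather than the one from this question. It assumes pyparsing is installed; the input string is made up for illustration.

```python
from pyparsing import OneOrMore, ParseException, Word, alphas

words = OneOrMore(Word(alphas))
text = "abc def 123"

# Without parseAll, parsing stops quietly at the first non-matching character.
partial = list(words.parseString(text))
print(partial)

# With parseAll=True, leftover input is reported as a ParseException.
try:
    words.parseString(text, parseAll=True)
    fully_parsed = True
except ParseException as err:
    fully_parsed = False
    print("unparsed input remains:", err)
```

Appending + StringEnd() to the grammar should have the same effect as passing parseAll=True.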
EDIT:

It is easy to forget that pyparsing calls str.expandtabs by default; you have to disable this by calling parseWithTabs. That, plus explicitly disallowing tabs between value words, resolves the problem and keeps the values at the right character counts. See the changes below:

    from pyparsing import *

    TAB = White('\t')

    # each value consists of printable words separated by at most a
    # single space (a space that is not followed by another space)
    value = Combine(OneOrMore(~TAB + (Word(printables) | White(' ', max=1) + ~White())))

    # each line has one or more of these values
    linedefn = OneOrMore(value)
    # do not expand tabs before parsing
    linedefn.parseWithTabs()

    data = "library\tsshclient    with name\tnode"

    # replace tabs with 2 spaces
    #data = data.replace('\t', '  ')

    print(linedefn.parseString(data))

    linedefn = OneOrMore(locatedExpr(value))('values')
    # do not expand tabs before parsing
    linedefn.parseWithTabs()
    print(linedefn.parseString(data).dump())
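To see why tab expansion merged the first two words in the second attempt from the question, it may help to look at str.expandtabs on its own. This sketch uses plain Python only, with the default tab size of 8, and assumes the sample line has two spaces between "name" and "s1".

```python
source = "library\tsshclient\t\t\twith name  s1"

# expandtabs() pads each tab out to the next multiple of 8 columns.
# "library" is 7 characters long, so the tab after it becomes a single space.
expanded = source.expandtabs()

print(repr(expanded[:17]))
print(len(source), len(expanded))
```

After expansion, "library" and "sshclient" are separated by exactly one space, which the value expression treats as an embedded space inside a single token. The string length also changes, so any offsets reported by the parser refer to the expanded text rather than the original.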
python pyparsing