Breedlove: python - pyparsing whitespace match issues -

Saturday 15 January 2011

python - pyparsing whitespace match issues -

I tried to use pyparsing to parse Robot Framework, a text-based DSL. The syntax is as follows (sorry, I think it's a little hard for me to describe it in BNF). A single line in Robot Framework may look like:

library\tsshclient    with name\tnode

\t is a tab, and in Robot Framework it is transparently converted to 2 spaces (in fact, it just calls str.replace('\t', '  ') to replace the tab, which modifies the length of each line, since len('\t') is 1 but len('  ') is 2). In Robot, 2 or more whitespaces and '\t' are used to split tokens; if there is only 1 whitespace between words, the words are considered one token group.

library\tsshclient    with name\tnode

is split into the following tokens if parsed correctly:

['library', 'sshclient', 'with name', 'node']

As there is only 1 whitespace between "with" and "name", the parser considers them to belong to one grouped token.
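The splitting rule described above can be sketched with a plain regular expression. This is only an illustration of the rule, not pyparsing and not Robot Framework's own implementation; the names SEPARATOR and split_cells are hypothetical, and collapsing runs of separators (and dropping empty cells) is an assumption.

```python
import re

# Assumed separator rule: any run made of tabs and/or 2-or-more spaces.
SEPARATOR = re.compile(r"(?:\t| {2,})+")


def split_cells(line):
    """Split a line into cells; a single space stays inside its cell."""
    return [cell for cell in SEPARATOR.split(line) if cell]


line = "library\tsshclient    with name\tnode"
print(split_cells(line))
# ['library', 'sshclient', 'with name', 'node']
```

A single space never matches SEPARATOR, which is why "with name" survives as one cell.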

Here is my code:

from pyparsing import *

ParserElement.setDefaultWhitespaceChars('\r\n\t ')
source = "library\tsshclient    with name\tnode"
each_line = Optional(Word(" ")).leaveWhitespace().suppress() + \
            CaselessKeyword("library").suppress() + \
            OneOrMore((Word(alphas)) + White(max=1).setResultName('myvalue')) + \
            SkipTo(LineEnd())

res = each_line.parseString(source)
print(res.myvalue)

Questions:

1) I already set the whitespace characters. If I want to match exactly 2 or more spaces or 1 or more tabs, I thought the code would look like White(ws=' ', min=2) | White(ws='\t', min=1), but this fails. So can I not specify the whitespace value?

2) Is there a way to get the index of the matched result? I tried setParseAction, but it seems I could not get the index from this callback. I need both the start and end index to highlight the word.

3) What do LineStart and LineEnd mean? When I print these values, they seem to be just normal strings. Do I have to write them at the front and end of a line, like LineStart() + balabala... + LineEnd()?

Thanks. However, there is a restriction that I cannot replace '\t' with '  '.
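A small sketch of why this restriction matters for highlighting: replacing each tab with two spaces shifts every later character offset, so positions found in the replaced string no longer line up with the original text. The variable names here are illustrative only.

```python
raw = "library\tsshclient    with name\tnode"
replaced = raw.replace("\t", "  ")

# Each tab becomes two characters, so the string grows by one per tab.
print(len(raw), len(replaced))
# 35 37

# The same word sits at a different offset after the replacement.
print(raw.index("node"), replaced.index("node"))
# 31 33
```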

from pyparsing import *

source = "library\tsshclient\t\t\twith name    s1"

# Here it seems the whitespace has been set to ' ', so why does the result still match '\t'?
value = Combine(OneOrMore(Word(printables) | White(' ', max=1) + ~White()))

linedefn = OneOrMore(value)

res = linedefn.parseString(source)

print(res)

I got

['library sshclient', 'with name', 's1']

but I expected ['library', 'sshclient', 'with name', 's1']

I always flinch when whitespace creeps into parsed tokens, but with your constraint that only single spaces are allowed, this should be workable. I used the following expression to define your values that may have embedded single spaces:

# each value consists of printable words separated by at most a
# single space (a space that is not followed by another space)
value = Combine(OneOrMore(Word(printables) | White(' ', max=1) + ~White()))

With this done, a line is just one or more of these values:

linedefn = OneOrMore(value)

Following your example, including calling str.replace to replace tabs with pairs of spaces, the code looks like:

data = "library\tsshclient    with name\tnode"

# replace tabs with 2 spaces
data = data.replace('\t', '  ')

print(linedefn.parseString(data))

Giving:

['library', 'sshclient', 'with name', 'node']

To get the start and end locations of the values in the original string, wrap the expression in the new pyparsing helper method locatedExpr:

# use the new locatedExpr to get the value, start, and end location
# for each value
linedefn = OneOrMore(locatedExpr(value))('values')

If we parse and dump the results:

print(linedefn.parseString(data).dump())

We get:

- values:
  [0]:
    [0, 'library', 7]
    - locn_end: 7
    - locn_start: 0
    - value: library
  [1]:
    [9, 'sshclient', 18]
    - locn_end: 18
    - locn_start: 9
    - value: sshclient
  [2]:
    [22, 'with name', 31]
    - locn_end: 31
    - locn_start: 22
    - value: with name
  [3]:
    [33, 'node', 37]
    - locn_end: 37
    - locn_start: 33
    - value: node
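Note that the offsets in this dump refer to the string after tabs were replaced by two spaces. For comparison, here is a plain-regex sketch (not pyparsing, and not the method used in this answer) that reports spans against the untouched, tab-containing string. The pattern and the names VALUE and locate_values are hypothetical; they assume a value is a run of non-blank words joined by single spaces.

```python
import re

# Assumed value rule: non-blank words joined by exactly one space.
VALUE = re.compile(r"[^ \t]+(?: [^ \t]+)*")


def locate_values(line):
    """Return (start, text, end) for each value, using offsets in `line`."""
    return [(m.start(), m.group(), m.end()) for m in VALUE.finditer(line)]


raw = "library\tsshclient    with name\tnode"
for start, text, end in locate_values(raw):
    print(start, repr(text), end)
# 0 'library' 7
# 8 'sshclient' 17
# 21 'with name' 30
# 31 'node' 35
```

Because nothing is replaced or expanded, raw[start:end] always equals the reported text, which is what a highlighter needs.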

LineStart and LineEnd are pyparsing expression classes whose instances should match at the start and end of a line. LineStart has always been difficult to work with, but LineEnd is fairly predictable. In your case, if you just read and parse one line at a time, then you shouldn't need them; just define the contents of the line that you expect. If you want to ensure that the parser has processed the entire string (and not stopped short of the end because of a non-matching character), add + LineEnd() or + StringEnd() to the end of your parser, or add the argument parseAll=True to your call to parseString().
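A minimal sketch of the difference parseAll makes, assuming pyparsing is installed and exposes the camelCase names used in this post. The grammar and input are made up for illustration and are not the Robot Framework grammar.

```python
from pyparsing import OneOrMore, ParseException, Word, alphas

# Illustrative grammar: one or more alphabetic words.
words = OneOrMore(Word(alphas))
text = "abc def 123"

# Without parseAll, parsing stops quietly at the first non-matching text.
partial = list(words.parseString(text))
print(partial)

# With parseAll=True, leftover input raises ParseException instead.
try:
    words.parseString(text, parseAll=True)
    fully_parsed = True
except ParseException as err:
    fully_parsed = False
    print("input was not fully consumed:", err)
```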

EDIT:

It is easy to forget that pyparsing calls str.expandtabs by default; you have to disable this by calling parseWithTabs. That, together with explicitly disallowing tabs between value words, resolves your problem and keeps the values at the right character counts. See the changes below:

from pyparsing import *

TAB = White('\t')

# each value consists of printable words separated by at most a
# single space (a space that is not followed by another space)
value = Combine(OneOrMore(~TAB + (Word(printables) | White(' ', max=1) + ~White())))

# each line has one or more of these values
linedefn = OneOrMore(value)
# do not expand tabs before parsing
linedefn.parseWithTabs()

data = "library\tsshclient    with name\tnode"

# replace tabs with 2 spaces
#data = data.replace('\t', '  ')

print(linedefn.parseString(data))

linedefn = OneOrMore(locatedExpr(value))('values')
# do not expand tabs before parsing
linedefn.parseWithTabs()
print(linedefn.parseString(data).dump())
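To see what the default tab expansion does to the input from the question, the standard-library call can be run on its own. This sketch only shows str.expandtabs with its default tab size of 8; it does not involve pyparsing itself.

```python
source = "library\tsshclient\t\t\twith name    s1"

# str.expandtabs pads each tab with spaces up to the next multiple of 8.
expanded = source.expandtabs()

# "library" is 7 characters, so the first tab becomes a single space.
print(repr(expanded[:17]))
# 'library sshclient'

# The three tabs after "sshclient" pad out to column 40.
print(expanded.index("with name"))
# 40
```

A single space between "library" and "sshclient" is exactly what the value expression allows inside one token, which would explain the unexpected 'library sshclient' result.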

python pyparsing
