Parsing a huge XML file using Go
We need to parse a huge XML file using Go. We'd like to use a SAX-like, event-based approach built on the xml.NewDecoder() and decoder.Token() library calls. We've created the appropriate struct types with XML annotations. Easy peasy so far.
Now we go through the file and watch for xml.StartElement tokens, and here comes the problem. We need to decode the attributes of the start token and then go on to its content. If we call decoder.DecodeElement(), the whole content is "decoded" (or skipped, in our scenario).
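To make the problem concrete, here is a minimal sketch of what DecodeElement does to the stream. The pageHeader type, the decodeThenPeek helper and the sample document are all hypothetical and only serve as illustration; the library calls are the ones from encoding/xml.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// pageHeader is a hypothetical type that maps only one attribute of the
// start tag; child elements have no matching field.
type pageHeader struct {
	ID string `xml:"id,attr"`
}

// decodeThenPeek decodes the first <page> element with DecodeElement and
// returns its id attribute plus the name of the next start element that the
// decoder yields afterwards.
func decodeThenPeek(doc string) (string, string, error) {
	d := xml.NewDecoder(strings.NewReader(doc))
	for {
		tok, err := d.Token()
		if err != nil {
			return "", "", err
		}
		se, ok := tok.(xml.StartElement)
		if !ok || se.Name.Local != "page" {
			continue
		}
		var h pageHeader
		if err := d.DecodeElement(&h, &se); err != nil {
			return "", "", err
		}
		// DecodeElement has consumed everything up to and including
		// </page>, so the children of <page> are no longer available.
		for {
			tok, err := d.Token()
			if err != nil {
				return "", "", err
			}
			if next, ok := tok.(xml.StartElement); ok {
				return h.ID, next.Name.Local, nil
			}
		}
	}
}

func main() {
	const doc = `<root><page id="7"><title>t</title></page><tail/></root>`
	id, next, err := decodeThenPeek(doc)
	if err != nil {
		panic(err)
	}
	fmt.Printf("id=%s next=%s\n", id, next)
}
```

In this sketch the element reported after the call is tail, not title: the body of page was read and discarded while decoding.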
How can we decode the attributes of a specific StartElement and then go on to the element's body?
I parse Wikipedia XML dumps (~50GB XML files) in go-wikiparse using plain struct/reflect decoding. It's super simple.
The strategy is this:
First, read the envelope token:

    d := xml.NewDecoder(r)
    _, err := d.Token()
    if err != nil {
        return nil, err
    }
E.g., for <somedocument><billions-of-other-things/></somedocument>
this will give you the somedocument start token.
Then you can struct-decode the next things in a loop:

    var i Item
    err = d.Decode(&i)
It doesn't use much RAM, and it's super easy to parse.