Thursday 15 August 2013

c++11 - C++: Parsing a string of numbers with parentheses in it -




This seems trivial, but I can't seem to get around it. I have STL strings of the format 2013 336 (02 dec) 04 (where 04 is the hour, but that's irrelevant). I'd like to extract the day of the month (02 in the example), the month, and the hour.

I'm trying to do this cleanly and avoid, e.g., splitting the string at the parentheses and then working with substrings. Ideally I'd like to use a stringstream and just redirect it to variables. The code I've got right now is:

    int year, dayofyear, day, hour;
    std::string month, leftparenthesis, rightparenthesis;
    std::string examplestring = "2013 336 (02 dec) 04";
    std::istringstream yeardaymonthhourstringstream( examplestring );
    yeardaymonthhourstringstream >> year >> dayofyear >> leftparenthesis >> day >> month >> rightparenthesis >> hour;

It extracts year and dayofyear all right, as 2013 and 336, but then things start going badly: day is 0, month is an empty string, and hour is 843076624.

leftparenthesis ends up as (02, so it contains the day, but when I try to omit the leftparenthesis variable while redirecting the yeardaymonthhourstringstream stream, day is still 0.
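The behaviour described above follows from how >> splits its input: extracting into a std::string stops only at whitespace, so the parenthesis stays glued to its neighbour, and extracting an int from text that starts with ( or a letter fails and leaves the stream in a failed state, so every later extraction does nothing. A minimal sketch that makes this visible (the helper names whitespacetokens and startswithint are made up for this illustration):

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits text the same way operator>> into a std::string does:
// on whitespace only, so punctuation stays attached to its neighbour.
std::vector<std::string> whitespacetokens( const std::string& text )
{
    std::istringstream source( text );
    std::vector<std::string> tokens;
    std::string token;
    while ( source >> token ) {
        tokens.push_back( token );
    }
    return tokens;
}

// Returns true if an int can be extracted from the start of text.
bool startswithint( const std::string& text )
{
    std::istringstream source( text );
    int value = 0;
    return static_cast<bool>( source >> value );
}
```

For the example string the tokens are 2013, 336, (02, dec) and 04. With the leftparenthesis variable in place, >> day is handed dec) and fails; without it, >> day is handed (02 and fails as well.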

Any ideas on how to deal with this? I don't know regular expressions (yet) and, admittedly, am not sure if I can afford to learn them right now (timewise).

EDIT: OK, I've got it. Although this is the billionth time when I could have made my life much easier with regex, so I guess it's time. Anyway, what worked was:

    int year, dayofyear, day, month, hour, minute, revolution;
    std::string daystring, monthstring;
    yeardaymonthhourstringstream >> year >> dayofyear >> daystring >> monthstring >> hour;
    std::string::size_type sz;
    // Convert day to a number using the C++11 standard library.
    // Skip the ( that may be at the beginning.
    day = std::stoi( daystring.substr( daystring.find("(")+1 ), &sz );

This still requires handling of monthstring, but I need to change it to a number anyway, so that isn't a huge disadvantage. It's not the best thing one can do (that would be regex), but it works and isn't too dirty. To my knowledge it's also vaguely portable and won't stop working with new compilers. Thanks, everyone.
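For the remaining monthstring step, one possible sketch is below. It assumes the months are three-letter English abbreviations, which the example (dec) suggests but the question doesn't state; the function name monthnumber is made up for this illustration.

```cpp
#include <cctype>
#include <string>

// Returns 1..12 for a recognised month abbreviation, or 0 otherwise.
// Non-letters such as the trailing ')' in "dec)" are ignored.
int monthnumber( const std::string& monthstring )
{
    // Assumption: three-letter English abbreviations.
    static const char* const names[] = {
        "jan", "feb", "mar", "apr", "may", "jun",
        "jul", "aug", "sep", "oct", "nov", "dec"
    };
    std::string cleaned;
    for ( std::string::size_type i = 0; i < monthstring.size(); ++i ) {
        unsigned char ch = static_cast<unsigned char>( monthstring[i] );
        if ( std::isalpha( ch ) ) {
            cleaned += static_cast<char>( std::tolower( ch ) );
        }
    }
    for ( int i = 0; i < 12; ++i ) {
        if ( cleaned == names[i] ) {
            return i + 1;
        }
    }
    return 0;
}
```

With this, month = monthnumber( monthstring ); would turn dec) into 12, and a return value of 0 signals an unrecognised name.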

The obvious solution is to use regular expressions (either std::regex, in C++11, or boost::regex pre C++11). Just capture the groups you're interested in, and use std::istringstream to convert them if necessary. In this case,

    std::regex re( "\\s*\\d+\\s+\\d+\\s*\\((\\d+)\\s+([[:alpha:]]+)\\)\\s*(\\d+)" );

should do the trick.

And regular expressions are really quite simple; it will take you less time to learn them than to implement any alternative solution.

For an alternative solution, you'd probably want to read the line character by character, breaking it into tokens. Something along the lines of:
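To show what using the captures might look like, here is a minimal sketch. The struct parsedfields and the function parsewithregex are names made up for this example, and it converts the captures with std::stoi rather than std::istringstream purely for brevity. The pattern escapes both the opening and the closing literal parenthesis.

```cpp
#include <regex>
#include <string>

// Hypothetical holder for the three fields of interest.
struct parsedfields {
    int day;
    std::string month;
    int hour;
};

// Returns true and fills result if the whole line matches the format
// "year dayofyear (day month) hour"; returns false otherwise.
bool parsewithregex( const std::string& line, parsedfields& result )
{
    static const std::regex re(
        "\\s*\\d+\\s+\\d+\\s*\\((\\d+)\\s+([[:alpha:]]+)\\)\\s*(\\d+)" );
    std::smatch match;
    if ( !std::regex_match( line, match, re ) ) {
        return false;
    }
    // Note: std::stoi throws std::out_of_range if the digits don't fit an int.
    result.day = std::stoi( match[1].str() );
    result.month = match[2].str();
    result.hour = std::stoi( match[3].str() );
    return true;
}
```

std::regex_match only succeeds if the entire line fits the pattern, so a line without the parentheses is rejected rather than partly parsed.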

    std::vector<std::string> tokens;
    std::string currenttoken;
    char ch;
    // source is the std::istream being read.
    while ( source.get(ch) && ch != '\n' ) {
        if ( std::isspace( static_cast<unsigned char>( ch ) ) ) {
            if ( !currenttoken.empty() ) {
                tokens.push_back( currenttoken );
                currenttoken = "";
            }
        } else if ( std::ispunct( static_cast<unsigned char>( ch ) ) ) {
            if ( !currenttoken.empty() ) {
                tokens.push_back( currenttoken );
                currenttoken = "";
            }
            // A punctuation character is a token in its own right.
            tokens.push_back( std::string( 1, ch ) );
        } else if ( std::isalnum( static_cast<unsigned char>( ch ) ) ) {
            currenttoken.push_back( ch );
        } else {
            // Error: illegal character in line. You'll
            // probably want to throw an exception.
        }
    }
    if ( !currenttoken.empty() ) {
        tokens.push_back( currenttoken );
    }

In this case, a sequence of alphanumeric characters is one token, as is any single punctuation character. You could go further, ensuring that a token is either all alpha or all digits, and maybe regrouping sequences of punctuation, but this seems sufficient for your problem.

Once you've got the list of tokens, you can do any necessary verifications (parentheses in the right places, etc.), and convert the tokens you're interested in, if they need converting.
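A minimal sketch of that verification step follows. It assumes the token list has the shape described above, i.e. exactly seven tokens for the example line: 2013, 336, (, 02, dec, ), 04. The helper names alldigits and extractfields are made up for this illustration.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Returns true if token is non-empty and consists only of digits.
bool alldigits( const std::string& token )
{
    if ( token.empty() ) {
        return false;
    }
    for ( std::string::size_type i = 0; i < token.size(); ++i ) {
        if ( !std::isdigit( static_cast<unsigned char>( token[i] ) ) ) {
            return false;
        }
    }
    return true;
}

// Checks that the parentheses are where they should be and that the
// numeric fields are numeric, then converts the interesting tokens.
bool extractfields( const std::vector<std::string>& tokens,
                    int& day, std::string& month, int& hour )
{
    if ( tokens.size() != 7 || tokens[2] != "(" || tokens[5] != ")" ) {
        return false;
    }
    if ( !alldigits( tokens[0] ) || !alldigits( tokens[1] )
            || !alldigits( tokens[3] ) || !alldigits( tokens[6] ) ) {
        return false;
    }
    // Note: std::stoi throws std::out_of_range if the digits don't fit an int.
    day = std::stoi( tokens[3] );
    month = tokens[4];
    hour = std::stoi( tokens[6] );
    return true;
}
```

The output parameters are only written once every check has passed, so a rejected line leaves them untouched.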

EDIT:

FWIW: I've been experimenting with using auto plus a lambda as a means of defining nested functions. My mind's not made up as to whether it's a good idea or not: I don't find the results that readable. But in this case:

    auto pushtoken = [&]() {
        if ( !currenttoken.empty() ) {
            tokens.push_back( currenttoken );
            currenttoken = "";
        }
    };

just before the loop, then replace each of the if ( !currenttoken.empty() ) blocks with a call to pushtoken(). (Or you could create a data structure with tokens, currenttoken and a pushtoken member function. This would work even in pre-C++11.)
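The data structure variant mentioned in the parentheses could look roughly like this. It is only a sketch: the names tokenlist, tokenizeline and tokenizestring are made up here, and reporting an illegal character by returning false is an arbitrary choice for the example, where throwing an exception would do just as well.

```cpp
#include <cctype>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Bundles the token list with the token currently being built.
struct tokenlist {
    std::vector<std::string> tokens;
    std::string currenttoken;

    // Appends the pending token, if there is one, and resets it.
    void pushtoken()
    {
        if ( !currenttoken.empty() ) {
            tokens.push_back( currenttoken );
            currenttoken.clear();
        }
    }
};

// Reads one line from source. Alphanumeric runs become one token each,
// and every punctuation character becomes a token of its own.
// Returns false if an illegal character is met.
bool tokenizeline( std::istream& source, tokenlist& result )
{
    char ch;
    while ( source.get( ch ) && ch != '\n' ) {
        unsigned char uch = static_cast<unsigned char>( ch );
        if ( std::isspace( uch ) ) {
            result.pushtoken();
        } else if ( std::ispunct( uch ) ) {
            result.pushtoken();
            result.tokens.push_back( std::string( 1, ch ) );
        } else if ( std::isalnum( uch ) ) {
            result.currenttoken.push_back( ch );
        } else {
            return false;
        }
    }
    result.pushtoken();
    return true;
}

// Convenience wrapper for a line already held in a string.
// Returns an empty vector if the line contains an illegal character.
std::vector<std::string> tokenizestring( const std::string& line )
{
    std::istringstream source( line );
    tokenlist result;
    if ( !tokenizeline( source, result ) ) {
        result.tokens.clear();
    }
    return result.tokens;
}
```

Nothing in this needs C++11, and the loop body reads about as compactly as the lambda version.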

EDIT:

One final remark, since the OP seems to want to do this exclusively with std::istream: the solution there would be to add a mustmatch manipulator:

    class mustmatch
    {
        char m_tomatch;
    public:
        mustmatch( char tomatch ) : m_tomatch( tomatch ) {}
        friend std::istream& operator>>( std::istream& source, mustmatch const& manip )
        {
            char next;
            source >> next;     // or source.get( next ) if you don't want to skip whitespace.
            if ( source && next != manip.m_tomatch ) {
                source.setstate( std::ios_base::failbit );
            }
            return source;
        }
    };

As @angew has pointed out, you'd also need a >> for the months; typically, months would be represented as a class, so you'd overload >> on this:

    std::istream& operator>>( std::istream& source, month& object )
    {
        // The sentry takes care of skipping whitespace, etc.
        std::istream::sentry guard( source );
        if ( guard ) {
            std::streambuf* sb = source.rdbuf();
            std::string monthname;
            while ( std::isalpha( sb->sgetc() ) ) {
                monthname += sb->sbumpc();
            }
            if ( !islegalmonthname( monthname ) ) {
                source.setstate( std::ios_base::failbit );
            } else {
                object = month( monthname );
            }
        }
        return source;
    }

You could, of course, introduce many variants here: the month name could be limited to a maximum of 3 characters, for example (by making the loop condition monthname.size() < 3 && std::isalpha( sb->sgetc() )). But if you're dealing with months in any way in your code, writing a month class and its >> and << operators is something you'll have to do sooner or later anyway.

Then something like:

    source >> year >> dayofyear >> mustmatch( '(' ) >> day >> month >> mustmatch( ')' ) >> hour;
    if ( !(source >> std::ws) || source.get() != EOF ) {
        // Format error...
    }

is all that is needed. (The use of manipulators is another technique worth learning.)
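Putting the manipulator idea together into something that compiles on its own might look like the sketch below. To keep it short it stands in a plain alphaword struct for the month class (so there is no check that the name is a legal month), and the names alphaword and parseline are made up for this example. The trailing-input check is written with eof() tests, which is just one way of doing it.

```cpp
#include <cctype>
#include <istream>
#include <sstream>
#include <streambuf>
#include <string>

// Manipulator: extracts one non-whitespace character and sets failbit
// if it is not the expected one.
class mustmatch
{
    char m_tomatch;
public:
    explicit mustmatch( char tomatch ) : m_tomatch( tomatch ) {}
    friend std::istream& operator>>( std::istream& source, mustmatch const& manip )
    {
        char next;
        if ( source >> next && next != manip.m_tomatch ) {
            source.setstate( std::ios_base::failbit );
        }
        return source;
    }
};

// Simplified stand-in for a month class: a run of letters.
struct alphaword {
    std::string text;
};

std::istream& operator>>( std::istream& source, alphaword& object )
{
    std::istream::sentry guard( source );   // skips leading whitespace
    if ( guard ) {
        std::streambuf* sb = source.rdbuf();
        std::string word;
        std::istream::int_type c = sb->sgetc();
        while ( c != std::istream::traits_type::eof() && std::isalpha( c ) ) {
            word += static_cast<char>( sb->sbumpc() );
            c = sb->sgetc();
        }
        if ( c == std::istream::traits_type::eof() ) {
            source.setstate( std::ios_base::eofbit );
        }
        if ( word.empty() ) {
            source.setstate( std::ios_base::failbit );
        } else {
            object.text = word;
        }
    }
    return source;
}

// Parses "year dayofyear (day month) hour". The outputs are written
// only if the whole line is well formed.
bool parseline( const std::string& line, int& day, std::string& month, int& hour )
{
    std::istringstream source( line );
    int year = 0, dayofyear = 0, parsedday = 0, parsedhour = 0;
    alphaword monthword;
    source >> year >> dayofyear >> mustmatch( '(' ) >> parsedday
           >> monthword >> mustmatch( ')' ) >> parsedhour;
    if ( !source ) {
        return false;                       // format error
    }
    if ( !source.eof() ) {
        source >> std::ws;                  // allow trailing whitespace
    }
    if ( !source.eof() ) {
        return false;                       // trailing garbage
    }
    day = parsedday;
    month = monthword.text;
    hour = parsedhour;
    return true;
}
```

Because a failed mustmatch sets failbit, every extraction after it becomes a no-op, so a single test of the stream at the end is enough to catch a misplaced or missing parenthesis.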

c++ c++11 stringstream
