Friday 15 March 2013

string - When should Unicode decoding be interrupted with an exception?




I'm working on bringing Unicode to a narrow-string application, and looking at how carefree it is in handling its all-`char *` strings, not weighed down by any thought of the possibility of an invalid string, made me think of the following:

When decoding Unicode, the programmer is presented with three choices for handling ill-formed strings: ignore decoding errors, stripping the invalid characters out of the resulting string; stumble on the first decoding error; or replace whatever can't be decoded with replacement characters.
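As a minimal sketch of those three choices, here is how they look with Python's built-in `bytes.decode` and its `errors` parameter. The sample byte string is made up for illustration; other languages and libraries expose the same choices under different names.

```python
# Sample input (made up): valid UTF-8 for "café", then a stray 0xFF byte,
# which can never appear in well-formed UTF-8.
data = b"caf\xc3\xa9 \xff ok"

# Choice 1: ignore errors, silently dropping the bytes that cannot be decoded.
ignored = data.decode("utf-8", errors="ignore")

# Choice 2: stumble on the first error by raising an exception.
try:
    data.decode("utf-8", errors="strict")
    strict_error = None
except UnicodeDecodeError as exc:
    strict_error = exc  # exc.start is the byte offset of the offending input

# Choice 3: substitute U+FFFD REPLACEMENT CHARACTER for what cannot be decoded.
replaced = data.decode("utf-8", errors="replace")

print(repr(ignored))
print(repr(replaced))
print(strict_error)
```

Only the strict variant tells the caller where the input went wrong; the other two always return a string.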

I don't like the ignoring approach for security reasons: it's easy to create a string that might look harmless at first glance but becomes evil once the deliberately designed errors are stripped out. Replacing errors with replacement characters is a big improvement in that case: the result might look worse, but there is a clear visual indication that something did not go as planned, and replacement characters don't allow words to merge into a different meaning.
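The concern about stripping can be illustrated with a small, hypothetical example. The `looks_safe` filter and the payload below are invented for this sketch and do not come from any real application; the point is only that a check made before decoding can be bypassed when invalid bytes are later dropped.

```python
# Hypothetical attack payload: an invalid 0xFF byte splits the word "script".
payload = b"<scr\xffipt>alert(1)</scr\xffipt>"

def looks_safe(raw: bytes) -> bool:
    """Naive, hypothetical filter that inspects the raw bytes before decoding."""
    return b"<script" not in raw

passed_filter = looks_safe(payload)

# Stripping the invalid bytes lets the two halves merge into a real tag.
stripped = payload.decode("utf-8", errors="ignore")

# Replacing them keeps the halves apart and leaves a visible U+FFFD marker.
replaced = payload.decode("utf-8", errors="replace")

print(passed_filter)
print(stripped)
print(replaced)
```

With `errors="ignore"` the decoded text contains an intact `<script>` tag that the filter never saw, while with `errors="replace"` the tag stays broken.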

But what are the real-life use cases for throwing an exception or stopping decoding after the first error? What is the point of such "validation"? Let's assume a function got an apparently invalid UTF-8 string: what is the programmer supposed to do with that knowledge?

string unicode utf-8 utf8-decode
