string - When should Unicode decoding be interrupted with an exception?
I'm working on bringing Unicode to a narrow-string application, and while looking at how carefree we are in handling all those char * strings, not weighed down by thinking about the possibility of an invalid string, it made me think of the following:
When decoding Unicode, the programmer is presented with three choices for handling ill-formed strings: ignore decoding errors, stripping invalid characters out of the resulting string; stop at the first decoding error; or replace whatever can't be decoded with replacement characters.
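To make the three choices concrete, here is a minimal sketch using Python's built-in `bytes.decode` and its standard `errors` modes. The sample input is made up for illustration; other languages and libraries expose the same choices under different names.

```python
# Sample input (an assumption for illustration): valid UTF-8 "café",
# a space, one stray 0xFF byte that can never appear in UTF-8, then " ok".
data = b"caf\xc3\xa9 \xff ok"

# Choice 1: ignore errors, silently dropping the undecodable byte.
ignored = data.decode("utf-8", errors="ignore")

# Choice 2: stop at the first error by raising an exception.
try:
    data.decode("utf-8", errors="strict")
    strict_error = None
except UnicodeDecodeError as exc:
    strict_error = exc  # exc.start is the byte offset of the bad byte

# Choice 3: substitute U+FFFD REPLACEMENT CHARACTER for the bad byte.
replaced = data.decode("utf-8", errors="replace")

print(repr(ignored))
print(repr(replaced))
print(strict_error)
```

Only the strict mode tells the caller where the problem is; the other two always return a string.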
I don't like the ignoring approach for security reasons: it's easy to create a string that might look fine at first glance but becomes evil once the specially designed errors are stripped. Replacing errors with replacement characters is a big improvement in this case. The result might look worse, but there is a clear visual indication that something did not go as planned, and the replacement characters don't allow words to merge into a different meaning.
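The merging concern can be sketched as follows. The filter function and the payload are hypothetical, invented only to illustrate the point: a check runs on the raw bytes, and decoding with errors ignored then produces text the check never saw.

```python
def naive_filter_allows(raw: bytes) -> bool:
    """Hypothetical byte-level check: reject input containing b'<script'."""
    return b"<script" not in raw.lower()

# Made-up payload: a stray 0xFF byte splits the word "script".
payload = b"<scr\xffipt>alert(1)</scr\xffipt>"

passes_filter = naive_filter_allows(payload)

# Stripping the invalid byte merges the two halves back into a tag.
stripped = payload.decode("utf-8", errors="ignore")

# Replacing keeps the halves apart and leaves a visible U+FFFD marker.
marked = payload.decode("utf-8", errors="replace")

print(passes_filter)
print(stripped)
print(marked)
```

With replacement, the damaged spot stays visible and the word remains split, so the text that passed the check cannot turn into something else during decoding.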
But what are the real-life use cases for throwing an exception or stopping decoding after the first error? What is the point of such "validation"? Let's assume a function received an apparently invalid UTF-8 string: what is the programmer supposed to do with that knowledge?
string unicode utf-8 utf8-decode