Thursday 15 January 2015

c - Print UTF-8 string in ICU -




I discovered ICU's ustdio.h and thought it would be fun to test. It didn't take long to see that something wasn't quite right.

Python 3 supports UTF-8 in string literals, so a statement like

print("90°")

is valid.

ICU (in its C API) provides u_printf() and u_printf_u(), the latter of which is designed for whatever UChar is on the system's implementation, which is at least UTF-16.

In an effort to test this, I tried printing out a special character, the degree symbol.

u_printf("90%c\n", 0xb0);

This printed 90�, as did the following:

u_printf(u8"90%c\n", 0xb0);
u_printf("90°\n");
u_printf(u8"90°\n");
u_printf_u(u"90%c\n", 0x00b0);

However, declaring the character in a UTF-16 string literal got the desired result.

u_printf_u(u"90°\n");

$ ./a.out
90°

I could stick with this, but I want UTF-8 compliance; it seems like the superior system. Why aren't UTF-8 string literals from C11 compatible with ICU's u_printf()?
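One detail that may help explain the %c attempts: in UTF-8 the degree sign (U+00B0) is not the single byte 0xB0 but the two-byte sequence 0xC2 0xB0, and a lone 0xB0 is a continuation byte that cannot stand on its own. The sketch below is plain C rather than ICU code, and utf8_encode() and utf8_is_continuation() are hypothetical helper names made up for illustration. Whether this is the full reason for the 90� output depends on how u_printf() converts its char arguments, which the post does not confirm.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of ICU): encode one Unicode code point
   as UTF-8. Writes up to 4 bytes into out and returns the byte count,
   or 0 if cp is a surrogate or lies beyond U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);
    if (cp < 0x80) {                       /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* 3 bytes */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                      /* surrogates are not encodable */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 4 bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* A byte of the form 10xxxxxx can only continue a multi-byte sequence;
   it is never valid as the first byte of a character. */
int utf8_is_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}
```

Calling utf8_encode(0xB0, buf) fills buf with 0xC2 0xB0 and returns 2, while utf8_is_continuation(0xB0) is true, so passing 0xb0 as a single char hands the formatter an incomplete UTF-8 sequence.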

I was able to get around the issue by creating a string literal with the Unicode characters in it and passing it as a char * argument to printf().

The following code prints the line josé 90°\n four times.

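char *s = u8"josé 90°";

/* 1: byte by byte */
for (size_t i = 0; i < strlen(s); ++i)
    putchar(s[i]);
putchar('\n');

/* 2: plain C stdio */
printf("%s\n", s);

/* 3: ICU, with the UTF-8 string passed as a char * argument */
u_printf("%s\n", s);

/* 4: ICU, after converting the string to UTF-16 */
UErrorCode error = U_ZERO_ERROR;
u_init(&error);
UChar *s16 = malloc(256 * sizeof(UChar));
u_strFromUTF8(s16, 256, NULL, s, strlen(s), &error);
u_printf_u(u"%S\n", s16);
free(s16);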
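The fixed 256-element buffer is far larger than this string needs. As a rough sketch of how the required size could be estimated, the plain C function below counts the UTF-16 code units that a UTF-8 string would occupy. It is not an ICU call: utf16_units_for_utf8() is a hypothetical name, and it assumes the input is already well-formed UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of ICU): number of UTF-16 code units
   needed to hold the NUL-terminated UTF-8 string s, excluding the
   terminator. Assumes s is well-formed UTF-8. */
size_t utf16_units_for_utf8(const char *s)
{
    assert(s != NULL);
    size_t units = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; ++p) {
        if ((*p & 0xC0) == 0x80)
            continue;                   /* continuation byte, counted with its lead byte */
        units += (*p >= 0xF0) ? 2 : 1;  /* 4-byte sequences become a surrogate pair */
    }
    return units;
}
```

For josé 90° this returns 8, against a UTF-8 length of 10 bytes. More generally, a well-formed UTF-8 string never needs more UTF-16 code units than it has bytes, so strlen(s) + 1 elements is a safe upper bound for the destination buffer.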

The buffer s16 can be used with u_strToUTF8() to go back and be compatible with UTF-8 functions. Internal things in ICU appear to prefer UTF-16 (I guess it's easier to parse), so you'll need to convert to UTF-16 first and then convert back to UTF-8 before returning to the caller.

c string encoding utf-8 icu
