Saturday, 15 June 2013

python - When we write a string to a file, why do we need to care about the encoding?




I read the Python 2 Unicode HOWTO and "Unicode in Python, Completely Demystified" to understand Python's Unicode system, and I came across code like this:

    f = open('test.txt','w')
    f.write(uni.encode('utf-8'))
    f.close()

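Why does a unicode string need to be encoded before it is written to a file?

I know the default encoding is ASCII, so there will be an error when a character is out of range.

But when we write to a file, aren't we just copying the bits of uni from RAM to the file? Why does the program need to care about the encoding?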

Unicode characters are abstract entities called code points, and they have several encodings, such as UTF-32, UTF-16 and UTF-8. A fixed-width encoding needs 4 bytes per code point to cover all of them (and even then, Unicode has non-spacing combining characters, so one could argue that a displayed character is bigger still). To make things more confusing, many systems use "code pages" that existed before Unicode was standardized and that define different mappings between bits and the characters they display.
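As a rough illustration of "one set of code points, several encodings", the sketch below encodes the same short string three ways and compares the byte counts. It assumes Python 3 (where str is the Unicode type); the sample text and variable names are made up for the example.

```python
# Hypothetical sample text: 'h', 'e with acute accent', 'l', 'l', 'o'.
text = u"h\u00e9llo"

# The same five code points, serialized with three different encodings.
# The "-be" variants are used so that no byte order mark is added.
utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-be")
utf32 = text.encode("utf-32-be")

# Code point count, then the byte count under each encoding.
print(len(text), len(utf8), len(utf16), len(utf32))
```

The code points never change, but the bytes that represent them do, which is why "just copy the bits" is not well defined until an encoding is chosen.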

In Python 2, unicode strings are held in RAM as UTF-16 (or as UCS-4 on "wide" builds). Right away you can see the issue: if you want to write UTF-8, the in-memory string isn't going to work as-is. Python needs to read the in-memory string and write out a UTF-8 string.
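One way to make that conversion explicit is to let the file object do the encoding. The following is a minimal sketch, assuming Python 3 (io.open also exists in Python 2.6+); the sample string and the temporary path are placeholders, not anything from the question.

```python
import io
import os
import tempfile

uni = u"caf\u00e9"  # hypothetical sample text with one non-ASCII character

# Write into a fresh temporary directory so nothing real is overwritten.
path = os.path.join(tempfile.mkdtemp(), "test.txt")

# Text mode with an explicit encoding: the file object encodes on write.
with io.open(path, "w", encoding="utf-8") as f:
    f.write(uni)

# Binary mode shows the bytes that actually reached the disk.
with io.open(path, "rb") as f:
    raw = f.read()

print(repr(raw))
```

The bytes on disk are the same ones that uni.encode('utf-8') would produce, so this is the question's snippet with the encoding step moved into the file object.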

There is a more subtle issue: Intel-based processors are little-endian, whereas Unicode multi-byte encodings default to big-endian (meaning the bytes within a word are ordered differently). If you want to write UTF-16, changes may have to be made. Because of the little/big problem, it is common to write a BOM (byte order mark) at the front of strings so that decoders can detect the format.
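The byte-order difference and the BOM can be seen directly with Python's built-in codecs. This is only a sketch, assuming Python 3; the one-character sample is arbitrary.

```python
import codecs

text = u"A"  # code point U+0041

# Explicit byte orders: no BOM is written.
little = text.encode("utf-16-le")  # low byte first
big = text.encode("utf-16-be")     # high byte first

# Plain "utf-16" prepends a BOM and then uses the machine's native order.
with_bom = text.encode("utf-16")

print(repr(little), repr(big), repr(with_bom))

# The first two bytes are one of the two possible UTF-16 BOMs.
starts_with_bom = with_bom[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)

# A decoder reads the BOM to pick the byte order, then drops it.
decoded = with_bom.decode("utf-16")
```

Without the BOM, a reader has to be told (or guess) which of the two byte orders was used.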

Since characters can be expressed in many ways (encodings), what should the default be? It's a historical thing, really. Since ASCII is the way it has been done throughout history (well, Unix history at least), it is still the default in Python 2.
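The "out of range" error mentioned in the question is what the ASCII codec raises when it meets a character it cannot represent. The sketch below triggers it explicitly, assuming Python 3; in Python 2 the same error appears implicitly when a unicode string is written to a file opened with plain open(). The sample string is hypothetical.

```python
uni = u"caf\u00e9"  # hypothetical sample text; the last character is not ASCII

try:
    uni.encode("ascii")
    failed = False
except UnicodeEncodeError as exc:
    # ASCII only covers code points 0-127, so U+00E9 cannot be encoded.
    failed = True
    print(exc)

# An error handler can substitute a placeholder instead of raising.
fallback = uni.encode("ascii", "replace")
print(repr(fallback))
```

Choosing an encoding that covers the text, such as UTF-8, avoids both the error and the lossy replacement.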

When writing non-binary data, you always have to pass it through some sort of codec. That's the price we pay for the time it took multilingual computing to mature and for computing systems to become powerful enough to deal with it. There is no way a Commodore 64 would have been able to deal with Phoenician.

python encoding
