Thursday 15 July 2010

python - UnicodeDecodeError: 'ascii' codec can't decode byte 0x92 in position 47: ordinal not in range(128) -




I am trying to write data to a StringIO object using Python and then load that data into a Postgres database using psycopg2's copy_from() function.

At first when I did this, copy_from() was throwing an error: ERROR: invalid byte sequence for encoding "UTF8": 0xc92. So I followed this question.

I figured out that my Postgres database has UTF8 encoding.

The file/StringIO object that I am writing the data to reports its encoding as follows: setgid Non-ISO extended-ASCII English text, with very long lines, with CRLF line terminators

I tried to encode every string that I write to the intermediate file/StringIO object as UTF-8. I used .encode(encoding='utf-8', errors='strict') on every string.

This is the error I get now: UnicodeDecodeError: 'ascii' codec can't decode byte 0x92 in position 47: ordinal not in range(128)

What does it mean? How do I fix it?

EDIT: I am using Python 2.7. Some pieces of my code:

I read from a MySQL database whose data is encoded in UTF-8, according to MySQL Workbench. These are the few lines of code that write the data (obtained from the MySQL db) to the StringIO object:

    # Populate table_data with rows delimited by \n and columns delimited by \t
    row_num = 0
    for row in cursor.fetchall():
        # Separate rows in the table with a newline delimiter
        if row_num != 0:
            table_data.write("\n")
        col_num = 0
        for cell in row:
            # Separate cells in a row with a tab delimiter
            if col_num != 0:
                table_data.write("\t")
            table_data.write(cell.encode(encoding='utf-8', errors='strict'))
            col_num = col_num + 1
        row_num = row_num + 1

This is the code that writes to the Postgres database from the StringIO object table_data:

    cursor = db_connection.cursor()
    cursor.copy_from(table_data, <postgres_table_name>)

The problem is that you're calling encode on a str object.

A str is a byte string, usually representing text encoded in some way such as UTF-8. When you call encode on that, it first has to be decoded back to text, so that the text can be re-encoded. By default, Python does that by calling s.decode(sys.getdefaultencoding()), and getdefaultencoding() usually returns 'ascii'.

So, you're taking UTF-8 encoded text, decoding it as if it were ASCII, then re-encoding it in UTF-8.
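The hidden decode step can be reproduced by writing it out explicitly. This is a minimal sketch, not the asker's code: the byte string is made-up data with byte 0x92 placed at index 47 to match the error message, and implicit_encode is a hypothetical helper that spells out what Python 2 does behind str.encode.

```python
# Hypothetical data: 47 ASCII bytes followed by the non-ASCII byte 0x92.
raw = b"a" * 47 + b"\x92"


def implicit_encode(byte_string):
    """Spell out the two steps Python 2 performs for str.encode('utf-8')."""
    text = byte_string.decode("ascii")  # the hidden step, and the one that fails
    return text.encode("utf-8")         # never reached for non-ASCII input


try:
    implicit_encode(raw)
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)
```

The failure comes from the decode, which is why the traceback reports a UnicodeDecodeError even though the code only asked for an encode.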

The general solution is to explicitly call decode with the right encoding, instead of letting Python use the default, and then encode the result.
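As a rough sketch of that general solution: decode with a named encoding, then encode to UTF-8. The helper name reencode_to_utf8 and the choice of cp1252 as the source encoding are assumptions for illustration only; the question does not establish what encoding the bytes are really in.

```python
def reencode_to_utf8(byte_string, source_encoding):
    """Decode with an explicitly named encoding, then encode as UTF-8."""
    text = byte_string.decode(source_encoding)  # explicit, no ASCII default
    return text.encode("utf-8")


# Assumption: the bytes are cp1252, where 0x92 is a right single quotation mark.
cp1252_bytes = b"it\x92s"
utf8_bytes = reencode_to_utf8(cp1252_bytes, "cp1252")
print(repr(utf8_bytes))
```

If the source encoding is guessed wrong, the decode either raises or silently produces the wrong characters, so the encoding should come from whatever produced the data rather than from trial and error.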

But when the right encoding is already the one you want, the easier solution is to skip the .decode('utf-8').encode('utf-8') and just use the UTF-8 str as the UTF-8 str that it already is.

Or, alternatively, if your MySQL wrapper has a feature that lets you specify an encoding and get back unicode values for CHAR/VARCHAR/TEXT columns instead of str values (e.g., in MySQLdb, you pass use_unicode=True to the connect call, or charset='utf-8' if your database is too old to auto-detect it), just do that. Then you'll have unicode objects, and you can call .encode('utf-8') on them.

In general, the best way to deal with Unicode problems is the last one: decode everything as early as possible, do all the processing in Unicode, and then encode as late as possible. But either way, you have to be consistent. Don't call str on something that might be a unicode; don't concatenate a str literal to a unicode or pass one to its replace method; etc. Any time you mix and match, Python is going to implicitly convert for you, using your default encoding, which is almost never what you want.
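A small sketch of "decode early, encode late", applied to the tab- and newline-delimited layout from the question. The names to_text and build_copy_buffer are hypothetical, the sample rows are made up, and the sketch assumes every cell is either a byte string or text and contains no tabs or newlines of its own.

```python
import io


def to_text(cell, encoding="utf-8"):
    """Decode byte strings as early as possible; pass text through unchanged."""
    if isinstance(cell, bytes):
        return cell.decode(encoding)
    return cell


def build_copy_buffer(rows, encoding="utf-8"):
    """Join cells as text, then encode once, at the very end."""
    lines = ["\t".join(to_text(cell, encoding) for cell in row) for row in rows]
    payload = "\n".join(lines).encode("utf-8")
    return io.BytesIO(payload)


# Made-up rows mixing UTF-8 byte strings and text values.
rows = [(b"caf\xc3\xa9", u"plain"), (u"na\u00efve", b"x")]
table_data = build_copy_buffer(rows)
print(repr(table_data.getvalue()))
```

Because every cell is turned into text before any joining happens, there is no point at which a byte string and a text string meet and trigger an implicit conversion.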

As a side note, this is one of the many things that Python 3.x's Unicode changes help with. First, str is now Unicode text, not encoded bytes. More importantly, if you have encoded bytes, e.g., in a bytes object, calling encode will give you an AttributeError instead of trying to silently decode so that it can re-encode. And, similarly, trying to mix and match Unicode and bytes will give you an obvious TypeError, instead of an implicit conversion that succeeds in some cases and gives a cryptic message about an encode or decode you didn't ask for in others.
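Both Python 3 behaviours can be checked with a few lines. This sketch is meant for a Python 3 interpreter and uses made-up sample text.

```python
# Python 3 only: bytes and str are distinct types with no implicit conversion.
encoded = "caf\u00e9".encode("utf-8")  # a bytes object

try:
    encoded.encode("utf-8")            # bytes has no encode method
    encode_error = None
except AttributeError as exc:
    encode_error = type(exc).__name__

try:
    "text: " + encoded                 # mixing str and bytes is rejected
    mix_error = None
except TypeError as exc:
    mix_error = type(exc).__name__

print(encode_error, mix_error)
```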

python postgresql python-2.7 encoding utf
