Python - UnicodeDecodeError: 'ascii' codec can't decode byte 0x92 in position 47: ordinal not in range(128)
I am trying to write some info to a StringIO object using Python, and then load that info into a Postgres database using psycopg2's copy_from() function.
The first time I did this, copy_from() threw an error: ERROR: invalid byte sequence for encoding "UTF8": 0xc92. So I followed this question.
I figured out that my Postgres database has UTF8 encoding.
The file/StringIO object I am writing the info to reports its encoding as follows: setgid Non-ISO extended-ASCII English language text, with long lines, with CRLF line terminators
I tried to encode every string that I write to the intermediate file/StringIO object as UTF-8. I used .encode(encoding='utf-8', errors='strict') on every string.
This is the error I get now: UnicodeDecodeError: 'ascii' codec can't decode byte 0x92 in position 47: ordinal not in range(128)
What does it mean? How do I fix it?
EDIT: I am using Python 2.7. Some pieces of my code:
I read from a MySQL database whose data is encoded in UTF-8 according to MySQL Workbench. Here are a few lines of the code that writes the info (obtained from the MySQL DB) to the StringIO object:
    # populate the table_data variable with rows delimited by \n and columns delimited by \t
    row_num = 0
    for row in cursor.fetchall():
        # separate rows in the table by a newline delimiter
        if row_num != 0:
            table_data.write("\n")
        col_num = 0
        for cell in row:
            # separate cells in a row by a tab delimiter
            if col_num != 0:
                table_data.write("\t")
            table_data.write(cell.encode(encoding='utf-8', errors='strict'))
            col_num = col_num + 1
        row_num = row_num + 1
This is the code that writes the StringIO object table_data to the Postgres database:
    cursor = db_connection.cursor()
    cursor.copy_from(table_data, <postgres_table_name>)
The problem is that you're calling encode on a str object.
A str is a byte string, usually representing text encoded in some way, like UTF-8. When you call encode on that, it first has to be decoded back to text, so that the text can be re-encoded. By default, Python does this by calling s.decode(sys.getdefaultencoding()), and getdefaultencoding() usually returns 'ascii'.
So, you're taking UTF-8 encoded text, decoding it as if it were ASCII, and then re-encoding it in UTF-8.
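To make that concrete, here is a minimal sketch of the hidden decode step. The helper implicit_encode is hypothetical and only imitates what Python 2 does behind the scenes; the codec is hard-coded to 'ascii' on the assumption that the default encoding has not been changed, and the sample bytes are made up.

```python
def implicit_encode(byte_string, encoding='utf-8'):
    """Hypothetical helper imitating Python 2's str.encode():
    decode with the default codec first, then encode."""
    # Python 2 would use sys.getdefaultencoding() here, which is usually 'ascii'.
    text = byte_string.decode('ascii')
    return text.encode(encoding)

# Made-up sample data containing byte 0x92, which is outside the ASCII range.
raw = b'it\x92s'

try:
    implicit_encode(raw)
except UnicodeDecodeError as exc:
    error_message = str(exc)
    print(error_message)
```

The failure comes from the decode step, which is why an encode call ends up reporting a decode error.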
The general solution is to explicitly call decode with the right encoding, instead of letting Python use the default, and then encode the result.
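A rough sketch of that explicit round trip might look like this. The helper name to_utf8 is hypothetical, and the source encoding cp1252 is only an assumption for illustration (in Windows-1252, byte 0x92 is the right single quotation mark); you would have to substitute whatever encoding your data is really in.

```python
def to_utf8(byte_string, source_encoding):
    """Hypothetical helper: decode with an explicitly named encoding,
    then re-encode the resulting text as UTF-8."""
    text = byte_string.decode(source_encoding)
    return text.encode('utf-8')

# Made-up sample data; cp1252 is an assumed source encoding, not a known fact
# about the data in the question.
raw = b'it\x92s'
converted = to_utf8(raw, 'cp1252')
print(repr(converted))
```

Because the encoding is named explicitly, Python never falls back to the ASCII default.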
But when the right encoding is already the one you want, the easier solution is to skip the .decode('utf-8').encode('utf-8') altogether and just use the UTF-8 str as the UTF-8 str that it already is.
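If the cells really are UTF-8 byte strings already, the writing loop can be sketched without any encode call at all. The rows below are made-up stand-ins for cursor.fetchall(), and io.BytesIO is used here in place of StringIO as an assumption so that the sketch behaves the same on Python 2.7 and 3.

```python
import io

# Made-up rows standing in for cursor.fetchall(); every cell is assumed to be
# a byte string that is already UTF-8 encoded.
rows = [
    (b'1', u'caf\u00e9'.encode('utf-8')),
    (b'2', b'plain'),
]

table_data = io.BytesIO()
# Tabs between cells, newlines between rows, bytes written through unchanged.
table_data.write(b'\n'.join(b'\t'.join(row) for row in rows))
# Rewind so that a later reader starts from the beginning of the buffer.
table_data.seek(0)

contents = table_data.getvalue()
print(repr(contents))
```

Joining with the delimiters also removes the need for the row_num and col_num counters.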
Or, alternatively, if your MySQL wrapper has a feature that lets you specify an encoding and get back unicode values for CHAR/VARCHAR/TEXT columns instead of str values (e.g., in MySQLdb, you pass use_unicode=True to the connect call, or charset='utf8' if your database is too old to auto-detect it), just do that. Then you'll have unicode objects, and you can call .encode('utf-8') on them.
In general, the best way to deal with Unicode problems is the last one: decode everything as early as possible, do all the processing in Unicode, and then encode as late as possible. But either way, you have to be consistent. Don't call str on something that might be a unicode; don't concatenate a str literal to a unicode or pass one to its replace method; etc. Any time you mix and match, Python is going to implicitly convert for you, using your default encoding, which is almost never what you want.
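As a small sketch of that decode-early, encode-late pattern: the helper clean_cell and its clean-up steps are hypothetical, and the default source encoding of UTF-8 is an assumption about the incoming data.

```python
def clean_cell(raw_bytes, source_encoding='utf-8'):
    """Hypothetical helper: decode early, process as text, encode late."""
    # 1. Decode as soon as the bytes arrive.
    text = raw_bytes.decode(source_encoding)
    # 2. Do all processing on Unicode text, using only unicode literals.
    text = text.strip().replace(u'\t', u' ')
    # 3. Encode only at the point where bytes are needed again.
    return text.encode('utf-8')

cleaned = clean_cell(b'  caf\xc3\xa9\tbar ')
print(repr(cleaned))
```

Every string touched in the middle step is Unicode text, so there is nothing for Python to convert implicitly.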
As a side note, this is one of the many things that Python 3.x's Unicode changes help with. First, str is now Unicode text, not encoded bytes. More importantly, if you have encoded bytes, e.g., in a bytes object, calling encode will give you an AttributeError instead of trying to silently decode so it can re-encode. And, similarly, trying to mix and match Unicode and bytes will give you an obvious TypeError, instead of an implicit conversion that succeeds in some cases and gives a cryptic message about an encode or decode you didn't ask for in others.
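For comparison, a short sketch of both failures, assuming it is run under Python 3 (the sample bytes are made up):

```python
# Assumes Python 3; the sample bytes are made up.
data = b'caf\xc3\xa9'

# bytes objects have no encode method, so there is no hidden decode step.
try:
    data.encode('utf-8')
except AttributeError as exc:
    encode_error = type(exc).__name__

# Mixing text and bytes fails immediately instead of converting implicitly.
try:
    'prefix: ' + data
except TypeError as exc:
    concat_error = type(exc).__name__

print(encode_error, concat_error)
```

Both mistakes surface at the line that causes them, rather than as an error about a decode you never wrote.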
python postgresql python-2.7 encoding utf