Sunday 15 September 2013

python - DjangoUnicodeDecodeError: [Bad Unicode data]

The model:

import logging

from django.db import models

logger = logging.getLogger(__name__)


class ItemType(models.Model):
    name = models.CharField(max_length=100)

    def __unicode__(self):
        logger.debug("1. item type %s created" % self.name)
        return self.name

The code:

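(...)
# capture everything after "type:" (a trailing lazy (.*?) would match an empty string)
name = re.search(r"type:(.*)", text)
# get_or_create() accepts defaults=; create() does not
item_type, created = ItemType.objects.get_or_create(name=name.group(1), defaults={'name': name.group(1)})
logger.debug("2. item type %s created" % name.group(1))
logger.debug("4. item type %s created" % item_type.name)
logger.debug("3. item type %s created" % item_type)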

And the result is unexpected (to me, of course):

The first logger.debug calls print "item type ąęńłśóć created" as expected, but the one labelled 3 raises an error:

DjangoUnicodeDecodeError: 'ascii' codec can't decode byte in position : ordinal not in range(128). You passed in <ItemType: [Bad Unicode data]> (<class 'aaa.models.ItemType'>)

Why is there an error, and how can I fix it?

(text is an HTML response in UTF-8 encoding)
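The error text itself can be reproduced outside Django. The sketch below is Python 3, where bytes plays the role of Python 2's str and str the role of unicode; it decodes the UTF-8 bytes of "żółć" with the ASCII codec, which is the conversion Python 2 performs implicitly when a byte string has to become unicode.

```python
# "żółć" encoded as UTF-8; this is the kind of byte string a Python 2
# "..." literal or an undecoded HTTP body holds.
raw = "żółć".encode("utf-8")

error_message = None
try:
    # Decoding with ASCII mirrors the implicit conversion Python 2 performs
    # when a byte string has to become unicode.
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(error_message)
# 'ascii' codec can't decode byte 0xc5 in position 0: ordinal not in range(128)
```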

Update:

I added a debug call to the model, and the debug result is:

2014-10-06 09:38:53,342 debug views 2. item type ąęćńółśż created
2014-10-06 09:38:53,342 debug views 4. item type ąęćńółśż created
2014-10-06 09:38:53,344 debug models 1. item type ąęćńółśż created
2014-10-06 09:38:53,358 debug models 1. item type ąęćńółśż created

So why can't debug 3 print it?

Update 2: the problem is here:

item_type, created = ItemType.objects.get_or_create(name=name.group(1), defaults={'name': name.group(1)})

If I change it to

item_type, created = ItemType.objects.get_or_create(name=name.group(1), defaults={'name': u'ĄĆĘŃŁÓŚ'})

everything is OK.

So how do I convert it to unicode? unicode(name.group(1)) doesn't work.
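In Python 2, unicode(s) without an encoding argument decodes with the default ASCII codec, so it fails on the same bytes. A Python 3 sketch of the analogous trap (bytes standing in for Python 2's str): str() without an encoding does not decode bytes at all, while str() with an explicit codec does.

```python
raw = "ż".encode("utf-8")  # b'\xc5\xbc'

# Without an encoding, str() does not decode bytes; it returns their repr.
no_codec = str(raw)
# With an explicit codec, the bytes are decoded into text.
decoded = str(raw, "utf-8")

print(no_codec)  # b'\xc5\xbc'
print(decoded)   # ż
```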

After two days of fighting my own shadow, I found a solution. It isn't a workaround for this one case but a complete change of thinking, and I had to refactor the whole code.

My assumption: every string is unicode. If it isn't, fix it.

Do not use "%s" or "something"; use u"%s" and u"cośtam" (Polish for "something").
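The point about u"%s" can be illustrated in Python 3 terms, where every literal is already text and undecoded data is bytes (an assumption of this sketch, not the setup of the original Python 2 code). Formatting bytes into a text template does not raise an error there; it silently inserts their repr, which at least makes the mistake visible in a log line.

```python
label = "item type %s created"
raw_name = "żółć".encode("utf-8")

# Formatting bytes into a text template does not decode them; it inserts
# their repr, so the mistake shows up as b'...' in the output.
mixed = label % raw_name

# Decoding first gives the intended text.
clean = label % raw_name.decode("utf-8")

print(mixed)  # item type b'\xc5\xbc\xc3\xb3\xc5\x82\xc4\x87' created
print(clean)  # item type żółć created
```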

In every model that has a models.CharField() or other text-oriented fields, override the save() method.

For example:

class ItemType(models.Model):
    name = models.CharField(max_length=100)

    def save(self, *args, **kwargs):
        # Python 2: a plain str here is a UTF-8 byte string, so decode it
        if isinstance(self.name, str):
            self.name = self.name.decode("utf-8")
        super(ItemType, self).save(*args, **kwargs)

Explanation: if name has somehow been filled with a str rather than unicode, convert it to unicode before saving.
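The same normalisation can be sketched without Django. ItemTypeSketch and to_text below are hypothetical names for illustration, and the code is Python 3 (bytes in place of Python 2's str); it only shows the idea of decoding byte strings before saving, not how Django itself stores the field.

```python
def to_text(value, encoding="utf-8"):
    """Return value as text, decoding bytes with the given encoding."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


class ItemTypeSketch:
    """Hypothetical stand-in for the Django model; only save() is modelled."""

    def __init__(self, name):
        self.name = name
        self.saved = False

    def save(self):
        # Same idea as the save() override: normalise bytes to text first.
        self.name = to_text(self.name)
        self.saved = True


from_regex = ItemTypeSketch("żółć".encode("utf-8"))  # filled with bytes
by_hand = ItemTypeSketch("żółć")                     # filled with text
from_regex.save()
by_hand.save()
print(from_regex.name == by_hand.name)  # True
```

After save() both instances hold text, so the "by hand" and "from regex" paths can no longer diverge.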

How I found this:

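I was wondering what type the text in a models.CharField has, and found that if you fill it with unicode it is unicode, and if you fill it with a str it is a str. If in one place you fill it "by hand" with unicode, and in another place a regex fills it with a str, the result is unexpected. That is what breaks debug 3: __unicode__ returns the byte str, and Python 2 has to decode it to unicode with the default ASCII codec, which fails on the UTF-8 bytes.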
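One way a regex ends up filling a field with a byte string is searching undecoded data. In Python 3 (used here as an illustration, with bytes in place of Python 2's str), a match on bytes returns bytes and a match on text returns text; the "type:żółć" input is made up for the example.

```python
import re

html_bytes = "type:żółć".encode("utf-8")  # an undecoded response body
html_text = html_bytes.decode("utf-8")    # the same body, decoded first

# Python 3 requires a bytes pattern for bytes input and a str pattern for str.
from_bytes = re.search(rb"type:(.*)", html_bytes).group(1)
from_text = re.search(r"type:(.*)", html_text).group(1)

print(type(from_bytes).__name__, from_bytes)  # bytes b'\xc5\xbc\xc3\xb3\xc5\x82\xc4\x87'
print(type(from_text).__name__, from_text)    # str żółć
```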

The biggest problem with unicode and str is that diacritics print without any problem in both:

>>> text_str = "żółć"
>>> text_uni = u"żółć"
>>> print text_str
żółć
>>> print text_uni
żółć

so you can't see the difference.

But if you use another command, evaluating the variable so the interpreter shows its repr:

>>> text_str
'\xc5\xbc\xc3\xb3\xc5\x82\xc4\x87'
>>> text_uni
u'\u017c\xf3\u0142\u0107'

the difference is glaring.
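Python 3 keeps the same distinction visible through repr() (a sketch, with bytes in place of Python 2's str); ascii() additionally produces the escaped form that Python 2 showed for unicode values.

```python
text_uni = "żółć"
text_str = text_uni.encode("utf-8")

print(repr(text_str))   # b'\xc5\xbc\xc3\xb3\xc5\x82\xc4\x87'
print(repr(text_uni))   # 'żółć'
print(ascii(text_uni))  # '\u017c\xf3\u0142\u0107'
```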

If there were a setting to change the behaviour of print (and similar functions) to this:

>>> print text_str
'\xc5\xbc\xc3\xb3\xc5\x82\xc4\x87'
>>> print text_uni
żółć

everything would be much easier to debug: if you can see the diacritics, it's OK; if not, it's bad.
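Rather than relying on a print setting, a small helper can make the type explicit in debug output. debug_show below is a hypothetical name, written for Python 3 (bytes in place of Python 2's str): it prefixes the type name and shows the repr, so byte strings stand out in a log line.

```python
def debug_show(value):
    """Hypothetical helper: type name plus repr, so bytes stand out."""
    return "%s: %r" % (type(value).__name__, value)


text_uni = "żółć"
text_str = text_uni.encode("utf-8")

print(debug_show(text_str))  # bytes: b'\xc5\xbc\xc3\xb3\xc5\x82\xc4\x87'
print(debug_show(text_uni))  # str: 'żółć'
```

In Python 3, print(text_str) on its own already shows b'...', so the difference is visible by default; the helper mainly adds the type name.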

Using decode('utf-8'), which states the encoding explicitly instead of relying on the default ASCII codec, leads me to the solution:

>>> text_str
'\xc5\xbc\xc3\xb3\xc5\x82\xc4\x87'
>>> text_str.decode('utf-8')
u'\u017c\xf3\u0142\u0107'
>>> text_uni
u'\u017c\xf3\u0142\u0107'

Voilà!
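The decode step can be checked as a round trip. A Python 3 sketch (bytes in place of Python 2's str), where the byte literal is the UTF-8 form of "żółć" from the session above:

```python
text_uni = "żółć"
text_str = b"\xc5\xbc\xc3\xb3\xc5\x82\xc4\x87"  # UTF-8 bytes from the session above

decoded = text_str.decode("utf-8")   # bytes -> text
reencoded = decoded.encode("utf-8")  # text -> bytes

print(decoded == text_uni)    # True
print(reencoded == text_str)  # True
```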

python django unicode python-unicode
