python - DjangoUnicodeDecodeError: [Bad Unicode data]
The model:
class ItemType(models.Model):
    name = models.CharField(max_length=100)

    def __unicode__(self):
        logger.debug("1. item type %s created" % self.name)
        return self.name
The code:
(...)
name = re.search(r"type:(.*?)", text)
itemtype = ItemType.objects.create(name=name.group(1), defaults={'name': name.group(1)})
logger.debug("2. item type %s created" % name.group(1))
logger.debug("4. item type %s created" % itemtype.name)
logger.debug("3. item type %s created" % itemtype)
The result is unexpected (to me, of course):
The first logger.debug
prints "item type ąęńłśóć created"
as expected, but the second raises an error:
DjangoUnicodeDecodeError: 'ascii' codec can't decode byte in position : ordinal not in range(128). You passed in <ItemType: [Bad Unicode data]> (<class 'aaa.models.ItemType'>)
Why is there an error, and how can I fix it?
(text is an HTML response in UTF-8 encoding.)
Update
I added debug logging to the model, and the debug output is:
2014-10-06 09:38:53,342 DEBUG views 2. item type ąęćńółśż created
2014-10-06 09:38:53,342 DEBUG views 4. item type ąęćńółśż created
2014-10-06 09:38:53,344 DEBUG models 1. item type ąęćńółśż created
2014-10-06 09:38:53,358 DEBUG models 1. item type ąęćńółśż created
So why can't debug 3. print it?
Update 2
The problem is here:
itemtype = ItemType.objects.create(name=name.group(1), defaults={'name': name.group(1)})
If I change it to
itemtype = ItemType.objects.create(name=name.group(1), defaults={'name': u'ĄĆĘŃŁÓŚ'})
everything is OK.
So how do I convert it to unicode? unicode(name.group(1)) doesn't work.
After 2 days of fighting my own shadow I found a solution. It isn't a workaround for this one case; it is a complete change of thinking, and I have to refactor the whole code.
My assumption: every string is unicode. If it isn't, fix it.
Do not use "%s" or "something"; use u"%s" and u"cośtam".
In every model that has a models.CharField() or other text-oriented fields, override the save() method. For example:
class ItemType(models.Model):
    name = models.CharField(max_length=100)

    def save(self, *args, **kwargs):
        # Python 2: a byte string slipped in, so decode it to unicode first
        if isinstance(self.name, str):
            self.name = self.name.decode("utf-8")
        super(ItemType, self).save(*args, **kwargs)
Explanation: if name somehow gets filled with a str instead of unicode, convert it to unicode.
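The same check can be pulled out of save() into a small helper, so it does not have to be repeated for every text field. This is only a minimal sketch, assuming the input is UTF-8 as in the question. ensure_text is a hypothetical name, not a Django API. On Python 2, bytes is an alias of str, so the isinstance test is the same one used in save() above.

```python
def ensure_text(value, encoding="utf-8"):
    """Return value as text, decoding byte strings with the given encoding.

    Hypothetical helper (not part of Django). On Python 2, bytes is str.
    Anything that is not a byte string is returned unchanged.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


raw = b"\xc5\xbc\xc3\xb3\xc5\x82\xc4\x87"   # UTF-8 encoded byte string
text = u"\u017c\xf3\u0142\u0107"            # the same word as text

assert ensure_text(raw) == text    # byte string gets decoded
assert ensure_text(text) == text   # text passes through untouched
```

Inside save() this would read self.name = ensure_text(self.name).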
How I found this:
I was wondering what type the text in a models.CharField has, and found that if you fill it with unicode it is unicode, and if you fill it with str it is str. If in one place you fill it "by hand" with unicode, and in another place a regex fills it with str, the result is unexpected.
The biggest problem with unicode and str is that both print diacritics without any problem:
>>> text_str = "żółć"
>>> text_uni = u"żółć"
>>> print text_str
żółć
>>> print text_uni
żółć
So you can't see the difference.
But if you use another command:
>>> text_str
'\xc5\xbc\xc3\xb3\xc5\x82\xc4\x87'
>>> text_uni
u'\u017c\xf3\u0142\u0107'
The difference is glaring.
If there were a setting to change the behaviour of print (and similar functions) to this:
>>> print text_str
'\xc5\xbc\xc3\xb3\xc5\x82\xc4\x87'
>>> print text_uni
żółć
everything would be much easier to debug: if you can see the diacritics, it's OK; if not, it's bad.
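A small helper gets close to that behaviour without any setting. The sketch below uses a hypothetical name, describe. It reports the type name together with repr(), so a byte string and a text string never look alike in a log line. The type names depend on the interpreter: str and unicode on Python 2, bytes and str on Python 3.

```python
def describe(value):
    """Return 'typename: repr' so byte strings and text can be told apart.

    Hypothetical debugging helper, not a standard library function.
    """
    return "%s: %r" % (type(value).__name__, value)


text_str = b"\xc5\xbc\xc3\xb3\xc5\x82\xc4\x87"   # UTF-8 encoded byte string
text_uni = u"\u017c\xf3\u0142\u0107"             # decoded text

print(describe(text_str))   # shows the escaped bytes
print(describe(text_uni))   # shows the text value

# The two descriptions always differ, even though print shows the same word.
assert describe(text_str) != describe(text_uni)
```

Logging describe(name.group(1)) next to describe(itemtype.name) would have shown straight away which of the two was a byte string.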
Using decode('utf-8') led me to the solution:
>>> text_str
'\xc5\xbc\xc3\xb3\xc5\x82\xc4\x87'
>>> text_str.decode('utf-8')
u'\u017c\xf3\u0142\u0107'
>>> text_uni
u'\u017c\xf3\u0142\u0107'
Voilà!
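This also explains why unicode(name.group(1)) did not help. Called without an encoding, unicode() on Python 2 decodes with the interpreter's default codec, which is normally ASCII, so it fails on the first non-ASCII byte. The sketch below reproduces that with an explicit "ascii" decode (an assumption standing in for the implicit default) and compares it with the UTF-8 decode. It runs on both Python 2 and Python 3.

```python
raw = b"\xc5\xbc\xc3\xb3\xc5\x82\xc4\x87"   # UTF-8 encoded byte string

# Decoding as ASCII fails: the first byte, 0xc5, is outside range(128).
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Naming the real encoding yields the expected text.
decoded = raw.decode("utf-8")

assert not ascii_ok
assert decoded == u"\u017c\xf3\u0142\u0107"
assert len(raw) == 8 and len(decoded) == 4   # 8 bytes, 4 characters
```

On Python 2, unicode(name.group(1), "utf-8") is equivalent to name.group(1).decode("utf-8").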
python django unicode python-unicode