Default
import sys
print sys.getdefaultencoding()
Cp1252
In PyDev, you can change it in the Run Configuration, and you then get:
UTF-8
How to
get the console encoding
stdout:
import sys
print sys.stdout.encoding
Cp1252
get the system file encoding
print sys.getfilesystemencoding()
mbcs
mbcs stands for multi-byte character set. On Windows, it designates the current ANSI code page.
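The three lookups above can be gathered in one place. The following is only a sketch: the function name report_encodings is hypothetical, and the locale entry is an addition that the text above does not cover. It is written to run on both Python 2.7 and Python 3, and the values it prints depend on the platform and on how the interpreter was started.

```python
from __future__ import print_function

import locale
import sys


def report_encodings():
    """Collect the encodings Python reports, keyed by a short label.

    'report_encodings' is a hypothetical helper name used for illustration.
    """
    return {
        "default": sys.getdefaultencoding(),
        "filesystem": sys.getfilesystemencoding(),
        # stdout may be replaced by an object without an encoding attribute
        "stdout": getattr(sys.stdout, "encoding", None),
        # False: query the preferred encoding without changing the locale
        "locale": locale.getpreferredencoding(False),
    }


for label, name in sorted(report_encodings().items()):
    print("%-10s %s" % (label, name))
```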
get rid of the BOM
s = u"This is an unicode string".encode('utf-8-sig')
print s # You will see the BOM
print s.decode('utf-8-sig')
This is an unicode string
This is an unicode string
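When the data is kept as bytes, the BOM can also be removed without decoding. This is a minimal sketch, assuming the input is UTF-8; strip_utf8_bom is a hypothetical helper name, while codecs.BOM_UTF8 is the standard library constant for the three BOM bytes. It also shows that decoding with plain utf-8 keeps the BOM as the character \ufeff, whereas utf-8-sig drops it.

```python
from __future__ import print_function

import codecs


def strip_utf8_bom(data):
    """Return the bytes without a leading UTF-8 BOM, if there is one.

    'strip_utf8_bom' is a hypothetical helper name used for illustration.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


raw = u"This is a unicode string".encode("utf-8-sig")

# The first three bytes are the BOM
print(repr(raw[:3]))

# Plain utf-8 keeps the BOM as the first character, utf-8-sig removes it
print(repr(raw.decode("utf-8")[0]))
print(raw.decode("utf-8-sig"))

# Same result when the BOM is stripped at the byte level
print(strip_utf8_bom(raw).decode("utf-8"))
```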
Environment variable
set PYTHONIOENCODING=UTF-8
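The effect of the variable can be checked by starting a child interpreter with PYTHONIOENCODING set and asking it for its sys.stdout.encoding. This is a sketch under assumptions: child_stdout_encoding is a hypothetical helper name, and the exact spelling of the reported name (UTF-8 or utf-8) may differ between Python versions.

```python
from __future__ import print_function

import os
import subprocess
import sys


def child_stdout_encoding(io_encoding):
    """Run a child interpreter with PYTHONIOENCODING set and
    return the sys.stdout.encoding it reports.

    'child_stdout_encoding' is a hypothetical helper name.
    """
    env = dict(os.environ)              # keep the rest of the environment
    env["PYTHONIOENCODING"] = io_encoding
    out = subprocess.check_output(
        [sys.executable, "-c",
         "import sys; sys.stdout.write(str(sys.stdout.encoding))"],
        env=env,
    )
    return out.decode("ascii").strip()


print(child_stdout_encoding("UTF-8"))
```

The variable is read when the interpreter starts, so it has to be set before Python is launched.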
Support
'charmap' codec can't encode character u'\ufeff'
UnicodeEncodeError: 'charmap' codec can't encode character u'\ufeff' in position 0: character maps to <undefined>
The character \ufeff is the byte order mark (BOM).
UnicodeEncodeError: 'charmap' codec can't encode character
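A UnicodeEncodeError is raised when a unicode string is encoded into an encoding that cannot represent one of its characters.
When you print a unicode string, Python first encodes it with the encoding of the output (sys.stdout.encoding, Cp1252 in the example above). On a Windows platform:
print u"\u20AC"
is therefore equivalent to: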
print u"\u20AC".encode('Cp1252')
€
U+20AC is the euro sign, which is part of code page (cp) 1252.
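To see what the encoding step produces, the bytes can be inspected instead of printed. A small sketch (the variable names are illustrative): in Cp1252 the euro sign is the single byte 0x80, whereas UTF-8 needs three bytes for the same character.

```python
from __future__ import print_function

euro = u"\u20AC"

as_cp1252 = euro.encode("cp1252")   # one byte: 0x80
as_utf8 = euro.encode("utf-8")      # three bytes: 0xE2 0x82 0xAC

print(len(as_cp1252), len(as_utf8))

# Decoding with the same codec gives back the original character
assert as_cp1252.decode("cp1252") == euro
assert as_utf8.decode("utf-8") == euro
```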
A codec maps only a limited number of Unicode characters to bytes (str). A character that is not in this mapping makes the codec's encode() fail, because the character set does not support every character.
For instance, the White heart suit (U+2661) is not present in the Cp1252 character set.
If you then try to print it, you will get a UnicodeEncodeError.
print u"\u2661"
Traceback (most recent call last):
File "D:\workspace\PythonWorkpsace\mypackage\Test.py", line 1, in <module>
print u"\u2661"
File "C:\Python27\lib\encodings\cp1252.py", line 12, in encode
return codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u2661' in position 0: character maps to <undefined>
To resolve this problem, you can:
- encode it with a character set that supports it.
print u"\u2661".encode('utf-8')
- use the replace option of the encode function. It replaces a character that cannot be encoded with a ?
print u"\u2661".encode(sys.getdefaultencoding(), 'replace')
?
- change the default encoding of Python
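The first two options in the list above can be compared side by side. This is a sketch, with Cp1252 assumed as the target encoding and encode_with as a hypothetical helper name. Besides replace, the standard error handlers ignore, xmlcharrefreplace and backslashreplace are shown, while the default handler (strict) raises the UnicodeEncodeError.

```python
from __future__ import print_function

heart = u"\u2661"  # WHITE HEART SUIT, not present in Cp1252


def encode_with(text, errors, encoding="cp1252"):
    """Encode text with the given error handler.

    'encode_with' is a hypothetical helper; cp1252 is an assumed default.
    """
    return text.encode(encoding, errors)


# Option 1: an encoding that supports the character
print(repr(heart.encode("utf-8")))

# Option 2: keep Cp1252 but choose what happens to unsupported characters
for handler in ("replace", "ignore", "xmlcharrefreplace", "backslashreplace"):
    print(handler, repr(encode_with(heart, handler)))

# The default handler, strict, raises the error
try:
    encode_with(heart, "strict")
except UnicodeEncodeError as exc:
    print("strict:", exc)
```

The replace and ignore handlers lose information. The other two handlers keep the code point in an escaped form that can be recovered later.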