Python - Encoding and Unicode



import sys
print sys.getdefaultencoding()

In PyDev, you can change it in the Run Configuration.

(Screenshot: Pydev Default Encoding)


How to

get the console encoding


import sys
print sys.stdout.encoding

get the system file encoding

print sys.getfilesystemencoding()
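
The three lookups shown so far can be gathered in one place. The following is only a sketch, written so that it runs on both Python 2.7 and Python 3; the helper name report_encodings is hypothetical and not part of any library.

```python
import sys

def report_encodings():
    """Return the encodings that Python reports for this process."""
    return {
        "default": sys.getdefaultencoding(),
        # Can be None when the output is redirected (notably on Python 2).
        "console": getattr(sys.stdout, "encoding", None),
        "filesystem": sys.getfilesystemencoding(),
    }

for name, value in sorted(report_encodings().items()):
    print("%s -> %s" % (name, value))
```

The values depend on the platform, the terminal and the Python version, so no particular output should be expected.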


get rid of the BOM

s = u"This is an unicode string".encode('utf-8-sig')
print s # You will see the BOM
print s.decode('utf-8-sig')
This is an unicode string
This is an unicode string
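
To see what the 'utf-8-sig' codec does at the byte level, the same round trip can be checked without printing to a console. This is a minimal sketch that should behave the same on Python 2.7 and Python 3; the variable names are illustrative.

```python
import codecs

text = u"This is a unicode string"
data = text.encode("utf-8-sig")        # bytes prefixed with the UTF-8 BOM

# The encoded bytes start with EF BB BF, exposed as codecs.BOM_UTF8.
has_bom = data.startswith(codecs.BOM_UTF8)

kept = data.decode("utf-8")            # plain utf-8 keeps the BOM as u"\ufeff"
stripped = data.decode("utf-8-sig")    # utf-8-sig drops the BOM

print(has_bom)
print(kept[0] == u"\ufeff")
print(stripped == text)
```

Decoding with plain 'utf-8' leaves the character \ufeff at the start of the string, which is what later triggers the 'charmap' error when that string is encoded for output.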

Environment variable



'charmap' codec can't encode character u'\ufeff'

UnicodeEncodeError: 'charmap' codec can't encode character u'\ufeff' in position 0: character maps to <undefined>

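The character \ufeff is the byte order mark (BOM). It usually ends up at the start of a string when BOM-prefixed data is decoded without removing it.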
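
One way to avoid the error is to remove a leading \ufeff before the string is encoded for output. The sketch below assumes the output encoding is cp1252 (the error message only says 'charmap'); the helper strip_bom and the sample line are hypothetical.

```python
def strip_bom(text):
    """Return text without a leading U+FEFF, if there is one."""
    return text[1:] if text.startswith(u"\ufeff") else text

# Hypothetical first line of a file that was decoded without dropping the BOM.
line = u"\ufeffcolumn1;column2"

try:
    line.encode("cp1252")          # assumption: cp1252 is the output encoding
    failed = False
except UnicodeEncodeError:
    failed = True                  # cp1252 has no mapping for U+FEFF

clean = strip_bom(line).encode("cp1252")
print(failed)
print(clean == b"column1;column2")
```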

UnicodeEncodeError: 'charmap' codec can't encode character

A UnicodeEncodeError is raised when a unicode string is encoded into an encoding that cannot represent one of its characters.

When you print a unicode string, Python encodes it with the encoding of the output stream (sys.stdout.encoding), so:

print u"\u20AC"

is equivalent, on a Windows platform where that encoding is Cp1252, to:

print u"\u20AC".encode('Cp1252')

U+20AC is the euro sign, which is present in code page (cp) 1252, where it is stored as the single byte 0x80.
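
The mapping can be checked by encoding the euro sign by hand instead of printing it. A short sketch, runnable on Python 2.7 and Python 3:

```python
euro = u"\u20AC"

as_cp1252 = euro.encode("cp1252")   # one byte in code page 1252
as_utf8 = euro.encode("utf-8")      # three bytes in UTF-8

print(repr(as_cp1252))
print(repr(as_utf8))
print(as_cp1252.decode("cp1252") == euro)
```

The same character produces different bytes in each encoding, which is why the bytes must be decoded with the encoding that produced them.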

An encoding such as Cp1252 maps only a limited number of unicode characters to byte (str) values. A character that has no mapping causes that encoding's encode() to fail, because the character set does not support every character.

For instance, the White heart suit (U+2661) is not present in the Cp1252 character set.

If you then try to print it, you will get a UnicodeEncodeError.

print u"\u2661"
Traceback (most recent call last):
  File "D:\workspace\PythonWorkpsace\mypackage\", line 1, in <module>
    print u"\u2661"
  File "C:\Python27\lib\encodings\", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u2661' in position 0: character maps to <undefined>
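
The failure can be reproduced without a console by calling encode() directly and catching the exception. A minimal sketch; the helper name can_encode is hypothetical, and cp1252 is used because it is the code page discussed above.

```python
def can_encode(text, encoding):
    """Return True if every character of text exists in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

print(can_encode(u"\u20AC", "cp1252"))   # euro sign
print(can_encode(u"\u2661", "cp1252"))   # white heart suit
print(can_encode(u"\u2661", "utf-8"))
```

A check like this can be made before printing, to decide whether a fallback is needed.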

To resolve this problem, you can:

  • encode it with a character set that supports it.
print u"\u2661".encode('utf-8')
  • use the 'replace' error handler of the encode function. It replaces each character that cannot be encoded with a ?
print u"\u2661".encode(sys.getdefaultencoding(), 'replace')
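
Besides 'replace', encode() accepts other standard error handlers. The sketch below compares a few of them on a sample string, with cp1252 as the assumed target encoding; it runs on Python 2.7 and Python 3.

```python
heart = u"I \u2661 Python"

replaced = heart.encode("cp1252", "replace")           # unknown char becomes ?
ignored = heart.encode("cp1252", "ignore")             # unknown char is dropped
escaped = heart.encode("cp1252", "backslashreplace")   # Python escape sequence
xml = heart.encode("cp1252", "xmlcharrefreplace")      # numeric XML reference

for result in (replaced, ignored, escaped, xml):
    print(repr(result))
```

'replace' and 'ignore' lose information, while 'backslashreplace' and 'xmlcharrefreplace' keep the code point in a form that can be recovered later.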
