Python - String (str type)

About

Text - String in Python

A string is a sequence data type: an ordered sequence of characters.

Strings in Python are immutable (see Properties).

Each character in a string has an index (also called a subscript or offset). Indexing starts at 0 for the leftmost character and increases by one for each character to the right.

The string PYTHON has 6 characters, numbered 0 to 5:

+---+---+---+---+---+---+
| P | Y | T | H | O | N |
+---+---+---+---+---+---+
  0   1   2   3   4   5

To get the letter P, index the string with 0:

letter_P = "PYTHON"[0]
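As a minimal sketch of the numbering above, the first and last characters can be read by index, and an index past the end raises an error. The variable names are illustrative only.

```python
# Positions run from 0 to len(word) - 1.
word = "PYTHON"

first_letter = word[0]              # 'P'
last_letter = word[len(word) - 1]   # 'N'

# Asking for a position past the end raises IndexError.
try:
    word[6]
    out_of_range = False
except IndexError:
    out_of_range = True

print(first_letter, last_letter, out_of_range)
```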

Initialization

Code

A string literal is created by writing text between single (' ') or double (" ") quotation marks. The escape character is the backslash (\).

my_string = "I'm a string!"
my_other_string = 'and me too'
my_string_with_apostrophe = 'It\'s great!'

string1, string2, string3 = '', 'Trondheim', 'Hammer Dance'

File

with open("file.txt", "r") as fh:
    my_description = fh.read()

str

str is the class that creates string objects.

Character Set

Unicode

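In Python 2, string literals prefixed with the letter u are unicode strings. In Python 3, every str is already a Unicode string and the u prefix is accepted but has no effect. For example, in Python 2:

s = u"This is a unicode string"
print type(s)
<type 'unicode'>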

Python 2 I/O is byte based: a line read from a file is a byte string until it is decoded.

s = file.readline() # bytes
print type(s)
s = file.readline().decode('utf-8') # unicode
print type(s)

s = u'Hello world!'
print type(s), repr(s)
s = 'Hello world!'
print type(s), repr(s)

If you encounter an error involving printing unicode, you can use the encode method to properly print the international characters, like this:

unicode_string = u"aaaàçççñññ"
encoded_string = unicode_string.encode('utf-8')
print encoded_string

“decode” converts from bytes to unicode. “encode” converts from unicode to bytes.
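In Python 3 the same round trip can be sketched as follows. Here str is the Unicode type and bytes is the byte type; the variable names are illustrative.

```python
# Python 3: str holds Unicode text, bytes holds raw bytes.
text = "aaaàçççñññ"

encoded = text.encode("utf-8")      # str -> bytes
decoded = encoded.decode("utf-8")   # bytes -> str

print(type(encoded).__name__)  # bytes
print(type(decoded).__name__)  # str
print(decoded == text)         # True: the round trip is lossless
```

The accented characters take more than one byte each in UTF-8, so the encoded value is longer than the original string.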

Loop

For

string = "Nico!"

for character in string:
    print(character)
N
i
c
o
!

Operator

in

if 'a' in 'Nicolas':
    print('a is a letter of Nicolas')
if 'z' not in 'Nicolas':
    print('z is not a letter of Nicolas')
if 'Nico' in 'Nicolas':
    print('Nico is in Nicolas')

+ (Concat)

The + operator between two strings concatenates them.

print("gerard" + " " + "nico")

Function

Split

3/library/stdtypes.html

The split method splits a string into a list of words. Without arguments, it splits on whitespace.

text = "How do you do?"

for word in text.split():
    print(word)
How
do
you
do?

Split returns a list data type

>>> type(text.split())
<class 'list'>

split() can be called directly on a str object (or a unicode object in Python 2). For example, in Python 2:

>>> u'split,me'.split(',')
[u'split', u'me']
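A short Python 3 sketch of the most common ways to call split: with no argument, with an explicit separator, and with a limit on the number of splits. The sample strings are made up for illustration.

```python
# No argument: split on runs of whitespace.
words = "How do  you do?".split()
print(words)   # ['How', 'do', 'you', 'do?']

# Explicit separator: empty fields are kept.
fields = "a,b,,c".split(",")
print(fields)  # ['a', 'b', '', 'c']

# Second argument (maxsplit): stop after that many splits.
pair = "key=value=more".split("=", 1)
print(pair)    # ['key', 'value=more']
```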

Length

my_string = "Nico Gerard"
print(len(my_string))
# len() is a built-in function, not a string method: it works on many kinds of objects.
# Dot notation is reserved for string-specific methods, so my_string.len() does not exist.

Lower

s.lower()

Upper

s.upper()
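A minimal sketch of lower() and upper() on a sample value. Both return a new string and leave the original untouched; a typical use is case-insensitive comparison.

```python
s = "Nico Gerard"

lowered = s.lower()   # 'nico gerard'
uppered = s.upper()   # 'NICO GERARD'

# Compare two strings while ignoring case.
same_name = "NICO".lower() == "nico".lower()

print(lowered, uppered, same_name)
print(s)  # the original string is unchanged
```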

(Cast|Str)

str() makes strings out of non-strings. This is an explicit string conversion.

str(2)
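The explicit conversion matters most when concatenating, because + does not convert a number to a string automatically. A small sketch, with illustrative names:

```python
count = 2

# Convert explicitly before concatenating.
message = "I have " + str(count) + " strings"
print(message)  # I have 2 strings

# Mixing str and int with + raises TypeError.
try:
    "I have " + count
    mixed_concat_failed = False
except TypeError:
    mixed_concat_failed = True
print(mixed_concat_failed)  # True
```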

Isalphanumeric

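isalpha() returns True only if every character is a letter; isalnum() also accepts digits:

"J123".isalpha() == False
"J123".isalnum() == True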

Slicing

Slicing of substring: string[i:j] gives the characters from position i up to, but not including, position j.

>>> 'foo'[0:2]
'fo'
>>> 'foo'[2:]
'o'
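The slicing rules can be sketched on a longer word. Omitting a bound means "from the start" or "to the end", and a bound past the end of the string is clamped rather than raising an error.

```python
word = "PYTHON"

head = word[0:2]      # 'PY'   (positions 0 and 1)
tail = word[2:]       # 'THON' (position 2 to the end)
also_head = word[:2]  # 'PY'   (start to position 1)
clamped = word[2:100] # 'THON' (the upper bound is clamped)

# The two halves always rebuild the original string.
rebuilt = word[:2] + word[2:]

print(head, tail, also_head, clamped, rebuilt)
```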

Replace

Syntax:

str.replace(old, new[, count])

The optional count argument limits the number of replacements, starting from the left.

Example:

>>> 'Youplaboum'.replace('boum', 'hop')
'Youplahop'

>>> 'Youplaboumboum'.replace('boum', 'hop', 1)
'Youplahopboum'
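Since strings are immutable, replace returns a new string and the original keeps its value. A short sketch using the same sample word:

```python
original = "Youplaboumboum"

all_replaced = original.replace("boum", "hop")    # every occurrence
first_only = original.replace("boum", "hop", 1)   # only the first one

print(all_replaced)  # Youplahophop
print(first_only)    # Youplahopboum
print(original)      # Youplaboumboum (unchanged)
```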

Strip

strip() removes leading and trailing whitespace (spaces, tabs and newlines).

s.strip()
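A minimal sketch of strip() and its one-sided variants lstrip() and rstrip(). Passing a string argument strips those characters instead of whitespace. The sample values are made up.

```python
raw = "  hello world \n"

both = raw.strip()     # 'hello world'
left = raw.lstrip()    # 'hello world \n'
right = raw.rstrip()   # '  hello world'

# Strip a specific character instead of whitespace.
unwrapped = "xxhixx".strip("x")   # 'hi'

print(repr(both), repr(left), repr(right), repr(unwrapped))
```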

Encoding

When a string is encoded to bytes, Python 3 uses UTF-8 by default, a variable-length encoding scheme.

Each character is represented by one to four bytes.
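The variable length can be observed by encoding single characters and counting the resulting bytes. This Python 3 sketch uses four sample characters written as escapes so the code points are unambiguous.

```python
samples = {
    "a": "a",                 # U+0061, ASCII letter
    "e-acute": "\u00e9",      # U+00E9
    "euro": "\u20ac",         # U+20AC
    "emoji": "\U0001F600",    # U+1F600
}

# Number of UTF-8 bytes used by each single character.
byte_counts = {name: len(ch.encode("utf-8")) for name, ch in samples.items()}

print(byte_counts)  # {'a': 1, 'e-acute': 2, 'euro': 3, 'emoji': 4}
```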

Ord

You can find the numeric value of a character c using ord(c).

Example of numeric values of the characters 'a', 'A' and space:

>>> ord('a')
97
>>> ord('A')
65
>>> ord(' ')
32

Chr

You can obtain the character from a numerical value using chr(i).

To see the string of characters numbered 0 through 9, you can use the following:

s = ' - '.join([chr(i) for i in range(10)])
'\x00 - \x01 - \x02 - \x03 - \x04 - \x05 - \x06 - \x07 - \x08 - \t'
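ord and chr are inverses of each other, which the following sketch illustrates by rebuilding the lowercase alphabet from the numeric value of 'a'.

```python
# chr undoes ord, and ord undoes chr.
round_trip = chr(ord("a"))    # 'a'
code_point = ord(chr(65))     # 65

# Build the lowercase alphabet from consecutive numeric values.
alphabet = "".join(chr(ord("a") + i) for i in range(26))

print(round_trip, code_point, alphabet)
```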

to

toInt

Python - Integer

int('24')
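A short sketch of int() on strings: surrounding whitespace is accepted, an optional second argument gives the base, and text that is not a valid integer raises ValueError.

```python
number = int("24")         # 24
padded = int(" 24 ")       # 24, surrounding whitespace is ignored
from_hex = int("ff", 16)   # 255, parsed in base 16

# Text that is not an integer raises ValueError.
try:
    int("abc")
    conversion_failed = False
except ValueError:
    conversion_failed = True

print(number, padded, from_hex, conversion_failed)
```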

toByte

Python - Byte

myString = "hello"
myStringInByte = myString.encode()

Properties

immutable

Strings in Python are immutable.

See "Why are Python strings immutable?" in the Python FAQ: there are several advantages.
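In practice, immutability means that assigning to a position fails, and a "modified" string is always a new object built from pieces of the old one. A minimal sketch:

```python
s = "PYTHON"

# Item assignment is not supported on str.
try:
    s[0] = "J"
    assignment_failed = False
except TypeError:
    assignment_failed = True

# Build a new string instead; the original is left as it was.
changed = "J" + s[1:]

print(assignment_failed, changed, s)  # True JYTHON PYTHON
```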
