Text - String in Python
A string is a sequence data type.
Strings in Python are:
- sequences with characters as elements
Each character in a string has a subscript or offset (its index). Numbering starts at 0 for the leftmost character and increases by one as you move character by character to the right.
The string "PYTHON" has 6 characters, numbered 0 to 5:
+---+---+---+---+---+---+
| P | Y | T | H | O | N |
+---+---+---+---+---+---+
  0   1   2   3   4   5
To get the letter P, use index 0:
letter_P = "PYTHON"[0]
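As a small illustration of the numbering described above, the sketch below indexes the same string from both ends. The variable names are illustrative only and do not come from the original text.

```python
# Illustrative names; only the string "PYTHON" comes from the text above.
word = "PYTHON"

first = word[0]       # leftmost character, offset 0
last = word[5]        # rightmost character, offset len(word) - 1
also_last = word[-1]  # negative offsets count from the right

print(first, last, also_last)  # P N N
```

Asking for an offset outside 0 to 5 (or -6 to -1), such as word[6], raises an IndexError.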
A string is created by writing it between quotation marks (' ' or " "). The escape character is the backslash (\).
my_string = "I'm a string!"
my_other_string = 'and me too'
my_string_with_comma = 'It\'s great!'
string1, string2, string3 = '', 'Trondheim', 'Hammer Dance'
with open("file.txt", "r") as fh:
    my_description = fh.read()
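str is the class that creates string objects.
In Python 2, literals prefixed with the letter "u" are unicode strings. In Python 3, every str is already a Unicode string and the u prefix is accepted but has no effect. For example: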
s = u"This is a unicode string"
print(type(s))
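Python I/O is byte based. In Python 2, reading from a file returns bytes; in Python 3 this holds for files opened in binary mode ("rb"), while text mode decodes the bytes for you.
with open("file.txt", "rb") as fh:  # binary mode, so reads return bytes
    s = fh.readline()  # bytes
    print(type(s))
    s = fh.readline().decode('utf-8')  # unicode (str in Python 3)
    print(type(s))
s = u'Hello world!'  # u prefix: unicode in Python 2, plain str in Python 3
print(type(s), repr(s))
s = 'Hello world!'  # no prefix: byte str in Python 2, Unicode str in Python 3
print(type(s), repr(s))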
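In Python 2, if you encounter an error when printing a unicode string, you can use the encode method to print the international characters properly, like this (in Python 3, print the str directly, since printing the encoded bytes shows their b'...' representation):
unicode_string = u"aaaàçççñññ"
encoded_string = unicode_string.encode('utf-8')
print(encoded_string)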
“decode” converts from bytes to unicode. “encode” converts from unicode to bytes.
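The round trip between text and bytes can be sketched as follows. This assumes Python 3, where the text type is str and the byte type is bytes; the variable names are illustrative.

```python
# Assumes Python 3: str holds Unicode text, bytes holds raw bytes.
text = "\u00f1"  # the single character ñ

encoded = text.encode("utf-8")     # str -> bytes
decoded = encoded.decode("utf-8")  # bytes -> str

print(type(encoded).__name__, len(encoded))  # bytes 2
print(decoded == text)                       # True
```

One character becomes two bytes here, which is why the length of a string and the length of its encoded form can differ.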
A string can be iterated over character by character:
string = "Nico!"
for character in string:
    print(character)
N
i
c
o
!
The in and not in operators test whether a character or a substring occurs in a string:
if 'a' in 'Nicolas':
    print('a is a letter of Nicolas')
if 'z' not in 'Nicolas':
    print('z is not a letter of Nicolas')
if 'Nico' in 'Nicolas':
    print('Nico is in Nicolas')
The + operator between two strings concatenates them.
print("gerard" + " " + "nico")
The split() string method splits a sentence into a list of words.
text = "How do you do?"
for word in text.split():
    print(word)
How
do
you
do?
split() returns a list:
>>> type(text.split())
<class 'list'>
split() can be called directly on a unicode or str object. For example, in Python 2 (Python 3 shows the same list without the u prefixes):
>>> u'split,me'.split(',')
[u'split', u'me']
- len(). Length of a string, as a number of characters.
my_string = "Nico Gerard"
print(len(my_string))
# Dot notation works only for string-specific methods (i.e. methods that don't work on anything else).
# my_string.len() is therefore not valid, because len() works on many kinds of objects.
- str(). Makes strings out of non-strings (explicit string conversion).
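A minimal sketch of explicit conversion with str(); the values and variable names are made up for illustration.

```python
# Illustrative values; str() accepts any object.
count = 42
ratio = 3.5

label = "count=" + str(count)  # "+" needs both operands to be strings
print(label)         # count=42
print(str(ratio))    # 3.5
print(str([1, 2]))   # [1, 2]
```

Without the str() call, "count=" + count raises a TypeError, because Python does not convert numbers to strings implicitly.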
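isalpha() tests whether every character is a letter: "J123".isalpha() == False. To test for letters or digits, use isalnum(): "J123".isalnum() == True.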
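The character-class tests can be compared side by side, as in this sketch. The sample strings other than "J123" are illustrative.

```python
# "J123" comes from the text above; the other samples are illustrative.
samples = ["J123", "Nico", "2024"]

# Map each sample to (isalpha, isalnum, isdigit).
results = {s: (s.isalpha(), s.isalnum(), s.isdigit()) for s in samples}

for s in samples:
    print(s, results[s])
# J123 (False, True, False)
# Nico (True, True, False)
# 2024 (False, True, True)
```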
Slicing of substring: string[i:j] gives the characters from position i up to, but not including, position j. Omitting j slices to the end of the string.
>>> 'foo'[0:2]
'fo'
>>> 'foo'[2:]
'o'
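A few more slices on a longer string may make the end-exclusive rule easier to see. This is only a sketch, and the variable names are illustrative.

```python
word = "PYTHON"

head = word[:2]       # offsets 0 and 1
tail = word[2:]       # offset 2 to the end
middle = word[1:4]    # offsets 1, 2 and 3; offset 4 is excluded
last_two = word[-2:]  # negative offsets count from the right

print(head, tail, middle, last_two)  # PY THON YTH ON
```

Because the end position is excluded, word[:n] + word[n:] always rebuilds the original string.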
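str.replace(old, new[, count]) returns a copy of the string with occurrences of old replaced by new; the optional count limits the number of replacements.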
- Replace boum by hop
- Replace the first boum by hop
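The two cases listed above might look like the following sketch. The sample sentence is made up for illustration; the optional third argument of str.replace limits how many occurrences are replaced.

```python
# The sample text is hypothetical; "boum" and "hop" come from the list above.
text = "boum boum boum"

all_replaced = text.replace("boum", "hop")       # every occurrence
first_replaced = text.replace("boum", "hop", 1)  # only the first occurrence

print(all_replaced)    # hop hop hop
print(first_replaced)  # hop boum boum
```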
strip() removes leading and trailing whitespace.
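Removing surrounding whitespace can be sketched with the standard str.strip() method and its one-sided variants lstrip() and rstrip(). The sample string is illustrative.

```python
padded = "   Nico Gerard \n"

both = padded.strip()    # leading and trailing whitespace removed
left = padded.lstrip()   # leading whitespace only
right = padded.rstrip()  # trailing whitespace only

print(repr(both))   # 'Nico Gerard'
print(repr(left))   # 'Nico Gerard \n'
print(repr(right))  # '   Nico Gerard'
```

Whitespace inside the string is left alone; only the two ends are affected.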
You can find the numeric value of a character c using ord(c).
Example of numeric values of the characters 'a', 'A' and space:
>>> ord('a')
97
>>> ord('A')
65
>>> ord(' ')
32
You can obtain the character from a numerical value using chr(i).
To see the string of characters numbered 0 through 9, you can use the following:
s = ' - '.join([chr(i) for i in range(10)])
'\x00 - \x01 - \x02 - \x03 - \x04 - \x05 - \x06 - \x07 - \x08 - \t'
To convert a string to bytes, call encode() (UTF-8 by default in Python 3):
myString = "hello"
myStringInByte = myString.encode()
Strings in Python are immutable.
From "Why are Python strings immutable?": there are several advantages.
- One is performance: knowing that a string is immutable means we can allocate space for it at creation time, and the storage requirements are fixed and unchanging. This is also one of the reasons for the distinction between tuples and lists.
- Another advantage is that strings in Python are considered as “elemental” as numbers. No amount of activity will change the value 8 to anything else, and in Python, no amount of activity will change the string “eight” to anything else.
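Immutability can be observed directly, as in this sketch: item assignment fails, and methods that appear to modify a string return a new one instead. The variable names are illustrative.

```python
s = "eight"

raised = False
try:
    s[0] = "E"  # strings do not support item assignment
except TypeError as err:
    raised = True
    print("TypeError:", err)

# Methods return a new string; the original is unchanged.
t = s.replace("e", "E", 1)
print(s, t)  # eight Eight
```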