Text - String in Python
A string is a sequence data type.
Strings in Python are immutable sequences of characters.
Each character in a string has an index (also called a subscript or offset). Indexing starts at 0 for the leftmost character and increases by one for each character to the right.
The string PYTHON has 6 characters, indexed 0 to 5:
+---+---+---+---+---+---+
| P | Y | T | H | O | N |
+---+---+---+---+---+---+
  0   1   2   3   4   5
To get the letter P:
letter_P = "PYTHON"[0]
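As a minimal sketch of the indexing described above (the names word, indexed, first and last are illustrative, not part of the original), enumerate pairs each character with its index:

```python
# Pair every character of the string with its zero-based index.
word = "PYTHON"
indexed = list(enumerate(word))

for index, character in indexed:
    print(index, character)

# The first and last characters sit at index 0 and len(word) - 1.
first = word[0]
last = word[len(word) - 1]
```

Asking for an index equal to or greater than len(word) raises an IndexError.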
A string is created by enclosing text in single (' ') or double (" ") quotation marks. The escape character is the backslash (\).
my_string = "I'm a string!"
my_other_string = 'and me too'
my_string_with_apostrophe = 'It\'s great!'
string1, string2, string3 = '', 'Trondheim', 'Hammer Dance'
with open("file.txt", "r") as fh:
    my_description = fh.read()
str is the class that creates string objects.
In Python 2, literals prefixed with the letter “u” are unicode strings (in Python 3 every str is already Unicode and the prefix is optional). For example, in Python 2:
s = u"This is a unicode string"
print type(s)
<type 'unicode'>
In Python 2, file I/O is byte-based: reading returns bytes, which must be decoded to obtain unicode.
s = file.readline() # bytes
print type(s)
s = file.readline().decode('utf-8') # unicode
print type(s)
s = u'Hello world!'
print type(s), repr(s)
s = 'Hello world!'
print type(s), repr(s)
If you encounter an error involving printing unicode, you can use the encode method to properly print the international characters, like this:
unicode_string = u"aaaàçççñññ"
encoded_string = unicode_string.encode('utf-8')
print encoded_string
“decode” converts from bytes to unicode. “encode” converts from unicode to bytes.
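The examples above use Python 2 syntax. As a hedged sketch of the same round trip in Python 3, where str holds Unicode text and bytes holds encoded data (the variable names are illustrative):

```python
# In Python 3, str is Unicode text and bytes is encoded data.
text = "aaaàçççñññ"

# encode: str -> bytes
encoded = text.encode("utf-8")

# decode: bytes -> str
decoded = encoded.decode("utf-8")

print(type(text).__name__, type(encoded).__name__)
print(decoded == text)
```

The encoded form is longer than the text because each accented character takes more than one byte in UTF-8.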
string = "Nico!"
for character in string:
    print(character)
N
i
c
o
!
if 'a' in 'Nicolas':
    print('a is a letter of Nicolas')
if 'z' not in 'Nicolas':
    print('z is not a letter of Nicolas')
if 'Nico' in 'Nicolas':
    print('Nico is in Nicolas')
The + operator between two strings concatenates them:
print("gerard" + " " + "nico")
The split() string method splits a sentence into a list of words.
text = "How do you do?"
for word in text.split():
    print(word)
How
do
you
do?
split() returns a list:
>>> type(text.split())
<class 'list'>
split() can be called directly on a unicode or str object. For example, in Python 2:
>>> u'split,me'.split(',')
[u'split', u'me']
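A small sketch of how the separator argument changes the result (the sample strings are illustrative). Without an argument, split() splits on runs of whitespace; with an explicit separator, consecutive separators yield empty strings:

```python
# Default: split on any run of whitespace, ignoring leading/trailing spaces.
words = "  How do   you do?  ".split()

# Explicit separator: every occurrence splits, so empty fields are kept.
fields = "a,b,,c".split(",")

# join() reverses the split for a fixed separator.
rejoined = ",".join(fields)

print(words)
print(fields)
print(rejoined)
```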
my_string = "Nico Gerard"
print(len(my_string))
# Dot notation is used only for string-specific methods (i.e. methods that do not apply to other types).
# There is no my_string.len(), because len() is a built-in function that works on many kinds of objects.
s.lower()
s.upper()
str() makes strings out of non-strings (explicit string conversion).
str(2)
isalpha() tests whether every character is alphabetic: "J123".isalpha() == False (the digits are not letters).
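For comparison, a short sketch of the related character tests on the same value (the variable names are illustrative). isalnum() is the test that accepts both letters and digits:

```python
code = "J123"

only_letters = code.isalpha()        # False: the digits are not alphabetic
letters_or_digits = code.isalnum()   # True: every character is a letter or a digit
only_digits = code.isdigit()         # False: "J" is not a digit

print(only_letters, letters_or_digits, only_digits)
```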
Slicing of substring: string[i:j] gives the characters from position i up to, but not including, position j.
>>> 'foo'[0:2]
'fo'
>>> 'foo'[2:]
'o'
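A minimal sketch of the same rule on a longer string (the variable names are illustrative). Because the end position is excluded, the two halves of a split at any index recombine to the original:

```python
word = "PYTHON"

head = word[0:2]   # characters at indexes 0 and 1
tail = word[2:]    # from index 2 to the end
whole = word[:]    # omitting both bounds gives the whole string

# Splitting at index 3 and concatenating restores the original string.
recombined = word[:3] + word[3:]

print(head, tail, whole, recombined)
```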
Syntax:
str.replace(old, new[, count])
Example:
>>> 'Youplaboum'.replace('boum', 'hop')
'Youplahop'
>>> 'Youplaboumboum'.replace('boum', 'hop', 1)
'Youplahopboum'
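As a hedged illustration (the variable names are illustrative), replace() returns a new string and leaves the original untouched, and the optional third argument limits how many occurrences are replaced:

```python
original = "Youplaboumboum"

# Without a limit, every occurrence is replaced.
all_replaced = original.replace("boum", "hop")

# With a limit of 1, only the first occurrence is replaced.
first_only = original.replace("boum", "hop", 1)

print(original, all_replaced, first_only)
```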
strip() removes leading and trailing whitespace.
s.strip()
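A brief sketch with an illustrative input, showing strip() next to its one-sided variants lstrip() and rstrip(). Whitespace here includes newlines and tabs, not only spaces:

```python
raw = "  hello world \n"

both_sides = raw.strip()    # "hello world"
left_only = raw.lstrip()    # "hello world \n"
right_only = raw.rstrip()   # "  hello world"

print(repr(both_sides), repr(left_only), repr(right_only))
```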
Characters can be encoded using UTF-8, a variable-length encoding scheme.
Each character is represented by one to four bytes.
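To make the variable length concrete, here is a small sketch that counts the UTF-8 bytes of four sample characters (the samples and dictionary keys are chosen for illustration only):

```python
# Characters written with escapes so the code points are unambiguous.
samples = {
    "a": "a",                 # U+0061, ASCII letter
    "e-acute": "\u00e9",      # U+00E9
    "euro": "\u20ac",         # U+20AC
    "emoji": "\U0001F600",    # U+1F600
}

# Encode each character and count the resulting bytes.
byte_lengths = {name: len(ch.encode("utf-8")) for name, ch in samples.items()}
print(byte_lengths)
```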
You can find the numeric value of a character c using ord(c).
Example of numeric values of the characters 'a', 'A' and space:
>>> ord('a')
97
>>> ord('A')
65
>>> ord(' ')
32
You can obtain the character from a numerical value using chr(i).
To see the string of characters numbered 0 through 9, you can use the following:
s = ' - '.join([chr(i) for i in range(10)])
'\x00 - \x01 - \x02 - \x03 - \x04 - \x05 - \x06 - \x07 - \x08 - \t'
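ord() and chr() are inverses of each other. A minimal sketch of the round trip, using the three characters from the example above (the variable names are illustrative):

```python
letters = "aA "

# Character -> numeric value
code_points = [ord(c) for c in letters]

# Numeric value -> character
restored = "".join(chr(n) for n in code_points)

print(code_points, repr(restored))
```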
int('24')
myString = "hello"
myStringInByte = myString.encode()
Strings in Python are immutable.
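As a sketch of what immutability means in practice (the variable names are illustrative), assigning to a position in a string raises a TypeError, so a changed string has to be built as a new object:

```python
greeting = "hello"

try:
    greeting[0] = "J"          # item assignment is not supported on str
except TypeError as error:
    message = str(error)

# Build a new string instead of modifying the old one.
changed = "J" + greeting[1:]

print(message)
print(greeting, changed)
```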
From "Why are Python strings immutable?": there are several advantages.