
About

Regular expression in Python

The Python re module provides regular expression matching operations similar to those found in Perl.
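The main matching functions of the standard re module can be sketched as follows. The sample text and the date pattern are made up for illustration. Raw strings (r"...") are used so that backslashes reach the regex engine unchanged.

```python
import re

text = "Order 66 shipped on 2021-03-14, order 67 on 2021-03-15"
date = r"\d{4}-\d{2}-\d{2}"  # illustrative pattern: YYYY-MM-DD

# re.search scans the whole string and returns the first match object (or None)
first_date = re.search(date, text).group()  # '2021-03-14'

# re.match only tries to match at the start of the string
starts_with_digits = re.match(r"\d+", text)  # None: the text starts with "Order"

# re.findall returns every non-overlapping match as a list of strings
all_dates = re.findall(date, text)  # ['2021-03-14', '2021-03-15']

# re.sub replaces every match with the given replacement string
redacted = re.sub(date, "<date>", text)

# Flags such as re.IGNORECASE change how the pattern is matched
orders = re.findall(r"order", text, flags=re.IGNORECASE)  # ['Order', 'order']
```

re.search returns None when nothing matches, so real code should check the result before calling .group().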

Backslash

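  • the backslash character ('\') indicates special forms (such as \d for a digit) or allows special characters to be used without invoking their special meaning (such as \. for a literal dot)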
  • for example, to match a literal backslash, one might have to write '\\\\' as the pattern string, because the regular expression must be \\, and each backslash must be expressed as
    \\ inside a regular Python string literal.
  • backslashes are not handled in any special way in a string literal prefixed with 'r'. So r"\n" is a two-character string containing '\' and 'n', while "\n" is a one-character string containing a newline. This is why regular expression patterns are usually written as raw strings: r"\\" instead of "\\\\".
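The backslash rules above can be checked directly. This is a short sketch, and the sample path string is made up for illustration.

```python
import re

# In a regular literal "\n" is one character (a newline);
# with the r prefix it is two characters: a backslash and "n"
newline_len = len("\n")   # 1
raw_len = len(r"\n")      # 2

# To match one literal backslash the regex must be \\ , which is spelled
# "\\\\" as a regular literal or r"\\" as a raw literal
path = r"C:\temp\new"     # illustrative string containing two backslashes
plain_hits = re.findall("\\\\", path)
raw_hits = re.findall(r"\\", path)
same_pattern = "\\\\" == r"\\"   # True: both are the two characters \\

# The re module understands the \n escape itself, so the raw pattern r"\n"
# still matches an actual newline character
has_newline = re.search(r"\n", "line1\nline2") is not None  # True
```

The last line shows that the raw prefix does not stop the regex engine from interpreting escapes. It only stops Python's string parser from doing so first.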

Tester

Example

Web Log Parsing

Log - Apache Common Log Format

Pattern:

'^(\S+) (\S+) (\S+) \[([\w:/]+\s[+\-]\d{4})\] "(\S+) (\S+)\s*(\S*)" (\d{3}) (\S+)'

Sample log line:

127.0.0.1 - - [24/Jul/1973:08:32:01 -0400] "GET /images/gerardnico.gif HTTP/1.0" 200 2564
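A minimal sketch of applying this pattern to the sample line. The function name parse_log_line and the field names (host, ident, user, time, method, path, protocol, status, size) are illustrative labels chosen here for the nine capture groups, not part of any library.

```python
import re

# The Common Log Format pattern, compiled once so it can be reused for every line
LOG_PATTERN = re.compile(
    r'^(\S+) (\S+) (\S+) \[([\w:/]+\s[+\-]\d{4})\] "(\S+) (\S+)\s*(\S*)" (\d{3}) (\S+)'
)

# Hypothetical labels for the nine capture groups, in order
FIELDS = ("host", "ident", "user", "time", "method", "path", "protocol", "status", "size")


def parse_log_line(line):
    """Return the captured fields as a dict, or None if the line does not match."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    return dict(zip(FIELDS, m.groups()))


line = '127.0.0.1 - - [24/Jul/1973:08:32:01 -0400] "GET /images/gerardnico.gif HTTP/1.0" 200 2564'
record = parse_log_line(line)
```

All captured values are strings. Status and size need int() for arithmetic, and in this format the size field can be "-" when no body was sent.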

Word tokenization

re.split
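A minimal sketch of word tokenization with re.split. It assumes that a "word" is simply a run of word characters (\w) and that everything else is a separator. The sample sentence is made up. Under this naive rule a contraction such as "don't" is split into two tokens.

```python
import re

sentence = "Hello, world! Regular expressions: split, findall, and more."

# \W+ matches a run of non-word characters (anything other than letters,
# digits and underscore), so each run acts as a separator
tokens = re.split(r"\W+", sentence)

# A separator at the end of the string leaves a trailing empty string; drop it
words = [t for t in tokens if t]

# With a capturing group in the pattern, the separators are kept in the result
with_separators = re.split(r"(\W+)", "one, two")  # ['one', ', ', 'two']
```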

Documentation / Reference