Python - Regular Expression (called REs, or regexes, or regex patterns)

About

Regular expressions in Python are provided by the re module, whose matching operations are similar to those found in Perl.

Backslash

  • The backslash character ('\') either indicates a special form or allows a special character to be used without invoking its special meaning.
  • This collides with Python's own use of the backslash in string literals. For example, to match a literal backslash, one might have to write '\\\\' as the pattern string, because the regular expression must be \\, and each backslash must be expressed as \\ inside a regular Python string literal.
  • Backslashes are not handled in any special way in a string literal prefixed with 'r' (a raw string). So r"\n" is a two-character string containing '\' and 'n', while "\n" is a one-character string containing a newline.
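
The difference between raw and regular string literals can be checked directly. The snippet below is a minimal sketch; the variable names and the sample path are made up for illustration and do not come from the original page.

```python
import re

# A raw string keeps the backslash: two characters, '\' and 'n'.
raw = r"\n"
# A regular string literal turns \n into a single newline character.
cooked = "\n"
print(len(raw), len(cooked))

# Both spellings build the same two-character regular expression: \\
same_pattern = "\\\\" == r"\\"
print(same_pattern)

# Hypothetical input containing exactly one literal backslash.
path = r"C:\temp"
backslashes = re.findall(r"\\", path)  # find every literal backslash
parts = re.split(r"\\", path)          # split the string on them
print(backslashes, parts)
```

Writing patterns as raw strings is usually the easier habit, since the pattern then reads the same way the regular expression engine sees it.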

Example

Web Log Parsing

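Log - Apache Common Log Format. The pattern below captures nine fields (host, client identity, user, timestamp, method, path, protocol, status code, response size) and is followed by a sample line that it matches.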

'^(\S+) (\S+) (\S+) \[([\w:/]+\s[+\-]\d{4})\] "(\S+) (\S+)\s*(\S*)" (\d{3}) (\S+)'
127.0.0.1 - - [24/Jul/1973:08:32:01 -0400] "GET /images/gerardnico.gif HTTP/1.0" 200 2564
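
One way to apply this pattern is sketched below. The pattern and the sample line are taken from the text above; the field names in FIELDS and the helper parse_log_line are hypothetical names chosen for this example, not part of any library.

```python
import re

# Pattern from the text above, written as a raw string.
APACHE_ACCESS_LOG_PATTERN = (
    r'^(\S+) (\S+) (\S+) \[([\w:/]+\s[+\-]\d{4})\] '
    r'"(\S+) (\S+)\s*(\S*)" (\d{3}) (\S+)'
)

# Hypothetical field names, one per capture group, in order.
FIELDS = (
    "host", "client_identd", "user_id", "date_time",
    "method", "endpoint", "protocol", "response_code", "content_size",
)


def parse_log_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = re.search(APACHE_ACCESS_LOG_PATTERN, line)
    if match is None:
        return None
    return dict(zip(FIELDS, match.groups()))


sample = ('127.0.0.1 - - [24/Jul/1973:08:32:01 -0400] '
          '"GET /images/gerardnico.gif HTTP/1.0" 200 2564')
record = parse_log_line(sample)
print(record["host"], record["method"], record["endpoint"], record["response_code"])
```

All values are kept as strings here, because the size field may be "-" in real logs; converting to int is left to the caller.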

Word tokenization

re.split splits a string at every match of a pattern, which makes it a simple word tokenizer.
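
A minimal tokenization sketch follows. The sample sentence and the helper name tokenize are assumptions made for this example; splitting on \W+ is only one possible definition of a word boundary.

```python
import re


def tokenize(text):
    """Split text on runs of non-word characters and drop empty tokens."""
    # re.split returns an empty string when the text starts or ends
    # with a separator, so those are filtered out.
    return [token for token in re.split(r"\W+", text) if token]


# Illustrative input, not from the original page.
sentence = "The quick, brown fox; it jumps!"
tokens = tokenize(sentence)
print(tokens)
```

Without the filter, the trailing "!" would leave an empty string at the end of the list.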
