Regular Expression - Group (Capture|Substitution)

About

A group is a part of a regular expression, enclosed in parentheses, that by default captures the text it matches. You can then (extract|reference) the matched content.

Look-arounds are also groups, but they implement an assertion and do not capture the content.

The regular expression below has two groups:

([^ ]*) (.*)

where:

the first group [^ ]* matches any run of characters that are not a space. Why? It is:
- a class of characters
- that contains only a space
- whose match is negated by the leading ^ character
- and that is repeated zero or more times by the * quantifier
the second group is .* and matches all remaining characters.

If you parse the following text:

Hello World

You will get:

- group 1: Hello
- group 2: World
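A minimal Java sketch of this extraction, assuming the first group carries a * quantifier (that is, the pattern ([^ ]*) (.*)). The class and method names are illustrative only, not part of any library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupExtract {
    // Two capturing groups separated by a single space.
    static final Pattern WORDS = Pattern.compile("([^ ]*) (.*)");

    // Returns {group 1, group 2}, or null when the whole text does not match.
    static String[] split(String text) {
        Matcher m = WORDS.matcher(text);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] groups = split("Hello World");
        System.out.println(groups[0]); // Hello
        System.out.println(groups[1]); // World
    }
}
```

Without the quantifier, [^ ] would match exactly one non-space character, so the whole text would no longer match.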

Every group must begin with an opening parenthesis and end with a closing parenthesis.

(myRegexp0 ( myRegexp1) ( myRegexp2) )

Construct	Definition
(?<name>X)	X, as a named-capturing group
Non-capturing
(?:X)	X, as a non-capturing group
(?>X)	X, as an independent, non-capturing group
Assertion (see Regexp - Look-around group (Assertion) - (Lookahead | Lookbehind))
(?=X)	X, positive lookahead (via zero-width)
(?!X)	X, negative lookahead (via zero-width)
(?<=X)	X, positive lookbehind (via zero-width)
(?<!X)	X, negative lookbehind (via zero-width)
Flag
(?idmsuxU-idmsuxU)	Nothing, but turns match flags i d m s u x U on - off
(?idmsux-idmsux:X)	X, as a non-capturing group with the given flags i d m s u x on - off
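
The look-around rows of this table can be illustrated with a short Java sketch. It assumes a hypothetical input where an amount is followed by a currency code; the class name and the sample text are made up for the example. The lookahead checks what follows the digits but neither consumes it nor creates a capturing group.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadDemo {
    // \d+ must be followed by " USD"; the lookahead itself is zero-width
    // and is not counted as a capturing group.
    static final Pattern AMOUNT = Pattern.compile("\\d+(?= USD)");

    // Returns the first number followed by " USD", or null if there is none.
    static String firstAmount(String text) {
        Matcher m = AMOUNT.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstAmount("20 EUR or 25 USD")); // 25
        System.out.println(AMOUNT.matcher("").groupCount()); // 0
    }
}
```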

By default, a group is identified by its index (0, 1, 2, …), but you can give it a name with the following syntax:

(?<name>X)

where name is the name of the group and X is the regular expression pattern that you want to capture.

It's called a named-capturing group.
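
As a hedged illustration, the Java sketch below reads a named group back by name. The group names year and month, the class name, and the sample text are assumptions made for the example. A named group still receives an index, so both lookups return the same text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupDemo {
    // Two named-capturing groups: "year" (index 1) and "month" (index 2).
    static final Pattern DATE =
            Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})");

    // Returns the content of the named group, or null when nothing matches.
    static String extract(String text, String groupName) {
        Matcher m = DATE.matcher(text);
        return m.find() ? m.group(groupName) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("released 2024-05", "year"));  // 2024
        System.out.println(extract("released 2024-05", "month")); // 05

        Matcher m = DATE.matcher("released 2024-05");
        if (m.find()) {
            // The named group is also reachable through its index.
            System.out.println(m.group(1).equals(m.group("year"))); // true
        }
    }
}
```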

Capturing groups are numbered by counting their opening parentheses from left to right.

In the expression ((A)(B(C))), for example, there are the following groups:

1	((A)(B(C)))
2	(A)
3	(B(C))
4	(C)

Group zero always stands for the entire expression.
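
The numbering can be checked with a small Java sketch that lists every group of a match, starting with group zero. The helper name groups is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumbering {
    // Returns the content of groups 0..groupCount() when the whole input
    // matches, or an empty list otherwise.
    static List<String> groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> result = new ArrayList<>();
        if (m.matches()) {
            for (int i = 0; i <= m.groupCount(); i++) {
                result.add(m.group(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Index:   0    1    2  3   4
        // prints [ABC, ABC, A, BC, C]
        System.out.println(groups("((A)(B(C)))", "ABC"));
    }
}
```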

A non-capturing group is not indexed.

In the expression (?:A)(B)(C), for example, there are the following groups:

1	(B)
2	(C)

The group (?:A) is not captured.
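
A brief Java sketch of the same expression, with an illustrative class name: the matcher reports only two groups, and index 1 points at (B) rather than at (?:A).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NonCapturingDemo {
    static final Pattern ABC = Pattern.compile("(?:A)(B)(C)");

    // Returns the number of capturing groups declared in the pattern.
    static int capturingGroups() {
        return ABC.matcher("").groupCount();
    }

    // Returns the content of the group at the given index, or null
    // when the input does not match.
    static String group(String input, int index) {
        Matcher m = ABC.matcher(input);
        return m.matches() ? m.group(index) : null;
    }

    public static void main(String[] args) {
        System.out.println(capturingGroups()); // 2
        System.out.println(group("ABC", 0));   // ABC
        System.out.println(group("ABC", 1));   // B
        System.out.println(group("ABC", 2));   // C
    }
}
```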

When you want to use the content of each captured group, you will generally use the following substitution construct:
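
The exact spelling of that construct depends on the engine. In Java, for instance, the replacement string of String.replaceAll accepts $n for a group index and ${name} for a group name. The sketch below assumes that flavor; the class, method, and group names are illustrative.

```java
class SubstitutionDemo {
    // Swaps the first word and the rest of the text using group indexes.
    static String swapByIndex(String text) {
        return text.replaceAll("([^ ]*) (.*)", "$2 $1");
    }

    // Same swap, but the groups are referenced by name.
    static String swapByName(String text) {
        return text.replaceAll("(?<first>[^ ]*) (?<rest>.*)", "${rest} ${first}");
    }

    public static void main(String[] args) {
        System.out.println(swapByIndex("Hello World")); // World Hello
        System.out.println(swapByName("Hello World"));  // World Hello
    }
}
```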

When using group index, this construct must be used when:

The dollar is also not always mandatory:

There is also a shorthand notation for groups up to 9:

Symbol	Definition
\0	backreference to the entire expression
\1	backreference to group 1
\2	backreference to group 2
\n	backreference to group n
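
Where these symbols are accepted varies by engine, and support for \0 in particular is not universal. As a sketch under that caveat, Java accepts \1 inside the pattern itself, while its replacement string uses $1. The example below collapses a word that is repeated twice in a row; the class name and sample sentence are made up.

```java
import java.util.regex.Pattern;

class BackreferenceDemo {
    // \b(\w+) \1\b : a word, one space, then the same word again.
    static final Pattern REPEATED = Pattern.compile("\\b(\\w+) \\1\\b");

    // Replaces each doubled word with a single occurrence.
    static String collapse(String text) {
        return REPEATED.matcher(text).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(collapse("this is is a test")); // this is a test
    }
}
```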