About
This page contains:
- a how-to that shows, through examples, how to replace a portion of text with a regular expression in Notepad++
- other snippets that help you fully understand the replace functionality.
Tutorial
The goal
In this tutorial, we start from a classic output of data shown in columns, where only spaces separate the columns:
Hello Print Hello
Youhou Print Youhou
We want to transform it into a markup language such as Wikipedia markup or Markdown by adding a pipe character | to delimit the columns. The expected output is the following:
| Hello | Print Hello |
| Youhou | Print Youhou |
Discovering the end-of-line characters of the input file
When you copy/paste text into Notepad++, the first thing to check is the end of line (EOL). By default, the invisible end-of-line characters are:
- on Windows: two characters, denoted \r\n (carriage return followed by line feed)
- on Linux: one character, \n (line feed)
If you open on Windows a text file that was created on Linux, its lines may still end with the Linux \n alone.
To discover them with Notepad++:
- make them visible with the option “Show All Characters”:
  - with the pilcrow (¶) icon on the toolbar
  - or with the menu View > Show Symbol > Show All Characters
- read the EOL type in the status bar at the bottom of the window
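You may need the same check outside Notepad++, for example in a script. In that case, a minimal Python sketch can count CR LF pairs against lone LF characters in the raw bytes of the file. The function detect_eol and the labels it returns are hypothetical names chosen for illustration. Old Mac-style lone CR endings are not handled.

```python
def detect_eol(data: bytes) -> str:
    """Guess the end-of-line convention of raw file content (hypothetical helper)."""
    crlf = data.count(b"\r\n")          # Windows line endings
    lf_only = data.count(b"\n") - crlf  # Linux line endings (LF not preceded by CR)
    if crlf and lf_only:
        return "Mixed"
    if crlf:
        return "Windows (CR LF)"
    if lf_only:
        return "Unix (LF)"
    return "No line ending found"


sample = b"Hello Print Hello\r\nYouhou Print Youhou"
print(detect_eol(sample))  # Windows (CR LF)
print(repr(sample))        # repr() shows \r and \n, a bit like "Show All Characters"
```

To check a file, call detect_eol(open(path, "rb").read()). Binary mode matters here, because Python's text mode translates \r\n to \n while reading.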
The Replace screen and its Search Mode options
To open the Replace dialog, press Ctrl+H or use the menu Search > Replace.
The most important part of this dialog is the Search Mode option group, which sets how the text you type in the Find what field is interpreted.
If you have selected:
- Normal: the input is plain text, matched literally
- Extended: the input is plain text in which you can use escape sequences such as \n, \r, \t and \0
- Regular expression: the input is a regular expression
  - . matches newline: when this checkbox is checked, the dot . also matches the end-of-line characters, so a match can run over several lines instead of stopping at the end of each line.
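The following rough Python analogy, using the standard re module, makes the difference between the modes concrete. It is only an approximation under stated assumptions. Notepad++ itself uses the Boost engine, and the helper names normal_mode, extended_mode and regex_mode are hypothetical. In addition, extended_mode decodes only \n, \r, \t and \\, while the real Extended mode also understands sequences such as \0.

```python
import re

# Sample text with \n line endings: Python's "." (unlike Notepad++'s) would
# otherwise also match the \r of a Windows line ending.
text = "Hello Print Hello\nYouhou Print Youhou"

# Simplified subset of the Extended escape sequences, for illustration only.
ESCAPES = {"n": "\n", "r": "\r", "t": "\t", "\\": "\\"}


def normal_mode(find: str) -> list:
    # Normal: every character is literal, so regex metacharacters are escaped.
    return re.findall(re.escape(find), text)


def extended_mode(find: str) -> list:
    # Extended: \n, \r, \t (and \\) typed in the field become the real characters,
    # then the result is searched literally, as in Normal mode.
    decoded = re.sub(r"\\([nrt\\])", lambda m: ESCAPES[m.group(1)], find)
    return re.findall(re.escape(decoded), text)


def regex_mode(find: str, dot_matches_newline: bool = False) -> list:
    # Regular expression: the input is a pattern; re.DOTALL plays the role of
    # the ". matches newline" checkbox.
    return re.findall(find, text, re.DOTALL if dot_matches_newline else 0)


print(normal_mode("Print.*"))           # [] (no literal "Print.*" in the text)
print(extended_mode(r"Hello\nYouhou"))  # ['Hello\nYouhou']
print(regex_mode("Print.*"))            # ['Print Hello', 'Print Youhou']
print(regex_mode("Print.*", True))      # ['Print Hello\nYouhou Print Youhou']
```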
The Regular Expression
Below is the regular expression we created to capture the two columns.
Each part is enclosed in parentheses to create a group, so that the captured text can be reused in the replacement step. Grouping every part is not necessary, but we did it to show how you can refer to any group when you replace.
(\w*)(\s*)(.*)(\r\n){0,1}
where:
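- (\w*) means: take all word characters (\w matches letters, digits and the underscore) until a character that is not a word character is found.
- (\s*) means: take all whitespace characters until a non-whitespace character is found (\s is the class of whitespace characters).
- (.*) means: take all remaining characters of the line (with . matches newline unchecked, the dot . matches any character except the end-of-line characters, and * repeats it).
- (\r\n){0,1} means: match either
  - the Windows end of line \r\n
  - or nothing ({0,1} means zero or one occurrence), which covers the last line of a file that does not end with a line break.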
Leave the . matches newline checkbox unchecked: our regular expression is applied line by line, and (.*) must stop at the end of each line.
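You can check what each group captures by trying the pattern in Python's re module, used here as a stand-in for Notepad++'s Boost engine. We made two adaptations for Python. Group 1 is (\w+) rather than (\w*), so the pattern cannot produce an empty match at the very end of the text. Group 3 is ([^\r\n]*) rather than (.*), because Python's dot stops only at \n and would otherwise capture the \r. capture_columns is a hypothetical helper name.

```python
import re

# Adapted from the Notepad++ pattern (\w*)(\s*)(.*)(\r\n){0,1}:
#  - (\w+) instead of (\w*): avoids an empty match at the very end of the text
#  - ([^\r\n]*) instead of (.*): Python's "." would also swallow the \r
PATTERN = re.compile(r"(\w+)(\s*)([^\r\n]*)(\r\n){0,1}")


def capture_columns(text: str) -> list:
    """Return the four groups captured for each line (hypothetical helper)."""
    return [m.groups() for m in PATTERN.finditer(text)]


for groups in capture_columns("Hello Print Hello\r\nYouhou Print Youhou"):
    print(groups)
# ('Hello', ' ', 'Print Hello', '\r\n')
# ('Youhou', ' ', 'Print Youhou', None)
```

On the last line, group 4 is None because {0,1} matched nothing. This is the end-of-file case described above.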
Replace
The replacement expression uses backreferences to refer to the captured groups and insert their text. In regular expression terms, this process is called substitution.
In the Replace with field, we entered the following substitution expression:
| \1 | \3 |\4
where:
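- | stands for itself (a literal pipe character)
- \1 corresponds to the text matched by the first pair of parentheses (the first group)
- \2 to the second, and so on. Here \3 is the second column and \4 puts back the end of line; group 2 (the spaces) is dropped.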
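You can reproduce the substitution with Python's re.sub, which accepts the same \1, \3 and \4 backreferences in the replacement string. We adapted the Find what pattern in two ways for Python. First, (\w+) replaces (\w*) to avoid an extra empty match at the end of the text. Second, ([^\r\n]*) replaces (.*), because Python's dot would also capture the \r. to_table is a hypothetical helper name.

```python
import re

# Python stand-in for the Notepad++ Find what pattern (see adaptations above):
# (\w+) avoids an empty match at the end, ([^\r\n]*) keeps \r out of group 3.
PATTERN = re.compile(r"(\w+)(\s*)([^\r\n]*)(\r\n){0,1}")

# Same replacement as in the Replace with field. An unmatched group
# (\4 on the last line) is replaced by an empty string in Python 3.5+.
REPLACEMENT = r"| \1 | \3 |\4"


def to_table(text: str) -> str:
    """Turn space-separated two-column lines into pipe-delimited rows."""
    return PATTERN.sub(REPLACEMENT, text)


print(to_table("Hello Print Hello\r\nYouhou Print Youhou"))
```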
The fully qualified form of a backreference is
${n}
and it must be used when:
- the group number is greater than 9
- the backreference is immediately followed by a literal digit
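Python's re module has no ${n} syntax; its delimited form is \g<n>. The Python sketch below, which is not Notepad++ syntax, shows why a delimiter matters when a literal digit must follow a group. The helper name append_digit is hypothetical.

```python
import re


def append_digit(text: str) -> str:
    # \g<1> is Python's delimited group reference, comparable to ${1}:
    # the "2" right after it is output as a literal digit.
    return re.sub(r"(\w+)", r"\g<1>2", text)


print(append_digit("Hello"))  # Hello2

# Without a delimiter, r"\12" is read as a reference to group 12,
# which this one-group pattern does not have, so re raises re.error.
try:
    re.sub(r"(\w+)", r"\12", "Hello")
except re.error as err:
    print("re.error:", err)
```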
Output
After clicking Replace All, you should get this output:
| Hello | Print Hello |
| Youhou | Print Youhou |
Snippets
Take all characters until a separator character is reached
This is achieved with the negation character ^ at the start of a character class: [^-] matches any character except -.
Example with the minus sign -: take all characters until a minus is found:
- Find what:
([^-]*).*
- Replace with:
\1
Example of input:
libX11-1.6.5
lksctp-tools-1.0.17
mailcap-2.1.41
gives this output:
libX11
lksctp
mailcap
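The same snippet can be reproduced with Python's re module as a stand-in for Notepad++. The sample uses \n line endings, because Python's dot, unlike Notepad++'s, would also consume the \r of a Windows line ending and drop it from the output. keep_before_minus is a hypothetical helper name.

```python
import re


def keep_before_minus(text: str) -> str:
    # ([^-]*) captures everything that is not a minus (line breaks included),
    # .* consumes the rest of the line, and \1 keeps only the captured part.
    return re.sub(r"([^-]*).*", r"\1", text)


packages = "libX11-1.6.5\nlksctp-tools-1.0.17\nmailcap-2.1.41"
print(keep_before_minus(packages))
# libX11
# lksctp
# mailcap
```

Because [^-] also matches end-of-line characters, group 1 of every match after the first begins with the preceding line break. This is why the lines stay separated in this sketch.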
Replace a character with new line
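Here is a minimal Python sketch of this replacement. It assumes that the character to replace is a comma followed by optional spaces (a hypothetical example) and that the new line should be a Windows \r\n. comma_to_newline is a hypothetical helper name. In Notepad++, the line break is typed as \r\n in the Replace with field, in Extended or Regular expression mode.

```python
import re


def comma_to_newline(text: str) -> str:
    # ",\s*" matches a comma and any spaces after it. "\r\n" is a Python string
    # literal, so the replacement already holds the real CR LF characters.
    return re.sub(r",\s*", "\r\n", text)


print(repr(comma_to_newline("libX11, lksctp, mailcap")))
# 'libX11\r\nlksctp\r\nmailcap'
```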
Suppress blank line
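Here is a minimal Python sketch. It assumes that a "blank line" is a line that is empty or holds only spaces and tabs. The helper name remove_blank_lines is hypothetical. The pattern ^[ \t]*\r?\n deletes each such line together with its end of line. It should also work in Notepad++'s Regular expression mode with an empty Replace with field.

```python
import re


def remove_blank_lines(text: str) -> str:
    # With re.MULTILINE, ^ matches at the start of every line. A line holding only
    # spaces/tabs, followed by its end of line (\r\n or \n), is deleted.
    return re.sub(r"^[ \t]*\r?\n", "", text, flags=re.MULTILINE)


print(repr(remove_blank_lines("Hello\r\n\r\nYouhou\r\n   \r\nmailcap")))
# 'Hello\r\nYouhou\r\nmailcap'
```

A blank last line without a line ending is left in place.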
Regular expression language of Notepad++
Notepad++ uses the Boost.Regex library (C++). For the complete regular expression syntax, refer to the Boost.Regex documentation.