Notepad++ - A step-by-step tutorial on how to replace text with a regular expression

About

This page has:

  • a how-to that shows, via examples, how to replace a portion of text with a regular expression in Notepad++.
  • other snippets, with screenshots, to fully understand the replace functionality.

Tutorial

The goal

In this tutorial, we start from a classic output of data shown in columns, with only spaces as the separator between columns:

Hello          Print Hello
Youhou         Print Youhou

We want to transform it into a markup format such as wiki syntax or Markdown by adding a pipe character | to delimit the columns. The expected output is the following:

| Hello | Print Hello |
| Youhou | Print Youhou |

Discovering the end-of-line characters of the input file

When you copy/paste text into Notepad++, the first thing to check is the end of line. By default, the non-visible end-of-line characters are:

  • on Windows: two characters denoted \r\n
  • on Linux: \n

If you manipulate, on a Windows system, a text file that was created on Linux, you may get the Linux end of line (\n).

To discover them with Notepad++:

  • You can make them visible with the option “Show all characters”
    • the ¶ (reverse P) icon on the toolbar
    • or View > Show Symbol > Show all characters in the menu
  • You can see the EOL type in the status bar at the bottom of the window
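
The same check can be sketched outside of Notepad++. The Python snippet below is only an illustration: the function name detect_eol and its return labels are our own invention, and it simply counts which end-of-line sequence appears most often in the raw bytes of a file.

```python
def detect_eol(data: bytes) -> str:
    """Return the most frequent end-of-line style found in raw bytes.

    Hypothetical helper for illustration; the labels are our own.
    """
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf   # \n not preceded by \r
    cr = data.count(b"\r") - crlf   # \r not followed by \n
    counts = {"CRLF": crlf, "LF": lf, "CR": cr}
    if not any(counts.values()):
        return "none"               # single line, no end of line at all
    return max(counts, key=counts.get)


windows_text = b"Hello\r\nYouhou\r\n"
linux_text = b"Hello\nYouhou\n"
print(detect_eol(windows_text), detect_eol(linux_text))
```

Reading a file with open(path, "rb").read() and passing the result to this function would give the same kind of information as the status bar.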

The Replace screen and its Search Mode options

To open the Replace screen, press Ctrl+H or use the menu Search > Replace.

The most important part of this screen is the Search Mode option, which sets how the text that you have written in the Find what field is interpreted.

If you have selected:

  • normal, the input is just plain text
  • extended, the input is plain text in which escape sequences such as \n, \r and \t are interpreted
  • regular expression, the input is a regular expression
  • . matches newline: if this checkbox is checked, the dot also matches end-of-line characters, so a pattern such as .* can span several lines instead of stopping at the end of each line.
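
To get a feel for the “. matches newline” checkbox, here is a small Python sketch. It relies on Python's re module, not on the Notepad++ engine, and we assume that the re.DOTALL flag is a fair stand-in for the checked state. The sample text uses \n only, because Python's dot treats \r as an ordinary character.

```python
import re

text = "Hello\nWorld"

# Checkbox unchecked: the dot stops at the end of each line,
# so we get one match per line.
per_line = re.findall(r".+", text)

# Checkbox checked (re.DOTALL in Python): the dot also matches the
# line break, so a single match spans the whole text.
whole_doc = re.findall(r".+", text, flags=re.DOTALL)

print(per_line)
print(whole_doc)
```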

The Regular Expression

Below is the regular expression that we have created to capture the two columns.

Each part is enclosed in parentheses in order to create a group and capture its text for the next step. Enclosing every part in a group is not necessary, but we have done it to show how you can refer to a group when you replace.

(\w*)(\s*)(.*)(\r\n){0,1}

where:

  • (\w*) means: take all word characters (\w: letters, digits and underscore) until you find something that is not a word character.
  • (\s*) means: take all whitespace until you find something that is not whitespace (the \s class means any whitespace character)
  • (.*) means: take all characters (the dot . matches any character except the end of line)
  • (\r\n){0,1} means: then match zero or one time ({0,1}), that is:
    • the end of line \r\n
    • or nothing (i.e., the end of file)
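
To see what each group captures, the breakdown above can be tried in Python. This is a sketch with one adaptation that is our own: in Python's re module the dot also matches \r, so we write [^\r\n]* in place of .* to approximate how the dot behaves in Notepad++ when “. matches newline” is unchecked.

```python
import re

# Same four groups as the Notepad++ expression, with .* written as
# [^\r\n]* (adaptation for Python, where the dot would also match \r).
pattern = re.compile(r"(\w*)(\s*)([^\r\n]*)(\r\n){0,1}")

line = "Hello" + " " * 10 + "Print Hello\r\n"
match = pattern.match(line)

first_column, separator, second_column, eol = match.groups()
print(repr(first_column))   # the word before the spaces
print(repr(separator))      # the run of spaces
print(repr(second_column))  # the rest of the line
print(repr(eol))            # the end of line, if any
```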

Leave the “. matches newline” checkbox unchecked, because our regular expression applies to each line.

Replace

The replace expression uses what is called a backreference to select a captured group of text and insert it in the replacement, a regular expression process called substitution.

In the Replace with field, we have entered the substitution expression below:

| \1 | \3 |\4

where:

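  • | stands for itself
  • \1 corresponds to the match of the first pair of parentheses (the first group)
  • \2 to the second, and so on. Here \2 (the spaces between the columns) is not used, so the spaces are dropped, and \4 puts the end of line back.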
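
The whole find and replace can be reproduced in Python as a rough sketch. Two things here are assumptions of ours, not Notepad++ behavior: [^\r\n]* replaces .* (Python's dot also matches \r), and the hypothetical helper to_pipe_rows skips the zero-length match that Python reports at the very end of the text.

```python
import re

# Pattern adapted for Python; the replacement template is the one
# entered in the Replace with field.
PATTERN = re.compile(r"(\w*)(\s*)([^\r\n]*)(\r\n){0,1}")
TEMPLATE = r"| \1 | \3 |\4"


def to_pipe_rows(text: str) -> str:
    """Apply the substitution to every line of text."""
    def substitute(match: re.Match) -> str:
        # Python also finds an empty match at the end of the text;
        # leave it untouched instead of emitting an empty row.
        return match.expand(TEMPLATE) if match.group(0) else ""
    return PATTERN.sub(substitute, text)


source = ("Hello" + " " * 10 + "Print Hello\r\n"
          "Youhou" + " " * 9 + "Print Youhou")
print(to_pipe_rows(source))
```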

The fully qualified expression for a substitution is

${n}

where n is the group number. It must be used when:

  • the group number is greater than 9
  • a digit immediately follows the substitution (for instance ${1}0 for the first group followed by a literal 0)
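
The same ambiguity exists in other regular expression engines. As an illustration in Python (whose syntax differs: the explicit form is \g<n>, not ${n}), the sketch below appends a literal 0 after the first group. The sample text and pattern are made up for the example.

```python
import re

text = "v1"
pattern = r"v(\d)"

# Explicit form: group 1 followed by a literal 0.
explicit = re.sub(pattern, r"\g<1>0", text)

# Short form: \10 is read as "group 10", which does not exist here,
# so Python rejects the replacement template.
try:
    re.sub(pattern, r"\10", text)
    short_form_failed = False
except re.error:
    short_form_failed = True

print(explicit, short_form_failed)
```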

Output

After hitting the replace button, you should get this output:

| Hello | Print Hello |
| Youhou | Print Youhou |

Snippets

Take all letters until a separator character is reached

This is achieved with the negation character ^ of a character class.

Example with the minus character -. Take all characters until a minus is found:

  • Find what:
([^-]*).*
  • Replace with:
\1

Example of input:

libX11-1.6.5
lksctp-tools-1.0.17
mailcap-2.1.41

gives this output:

libX11
lksctp
mailcap
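
For readers who want to check this snippet outside of Notepad++, here is the same expression applied line by line in Python. It is only a sketch: count=1 is our addition, so that Python stops after the first match of each line.

```python
import re

lines = ["libX11-1.6.5", "lksctp-tools-1.0.17", "mailcap-2.1.41"]

# ([^-]*) captures everything before the first minus,
# .* consumes the rest of the line, \1 keeps only the capture.
names = [re.sub(r"([^-]*).*", r"\1", line, count=1) for line in lines]

print(names)
```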

Replace a character with a new line
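
As a rough sketch of the idea in Python, assuming that the character to replace is a comma (an illustrative choice of ours) and that the Windows end of line \r\n is wanted:

```python
import re

text = "a,b,c"

# Every comma becomes a Windows end of line (\r\n).
result = re.sub(",", "\r\n", text)

print(repr(result))
```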

Suppress blank lines
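
As a rough sketch of the idea in Python, assuming that a blank line is a line that is empty or contains only spaces and tabs (the pattern below is our own illustration, written for Python's re module):

```python
import re

text = "Hello\r\n\r\nYouhou\r\n  \r\nEnd\r\n"

# (?m) makes ^ match at the start of every line; a blank line is then
# optional spaces or tabs followed by its end of line (\n or \r\n).
result = re.sub(r"(?m)^[ \t]*\r?\n", "", text)

print(repr(result))
```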

Regular expression language of Notepad++

Notepad++ uses the Boost regex library (C++). For the full regular expression pattern syntax, refer to the Boost documentation.
