Shell Data Processing - Sed (Stream editor)

About

sed stands for stream editor.

It is a filter program: it reads text from a stream (a file or standard input), filters and transforms it, and writes the result to standard output.

GNU sed, the most common implementation, is part of the GNU utilities.

Sed is line-based: it processes its input one line at a time, so it is hard for it to match newlines and to manipulate end-of-line (EOL) characters.

To convert end-of-line characters, use a dedicated utility instead, such as:

  • dos2unix
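A minimal sketch of why newlines are awkward in sed: the trailing newline is removed before the expression runs, so a substitution on \n finds nothing in a normal cycle. The second command uses tr as a stand-in for dos2unix; that choice is an assumption for illustration, and the sample input is made up.

```shell
# sed strips the trailing newline before running the expression,
# so there is no \n to match and the input passes through unchanged.
unchanged=$(printf 'a\nb\n' | sed 's/\n//')
printf '%s\n' "$unchanged"

# Assumption: tr stands in for dos2unix here. It deletes every
# carriage return, which turns CRLF line endings into LF.
crlf_removed=$(printf 'a\r\nb\r\n' | tr -d '\r')
printf '%s\n' "$crlf_removed"
```

Note that tr -d '\r' removes every carriage return in the input, not only those that sit before a line feed.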

Syntax

# Default
sed 'expression1;...;expressionN' inputFileName > outputFileName
# In place editing - No outputFileName needed
sed -i 'expression1;...;expressionN' inputFileName

where:

  • expression: a sed command, generally of the form command/regularExpression/modifier
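As a small illustration of the default syntax, the sketch below chains two expressions with a semicolon and reads from a file. The file content and the variable names are made up for the example.

```shell
# Create a throwaway input file (hypothetical content).
input=$(mktemp)
printf 'hello world\nhello sed\n' > "$input"

# Two expressions separated by ';' are applied in order to each line.
result=$(sed 's/hello/goodbye/;s/world/moon/' "$input")
printf '%s\n' "$result"
# goodbye moon
# goodbye sed

rm -f "$input"
```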

Script

Using a script file avoids problems with shell escaping or substitutions.

Example script.sed: a sed script file with one command per line and a shebang

#!/bin/sed -f
sedExpression1
sedExpression...
sedExpressionN

Run it:

  • with the -f option
sed -f script.sed inputFileName > outputFileName
  • or as an executable (thanks to the shebang)
chmod u+x script.sed
./script.sed inputFileName > outputFileName
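A possible end-to-end run of a script file, as a sketch: the script content, file names and sample text below are hypothetical. The shebang is left out here because the location of sed differs between systems (/bin/sed or /usr/bin/sed), so the script is passed with -f.

```shell
workdir=$(mktemp -d)

# One sed command per line; the quoted heredoc avoids any shell
# escaping or substitution inside the script.
cat > "$workdir/script.sed" <<'EOF'
s/cat/dog/
s/black/white/
EOF

printf 'a black cat\n' > "$workdir/input.txt"

script_result=$(sed -f "$workdir/script.sed" "$workdir/input.txt")
printf '%s\n' "$script_result"
# a white dog

rm -rf "$workdir"
```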

Command

Substitution

The substitution command (s) replaces a string.

# First occurrence on each line (default)
sed 's/searchString/replacementString/' inputFileName > outputFileName
# All occurrences thanks to the g at the end
sed 's/searchString/replacementString/g' inputFileName > outputFileName
# In place editing - No outputFileName needed
sed -i 's/searchString/replacementString/g' inputFileName
# To use backslash characters: replace each tab with an arrow and mark each end of line with a pilcrow
sed $'s/\t/→/g;s/$/¶/'

where, in the expression 's/searchString/replacementString/':

  • s stands for “substitution”.
  • searchString: the search string (a regular expression), the text to find.
  • replacementString: the replacement string.
  • g stands for global (i.e. replace all occurrences on each line).
  • -i is the option that edits the file in place: no outputFileName is needed (a temporary output file is created in the background).
  • $'...' is the shell (bash) quoting form that interprets backslash escapes such as \t.
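The difference between the default behaviour and the g modifier can be checked on a single line of input. This is a sketch with made-up sample text and variable names.

```shell
line='one two one two'

# Without g, only the first match on the line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's/one/1/')
printf '%s\n' "$first_only"
# 1 two one two

# With g, every match on the line is replaced.
all_matches=$(printf '%s\n' "$line" | sed 's/one/1/g')
printf '%s\n' "$all_matches"
# 1 two 1 two
```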

Delete

The d (delete) command deletes lines. To delete a word, substitute it with nothing.

# line
sed '/regularExpression/d' inputFileName
# word (substitute it with nothing)
sed 's/word//g' inputFileName

Example:

  • delete lines that are either blank or only contain spaces
sed '/^ *$/d' inputFileName
  • delete a word (i.e. substitute it with an empty string)
sed 's/yourword//g' inputFileName
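Both kinds of deletion can be tried on inline input, as in this sketch. The sample text is made up, and the word pattern includes the trailing space so that no double space is left behind.

```shell
# Lines that are empty or contain only spaces are dropped.
kept=$(printf 'first\n\n   \nsecond\n' | sed '/^ *$/d')
printf '%s\n' "$kept"
# first
# second

# A word is deleted by substituting it with nothing.
without_word=$(printf 'remove this word please\n' | sed 's/word //g')
printf '%s\n' "$without_word"
# remove this please
```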

Others

  • N appends the next line of input to the pattern space;
  • P prints the first line of the pattern space (up to the first newline);
  • D deletes the first line of the pattern space and runs the script again on what remains, without reading new input.
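The sketch below shows these three commands on made-up input. The first command joins lines in pairs with N. The second is a well-known one-liner that combines N, P and D to drop consecutive duplicate lines; it uses the address $! (every line except the last), which is not described above.

```shell
# N appends the next line, so the pattern space holds "a\nb";
# the embedded newline can then be matched and replaced.
paired=$(printf 'a\nb\nc\nd\n' | sed 'N;s/\n/ /')
printf '%s\n' "$paired"
# a b
# c d

# Sliding window over two lines: print the first line unless it is
# identical to the second one, then delete it and run again.
deduped=$(printf 'a\na\nb\n' | sed '$!N; /^\(.*\)\n\1$/!P; D')
printf '%s\n' "$deduped"
# a
# b
```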

Flow

Flow of control can be managed by:

  • the use of a label (a colon followed by a string)
  • and the branch instruction b.

The instruction b followed by a valid label name moves processing to the commands following that label. Without a label name, b jumps to the end of the script.
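A common use of a label and a branch is a loop that collects the whole input into the pattern space. In this sketch the label name a and the sample input are arbitrary, and each command is passed with its own -e so that the label ends cleanly on every sed implementation.

```shell
# :a     defines the label a
# N      appends the next input line to the pattern space
# $!ba   branches back to a on every line except the last one
# s///g  then replaces all embedded newlines with commas
joined=$(printf 'x\ny\nz\n' | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/,/g')
printf '%s\n' "$joined"
# x,y,z
```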


