Shell Data Processing - Sed (Stream editor)

About

sed stands for stream editor.

It is a filter program that filters and transforms the text of a stream.

It is part of the GNU utilities.

sed is line-based, so it is hard for it to match newlines and to manipulate end-of-line (EOL) characters.

To convert EOL characters, use a dedicated utility such as dos2unix.

Example

How to extract a content via regular expression?

This expression captures and prints the regular expression group:

sed -n 's/.*\(your-group-expression\).*/\1/p'

This is the substitution command (s):

  • with these arguments/options:
    • .*\(your-group-expression\).*: the regular expression to match; the group is captured by the escaped parentheses \( and \)
    • \1: the replacement, where \1 is a back-reference to the first captured group (\2 to the second, and so on)
    • p: a flag that prints the line when a substitution was made (p here is a flag of the substitution command, not the p command)
  • with the -n option, which suppresses automatic printing so that only the matches are output

Note that -n and p may not be necessary if the input is a single line that matches.
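As a minimal sketch of the extraction above, the following pulls a version number out of a line. The input line and the variable names are hypothetical and chosen only for illustration.

```shell
# Hypothetical input line; the digits and dots after "version=" are the group to extract.
line='release version=1.4.2 stable'

# -n suppresses automatic printing; the p flag prints only when the substitution succeeded.
version=$(printf '%s\n' "$line" | sed -n 's/.*version=\([0-9.]*\).*/\1/p')

echo "$version"   # prints 1.4.2
```

The literal text version= anchors the group, so the greedy leading .* cannot swallow part of the value.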

Syntax

sed 'command1;...;commandN' inputFileName
sed -e 'command1' -e 'commandN' inputFileName
sed --expression='command1' inputFileName
sed -f myscript-with-commands.sed input.txt
sed --file=myscript-with-commands.sed input.txt
# In place editing - No outputFileName needed
sed -i 'command1;...;commandN' inputFileName
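The first two forms should be interchangeable. A small sketch, assuming a throwaway input file created with mktemp (the file content and variable names are made up for the example):

```shell
# Create a temporary two-line input file (illustrative content).
input=$(mktemp)
printf 'alpha\nbeta\n' > "$input"

# Same two commands, once separated by a semicolon, once passed with -e.
one=$(sed 's/alpha/A/;s/beta/B/' "$input")
two=$(sed -e 's/alpha/A/' -e 's/beta/B/' "$input")

rm -f "$input"
printf '%s\n' "$one"   # prints A then B
```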

Command

The command syntax is:

[LineAddressSelector]SingleLetterCommand[CommandOptions][sep]

where:

  • LineAddressSelector is an optional line address (ie the command is executed only on the matched lines)
  • SingleLetterCommand is a single-letter command known as X
  • CommandOptions are options of the command
  • sep is a command separator (ie semicolons ; or newlines ASCII 10)
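To make the parts concrete, here is a sketch with two commands separated by a semicolon. In 2,3s/a/X/g, the address is 2,3, the single-letter command is s, and /a/X/g are its options. The input is made up for the example.

```shell
# 1d         : address 1, command d (delete the first line)
# 2,3s/a/X/g : address range 2,3, command s, options /a/X/g
out=$(printf 'aa\naa\naa\naa\n' | sed '1d;2,3s/a/X/g')

printf '%s\n' "$out"   # prints XX, XX, aa (one per line)
```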

Script

Using a script file avoids problems with shell escaping or substitutions.

Example script.sed: a sed script file with one command per line and a shebang

#!/bin/sed -f
sedCommand1
sedCommand...
sedCommandN

Run it:

  • with the -f option
sed -f script.sed inputFileName > outputFileName
  • or as an executable, thanks to the shebang
chmod u+x script.sed
./script.sed inputFileName > outputFileName
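A self-contained sketch of the -f form, assuming a temporary directory and hypothetical file contents. The quoted heredoc delimiter keeps the shell from touching the sed commands, which is the escaping benefit mentioned above.

```shell
workdir=$(mktemp -d)

# Write the script file: one sed command per line.
cat > "$workdir/script.sed" <<'EOF'
s/foo/bar/
/^#/d
EOF

printf 'foo\n# comment\nfoo foo\n' > "$workdir/input.txt"

# Run the script against the input file.
result=$(sed -f "$workdir/script.sed" "$workdir/input.txt")

rm -rf "$workdir"
printf '%s\n' "$result"   # prints "bar" then "bar foo"
```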

Command

A command is the first part of a sed expression of the form command/regularExpression/modifier.

s: Substitution

The substitution command replaces a string.

It's:

  • line based by default (ie you can't match \n in your pattern)
  • document based with the -z or --null-data option (lines are then separated by NUL characters instead of newlines)
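The difference can be seen with a pattern that spans two lines. This is a sketch that assumes GNU sed, since -z and the \n escape in a pattern are GNU features; the input is made up.

```shell
# Default: each line is processed on its own, so a\nb never matches.
default=$(printf 'a\nb\n' | sed 's/a\nb/X/')

# With -z (GNU sed), the whole input is one record, so the newline can be matched.
joined=$(printf 'a\nb\n' | sed -z 's/a\nb/X/')

printf '%s\n' "$default"   # prints a then b (unchanged)
printf '%s\n' "$joined"    # prints X
```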

Syntax:

s/regexp/replacement/[flags]
# First occurrence (default)
sed 's/searchString/replacementString/' inputFileName > outputFileName
# All occurrences, thanks to the g at the end
sed 's/searchString/replacementString/g' inputFileName > outputFileName
# In place editing - No outputFileName needed
sed -i 's/searchString/replacementString/g' inputFileName
# Use backslash escapes: show each tab as an arrow and each end of line as a pilcrow
sed $'s/\t/→/g;s/$/¶/'

where, in the expression 's/searchString/replacementString/':

  • s stands for “substitution”.
  • searchString: the search string, the text to find.
  • replacementString: the replacement string
  • g stands for global (ie replace all occurrences)
  • -i is an option to edit the file directly - no need of outputFileName (a temporary output file is created in the background)
  • $'...' is the shell quoting form that expands backslash escapes (such as \t) before sed sees them
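A short sketch contrasting the default, the g flag and in-place editing. It assumes GNU sed for the bare -i form (other implementations may require a backup suffix argument); the input text and variable names are illustrative.

```shell
# Without g, only the first occurrence on each line is replaced.
first=$(printf 'a-a-a\n' | sed 's/a/b/')

# With g, every occurrence on the line is replaced.
all=$(printf 'a-a-a\n' | sed 's/a/b/g')

# In-place editing (GNU sed syntax assumed): the file itself is rewritten.
file=$(mktemp)
printf 'a-a-a\n' > "$file"
sed -i 's/a/b/g' "$file"
edited=$(cat "$file")
rm -f "$file"

echo "$first"    # prints b-a-a
echo "$all"      # prints b-b-b
echo "$edited"   # prints b-b-b
```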

p - print specific line

The -n option suppresses automatic printing, so only the lines explicitly printed with p are output.

# prints only line 45
sed -n '45p' file.txt
# prints the first line of the first file (one.txt) and the last line of the last file (three.txt),
# because the files are read as one continuous stream. Use -s to treat each file separately.
sed -n '1p ; $p' one.txt two.txt three.txt
# Print line that matches an expression
sed -n "/patternExpression/p" one.txt
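The multi-file behavior can be checked with two small files. This sketch assumes GNU sed for the -s option; the file names and contents are hypothetical.

```shell
dir=$(mktemp -d)
printf 'a1\na2\n' > "$dir/a.txt"
printf 'b1\nb2\n' > "$dir/b.txt"

# One continuous stream: first line of a.txt, last line of b.txt.
stream=$(sed -n '1p;$p' "$dir/a.txt" "$dir/b.txt")

# With -s, 1 and $ apply to each file separately.
separate=$(sed -s -n '1p;$p' "$dir/a.txt" "$dir/b.txt")

# Print only the lines matching a pattern.
match=$(sed -n '/2$/p' "$dir/a.txt" "$dir/b.txt")

rm -rf "$dir"
printf '%s\n' "$stream"     # prints a1, b2
printf '%s\n' "$separate"   # prints a1, a2, b1, b2
printf '%s\n' "$match"      # prints a2, b2
```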

d: Delete lines

The d (delete) command deletes lines (to delete a word, substitute it with nothing)

# deletes the lines that match the regular expression
'/regularExpression/d'
# deletes lines 30 to 35
'30,35d'

Example:

  • delete lines that are either blank or only contain spaces
sed '/^ *$/d' inputFileName
  • delete word (ie substitute with empty)
s/yourword//g
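The three forms above can be tried on inline input. A minimal sketch with made-up text:

```shell
# Delete blank or space-only lines.
noblank=$(printf 'keep\n\n   \nkeep too\n' | sed '/^ *$/d')

# Delete a range of lines (2 to 4).
range=$(printf '1\n2\n3\n4\n5\n' | sed '2,4d')

# Delete a word by substituting it with nothing.
noword=$(printf 'a very long day\n' | sed 's/very //g')

printf '%s\n' "$noblank"   # prints keep, keep too
printf '%s\n' "$range"     # prints 1, 5
printf '%s\n' "$noword"    # prints a long day
```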

q: quit

Search for a line that starts with foo and quit with exit code 42 (the exit code argument is a GNU extension)

/^foo/q42
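A sketch of how the exit code could be captured in a shell script, assuming GNU sed. Without -n, q still prints the current line before quitting, so the output stops at the matching line. The input is illustrative.

```shell
status=0
# The || branch records the exit code without aborting under "set -e".
out=$(printf 'bar\nfoo here\nbaz\n' | sed '/^foo/q42') || status=$?

printf '%s\n' "$out"   # prints bar, foo here
echo "$status"         # prints 42
```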

More commands

https://www.gnu.org/software/sed/manual/sed.html#sed-commands-list

How to

test the search pattern expression

sed -n "/patternExpression/p" targetFilePath

where p means print

Flow

Flow of control can be managed by:

  • the use of a label (a colon followed by a string)
  • and the branch instruction b.

An instruction b followed by a valid label name will move processing to the block following that label.
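A common illustration of a label and a branch is a loop that joins all lines into one. The sketch below uses the label name a (an arbitrary choice) and the N command, which appends the next input line to the pattern space; each command is passed with its own -e so the label is not confused with what follows it.

```shell
# :a   defines the label "a"
# N    appends the next line to the pattern space
# $!ba branches back to "a" unless the last line has been read
# s    finally replaces every embedded newline with a space
joined=$(printf 'x\ny\nz\n' | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g')

echo "$joined"   # prints x y z
```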
