When to use sed, awk, Perl, or Python


The sed program is a stream editor, designed to apply the actions from a script to each line (or, more generally, to specified ranges of lines) of the input file or files. Its language is based on ed, the Unix editor, and although it has conditionals and so on, it is hard to work with for complex tasks. You can work minor miracles with it - but at a cost to the hair on your head. However, it is probably the fastest of the programs when attempting tasks within its remit. (By default it uses POSIX basic regular expressions, the least powerful of the programs discussed - adequate for many purposes, but certainly not PCRE, Perl-Compatible Regular Expressions.)

The awk program (name from the initials of its authors - Aho, Weinberger and Kernighan) is a tool originally for formatting reports. It can be used as a souped up sed; in its more recent versions, it is computationally complete. It uses an interesting idea - the program is based on 'patterns matched' and 'actions taken when the pattern matches'. The patterns are fairly powerful (Extended Regular Expressions). The language for the actions is similar to C. One of the key features of awk is that it splits the input lines into fields automatically.

Perl was written in part as an awk-killer and sed-killer. Two of the programs long provided with it are a2p and s2p, for converting awk scripts and sed scripts into Perl (since Perl 5.22 they are distributed separately on CPAN rather than with the core). Perl is one of the earliest of the next generation of scripting languages (Tcl/Tk can probably claim primacy). It has powerful integrated regular expression handling with a vastly more powerful language. It provides access to almost all system calls, and has the extensibility of the CPAN modules. (Neither awk nor sed is extensible.) One of Perl's mottos is "TMTOWTDI - There's more than one way to do it" (pronounced "tim-toady"). Perl has 'objects', but they are more of an add-on than a fundamental part of the language.

Python was written last, and probably in part as a reaction to Perl. It has some interesting syntactic ideas (indenting to indicate levels - no braces or equivalents). It is more fundamentally object-oriented than Perl; it is just as extensible as Perl.

OK - when to use each?

    sed - when you need to do simple text transforms on files.

    awk - when you only need simple formatting and summarization or transformation of data.

    perl - for almost any task, but especially when the task needs complex regular expressions.

    python - for the same tasks that you could use Perl for.

I'm not aware of anything that Perl can do that Python can't, nor vice versa. The choice between the two would depend on other factors.  Python has less accreted syntax and is generally somewhat simpler to learn.


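sed example. Replace every occurrence of ugly with beautiful in lover.txt, editing the file in place (-i, as written here for GNU sed; BSD sed expects a backup suffix argument after -i).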

$ sed -i 's/ugly/beautiful/g' /home/dennis/friends/lover.txt
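
For comparison, here is a minimal Python sketch of the same substitution. The function names and the sample sentence are made up for illustration. Note that the sed command above replaces every occurrence of the string, including inside longer words; the second function shows one way to restrict the match to whole words.

```python
import re

def replace_all(text, old="ugly", new="beautiful"):
    """Replace every occurrence of old, like sed 's/old/new/g'."""
    return text.replace(old, new)

def replace_whole_words(text, old="ugly", new="beautiful"):
    """Replace old only where it stands alone as a word."""
    return re.sub(rf"\b{re.escape(old)}\b", new, text)

# Made-up sample text; "smugly" contains "ugly" as a substring.
sample = "the ugly duckling smiled smugly"
print(replace_all(sample))
print(replace_whole_words(sample))
```

The first call also rewrites the "ugly" inside "smugly", which is what the sed command would do; the second leaves it alone.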

awk example. A basic awk command looks like this. awk takes each line of the input file; if the line matches the pattern, it applies the action to the line, and whatever the action prints is written to the output file. If the pattern is omitted, the action is applied to every line.

awk 'pattern {action}' input-file > output-file
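
As a rough illustration of the pattern/action model, here is a minimal Python sketch. The helper name awk_like and the sample data are hypothetical and not part of awk; the sketch only mimics the idea of testing each line against a pattern, splitting it into whitespace-separated fields, and running an action on the lines that match.

```python
import re

def awk_like(lines, pattern=None, action=None):
    """Yield action(line, fields) for each line that matches pattern.

    Hypothetical helper that mimics awk's 'pattern {action}' model:
    - pattern is a regular expression; None matches every line.
    - action receives the line and its whitespace-separated fields;
      None returns the line unchanged (like awk's default print).
    """
    regex = re.compile(pattern) if pattern is not None else None
    for raw in lines:
        line = raw.rstrip("\n")
        if regex is not None and not regex.search(line):
            continue  # pattern did not match: skip this line
        fields = line.split()
        yield action(line, fields) if action is not None else line

# Made-up sample input: name, department, amount.
sample = ["alice sales 100", "bob ops 200", "carol sales 300"]

# Roughly: awk '/sales/ {print $1, $3}'
for out in awk_like(sample, r"sales", lambda line, f: f"{f[0]} {f[2]}"):
    print(out)
```

Fields are zero-indexed in the Python version, so f[0] corresponds to awk's $1.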

perl example. This reads from the standard input and counts the number of lines which are blank, and the lines which are entirely Perl-style # comments (# is the first non-blank character). It reports these figures.

use strict;

# Counters to return.
my $nblank = 0;
my $ncomm = 0;

# Read each line into the variable $line.
while(my $line = <STDIN>) {
    if($line =~ /^\s*$/) { ++$nblank; }
    if($line =~ /^\s*#/) { ++$ncomm; }
}

print "$nblank blank lines, $ncomm comments.\n";

python example.
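
The original post left this example empty. What follows is a minimal sketch of how the Perl script above might look in Python, not the author's own code. The function name and the sample lines are made up; pass sys.stdin instead of the sample list to read the standard input as the Perl version does.

```python
import re

def count_blank_and_comments(lines):
    """Return (blank, comment) counts for an iterable of lines."""
    nblank = 0
    ncomm = 0
    for line in lines:
        # Blank: nothing but whitespace up to the end of the line.
        if re.match(r"\s*$", line):
            nblank += 1
        # Comment: '#' is the first non-blank character.
        if re.match(r"\s*#", line):
            ncomm += 1
    return nblank, ncomm

# Made-up sample input standing in for the standard input.
sample = ["# header\n", "\n", "x = 1\n", "   \n", "  # indented comment\n"]
nblank, ncomm = count_blank_and_comments(sample)
print(f"{nblank} blank lines, {ncomm} comments.")
```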

Guest dennis

Here is an example of how to remove the last character of each line in a text file.

The scenario: I have an export of domains, but every domain has a dang period at the end, messing up any possibility of using the entry. So how do I remove the period at the end of each domain? I use sed in the following manner:

sed 's/.$//' digresultsext.txt >digresultsextOUT.txt

So as you can see, this removes the last character (no matter what it is) from each line of digresultsext.txt and writes the domains without a period (which was the last character) into a new file named digresultsextOUT.txt. To remove the last character only when it is a period, escape the dot: sed 's/\.$//'
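
A Python version of the same cleanup might look like the sketch below. It is only an illustration: the function name and the sample domains are made up, and unlike the sed command above it removes the last character only when that character is a period.

```python
def strip_trailing_dot(line):
    """Remove one trailing period from a line, if present."""
    # Drop the line ending first (handles both \n and \r\n).
    text = line.rstrip("\r\n")
    if text.endswith("."):
        text = text[:-1]
    return text

# Made-up sample of exported domain names.
sample = ["example.com.\n", "example.org.\n", "localhost\n"]
for domain in map(strip_trailing_dot, sample):
    print(domain)
```

To process the real export, the same function could be applied to each line read from digresultsext.txt, with the results written to digresultsextOUT.txt.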
