Sunday, 27 February 2011

10 useful little bash tricks

We're going to have a quick look at ten little bash tricks that you may or may not know, or may have forgotten. Or perhaps you know them but don't use them often, in which case this might be a good reminder.

Please note, this stuff was tested on Bash 4.1.5. You can find further details and examples for all these tricks in the Bash manpage.

1::Using previous command

This trick is very simple. If you run a command and then have to repeat it (for example, because you forgot sudo), you can refer to the previous command using !! instead of typing the whole thing again.

user@host:~$ aptitude install blah
[100%] Reading package lists^C
user@host:~$ sudo !!
sudo aptitude install blah
[sudo] password for user:

As you can see, it even prints out the new full command.

2::Using previous argument(s)

Similar to the first trick, bash also lets you easily reference the last argument of the previous command.

In this first example we're going to touch a few files, then decide to remove the last one.

user@host:~$ touch a b c
user@host:~$ rm !$
rm c
user@host:~$ ls
a  b

As you can see, !$ expanded only to c. If the previous command had just one argument, !$ would expand to that argument.

We can also reference all of the previous command's arguments using !*:

user@host:~$ touch a b c
user@host:~$ rm !*
rm a b c
user@host:~$ ls
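
The !$ and !* forms are history expansion, which bash only enables by default in interactive shells. Inside a script, a related (but different) feature is the special parameter $_, which expands to the last argument of the previous simple command. A minimal sketch, working in a throwaway directory from mktemp so nothing real gets touched:

```shell
#!/usr/bin/env bash
# $_ expands to the last argument of the previous simple command,
# which makes it a script-friendly cousin of the interactive !$.
workdir=$(mktemp -d)                # throwaway directory for the demo
trap 'rm -rf "$workdir"' EXIT       # clean up when the script exits

# cd into the directory mkdir just created
mkdir -p "$workdir/project/src" && cd "$_" || exit 1
echo "now in: $PWD"

touch a b c
last=$_                             # "c", just as !$ would give interactively
echo "last argument was: $last"
rm -- "$last"                       # removes c, leaving a and b
```

Unlike !$, $_ is ordinary parameter expansion, so it is not echoed back to you before running and behaves the same in scripts and at the prompt.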

3::Process Substitution

Right, now it's time to move on to something a little different.

We want to compare the files in two directories to see which are in common, missing or changed etc. It would make sense to use the diff utility to help. We are also going to use md5sum to get a fingerprint of the files so we can see if they have changed. All we have to do is look through each directory (using find), do an md5sum of each file and compare the two lists.

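Now, diff usually expects to operate on two files. We could redirect our results into temporary files and then diff those, but there is a smarter way. With process substitution, written <(list), bash runs each list and replaces it with a filename (something like /dev/fd/63) connected to that list's output, so diff can read the results as if they were files.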

diff <(cd boot-a && find . -type f -exec md5sum {} \;) <(cd boot-b && find . -type f -exec md5sum {} \;)

< 9c830b456ed37e0c0d63f2528fb43de5  ./initrd.img-2.6.35-23-generic
< f70fd0262a6f8e1e82028237e94657fc  ./config-2.6.32-25-generic
< c16c8e1705a54db8b96adce7de17710a  ./vmcoreinfo-2.6.32-25-generic
< b1f002028905e42f594c83603606ca0b  ./config-2.6.35-23-generic
< 668d54c704f22d85cb548f27a7e19d41  ./config-2.6.35-22-generic
< 882721477fcf705c02b60a4bb219b0c8  ./vmlinuz-2.6.32-25-generic
< 88472c27c3d18a832b5d85aea1edc541  ./abi-2.6.35-22-generic
< 496b4aed0005b07574e9dd3895089b23  ./abi-2.6.35-23-generic
< 6ee40238bdabb40a862c6f63262efa9d  ./
< 4e02de05c66cb0b322dc5c71f43882e9  ./
< ac9641014f0b460fc6eb5d2936bd869c  ./grub/menu.lst
> c5cce31e9e9eb884685137aac1ab8a8a  ./grub/menu.lst

As you can see, boot-a had quite a few files that boot-b was missing, and grub/menu.lst was different between the two directories.
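
One caveat: find lists entries in directory order, which is not guaranteed to match between two directories, so identical trees can still produce a noisy diff. A slightly more defensive sketch sorts each list by path and restricts find to regular files. It assumes GNU coreutils md5sum (as on Ubuntu); the function name compare_dirs and the demo files are hypothetical:

```shell
#!/usr/bin/env bash
# Compare two directory trees by content using process substitution.
# Each <(...) becomes a /dev/fd/N path that diff reads like a file.
compare_dirs() {
  local a=$1 b=$2
  # -type f skips directories (md5sum would complain about them),
  # and sort -k 2 orders each list by path so matching files line up.
  diff <(cd "$a" && find . -type f -exec md5sum {} + | sort -k 2) \
       <(cd "$b" && find . -type f -exec md5sum {} + | sort -k 2)
}

# Demo on two throwaway trees.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/boot-a/grub" "$tmp/boot-b/grub"
echo "same"      > "$tmp/boot-a/common.txt"
echo "same"      > "$tmp/boot-b/common.txt"
echo "old"       > "$tmp/boot-a/grub/menu.lst"
echo "new"       > "$tmp/boot-b/grub/menu.lst"
echo "only in a" > "$tmp/boot-a/extra.txt"

compare_dirs "$tmp/boot-a" "$tmp/boot-b"
```

Because the cd happens inside each substitution's subshell, the calling shell's working directory is left alone.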

4::Here document and redirect

Sometimes you need to put together a little config file, or perhaps a test HTML page. It is possible to do it quickly without having to resort to opening some sort of editor such as vi.

user@host:~$ cat << EOF > test.conf
> option1=blah
> option2=something else
> debug=true
> # end of config
> EOF
user@host:~$ cat test.conf
option1=blah
option2=something else
debug=true
# end of config

What we did was specify a here document using << EOF. Bash keeps reading lines from the terminal until it encounters a line containing only EOF, and feeds those lines to cat as its standard input. The output of cat is then redirected into the file, in this case test.conf.
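
One detail worth knowing: with an unquoted delimiter, as above, bash still performs parameter expansion, command substitution and arithmetic expansion inside the here document, whereas quoting the delimiter ('EOF') passes the text through literally. A small sketch (the variable and file names are illustrative):

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

name="world"

# Unquoted delimiter: $name and $((1 + 2)) are expanded.
cat << EOF > "$tmp/expanded.txt"
hello $name
sum=$((1 + 2))
EOF

# Quoted delimiter: the body is written out exactly as typed.
cat << 'EOF' > "$tmp/literal.txt"
hello $name
sum=$((1 + 2))
EOF

cat "$tmp/expanded.txt"   # hello world / sum=3
cat "$tmp/literal.txt"    # hello $name / sum=$((1 + 2))
```

The quoted form is handy when the file you are writing is itself a shell script or contains dollar signs.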

5::List expansion when copying

This is a very simple but extremely useful little trick that I'm sure you already know, but because it comes in handy so often, I thought I'd include it anyway.

Let us say you have a file you want to quickly back up. Instead of typing something like cp file file.backup, we can use list expansion (which the Bash manual calls brace expansion) to shorten the command:

user@host:~$ ls
special.conf
user@host:~$ cp special.conf{,.backup}
user@host:~$ ls
special.conf  special.conf.backup
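
The same pattern works with mv and in the other direction. Because brace expansion happens before the command runs, you can preview any expansion by prefixing it with echo. A short sketch in a throwaway directory:

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

touch special.conf
echo cp special.conf{,.backup}   # preview: cp special.conf special.conf.backup
cp special.conf{,.backup}        # make the backup
mv special.conf{.backup,.old}    # rename special.conf.backup to special.conf.old
cp special.conf{.old,}           # restore: copy special.conf.old over special.conf
```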

6::Nested list expansion

Sticking with list expansion, it is worth remembering that you can have nested lists:

user@host:~$ touch {a{1..5},b{1,2,4},c{5..9}}
user@host:~$ ls
a1  a2  a3  a4  a5  b1  b2  b4  c5  c6  c7  c8  c9
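
Adjacent lists multiply out, and nested lists combine nicely with mkdir -p for creating a directory skeleton in one go. The project layout below is made up for illustration:

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# Adjacent lists produce every combination.
echo {a,b}{1,2}                    # a1 a2 b1 b2

# Nested lists build a small (hypothetical) project tree in one command:
# project/src/main, project/src/test and project/docs
mkdir -p project/{src/{main,test},docs}
find project -type d | sort
```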

7::Quick maths

Need to do a quick calculation? Already have a terminal open? Don't bother firing up a calculator, just do the calculation in bash:

user@host:~$ echo $((2**32))
4294967296

See the Bash manpage for the supported operators.
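
One thing to bear in mind is that $(( )) works on integers only, so division truncates; for floating point you would need an external tool such as bc. A few examples:

```shell
#!/usr/bin/env bash
echo $((2**32))    # 4294967296
echo $((7 / 2))    # 3, integer division truncates
echo $((7 % 2))    # 1, the remainder
echo $((0xff))     # 255, hex literals are accepted
echo $((2#1010))   # 10, base#digits notation
x=5
(( x += 3 ))       # arithmetic command: updates x in place
echo "$x"          # 8
```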

8::Tilde expansion

Bash allows you to quickly reference the current directory ($PWD) and the previous one ($OLDPWD). This can be useful if you forgot to use pushd/popd:

user@host:~/a$ cd ../b
user@host:~/b$ ls ~+
user@host:~/b$ ls ~-
user@host:~/b$ echo $OLDPWD

So, ~+ expands to $PWD and ~- expands to $OLDPWD.
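
Since ~+ and ~- are ordinary tilde expansions, they also work in scripts and can be used as path prefixes. A small sketch, with directories a and b mirroring the example above:

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/a" "$tmp/b"
touch "$tmp/a/notes.txt"

cd "$tmp/a" || exit 1
cd ../b || exit 1

echo ~+               # same as $PWD (.../b)
echo ~-               # same as $OLDPWD (.../a)
cp ~-/notes.txt ~+/   # copy a file from the previous directory into the current one
```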

9::Setting default values

When you have a bash script that takes command line arguments you might want to set default values if no arguments are supplied by the user.

In the bash script below, we default arg1 to "a" if argument 1 is not supplied. We default arg2 to "b" and arg3 to "c". Of course, the order of the arguments matters.

arg1=${1:-a}
arg2=${2:-b}
arg3=${3:-c}

echo "$arg1" "$arg2" "$arg3"

When we run the script without arguments, it uses the defaults. But when we give it arguments, it uses those values.

user@host:~/test$ bash
a b c
user@host:~/test$ bash x y z
x y z
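
To experiment without creating a script file, the same expansions can be wrapped in shell functions, since positional parameters inside a function behave just like a script's. The sketch below (function names hypothetical) also contrasts ${1:-a}, which substitutes the default when the argument is unset or empty, with ${1-a}, which substitutes it only when the argument is unset:

```shell
#!/usr/bin/env bash
# ${N:-default}: default used when $N is unset OR empty.
show_args() {
  local arg1=${1:-a}
  local arg2=${2:-b}
  local arg3=${3:-c}
  echo "$arg1 $arg2 $arg3"
}

# ${N-default}: default used only when $N is unset.
show_unset_only() {
  local arg1=${1-a}
  echo "[$arg1]"
}

show_args              # a b c
show_args x y z        # x y z
show_args x "" z       # x b z  (an empty argument counts as missing with :-)
show_unset_only ""     # []     (an empty argument is kept with -)
show_unset_only        # [a]
```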

10::TCP and UDP

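Finally, bash allows you to send TCP and UDP traffic directly by redirecting to a special path of the form /dev/tcp/host/port (use udp instead of tcp if necessary). These are not real files; bash recognises the names itself when they are used in redirections.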

First, we open file descriptor 3 on that device. We can do that with exec, as the manual suggests:

"Note that the exec builtin command can make redirections take effect in the current shell."

user@host:~$ exec 3<>/dev/tcp/localhost/80
user@host:~$ ls -la /proc/$$/fd/
total 0
dr-x------ 2 user user  0 2011-02-27 16:34 .
dr-xr-xr-x 7 user user  0 2011-02-27 16:34 ..
lr-x------ 1 user user 64 2011-02-27 16:34 0 -> /dev/pts/8
lrwx------ 1 user user 64 2011-02-27 16:34 1 -> /dev/pts/8
lrwx------ 1 user user 64 2011-02-27 16:34 2 -> /dev/pts/8
lrwx------ 1 user user 64 2011-02-27 16:34 255 -> /dev/pts/8
lrwx------ 1 user user 64 2011-02-27 16:34 3 -> socket:[156914]

$$ expands to the process ID of the current shell, so listing /proc/$$/fd/ quickly shows our open file descriptors. As you can see, FD 3 has been opened as a socket, in this case a TCP connection to localhost on port 80.

We can now redirect the output of echo into FD 3. Note that we use the -e option so that echo interprets the escape sequences, giving us the newlines, including the blank line that ends an HTTP request. We then redirect FD 3 into cat to read the response.

user@host:~$ echo -e "GET / HTTP/1.0\n\n" >&3
user@host:~$ cat <&3
HTTP/1.1 200 OK
Date: Sun, 27 Feb 2011 16:34:26 GMT
Server: Apache/2.2.16 (Ubuntu)
Last-Modified: Sun, 27 Feb 2011 16:25:27 GMT
ETag: "901-e-49d4602cddfad"
Accept-Ranges: bytes
Content-Length: 14
Vary: Accept-Encoding
Connection: close
Content-Type: text/html


Cool huh? Not a telnet, netcat or wget in sight...
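
A more reusable sketch wraps the same steps in a function and closes the descriptor afterwards with exec 3<&-. It sends CRLF line endings, which is what the HTTP specification actually calls for (Apache is lenient about bare newlines, which is presumably why the echo version works), and adds a Host header, which HTTP/1.0 does not require but name-based virtual hosts rely on. The function names are hypothetical, and http_get naturally needs a server listening on the given host and port:

```shell
#!/usr/bin/env bash
# Build a minimal HTTP/1.0 request with CRLF line endings.
build_request() {
  local host=$1 path=${2:-/}
  printf 'GET %s HTTP/1.0\r\nHost: %s\r\n\r\n' "$path" "$host"
}

# Open a TCP connection on FD 3, send the request, print the reply, close FD 3.
http_get() {
  local host=$1 port=${2:-80} path=${3:-/}
  exec 3<>"/dev/tcp/$host/$port" || return 1
  build_request "$host" "$path" >&3
  cat <&3          # an HTTP/1.0 server closes the connection after replying
  exec 3<&-        # close the descriptor
}

# Usage (requires a web server on localhost:80):
#   http_get localhost 80 /
```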

Saturday, 5 February 2011

A simple Monte Carlo simulation with R

I recently read How to Measure Anything: Finding the Value of Intangibles in Business by Douglas W. Hubbard. It's a fascinating and informative read on the problems and solutions of measuring "soft" variables typically found in business. Fluffy variables like productivity, quality and risk can be measured if you use the right techniques and work within the limitations of measurement and statistics.

There is an excellent example of using a Monte Carlo simulation (or method) to calculate the risk of leasing a new machine in a manufacturing process. You can find the example on pages 82 through to 86.

Given my hatred of spreadsheets and having recently started playing with R, I thought I would have a go at replicating the simulation using R.

This is what I wrote. Please note I'm still an R n00b, so no doubt some things could be done better.

######################### Variables #######################

# Firstly set this to TRUE if we want to save our plot as a
# PNG and if so, what file and dimensions
bDoPNG <- TRUE
sFile <- "htma.png"
iWidth <- 1024
iHeight <- 768

# The following values represent our 90% confidence interval 
# (CI) ranges for the various inputs to our simulation.

# We are 90% confident that the maintenance savings per unit
# is between $10 and $20
vMaintenanceSavingsPerUnit <- c(10,20)

# We are 90% confident that the labour savings per unit 
# is between $-2 and $8
vLabourSavingsPerUnit <- c(-2,8)

# We are 90% confident that the raw material savings per unit 
# is between $3 and $9
vRawMaterialsSavingsPerUnit <- c(3,9)

# We are 90% confident that the production level per year 
# will be between 15K and 35K units
vProductionLevelPerYear <- c(15000,35000)

# The annual lease is $400K so we need to save this amount 
# just to break even for the investment
iAnnualLease <- 400000

# This is a quick cheat which basically means there are 
# 3.29 standard deviations in a 90% confidence interval
iStdDevCheat <- 3.29

# This is the number of simulations we are going to run
iNumberOfSims <- 100000

##################### Generate the basic data ###################

# A new data frame initiated to have iNumberOfSims rows in it
dData <- data.frame(seq(1,iNumberOfSims))

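# We use the rnorm function to generate a distribution across
# all the simulations for the maintenance savings. The mean is
# literally just the midpoint of the range (e.g. (10+20)/2) and we
# also give it the standard deviation of (20-10)/3.29.
dData$MainSavings <- rnorm(iNumberOfSims,
  mean=mean(vMaintenanceSavingsPerUnit),
  sd=diff(vMaintenanceSavingsPerUnit)/iStdDevCheat)

# Same again for the labour savings
dData$LabourSavings <- rnorm(iNumberOfSims,
  mean=mean(vLabourSavingsPerUnit),
  sd=diff(vLabourSavingsPerUnit)/iStdDevCheat)

# And the raw material savings
dData$RawMaterialsSavings <- rnorm(iNumberOfSims,
  mean=mean(vRawMaterialsSavingsPerUnit),
  sd=diff(vRawMaterialsSavingsPerUnit)/iStdDevCheat)

# And finally the production levels
dData$ProdLevel <- rnorm(iNumberOfSims,
  mean=mean(vProductionLevelPerYear),
  sd=diff(vProductionLevelPerYear)/iStdDevCheat)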

# We can now create our total savings column based on the 
# inputs given. Because R is a vector language, the below 
# operation is applied to each row automatically.
dData$TotalSavings <- (dData$MainSavings + dData$LabourSavings +
dData$RawMaterialsSavings) * dData$ProdLevel

# Later on it will look better on the graphs if we deal
# with numbers in thousands so create a couple of shortcut variables
dData$TotalSavingsThousands <- dData$TotalSavings/1000
iAnnualLeaseThousands <- iAnnualLease/1000

# We now let R generate a histogram of our savings but without 
# actually plotting the results. We will end up with a series of 
# buckets (aka breaks) which will go on the X axis and the number 
# of simulations that fell within each bucket (on the Y axis)
hHist <- hist(dData$TotalSavingsThousands,plot=FALSE)

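# We create a new data frame for the breaks and counts
# excluding the last break (there is one more break than count)
dHistData <- data.frame(Breaks=hHist$breaks[-length(hHist$breaks)],
  Counts=hHist$counts)

# We can calculate the chance of the project making a loss as
# the sum of counts where the breaks were less than
# the annual lease (i.e. $400K).
fPercentChanceOfLoss <- 100*sum(subset(dHistData,
  Breaks < iAnnualLeaseThousands)$Counts)/iNumberOfSims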

# Calculate the median of the savings. That is 50% of
# the simulations had savings less than the median 
# and 50% had savings of more than the median.
fMedian <- median(dData$TotalSavingsThousands)

# We put that chance of loss in a sub title
sSubTitle <- sprintf("%02.2f%% chance of loss at $400K expense, 
median savings at $%02.0fK", fPercentChanceOfLoss, fMedian)

# Check whether we want to save our PNG
if (bDoPNG) png(sFile, width=iWidth, height=iHeight)

# Now draw the actual histogram, setting some labels but without 
# drawing the axis
hist(dData$TotalSavingsThousands,col="lightblue", main="Histogram of Savings",
xlab="Savings per Year ($000s in 100,000 increments)",
ylab="Scenarios in Increment", axes=F)

# Add the sub title
title(sub=sSubTitle)

# Draw the Y axis using default parameters
axis(2)
# Now draw the X axis explicitly setting values of the ticks/breaks
axis(1, at=hHist$breaks)

# That's it, turn off output if saving PNG
if (bDoPNG) dev.off()

And this is the pretty graph it produced. It should look similar to the one on page 86.
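
The two formulas behind the script may be worth spelling out. Modelling each 90% confidence interval [L, U] as a normal distribution puts the bounds about 1.645 standard deviations either side of the mean, which is where the 3.29 "cheat" comes from:

```latex
\mu = \frac{L + U}{2}, \qquad
\sigma = \frac{U - L}{2 \times 1.645} = \frac{U - L}{3.29}
```

For the maintenance savings range of $10 to $20 this gives a mean of 15 and a standard deviation of about 3.04. The chance of loss is then the fraction of the N simulated years whose savings fall below the $400K lease, which the script approximates by summing histogram bucket counts:

```latex
P(\text{loss}) \approx \frac{1}{N} \sum_{i=1}^{N}
  \mathbf{1}\left[ (m_i + l_i + r_i)\, p_i < 400{,}000 \right]
```

Here m_i, l_i and r_i are the simulated maintenance, labour and raw material savings per unit in run i, and p_i is that run's production level.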

So yeah, go buy the book. Read it. Then have fun with R :)