Saturday, 5 February 2011

A simple Monte Carlo simulation with R

I recently read How to Measure Anything: Finding the Value of Intangibles in Business by Douglas W. Hubbard. It's a fascinating and informative read on the problems and solutions of measuring "soft" variables typically found in business. Fluffy variables like productivity, quality, risk can be measured if you use the right techniques and work within the limitations of measurement and statistics.

There is an excellent example of using a Monte Carlo simulation (or method) to calculate the risk of leasing a new machine in a manufacturing process. You can find the example on pages 82 through to 86.

Given my hatred of spreadsheets and having recently started playing with R, I thought I would have a go at replicating the simulation using R.

This is what I wrote. Please note I'm still an R n00b so some things can be done better no doubt.

######################### Variables #######################

# Firstly set this to TRUE if we want to save our plot as a 
# PNG and if so, what file and dimensions
bDoPNG <- FALSE
sFile <- "htma.png"
iWidth <- 1024
iHeight <- 768

# The following values represent our 90% confidence interval 
# (CI) ranges for the various inputs to our simulation.

# We are 90% confident that the maintenance savins per unit 
# is between $10 and $20
vMaintenanceSavingsPerUnit <- c(10,20)

# We are 90% confident that the labour savings per unit 
# is between $-2 and $8
vLabourSavingsPerUnit <- c(-2,8)

# We are 90% confident that the raw material savings per unit 
# is between $3 and $9
vRawMaterialsSavingsPerUnit <- c(3,9)

# We are 90% confident that the production level per year 
# will be between 15K and 35K units
vProductionLevelPerYear <- c(15000,35000)

# The annual lease is $400K so we need to save this amount 
# just to break even for the investment
iAnnualLease <- 400000

# This is a quick cheat which basically means there are 
# 3.29 standard deviations in a 90% confidence interval
iStdDevCheat <- 3.29

# This is the number of simulations we are going to run
iNumberOfSims <- 100000

##################### Generate the basic data ###################

# A new data frame initiated to have iNumberOfSims rows in it
dData <- data.frame(seq(1,iNumberOfSims))

# We use the rnorm function to generate a distribution across 
# all the simulations for the maintenance savings. The mean is 
# literally just the mean of the range (e.g. (20-10)/2) and we 
# also give it the standard deviation of (20-10)/3.29.
dData$MainSavings <- rnorm(iNumberOfSims, 
mean(vMaintenanceSavingsPerUnit), 
diff(vMaintenanceSavingsPerUnit,1,1)/iStdDevCheat)

# Same again for the labour savings
dData$LabourSavings <- rnorm(iNumberOfSims, 
mean(vLabourSavingsPerUnit), 
diff(vLabourSavingsPerUnit,1,1)/iStdDevCheat)

# And the raw material savings
dData$RawMaterialsSavings <- rnorm(iNumberOfSims, 
mean(vRawMaterialsSavingsPerUnit), 
diff(vRawMaterialsSavingsPerUnit,1,1)/iStdDevCheat)

# And finally the production levels
dData$ProdLevel <- rnorm(iNumberOfSims, 
mean(vProductionLevelPerYear), 
diff(vProductionLevelPerYear,1,1)/iStdDevCheat)

# We can now create our total savings column based on the 
# inputs given. Because R is a vector language, the below 
# operation is applied to each row automatically.
dData$TotalSavings <- (dData$MainSavings + dData$LabourSavings +
dData$RawMaterialsSavings) * dData$ProdLevel

# Later on it will look better on the graphs if we deal
# with numbers in thousands so create a couple of shortcut variables
dData$TotalSavingsThousands <- dData$TotalSavings/1000
iAnnualLeaseThousands <- iAnnualLease/1000

# We now let R generate a histogram of our savings but without 
# actually plotting the results. We will end up with a series of 
# buckets (aka breaks) which will go on the X axis and the number 
# of simulations that fell within each bucket (on the Y axis)
hHist <- hist(dData$TotalSavingsThousands,plot=FALSE)

# We create a new data frame for the breaks and counts 
# excluding the last break
dHistData <- data.frame(
breaks=hHist$breaks[1:length(hHist$breaks)-1], 
count=hHist$counts)

# We can calculate the chance of the project making a loss as 
# the sum of counts where the breaks were less than 
# the annual lease (ie. $400K).
fPercentChanceOfLoss <- 100*sum(subset(dHistData,
breaks<iAnnualLeaseThousands,
select=count))/sum(dHistData$count)

# Calculate the median of the savings. That is 50% of
# the simulations had savings less than the median 
# and 50% had savings of more than the median.
fMedian <- median(dData$TotalSavingsThousands)

# We put that chance of loss in a sub title
sSubTitle <- sprintf("%02.2f%% chance of loss at $400K expense, 
median savings at $%02.0fK", fPercentChanceOfLoss, fMedian)

# Check whether we want to save our PNG
bDoPNG && is.null(png(sFile, width=iWidth, height=iHeight))

# Now draw the actual histogram, setting some labels but without 
# drawing the axis
hist(dData$TotalSavingsThousands,col="lightblue", main="Histogram of Savings",
xlab="Savings per Year ($000s in 100,000 increments)",
ylab="Senarios in Increment", axes=F)

# Add the sub title
mtext(sSubTitle)

# Draw the Y axis using default parameters
axis(2)

# Now draw the X axis explicitly setting values of the ticks/breaks
axis(1, at=hHist$breaks)

# That's it, turn off output if saving PNG
bDoPNG && is.null(dev.off())
And this is the pretty graph it produced. It should look similar to the one on page 86.

So yeah, go buy the book. Read it. Then have fun with R :)

5 comments:

  1. Neat example, thanks for sharing.

    ReplyDelete
  2. Might want to check your colour scheme. I can't even read the text

    ReplyDelete
  3. Great help, two "errors" to note.
    Firstly and probably encoding on the webpage not real error:
    breaks&ltiAnnualLeaseThousands,
    should be:
    breaks<iAnnualLeaseThousands,

    The other a d should replace an h.
    axis(1, at=dHist$breaks)
    should be:
    axis(1, at=hHist$breaks)

    Thanks for the help this provided.

    ReplyDelete
  4. Cool, thanks for the bugs :) Updated, but not tested.

    ReplyDelete