RSync Backup using Amazon Web Services

This is an advanced topic. I can only offer limited help if something goes tits up. Please do not attempt unless you are fairly sure you can read the associated links and understand what's going on.

I recently lost nearly all my photos, music and video files when my MBWE disk crashed. The videos and music could be re-ripped, but 5 years of photos were gone. So I started seriously looking at automated backups.

rsync is an obvious choice for copying files. It works across the network, compresses data to reduce traffic, and compares the source and destination areas so that only the differences are transferred. For backups, this means that only changes since the last run need to be transferred.

I've bought a 1TB MyBook Essential which is connected to the MBWE by USB. I use rsync to copy everything to the 1TB disk once per week. Great! Everything is replicated and can be restored in the case of a crash.

But what if there was a fire? PANIC!!

Music and videos are replaceable, but I'd like my photos to be stored somewhere absolutely safe.

Amazon Web Services (AWS) offers online storage at a very reasonable rate: as little as 10¢ per GB per month. You can also pay to use a virtual computer at an hourly rate. (This isn't advertising; I just want to make it clear that there is a charge associated with this process.)

Storage Pricing
Processing Pricing

The process detailed below basically involves keeping a virtual disk on AWS to store my photos. Once a month, I fire up a virtual computer, attach the virtual disk to it, rsync the photos across, then detach the disk and shut the computer down again.

You only pay the monthly storage costs, plus the "processing time" while the transfer takes place.

I started by using the link here. This describes the basic process but the Amazon Tools used there rely on Java which won't really work on the MBWE. Luckily a Perl version exists.

Here are the steps required.

Download/Install CURL

The Perl AWS program uses CURL for its https communication. If you have optware you may prefer to install from there, but I used the source here. I won't go into the details as it's a standard
dvlp> ./configure
dvlp> make
dvlp> make install

process for anyone who compiles their own.

Note: curl has a dependency on OpenSSL for its https encryption. I think OpenSSL is included on the MBWE by default. If this is not the case then apologies; have a go at solving the dependency yourself and let me know, and I'll update this page.

Next, AWS uses secure http, so CURL needs a Certificate Authority cert bundle (as described here). I downloaded the following: and saved it to /etc/ssl/certs. (You must be root to save to this location.) You can use the wget command to get this file from the MBWE prompt.
certs> wget

Sign Up for Amazon Web Services

I won't repeat the steps that you're about to follow, but start here: and select "Create an AWS Account" in the top right corner.
It's quite self-explanatory (Amazon having thought about this and paid swathes of money to web designers trained to explain it better than I could).

One of the actions that occurs when you create an account is the generation of a set of authentication certificates and credentials. These identify your account to Amazon. The AWS Perl tool uses these in its communication.
Go to the "Your Account" section on the AWS website and view "Your Account Identifiers". You will need "Your Access Key" and "Your Secret Access Key". Don't let anyone else see these. Save them into a file called .awssecret in your home directory on the MBWE. For example

gordon> more .awssecret

For security, you should really make this file readable only by you.

gordon> chmod 600 .awssecret

(Those aren't my real keys, so don't try.) This is all described here if you want more information, but we're jumping ahead.
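If you prefer to create the file from the command line, the format is simply the two keys, one per line, access key first. (The values below are Amazon's published example keys, not real credentials.)

```shell
# Write the access key ID on line 1 and the secret key on line 2,
# then lock the file down so only you can read it.
cat > ~/.awssecret <<'EOF'
AKIAIOSFODNN7EXAMPLE
wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
EOF
chmod 600 ~/.awssecret
```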

Download and install AWS

The Perl AWS tool is not an official Amazon download but is due to some excellent work by Tim Kay. Download the AWS Perl script and save it in /usr/local/bin. As root:
gordon> cd /usr/local/bin
bin> wget

AWS will use the .awssecret file we created earlier to let Amazon know "who you are". You will also require a .awsrc file to configure AWS. Here's mine

gordon> more .awsrc

The region setting just tells AWS to use the eu-west-1 region as the default. If you are in the US (specifically, if you create your volume in the steps below in the US region) then this setting is NOT required.
The sanity-check setting just tells AWS to be a bit strict. I found that without it things would sometimes work but not be as secure as they could be, so I prefer to leave it in.
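So a .awsrc matching the description above would look something like this (the option spellings are my assumption of the tool's flags; check the aws documentation if it rejects them):

```shell
# Default command-line options for the aws tool, one per line
cat > ~/.awsrc <<'EOF'
--region=eu-west-1
--sanity-check
EOF
```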

Create an AWS Volume

This is your "virtual disk". When you detach from any processing, this is the storage which remains and holds your data. I haven't tested the following, but I suspect it is easy and more straightforward than anything else.

Use the "AWS Management Console" on the AWS website. Visit "Your Account" and select "AWS Management Console". You may have to log in again. Select the "Amazon EC2" tab and you will see a summary of your current setup.

There is an option here to view "Volumes" (left hand panel). From here it should be possible to create a new volume. Make sure it is big enough to hold your backup data. Remember that creation of this volume will start Amazon charging you on a per GB, per month basis!
Important: In the top-left of the Management Console is a Region drop-down. If you are in Europe then you are probably better off selecting the "EU-West" option. Those in the US should keep the default "US-East" option. For everyone else, it's your choice.
Take a note of your Volume ID, you'll need it later.
(As an aside, I originally created my volume using the official Amazon EC2 tools. The volume is persistent and totally compatible with viewing here in the console, and with attaching to virtual machines using the Perl tool.)

You must also create an authorisation "keypair" at this time. In the Management Console, select "Key Pairs" from the left-hand menu (or the "My Resources" box on the right). Create a keypair named gsg-keypair. (You can call it something else, but then you will have to alter the script used later on.) You will be prompted to save the private part of the key to your local machine. Create a sub-directory under your home directory named ".ec2" and save the file here, named id_rsa-gsg-keypair. (Again, calling it something different is possible but requires more changes later on.)
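The private key file should be protected in the same way as .awssecret; ssh will refuse to use a key that other users can read. For example (filenames follow the defaults used above):

```shell
# Create the key directory and lock down its permissions
mkdir -p ~/.ec2 && chmod 700 ~/.ec2

# Create an empty key file, then paste the downloaded key
# material into it before use
: > ~/.ec2/id_rsa-gsg-keypair
chmod 600 ~/.ec2/id_rsa-gsg-keypair
```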

Whilst we're in the Management Console, we can look at the types of virtual machines we can create. These are created from Amazon Machine Images (AMIs). I like Ubuntu, and the script later on is designed to be used with it, specifically the alestic 32-bit 8.10 intrepid image with AMI ID ami-2810385c. If you're in the US region you might have a different AMI ID (possibly ami-1a5db973). Again, take a note of this.

Create a Script to Do All The Tasks Required To RSync Your Files To AWS

Here's one I made earlier:

Save this somewhere on your executable PATH and ensure it is executable. You may need to make the following changes:

  1. Replace <YOURVOLID> with the volume ID you recorded above.
  2. Replace <YOURAMIID> with the AMI ID you recorded above.
  3. Remove any "-z eu-west-1a" references IF you are using the US-East-1 Region.
  4. Replace the reference to /shares/internal/photos with whatever directory you want to backup. You can duplicate this line and call rsync multiple times if you have multiple directories. The destination location is always under /mnt/store but again, subdirectories under here should be changed to reflect your structure.
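In outline, the script performs the following sequence. This is only a sketch: the aws subcommand spellings, the /dev/sdh device name, and the instance-ID/hostname plumbing are assumptions to be checked against Tim Kay's documentation, and the angle-bracket placeholders must be filled in with your own values.

```shell
#!/bin/sh
# Monthly backup outline. <YOURAMIID> and <YOURVOLID> are the IDs
# noted earlier; <INSTANCEID> and <HOSTNAME> come from the output
# of the launch/describe steps.

# 1. Launch a temporary instance from the chosen AMI
aws run-instances <YOURAMIID> -k gsg-keypair -z eu-west-1a

# 2. Wait until it is running; note its instance ID and public
#    hostname from the describe output
aws describe-instances

# 3. Attach the storage volume to the instance as a device
aws attach-volume <YOURVOLID> -i <INSTANCEID> -d /dev/sdh

# 4. Mount the volume on the instance, then rsync the photos up
ssh -i ~/.ec2/id_rsa-gsg-keypair root@<HOSTNAME> \
    "mkdir -p /mnt/store && mount /dev/sdh /mnt/store"
rsync -az -e "ssh -i $HOME/.ec2/id_rsa-gsg-keypair" \
    /shares/internal/photos/ root@<HOSTNAME>:/mnt/store/photos/

# 5. Unmount, detach and shut down -- you are charged while the
#    instance runs, so this step must not be skipped
ssh -i ~/.ec2/id_rsa-gsg-keypair root@<HOSTNAME> "umount /mnt/store"
aws detach-volume <YOURVOLID>
aws terminate-instances <INSTANCEID>
```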


You can run the script straight away and it will automate the steps of starting an instance, mounting your volume, copying your files to the remote location, unmounting the volume, then shutting down the instance. You can then add the script to a cron job to be called automatically, say once per month.
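For example, a crontab entry to run it at 3am on the first of each month (the script name and path here are placeholders for wherever you saved yours):

```
# m  h  dom mon dow  command
0    3  1   *   *    /usr/local/bin/aws-backup
```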

Improvements/To Do

  1. Automatically work out if your destination volume is big enough and expand it if necessary
  2. Failing that, email you if any step failed. What you do NOT want to happen is for an instance to be started and then fail to shut down, as you'll be charged something like 10 cents per hour while it's running. (This has never happened at any point during my testing.)
  3. Pull out the variables, such as the directories to be backed up, the -z option, the volume ID, the AMI ID and such, into a .rc file to make it easier to use/re-use.
  4. Add a step to create an S3 snapshot as described in the original link.
  5. Respond to any feedback about this page. I realise it's long and wordy. Any suggestions for bug-fixes are gratefully received.