Let’s say you have a large job to process with R, but the hardware on your laptop just isn’t cutting it. One solution is to spin up an AWS EC2 server, install R, and run your process on a temporary server that you only keep around for a short time. Currently, AWS has a server for $2.80/hour with 32 cores and 224 GB of RAM, which is probably several times what your local machine has. However, getting your environment set up takes time, and time is literally money when you’re paying by the hour, so I created a script to quickly get you up and running. It saves you the trouble of going to the Revolution R Open site for their version of R and to the RStudio site for RStudio Server.
There are plenty of great AWS tutorials out there for spinning up an EC2 instance, so I will assume that you have that base level of knowledge and are ready to run a bash script from the command line on a fresh server. To avoid version and compatibility issues, I recommend using the same type of server that I used to build this: the Ubuntu Server 14.04 LTS (HVM), SSD Volume Type – ami-9eaa1cf6 AMI.
Here it is, hosted on GitHub...three lines of code to install RStudio Server leveraging Revolution R Open. It completes in less than 5 minutes.
sudo apt-get install git -y
git clone https://github.com/benporter/lazyman-openr-rstudio.git
The script pauses near the end at the "Enter new UNIX password" prompt so you can set a password for rstudiouser, the newly created user you will log in to RStudio with. Feel free to leave the other fields it asks for (full name, room number, and so on) blank.
The script prints your unique RStudio login URL at the end so you can get to work.
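If you lose track of that URL, it can be rebuilt by hand. The following is a minimal sketch, not the code from the lazyman script: the function names are made up for illustration, the host name in the last line is a placeholder, and the metadata lookup assumes you are on the EC2 instance itself and that it allows unauthenticated metadata requests.

```shell
# Sketch: rebuild the RStudio Server login URL for an EC2 instance.
# Assumption: these function names are illustrative, not from the lazyman script.

# Ask the EC2 instance metadata service for this server's public DNS name.
# Only works when run on the EC2 instance itself.
ec2_public_hostname() {
  curl -s --max-time 5 http://169.254.169.254/latest/meta-data/public-hostname
}

# Format a login URL from a host name and an optional port (default 8787).
rstudio_url() {
  local host="$1"
  local port="${2:-8787}"
  printf 'http://%s:%s\n' "$host" "$port"
}

# On the server you would run: rstudio_url "$(ec2_public_hostname)"
# The host name below is a placeholder.
rstudio_url "ec2-203-0-113-10.compute-1.amazonaws.com"
# prints: http://ec2-203-0-113-10.compute-1.amazonaws.com:8787
```

Keeping the port as an optional second argument means the same helper still works if you ever move RStudio Server off its default port.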
Description - Here is an overview of the script that you just blindly ran.
Install Revolution R Open
I prefer this over the vanilla version from CRAN for the potential of free code speedups. They've optimized it to take advantage of all of your cores in some cases without any changes to your code, so why not default to this?
As of the writing of this post, 8.0.1 Beta 3 was the most recent version available, and it is hard coded into the script.
Install RStudio Server
If you're not already familiar, RStudio Server is just like the desktop version of RStudio, except that it is accessible via a URL. I rarely experience any lag, and I can comfortably write code directly in the browser without noticing a difference from the desktop version.
As of the writing of this post, version 0.98.1091 was the most recent version available for RStudio Server, and it is hard coded into the script.
Since this is the lazy man's version, I chose the username for you: rstudiouser. My wife assures me that laziness is solely a male quality, so we can call this the efficient woman's version if you prefer. If rstudiouser isn't original enough for you, feel free to edit the script on the "sudo adduser rstudiouser" line. Or comment that line out altogether if you already have a login on the box that you want to use.
This script hard codes the Revolution Analytics CRAN mirror. If you want to choose a different one, edit line 15 with your preferred mirror:
echo "deb http://cran.revolutionanalytics.com/bin/linux/ubuntu trusty/" | sudo tee -a /etc/apt/sources.list
Note that the only part of this line you should change is the mirror URL: "http://cran.revolutionanalytics.com/bin/linux/ubuntu" (without the quotes). Here is a list of CRAN mirrors for Ubuntu, their speed and last update. When changing your mirror, be sure to preserve the "deb" prefix and the "<space>trusty/" suffix, assuming you're running Ubuntu 14.04. If you're not sure what version you're running, run this from the command line and look for your codename:
> lsb_release -a
Distributor ID: Ubuntu
Description: Ubuntu 14.04.1 LTS
Release: 14.04
Codename: trusty
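To make the mirror swap less error prone, the line can be assembled from its two moving parts. This is only a sketch under a few assumptions: cran_source_line is a hypothetical helper that is not part of the lazyman script, and the mirror shown is simply the one the script already uses.

```shell
# Sketch: build the apt source line for a chosen CRAN mirror.
# Assumption: cran_source_line is an illustrative helper, not part of the lazyman script.

# Print the "deb ..." line for a mirror base URL and an Ubuntu codename.
cran_source_line() {
  local mirror="${1%/}"   # drop one trailing slash, if present
  local codename="$2"
  printf 'deb %s/bin/linux/ubuntu %s/\n' "$mirror" "$codename"
}

# lsb_release -cs prints only the codename (trusty on Ubuntu 14.04), so on a
# real server you could use: codename="$(lsb_release -cs)"
cran_source_line "http://cran.revolutionanalytics.com" "trusty"
# prints: deb http://cran.revolutionanalytics.com/bin/linux/ubuntu trusty/
```

Piping the output to "sudo tee -a /etc/apt/sources.list" would then append it, the same way the original line does.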
Troubleshooting - AWS Security Group
If you can't see an RStudio login screen when you go to the URL, double check a couple of easy things first. First, make sure you have ":8787" at the end of the URL. Second, if you shut down and restarted the machine, your public DNS has probably changed. You can see the new one in the AWS Console.
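A quick connection test can help narrow down where the problem is. The sketch below is an assumption-laden example rather than anything from the script: port_open is a hypothetical helper, and it relies on bash's /dev/tcp feature and the coreutils timeout command being available.

```shell
# Sketch: test whether a TCP port accepts connections.
# Assumption: port_open is an illustrative helper; it needs bash (for /dev/tcp)
# and the coreutils "timeout" command.

port_open() {
  local host="$1"
  local port="${2:-8787}"
  # Try to open a TCP connection; give up after 5 seconds.
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Run on the server with "localhost" to see whether RStudio Server is listening.
# Run from your laptop with the instance's public DNS name to test the full path.
if port_open "localhost" 8787; then
  echo "port 8787 is reachable"
else
  echo "port 8787 is not reachable"
fi
```

If the check succeeds on the server but fails from your laptop, RStudio Server is running and the problem is most likely the URL, the public DNS, or the security group.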
After checking those two things, make sure that the security group on your EC2 server allows inbound traffic on port 8787, which is the port RStudio Server runs on. You can open this port from the AWS Console for EC2 by clicking through the following:
- Under "Network & Security" click "Security Groups"
- If you're not sure which Security Group your server is running on, then click over to "Instances" and click on your server. You'll see the security group below.
- Back in "Security Groups" select your security group.
- Click "Actions" and "Edit Inbound Rules"
- Click "Add Rule" and put "8787" without the quotes in the "Port" field.
- For additional security, choose "My IP" under "Source," which only leaves the port open to the machine you're currently on. Remember this if you start working from another machine or renew your IP address later.
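The same rule can also be added from the command line. This is a hedged sketch that assumes the AWS CLI is installed and configured with permission to edit security groups; the function names are made up for illustration, and the security group ID and IP address are placeholders.

```shell
# Sketch: open port 8787 to a single IP address with the AWS CLI.
# Assumptions: the AWS CLI is installed and configured; function names and the
# security group ID in the usage lines are placeholders.

# Turn one IPv4 address into a /32 CIDR block (a single-address range).
ip_to_cidr() {
  printf '%s/32\n' "$1"
}

# Add an inbound TCP rule for port 8787 limited to the given CIDR block.
open_rstudio_port() {
  local group_id="$1"
  local cidr="$2"
  aws ec2 authorize-security-group-ingress \
    --group-id "$group_id" \
    --protocol tcp \
    --port 8787 \
    --cidr "$cidr"
}

# Usage (not run here):
#   my_ip="$(curl -s https://checkip.amazonaws.com)"
#   open_rstudio_port "sg-0123456789abcdef0" "$(ip_to_cidr "$my_ip")"
ip_to_cidr "203.0.113.10"
# prints: 203.0.113.10/32
```

The /32 suffix plays the role of the "My IP" choice in the console: only that one address is allowed through, so the rule needs updating if your IP address changes.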