Chapter 5 Provisioning Instructions

To make the most of the material in these notes, you will need a Spark cluster with Microsoft R Server installed. The easiest way to get one is to provision a premium HDInsight Spark cluster on Azure. This module walks through provisioning a Spark cluster on Azure HDInsight Premium with Microsoft R Server, and adding an edge node with RStudio Server.

5.1 Provision Cluster from Azure Portal

The Azure documentation describes in detail how to provision a Spark cluster with Microsoft R Server. The first steps are outlined in: Get started using R Server on HDInsight (preview)

I have summarized the steps here to help you get started quickly:

  • Log in to portal.azure.com with your Azure subscription
  • Select New -> Data + Analytics -> HDInsight
  • Choose the Premium cluster type: R Server on Spark
  • Create an SSH key pair, using PuTTY or OpenSSH, and include the public key in the credentials tab
  • Install RStudio Server on the edge node
  • Tunnel into your RStudio Server instance, and start your ML pipeline!
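
The SSH key and tunnel steps above can be sketched from a terminal with OpenSSH. This is a minimal sketch, not the official procedure: the helper names make_key and tunnel_cmd, the cluster name, the user name, and the key path are all hypothetical. The edge node host pattern r-server.CLUSTERNAME-ssh.azurehdinsight.net and the use of port 8787 (the RStudio Server default) are assumptions that you should check against your own cluster's SSH details in the portal.

```shell
# Hypothetical helpers for the SSH key and tunnel steps; adjust all values.

RSTUDIO_PORT=8787   # RStudio Server's default port (assumed unchanged)

# Create an RSA key pair at the given path with the given passphrase.
# The public key (<path>.pub) is what you paste into the credentials tab.
make_key() {
  ssh-keygen -t rsa -b 4096 -q -N "$2" -f "$1"
}

# Print the ssh command that forwards the local RStudio port to the edge node.
# The host name pattern below is an assumption, not taken from these notes.
tunnel_cmd() {
  printf 'ssh -i %s -L %s:localhost:%s %s@r-server.%s-ssh.azurehdinsight.net\n' \
    "$1" "$RSTUDIO_PORT" "$RSTUDIO_PORT" "$2" "$3"
}

# Example usage (placeholders, not run here):
#   make_key "$HOME/.ssh/hdinsight_rsa" "a long passphrase"
#   eval "$(tunnel_cmd "$HOME/.ssh/hdinsight_rsa" sshuser mycluster)"
```

With the tunnel open, RStudio Server should be reachable in a local browser at http://localhost:8787, provided it was installed on the edge node with its default port.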

5.2 Installing Packages

For packages you only need on the edge node, you can continue using install.packages. For packages that must be installed on the edge node as well as all the worker nodes, you’ll need to use a script action, which is a script that HDInsight runs on the cluster nodes you select.
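
To make the script action idea concrete, here is a minimal sketch of a script that could be submitted as one. It is an illustration only, not the script Microsoft provides: the function name r_install_expr and the CRAN mirror URL are assumptions, and it presumes Rscript is on the PATH of every node it runs on. The package names are passed as the script's parameters.

```shell
# Hypothetical script action: install the R packages named in the arguments.

CRAN_REPO="https://cloud.r-project.org"   # assumed CRAN mirror

# Build an R expression such as: install.packages(c("a", "b"), repos = "...")
r_install_expr() {
  local quoted="" pkg
  for pkg in "$@"; do
    # Append each package name in double quotes, comma-separated.
    quoted="${quoted:+$quoted, }\"$pkg\""
  done
  printf 'install.packages(c(%s), repos = "%s")' "$quoted" "$CRAN_REPO"
}

# Run the install only when package names were given and Rscript is available.
if [ "$#" -gt 0 ] && command -v Rscript >/dev/null 2>&1; then
  Rscript -e "$(r_install_expr "$@")"
fi
```

Running the same expression on the edge node alone is equivalent to calling install.packages from an R session there; the script form matters only because it lets the cluster repeat the install on every selected node.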

5.2.1 todo - install packages demo