Monday, May 14, 2018

Azure Machine Learning Workbench: Python Notebooks

Today, we're going to continue our walkthrough of the "Classifying_Iris" template provided as part of the AML Workbench.  Previously, we've looked at Getting Started, Utilizing Different Environments, Built-In Data Sources and Built-In Data Preparation (Part 1, Part 2, Part 3).  In this post, we're going to begin looking at the Python Notebook available within the "Classifying Iris" project.

Notebooks are a very interesting coding technique that has risen to prominence recently.  Basically, they allow us to write code in a particular language, Python in this case, in an environment where we can see the code, results, and comments in a single view.  This is extremely useful in scenarios where we want to showcase our techniques and results to our colleagues.  Let's take a look at the Notebook within AML Workbench.
iris Notebook
On the left side of the screen, select the "Notebook" icon.  This displays a list of all the Notebooks saved in the project.  In our case, we only have one.  Let's open the "iris" Notebook.
Page 1
The first step to using the Notebook is to start the Notebook Server.  We do this by selecting the "Start Notebook Server" button at the top of the "iris" tab.
Notebook Started
The first thing we notice is the Jupyter icon at the top of the screen.  Jupyter is an open-source technology that creates the Notebook interface that we see here.  You can read more about Jupyter here.  Please note that Jupyter is not the only Notebook technology available, but it is one of the more common ones.  Feel free to look more into Notebooks if you are interested.

We also notice that the top-right corner now says "EDIT MODE" instead of "PREVIEW MODE".  This means that we now have the ability to interact with the Notebook.  However, we first need to instantiate a "kernel".  For clarity, the term "kernel" here refers to the process that actually executes our code, not to the other uses of the term.  You can read more about the other types of kernels here.  Basically, without a kernel, we don't have any way of actually running our code.  So, let's spin one up.
Kernel Selection
We can instantiate a new kernel by selecting "Kernel -> Change Kernel -> Classifying_Iris local".  This will spin up an instance on our local machine.  In more advanced use cases, it's possible to spin up remote containers using Linux VMs or HDInsight clusters.  These can be very useful if we want to run analyses using more power than we have available on our local machine.  You can read more about AML Workbench kernels here.
Notebook Ready
Once we select a new kernel, we see that the kernel name appears in the top-right of the tab, along with an open circle.  This means that the kernel is "idle".

The creators of this notebook were nice enough to provide some additional information within it.  This formatted text is known as "Markdown".  Basically, it's a very easy way to add cleanly formatted text to the notebook.  For example, typing "# Iris" in a Markdown cell renders as a large heading.  You can read more about it here.

Depending on your setup, you may need to run these two commands from the "Command Prompt".  We looked at how to use the Command Prompt in the first post in this series.  If you run into any issues, try running the commands in the first post and restarting the kernel.  Let's look at the first segment of code.
Segment 1
Code segments can be identified by the grey background behind them.  This code segment sets up some basic notebook options.  The "%matplotlib inline" command allows the plots created in subsequent code segments to be visualized and saved within the notebook.  The "%azureml history off" command tells AML Workbench not to store history for the subsequent code segments.  This is extremely helpful when we are importing packages and documenting settings, as we don't want to verbosely log these types of tasks.
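To make this concrete, here's a minimal sketch of what Segment 1 contains, based on the description above; the exact cell in the template may differ slightly.

```python
# Render plots inline so they are displayed and saved within the notebook.
%matplotlib inline

# Tell AML Workbench not to record run history for the setup cells
# that follow (we'll turn it back on before the data science work).
%azureml history off
```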

We also see one of the major advantages of utilizing notebooks.  The "%azureml history off" command creates an output in the console.  The notebook captures this and displays it just below the code segment.  We'll see this in a much more useful manner later in this post.  Let's check out the next code segment.
Segment 2
In Python, we have a few options for importing existing objects.  Basically, libraries contain modules, which contain functions.  We have the option of importing the entire library, an individual module within that library, or an individual function within that module.  In Python, we often refer to modules and functions using "dot notation".  We'll see this a little later.  We bring it up now because it can be cumbersome to refer to the "matplotlib.pyplot.figure" function using its full name.  So, we see that the above code aliases the "matplotlib.pyplot" module using the "as plt" code snippet.  Here's a brief synopsis of what each of these libraries/modules does, along with links.  A sketch of these imports appears after the list below.

pickle: Serializes and Deserializes Python Objects
sys: Contains System-Specific Python Parameters and Functions
os: Allows Interaction with the Operating System
numpy: Allows Array-based Processing of Python Objects
matplotlib: Allows 2-Dimensional Plotting
sklearn (aka scikit-learn): Contains Common Machine Learning Capabilities
azureml: Contains Azure Machine Learning-specific Functions
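
Putting the pieces together, here's a sketch of what this import segment might look like.  The "np" alias and the specific sklearn and azureml imports are illustrative assumptions, not necessarily what the template uses.

```python
import pickle                    # serializes and deserializes Python objects
import sys                       # system-specific parameters and functions
import os                        # interaction with the operating system
import numpy as np               # array-based processing ("np" is a common alias)
import matplotlib.pyplot as plt  # 2-dimensional plotting, aliased as "plt"
from sklearn import datasets     # common machine learning capabilities (illustrative)
from azureml.logging import get_azureml_logger  # AML Workbench logging
```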

Let's move on to the next segment.
Segment 3
The "get_azureml_logger()" function allows us to explicitly define what we output to the AML Workbench Logs.  This is crucial for production quality data science workflows.  You can read more about this functionality here.

Finally, this code prints the current version of Python we are utilizing.  Again, we see the advantage of using notebooks, as we get to see the code and the output in a single block.  Let's move on to the final code segment for this post.
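Here's a hedged sketch of what Segment 3 might look like; the import path for the logger follows the AML Workbench documentation, but the exact cell in the template may differ.

```python
import sys
from azureml.logging import get_azureml_logger

# Obtain the logger so we can explicitly control what gets written
# to the AML Workbench run history.
logger = get_azureml_logger()

# Print the version of Python the kernel is running.
print(sys.version)
```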
Segment 4
Since we are ready to begin the data science portion of the experiment, we turn the logging back on.  The rest of the code segments in this notebook deal with data science programming.  So, we'll save this for the next post.
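Turning the history back on is simply the counterpart of the earlier command; a minimal sketch, assuming the "on" form of the same magic.

```python
# Re-enable AML Workbench run history before the data science work begins.
%azureml history on
```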

Hopefully, this post opened your minds to the possibilities of using Python Notebooks within AML Workbench.  Notebooks are quickly becoming the industry-standard technique for sharing data science experiments, and will no doubt play a major role in the day-to-day tasks of most Data Scientists.  Stay tuned for the next post where we'll walk through the data science code within the "iris" notebook.  Thanks for reading.  We hope you found this informative.

Brad Llewellyn
Senior Analytics Associate - Data Science
Syntelli Solutions
@BreakingBI
www.linkedin.com/in/bradllewellyn
llewellyn.wb@gmail.com