Version: v2.1

Notebooks

The Amorphic platform hosts Jupyter/IPython notebooks: interactive, web-based environments in which users create and share documents that contain live code, equations, visualizations, and narrative text.

Amorphic Notebooks contain the following information:

Type | Description
Notebook Name | Name of the ML notebook, which uniquely identifies it
Description | A brief explanation of the notebook
Notebook Status | Status of the ML notebook, e.g. Creating, InService, Stopping, Stopped, etc.
Instance Type | ML compute instance type of the notebook
Volume Size | Size, in GB, of the ML storage volume attached to the notebook instance
Notebook URL | URL for connecting to the Jupyter server on the notebook instance
Glue Sessions | Flag indicating whether Glue sessions are enabled for the notebook instance
Auto Terminate | Status of auto-termination, e.g. Enabled, Disabled
Remaining Time | Amount of time (in hours) left before auto-termination
Auto Termination Time | Time at which the system auto-terminates the ML notebook
Internet Access | Whether SageMaker provides internet access to the notebook instance
CreatedBy | User who created the ML notebook
LastModifiedBy | User who most recently updated the ML notebook

Notebook Operations

Amorphic ML Notebooks support the following operations:

Operation | Description
Create Notebook | Create an ML notebook in AWS SageMaker
View Notebook | View an existing ML notebook
Delete Notebook | Delete an existing ML notebook

How to Create a Notebook?

Figure: ML Notebook Homepage

To create an ML Notebook:

  1. Click on + New Notebook
  2. Fill in the details shown in the table:
Attribute | Description
Notebook Name | Give your notebook a unique name.
Description | Describe the notebook's purpose and any important details.
Instance Type | Choose the type of ML compute instance on which to launch the notebook.
Volume Size | Size of the ML notebook storage volume, in GB.
Datasets Write Access | Select the datasets the notebook needs write access to.
Datasets Read Access | Select the datasets the notebook needs read access to.
Enable Internet Access | Controls whether the notebook instance can access the internet. If set to "Disabled", the notebook instance can only access resources inside your VPC and cannot use Amazon SageMaker training and endpoint services unless you set up a NAT Gateway in your VPC.
Glue Sessions | Enable or disable Glue sessions for the notebook instance.
Auto Terminate | Saves on resource costs by letting you provide a termination time. The auto-termination process runs every hour, looks for ML notebooks that need to be notified or stopped, and sends an email when one of the criteria below is met.

You will receive a notification email when:

  • The hourly auto-termination run finds that the termination time is less than 30 minutes away.
  • The auto-termination process successfully stopped the ML notebook after the termination time.
  • The auto-termination process could not stop the ML notebook because of a fatal error.

Auto Termination Time: The termination time must be less than 168 hours (7 days) away. Once the current time is greater than the termination time, the notebook is stopped at the next whole hour. You can modify the termination time later, subject to the same limit of less than 168 hours (7 days).

Note
  • The auto-termination process is scheduled to run every hour on the hour (e.g. 6:00, 7:00, 8:00, 9:00).
  • You will receive an email notification only if you are subscribed to alerts. To enable alerts, refer to Alert Preferences.
  • When the termination time elapses, the auto-termination process stops the ML notebook. It does not delete it; delete the notebook manually if that is what you intend.
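The scheduling rules above can be sketched in a few lines of Python. This is only an illustration of the documented behaviour, not Amorphic's implementation: the function names are hypothetical, and the sketch assumes a strict reading of "greater than the termination time" and that a termination time in the past is rejected.

```python
from datetime import datetime, timedelta

MAX_WINDOW = timedelta(hours=168)      # documented limit: less than 7 days
NOTIFY_WINDOW = timedelta(minutes=30)  # documented warning threshold


def is_valid_termination_time(now: datetime, termination_time: datetime) -> bool:
    """True if the termination time is in the future (assumption) and
    less than 168 hours away (documented)."""
    delta = termination_time - now
    return timedelta(0) < delta < MAX_WINDOW


def hourly_run_action(run_time: datetime, termination_time: datetime) -> str:
    """What one hourly run would do for a notebook: 'stop', 'notify' or 'none'."""
    if run_time > termination_time:
        return "stop"    # termination time has elapsed
    if termination_time - run_time < NOTIFY_WINDOW:
        return "notify"  # less than 30 minutes left: warning email
    return "none"


if __name__ == "__main__":
    termination = datetime(2021, 6, 2, 19, 10)
    for hour in (18, 19, 20):
        run = datetime(2021, 6, 2, hour, 0)
        print(run.strftime("%H:%M"), hourly_run_action(run, termination))
```

With a termination time of 7:10 PM, the 6:00 PM run does nothing, the 7:00 PM run would send the warning email, and the 8:00 PM run would stop the notebook.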

You can create a new notebook instance and use your IPython notebook to perform model training. From the notebook, you can call the SageMaker Python SDK to create a training job.

Once a training job is created, you can use the S3 model location information to create a model in the Amorphic portal. To access datasets inside the IPython notebooks, check the dataset details for the S3 location information. For example, the exhibit above shows the dataset details with the respective dataset S3 location.

To create a SageMaker model from the notebook, use the ml-temp bucket. Amorphic Notebooks have write access to the ml-temp bucket (for example, s3://cdap-us-west-2-484084523624-develop-ml-temp). Note that this bucket's name is almost the same as the dataset S3 path, except for the ml-temp at the end. The ml-temp bucket can be used to create a training job and to upload a model tar file. The location of this model file can then be supplied as the "Artifact Location" when creating an Amorphic model (see the model creation section).

You can use the S3 location mentioned here to read the files related to the training dataset and to save the output SageMaker model tar file for Amorphic model object creation.
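As a rough sketch of the last step, the snippet below packages a trained model file into a tar archive and composes the S3 URI that would be entered as the Artifact Location. It uses only the Python standard library; the helper names, the model.pkl file and the churn-model prefix are hypothetical, and the archive name model.tar.gz follows the usual SageMaker convention rather than anything stated on this page. The upload itself is left out; inside a notebook it could be done with, for example, boto3's S3 `upload_file`.

```python
import tarfile
import tempfile
from pathlib import Path


def package_model(model_file: Path, output_dir: Path) -> Path:
    """Bundle a trained model file into output_dir/model.tar.gz."""
    archive = output_dir / "model.tar.gz"
    with tarfile.open(str(archive), "w:gz") as tar:
        # Store the file at the archive root, without its local directory path.
        tar.add(str(model_file), arcname=model_file.name)
    return archive


def artifact_location(ml_temp_bucket: str, prefix: str, filename: str) -> str:
    """Compose the S3 URI of the uploaded archive in the ml-temp bucket."""
    return f"s3://{ml_temp_bucket}/{prefix.strip('/')}/{filename}"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        model = Path(tmp) / "model.pkl"        # hypothetical trained model file
        model.write_bytes(b"placeholder weights")
        archive = package_model(model, Path(tmp))
        print(artifact_location(
            "cdap-us-west-2-484084523624-develop-ml-temp",  # example bucket from this page
            "churn-model",                                   # hypothetical key prefix
            archive.name,
        ))
```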

Notebook Details

Figure: ML Notebook Details

All the information specified while creating the notebook is displayed on the Details page, along with the Notebook URL and the Message field. The Message field displays different information depending on the notebook's status:

  • If the notebook status is Failed, the Message field displays the failure information.
  • If you do not have access to all the datasets the notebook requires, the Notebook URL is not displayed and the Message field lists the missing dataset access.

The following details are displayed when auto-termination is enabled on the ML notebook. Remaining Time denotes the amount of time left before auto-termination, rounded up to the next hour.

For example, if the auto-termination time is set to 02 Jun, 2021 7:10 PM, the ML notebook is stopped at 02 Jun, 2021 8:00 PM, because the termination process runs on the whole hour.
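The rounding in this example can be reproduced with a short Python sketch. The function names are hypothetical, and two points are assumptions rather than documented behaviour: a termination time that falls exactly on the hour is treated as stopping at the following hour, and Remaining Time is measured to the termination time, not to the actual stop.

```python
import math
from datetime import datetime, timedelta


def scheduled_stop_time(termination_time: datetime) -> datetime:
    """First whole hour strictly after the termination time."""
    floored = termination_time.replace(minute=0, second=0, microsecond=0)
    return floored + timedelta(hours=1)


def remaining_hours(now: datetime, termination_time: datetime) -> int:
    """Time left before auto-termination, rounded up to whole hours, never negative."""
    seconds_left = (termination_time - now).total_seconds()
    return max(0, math.ceil(seconds_left / 3600))


if __name__ == "__main__":
    termination = datetime(2021, 6, 2, 19, 10)         # 02 Jun, 2021 7:10 PM
    print(scheduled_stop_time(termination))             # 2021-06-02 20:00:00
    print(remaining_hours(datetime(2021, 6, 2, 17, 30), termination))  # 2
```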

Note

Starting from version 1.9, the auto-termination process only stops the notebook instance; it does not delete it.

The Details page also displays the Estimated Cost of the ML notebook, which is the approximate cost incurred since the notebook was created or last modified.

Edit Notebook

The Edit Notebook page is divided into two sections:

  • Basic Info: You can use this section to update all the basic details of the notebook.
  • Datasets: You can use this section to update the datasets that the notebook needs access to.

Delete Notebook

If you have sufficient permissions, you can delete an ML notebook. Deleting an ML notebook is an asynchronous operation. When triggered, the status changes to Deleting and the notebook is deleted from AWS SageMaker. Once the notebook is deleted in AWS SageMaker, the associated metadata is also removed.

Note

The ML notebook must be in Stopped state in order to delete it.
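A script that automates clean-up could check this precondition before requesting a delete. The sketch below is a hypothetical helper, not part of any Amorphic API; it uses only the status names listed on this page and treats any other status as an error.

```python
def delete_precheck(status: str) -> str:
    """Suggest the next step before deleting a notebook, based on its status.

    Returns 'delete' (safe to delete), 'stop_first' (stop the notebook first)
    or 'wait' (a transition is in progress; check again later).
    """
    if status == "Stopped":
        return "delete"
    if status == "InService":
        return "stop_first"
    if status in ("Creating", "Stopping"):
        return "wait"
    # Statuses not listed on this page are outside the scope of this sketch.
    raise ValueError(f"unhandled notebook status: {status!r}")


if __name__ == "__main__":
    for status in ("InService", "Stopping", "Stopped"):
        print(status, "->", delete_precheck(status))
```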

Update Extra Resource Access

To give a notebook access to a large number of parameters, shared libraries, or datasets, refer to the documentation on How to provide large number of resources access to an ETL Entity in Amorphic.

Glue Session Operations

Amorphic ML Notebooks with Glue sessions enabled support the following operations:

Operation | Description
Create Glue Session | Create a Glue session for an ML notebook
Stop Glue Session | Stop an existing Glue session of an ML notebook
Delete Glue Session | Delete an existing Glue session of an ML notebook

Create Glue Session

Figure: Create Glue Session

The user must first create a notebook instance with Glue Sessions enabled, as described in the steps for creating a notebook.

Once the notebook instance is active (if the notebook status is Stopped, start the notebook first), the Notebook URL is available on the details page. Opening the Notebook URL redirects the user to the Jupyter server. There, create a new Jupyter notebook with the conda_glue_pyspark kernel. The following code creates a Glue session for the notebook:

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

# Reuse the active SparkContext, or create one, and wrap it in a GlueContext.
glueContext = GlueContext(SparkContext.getOrCreate())

Notebooks with Glue sessions enabled also support running SQL commands through magic commands:

%%sql
select * from table

Once the Glue session is created, all active and stopped sessions of the notebook can be viewed in the Sessions tab of the notebook details page.

Stop Glue Session

Figure: Stop Glue Session

Delete Glue Session

Figure: Delete Glue Session

Note

If a notebook with Glue sessions enabled is stopped, all the Glue sessions associated with the notebook are deleted.

Notebook Use Case

Consider a company that wants to use machine learning to predict customer churn.

The company can set up a new notebook instance on the Amorphic platform and use IPython notebooks to perform model training. They can call the SageMaker Python SDK to create a training job using the customer churn data stored in the S3 bucket.

Once the training job is complete, the company can use the S3 model location information to create a model in the Amorphic portal. They can access the customer churn dataset inside the IPython notebooks using the dataset details and S3 location information. The ml-temp bucket can be used to create the SageMaker model and upload the model tar file, which can then be used to create a model object in the Amorphic portal.

The company can use the dataset's S3 location to read the files related to the customer churn dataset and to save the output SageMaker model tar file for Amorphic model object creation. This allows the company to train a machine learning model that predicts customer churn and to use it in their business processes.