ETL Notebooks
The Amorphic platform provides a way to host Jupyter/IPython notebooks.
Users can create a new notebook instance and use its IPython notebook to perform model training.
The following picture depicts the ETL Notebook page in Amorphic:
Amorphic ETL Notebooks contain the following information:
ETL Notebooks Information
Type | Description |
---|---|
Notebook Name | ETL notebook name, which uniquely identifies the notebook |
Description | A brief explanation of the notebook |
Notebook Status | Status of ETL notebook. Ex: Creating, InService, Stopping, Stopped etc. |
Instance Type | ML compute notebook instance type |
Volume Size | The size, in GB, of the ML storage volume attached to the notebook instance. |
Notebook URL | URL used to connect to the Jupyter server on the notebook instance |
Auto Terminate | Status of the auto-termination. Ex: Enabled, Disabled |
Remaining Time | Amount of time (in hours) left before auto-termination |
Auto Termination Time | Time at which the system auto terminates the ETL notebook. |
Glue Endpoint Id | Glue Endpoint ID to which the Notebook is attached |
CreatedBy | User who created the ETL notebook. |
LastModifiedBy | User who most recently updated the ETL notebook. |
ETL Notebook Operations
Amorphic provides the following operations for an ETL notebook:
- Create ETL Notebook: Create an ETL Notebook in AWS SageMaker.
- View ETL Notebook: View an existing ETL Notebook.
- Delete ETL Notebook: Delete an existing ETL Notebook.
Create ETL Notebook
The ETL Notebooks component in Amorphic lets users create an ETL notebook.
To create one, click the '+' icon at the top right of the page in the ETL Notebooks section, which is reached from the ETL dropdown list.
To create an ETL notebook, the user must fill in all the mandatory fields.
ETL Notebook Name: A unique name for the notebook
Instance Type: The type of ML compute instance on which to launch the notebook instance
Volume Size: Notebook storage volume size in GB
Endpoint Name: The endpoint to be mapped to the notebook
Description: A brief description of the notebook
Auto Terminate: Whether to enable or disable auto-termination for the ETL notebook. When enabled, the ETL notebook is terminated based on the termination time provided by the user, which saves resource costs. The auto-termination process is triggered every hour, looks for any ETL notebooks that need a notification or need to be stopped, and sends an email when one of the criteria below is met.
The user receives a notification email when:
- the difference between the auto-termination process run (every whole hour) and the termination time is less than 30 minutes.
- the auto-termination process successfully stopped the ETL notebook after the termination time.
- the auto-termination process could not stop the ETL notebook because of a fatal error.
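The first notification criterion above can be sketched as a small check. This is only an illustration under stated assumptions, not Amorphic's implementation: the function name `should_send_warning` is hypothetical, and the sketch assumes the "difference" is measured from the hourly run forward to a termination time that has not yet passed.

```python
from datetime import datetime, timedelta

# 30-minute window taken from the notification criterion in the text.
WARNING_WINDOW = timedelta(minutes=30)


def should_send_warning(run_time: datetime, termination_time: datetime) -> bool:
    """Return True if an hourly run would send the 'about to terminate' email.

    Assumption: the warning applies only while the termination time is still
    ahead of the run and less than 30 minutes away.
    """
    time_left = termination_time - run_time
    return timedelta(0) < time_left < WARNING_WINDOW


if __name__ == "__main__":
    run = datetime(2021, 6, 2, 19, 0)
    print(should_send_warning(run, datetime(2021, 6, 2, 19, 20)))
    print(should_send_warning(run, datetime(2021, 6, 2, 19, 41)))
```

Under this reading, a notebook whose termination time falls more than 30 minutes after a run receives no warning from that run.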
Auto Termination Time: The time at which the user wants the ETL notebook to be auto-terminated. The auto-termination time that a user sets must be less than 168 hours (7 days). Once the current time is greater than the termination time, the auto-termination process stops the ETL notebook at the next whole hour. The user can also modify the termination time by selecting "Edit ETL notebook" on the details page; the same limit of less than 168 hours (7 days) applies.
Note
- The auto-termination process is scheduled to run every hour, on the hour (e.g., 6:00, 7:00, 8:00, 9:00).
- The user must make sure that the notebook termination time is less than the termination time of the dependent endpoint. If this criterion is not met, an email notification is sent asking the user to update the downstream dependent resource so that the resources can be auto-terminated.
- The user receives an email notification only if subscribed to alerts. Please refer to Alert Preferences to enable alerts.
- When the termination time elapses, the auto-termination process stops the ETL notebook. The user must delete the notebook manually if that is intended.
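The two constraints on the termination time (less than 168 hours, and earlier than the dependent endpoint's termination time) can be checked before submitting a value. The sketch below is hypothetical: `termination_time_problems` is not an Amorphic API, and it assumes the 168-hour limit is measured from the current time.

```python
from datetime import datetime, timedelta
from typing import List, Optional

# Limit stated in the text: less than 168 hours (7 days).
MAX_TERMINATION_WINDOW = timedelta(hours=168)


def termination_time_problems(
    now: datetime,
    notebook_termination: datetime,
    endpoint_termination: Optional[datetime] = None,
) -> List[str]:
    """Return a list of rule violations; an empty list means the time is acceptable."""
    problems = []
    # Assumption: the 168-hour limit is counted from `now`.
    if notebook_termination - now >= MAX_TERMINATION_WINDOW:
        problems.append("termination time must be less than 168 hours (7 days)")
    # The notebook must terminate before its dependent endpoint does.
    if endpoint_termination is not None and notebook_termination >= endpoint_termination:
        problems.append("notebook termination time must be less than the endpoint termination time")
    return problems


if __name__ == "__main__":
    now = datetime(2021, 6, 1, 12, 0)
    print(termination_time_problems(now, now + timedelta(hours=24), now + timedelta(hours=48)))
```

Returning a list of messages rather than raising on the first failure lets a caller show every problem at once.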
Network Configuration: The ETL notebook has the same network configuration as its ETL endpoint. It can be Public, App-Public or App-Private.
View ETL Notebook
If the user has sufficient permissions to view an ETL notebook, all the notebook information can be viewed by clicking the ETL notebook name in "ETL Notebooks" under the ETL section.
All the information specified while creating the ETL notebook is displayed on the details page. In addition, the Notebook URL and Message fields are displayed based on the following scenarios:
- If the notebook status is failed, the user can view the failure information in the Message field.
- If the user does not have access to all the datasets required for the notebook, the user cannot view the Notebook URL, and the missing dataset access information is displayed in the Message field.
The following details are displayed when the user enables auto-termination on the ETL notebook. Remaining Time denotes the amount of time left before auto-termination, rounded up to the next whole hour.
In the image below, the auto-termination time is set to 02 Jun, 2021 7:41 PM, but the ETL notebook will be stopped at 02 Jun, 2021 8:00 PM because the termination process is scheduled to run on the whole hour.
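The rounding described here can be illustrated with a short sketch. The function names are hypothetical and this is not Amorphic's code; it assumes that a termination time falling exactly on the hour is handled by that same hourly run, and that Remaining Time is measured up to the configured termination time.

```python
import math
from datetime import datetime, timedelta


def effective_stop_time(termination_time: datetime) -> datetime:
    """Round the termination time up to the hourly run that would stop the notebook."""
    floored = termination_time.replace(minute=0, second=0, microsecond=0)
    if floored == termination_time:
        # Assumption: an exact whole-hour termination time is handled by that run.
        return termination_time
    return floored + timedelta(hours=1)


def remaining_hours(now: datetime, termination_time: datetime) -> int:
    """Time left before auto-termination, rounded up to whole hours."""
    seconds_left = (termination_time - now).total_seconds()
    if seconds_left <= 0:
        return 0
    return math.ceil(seconds_left / 3600)


if __name__ == "__main__":
    termination = datetime(2021, 6, 2, 19, 41)  # 02 Jun, 2021 7:41 PM
    print(effective_stop_time(termination))     # 2021-06-02 20:00:00
    print(remaining_hours(datetime(2021, 6, 2, 17, 0), termination))  # 3
```

With the values from the example, a termination time of 7:41 PM maps to the 8:00 PM run, and 2 hours 41 minutes of remaining time is shown as 3 hours.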
Starting from version 1.9, the auto-termination process only stops the notebook instance and does not delete it.
On the details page, the Estimated Cost of the ETL notebook is also displayed, showing the approximate cost incurred since the creation or last-modified time.
Delete ETL Notebook
If the user has sufficient permissions to delete an ETL notebook, the notebook can be deleted using the Delete (trash) button on the right side.
The ETL notebook must be in the Stopped state before it can be deleted.
Deleting an ETL notebook is an asynchronous operation: when the delete is triggered, the status changes to Deleting and stays that way until the notebook is deleted from AWS SageMaker. Once the notebook is deleted in AWS SageMaker, the metadata related to the notebook is also deleted.
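For readers curious how the "Stopped first, then asynchronous delete" flow looks at the SageMaker level, here is a minimal sketch. It is not Amorphic's implementation, and the function name `delete_stopped_notebook` is hypothetical. It assumes `client` behaves like a boto3 SageMaker client (for example `boto3.client("sagemaker")`), using its `describe_notebook_instance`, `delete_notebook_instance`, and `notebook_instance_deleted` waiter.

```python
def delete_stopped_notebook(client, notebook_name: str) -> None:
    """Delete a notebook instance, but only if it is already stopped.

    `client` is assumed to expose the boto3 SageMaker client interface.
    """
    response = client.describe_notebook_instance(NotebookInstanceName=notebook_name)
    status = response["NotebookInstanceStatus"]
    if status != "Stopped":
        # Mirrors the rule in the text: only a Stopped notebook can be deleted.
        raise RuntimeError(
            f"Notebook {notebook_name!r} is {status}; it must be Stopped before deletion"
        )

    # The delete call returns immediately; the instance then reports a
    # Deleting status until SageMaker has finished removing it.
    client.delete_notebook_instance(NotebookInstanceName=notebook_name)

    # Block until the instance is gone. Any metadata cleanup would follow here.
    waiter = client.get_waiter("notebook_instance_deleted")
    waiter.wait(NotebookInstanceName=notebook_name)
```

Passing the client in as a parameter keeps the function easy to exercise with a stub, without AWS credentials.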
Please follow the below animation to delete the notebook: