Endpoints
An Endpoint is an environment in which you can develop, test, and run AWS Glue scripts. It is the platform on which Jupyter notebooks are installed and scripts are run, and it provides connectivity between your machine and AWS.
The following picture depicts the Glue Endpoint page in Amorphic:
Amorphic Endpoints contain the following information:
Endpoints Information
Type | Description |
---|---|
Glue Endpoint Name | Name of the Glue endpoint, which uniquely identifies the functionality of the endpoint |
Description | A brief explanation of the Glue endpoint |
Glue Endpoint Status | Status of the Glue endpoint, for example provisioning or ready |
Capacity | The number of AWS Glue Data Processing Units (DPUs) allocated to this Endpoint |
Glue Python Version | Python version supported for running your ETL scripts on development endpoints |
Auto Terminate | Status of auto-termination, for example Enabled or Disabled |
Network Configuration | Subnet (Public or Private) in which the endpoint is deployed and provisioned |
Remaining Time | Amount of time (in hours) left until auto-termination |
Auto Termination Time | Time at which the system automatically terminates the Glue endpoint |
Public Keys | A list of public keys to be used by the Endpoints for authentication |
Extra Jars S3 Path | The path to one or more Java .jar files in an S3 bucket that should be loaded in the Endpoint. |
Extra Python Libs S3 Path | The paths to one or more Python libraries in an Amazon S3 bucket that should be loaded in the Endpoint. |
CreatedBy | User who created the Glue endpoint |
LastModifiedBy | User who most recently updated the Glue endpoint |
Endpoint Operations
Amorphic provides the following operations for a Glue Endpoint:
- Create Endpoint: Create an Endpoint in AWS Glue.
- View Endpoint: View an existing Endpoint.
- Edit Endpoint: Edit an existing Endpoint.
- Delete Endpoint: Delete an existing Endpoint.
Create Endpoint
To create a Glue development endpoint in the platform, the following information is required:
Endpoint Name: Name of the endpoint which uniquely identifies the functionality of the endpoint.
Description: Brief description of the endpoint.
Capacity: Number of AWS Glue Data Processing Units (DPUs) to allocate to this development endpoint.
Glue Python Version: Python version for Glue. Select either 2 or 3.
Auto Terminate: Whether to enable or disable auto-termination on the endpoint. When enabled, the endpoint is terminated based on the termination time provided by the user, which saves resource costs. The auto-termination process is triggered every hour. It looks for endpoints that need to be notified or deleted and sends an email when one of the criteria below is met.
The user receives a notification email when:
- the termination time is less than 30 minutes after an auto-termination process run (runs are triggered every whole hour).
- the auto-termination process successfully deleted the endpoint after the termination time.
- the auto-termination process could not delete the endpoint because of a dependent ETL Notebook or any other error.
Auto Termination Time: The time at which the user wants the endpoint to be auto-terminated. The termination time must be less than 168 hours (7 days) away. Once the current time passes the termination time, the termination process deletes the endpoint at the next whole hour. The user can later modify the termination time by selecting "Edit Endpoint" on the details page, and the same limit of less than 168 hours (7 days) applies.
Note:
- The auto-termination process is scheduled to run every hour on the hour (6:00, 7:00, 8:00, 9:00, and so on).
- The user receives an email notification only if subscribed to alerts. Refer to Alert Preferences to enable alerts.
- When the termination time elapses, the auto-termination process deletes the endpoint along with all metadata related to it. This process cannot be undone.
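The timing rules above can be sketched in Python. This is only an illustration of the described behavior, not Amorphic's implementation: the function names are hypothetical, the 30-minute rule is read as "the termination time falls within 30 minutes after a run", and a run that lands exactly on the termination time is assumed to delete the endpoint.

```python
from datetime import datetime, timedelta

MAX_WINDOW = timedelta(hours=168)      # termination time must be less than 7 days away
NOTIFY_WINDOW = timedelta(minutes=30)  # warning window before the termination time


def is_valid_termination_time(now: datetime, termination_time: datetime) -> bool:
    """Return True if the termination time is in the future and less than 168 hours away."""
    return now < termination_time < now + MAX_WINDOW


def hourly_action(run_time: datetime, termination_time: datetime) -> str:
    """Decide what a single hourly run would do for one endpoint (hypothetical helper)."""
    if run_time >= termination_time:
        # Termination time has elapsed: the endpoint is deleted on this run.
        return "delete"
    if termination_time - run_time < NOTIFY_WINDOW:
        # Termination is less than 30 minutes away: send the warning email.
        return "notify"
    return "none"
```

For a termination time of 7:18 PM, the 6:00 PM run does nothing, the 7:00 PM run sends the warning email, and the 8:00 PM run deletes the endpoint.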
Network Configuration: There are three types of network configuration: Public, App-Public, and App-Private.
- Public and App-Public endpoints have direct access to the internet.
- App-Public endpoints are deployed in a public subnet of the Amorphic application, whereas Public endpoints are deployed in AWS default VPC subnets.
- App-Private endpoints do not have direct access to the internet. They are deployed in a private subnet of the Amorphic application VPC.
Extra Python Libs S3Path: Path or paths to one or more Python libraries in an S3 bucket that should be loaded in the Endpoint. Multiple paths can be specified, separated by commas.
Extra Jars S3Path: Path or paths to one or more Java .jar files in an S3 bucket that should be loaded in the Endpoint. Multiple paths can be specified, separated by commas. Only pure Java/Scala libraries can be used.
Datasets Write Access: Datasets to which the endpoint requires write access.
Datasets Read Access: Datasets to which the endpoint requires read access.
Keywords: Keywords required for the endpoint.
Public Keys: A list of public keys used by the Endpoints for authentication. This field is optional.
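As an illustration of the comma-separated format, the sketch below splits such a value into individual paths. The helper name and the bucket names are hypothetical, and the check for an `s3://` prefix is an assumption about the expected path format rather than a documented rule.

```python
from typing import List


def parse_s3_paths(value: str) -> List[str]:
    """Split a comma-separated list of S3 paths and trim surrounding whitespace."""
    paths = [part.strip() for part in value.split(",") if part.strip()]
    for path in paths:
        # Assumption: paths are given as s3:// URIs.
        if not path.startswith("s3://"):
            raise ValueError(f"not an S3 path: {path!r}")
    return paths


# Hypothetical bucket and object names, for illustration only.
libs = parse_s3_paths("s3://example-bucket/libs/a.zip, s3://example-bucket/libs/b.zip")
```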
You can generate the key using:
ssh-keygen -t rsa -C your_email@example.com
The key in the generated public key file has the following format:
ssh-rsa <key> <email>
Use <key> as the public key in the platform when creating an endpoint.
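A minimal sketch of pulling the <key> portion out of a line in that format is shown below. The helper name is hypothetical and the key string is a made-up placeholder, not a real key.

```python
def extract_public_key(pub_line: str) -> str:
    """Return the <key> field from a line of the form 'ssh-rsa <key> <email>'."""
    parts = pub_line.strip().split()
    if len(parts) < 2 or parts[0] != "ssh-rsa":
        raise ValueError("expected a line of the form 'ssh-rsa <key> <email>'")
    return parts[1]


# Placeholder key material, for illustration only.
line = "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQplaceholder your_email@example.com"
key = extract_public_key(line)
```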
View Endpoint
If the user has sufficient permissions to view an endpoint, all the endpoint information can be viewed by clicking the endpoint name under "Endpoints" in the ETL section.
All the information specified while creating the endpoint is displayed on the details page. In addition, a Message field is displayed in the following scenarios:
- If the endpoint status is failed, the user can view the failure information in the Message field.
- If the user does not have access to all the datasets required for the endpoint, the user cannot view the IPAddress of the endpoint, and the missing dataset access information is displayed in the Message field.
The following details are displayed on the endpoint details page:
The following details are displayed when the user enables auto-termination on the endpoint. Remaining Time denotes the amount of time (rounded up to the nearest hour) left until auto-termination.
In the image below, the auto-termination time is set to 02 Jun, 2021 07:18 PM, but the endpoint will be deleted at 02 Jun, 2021 08:00 PM because the termination process runs on the whole hour.
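The rounding in this example can be reproduced with a short Python sketch. The function names are hypothetical, Remaining Time is assumed to be measured up to the configured termination time, and a termination time that falls exactly on the hour is assumed to be handled by the run at that same hour.

```python
from datetime import datetime, timedelta

ONE_HOUR = timedelta(hours=1)


def effective_deletion_time(termination_time: datetime) -> datetime:
    """Round the termination time up to the whole hour at which the process runs."""
    floored = termination_time.replace(minute=0, second=0, microsecond=0)
    return floored if floored == termination_time else floored + ONE_HOUR


def remaining_hours(now: datetime, termination_time: datetime) -> int:
    """Hours left until the termination time, rounded up; 0 once it has passed."""
    # Floor division on the negated difference gives an exact ceiling.
    return max(0, -((now - termination_time) // ONE_HOUR))


termination = datetime(2021, 6, 2, 19, 18)          # 02 Jun, 2021 07:18 PM
deletion = effective_deletion_time(termination)     # 02 Jun, 2021 08:00 PM
```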
The details page also displays the Estimated Cost of the endpoint, which shows the approximate cost incurred since the creation time.
Edit Endpoint
Endpoint details can be edited using the Edit Endpoint button. Some changes are reflected on the details page immediately, while others are applied asynchronously in the backend. The lists below show which fields fall into each group:
Fields that get updated immediately:
- Description
- Auto Terminate and Auto Termination Time
- Keywords
- Datasets Write/Read Access
Fields that are updated asynchronously:
- Glue Python Version
- Extra Python Libs S3Path
- Extra Jars S3Path
- Public Keys
When any of the asynchronous fields is edited, the status changes to update_in_progress. Refreshing the page after a few minutes shows the status as ready.
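The split between the two groups of fields can be summarized in a small Python sketch. The field names are taken from the lists above as plain strings, the helper is hypothetical, and it assumes that an edit touching only immediate fields leaves the endpoint in the ready state.

```python
IMMEDIATE_FIELDS = {
    "Description",
    "Auto Terminate",
    "Auto Termination Time",
    "Keywords",
    "Datasets Write Access",
    "Datasets Read Access",
}
ASYNC_FIELDS = {
    "Glue Python Version",
    "Extra Python Libs S3Path",
    "Extra Jars S3Path",
    "Public Keys",
}


def status_after_edit(changed_fields):
    """Return the status expected right after an edit (hypothetical helper)."""
    changed = set(changed_fields)
    unknown = changed - IMMEDIATE_FIELDS - ASYNC_FIELDS
    if unknown:
        raise ValueError(f"not an editable field: {sorted(unknown)}")
    # Any asynchronous field triggers a backend update.
    return "update_in_progress" if changed & ASYNC_FIELDS else "ready"
```

For example, editing only Description and Keywords would leave the status as ready, while also changing Public Keys would move it to update_in_progress.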
The Edit Endpoint page is divided into two sections:
Basic Info: User can use this section to update all the basic details of an endpoint.
Datasets: User can use this section to update the datasets to which the endpoint requires access.
Delete Endpoint
If the user has sufficient permissions to delete an endpoint, it can be deleted using the Delete (trash) button on the right side.
The endpoint must be in the ready state before it can be deleted.
Update Extra Resource Access
To give an endpoint access to a large number of parameters, shared libraries, or datasets, refer to the documentation on How to provide large number of resources access to an ETL Entity in Amorphic.