ETL Libraries
Shared ETL Libraries are an extension of external job libraries. Their primary purpose is to maintain a central repository of organization-approved libraries and packages that can be used across multiple ETL jobs.
Amorphic Shared ETL Libraries provide the following capabilities:
- A user can attach multiple packages to a job and switch between them to perform various actions based on the job requirements.
- Job dependencies can be customized at a granular level.
- Package types can be chosen flexibly. Depending on the type of ETL job, Amorphic currently supports the "py", "egg" and "whl" extensions for Python shell jobs and "py", "zip" and "jar" for PySpark applications.
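The extension rules above can be sketched as a small lookup. This is only an illustration of the rule as stated in this document, not Amorphic's own validation code; the job-type keys `pythonshell` and `pyspark` and the function name are hypothetical.

```python
from pathlib import PurePath

# Extensions listed in this document for each ETL job type.
# The dictionary keys are hypothetical labels chosen for this sketch.
ALLOWED_EXTENSIONS = {
    "pythonshell": {"py", "egg", "whl"},
    "pyspark": {"py", "zip", "jar"},
}


def is_supported_package(file_name: str, job_type: str) -> bool:
    """Return True if the file extension is allowed for the given job type."""
    extension = PurePath(file_name).suffix.lstrip(".").lower()
    return extension in ALLOWED_EXTENSIONS.get(job_type, set())


print(is_supported_package("helpers-1.0-py3-none-any.whl", "pythonshell"))  # True
print(is_supported_package("helpers-1.0-py3-none-any.whl", "pyspark"))      # False
```

A check like this could be run before uploading a package, so that a wheel is not attached to a PySpark job by mistake.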
The following picture depicts the Shared ETL library Console in Amorphic:
What is a Shared ETL Library?
A Shared ETL Library is a collection of packages or modules that provide standardized solutions to problems that occur in everyday programming. Unlike the Python support packages provided by the operating system, these packages are explicitly designed by a user, an organization, or the open source community to improve the portability of Python programs by abstracting platform specifics into platform-neutral APIs.
A library has the following properties:
- A Library can have multiple packages attached to it.
- A Library can be attached to multiple ETL Jobs.
In Amorphic we have two types of ETL Libraries:
- External Libraries: The scope of the library is limited to a single ETL job, and the library is deleted when that ETL job is deleted.
- Shared Libraries: These libraries have a global scope. Multiple jobs can use the same shared library upon user authentication, and the library remains in the central repository even after an ETL job is deleted.
Amorphic Shared ETL Libraries contain the following information:
Field | Description |
---|---|
Library Name | Library Name, which uniquely identifies the functionality of the library. |
Library Description | A brief explanation of the library, typically of the contents or packages inside it. |
Packages | A package is a file or a list of files that can be imported into an ETL job to perform a specific set of operations. Example: matplotlib, a numerical plotting library used by data scientists and data analysts for visualizations. |
Jobs | The list of ETL jobs to which the library is attached. |
CreatedBy | User who created the library. |
LastModifiedBy | User who most recently updated the library. |
LastModifiedTime | Timestamp of the most recent update to the library. |
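As a rough illustration of the metadata in the table above, the record could be modeled like this. The class and attribute names are assumptions made for this sketch (snake_case versions of the table fields) and do not describe Amorphic's actual API or storage schema; the example values are invented.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class SharedLibrary:
    """Hypothetical in-memory model of the metadata fields in the table."""

    library_name: str
    library_description: str
    packages: list[str] = field(default_factory=list)  # zero or more package files
    jobs: list[str] = field(default_factory=list)      # ETL jobs the library is attached to
    created_by: str = ""
    last_modified_by: str = ""
    last_modified_time: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


# One library can hold several packages and be attached to several jobs.
library = SharedLibrary(
    library_name="plotting-tools",
    library_description="Approved visualization packages",
    packages=["matplotlib.whl", "chart_helpers.py"],
    jobs=["daily-report-job", "weekly-summary-job"],
    created_by="example-user",
    last_modified_by="example-user",
)
print(len(library.packages), len(library.jobs))  # 2 2
```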
Shared ETL library Operations
Amorphic Shared ETL library provides all the basic CRUD (Create, Read, Update and Delete) operations for a library.
- Create Library: Create a custom library from one or more packages of your choice.
- View Library: View the metadata of an existing shared ETL library.
- Update Library: Update an existing library.
- Delete Library: Delete an existing library.
- Attach Library: Attach an existing library to an ETL job.
- View Dependent ETL Jobs: View the ETL jobs that depend on the current library.
- Download Library Packages: Download a package from an ETL library.
Create Library
You can create a new library in Amorphic by using the “Create New Library” section under “ETL Libraries” in the Amorphic application.
To create a new library, you need to provide information such as the name and description of the library. The application allows a library to have zero or more packages and jobs attached to it. Please follow the animation to create a new library.
View Library
If the user has sufficient permissions to view a library, they can view all the existing library information by clicking the Library Name under the “ETL Libraries” section inside the Job menu.
Please follow the below animation to view the library information in detail.
Update Library
If the user has sufficient permissions to update a library, they can open it by clicking the Library Name under the “ETL Libraries” section inside the Job menu and then clicking the Edit Library icon in the Actions menu at the top right. This redirects to a separate page where any of the library metadata can be edited.
Please follow the below animation to update the library information in detail.
Delete Library
If the user has sufficient permissions to delete a library, they can do so by clicking the Delete Library icon in the Actions menu at the top right.
Please follow the below animation to delete the library.
Please note that a shared library cannot be deleted while it is attached to any existing ETL job. A pop-up notification is displayed with the list of ETL jobs that depend on the library. The user should remove the library from all of those ETL jobs and then retry the deletion.
Please follow the below animation to see the dependent resources of a shared library.
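The deletion rule described above (a shared library cannot be deleted while jobs still use it) can be sketched as follows. This is a minimal model under stated assumptions, not Amorphic's implementation: the function names, the `LibraryInUseError` exception, and the in-memory dictionaries are all hypothetical.

```python
class LibraryInUseError(Exception):
    """Raised when a library is still attached to one or more ETL jobs."""


def find_dependent_jobs(library_name, job_libraries):
    """Return the sorted names of jobs that have the library attached."""
    return sorted(
        job for job, libraries in job_libraries.items() if library_name in libraries
    )


def delete_library(library_name, libraries, job_libraries):
    """Delete the library only if no ETL job depends on it."""
    dependents = find_dependent_jobs(library_name, job_libraries)
    if dependents:
        # Mirrors the pop-up: report which jobs block the deletion.
        raise LibraryInUseError(
            f"{library_name} is used by: {', '.join(dependents)}"
        )
    libraries.discard(library_name)


libraries = {"plotting-tools", "date-utils"}
job_libraries = {"daily-report-job": {"plotting-tools"}, "cleanup-job": set()}

delete_library("date-utils", libraries, job_libraries)  # succeeds, no dependents
try:
    delete_library("plotting-tools", libraries, job_libraries)
except LibraryInUseError as error:
    print(error)  # plotting-tools is used by: daily-report-job
```

Once the library is detached from `daily-report-job` in this model, the same call succeeds, which matches the "remove the usage, then retry" flow.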
Attach Library
The Attach Library functionality is available to users from the job details page. A user can attach a shared library to a job in two ways: while creating the job or while updating it.
When creating or updating an ETL job, Amorphic provides a Shared libraries drop-down menu along with the other job parameters. The user is presented with the set of shared libraries they have access to and can multi-select the libraries to attach to the job. Once attached, all the packages in the shared library are passed as arguments to the ETL job automatically, without any user intervention.
Please follow the below animation to attach a shared ETL library to an existing ETL Job.
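To make the "packages are passed as arguments" step more concrete, here is a hedged sketch of how the packages of several selected libraries might be flattened into a single job argument. The document does not specify the actual mechanism, so everything here is an assumption: the function names, the comma-separated format, and the `--shared-packages` argument name are invented for illustration.

```python
def collect_package_paths(selected_libraries, library_packages):
    """Flatten the packages of the selected libraries, keeping first-seen order."""
    seen = set()
    ordered = []
    for library in selected_libraries:
        for package in library_packages.get(library, []):
            if package not in seen:  # skip packages shared by two libraries
                seen.add(package)
                ordered.append(package)
    return ordered


def build_job_arguments(selected_libraries, library_packages):
    """Build a hypothetical argument mapping for the ETL job."""
    packages = collect_package_paths(selected_libraries, library_packages)
    # "--shared-packages" is a made-up argument name used only in this sketch.
    return {"--shared-packages": ",".join(packages)} if packages else {}


library_packages = {
    "plotting": ["matplotlib.whl", "common.py"],
    "utils": ["common.py", "dates.py"],
}
print(build_job_arguments(["plotting", "utils"], library_packages))
# {'--shared-packages': 'matplotlib.whl,common.py,dates.py'}
```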
Dependent ETL Jobs
The user can view the dependent ETL jobs under the Resources tab on the Library Details page.
Download Library Packages
The user can download the packages within a library by clicking the Download Library Packages button in the upper right corner of the Library Details page. After clicking it, the user can choose which package to download from the library.
Please follow the below animation to download a package from an ETL library.