Version: v2.3

ETL libraries

Shared libraries are an extension of external job libraries. They are mainly used to maintain a central repository of organization-approved libraries/packages that can be used across multiple jobs.

Shared ETL Libraries have the following capabilities:

  • They let you attach multiple packages to a job, so you can switch between packages as the job requirements change.
  • They let you customize job dependencies at a granular level.
  • They give you the flexibility to choose the type of package.
Note

Currently, the supported package extensions depend on the type of ETL job: Amorphic supports "py", "egg" and "whl" for Python shell jobs, and "py", "zip" and "jar" for PySpark applications.
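
The extension rule in the note can be sketched as a small lookup. This is only an illustration: the job-type labels "pythonshell" and "pyspark" and the helper is_supported_package are hypothetical names chosen for this sketch, not part of the Amorphic API.

```python
from pathlib import PurePosixPath

# Extensions taken from the note above. The job-type keys are
# hypothetical labels used only in this sketch.
SUPPORTED_EXTENSIONS = {
    "pythonshell": {"py", "egg", "whl"},
    "pyspark": {"py", "zip", "jar"},
}


def is_supported_package(job_type: str, filename: str) -> bool:
    """Return True if the file extension is allowed for the given job type."""
    allowed = SUPPORTED_EXTENSIONS.get(job_type.lower())
    if allowed is None:
        raise ValueError(f"unknown job type: {job_type!r}")
    # Take the text after the last dot, case-insensitively.
    extension = PurePosixPath(filename).suffix.lstrip(".").lower()
    return extension in allowed


print(is_supported_package("pythonshell", "utils-1.0-py3-none-any.whl"))  # True
print(is_supported_package("pyspark", "utils-1.0-py3-none-any.whl"))      # False
```

A check like this could run before upload, so that a package of the wrong type is rejected early rather than when the job starts.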

Take a look at the Shared ETL Library console in Amorphic:

Shared ETL Libraries Home Page

What is a Shared ETL Library?

A Shared ETL Library is a collection of packages/modules that provides standardized solutions to everyday programming problems. Unlike the modules that ship with the Python installation, these packages are written by a user, an organization, or the open-source community. Packaging common functionality this way improves the portability of Python programs, because platform-specific APIs can be abstracted behind platform-neutral ones.

The shared ETL Library has the following properties:

  • A Library can have multiple packages attached to it.
  • A Library can be attached to multiple Jobs.

Types of Amorphic ETL Libraries:

  • External Libraries: These are scoped to a single ETL job and are deleted when you delete that job.
  • Shared Libraries: These have a global scope. Multiple jobs can use the same shared library upon user authentication, and the library remains in the central repository even after an ETL job that uses it is deleted.

Amorphic Shared ETL Libraries contain the following information:

| Type | Description |
| --- | --- |
| Library Name | Uniquely identifies the functionality of the library. |
| Library Description | A brief explanation of the library, typically of the contents/packages inside it. |
| Packages | A file or a list of files that can be imported into an ETL job to perform a specific set of operations. Example: matplotlib is a plotting library that data scientists and data analysts use for visualizations. |
| Jobs | The list of ETL jobs to which the library is attached. |
| CreatedBy | User who created the library. |
| LastModifiedBy | User who most recently updated the library. |
| LastModifiedTime | Timestamp of the most recent update to the library. |
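
As a rough illustration, the fields listed above could be mirrored in a small record type. This is a sketch under assumptions: the class name SharedLibrary, the snake_case attribute names, and the add_package helper are hypothetical and do not describe how Amorphic stores libraries internally.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class SharedLibrary:
    """Hypothetical in-memory mirror of the library fields listed above."""

    library_name: str
    library_description: str
    created_by: str
    packages: List[str] = field(default_factory=list)  # zero or more package files
    jobs: List[str] = field(default_factory=list)      # zero or more attached jobs
    last_modified_by: str = ""
    last_modified_time: Optional[datetime] = None

    def __post_init__(self) -> None:
        # A newly created library was last modified by its creator.
        if not self.last_modified_by:
            self.last_modified_by = self.created_by
        if self.last_modified_time is None:
            self.last_modified_time = datetime.now(timezone.utc)

    def add_package(self, package: str, user: str) -> None:
        """Add a package file and record who made the change and when."""
        if package not in self.packages:
            self.packages.append(package)
        self.last_modified_by = user
        self.last_modified_time = datetime.now(timezone.utc)


library = SharedLibrary("plotting", "Plotting helpers", created_by="alice")
library.add_package("matplotlib_helpers.whl", user="bob")
print(library.packages, library.last_modified_by)
```

The two list fields reflect the properties stated earlier: one library can hold several packages and can be attached to several jobs.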

Shared ETL Library Operations

Amorphic Shared ETL Libraries support all the basic CRUD (Create, Read, Update and Delete) operations.

Create Library

To create a new library in Amorphic, go to the “Create New Library” section under “ETL Libraries”. The application allows a library to have zero or more packages and zero or more jobs attached to it. After creating the library, you can view, update, and delete it, provided you have permission to access the library.

Note

You cannot delete a shared library while it is attached to an existing job. If you try, a pop-up lists the dependent ETL jobs. Detach the library from all of those jobs, then retry the deletion.
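
The deletion rule in the note can be sketched as a guard that refuses to delete a library while jobs still depend on it. Everything here is hypothetical and for illustration only: the dictionary layout (library name mapped to a set of job names), the LibraryInUseError exception, and the function names are assumptions, not Amorphic's implementation.

```python
from typing import Dict, List, Set


class LibraryInUseError(Exception):
    """Raised when a library is still attached to one or more jobs."""

    def __init__(self, library_name: str, dependent_jobs: List[str]) -> None:
        self.dependent_jobs = list(dependent_jobs)
        super().__init__(
            f"{library_name!r} is attached to jobs: {', '.join(self.dependent_jobs)}"
        )


def detach_library(libraries: Dict[str, Set[str]], name: str, job: str) -> None:
    """Remove one job from the set of jobs that use the library."""
    libraries[name].discard(job)


def delete_library(libraries: Dict[str, Set[str]], name: str) -> None:
    """Delete the library only if no job depends on it."""
    dependent_jobs = sorted(libraries[name])
    if dependent_jobs:
        # Mirrors the pop-up: report which jobs block the deletion.
        raise LibraryInUseError(name, dependent_jobs)
    del libraries[name]


libraries = {"plotting": {"daily_report", "weekly_report"}}
try:
    delete_library(libraries, "plotting")
except LibraryInUseError as error:
    for job in error.dependent_jobs:
        detach_library(libraries, "plotting", job)
    delete_library(libraries, "plotting")  # succeeds once nothing is attached
```

Returning the list of dependent jobs with the error gives the caller what it needs to detach the library and retry.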

The GIF below shows how to create a new library.

Create ETL Library

View Library

You must have sufficient permissions to view library information. To view a library, click its name under the “ETL Libraries” section inside the Job menu.

Take a look at how you can view the library information in detail:

View library

Attach Library

You can enable the attach library option from the job details page, and you can attach a shared library to a job while creating or updating the job. Amorphic lists the available shared libraries alongside the other job parameters so that you can choose which ones to attach. Once a library is attached, all of its packages are passed to the job as arguments automatically, with no further action required.

Follow the GIF below to attach a shared ETL library to an existing ETL job.

Attach library
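
To make "passed as arguments to the job" more concrete, the sketch below groups package paths into comma-separated argument values. It rests on assumptions: the argument names are borrowed from AWS Glue's --extra-py-files and --extra-jars job parameters, and whether Amorphic uses exactly these names is not stated in this page. The function build_job_arguments and the example paths are hypothetical.

```python
from typing import Dict, List

# Python-style package extensions from the note on supported types.
PYTHON_EXTENSIONS = (".py", ".zip", ".egg", ".whl")


def build_job_arguments(package_paths: List[str]) -> Dict[str, str]:
    """Group package paths into comma-separated argument values by file type."""
    python_files = [p for p in package_paths if p.lower().endswith(PYTHON_EXTENSIONS)]
    jars = [p for p in package_paths if p.lower().endswith(".jar")]

    arguments: Dict[str, str] = {}
    if python_files:
        # Assumed argument name, modeled on AWS Glue.
        arguments["--extra-py-files"] = ",".join(python_files)
    if jars:
        # Assumed argument name, modeled on AWS Glue.
        arguments["--extra-jars"] = ",".join(jars)
    return arguments


# Hypothetical package locations for a library attached to a PySpark job.
packages = [
    "s3://example-bucket/libs/helpers.py",
    "s3://example-bucket/libs/transforms.zip",
    "s3://example-bucket/libs/driver.jar",
]
for name, value in build_job_arguments(packages).items():
    print(name, value)
```

Because the arguments are derived from the library's package list, adding a package to the library changes what every attached job receives without editing the jobs themselves.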