Version: v2.5

ETL libraries

Shared libraries are an extension of external job libraries. They are mainly used to maintain a central repository of organization-approved libraries/packages to be used across multiple Jobs.

Shared ETL Libraries have the following capabilities:

  • They allow you to have multiple packages attached to a job, so you can easily switch between them to perform various actions based on the job requirements.
  • They provide the ability to customize job dependencies to a granular level.
  • They offer flexibility to choose among the different types of packages.
Note

Currently, depending on the type of ETL Job, Amorphic supports the "py", "egg" and "whl" extensions for Python shell applications and the "py", "zip" and "jar" extensions for PySpark applications.
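The extension rules in the note above can be sketched as a small lookup. This is only an illustration: the job-type keys ("pythonshell", "pyspark") and the function name are hypothetical and are not part of any Amorphic API.

```python
import os

# Extensions per job type, taken from the note above.
# The dictionary keys are hypothetical labels chosen for this sketch.
SUPPORTED_EXTENSIONS = {
    "pythonshell": {"py", "egg", "whl"},
    "pyspark": {"py", "zip", "jar"},
}


def is_supported_package(file_name, job_type):
    """Return True if the file's extension is allowed for the given job type."""
    extension = os.path.splitext(file_name)[1].lstrip(".").lower()
    return extension in SUPPORTED_EXTENSIONS.get(job_type, set())


print(is_supported_package("amorphicutils-0.3.1.zip", "pyspark"))      # True
print(is_supported_package("amorphicutils-0.3.1.zip", "pythonshell"))  # False
```

A check like this could run before upload so that an unsupported file is rejected early rather than failing at job run time.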

Take a look at the Shared ETL Library console in Amorphic:

Shared ETL Libraries Home Page

What is a Shared ETL Library?

A Shared ETL Library is a collection of packages/modules that provides standardized solutions to everyday programming problems. Unlike the modules that ship with the OS-provided Python installation, these packages are explicitly designed by a user, an organization, or the open-source community. This improves the portability of Python programs by abstracting platform-specific APIs into platform-neutral ones.

The shared ETL Library has the following properties:

  • A Library can have multiple packages attached to it.
  • A Library can be attached to multiple Jobs.

Types of Amorphic ETL Libraries:

  • External Libraries: Their scope is within the ETL job, and they get removed when you delete the ETL job.
  • Shared Libraries: They possess a universal scope, allowing multiple jobs to utilize the same shared library upon user authentication, and persist in the central repository even after the ETL job has been deleted.

Amorphic Shared ETL Libraries contain the following information:

| Type | Description |
| --- | --- |
| Library Name | Uniquely identifies the functionality of the library |
| Library Description | A brief explanation of the library, typically the contents/packages inside it |
| Packages | A file or a list of files that can be imported into an ETL Job to perform a specific set of operations. Example: matplotlib is a plotting library used by data scientists and data analysts for visualizations |
| Jobs | The list of ETL jobs to which the library is attached |
| CreatedBy | User who created the library |
| LastModifiedBy | User who most recently updated the library |
| LastModifiedTime | Timestamp of the most recent update to the library |
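To make the fields above concrete, here is a minimal sketch of a library record as a Python dataclass. The class name, attribute names, and sample values are assumptions made for illustration; they do not reflect Amorphic's actual storage format or API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class SharedLibraryRecord:
    """Hypothetical in-memory view of the fields listed in the table above."""
    library_name: str
    library_description: str
    created_by: str
    packages: List[str] = field(default_factory=list)  # a library can hold many packages
    jobs: List[str] = field(default_factory=list)      # and be attached to many jobs
    last_modified_by: Optional[str] = None
    last_modified_time: Optional[datetime] = None

    def touch(self, user):
        """Record who changed the library and when."""
        self.last_modified_by = user
        self.last_modified_time = datetime.now(timezone.utc)


# Sample values are invented for this sketch.
record = SharedLibraryRecord(
    library_name="plotting-utils",
    library_description="Packages used for visualizations",
    created_by="alice",
    packages=["matplotlib.whl"],
)
record.jobs.append("daily-report-job")
record.touch("bob")
print(record.library_name, record.packages, record.jobs, record.last_modified_by)
```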

Shared ETL library Operations

Amorphic Shared ETL library provides all the basic CRUD (Create, Read, Update and Delete) operations for a library.

Create Library

To create a new Library in Amorphic, go to the "Create New Library" section under "ETL Libraries". A library can have zero or more packages and jobs attached to it. After creating the Library, you can view, update, and delete it, provided you have permission to access the library.

Note

You cannot delete a shared library while it is attached to an existing Job. If you try to delete such a library, a pop-up lists the dependent ETL Jobs. Detach the library from each of those Jobs, then retry the deletion.
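The deletion rule in the note above can be sketched as a guard that refuses to delete a library while any job still references it. The dictionary layout, function name, and exception class are hypothetical; the real check happens inside Amorphic.

```python
class LibraryInUseError(Exception):
    """Raised when a shared library is still attached to one or more jobs."""

    def __init__(self, library_name, jobs):
        super().__init__(f"{library_name} is attached to: {', '.join(jobs)}")
        self.jobs = list(jobs)


def delete_library(attachments, library_name):
    """Delete a library only if no job is attached to it.

    attachments maps a library name to the list of job names using it.
    """
    jobs = attachments[library_name]
    if jobs:
        raise LibraryInUseError(library_name, jobs)
    del attachments[library_name]


attachments = {"common-utils": ["job-a", "job-b"]}

try:
    delete_library(attachments, "common-utils")
except LibraryInUseError as error:
    blocking_jobs = error.jobs
    print("Detach from these jobs first:", blocking_jobs)

# Detach the library from every dependent job, then retry.
attachments["common-utils"].clear()
delete_library(attachments, "common-utils")
print("common-utils" in attachments)  # False
```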

The GIF below shows how to create a new library.

Create ETL Library

View Library

You need sufficient permissions to view library information. To view a library, click its name under the "ETL Libraries" section inside the Job menu.

Take a look at how you can view the library information in detail:

View library

Attach Library

You can attach a shared library to a job from the job details page, or while creating or updating the job. Amorphic lists the available shared libraries alongside the other job parameters so that you can select the ones to attach. Once a library is attached, all of its packages are automatically passed to the job as arguments, with no further action required.
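This page does not say which arguments Amorphic uses to pass packages to a job. As a rough sketch only, assuming the job runs on AWS Glue, the packages could be grouped into Glue's `--extra-py-files` and `--extra-jars` special parameters, each of which takes a comma-separated list of paths. The function name and the S3 paths below are hypothetical.

```python
def build_library_arguments(package_paths):
    """Group package paths into Glue-style special parameters by extension.

    Assumption: the job accepts AWS Glue's --extra-py-files and --extra-jars
    parameters. This is not confirmed as Amorphic's actual mechanism.
    """
    python_suffixes = (".py", ".zip", ".egg", ".whl")
    py_files = [p for p in package_paths if p.lower().endswith(python_suffixes)]
    jars = [p for p in package_paths if p.lower().endswith(".jar")]

    arguments = {}
    if py_files:
        arguments["--extra-py-files"] = ",".join(py_files)
    if jars:
        arguments["--extra-jars"] = ",".join(jars)
    return arguments


# Hypothetical package locations for a library attached to a PySpark job.
arguments = build_library_arguments([
    "s3://example-bucket/libs/amorphicutils-0.3.1.zip",
    "s3://example-bucket/libs/helpers.py",
    "s3://example-bucket/libs/connector.jar",
])
for name, value in arguments.items():
    print(name, value)
```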

Follow the GIF below to attach a shared ETL library to an existing ETL Job.

Attach library

Importing and using a library

If a library contains a single version of your module, or several different files, you can import the module and use it directly.

Python
from amorphicutils.common import read_param_store
print(read_param_store("SYSTEM.S3BUCKET.DLZ", secure=False)['data'])

If a library contains multiple versions of your module, explicitly insert the versioned file at the front of the system path before importing the module. This ensures that the specific version you want is picked up rather than an arbitrary one.

Python
import sys
# explicitly specify the version you want to use
sys.path.insert(0, "amorphicutils-0.3.1.zip")
from amorphicutils.common import read_param_store
print(read_param_store("SYSTEM.S3BUCKET.DLZ", secure=False)['data'])
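The following self-contained sketch shows why inserting the versioned file at position 0 of `sys.path` selects that version. It builds two throwaway zip archives holding different versions of a hypothetical package named `demo_pkg` (not a real Amorphic module) and shows that the archive placed first on the path wins.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def build_versioned_zip(directory, version):
    """Create demo_pkg-<version>.zip containing demo_pkg/__init__.py."""
    zip_path = os.path.join(directory, f"demo_pkg-{version}.zip")
    with zipfile.ZipFile(zip_path, "w") as archive:
        archive.writestr("demo_pkg/__init__.py", f'__version__ = "{version}"\n')
    return zip_path


workdir = tempfile.mkdtemp()
old_zip = build_versioned_zip(workdir, "0.3.0")
new_zip = build_versioned_zip(workdir, "0.3.1")

# Both archives end up on the path, as they might when a library holds two versions.
sys.path.append(old_zip)
sys.path.append(new_zip)

# Put the wanted version first so the import system finds it before the other one.
sys.path.insert(0, new_zip)
importlib.invalidate_caches()

import demo_pkg

print(demo_pkg.__version__)  # 0.3.1
```

Python searches `sys.path` in order and stops at the first entry that provides the module, so the archive at index 0 takes precedence over any later entry.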