ETL libraries
Shared libraries are an extension of external job libraries. They are mainly used to maintain a central repository of organization-approved libraries/packages to be used across multiple Jobs.
Shared ETL Libraries have the following capabilities:
- They allow you to have multiple packages attached to a job, so you can easily switch between them to perform various actions based on the job requirements.
- They provide the ability to customize job dependencies to a granular level.
- They offer the flexibility to choose among different types of packages.
Currently, depending on the type of ETL Job, Amorphic supports the "py", "egg" and "whl" extensions for Python shell applications and the "py", "zip" and "jar" extensions for PySpark applications.
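The extension rules above can be expressed as a small lookup. The following is a minimal sketch and not Amorphic's own validation logic; the job-type labels `pythonshell` and `pyspark` and the helper name `is_supported_package` are hypothetical names chosen for illustration.

```python
from pathlib import Path

# Allowed package extensions per job type, as listed in the text above.
# The dictionary keys are illustrative labels, not Amorphic identifiers.
SUPPORTED_EXTENSIONS = {
    "pythonshell": {"py", "egg", "whl"},
    "pyspark": {"py", "zip", "jar"},
}


def is_supported_package(job_type: str, file_name: str) -> bool:
    """Return True if file_name has an extension allowed for job_type."""
    allowed = SUPPORTED_EXTENSIONS.get(job_type.lower())
    if allowed is None:
        raise ValueError(f"Unknown job type: {job_type!r}")
    # Path.suffix returns the last extension including the dot, e.g. ".zip".
    extension = Path(file_name).suffix.lstrip(".").lower()
    return extension in allowed
```

A check like this can run before a package is uploaded, so that an unsupported file is rejected early instead of failing when the job starts.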
Take a look at Shared ETL Library Console in Amorphic:
What is a Shared ETL Library?
A Shared ETL Library is a collection of packages/modules that provides standardized solutions to everyday programming problems. Unlike the collection that ships with the OS-provided Python, these packages are explicitly designed by a user, an organization, or the open-source community. This improves the portability of Python programs by abstracting platform-specific APIs into platform-neutral ones.
The shared ETL Library has the following properties:
- A Library can have multiple packages attached to it.
- A Library can be attached to multiple Jobs.
Types of Amorphic ETL Libraries:
- External Libraries: Their scope is within the ETL job, and they get removed when you delete the ETL job.
- Shared Libraries: They possess a universal scope, allowing multiple jobs to utilize the same shared library upon user authentication, and persist in the central repository even after the ETL job has been deleted.
Amorphic Shared ETL Libraries contain the following information:
Type | Description |
---|---|
Library Name | Uniquely identifies the functionality of the library |
Library Description | A brief explanation of the library, typically the contents/packages inside it |
Packages | A file or a list of files that can be imported into an ETL Job to perform a specific set of operations. Example: matplotlib, a plotting library used by data scientists and data analysts for visualizations |
Jobs | The list of ETL jobs to which the library is attached |
CreatedBy | User who created the library. |
LastModifiedBy | User who most recently updated the library. |
LastModifiedTime | Timestamp of the most recent update to the library. |
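
The fields in the table above can be pictured as a simple record. This is an illustrative sketch only, not Amorphic's actual schema: the class name `SharedLibraryRecord`, the snake_case field names, the `add_package` helper, and the sample values are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class SharedLibraryRecord:
    """Illustrative record mirroring the metadata table; not Amorphic's schema."""

    library_name: str
    library_description: str
    created_by: str
    packages: List[str] = field(default_factory=list)
    jobs: List[str] = field(default_factory=list)
    last_modified_by: str = ""
    last_modified_time: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    def __post_init__(self) -> None:
        # A freshly created record was last modified by its creator.
        if not self.last_modified_by:
            self.last_modified_by = self.created_by

    def add_package(self, file_name: str, user: str) -> None:
        """Add a package and refresh the modification metadata."""
        self.packages.append(file_name)
        self.last_modified_by = user
        self.last_modified_time = datetime.now(timezone.utc)


# Hypothetical sample values, for illustration only.
record = SharedLibraryRecord(
    library_name="plotting-lib",
    library_description="Packages used for visualizations",
    created_by="alice",
)
record.add_package("plot_helpers.py", user="bob")
```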
Shared ETL library Operations
Amorphic Shared ETL library provides all the basic CRUD (Create, Read, Update and Delete) operations for a library.
- Create Library: Create a custom library by choosing the package(s) of the user's choice
- View Library: View the metadata of an existing shared ETL library
- Update Library: Update an existing library
- Delete Library: Delete an existing library
- Attach Library: Attach an existing library to an ETL Job
- View Dependent ETL Jobs: View the dependent ETL jobs on the current library
- Download Library Packages: Download a package from an ETL library
Create Library
To create a new Library in Amorphic, go to the "Create New Library" section under "ETL Libraries". The application allows a library to have zero or more packages/jobs attached to it. After creating the Library, you can view, update, and delete it, provided you have permission to access the library.
You cannot delete a shared library while it is attached to an existing Job. If you try to delete such a library, a pop-up lists the dependent ETL Jobs. Detach the library from all of those Jobs, then retry the deletion.
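The delete rule described above amounts to a dependency check. Below is a minimal in-memory sketch of that check, assuming a hypothetical mapping from library name to the set of attached job names; `LibraryInUseError`, `delete_library`, and `detach_from_job` are illustrative names and not part of Amorphic's API.

```python
from typing import Dict, Set


class LibraryInUseError(Exception):
    """Raised when a library still has dependent jobs (hypothetical name)."""

    def __init__(self, library: str, jobs: Set[str]) -> None:
        self.jobs = sorted(jobs)
        super().__init__(f"{library} is attached to: {', '.join(self.jobs)}")


def detach_from_job(attachments: Dict[str, Set[str]], library: str, job: str) -> None:
    """Remove the link between a library and one job."""
    attachments[library].discard(job)


def delete_library(attachments: Dict[str, Set[str]], library: str) -> None:
    """Delete the library only if no job is attached to it."""
    if library not in attachments:
        raise KeyError(library)
    dependent_jobs = attachments[library]
    if dependent_jobs:
        # Mirrors the pop-up: report the dependent jobs instead of deleting.
        raise LibraryInUseError(library, dependent_jobs)
    del attachments[library]
```

In this sketch the caller catches `LibraryInUseError`, detaches the library from each job named in `error.jobs`, and then calls `delete_library` again.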
The below gif shows how you can create a new library.
View Library
To view all the existing library information you must have sufficient permissions. Click the Library name under the "ETL Libraries" section inside the Job Menu to view the library.
Take a look at how you can view the library information in detail.
Attach Library
You can attach a shared library to a job from the job details page, or while creating or updating the job. Amorphic lists the available shared libraries alongside the other job parameters so that you can select the ones to attach. Once a library is attached, all of its packages are passed to the job as arguments automatically, without any manual intervention.
Follow the below gif to attach a shared ETL library to an existing ETL Job.
Importing and using a library
If your library contains a single version of your module, or several different files, you can import the module directly and use it.
from amorphicutils.common import read_param_store
print(read_param_store("SYSTEM.S3BUCKET.DLZ", secure=False)['data'])
If your library contains multiple versions of your module, explicitly insert the versioned file into the system path before importing the module. This ensures that the specific version you want is picked up rather than an arbitrary one.
import sys
# explicitly specify the version you want to use
sys.path.insert(0, "amorphicutils-0.3.1.zip")
from amorphicutils.common import read_param_store
print(read_param_store("SYSTEM.S3BUCKET.DLZ", secure=False)['data'])
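
The version-pinning pattern relies on Python's ability to import from a zip archive that appears on `sys.path`, with earlier entries taking precedence. The following self-contained sketch demonstrates the mechanism outside Amorphic: it builds two toy archives for a hypothetical package named `demo_pkg` (not a real library) and pins one of them. The helper names `build_archive` and `import_pinned` are likewise assumptions made for the example.

```python
import importlib
import sys
import tempfile
import zipfile
from pathlib import Path


def build_archive(directory: Path, version: str) -> Path:
    """Create a toy archive demo_pkg-<version>.zip holding one package."""
    archive = directory / f"demo_pkg-{version}.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("demo_pkg/__init__.py", f'__version__ = "{version}"\n')
    return archive


def import_pinned(archive: Path, module_name: str):
    """Put the archive first on sys.path, then import module_name from it."""
    sys.path.insert(0, str(archive))
    importlib.invalidate_caches()
    # Drop any previously imported copy so the pinned archive is used.
    sys.modules.pop(module_name, None)
    return importlib.import_module(module_name)


with tempfile.TemporaryDirectory() as tmp:
    older = build_archive(Path(tmp), "0.2.0")
    newer = build_archive(Path(tmp), "0.3.1")
    demo_pkg = import_pinned(newer, "demo_pkg")
    pinned_version = demo_pkg.__version__
    # Clean up the path entry before the temporary directory is removed.
    sys.path.remove(str(newer))

print(pinned_version)
```

Inserting at index 0 matters: if the archive were appended to the end of `sys.path`, another copy of the module found earlier on the path would be imported instead.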