Writing Class
The Amorphic platform has the concepts of a Landing Zone and a Data Landing Zone; data can only be written to the Landing Zone. To ingest data into Amorphic, write it to the proper location in the Landing Zone. The following classes can be used to write to Amorphic.
Writing in python-shell
It writes a pandas dataframe to the Amorphic landing zone.
class amorphicutils.python.write.Write(bucket_name, region=None)
Class to write data to Amorphic
__init__(bucket_name, region=None)
Initialize the Write object
- Parameters:
- bucket_name – bucket name
>>> writer = Write("lz_bucket")
write_bytes_data(data_bytes, domain_name, dataset_name, user, file_type, upload_date=None, full_reload=False, path=None, file_name=None)
Writes bytes data to datalake
- Parameters:
- data_bytes – data bytes to write
- domain_name – domain name for dataset
- dataset_name – dataset name
- user – username with write access to dataset
- file_type – file type of the dataset
- upload_date – Timestamp from time.time(). If not supplied, the current timestamp is used
- full_reload – True if the table is of reload type. Default: False
- path – (Optional) Path where data is stored. Implicit creation of path will be ignored
- file_name – Name of the file
- Returns:
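A minimal sketch of preparing a bytes payload and passing it to write_bytes_data with the parameters listed above. It uses only the standard library so that it runs without an Amorphic environment. RecordingWriter, rows_to_csv_bytes, upload_rows, and the file name example.csv are hypothetical names introduced for illustration; in real use the writer would be a Write("lz_bucket") instance, and the shape of the returned value is an assumption because the Returns entry above is empty.

```python
import csv
import io
import time


def rows_to_csv_bytes(rows, header):
    """Serialize rows to UTF-8 encoded CSV bytes using only the standard library."""
    buffer = io.StringIO(newline="")
    csv_writer = csv.writer(buffer)
    csv_writer.writerow(header)
    csv_writer.writerows(rows)
    return buffer.getvalue().encode("utf-8")


def upload_rows(writer, rows, header, domain_name, dataset_name, user):
    """Call write_bytes_data with the argument order documented above."""
    return writer.write_bytes_data(
        rows_to_csv_bytes(rows, header),
        domain_name,
        dataset_name,
        user,
        "csv",
        upload_date=time.time(),  # explicit timestamp, per the upload_date description
        file_name="example.csv",  # hypothetical file name
    )


class RecordingWriter:
    """Hypothetical stand-in for Write so the sketch runs offline; it only records calls."""

    def __init__(self):
        self.calls = []

    def write_bytes_data(self, data_bytes, domain_name, dataset_name, user, file_type,
                         upload_date=None, full_reload=False, path=None, file_name=None):
        self.calls.append((data_bytes, domain_name, dataset_name, user, file_type, file_name))
        # Assumed response shape, modelled on the write_csv_data example in this document
        return {"exitcode": 0, "message": "recorded"}


stub = RecordingWriter()
result = upload_rows(stub, [[1, "a"], [2, "b"]], ["id", "name"],
                     "testdomain", "testdataset", "userid")
```

Passing the writer in as an argument keeps the payload preparation testable on its own; swapping the stub for a real Write object should not require changes to upload_rows, assuming the signature above is accurate.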
write_csv_data(data, domain_name, dataset_name, user, header=True, file_type='csv', quote=True, delimiter=',', upload_date=None, full_reload=False, path=None, file_name=None, **kwargs)
Write data to lz bucket
- Parameters:
- data – pandas dataframe of data
- domain_name – domain name for dataset
- dataset_name – dataset name
- user – username with write access to dataset
- header – True if you want to save file with header. Default: True
- file_type – file type for dataset
- quote – True if you want to save your data with quote characters. Default: True
- delimiter – Delimiter to use when saving to S3. Default: ,
- upload_date – Timestamp from time.time(). If not supplied, the current timestamp is used
- full_reload – True if the table is of reload type. Default: False
- path – (Optional) Path where data is stored. Implicit creation of path will be ignored
- file_name – Name of the file
- kwargs – Optional arguments available for pyspark read
- Returns:
>>> writer = Write("lz_bucket")
>>> response = writer.write_csv_data(pandas_df, "testdomain", "testdataset", user="userid", file_type="csv")
>>> print(response)
{
"exitcode": 0,
"message": "This is success message"
}
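The response shown above can be checked before continuing with later steps. The sketch below is one way to do that; ensure_success is a hypothetical helper, and treating any non-zero exitcode as a failure is an assumption, since the source only shows the exitcode 0 case.

```python
def ensure_success(response):
    """Return the message when exitcode is 0, otherwise raise an error."""
    exitcode = response.get("exitcode")
    message = response.get("message", "")
    if exitcode != 0:
        # Assumption: any exitcode other than 0 signals a failed write.
        raise RuntimeError(f"Write failed (exitcode={exitcode}): {message}")
    return message


# Response shaped like the example output above
ok = ensure_success({"exitcode": 0, "message": "This is success message"})
```

Raising an exception rather than returning a flag makes a failed write stop a job early instead of being silently ignored.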
Writing in pyspark
It writes a Spark dataframe to the Amorphic landing zone. The attributes of the class depend on the context passed to the Write class.
To write with GlueDynamicFrameWriter, pass a GlueContext to the Write class; to use Spark, pass a SparkContext.
class amorphicutils.pyspark.write.Write(bucket_name, spark, region=None, logger=None)
Class to write data to Amorphic
__init__(bucket_name, spark, region=None, logger=None)
Initialize the Write object
- Parameters:
- bucket_name – Bucket name
- spark – SparkContext
>>> writer = Write("lz_bucket", spark_object)