Writing Class

The Amorphic platform has the concepts of a Landing Zone and a Data Landing Zone; data can only be written to the Landing Zone. To ingest data into Amorphic, write it to the proper location in the Landing Zone. The following classes can be used to write to Amorphic.

Writing in python-shell

Writes a pandas DataFrame to the Amorphic Landing Zone.

class amorphicutils.python.write.Write(bucket_name, region=None)

Class to write data to Amorphic

__init__(bucket_name, region=None)

Initialize the Write object

Parameters:
- bucket_name – bucket name

>>> from amorphicutils.python.write import Write
>>> writer = Write("lz_bucket")

write_bytes_data(data_bytes, domain_name, dataset_name, user, file_type, upload_date=None, full_reload=False, path=None, file_name=None)

Writes bytes data to the data lake

Parameters:
- data_bytes – data bytes to write
- domain_name – domain name for dataset
- dataset_name – dataset name
- user – username with write access to dataset
- file_type – file type of the dataset
- upload_date – Timestamp from time.time(). If not supplied, the current timestamp is used
- full_reload – True if the table is of the reload type. Default: False
- path – (Optional) Path where data is stored. When supplied, implicit creation of the path is skipped
- file_name – Name of the file
Returns:
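As an illustration of the call described above, the following is a minimal sketch, not the library's own example. It serializes a few rows to CSV bytes with the standard library and hands them to write_bytes_data using the documented parameter names. RecordingWriter, rows_to_csv_bytes, upload_rows and the file name "rows.csv" are hypothetical names introduced here so the sketch runs without AWS access; in a real job the writer would be Write("lz_bucket") from amorphicutils.python.write. The return value of write_bytes_data is not documented above, so the sketch simply passes it through.

```python
import csv
import io
import time


def rows_to_csv_bytes(rows, header):
    """Serialize rows to UTF-8 CSV bytes using only the standard library."""
    buffer = io.StringIO()
    csv_writer = csv.writer(buffer, lineterminator="\n")
    csv_writer.writerow(header)
    csv_writer.writerows(rows)
    return buffer.getvalue().encode("utf-8")


class RecordingWriter:
    """Hypothetical stand-in for Write; it only records the arguments it receives."""

    def __init__(self):
        self.calls = []

    def write_bytes_data(self, data_bytes, domain_name, dataset_name, user,
                         file_type, upload_date=None, full_reload=False,
                         path=None, file_name=None):
        # Same parameter list as the documented signature; nothing is uploaded.
        self.calls.append({
            "data_bytes": data_bytes,
            "domain_name": domain_name,
            "dataset_name": dataset_name,
            "user": user,
            "file_type": file_type,
            "upload_date": upload_date,
            "full_reload": full_reload,
            "path": path,
            "file_name": file_name,
        })


def upload_rows(writer, rows, header, domain_name, dataset_name, user):
    """Convert rows to bytes and pass them to writer.write_bytes_data."""
    data_bytes = rows_to_csv_bytes(rows, header)
    return writer.write_bytes_data(
        data_bytes, domain_name, dataset_name, user,
        file_type="csv",
        upload_date=time.time(),   # documented as a timestamp from time.time()
        file_name="rows.csv",      # hypothetical file name
    )


# Real usage would be: writer = Write("lz_bucket")
writer = RecordingWriter()
upload_rows(writer, [(1, "a"), (2, "b")], ["id", "name"],
            "testdomain", "testdataset", "userid")
```

Because upload_rows only relies on the write_bytes_data method, the stand-in can be swapped for the real Write object without changing the function.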

write_csv_data(data, domain_name, dataset_name, user, header=True, file_type='csv', quote=True, delimiter=',', upload_date=None, full_reload=False, path=None, file_name=None, **kwargs)

Writes a pandas DataFrame as CSV data to the Landing Zone (LZ) bucket

Parameters:
- data – pandas dataframe of data
- domain_name – domain name for dataset
- dataset_name – dataset name
- user – username with write access to dataset
- header – True to save the file with a header. Default: True
- file_type – file type for dataset
- quote – True to save the data with quote characters. Default: True
- delimiter – Delimiter to use when saving to S3. Default: ,
- upload_date – Timestamp from time.time(). If not supplied, the current timestamp is used
- full_reload – True if the table is of the reload type. Default: False
- path – (Optional) Path where data is stored. When supplied, implicit creation of the path is skipped
- file_name – Name of the file
- kwargs – Optional arguments available for pyspark read
Returns:

>>> from amorphicutils.python.write import Write
>>> writer = Write("lz_bucket")
>>> response = writer.write_csv_data(pandas_df, "testdomain", "testdataset", user="userid", file_type="csv")
>>> print(response)
{
  "exitcode": 0,
  "message": "This is success message"
}
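The response shown above carries an exitcode and a message, so a job may want to fail fast when the write did not succeed. The following is a minimal sketch under the assumption that a non-zero exitcode signals failure, which the example suggests but the documentation does not state. AmorphicWriteError and check_response are hypothetical names and are not part of amorphicutils.

```python
class AmorphicWriteError(RuntimeError):
    """Hypothetical exception for this sketch; not part of amorphicutils."""


def check_response(response):
    """Return the message when exitcode is 0, otherwise raise AmorphicWriteError.

    Assumption: any exitcode other than 0 means the write failed.
    """
    exitcode = response.get("exitcode")
    message = response.get("message", "")
    if exitcode != 0:
        raise AmorphicWriteError(
            f"write failed (exitcode={exitcode}): {message}"
        )
    return message


# Response shape copied from the example above.
ok = check_response({"exitcode": 0, "message": "This is success message"})
```

In a job, the call would wrap the result of write_csv_data, for example check_response(writer.write_csv_data(...)).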

Writing in pyspark

Writes a Spark DataFrame to the Amorphic Landing Zone. The attributes of the class depend on the context passed to the Write class.

To write with GlueDynamicFrameWriter, pass a GlueContext to the Write class; to write with Spark, pass a SparkContext.

class amorphicutils.pyspark.write.Write(bucket_name, spark, region=None, logger=None)

Class to write data to Amorphic

__init__(bucket_name, spark, region=None, logger=None)

Initialize the Write object

Parameters:
- bucket_name – Bucket name
- spark – SparkContext, or GlueContext to write with GlueDynamicFrameWriter

>>> from amorphicutils.pyspark.write import Write
>>> writer = Write("lz_bucket", spark_object)
