This documentation is for version v2.4 of the product.
The latest version is v2.7.

PySpark

This node runs user-supplied PySpark code. The input dataframe is available in the variable inDF. The output dataframe is passed back by registering it as a temporary table named outDF.

Input

The input dataframe is passed in the variable inDF.

Output

The output dataframe is passed back by registering it as a temporary table named outDF, for example df.registerTempTable("outDF").
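As a minimal sketch of what the code field might contain, the snippet below removes duplicate rows from the input and registers the result as outDF. The function name `register_output` and the de-duplication step are illustrative assumptions, not part of the product; only `inDF` and the `registerTempTable("outDF")` call come from this page. The try/except guard is likewise an assumption, added so the snippet can also be loaded outside the node.

```python
def register_output(in_df, table_name="outDF"):
    """Drop duplicate rows and register the result as a temporary table."""
    out_df = in_df.dropDuplicates()       # illustrative transformation (assumption)
    out_df.registerTempTable(table_name)  # hands the result back to the node
    return out_df


try:
    node_input = inDF   # supplied by the node at run time
except NameError:
    node_input = None   # not running inside the node

if node_input is not None:
    register_output(node_input)
```

Keeping the transformation in a function makes it possible to exercise the logic against a small test dataframe before pasting it into the node.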

Type

pyspark

Fields

Name | Title | Description
--- | --- | ---
code | PySpark | PySpark code to be run. Input dataframe: "inDF", SparkContext: "sc", SQLContext: "sqlContext". The output/result dataframe should be registered as a temporary table: df.registerTempTable("outDF")
schema | Schema |
outputColNames | Column Names for the CSV | New Output Columns of the SQL
outputColTypes | Column Types for the CSV | Data Type of the Output Columns
outputColFormats | Column Formats for the CSV | Format of the Output Columns
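
The code field also exposes sqlContext, so the output can be produced with SQL instead of dataframe methods. The following is a sketch only: the helper name `run_sql`, the temporary table name `inputRows`, and the query are hypothetical, chosen for illustration. It relies on the standard `SQLContext.sql` and `DataFrame.registerTempTable` calls, and the try/except guard is an assumption so the snippet can be loaded outside the node.

```python
def run_sql(in_df, sql_context, query, input_table="inputRows"):
    """Expose in_df to SQL, run the query, and register the result as outDF."""
    in_df.registerTempTable(input_table)  # make the input queryable by name
    out_df = sql_context.sql(query)       # SQLContext.sql returns a dataframe
    out_df.registerTempTable("outDF")     # hands the result back to the node
    return out_df


try:
    node_args = (inDF, sqlContext)  # both supplied by the node at run time
except NameError:
    node_args = None                # not running inside the node

if node_args is not None:
    # Hypothetical query; replace with the SQL the node should run.
    run_sql(node_args[0], node_args[1], "SELECT * FROM inputRows LIMIT 100")
```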