Version: v2.4 print this page

PySpark

This node runs any given PySpark code. The input dataframe is passed in the variable inDF. The output dataframe is passed back by registering it as a temporary table.

Input

The input dataframe is passed in the variable inDF.

Output

The output dataframe is passed back by registering it as a temporary table

Type

pyspark

Fields

Name	Title	Description
code	PySpark	PySpark code to be run. Input dataframe : "inDF", SparkContext : "sc", SQLContext : "sqlContext", Output/Result dataframe should be registered as a temporary table - df.registerTempTable("outDF")
schema	Schema
outputColNames	Column Names for the CSV	New Output Columns of the SQL
outputColTypes	Column Types for the CSV	Data Type of the Output Columns
outputColFormats	Column Formats for the CSV	Format of the Output Columns

Input​

Output​

Type​

Fields​

Input

Output

Type

Fields