pyspark2teradataml - $file_name
'pyspark2teradataml' converts the PySpark script to a Python script that runs on Vantage ClearScape Analytics.
Important Notes:
- Refer to the user guide and the supportability matrix for ML functions.
- Some functions are not supported directly, but their functionality can be achieved with manual code changes. Click the alert icon to view 'Examples' and additional details.
- ML functions accept multiple columns as arguments, so there is no need to pass vectors. Update the script to pass the columns directly.
- RDD APIs are not applicable to Vantage. Use DataFrame APIs instead.
- Column names are case-sensitive in teradatamlspk but case-insensitive in PySpark. When converting the code, make sure every column reference matches the exact case of the column name.
- DataFrame.rdd returns the same DataFrame, because RDDs are not applicable to Vantage. Use DataFrame APIs and avoid RDD APIs.
- pyspark2teradataml does not modify SQL statements. Manually update any SQL statement that is not valid in Vantage.
- teradatamlspk timezone strings do not account for Daylight Saving Time (DST). To account for DST, use Teradata Vantage timezone strings instead.
- In teradatamlspk, DoubleType and FloatType are treated the same. If the script inspects or compares dtypes, update it so that it does not rely on distinguishing the two.
- DataFrame.sort(), DataFrame.orderBy():
  - The sorted order is not propagated to subsequent API calls.
  - To get the top N or bottom N rows, rank the rows with a window aggregate and filter on the rank.
  - Only column names are supported as arguments; Column objects are not.
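Because PySpark resolves column names case-insensitively, a converted script can fail when a column reference differs in case from the actual column name. One way to fix such references is to look up the exact-case name in the DataFrame's column list. The sketch below is a hypothetical helper, not part of teradatamlspk, and assumes the column names are available as a list of strings (as `df.columns` returns in PySpark).

```python
# Hypothetical helper (not part of teradatamlspk): map a column name written
# in any case to the exact-case name present in the DataFrame's columns.
def resolve_column_name(name, columns):
    """Return the entry of `columns` that matches `name` case-insensitively.

    Raises KeyError if no column matches, and ValueError if several columns
    differ only by case, since a case-insensitive lookup would be ambiguous.
    """
    if name in columns:
        # An exact match wins, as it would in a case-sensitive engine.
        return name
    matches = [col for col in columns if col.lower() == name.lower()]
    if not matches:
        raise KeyError(f"no column matches {name!r}")
    if len(matches) > 1:
        raise ValueError(f"{name!r} is ambiguous: {matches}")
    return matches[0]


# Usage: in a real script, `columns` would typically come from df.columns.
columns = ["CustomerID", "OrderDate", "amount"]
print(resolve_column_name("customerid", columns))  # CustomerID
```

Raising on ambiguity rather than picking one match keeps the conversion explicit: if two columns differ only by case, the script author has to choose which one the original PySpark code meant.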
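If the original script checks column types, for example by comparing `df.dtypes` entries against `"float"`, that check may behave differently after conversion because FloatType and DoubleType are treated the same. A minimal sketch of a type comparison that tolerates this, assuming dtypes are available as (column, type-name) pairs in the format PySpark's `df.dtypes` returns; the helper names are hypothetical.

```python
# Hypothetical helpers: compare dtype lists while treating "float" and
# "double" as the same type, since teradatamlspk does not distinguish them.
_EQUIVALENT_TYPES = {"float": "double"}


def normalize_dtype(type_name):
    """Lower-case a type name and map "float" onto "double"."""
    lowered = type_name.lower()
    return _EQUIVALENT_TYPES.get(lowered, lowered)


def dtypes_match(dtypes_a, dtypes_b):
    """Return True if two lists of (column, type) pairs agree, ignoring float vs double."""
    if len(dtypes_a) != len(dtypes_b):
        return False
    return all(
        col_a == col_b and normalize_dtype(type_a) == normalize_dtype(type_b)
        for (col_a, type_a), (col_b, type_b) in zip(dtypes_a, dtypes_b)
    )


expected = [("price", "float"), ("qty", "int")]
actual = [("price", "double"), ("qty", "int")]
print(dtypes_match(expected, actual))  # True
```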
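In a converted script, the rank for a top-N query would come from a window aggregate (in PySpark terms, something like `F.rank().over(Window.orderBy(F.desc("revenue")))` followed by `filter`); check the teradatamlspk user guide for the supported form. The plain-Python sketch below only illustrates the semantics of rank-then-filter, including how ties are handled; the function names and sample data are hypothetical.

```python
# Plain-Python sketch of RANK()-style ranking followed by a filter, the
# pattern suggested instead of relying on sort()/orderBy() for top-N rows.
def rank_rows(rows, key, descending=True):
    """Pair each row with a RANK() value: ties share a rank and the next rank skips."""
    ordered = sorted(rows, key=lambda r: r[key], reverse=descending)
    ranked = []
    for position, row in enumerate(ordered, start=1):
        if ranked and row[key] == ranked[-1][1][key]:
            rank = ranked[-1][0]  # same value as the previous row: reuse its rank
        else:
            rank = position  # leaves a gap after ties, as SQL RANK() does
        ranked.append((rank, row))
    return ranked


def top_n(rows, key, n, descending=True):
    """Keep rows whose rank is <= n; ties at the boundary are all kept."""
    return [row for rank, row in rank_rows(rows, key, descending) if rank <= n]


sales = [
    {"store": "A", "revenue": 120},
    {"store": "B", "revenue": 300},
    {"store": "C", "revenue": 300},
    {"store": "D", "revenue": 80},
]
print([r["store"] for r in top_n(sales, "revenue", 2)])  # ['B', 'C']
```

Passing `descending=False` gives the bottom N rows instead. Filtering on a rank, rather than taking the first N rows of a sorted result, does not depend on the sorted order being carried into the next API call.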
Instructions
Use the drop-down in the left pane to toggle between the original PySpark script and the generated teradatamlspk script.
Click the bell icon next to a line in the left pane to view detailed notes for the corresponding API.
The icons have the following meanings:
- $not_supported: This API is not supported. Refer to the Teradata documentation to achieve the same functionality another way.
- $partially_supported: This API is supported, but its behavior may differ from PySpark. The notes describe the exact differences; adjust the script manually as needed.
- $notification: This is a notification to the user. No action required.
- $bug_report: Files in this category were not converted to teradatamlspk scripts, either because they contain syntax errors or because the utility was unable to parse them.
- $empty_file: Files in this category were not converted to teradatamlspk scripts because they are empty.
- $success_file: Files in this category were converted successfully and require no attention.
$total_summary
$consolidated_table
$overview_table