Capgemini Data Engineer Interview Question - Round 1 | Save Multiple Columns in the DataFrame |

GeekCoders

Input
data = [
    (1, "Sagar", 23, "Male", 68.0),
    (2, "Kim", 35, "Female", 90.2),
    (3, "Alex", 40, "Male", 79.1),
]
schema = "Id int,Name string,Age int,Gender string,Marks float"
df = spark.createDataFrame(data, schema)


Solution:
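# Collect the distinct data types present in the DataFrame.
# df.dtypes is a list of (column_name, type_name) pairs.
set_of_dtypes = set(dtype for _, dtype in df.dtypes)

for dtype in set_of_dtypes:
    # Gather every column whose type matches the current one.
    cols = [name for name, col_dtype in df.dtypes if col_dtype == dtype]
    # Write those columns together to an output folder named after the type.
    df.select(cols).write.mode('overwrite').save(f'/FileStore/tables/output_capegmini/{dtype}')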
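
The grouping step can be tried without a Spark session. The sketch below is only an illustration: group_columns_by_dtype is a hypothetical helper name, and the dtypes list is hand-written to mirror what df.dtypes should report for the input schema above (an assumption, not output captured from a cluster). It builds the type-to-columns mapping in a single pass instead of rescanning the columns once per type.

```python
from collections import defaultdict


def group_columns_by_dtype(dtypes):
    """Group column names by type name, preserving column order.

    `dtypes` is a list of (column_name, type_name) pairs, the same shape
    that PySpark's DataFrame.dtypes returns. Hypothetical helper, not part
    of the original solution.
    """
    groups = defaultdict(list)
    for name, dtype in dtypes:
        groups[dtype].append(name)
    return dict(groups)


# Hand-written stand-in for df.dtypes (assumed to match the input schema).
dtypes = [
    ("Id", "int"),
    ("Name", "string"),
    ("Age", "int"),
    ("Gender", "string"),
    ("Marks", "float"),
]

columns_by_dtype = group_columns_by_dtype(dtypes)
print(columns_by_dtype)
# {'int': ['Id', 'Age'], 'string': ['Name', 'Gender'], 'float': ['Marks']}
```

With a real DataFrame, each (type, columns) pair from this mapping could be passed to df.select(columns) and written to its own folder, which is what the loop in the solution does.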

Combo course package : https://www.geekcoders.co.in/courses/...

I have prepared many courses on Azure Data Engineering

1. Build Azure End-to-End Project
https://www.geekcoders.co.in/courses/...

2. Build Delta Lake project
https://www.geekcoders.co.in/courses/...

3. Master in Azure Data Factory with ETL Project and Power BI
https://www.geekcoders.co.in/courses/...

4. Master in Python
https://www.geekcoders.co.in/courses/...

Check out my courses on Azure Data Engineering
https://www.geekcoders.co.in/s/store/...

#dataengineer #interviewquestions #spark

posted by pohodiliav