
Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 Questions and Answers

Question # 6

Which of the following code blocks displays various aggregated statistics of all columns in DataFrame transactionsDf, including the standard deviation and minimum of values in each column?

A.

transactionsDf.summary()

B.

transactionsDf.agg("count", "mean", "stddev", "25%", "50%", "75%", "min")

C.

transactionsDf.summary("count", "mean", "stddev", "25%", "50%", "75%", "max").show()

D.

transactionsDf.agg("count", "mean", "stddev", "25%", "50%", "75%", "min").show()

E.

transactionsDf.summary().show()
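For reference, a minimal sketch of how DataFrame.summary() is typically used, assuming a DataFrame named transactionsDf already exists in an active Spark session:

# summary() without arguments computes count, mean, stddev, min, the
# 25%/50%/75% percentiles, and max for all columns; show() is the action
# that actually prints the result.
transactionsDf.summary().show()

# Specific statistics can also be requested explicitly, for example:
transactionsDf.summary("count", "min", "stddev").show()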

Question # 7

Which of the following code blocks returns a new DataFrame in which the column attributes of DataFrame itemsDf has been renamed to feature0 and the column supplier to feature1?

A.

itemsDf.withColumnRenamed(attributes, feature0).withColumnRenamed(supplier, feature1)

B.

1.itemsDf.withColumnRenamed("attributes", "feature0")

2.itemsDf.withColumnRenamed("supplier", "feature1")

C.

itemsDf.withColumnRenamed(col("attributes"), col("feature0"), col("supplier"), col("feature1"))

D.

itemsDf.withColumnRenamed("attributes", "feature0").withColumnRenamed("supplier", "feature1")

E.

itemsDf.withColumn("attributes", "feature0").withColumn("supplier", "feature1")

Question # 8

Which of the following is a problem with using accumulators?

A.

Only unnamed accumulators can be inspected in the Spark UI.

B.

Only numeric values can be used in accumulators.

C.

Accumulator values can only be read by the driver, but not by executors.

D.

Accumulators do not obey lazy evaluation.

E.

Accumulators are difficult to use for debugging because they will only be updated once, regardless of whether a task has to be re-run due to hardware failure.
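For context, a minimal sketch of an accumulator, assuming an active SparkSession named spark and a DataFrame transactionsDf with a column named value; executors can only add to the accumulator, while its value is read on the driver:

# Accumulator updates happen when an action (here foreach) runs; reading
# .value is only meaningful on the driver.
acc = spark.sparkContext.accumulator(0)

def count_null_values(row):
    if row["value"] is None:
        acc.add(1)

transactionsDf.foreach(count_null_values)  # action triggers the updates on executors
print(acc.value)                           # read the accumulated result on the driver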

Question # 9

The code block displayed below contains an error. The code block should save DataFrame transactionsDf at path path as a parquet file, appending to any existing parquet file. Find the error.

Code block:

transactionsDf.format("parquet").option("mode", "append").save(path)

A.

The code block is missing a reference to the DataFrameWriter.

B.

save() is evaluated lazily and needs to be followed by an action.

C.

The mode option should be omitted so that the command uses the default mode.

D.

The code block is missing a bucketBy command that takes care of partitions.

E.

Given that the DataFrame should be saved as a parquet file, path is being passed to the wrong method.
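For comparison, a minimal sketch of an append-mode parquet write that goes through the DataFrameWriter, assuming path points to a writable location:

# .write returns the DataFrameWriter; mode("append") adds to any existing files.
transactionsDf.write.format("parquet").mode("append").save(path)

# Equivalent shortcut using the parquet() convenience method:
transactionsDf.write.mode("append").parquet(path)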

Question # 10

The code block displayed below contains one or more errors. The code block should load parquet files at location filePath into a DataFrame, only loading those files that have been modified before 2029-03-20 05:44:46. Spark should enforce a schema according to the schema shown below. Find the error.

Schema:

1.root

2. |-- itemId: integer (nullable = true)

3. |-- attributes: array (nullable = true)

4. | |-- element: string (containsNull = true)

5. |-- supplier: string (nullable = true)

Code block:

1.schema = StructType([

2. StructType("itemId", IntegerType(), True),

3. StructType("attributes", ArrayType(StringType(), True), True),

4. StructType("supplier", StringType(), True)

5.])

6.

7.spark.read.options("modifiedBefore", "2029-03-20T05:44:46").schema(schema).load(filePath)

A.

The attributes array is specified incorrectly, Spark cannot identify the file format, and the syntax of the call to Spark's DataFrameReader is incorrect.

B.

Columns in the schema definition use the wrong object type and the syntax of the call to Spark's DataFrameReader is incorrect.

C.

The data type of the schema is incompatible with the schema() operator and the modification date threshold is specified incorrectly.

D.

Columns in the schema definition use the wrong object type, the modification date threshold is specified incorrectly, and Spark cannot identify the file format.

E.

Columns in the schema are unable to handle empty values and the modification date threshold is specified incorrectly.
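For reference, a hedged sketch of how such a schema and read could look, with columns defined via StructField and a single option() call; it assumes spark and filePath are defined and a Spark version in which the modifiedBefore read option is available:

from pyspark.sql.types import StructType, StructField, IntegerType, ArrayType, StringType

# Columns are defined with StructField objects, not nested StructTypes.
schema = StructType([
    StructField("itemId", IntegerType(), True),
    StructField("attributes", ArrayType(StringType(), True), True),
    StructField("supplier", StringType(), True)
])

# option() (singular) sets one key/value pair; format("parquet") names the file format.
itemsDf = (spark.read
           .option("modifiedBefore", "2029-03-20T05:44:46")
           .format("parquet")
           .schema(schema)
           .load(filePath))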

Question # 11

Which of the following code blocks returns a single row from DataFrame transactionsDf?

Full DataFrame transactionsDf:

1.+-------------+---------+-----+-------+---------+----+

2.|transactionId|predError|value|storeId|productId|   f|

3.+-------------+---------+-----+-------+---------+----+

4.|            1|        3|    4|     25|        1|null|

5.|            2|        6|    7|      2|        2|null|

6.|            3|        3| null|     25|        3|null|

7.|            4|     null| null|      3|        2|null|

8.|            5|     null| null|   null|        2|null|

9.|            6|        3|    2|     25|        2|null|

10.+-------------+---------+-----+-------+---------+----+

A.

transactionsDf.where(col("storeId").between(3,25))

B.

transactionsDf.filter((col("storeId")!=25) | (col("productId")==2))

C.

transactionsDf.filter(col("storeId")==25).select("predError","storeId").distinct()

D.

transactionsDf.select("productId", "storeId").where("storeId == 2 OR storeId != 25")

E.

transactionsDf.where(col("value").isNull()).select("productId", "storeId").distinct()

Question # 12

The code block shown below should return a single-column DataFrame with a column named consonant_ct that, for each row, shows the number of consonants in column itemName of DataFrame itemsDf. Choose the answer that correctly fills the blanks in the code block to accomplish this.

DataFrame itemsDf:

1.+------+----------------------------------+-----------------------------+-------------------+

2.|itemId|itemName |attributes |supplier |

3.+------+----------------------------------+-----------------------------+-------------------+

4.|1 |Thick Coat for Walking in the Snow|[blue, winter, cozy] |Sports Company Inc.|

5.|2 |Elegant Outdoors Summer Dress |[red, summer, fresh, cooling]|YetiX |

6.|3 |Outdoors Backpack |[green, summer, travel] |Sports Company Inc.|

7.+------+----------------------------------+-----------------------------+-------------------+

Code block:

itemsDf.select(__1__(__2__(__3__(__4__), "a|e|i|o|u|\s", "")).__5__("consonant_ct"))

A.

1. length

2. regexp_extract

3. upper

4. col("itemName")

5. as

B.

1. size

2. regexp_replace

3. lower

4. "itemName"

5. alias

C.

1. lower

2. regexp_replace

3. length

4. "itemName"

5. alias

D.

1. length

2. regexp_replace

3. lower

4. col("itemName")

5. alias

E.

1. size

2. regexp_extract

3. lower

4. col("itemName")

5. alias
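For reference, a minimal sketch of counting consonants by stripping vowels and whitespace and measuring the length of what remains, assuming the itemsDf shown above:

from pyspark.sql.functions import col, lower, regexp_replace, length

# lower() normalises case, regexp_replace() removes vowels and whitespace,
# and length() counts the remaining characters.
consonants = itemsDf.select(
    length(regexp_replace(lower(col("itemName")), r"a|e|i|o|u|\s", "")).alias("consonant_ct")
)
consonants.show()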

Question # 13

Which of the following code blocks returns a copy of DataFrame itemsDf where the column supplier has been renamed to manufacturer?

A.

itemsDf.withColumn(["supplier", "manufacturer"])

B.

itemsDf.withColumn("supplier").alias("manufacturer")

C.

itemsDf.withColumnRenamed("supplier", "manufacturer")

D.

itemsDf.withColumnRenamed(col("manufacturer"), col("supplier"))

E.

itemsDf.withColumnsRenamed("supplier", "manufacturer")

Question # 14

Which of the following describes Spark actions?

A.

Writing data to disk is the primary purpose of actions.

B.

Actions are Spark's way of exchanging data between executors.

C.

The driver receives data upon request by actions.

D.

Stage boundaries are commonly established by actions.

E.

Actions are Spark's way of modifying RDDs.

Question # 15

The code block shown below should show information about the data type that column storeId of DataFrame transactionsDf contains. Choose the answer that correctly fills the blanks in the code block to accomplish this.

Code block:

transactionsDf.__1__(__2__).__3__

A.

1. select

2. "storeId"

3. print_schema()

B.

1. limit

2. 1

3. columns

C.

1. select

2. "storeId"

3. printSchema()

D.

1. limit

2. "storeId"

3. printSchema()

E.

1. select

2. storeId

3. dtypes
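For context, a minimal sketch of two common ways to inspect a single column's data type, assuming the transactionsDf used throughout this section:

# printSchema() prints the name, data type, and nullability of the selected column.
transactionsDf.select("storeId").printSchema()

# dtypes returns (name, type) tuples for all columns of the DataFrame.
print(transactionsDf.dtypes)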

Question # 16

Which of the following code blocks reads in parquet file /FileStore/imports.parquet as a DataFrame?

A.

spark.mode("parquet").read("/FileStore/imports.parquet")

B.

spark.read.path("/FileStore/imports.parquet", source="parquet")

C.

spark.read().parquet("/FileStore/imports.parquet")

D.

spark.read.parquet("/FileStore/imports.parquet")

E.

spark.read().format('parquet').open("/FileStore/imports.parquet")

Question # 17

Which of the following is the deepest level in Spark's execution hierarchy?

A.

Job

B.

Task

C.

Executor

D.

Slot

E.

Stage

Question # 18

Which of the following code blocks uses a schema fileSchema to read a parquet file at location filePath into a DataFrame?

A.

spark.read.schema(fileSchema).format("parquet").load(filePath)

B.

spark.read.schema("fileSchema").format("parquet").load(filePath)

C.

spark.read().schema(fileSchema).parquet(filePath)

D.

spark.read().schema(fileSchema).format(parquet).load(filePath)

E.

spark.read.schema(fileSchema).open(filePath)

Question # 19

Which of the following code blocks adds a column predErrorSqrt to DataFrame transactionsDf that is the square root of column predError?

A.

transactionsDf.withColumn("predErrorSqrt", sqrt(predError))

B.

transactionsDf.select(sqrt(predError))

C.

transactionsDf.withColumn("predErrorSqrt", col("predError").sqrt())

D.

transactionsDf.withColumn("predErrorSqrt", sqrt(col("predError")))

E.

transactionsDf.select(sqrt("predError"))

Question # 20

The code block displayed below contains an error. The code block should write DataFrame transactionsDf as a parquet file to location filePath after partitioning it on column storeId. Find the error.

Code block:

transactionsDf.write.partitionOn("storeId").parquet(filePath)

A.

The partitioning column as well as the file path should be passed to the write() method of DataFrame transactionsDf directly and not as appended commands as in the code block.

B.

The partitionOn method should be called before the write method.

C.

The operator should use the mode() option to configure the DataFrameWriter so that it replaces any existing files at location filePath.

D.

Column storeId should be wrapped in a col() operator.

E.

No method partitionOn() exists for the DataFrame class, partitionBy() should be used instead.
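For comparison, a minimal sketch of a partitioned parquet write using the DataFrameWriter's partitionBy() method, assuming filePath is a writable location:

# partitionBy() is a DataFrameWriter method; it creates one subdirectory
# per distinct storeId value underneath filePath.
transactionsDf.write.partitionBy("storeId").parquet(filePath)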

Question # 21

The code block shown below should set the number of partitions that Spark uses when shuffling data for joins or aggregations to 100. Choose the answer that correctly fills the blanks in the code block to accomplish this.

Code block:

__1__.__2__.__3__(__4__, 100)

A.

1. spark

2. conf

3. set

4. "spark.sql.shuffle.partitions"

B.

1. pyspark

2. config

3. set

4. spark.shuffle.partitions

C.

1. spark

2. conf

3. get

4. "spark.sql.shuffle.partitions"

D.

1. pyspark

2. config

3. set

4. "spark.sql.shuffle.partitions"

E.

1. spark

2. conf

3. set

4. "spark.sql.aggregate.partitions"

Question # 22

Which of the following code blocks reads the parquet file stored at filePath into DataFrame itemsDf, using a valid schema for the sample of itemsDf shown below?

Sample of itemsDf:

1.+------+-----------------------------+-------------------+

2.|itemId|attributes |supplier |

3.+------+-----------------------------+-------------------+

4.|1 |[blue, winter, cozy] |Sports Company Inc.|

5.|2 |[red, summer, fresh, cooling]|YetiX |

6.|3 |[green, summer, travel] |Sports Company Inc.|

7.+------+-----------------------------+-------------------+

A.

1.itemsDfSchema = StructType([

2. StructField("itemId", IntegerType()),

3. StructField("attributes", StringType()),

4. StructField("supplier", StringType())])

5.

6.itemsDf = spark.read.schema(itemsDfSchema).parquet(filePath)

B.

1.itemsDfSchema = StructType([

2. StructField("itemId", IntegerType),

3. StructField("attributes", ArrayType(StringType)),

4. StructField("supplier", StringType)])

5.

6.itemsDf = spark.read.schema(itemsDfSchema).parquet(filePath)

C.

1.itemsDf = spark.read.schema('itemId integer, attributes , supplier string').parquet(filePath)

D.

1.itemsDfSchema = StructType([

2. StructField("itemId", IntegerType()),

3. StructField("attributes", ArrayType(StringType())),

4. StructField("supplier", StringType())])

5.

6.itemsDf = spark.read.schema(itemsDfSchema).parquet(filePath)

E.

1.itemsDfSchema = StructType([

2. StructField("itemId", IntegerType()),

3. StructField("attributes", ArrayType([StringType()])),

4. StructField("supplier", StringType())])

5.

6.itemsDf = spark.read(schema=itemsDfSchema).parquet(filePath)
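For reference, a minimal sketch of a schema matching the sample above, with data type objects instantiated (note the parentheses) and the array element type wrapped in ArrayType, assuming filePath points at the parquet data:

from pyspark.sql.types import StructType, StructField, IntegerType, ArrayType, StringType

# Each column is a StructField; attributes is an array of strings.
itemsDfSchema = StructType([
    StructField("itemId", IntegerType()),
    StructField("attributes", ArrayType(StringType())),
    StructField("supplier", StringType())
])

itemsDf = spark.read.schema(itemsDfSchema).parquet(filePath)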

Question # 23

Which of the following code blocks generally causes a great amount of network traffic?

A.

DataFrame.select()

B.

DataFrame.coalesce()

C.

DataFrame.collect()

D.

DataFrame.rdd.map()

E.

DataFrame.count()

Question # 24

Which of the following code blocks returns a DataFrame that matches the multi-column DataFrame itemsDf, except that integer column itemId has been converted into a string column?

A.

itemsDf.withColumn("itemId", convert("itemId", "string"))

B.

itemsDf.withColumn("itemId", col("itemId").cast("string"))

(Correct)

C.

itemsDf.select(cast("itemId", "string"))

D.

itemsDf.withColumn("itemId", col("itemId").convert("string"))

E.

spark.cast(itemsDf, "itemId", "string")
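For reference, a minimal sketch of converting a column's type with cast(), assuming the itemsDf used in the surrounding questions:

from pyspark.sql.functions import col

# cast() returns a new Column of the requested type; withColumn() replaces
# the existing itemId column with its string version.
itemsDf = itemsDf.withColumn("itemId", col("itemId").cast("string"))
itemsDf.printSchema()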

Question # 25

The code block shown below should write DataFrame transactionsDf as a parquet file to path storeDir, using brotli compression and replacing any previously existing file. Choose the answer that correctly fills the blanks in the code block to accomplish this.

transactionsDf.__1__.format("parquet").__2__(__3__).option(__4__, "brotli").__5__(storeDir)

A.

1. save

2. mode

3. "ignore"

4. "compression"

5. path

B.

1. store

2. with

3. "replacement"

4. "compression"

5. path

C.

1. write

2. mode

3. "overwrite"

4. "compression"

5. save

(Correct)

D.

1. save

2. mode

3. "replace"

4. "compression"

5. path

E.

1. write

2. mode

3. "overwrite"

4. compression

5. parquet
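For reference, a minimal sketch of an overwriting, brotli-compressed parquet write, assuming storeDir is a writable path and the brotli codec is available on the cluster:

# mode("overwrite") replaces any existing files; the compression codec is
# passed as a writer option before save() triggers the write.
(transactionsDf.write
    .format("parquet")
    .mode("overwrite")
    .option("compression", "brotli")
    .save(storeDir))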

Question # 26

Which of the following code blocks creates a new one-column, two-row DataFrame dfDates with column date of type timestamp?

A.

1.dfDates = spark.createDataFrame(["23/01/2022 11:28:12","24/01/2022 10:58:34"], ["date"])

2.dfDates = dfDates.withColumn("date", to_timestamp("dd/MM/yyyy HH:mm:ss", "date"))

B.

1.dfDates = spark.createDataFrame([("23/01/2022 11:28:12",),("24/01/2022 10:58:34",)], ["date"])

2.dfDates = dfDates.withColumnRenamed("date", to_timestamp("date", "yyyy-MM-dd HH:mm:ss"))

C.

1.dfDates = spark.createDataFrame([("23/01/2022 11:28:12",),("24/01/2022 10:58:34",)], ["date"])

2.dfDates = dfDates.withColumn("date", to_timestamp("date", "dd/MM/yyyy HH:mm:ss"))

D.

1.dfDates = spark.createDataFrame(["23/01/2022 11:28:12","24/01/2022 10:58:34"], ["date"])

2.dfDates = dfDates.withColumnRenamed("date", to_datetime("date", "yyyy-MM-dd HH:mm:ss"))

E.

1.dfDates = spark.createDataFrame([("23/01/2022 11:28:12",),("24/01/2022 10:58:34",)], ["date"])
