Databricks Certified Associate Developer for Apache Spark 3.0 Exam Questions and Answers

Questions 4

The code block displayed below contains an error. The code block should configure Spark to split data in 20 parts when exchanging data between executors for joins or aggregations. Find the error.

Code block:

spark.conf.set(spark.sql.shuffle.partitions, 20)

Options:

A.

The code block uses the wrong command for setting an option.

B.

The code block sets the wrong option.

C.

The code block expresses the option incorrectly.

D.

The code block sets the incorrect number of parts.

E.

The code block is missing a parameter.

Questions 5

Which of the following describes the conversion of a computational query into an execution plan in Spark?

Options:

A.

Spark uses the catalog to resolve the optimized logical plan.

B.

The catalog assigns specific resources to the optimized memory plan.

C.

The executed physical plan depends on a cost optimization from a previous stage.

D.

Depending on whether the DataFrame API or the SQL API is used, the physical plan may differ.

E.

The catalog assigns specific resources to the physical plan.

Questions 6

Which of the following statements about executors is correct?

Options:

A.

Executors are launched by the driver.

B.

Executors stop upon application completion by default.

C.

Each node hosts a single executor.

D.

Executors store data in memory only.

E.

An executor can serve multiple applications.

Questions 7

The code block shown below should return a DataFrame with two columns, itemId and col. For each element in column attributes of DataFrame itemsDf, the new DataFrame should contain a separate row in which column itemId holds the associated itemId from itemsDf. Only rows of itemsDf whose column attributes contains the element cozy should be included.

A sample of DataFrame itemsDf is below.

Code block:

itemsDf.__1__(__2__).__3__(__4__, __5__(__6__))

Options:

A.

1. filter

2. array_contains("cozy")

3. select

4. "itemId"

5. explode

6. "attributes"

B.

1. where

2. "array_contains(attributes, 'cozy')"

3. select

4. itemId

5. explode

6. attributes

C.

1. filter

2. "array_contains(attributes, 'cozy')"

3. select

4. "itemId"

5. map

6. "attributes"

D.

1. filter

2. "array_contains(attributes, cozy)"

3. select

4. "itemId"

5. explode

6. "attributes"

E.

1. filter

2. "array_contains(attributes, 'cozy')"

3. select

4. "itemId"

5. explode

6. "attributes"

Questions 8

Which of the following statements about the differences between actions and transformations is correct?

Options:

A.

Actions are evaluated lazily, while transformations are not evaluated lazily.

B.

Actions generate RDDs, while transformations do not.

C.

Actions do not send results to the driver, while transformations do.

D.

Actions can be queued for delayed execution, while transformations can only be processed immediately.

E.

Actions can trigger Adaptive Query Execution, while transformations cannot.

Questions 9

Which of the following code blocks reads JSON file imports.json into a DataFrame?

Options:

A.

spark.read().mode("json").path("/FileStore/imports.json")

B.

spark.read.format("json").path("/FileStore/imports.json")

C.

spark.read("json", "/FileStore/imports.json")

D.

spark.read.json("/FileStore/imports.json")

E.

spark.read().json("/FileStore/imports.json")

Questions 10

Which of the following describes the difference between client and cluster execution modes?

Options:

A.

In cluster mode, the driver runs on the worker nodes, while the client mode runs the driver on the client machine.

B.

In cluster mode, the driver runs on the edge node, while the client mode runs the driver in a worker node.

C.

In cluster mode, each node will launch its own executor, while in client mode, executors will exclusively run on the client machine.

D.

In client mode, the cluster manager runs on the same host as the driver, while in cluster mode, the cluster manager runs on a separate node.

E.

In cluster mode, the driver runs on the master node, while in client mode, the driver runs on a virtual machine in the cloud.

Questions 11

Which of the following statements about broadcast variables is correct?

Options:

A.

Broadcast variables are serialized with every single task.

B.

Broadcast variables are commonly used for tables that do not fit into memory.

C.

Broadcast variables are immutable.

D.

Broadcast variables are occasionally dynamically updated on a per-task basis.

E.

Broadcast variables are local to the worker node and not shared across the cluster.

Questions 12

Which of the following code blocks returns a DataFrame where columns predError and productId are removed from DataFrame transactionsDf?

Sample of DataFrame transactionsDf:

+-------------+---------+-----+-------+---------+----+
|transactionId|predError|value|storeId|productId|f   |
+-------------+---------+-----+-------+---------+----+
|1            |3        |4    |25     |1        |null|
|2            |6        |7    |2      |2        |null|
|3            |3        |null |25     |3        |null|
+-------------+---------+-----+-------+---------+----+

Options:

A.

transactionsDf.withColumnRemoved("predError", "productId")

B.

transactionsDf.drop(["predError", "productId", "associateId"])

C.

transactionsDf.drop("predError", "productId", "associateId")

D.

transactionsDf.dropColumns("predError", "productId", "associateId")

E.

transactionsDf.drop(col("predError", "productId"))

Questions 13

The code block shown below should show information about the data type that column storeId of DataFrame transactionsDf contains. Choose the answer that correctly fills the blanks in the code block to accomplish this.

Code block:

transactionsDf.__1__(__2__).__3__

Options:

A.

1. select

2. "storeId"

3. print_schema()

B.

1. limit

2. 1

3. columns

C.

1. select

2. "storeId"

3. printSchema()

D.

1. limit

2. "storeId"

3. printSchema()

E.

1. select

2. storeId

3. dtypes

Questions 14

The code block displayed below contains an error. The code block should combine data from DataFrames itemsDf and transactionsDf, showing all rows of DataFrame itemsDf that have a matching value in column itemId with a value in column transactionId of DataFrame transactionsDf. Find the error.

Code block:

itemsDf.join(itemsDf.itemId==transactionsDf.transactionId)

Options:

A.

The join statement is incomplete.

B.

The union method should be used instead of join.

C.

The join method is inappropriate.

D.

The merge method should be used instead of join.

E.

The join expression is malformed.

Questions 15

The code block shown below should return a two-column DataFrame with columns transactionId and supplier, with combined information from DataFrames itemsDf and transactionsDf. The code block should merge rows in which column productId of DataFrame transactionsDf matches the value of column itemId in DataFrame itemsDf, but only where column storeId of DataFrame transactionsDf does not match column itemId of DataFrame itemsDf. Choose the answer that correctly fills the blanks in the code block to accomplish this.

Code block:

transactionsDf.__1__(itemsDf, __2__).__3__(__4__)

Options:

A.

1. join

2. transactionsDf.productId==itemsDf.itemId, how="inner"

3. select

4. "transactionId", "supplier"

B.

1. select

2. "transactionId", "supplier"

3. join

4. [transactionsDf.storeId!=itemsDf.itemId, transactionsDf.productId==itemsDf.itemId]

C.

1. join

2. [transactionsDf.productId==itemsDf.itemId, transactionsDf.storeId!=itemsDf.itemId]

3. select

4. "transactionId", "supplier"

D.

1. filter

2. "transactionId", "supplier"

3. join

4. "transactionsDf.storeId!=itemsDf.itemId, transactionsDf.productId==itemsDf.itemId"

E.

1. join

2. transactionsDf.productId==itemsDf.itemId, transactionsDf.storeId!=itemsDf.itemId

3. filter

4. "transactionId", "supplier"

Questions 16

Which of the following code blocks immediately removes the previously cached DataFrame transactionsDf from memory and disk?

Options:

A.

array_remove(transactionsDf, "*")

B.

transactionsDf.unpersist()

(Correct)

C.

del transactionsDf

D.

transactionsDf.clearCache()

E.

transactionsDf.persist()

Questions 17

Which of the following code blocks returns a new DataFrame with only columns predError and value of every second row of DataFrame transactionsDf?

Entire DataFrame transactionsDf:

+-------------+---------+-----+-------+---------+----+
|transactionId|predError|value|storeId|productId|   f|
+-------------+---------+-----+-------+---------+----+
|            1|        3|    4|     25|        1|null|
|            2|        6|    7|      2|        2|null|
|            3|        3| null|     25|        3|null|
|            4|     null| null|      3|        2|null|
|            5|     null| null|   null|        2|null|
|            6|        3|    2|     25|        2|null|
+-------------+---------+-----+-------+---------+----+

Options:

A.

transactionsDf.filter(col("transactionId").isin([3,4,6])).select([predError, value])

B.

transactionsDf.select(col("transactionId").isin([3,4,6]), "predError", "value")

C.

transactionsDf.filter("transactionId" % 2 == 0).select("predError", "value")

D.

transactionsDf.filter(col("transactionId") % 2 == 0).select("predError", "value")

(Correct)

E.

transactionsDf.createOrReplaceTempView("transactionsDf")
spark.sql("FROM transactionsDf SELECT predError, value WHERE transactionId % 2 = 2")

F.

transactionsDf.filter(col(transactionId).isin([3,4,6]))

Questions 18

Which of the following code blocks returns a copy of DataFrame transactionsDf that only includes columns transactionId, storeId, productId and f?

Sample of DataFrame transactionsDf:

+-------------+---------+-----+-------+---------+----+
|transactionId|predError|value|storeId|productId|   f|
+-------------+---------+-----+-------+---------+----+
|            1|        3|    4|     25|        1|null|
|            2|        6|    7|      2|        2|null|
|            3|        3| null|     25|        3|null|
+-------------+---------+-----+-------+---------+----+

Options:

A.

transactionsDf.drop(col("value"), col("predError"))

B.

transactionsDf.drop("predError", "value")

C.

transactionsDf.drop(value, predError)

D.

transactionsDf.drop(["predError", "value"])

E.

transactionsDf.drop([col("predError"), col("value")])

Questions 19

The code block shown below should return the number of columns in the CSV file stored at location filePath. From the CSV file, only lines that do not start with a # character should be read. Choose the answer that correctly fills the blanks in the code block to accomplish this.

Code block:

__1__(__2__.__3__.csv(filePath, __4__).__5__)

Options:

A.

1. size

2. spark

3. read()

4. escape='#'

5. columns

B.

1. DataFrame

2. spark

3. read()

4. escape='#'

5. shape[0]

C.

1. len

2. pyspark

3. DataFrameReader

4. comment='#'

5. columns

D.

1. size

2. pyspark

3. DataFrameReader

4. comment='#'

5. columns

E.

1. len

2. spark

3. read

4. comment='#'

5. columns

Questions 20

Which of the following code blocks returns a copy of DataFrame transactionsDf where the column storeId has been converted to string type?

Options:

A.

transactionsDf.withColumn("storeId", convert("storeId", "string"))

B.

transactionsDf.withColumn("storeId", col("storeId", "string"))

C.

transactionsDf.withColumn("storeId", col("storeId").convert("string"))

D.

transactionsDf.withColumn("storeId", col("storeId").cast("string"))

E.

transactionsDf.withColumn("storeId", convert("storeId").as("string"))

Questions 21

Which of the following statements about Spark's DataFrames is incorrect?

Options:

A.

Spark's DataFrames are immutable.

B.

Spark's DataFrames are equal to Python's DataFrames.

C.

Data in DataFrames is organized into named columns.

D.

RDDs are at the core of DataFrames.

E.

The data in DataFrames may be split into multiple chunks.

Questions 22

The code block displayed below contains an error. The code block should create DataFrame itemsAttributesDf which has columns itemId and attribute and lists every attribute from the attributes column in DataFrame itemsDf next to the itemId of the respective row in itemsDf. Find the error.

A sample of DataFrame itemsDf is below.

Code block:

itemsAttributesDf = itemsDf.explode("attributes").alias("attribute").select("attribute", "itemId")

Options:

A.

Since itemId is the index, it does not need to be an argument to the select() method.

B.

The alias() method needs to be called after the select() method.

C.

The explode() method expects a Column object rather than a string.

D.

explode() is not a method of DataFrame. explode() should be used inside the select() method instead.

E.

The split() method should be used inside the select() method instead of the explode() method.

Questions 23

Which of the following describes characteristics of the Spark driver?

Options:

A.

The Spark driver requests the transformation of operations into DAG computations from the worker nodes.

B.

If set in the Spark configuration, Spark scales the Spark driver horizontally to improve parallel processing performance.

C.

The Spark driver processes partitions in an optimized, distributed fashion.

D.

In a non-interactive Spark application, the Spark driver automatically creates the SparkSession object.

E.

The Spark driver's responsibility includes scheduling queries for execution on worker nodes.

Questions 24

Which of the following is a problem with using accumulators?

Options:

A.

Only unnamed accumulators can be inspected in the Spark UI.

B.

Only numeric values can be used in accumulators.

C.

Accumulator values can only be read by the driver, but not by executors.

D.

Accumulators do not obey lazy evaluation.

E.

Accumulators are difficult to use for debugging because they will only be updated once, regardless of whether a task has to be re-run due to hardware failure.

Questions 25

The code block displayed below contains an error. The code block should arrange the rows of DataFrame transactionsDf using information from two columns in an ordered fashion, arranging first by column value, showing smaller numbers at the top and greater numbers at the bottom, and then by column predError, for which all values should be arranged in the inverse way of the order of items in column value. Find the error.

Code block:

transactionsDf.orderBy('value', asc_nulls_first(col('predError')))

Options:

A.

Two orderBy statements with calls to the individual columns should be chained, instead of having both columns in one orderBy statement.

B.

Column value should be wrapped by the col() operator.

C.

Column predError should be sorted in a descending way, putting nulls last.

D.

Column predError should be sorted by desc_nulls_first() instead.

E.

Instead of orderBy, sort should be used.

Questions 26

Which of the following code blocks returns a DataFrame that has all columns of DataFrame transactionsDf and an additional column predErrorSquared which is the squared value of column predError in DataFrame transactionsDf?

Options:

A.

transactionsDf.withColumn("predError", pow(col("predErrorSquared"), 2))

B.

transactionsDf.withColumnRenamed("predErrorSquared", pow(predError, 2))

C.

transactionsDf.withColumn("predErrorSquared", pow(col("predError"), lit(2)))

D.

transactionsDf.withColumn("predErrorSquared", pow(predError, lit(2)))

E.

transactionsDf.withColumn("predErrorSquared", "predError"**2)

Questions 27

Which of the following code blocks returns about 150 randomly selected rows from the 1000-row DataFrame transactionsDf, assuming that any row can appear more than once in the returned DataFrame?

Options:

A.

transactionsDf.resample(0.15, False, 3142)

B.

transactionsDf.sample(0.15, False, 3142)

C.

transactionsDf.sample(0.15)

D.

transactionsDf.sample(0.85, 8429)

E.

transactionsDf.sample(True, 0.15, 8261)

Exam Name: Databricks Certified Associate Developer for Apache Spark 3.0 Exam
Last Update: Jan 18, 2025
Questions: 180