Databricks-Certified-Data-Engineer-Associate Databricks Certified Data Engineer Associate Exam Questions and Answers

Questions 4

What is the structure of an Asset Bundle?

Options:

A single plain text file enumerating the names of assets to be migrated to a new workspace.

A compressed archive (ZIP) that solely contains workspace assets without any accompanying metadata.

A YAML configuration file that specifies the artifacts, resources, and configurations for the project.

A Docker image containing runtime environments and the source code of the assets

Buy Now

Questions 5

A data engineer needs to apply custom logic to string column city in table stores for a specific use case. In order to apply this custom logic at scale, the data engineer wants to create a SQL user-defined function (UDF).

Which of the following code blocks creates this SQL UDF?

Options:

Buy Now

Questions 6

A data engineer wants to create a new table containing the names of customers that live in France.

They have written the following command:

A senior data engineer mentions that it is organization policy to include a table property indicating that the new table includes personally identifiable information (PII).

Which of the following lines of code fills in the above blank to successfully complete the task?

Options:

There is no way to indicate whether a table contains PII.

"COMMENT PII"

TBLPROPERTIES PII

COMMENT "Contains PII"

PII

Buy Now

Questions 7

A data engineer needs to create a table in Databricks using data from their organization’s existing SQLite database.

They run the following command:

Which of the following lines of code fills in the above blank to successfully complete the task?

Options:

org.apache.spark.sql.jdbc

autoloader

DELTA

sqlite

org.apache.spark.sql.sqlite

Buy Now

Questions 8

A data engineer is attempting to write Python and SQL in the same command cell and is running into an error The engineer thought that it was possible to use a Python variable in a select statement.

Why does the command fail?

Options:

Databricks supports multiple languages but only one per notebook.

Databricks supports language interoperability in the same cell but only between Scala and SQL

Databricks supports language interoperability but only if a special character is used.

Databricks supports one language per cell.

Buy Now

Questions 9

A data engineer needs access to a table new_uable, but they do not have the correct permissions. They can ask the table owner for permission, but they do not know who the table owner is.

Which approach can be used to identify the owner of new_table?

Options:

There is no way to identify the owner of the table

Review the Owner field in the table's page in the cloud storage solution

Review the Permissions tab in the table's page in Data Explorer

Review the Owner field in the table’s page in Data Explorer

Buy Now

Questions 10

Which of the following describes the relationship between Gold tables and Silver tables?

Options:

Gold tables are more likely to contain aggregations than Silver tables.

Gold tables are more likely to contain valuable data than Silver tables.

Gold tables are more likely to contain a less refined view of data than Silver tables.

Gold tables are more likely to contain more data than Silver tables.

Gold tables are more likely to contain truthful data than Silver tables.

Buy Now

Questions 11

A data analyst has a series of queries in a SQL program. The data analyst wants this program to run every day. They only want the final query in the program to run on Sundays. They ask for help from the data engineering team to complete this task.

Which of the following approaches could be used by the data engineering team to complete this task?

Options:

They could submit a feature request with Databricks to add this functionality.

They could wrap the queries using PySpark and use Python’s control flow system to determine when to run the final query.

They could only run the entire program on Sundays.

They could automatically restrict access to the source table in the final query so that it is only accessible on Sundays.

They could redesign the data model to separate the data used in the final query into a new table.

Buy Now

Questions 12

Which of the following commands will return the location of database customer360?

Options:

DESCRIBE LOCATION customer360;

DROP DATABASE customer360;

DESCRIBE DATABASE customer360;

ALTER DATABASE customer360 SET DBPROPERTIES ('location' = '/user'};

USE DATABASE customer360;

Buy Now

Questions 13

A data engineer wants to create a relational object by pulling data from two tables. The relational object does not need to be used by other data engineers in other sessions. In order to save on storage costs, the data engineer wants to avoid copying and storing physical data.

Which of the following relational objects should the data engineer create?

Options:

Spark SQL Table

View

Database

Temporary view

Delta Table

Buy Now

Questions 14

Which of the following data lakehouse features results in improved data quality over a traditional data lake?

Options:

A data lakehouse provides storage solutions for structured and unstructured data.

A data lakehouse supports ACID-compliant transactions.

A data lakehouse allows the use of SQL queries to examine data.

A data lakehouse stores data in open formats.

A data lakehouse enables machine learning and artificial Intelligence workloads.

Buy Now

Questions 15

A data engineer is processing ingested streaming tables and needs to filter out NULL values in the order_datetime column from the raw streaming table orders_raw and store the results in a new table orders_valid using DLT.

Which code snippet should the data engineer use?

Options:

Option A

Option B

Option C

Option D

Buy Now

Questions 16

A dataset has been defined using Delta Live Tables and includes an expectations clause:

CONSTRAINT valid_timestamp EXPECT (timestamp > '2020-01-01') ON VIOLATION DROP ROW

What is the expected behavior when a batch of data containing data that violates these constraints is processed?

Options:

Records that violate the expectation are dropped from the target dataset and loaded into a quarantine table.

Records that violate the expectation are added to the target dataset and flagged as invalid in a field added to the target dataset.

Records that violate the expectation are dropped from the target dataset and recorded as invalid in the event log.

Records that violate the expectation are added to the target dataset and recorded as invalid in the event log.

Records that violate the expectation cause the job to fail.

Buy Now

Questions 17

A data engineer is reviewing the documentation on audit logs in Databricks for compliance purposes and needs to understand the format in which audit logs output events.

How are events formatted in Databricks audit logs?

Options:

In Databricks, audit logs output events in a plain text format.

In Databricks, audit logs output events in a JSON format.

In Databricks, audit logs output events in an XML format.

In Databricks, audit logs output events in a CSV format.

Buy Now

Questions 18

In which of the following scenarios should a data engineer select a Task in the Depends On field of a new Databricks Job Task?

Options:

When another task needs to be replaced by the new task

When another task needs to fail before the new task begins

When another task has the same dependency libraries as the new task

When another task needs to use as little compute resources as possible

When another task needs to successfully complete before the new task begins

Buy Now

Questions 19

Which of the following Structured Streaming queries is performing a hop from a Silver table to a Gold table?

Options:

Buy Now

Questions 20

Identify a scenario to use an external table.

A Data Engineer needs to create a parquet bronze table and wants to ensure that it gets stored in a specific path in an external location.

Which table can be created in this scenario?

Options:

An external table where the location is pointing to specific path in external location.

An external table where the schema has managed location pointing to specific path in external location.

A managed table where the catalog has managed location pointing to specific path in external location.

A managed table where the location is pointing to specific path in external location.

Buy Now

Questions 21

A data engineer has written a function in a Databricks Notebook to calculate the population of bacteria in a given medium.

Analysts use this function in the notebook and sometimes provide input arguments of the wrong data type, which can cause errors during execution.

Which Databricks feature will help the data engineer quickly identify if an incorrect data type has been provided as input?

Options:

The Data Engineer should add print statements to find out what the variable is.

The Databricks debugger enables breakpoints that will raise an error if the wrong data type is submitted

The Spark User interface has a debug tab that contains the variables that are used in this session.

The Databricks debugger enables the use of a variable explorer to see at a glance the value of the variables.

Buy Now

Questions 22

Which of the following code blocks will remove the rows where the value in column age is greater than 25 from the existing Delta table my_table and save the updated table?

Options:

SELECT * FROM my_table WHERE age > 25;

UPDATE my_table WHERE age > 25;

DELETE FROM my_table WHERE age > 25;

UPDATE my_table WHERE age <= 25;

DELETE FROM my_table WHERE age <= 25;

Buy Now

Questions 23

Which of the following describes when to use the CREATE STREAMING LIVE TABLE (formerly CREATE INCREMENTAL LIVE TABLE) syntax over the CREATE LIVE TABLE syntax when creating Delta Live Tables (DLT) tables using SQL?

Options:

CREATE STREAMING LIVE TABLE should be used when the subsequent step in the DLT pipeline is static.

CREATE STREAMING LIVE TABLE should be used when data needs to be processed incrementally.

CREATE STREAMING LIVE TABLE is redundant for DLT and it does not need to be used.

CREATE STREAMING LIVE TABLE should be used when data needs to be processed through complicated aggregations.

CREATE STREAMING LIVE TABLE should be used when the previous step in the DLT pipeline is static.

Buy Now

Questions 24

A data engineer has created a new database using the following command:

CREATE DATABASE IF NOT EXISTS customer360;

In which of the following locations will the customer360 database be located?

Options:

dbfs:/user/hive/database/customer360

dbfs:/user/hive/warehouse

dbfs:/user/hive/customer360

More information is needed to determine the correct response

Buy Now

Questions 25

A data engineering team has two tables. The first table march_transactions is a collection of all retail transactions in the month of March. The second table april_transactions is a collection of all retail transactions in the month of April. There are no duplicate records between the tables.

Which of the following commands should be run to create a new table all_transactions that contains all records from march_transactions and april_transactions without duplicate records?