Databricks-Machine-Learning-Associate Databricks Certified Machine Learning Associate Exam Questions and Answers

Questions 4

A data scientist has defined a Pandas UDF function predict to parallelize the inference process for a single-node model:

They have written the following incomplete code block to use predict to score each record of Spark DataFrame spark_df:

Which of the following lines of code can be used to complete the code block to successfully complete the task?

Options:

predict(*spark_df.columns)

mapInPandas(predict)

predict(Iterator(spark_df))

mapInPandas(predict(spark_df.columns))

predict(spark_df.columns)

Questions 5

A data scientist is performing hyperparameter tuning using an iterative optimization algorithm. Each evaluation of unique hyperparameter values is being trained on a single compute node. They are performing eight total evaluations across eight total compute nodes. While the accuracy of the model does vary over the eight evaluations, they notice there is no trend of improvement in the accuracy. The data scientist believes this is due to the parallelization of the tuning process.

Which change could the data scientist make to improve their model accuracy over the course of their tuning process?

Options:

Change the number of compute nodes to be half or less than half of the number of evaluations.

Change the number of compute nodes and the number of evaluations to be much larger but equal.

Change the iterative optimization algorithm used to facilitate the tuning process.

Change the number of compute nodes to be double or more than double the number of evaluations.
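The trade-off in this scenario can be made concrete with a little arithmetic. The sketch below is an illustration only: it assumes that an iterative optimizer can learn from earlier results only between batches of concurrent evaluations, and `sequential_rounds` is a hypothetical helper, not part of any Databricks or Hyperopt API.

```python
import math


def sequential_rounds(num_evaluations: int, parallelism: int) -> int:
    """Number of back-to-back batches needed to run all evaluations."""
    return math.ceil(num_evaluations / parallelism)


# Eight evaluations on eight nodes all start at once, so no evaluation
# can be informed by the result of another.
print(sequential_rounds(8, 8))  # 1

# With four nodes, the second batch can use what the first batch found.
print(sequential_rounds(8, 4))  # 2
```

More rounds give the optimizer more chances to adapt, at the cost of a longer wall-clock tuning time.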

Questions 6

A data scientist wants to explore the Spark DataFrame spark_df. The data scientist wants visual histograms displaying the distribution of numeric features to be included in the exploration.

Which of the following lines of code can the data scientist run to accomplish the task?

Options:

spark_df.describe()

dbutils.data(spark_df).summarize()

This task cannot be accomplished in a single line of code.

spark_df.summary()

dbutils.data.summarize(spark_df)

Questions 7

A machine learning engineering team has a Job with three successive tasks. Each task runs a single notebook. The team has been alerted that the Job has failed in its latest run.

Which of the following approaches can the team use to identify which task is the cause of the failure?

Options:

Run each notebook interactively

Review the matrix view in the Job's runs

Migrate the Job to a Delta Live Tables pipeline

Change each Task’s setting to use a dedicated cluster

Questions 8

A data scientist has replaced missing values in their feature set with each respective feature variable’s median value. A colleague suggests that the data scientist is throwing away valuable information by doing this.

Which of the following approaches can they take to include as much information as possible in the feature set?

Options:

Impute the missing values using each respective feature variable's mean value instead of the median value

Refrain from imputing the missing values in favor of letting the machine learning algorithm determine how to handle them

Remove all feature variables that originally contained missing values from the feature set

Create a binary feature variable for each feature that contained missing values indicating whether each row's value has been imputed

Create a constant feature variable for each feature that contained missing values indicating the percentage of rows from the feature that was originally missing
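To show what a per-row missingness indicator looks like next to median imputation, here is a minimal single-node sketch using only the standard library. The `ages` column and the `impute_with_indicator` helper are hypothetical names introduced for illustration.

```python
from statistics import median


def impute_with_indicator(values):
    """Median-impute a numeric column and flag which rows were missing."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    imputed = [fill if v is None else v for v in values]
    was_missing = [1 if v is None else 0 for v in values]
    return imputed, was_missing


ages = [34.0, None, 51.0, 29.0, None]  # hypothetical feature column
imputed, flag = impute_with_indicator(ages)
print(imputed)  # [34.0, 34.0, 51.0, 29.0, 34.0]
print(flag)     # [0, 1, 0, 0, 1]
```

The indicator column preserves the fact that a value was originally absent, which the imputed column alone no longer shows.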

Questions 9

The implementation of linear regression in Spark ML first attempts to solve the linear regression problem using matrix decomposition, but this method does not scale well to large datasets with a large number of variables.

Which of the following approaches does Spark ML use to distribute the training of a linear regression model for large data?

Options:

Logistic regression

Spark ML cannot distribute linear regression training

Iterative optimization

Least-squares method

Singular value decomposition
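As a rough picture of what an iterative solver does, the sketch below fits a one-variable linear regression with plain gradient descent. This is a stand-in chosen for brevity, not the optimizer Spark ML itself uses, and the data, learning rate, and step count are assumptions picked for illustration. Each gradient is a sum over rows, which is the kind of computation that can be evaluated per partition and then combined.

```python
def fit_linear_gd(xs, ys, lr=0.05, steps=5000):
    """Fit y = w * x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum((w * x + b - y) * x for x, y in zip(xs, ys)) * 2 / n
        grad_b = sum((w * x + b - y) for x, y in zip(xs, ys)) * 2 / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]  # synthetic data on the line y = 2x + 1
w, b = fit_linear_gd(xs, ys)
print(round(w, 3), round(b, 3))  # 2.0 1.0
```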

Questions 10

A machine learning engineer has grown tired of needing to install the MLflow Python library on each of their clusters. They ask a senior machine learning engineer how their notebooks can load the MLflow library without installing it each time. The senior machine learning engineer suggests that they use Databricks Runtime for Machine Learning.

Which of the following approaches describes how the machine learning engineer can begin using Databricks Runtime for Machine Learning?

Options:

They can add a line enabling Databricks Runtime ML in their init script when creating their clusters.

They can check the Databricks Runtime ML box when creating their clusters.

They can select a Databricks Runtime ML version from the Databricks Runtime Version dropdown when creating their clusters.

They can set the runtime-version variable in their Spark session to “ml”.

Questions 11

A data scientist has developed a linear regression model using Spark ML and computed the predictions in a Spark DataFrame preds_df with the following schema:

prediction DOUBLE

actual DOUBLE

Which of the following code blocks can be used to compute the root mean-squared-error of the model according to the data in preds_df and assign it to the rmse variable?

Options:

Option A

Option B

Option C

Option D

Option E
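For reference, the metric itself is simple to state: take the square root of the mean squared difference between prediction and actual. The sketch below computes it on a few hypothetical rows shaped like preds_df, using plain Python rather than Spark. In Spark ML the same metric is available through RegressionEvaluator with metricName set to "rmse" and the prediction and label columns pointed at prediction and actual.

```python
import math


def rmse(rows):
    """Root mean squared error over rows with 'prediction' and 'actual'."""
    squared_errors = [(r["prediction"] - r["actual"]) ** 2 for r in rows]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical rows standing in for the contents of preds_df.
preds = [
    {"prediction": 3.0, "actual": 5.0},
    {"prediction": 4.0, "actual": 4.0},
    {"prediction": 7.0, "actual": 5.0},
]
print(rmse(preds))
```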

Questions 12

A machine learning engineer is trying to perform batch model inference. They want to get predictions using the linear regression model saved at the path model_uri for the DataFrame batch_df.

batch_df has the following schema:

customer_id STRING

The machine learning engineer runs the following code block to perform inference on batch_df using the linear regression model at model_uri:

In which situation will the machine learning engineer’s code block perform the desired inference?

Options:

When the Feature Store feature set was logged with the model at model_uri

When all of the features used by the model at model_uri are in a Spark DataFrame in the PySpark

When the model at model_uri only uses customer_id as a feature

This code block will not perform the desired inference in any situation.

When all of the features used by the model at model_uri are in a single Feature Store table

Questions 13

A data scientist has developed a machine learning pipeline with a static input data set using Spark ML, but the pipeline is taking too long to process. They increase the number of workers in the cluster to get the pipeline to run more efficiently. They notice that the number of rows in the training set after reconfiguring the cluster is different from the number of rows in the training set prior to reconfiguring the cluster.

Which of the following approaches will guarantee a reproducible training and test set for each model?

Options:

Manually configure the cluster

Write out the split data sets to persistent storage

Set a seed in the data splitting operation

Manually partition the input data
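The idea of freezing a split can be sketched without Spark. The example below splits a small list once, writes both halves to disk, and reloads them, so later runs read the same rows regardless of how the split was originally computed. It is a single-node illustration under assumed names (`split_and_persist`, `load_split`, and the JSON file names are hypothetical). In Spark, a seeded random split can still depend on how the data is partitioned, which is why persisting the result is the more robust choice.

```python
import json
import random
import tempfile
from pathlib import Path


def split_and_persist(rows, train_fraction, seed, out_dir):
    """Shuffle with a fixed seed, split, and write both sets to disk."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    train, test = shuffled[:cut], shuffled[cut:]
    out = Path(out_dir)
    (out / "train.json").write_text(json.dumps(train))
    (out / "test.json").write_text(json.dumps(test))
    return train, test


def load_split(out_dir):
    """Read back the exact train and test sets written earlier."""
    out = Path(out_dir)
    train = json.loads((out / "train.json").read_text())
    test = json.loads((out / "test.json").read_text())
    return train, test


with tempfile.TemporaryDirectory() as tmp:
    train, test = split_and_persist(list(range(10)), 0.8, seed=42, out_dir=tmp)
    reloaded_train, reloaded_test = load_split(tmp)
    print(reloaded_train == train and reloaded_test == test)  # True
```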

Questions 14

A data scientist uses 3-fold cross-validation and the following hyperparameter grid when optimizing model hyperparameters via grid search for a classification problem:

● Hyperparameter 1: [2, 5, 10]

● Hyperparameter 2: [50, 100]

Which of the following represents the number of machine learning models that can be trained in parallel during this process?
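The counts involved can be enumerated directly. The sketch below lists the grid's combinations and multiplies by the number of folds. The dictionary keys are placeholder names, since the source does not say which hyperparameters these are.

```python
from itertools import product

# Placeholder names; the source only gives the candidate values.
grid = {"hyperparameter_1": [2, 5, 10], "hyperparameter_2": [50, 100]}
num_folds = 3

combinations = list(product(*grid.values()))
total_fits = len(combinations) * num_folds

print(len(combinations))  # 6
print(total_fits)         # 18
```

No fit in a grid search depends on the result of another, so in principle each (combination, fold) pair is an independent training job.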

Options:

Questions 15

An organization is developing a feature repository and is electing to one-hot encode all categorical feature variables. A data scientist suggests that the categorical feature variables should not be one-hot encoded within the feature repository.

Which of the following explanations justifies this suggestion?

Options:

One-hot encoding is not supported by most machine learning libraries.

One-hot encoding is dependent on the target variable's values which differ for each application.

One-hot encoding is computationally intensive and should only be performed on small samples of training sets for individual machine learning problems.

One-hot encoding is not a common strategy for representing categorical feature variables numerically.

One-hot encoding is a potentially problematic categorical variable strategy for some machine learning algorithms.

Questions 16

A machine learning engineer has created a Feature Table new_table using Feature Store Client fs. When creating the table, they specified a metadata description with key information about the Feature Table. They now want to retrieve that metadata programmatically.

Which of the following lines of code will return the metadata description?

Options:

There is no way to return the metadata description programmatically.

fs.create_training_set("new_table")

fs.get_table("new_table").description

fs.get_table("new_table").load_df()

fs.get_table("new_table")

Questions 17

A machine learning engineer is converting a decision tree from sklearn to Spark ML. They notice that they are receiving different results despite all of their data and manually specified hyperparameter values being identical.

Which of the following describes a reason that the single-node sklearn decision tree and the Spark ML decision tree can differ?

Options:

Spark ML decision trees test every feature variable in the splitting algorithm

Spark ML decision trees automatically prune overfit trees

Spark ML decision trees test more split candidates in the splitting algorithm

Spark ML decision trees test a random sample of feature variables in the splitting algorithm

Spark ML decision trees test binned features values as representative split candidates
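The difference between testing every possible threshold and testing binned thresholds can be illustrated with a toy feature. The sketch below is a simplification and not Spark's actual implementation: it compares midpoints between all distinct values against the boundaries of equal-frequency bins. Spark ML exposes the bin count through the maxBins parameter, and both helper functions here are hypothetical.

```python
def exact_candidates(values):
    """Midpoints between consecutive distinct sorted values."""
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]


def binned_candidates(values, max_bins):
    """Approximate candidates: boundaries of equal-frequency bins."""
    ordered = sorted(values)
    n = len(ordered)
    edges = []
    for i in range(1, max_bins):
        edge = ordered[(i * n) // max_bins]
        # Skip repeated boundaries caused by duplicate values.
        if not edges or edge != edges[-1]:
            edges.append(edge)
    return edges


feature = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
print(len(exact_candidates(feature)))          # 11
print(binned_candidates(feature, max_bins=4))  # [4, 7, 10]
```

With fewer candidate thresholds, the chosen split can differ from the one an exhaustive search would pick, which is one way two implementations can disagree on identical data.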

Questions 18

A data scientist has been given an incomplete notebook from the data engineering team. The notebook uses a Spark DataFrame spark_df on which the data scientist needs to perform further feature engineering. Unfortunately, the data scientist has not yet learned the PySpark DataFrame API.

Which of the following blocks of code can the data scientist run to be able to use the pandas API on Spark?

Options:

import pyspark.pandas as ps

df = ps.DataFrame(spark_df)

import pyspark.pandas as ps

df = ps.to_pandas(spark_df)

spark_df.to_pandas()

import pandas as pd

df = pd.DataFrame(spark_df)

Questions 19

Which of the following hyperparameter optimization methods automatically makes informed selections of hyperparameter values based on previous trials for each iterative model evaluation?

Options:

Random Search

Halving Random Search

Tree of Parzen Estimators

Grid Search

Questions 20

A data scientist uses 3-fold cross-validation when optimizing model hyperparameters for a regression problem. The following root-mean-squared-error values are calculated on each of the validation folds:

• 10.0

• 12.0

• 17.0

Which of the following values represents the overall cross-validation root-mean-squared error?

Options:

13.0

17.0

12.0

39.0

10.0
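The arithmetic can be checked in a few lines, assuming the overall cross-validation score is taken as the average of the per-fold scores (this is how Spark ML's CrossValidator reports its avgMetrics).

```python
from statistics import mean

fold_rmse = [10.0, 12.0, 17.0]  # per-fold values from the question
cv_rmse = mean(fold_rmse)
print(cv_rmse)  # 13.0
```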

Questions 21

A data scientist learned during their training to always use 5-fold cross-validation in their model development workflow. A colleague suggests that there are cases where a train-validation split could be preferred over k-fold cross-validation when k > 2.

Which of the following describes a potential benefit of using a train-validation split over k-fold cross-validation in this scenario?

Options:

A holdout set is not necessary when using a train-validation split

Reproducibility is achievable when using a train-validation split

Fewer hyperparameter values need to be tested when using a train-validation split

Bias is avoidable when using a train-validation split

Fewer models need to be trained when using a train-validation split

Questions 22

Which of the following statements describes a Spark ML estimator?

Options:

An estimator is a hyperparameter grid that can be used to train a model

An estimator chains multiple algorithms together to specify an ML workflow

An estimator is a trained ML model which turns a DataFrame with features into a DataFrame with predictions

An estimator is an algorithm which can be fit on a DataFrame to produce a Transformer

An estimator is an evaluation tool to assess the quality of a model
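The fit-then-transform pattern behind these terms can be mimicked in plain Python. The two classes below are hypothetical toys that operate on lists instead of DataFrames. They are not Spark ML classes and only illustrate the relationship between an object that is fit on data and the object it returns.

```python
class MeanCenterer:
    """Hypothetical estimator: learns a mean from the data it is fit on."""

    def fit(self, rows):
        mean = sum(rows) / len(rows)
        return MeanCentererModel(mean)


class MeanCentererModel:
    """Hypothetical transformer returned by MeanCenterer.fit."""

    def __init__(self, mean):
        self.mean = mean

    def transform(self, rows):
        # Apply the learned mean to new data without refitting.
        return [r - self.mean for r in rows]


model = MeanCenterer().fit([1.0, 2.0, 3.0])
print(model.transform([4.0, 5.0]))  # [2.0, 3.0]
```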

Exam Code: Databricks-Machine-Learning-Associate

Exam Name: Databricks Certified Machine Learning Associate Exam

Last Update: Jul 13, 2025

Questions: 74
