CT-AI Certified Tester AI Testing Exam Questions and Answers

Questions 4

A bank wants to use an algorithm to determine which applicants should be given a loan. The bank hires a data scientist to construct a logistic regression model to predict whether the applicant will repay the loan or not. The bank has enough data on past customers to randomly split the data into a training data set and a test/validation data set. A logistic regression model is constructed on the training data set using the following independent variables:

Gender

Marital status

Number of dependents

Education

Income

Loan amount

Loan term

Credit score

The model reveals that those with higher credit scores and larger total incomes are more likely to repay their loans. The data scientist has suggested that there might be bias present in the model based on previous models created for other banks.

Given this information, what is the best test approach to check for potential bias in the model?

Options:

Experience-based testing should be used to confirm that the training data set is operationally relevant. This can include applying exploratory data analysis (EDA) to check for bias within the training data set.

Back-to-back testing should be used to compare the model created using the training data set to another model created using the test data set. If the two models significantly differ, it will indicate there is bias in the original model.

Acceptance testing should be used to make sure the algorithm is suitable for the customer. The team can re-work the acceptance criteria such that the algorithm is sure to correctly predict the remaining applicants that have been set aside for the validation data set ensuring no bias is present.

A/B testing should be used to verify that the test data set does not detect any bias that might have been introduced by the original training data. If the two models significantly differ, it will indicate there is bias in the original model.

Answer:

Explanation:

Bias in an AI system occurs when the training data contains inherent prejudices that cause the model to make unfair predictions. Experience-based testing, particularly exploratory data analysis (EDA), helps uncover these biases by analyzing patterns, distributions, and potential discriminatory factors in the training data.

Analysis of the Answer Options:

Option A: “Experience-based testing should be used to confirm that the training data set is operationally relevant. This can include applying exploratory data analysis (EDA) to check for bias within the training data set.”

This is the correct answer. EDA involves examining the dataset for bias, inconsistencies, or missing values, which supports fairness in the model's predictions.

Option B: “Back-to-back testing should be used to compare the model created using the training data set to another model created using the test data set. If the two models significantly differ, it will indicate there is bias in the original model.”

Back-to-back testing is used for regression testing and to compare versions of an AI system but is not primarily used to detect bias.

Option C: “Acceptance testing should be used to make sure the algorithm is suitable for the customer. The team can re-work the acceptance criteria such that the algorithm is sure to correctly predict the remaining applicants that have been set aside for the validation data set ensuring no bias is present.”

Acceptance testing focuses on meeting predefined business requirements rather than detecting and mitigating bias.

Option D: “A/B testing should be used to verify that the test data set does not detect any bias that might have been introduced by the original training data. If the two models significantly differ, it will indicate there is bias in the original model.”

A/B testing is used for evaluating variations of a model rather than for explicitly identifying bias.

ISTQB CT-AI Syllabus References:

Bias Testing Methods: "AI-based systems should be tested for algorithmic bias, sample bias, and inappropriate bias. Experience-based testing and EDA are useful for detecting bias".

Exploratory Data Analysis (EDA): "EDA helps uncover potential bias in training data through visualization and statistical analysis".

Thus, Option A is the best choice for detecting bias in the loan applicant model.
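As a minimal sketch of the kind of EDA check described in this explanation, the following Python compares repayment rates across the groups of one attribute in a training set. The records, the field names `gender` and `repaid`, and the helper `repayment_rate_by_group` are hypothetical and not taken from the question. A large gap between groups is only a prompt for further investigation, not proof of bias.

```python
from collections import defaultdict


def repayment_rate_by_group(records, attribute, label="repaid"):
    """Return {group: share of records whose label is 1} for one attribute."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for row in records:
        group = row[attribute]
        totals[group] += 1
        positives[group] += row[label]
    return {group: positives[group] / totals[group] for group in totals}


# Hypothetical training records, for illustration only.
training = [
    {"gender": "F", "repaid": 1},
    {"gender": "F", "repaid": 0},
    {"gender": "F", "repaid": 0},
    {"gender": "F", "repaid": 1},
    {"gender": "M", "repaid": 1},
    {"gender": "M", "repaid": 1},
    {"gender": "M", "repaid": 1},
    {"gender": "M", "repaid": 0},
]

rates = repayment_rate_by_group(training, "gender")
# Difference between the best and worst group rate.
gap = max(rates.values()) - min(rates.values())
print(rates, gap)
```

The same helper could be applied to other attributes from the question, such as marital status or education, assuming they are present as fields in the records.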

Questions 5

“BioSearch” is creating an AI model for predicting cancer occurrence by examining X-ray images. The accuracy of the model in isolation has been found to be good. However, when the model was put into practice in the diagnosis lab, its users started complaining about the poor quality of results, especially its inability to detect real cancer cases, which led to the model no longer being used.

A testing expert was called in to find the deficiencies in the test planning which led to the above scenario.

Which ONE of the following options would you expect to MOST likely be the reason to be discovered by the test expert?

SELECT ONE OPTION

Options:

A lack of similarity between the training and testing data.

The input data has not been tested for quality prior to use for testing.

A lack of focus on choosing the right functional-performance metrics.

A lack of focus on non-functional requirements testing.

Questions 6

Which of the following characteristics of AI-based systems make it more difficult to ensure they are safe?

Options:

Simplicity

Sustainability

Non-determinism

Robustness

Questions 7

Which ONE of the following tests is MOST likely to describe a useful test to help detect different kinds of biases in an ML pipeline?

SELECT ONE OPTION

Options:

Testing the distribution shift in the training data for inappropriate bias.

Test the model during model evaluation for data bias.

Testing the data pipeline for any sources for algorithmic bias.

Check the input test data for potential sample bias.

Questions 8

An airline has created an ML model to project fuel requirements for future flights. The model imports weather data such as wind speeds and temperatures, calculates flight routes based on historical routings from air traffic control, and estimates loads from average passenger and baggage weights. The model performed within an acceptable standard for the airline throughout the summer, but as winter set in the load weights became less accurate. After some exploratory data analysis it became apparent that luggage weights were higher in the winter than in summer.

Which of the following statements BEST describes the problem and how it could have been prevented?

Options:

The model suffers from drift and therefore should be regularly tested to ensure that any occurrences of drift are detected soon enough for the problem to be mitigated.

The model suffers from drift and therefore the performance standard should be eased until a new model with more transparency can be developed.

The model suffers from corruption and therefore should be reloaded into the computer system being used, preferably with a method of version control to prevent further changes.

The model suffers from a lack of transparency and therefore should be regularly tested to ensure that any progressive errors are detected soon enough for the problem to be mitigated.

Answer:

Explanation:

The problem described in the question is a classic case of concept drift. Concept drift occurs when the relationship between input variables and the output variable changes over time, leading to a decline in model accuracy.

In this scenario, the average passenger and baggage weights used in the model changed due to seasonal variations, but the model was not updated accordingly. This resulted in inaccurate predictions for fuel requirements in the winter season. This is an example of seasonal drift, where model behavior changes periodically due to recurring trends (e.g., higher luggage weights in winter compared to summer).

To prevent such problems:

The model should be regularly tested for concept drift against agreed ML functional performance criteria.

Exploratory Data Analysis (EDA) should be performed periodically to detect gradual changes in input distributions.

Retraining of the model with updated training data should be done to maintain accuracy.

If drift is detected, mitigation techniques such as incremental learning, retraining with new data, or adjusting model parameters should be employed.

Why Other Options Are Incorrect:

Option B (Easing the performance standard instead of addressing drift): Lowering the performance standard is not a solution; it only masks the problem without fixing it. Instead, regular testing and retraining should be used to handle drift properly.

Option C (Corruption and reloading the model): Model corruption is unrelated to this issue. Corruption refers to accidental or malicious damage to the model or data, whereas this case is due to a changing data environment.

Option D (Lack of transparency): Transparency refers to how understandable the model’s decisions are, but the problem here is a change in data distributions, making drift the primary concern.

Supporting References from ISTQB Certified Tester AI Testing Study Guide:

ISTQB CT-AI Syllabus (Section 7.6: Testing for Concept Drift)

"The operational environment can change over time without the trained model changing correspondingly. This phenomenon is known as concept drift and typically causes the outputs of the model to become increasingly less accurate and less useful."

"Systems that may be prone to concept drift should be regularly tested against their agreed ML functional performance criteria to ensure that any occurrences of concept drift are detected soon enough for the problem to be mitigated."

ISTQB CT-AI Syllabus (Section 7.7: Selecting a Test Approach for an ML System)

"If concept drift is detected, it may be mitigated by retraining the system with up-to-date training data followed by confirmation testing, regression testing, and possibly A/B testing where the updated system must outperform the original system."

Conclusion:

Since the question describes a situation where seasonal variations affected input data distributions, the correct answer is A: The model suffers from drift and therefore should be regularly tested to ensure that any occurrences of drift are detected soon enough for the problem to be mitigated.
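The regular testing described in this explanation can be sketched as a periodic check of prediction error against an agreed threshold. The following Python is a minimal illustration under stated assumptions: the monthly baggage weights, the 1.0 kg threshold, and the helper names `mean_absolute_error` and `months_exceeding` are all hypothetical and do not come from the question or the syllabus.

```python
from statistics import mean


def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and measured values."""
    return mean(abs(p - a) for p, a in zip(predicted, actual))


def months_exceeding(errors_by_month, threshold):
    """Return the months whose error is above the agreed threshold."""
    return [month for month, error in errors_by_month.items() if error > threshold]


# Hypothetical (estimated, measured) average baggage weights in kg.
observations = {
    "Jul": ([18.0, 18.0, 18.0], [17.5, 18.5, 18.0]),
    "Dec": ([18.0, 18.0, 18.0], [21.0, 22.0, 20.0]),
}

errors = {
    month: mean_absolute_error(predicted, actual)
    for month, (predicted, actual) in observations.items()
}
# Assumed acceptance criterion: at most 1.0 kg mean absolute error.
flagged = months_exceeding(errors, threshold=1.0)
print(errors, flagged)
```

A flagged month would then be the trigger for the mitigation steps listed in the explanation, such as retraining with up-to-date data.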

Questions 9

Which ONE of the following options does NOT describe a challenge for acquiring test data in ML systems?

SELECT ONE OPTION

Options:

Compliance needs require proper care to be taken of input personal data.

Nature of data constantly changes with time.

Data for the use case is being generated at a fast pace.

Test data being sourced from public sources.

Questions 10

A beer company is trying to understand how much recognition its logo has in the market. It plans to do that by monitoring images on various social media platforms using a pre-trained neural network for logo detection. This particular model has been trained by looking for words, as well as matching colors on social media images. The company logo has a big word across the middle with a bold blue and magenta border.

Which associated risk is most likely to occur when using this pre-trained model?

Options:

There is no risk, as the model has already been trained

Insufficient function; the model was not trained to check for colors or words

Improper data preparation

Inherited bias: the model could have inherited unknown defects

Questions 11

Arihant Meditation is a startup using AI to aid people in deeper and better meditation based on analysis of various factors such as time and duration of the meditation, pulse and blood pressure, and EEG patterns, among others. Their model accuracy and other functional performance parameters have not yet reached their desired level.

Which ONE of the following factors is NOT a factor affecting the ML functional performance?

SELECT ONE OPTION

Options:

The data pipeline

The quality of the labeling

Biased data

The number of classes

Questions 12

Which ONE of the following is the BEST option to optimize the regression test selection and prevent the regression suite from growing large?

SELECT ONE OPTION

Options:

Identifying suitable tests by looking at the complexity of the test cases.

Using a random subset of tests.

Automating test scripts using AI-based test automation tools.

Using an AI-based tool to optimize the regression test suite by analyzing past test results.

Questions 13

A company is using a spam filter to attempt to identify which emails should be marked as spam. Detection rules are created by the filter that causes a message to be classified as spam. An attacker wishes to have all messages internal to the company be classified as spam. So, the attacker sends messages with obvious red flags in the body of the email and modifies the from portion of the email to make it appear that the emails have been sent by company members. The testers plan to use exploratory data analysis (EDA) to detect the attack and use this information to prevent future adversarial attacks.

How could EDA be used to detect this attack?

Options:

EDA can help detect the outlier emails from the real emails.

EDA can detect and remove the false emails.

EDA can restrict how many inputs can be provided by unique users.

EDA cannot be used to detect the attack.

Answer:

Explanation:

Exploratory Data Analysis (EDA) is an essential technique for examining datasets to uncover patterns, trends, and anomalies, including outliers. In this case, the attacker manipulates the spam filter by injecting emails with red flags and masking them as internal company emails. The primary goal of EDA here is to detect these adversarial modifications.

Detecting Outliers:

EDA techniques such as statistical analysis, clustering, and visualization can reveal patterns in email metadata (e.g., sender details, email content, frequency).

Outlier detection methods like Z-score, IQR (Interquartile Range), or machine learning-based anomaly detection can identify emails that significantly deviate from typical internal communications.

Identifying Distribution Shifts:

By analyzing the frequency and characteristics of emails flagged as spam, testers can detect if the attack has introduced unusual patterns.

If a surge of internal emails is suddenly classified as spam, EDA can help verify whether these classifications are consistent with historical data.

Feature Analysis for Adversarial Patterns:

EDA enables visualization techniques such as scatter plots or histograms to distinguish normal emails from manipulated ones.

Examining email metadata (e.g., changes in headers, unusual wording in email bodies) can reveal adversarial tactics.

Counteracting Adversarial Attacks:

Once anomalies are identified, the spam filter’s detection rules can be improved by retraining the model on corrected datasets.

The adversarial examples can be added to the training data to enhance the robustness of the filter against future attacks.

References from ISTQB Certified Tester AI Testing Study Guide:

Exploratory Data Analysis (EDA) is used to detect outliers and adversarial attacks. "EDA is where data are examined for patterns, relationships, trends, and outliers. It involves the interactive, hypothesis-driven exploration of data."

EDA can identify poisoned or manipulated data by detecting anomalies and distribution shifts. "Testing to detect data poisoning is possible using EDA, as poisoned data may show up as outliers."

EDA helps validate ML models and detect potential vulnerabilities. "The use of exploratory techniques, primarily driven by data visualization, can help validate the ML algorithm being used, identify changes that result in efficient models, and leverage domain expertise."

Thus, option A is the correct answer, as EDA is specifically useful for detecting outliers, which can help identify manipulated spam emails.
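The IQR method mentioned in this explanation can be sketched in a few lines of Python. This is an illustration only: the feature (number of red-flag terms per email that claims an internal sender), the sample counts, and the helper name `iqr_outlier_indices` are hypothetical, and the 1.5 multiplier is the conventional default rather than something specified in the question.

```python
from statistics import quantiles


def iqr_outlier_indices(values, k=1.5):
    """Return indices of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical count of red-flag terms in each email claiming an internal sender.
red_flag_counts = [0, 1, 0, 2, 1, 0, 1, 12, 0, 1, 11]

outliers = iqr_outlier_indices(red_flag_counts)
print(outliers)
```

Emails at the flagged positions would be candidates for manual review, for example by checking whether their headers are consistent with genuine internal mail.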

Questions 14

Upon testing a model used to detect rotten tomatoes, the following data was observed by the test engineer, based on a certain number of tomato images.

For this confusion matrix, which combination of values of accuracy, recall, and specificity respectively is CORRECT?

SELECT ONE OPTION

Options:

0.87, 0.9, 0.84

1, 0.87, 0.84

1, 0.9, 0.8

0.84, 1, 0.9

Questions 15

Which ONE of the following characteristics is the least likely to cause safety related issues for an AI system?

SELECT ONE OPTION

Options:

Non-determinism

Robustness

High complexity

Self-learning

Questions 16

Consider a natural language processing (NLP) algorithm that attempts to predict the next word that you would like to type in a text message. An update to the algorithm has been created that should increase the accuracy of the predictions based on user typing patterns. The old algorithm was rated for accuracy by the users. Then, after the new update was released, the users rated the updated algorithm. A statistical test was used to compare between the two versions of the algorithm to see whether or not the update should remain in place.

This is an example of what type of testing?

Options:

Metamorphic testing

A/B testing

Exploratory testing

Pairwise testing

Questions 17

A team of software testers is attempting to create an AI algorithm to assist in software testing. This particular team has gone through over 40 iterations of testing and cannot afford to spend as much time as it takes to run the full regression test suite. They are hoping to have the algorithm reduce the amount of testing required thus reducing the time needed for each testing cycle.

How can an AI-based tool be expected to assist in this reduction?

Options:

By using a clustering method to quantify the relationships between test cases and then assigning each test case to a category

By performing optimization of the data from past iterations to see where the most common defects occurred and select the corresponding test cases

By performing bayesian analysis to estimate the types of human interactions that are expected to be seen in the system and then selecting those test cases

By using A/B testing to compare the last update with the newest change and compare metrics between the two

Questions 18

"Splendid Healthcare" has started developing a cancer detection system based on ML. The type of cancer they plan on detecting has 2% prevalence rate in the population of a particular geography. It is required that the model performs well for both normal and cancer patients.

Which ONE of the following combinations requires MAXIMIZATION?

SELECT ONE OPTION

Options:

Maximize precision and accuracy

Maximize accuracy and recall

Maximize recall and precision

Maximize specificity number of classes

Answer:

Explanation:

Prevalence Rate and Model Performance:

The cancer detection system being developed by "Splendid Healthcare" needs to account for the fact that the type of cancer has a 2% prevalence rate in the population. This indicates that the dataset is highly imbalanced with far fewer positive (cancer) cases compared to negative (normal) cases.

Importance of Recall:

Recall, also known as sensitivity or true positive rate, measures the proportion of actual positive cases that are correctly identified by the model. In medical diagnosis, especially cancer detection, recall is critical because missing a positive case (false negative) could have severe consequences for the patient. Therefore, maximizing recall ensures that most, if not all, cancer cases are detected.

Importance of Precision:

Precision measures the proportion of predicted positive cases that are actually positive. High precision reduces the number of false positives, meaning fewer people will be incorrectly diagnosed with cancer. This is also important to avoid unnecessary anxiety and further invasive testing for those who do not have the disease.

Balancing Recall and Precision:

In scenarios where both false negatives and false positives have significant consequences, it is crucial to balance recall and precision. This balance ensures that the model is not only good at detecting positive cases but also accurate in its predictions, reducing both types of errors.

Accuracy and Specificity:

While accuracy (the proportion of total correct predictions) is important, it can be misleading in imbalanced datasets. In this case, high accuracy could simply result from the model predicting the majority class (normal) correctly. Specificity (true negative rate) is also important, but for a cancer detection system, recall and precision take precedence to ensure positive cases are correctly and accurately identified.

Conclusion:

Therefore, for a cancer detection system with a low prevalence rate, maximizing both recall and precision is crucial to ensure effective and accurate detection of cancer cases.

This explanation aligns with the principles outlined in the ISTQB CT-AI Syllabus, particularly sections on performance metrics for ML models and handling imbalanced datasets (Chapter 5: ML Functional Performance Metrics).
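To make the point about misleading accuracy concrete, here is a small Python sketch that computes the four metrics from confusion matrix counts. The counts are hypothetical: they assume 1,000 patients at the 2% prevalence given in the question, and the two "models" are invented for illustration.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, recall, precision and specificity from raw counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        # Recall: share of actual positives that were detected.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        # Precision: share of predicted positives that are real.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Specificity: share of actual negatives correctly rejected.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Hypothetical model that predicts "no cancer" for all 1,000 patients.
always_negative = classification_metrics(tp=0, fp=0, fn=20, tn=980)

# Hypothetical model that detects 18 of the 20 cancer cases.
screening_model = classification_metrics(tp=18, fp=40, fn=2, tn=940)

print(always_negative)
print(screening_model)
```

Under these assumed counts, the model that never predicts cancer scores higher on accuracy than the one that finds 18 of 20 cases, which is why recall and precision are the more informative pair here.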

Questions 19

An image classification system is being trained for classifying faces of humans. The distribution of the data is 70% ethnicity A and 30% for ethnicities B, C and D. Based ONLY on the above information, which of the following options BEST describes the situation of this image classification system?

SELECT ONE OPTION

Options:

This is an example of expert system bias.

This is an example of sample bias.

This is an example of hyperparameter bias.

This is an example of algorithmic bias.

Questions 20

A tourist calls an airline to book a ticket and is connected with an automated system which is able to recognize speech, understand requests related to purchasing a ticket, and provide relevant travel options. When the tourist asks about the expected weather at the destination or potential impacts on operations because of the tight labor market the only response from the automated system is: "I don't understand your question."

This AI system should be categorized as?

Options:

General AI

Narrow AI

Super AI

Conventional AI

Questions 21

Which of the following is one of the reasons for data mislabelling?

Options:

Lack of domain knowledge

Expert knowledge

Interoperability error

Small datasets

Questions 22

Which ONE of the following options describes a scenario of A/B testing the LEAST?

SELECT ONE OPTION

Options:

A comparison of two different websites for the same company to observe from a user acceptance perspective.

A comparison of two different offers in a recommendation system to decide on the more effective offer for same users.

A comparison of the performance of an ML system on two different input datasets.

A comparison of the performance of two different ML implementations on the same input data.

Answer:

Explanation:

A/B testing, also known as split testing, is a method used to compare two versions of a product or system to determine which one performs better. It is widely used in web development, marketing, and machine learning to optimize user experiences and model performance. Here’s why option C is the least descriptive of an A/B testing scenario:

Understanding A/B Testing:

In A/B testing, two versions (A and B) of a system or feature are tested against each other. The objective is to measure which version performs better based on predefined metrics such as user engagement, conversion rates, or other performance indicators.

Application in Machine Learning:

In ML systems, A/B testing might involve comparing two different models, algorithms, or system configurations on the same set of data to observe which yields better results.

Why Option C is the Least Descriptive:

Option C describes comparing the performance of an ML system on two different input datasets. This scenario focuses on the input data variation rather than the comparison of system versions or features, which is the essence of A/B testing. A/B testing typically involves a controlled experiment with two versions being tested under the same conditions, not different datasets.

Clarifying the Other Options:

A. A comparison of two different websites for the same company to observe from a user acceptance perspective: This is a classic example of A/B testing where two versions of a website are compared.

B. A comparison of two different offers in a recommendation system to decide on the more effective offer for the same users: This is another example of A/B testing in a recommendation system.

D. A comparison of the performance of two different ML implementations on the same input data: This fits the A/B testing model where two implementations are compared under the same conditions.

References:

ISTQB CT-AI Syllabus, Section 9.4, A/B Testing, explains the methodology and application of A/B testing in various contexts.

"Understanding A/B Testing" (ISTQB CT-AI Syllabus).
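Option D, comparing two ML implementations on the same input data, can be sketched as follows. This is a simplified illustration rather than a full A/B test: the labels and the predictions of `variant_a` and `variant_b` are hypothetical, and a real comparison would add a statistical significance test on a much larger sample.

```python
def accuracy(predictions, labels):
    """Share of predictions that match the expected labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Hypothetical expected labels and the outputs of two implementations
# run on the same eight inputs.
labels = [1, 0, 1, 1, 0, 0, 1, 0]
variant_a = [1, 0, 0, 1, 0, 1, 1, 0]
variant_b = [1, 0, 1, 1, 0, 0, 0, 0]

accuracy_a = accuracy(variant_a, labels)
accuracy_b = accuracy(variant_b, labels)

# Discordant cases: inputs where exactly one variant is correct.
only_a_correct = sum(
    a == y and b != y for a, b, y in zip(variant_a, variant_b, labels)
)
only_b_correct = sum(
    b == y and a != y for a, b, y in zip(variant_a, variant_b, labels)
)

print(accuracy_a, accuracy_b, only_a_correct, only_b_correct)
```

Holding the input data fixed is what makes the two accuracy figures comparable; changing the dataset between runs, as in option C, would confound the comparison.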

Questions 23

"AllerEgo" is a product that uses self-learning to predict the behavior of a pilot in combat situations for a variety of terrains and enemy aircraft formations. After training, the model was exposed to real-world data and was found to be behaving poorly. Many data quality tests had been performed on the data to make it fit for training and testing.

Which ONE of the following options is LEAST likely to describe the possible reason for the fall in performance, especially when considering the self-learning nature of the AI system?

SELECT ONE OPTION

Options:

The difficulty of defining criteria for improvement before the model can be accepted.

The fast pace of change did not allow sufficient time for testing.

The unknown nature and insufficient specification of the operating environment might have caused the poor performance.

There was an algorithmic bias in the AI system.

Questions 24

Pairwise testing can be used in the context of self-driving cars for controlling an explosion in the number of combinations of parameters.

Which ONE of the following options is LEAST likely to be a reason for this incredible growth of parameters?

SELECT ONE OPTION

Options:

Different Road Types

Different weather conditions

ML model metrics to evaluate the functional performance

Different features like ADAS, Lane Change Assistance etc.

Answer:

Explanation:

Pairwise testing is used to handle the large number of combinations of parameters that can arise in complex systems like self-driving cars. The question asks which of the given options is least likely to be a reason for the explosion in the number of parameters.

Different Road Types (A): Self-driving cars must operate on various road types, such as highways, city streets, rural roads, etc. Each road type can have different characteristics, requiring the car's system to adapt and handle different scenarios. Thus, this is a significant factor contributing to the growth of parameters.

Different Weather Conditions (B): Weather conditions such as rain, snow, fog, and bright sunlight significantly affect the performance of self-driving cars. The car's sensors and algorithms must adapt to these varying conditions, which adds to the number of parameters that need to be considered.

ML Model Metrics to Evaluate Functional Performance (C): While evaluating machine learning (ML) model performance is crucial, it does not directly contribute to the explosion of parameter combinations in the same way that road types, weather conditions, and car features do. Metrics are used to measure and assess performance but are not themselves variable conditions that the system must handle.

Different Features like ADAS, Lane Change Assistance, etc. (D): Advanced Driver Assistance Systems (ADAS) and other features add complexity to self-driving cars. Each feature can have multiple settings and operational modes, contributing to the overall number of parameters.

Hence, the least likely reason for the incredible growth in the number of parameters is C. ML model metrics to evaluate the functional performance.

References:

ISTQB CT-AI Syllabus Section 9.2 on Pairwise Testing discusses the application of this technique to manage the combinations of different variables in AI-based systems, including those used in self-driving cars.

Sample Exam Questions document, Question #29 provides context for the explosion in parameter combinations in self-driving cars and highlights the use of pairwise testing as a method to manage this complexity.
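The combinatorial growth discussed in this explanation can be illustrated with a short Python sketch. The parameter values below are hypothetical examples, and `time_of_day` is an extra parameter added purely for illustration. The sketch counts exhaustive combinations and computes a lower bound on the size of any pairwise test suite; it does not generate the suite itself.

```python
from math import prod

# Hypothetical parameters and values for a self-driving car test.
parameters = {
    "road_type": ["highway", "city", "rural"],
    "weather": ["clear", "rain", "snow", "fog"],
    "feature": ["ADAS", "lane_change_assist"],
    "time_of_day": ["day", "night"],
}

# Exhaustive testing needs one test per combination of all values.
exhaustive_tests = prod(len(values) for values in parameters.values())

# Any pairwise suite needs at least as many tests as there are value pairs
# of the two largest parameters, since each such pair needs its own test.
sizes = sorted((len(values) for values in parameters.values()), reverse=True)
pairwise_lower_bound = sizes[0] * sizes[1]

print(exhaustive_tests, pairwise_lower_bound)
```

Each additional parameter multiplies the exhaustive count, while the pairwise lower bound depends only on the two largest parameters, which is why pairwise testing helps contain the growth.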

Exam Code: CT-AI

Exam Name: Certified Tester AI Testing Exam

Last Update: Apr 9, 2025

Questions: 80
