Professional-Cloud-DevOps-Engineer Google Cloud Certified - Professional Cloud DevOps Engineer Exam Questions and Answers

Questions 4

You are creating a CI/CD pipeline to perform Terraform deployments of Google Cloud resources Your CI/CD tooling is running in Google Kubernetes Engine (GKE) and uses an ephemeral Pod for each pipeline run You must ensure that the pipelines that run in the Pods have the appropriate Identity and Access Management (1AM) permissions to perform the Terraform deployments You want to follow Google-recommended practices for identity management What should you do?

Choose 2 answers

Options:

Create a new Kubernetes service account, and assign the service account to the Pods Use Workload Identity to authenticate as the Google service account

Create a new JSON service account key for the Google service account store the key as a Kubernetes secret, inject the key into the Pods, and set the boogle_application_credentials environment variable

Create a new Google service account, and assign the appropriate 1AM permissions

Create a new JSON service account key for the Google service account store the key in the secret management store for the CI/CD tool and configure Terraform to use this key for authentication

Assign the appropriate 1AM permissions to the Google service account associated with the Compute Engine VM instances that run the Pods

Buy Now

Questions 5

You are deploying an application that needs to access sensitive information. You need to ensure that this information is encrypted and the risk of exposure is minimal if a breach occurs. What should you do?

Options:

Store the encryption keys in Cloud Key Management Service (KMS) and rotate the keys frequently

Inject the secret at the time of instance creation via an encrypted configuration management system.

Integrate the application with a Single sign-on (SSO) system and do not expose secrets to the application

Leverage a continuous build pipeline that produces multiple versions of the secret for each instance of the application.

Buy Now

Questions 6

Your company operates in a highly regulated domain. Your security team requires that only trusted container images can be deployed to Google Kubernetes Engine (GKE). You need to implement a solution that meets the requirements of the security team, while minimizing management overhead. What should you do?

Options:

Grant the roles/artifactregistry. writer role to the Cloud Build service account. Confirm that no employee has Artifact Registry write permission.

Use Cloud Run to write and deploy a custom validator Enable an Eventarc trigger to perform validations when new images are uploaded.

Configure Kritis to run in your GKE clusters to enforce deploy-time security policies.

Configure Binary Authorization in your GKE clusters to enforce deploy-time security policies

Buy Now

Questions 7

You are building the Cl/CD pipeline for an application deployed to Google Kubernetes Engine (GKE) The application is deployed by using a Kubernetes Deployment, Service, and Ingress The application team asked you to deploy the application by using the blue'green deployment methodology You need to implement the rollback actions What should you do?

Options:

Run the kubectl rollout undo command

Delete the new container image, and delete the running Pods

Update the Kubernetes Service to point to the previous Kubernetes Deployment

Scale the new Kubernetes Deployment to zero

Buy Now

Questions 8

You need to enforce several constraint templates across your Google Kubernetes Engine (GKE) clusters. The constraints include policy parameters, such as restricting the Kubernetes API. You must ensure that the policy parameters are stored in a GitHub repository and automatically applied when changes occur. What should you do?

Options:

Set up a GitHub action to trigger Cloud Build when there is a parameter change. In Cloud Build, run a gcloud CLI command to apply the change.

When there is a change in GitHub, use a web hook to send a request to Anthos Service Mesh, and apply the change.

Configure Anthos Config Management with the GitHub repository. When there is a change in the repository, use Anthos Config Management to apply the change.

Configure Config Connector with the GitHub repository. When there is a change in the repository, use Config Connector to apply the change.

Buy Now

Answer:

Explanation:

The correct answer is C. Configure Anthos Config Management with the GitHub repository. When there is a change in the repository, use Anthos Config Management to apply the change.

According to the web search results, Anthos Config Management is a service that lets you manage the configuration of your Google Kubernetes Engine (GKE) clusters from a single source of truth, such as a GitHub repository1. Anthos Config Management can enforce several constraint templates across your GKE clusters by using Policy Controller, which is a feature that integrates the Open Policy Agent (OPA) Constraint Framework into Anthos Config Management2. Policy Controller can apply constraints that include policy parameters, such as restricting the Kubernetes API3. To use Anthos Config Management and Policy Controller, you need to configure them with your GitHub repository and enable the sync mode4. When there is a change in the repository, Anthos Config Management will automatically sync and apply the change to your GKE clusters5.

The other options are incorrect because they do not use Anthos Config Management and Policy Controller. Option A is incorrect because it uses a GitHub action to trigger Cloud Build, which is a service that executes your builds on Google Cloud Platform infrastructure6. Cloud Build can run a gcloud CLI command to apply the change, but it does not use Anthos Config Management or Policy Controller. Option B is incorrect because it uses a web hook to send a request to Anthos Service Mesh, which is a service that provides a uniform way to connect, secure, monitor, and manage microservices on GKE clusters7. Anthos Service Mesh can apply the change, but it does not use Anthos Config Management or Policy Controller. Option D is incorrect because it uses Config Connector, which is a service that lets you manage Google Cloud resources through Kubernetes configuration. Config Connector can apply the change, but it does not use Anthos Config Management or Policy Controller.

[Reference:, Anthos Config Management documentation, Overview. Policy Controller, Policy Controller. Constraint template library, Constraint template library. Installing Anthos Config Management, Installing Anthos Config Management. Syncing configurations, Syncing configurations. Cloud Build documentation, Overview. Anthos Service Mesh documentation, Overview. [Config Connector documentation], Overview., , , , , ]

Questions 9

Your team is preparing to launch a new API in Cloud Run. The API uses an OpenTelemetry agent to send distributed tracing data to Cloud Trace to monitor the time each request takes. The team has noticed inconsistent trace collection. You need to resolve the issue. What should you do?

Options:

Increase the CPU limit in Cloud Run from 2 to 4.

Use an HTTP health check.

Configure CPU to be allocated only during request processing.

Configure CPU to be always-allocated.

Buy Now

Questions 10

You are deploying an application to Cloud Run. The application requires a password to start. Your organization requires that all passwords are rotated every 24 hours, and your application must have the latest password. You need to deploy the application with no downtime. What should you do?

Options:

Store the password in Secret Manager and send the secret to the application by using environment variables.

Store the password in Secret Manager and mount the secret as a volume within the application.

Use Cloud Build to add your password into the application container at build time. Ensure that Artifact Registry is secured from public access.

Store the password directly in the code. Use Cloud Build to rebuild and deploy the application each time the password changes.

Buy Now

Answer:

Explanation:

The correct answer is B. Store the password in Secret Manager and mount the secret as a volume within the application.

Secret Manager is a service that allows you to securely store and manage sensitive data such as passwords, API keys, certificates, and tokens.You can use Secret Manager to rotate your secrets automatically or manually, and access them from your Cloud Run applications1.

There are two ways to use secrets from Secret Manager in Cloud Run:

As environment variables: You can set environment variables that point to secrets in Secret Manager. Cloud Run will resolve the secrets at runtime and inject them into the environment of your application. However, this method has some limitations, such as:

The environment variables are cached for up to 10 minutes, so you may not get the latest version of the secret immediately.

The environment variables are visible in plain text in the Cloud Console and the Cloud SDK, which may expose sensitive information.

The environment variables are limited to 4 KB of data, which may not be enough for some secrets.2

As file system volumes: You can mount secrets from Secret Manager as files in a volume within your application. Cloud Run will create a tmpfs volume and write the secrets as files in it. This method has some advantages, such as:

The files are updated every 30 seconds, so you can get the latest version of the secret faster.

The files are not visible in the Cloud Console or the Cloud SDK, which provides better security.

The files can store up to 64 KB of data, which allows for larger secrets.3

Therefore, for your use case, it is better to use the second method and mount the secret as a file system volume within your application. This way, you can ensure that your application has the latest password, and you can deploy it with no downtime.

To mount a secret as a file system volume in Cloud Run, you can use the following command:

gcloud beta run deploy SERVICE --image IMAGE_URL --update-secrets=/path/to/file=secretName:version

where:

SERVICE is the name of your Cloud Run service.

IMAGE_URL is the URL of your container image.

/path/to/file is the path where you want to mount the secret file in your application.

secretName is the name of your secret in Secret Manager.

version is the version of your secret.You can uselatestto get the most recent version.3

You can also use the Cloud Console to mount secrets as file system volumes. For more details, seeMounting secrets from Secret Manager.

Questions 11

You have an application deployed to Cloud Run. A new version of the application has recently been deployed using the canary deployment strategy. Your Site Reliability Engineering (SRE) teammate informs you that an SLO has been exceeded for this application. You need to make the application healthy as quickly as possible. What should you do first?

Options:

Configure traffic splitting to send 100% of the traffic to the latest revision.

Configure traffic splitting to send 100% of the traffic to the previous revision.

Create a new revision using the last known good version of the application.

Identify the cause of the latency by using Cloud Trace.

Buy Now

Questions 12

Your company processes IOT data at scale by using Pub/Sub, App Engine standard environment, and an application written in GO. You noticed that the performance inconsistently degrades at peak load. You could not reproduce this issue on your workstation. You need to continuously monitor the application in production to identify slow paths in the code. You want to minimize performance impact and management overhead. What should you do?

Options:

Install a continuous profiling tool into Compute Engine. Configure the application to send profiling data to the tool.

Periodically run the go tool pprof command against the application instance. Analyze the results by using flame graphs.

Configure Cloud Profiler, and initialize the cloud.go@gle.com/go/profiler library in the application.

Use Cloud Monitoring to assess the App Engine CPU utilization metric.

Buy Now

Answer:

Explanation:

The correct answer is C. Configure Cloud Profiler, and initialize the cloud.google.com/go/profiler library in the application.

According to the Google Cloud documentation, Cloud Profiler is a statistical, low-overhead profiler that continuously gathers CPU usage and memory-allocation information from your production applications1. Cloud Profiler can help you identify slow paths in your code and optimize the performance of your applications. Cloud Profiler supports applications written in Go that run on App Engine standard environment2. To use Cloud Profiler, you need to configure it in your Google Cloud project and initialize the cloud.google.com/go/profiler library in your application code3. You can then use the Cloud Profiler interface to analyze the profiling data and visualize the results by using flame graphs4. Cloud Profiler has minimal performance impact and management overhead, as it only samples a small fraction of the application activity and does not require any additional infrastructure or agents.

The other options are incorrect because they do not meet the requirements of minimizing performance impact and management overhead. Option A is incorrect because it requires installing a continuous profiling tool into Compute Engine, which is an additional infrastructure that needs to be managed and maintained. Option B is incorrect because it requires periodically running the go tool pprof command against the application instance, which is a manual and disruptive process that can affect the application performance. Option D is incorrect because it only uses Cloud Monitoring to assess the App Engine CPU utilization metric, which is not enough to identify slow paths in the code or optimize the application performance.

[Reference:, Cloud Profiler documentation, Overview. Profiling Go applications, Supported environments. Profiling Go applications, Using Cloud Profiler. Analyzing data, Analyzing data., , , , , ]

Questions 13

Your company operates in a highly regulated domain that requires you to store all organization logs for seven years You want to minimize logging infrastructure complexity by using managed services You need to avoid any future loss of log capture or stored logs due to misconfiguration or human error What should you do?

Options:

Use Cloud Logging to configure an aggregated sink at the organization level to export all logs into a BigQuery dataset

Use Cloud Logging to configure an aggregated sink at the organization level to export all logs into Cloud Storage with a seven-year retention policy and Bucket Lock

Use Cloud Logging to configure an export sink at each project level to export all logs into a BigQuery dataset

Use Cloud Logging to configure an export sink at each project level to export all logs into Cloud Storage with a seven-year retention policy and Bucket Lock

Buy Now

Questions 14

You are running an application on Compute Engine and collecting logs through Stackdriver. You discover that some personally identifiable information (Pll) is leaking into certain log entry fields. All Pll entries begin with the text userinfo. You want to capture these log entries in a secure location for later review and prevent them from leaking to Stackdriver Logging. What should you do?

Options:

Create a basic log filter matching userinfo, and then configure a log export in the Stackdriver console with Cloud Storage as a sink.

Use a Fluentd filter plugin with the Stackdriver Agent to remove log entries containing userinfo, and then copy the entries to a Cloud Storage bucket.

Create an advanced log filter matching userinfo, configure a log export in the Stackdriver console with Cloud Storage as a sink, and then configure a tog exclusion with userinfo as a filter.

Use a Fluentd filter plugin with the Stackdriver Agent to remove log entries containing userinfo, create an advanced log filter matching userinfo, and then configure a log export in the Stackdriver console with Cloud Storage as a sink.

Buy Now

Questions 15

Your organization wants to implement Site Reliability Engineering (SRE) culture and principles. Recently, a service that you support had a limited outage. A manager on another team asks you to provide a formal explanation of what happened so they can action remediations. What should you do?

Options:

Develop a postmortem that includes the root causes, resolution, lessons learned, and a prioritized list of action items. Share it with the manager only.

Develop a postmortem that includes the root causes, resolution, lessons learned, and a prioritized list of action items. Share it on the engineering organization's document portal.

Develop a postmortem that includes the root causes, resolution, lessons learned, the list of people responsible, and a list of action items for each person. Share it with the manager only.

Develop a postmortem that includes the root causes, resolution, lessons learned, the list of people responsible, and a list of action items for each person. Share it on the engineering organization's document portal.

Buy Now

Questions 16

Your application services run in Google Kubernetes Engine (GKE). You want to make sure that only images from your centrally-managed Google Container Registry (GCR) image registry in the altostrat-images project can be deployed to the cluster while minimizing development time. What should you do?

Options:

Create a custom builder for Cloud Build that will only push images to gcr.io/altostrat-images.

Use a Binary Authorization policy that includes the whitelist name pattern gcr.io/attostrat-images/.

Add logic to the deployment pipeline to check that all manifests contain only images from gcr.io/altostrat-images.

Add a tag to each image in gcr.io/altostrat-images and check that this tag is present when the image is deployed.

Buy Now

Questions 17

Your team is running microservices in Google Kubernetes Engine (GKE) You want to detect consumption of an error budget to protect customers and define release policies What should you do?

Options:

Create SLIs from metrics Enable Alert Policies if the services do not pass

Use the metrics from Anthos Service Mesh to measure the health of the microservices

Create a SLO Create an Alert Policy on select_slo_bum_rate

Create a SLO and configure uptime checks for your services Enable Alert Policies if the services do not pass

Buy Now

Questions 18

You are leading a DevOps project for your organization. The DevOps team is responsible for managing the service infrastructure and being on-call for incidents. The Software Development team is responsible for writing, submitting, and reviewing code. Neither team has any published SLOs. You want to design a new joint-ownership model for a service between the DevOps team and the Software Development team. Which responsibilities should be assigned to each team in the new joint-ownership model?

Options:

Option A

Option B

Option C

Option D

Buy Now

Answer:

Explanation:

The correct answer is D. Option D.

According to the DevOps best practices, a joint-ownership model for a service between the DevOps team and the Software Development team should follow these principles12:

The DevOps team and the Software Development team should share the responsibility and collaboration for managing the service infrastructure, performing code reviews, and adopting and sharing SLOs for the service.

The DevOps team and the Software Development team should have end-to-end ownership of the service, from design to development to deployment to operation to maintenance.

The DevOps team and the Software Development team should use common tools and processes to facilitate communication, coordination, and feedback.

The DevOps team and the Software Development team should align their goals and incentives with the business outcomes and customer satisfaction.

Option D is the only option that reflects these principles. Option D assigns both teams the responsibilities of managing the service infrastructure, performing code reviews, and adopting and sharing SLOs for the service. Option D also implies that both teams have end-to-end ownership of the service, as they are involved in every stage of the service lifecycle.Option D also encourages both teams to use common tools and processes, such as GitLab3, to collaborate and communicate effectively. Option D also aligns both teams with the business outcomes and customer satisfaction, as they use SLOs to measure and improve the service quality.

The other options are incorrect because they do not follow the DevOps best practices. Option A is incorrect because it assigns only the DevOps team the responsibility of managing the service infrastructure, which creates a silo between the two teams and reduces their collaboration. Option A also does not assign any responsibility for adopting and sharing SLOs for the service, which means that both teams lack a common metric for measuring and improving the service quality. Option B is incorrect because it assigns only the Software Development team the responsibility of performing code reviews, which creates a gap between the two teams and reduces their feedback. Option B also does not assign any responsibility for adopting and sharing SLOs for the service, which means that both teams lack a common metric for measuring and improving the service quality. Option C is incorrect because it assigns both teams the same responsibilities as option A and option B, which combines their drawbacks.

[Reference:, 5 key organizational models for DevOps teams | GitLab, 5 key organizational models for DevOps teams | GitLab.Building a Culture of Full-Service Ownership - DevOps.com, Building a Culture of Full-Service Ownership - DevOps.com.GitLab, GitLab., , , , , ]

Questions 19

You are developing a Node.js utility on a workstation in Cloud Workstations by using Code OSS. The utility is a simple web page, and you have already confirmed that all necessary firewall rules are in place. You tested the application by starting it on port 3000 on your workstation in Cloud Workstations, but you need to be able to access the web page from your local machine. You need to follow Google-recommended security practices. What should you do?

Options:

Allow public IP addresses in the Cloud Workstations configuration.

Use a browser running on a bastion host VM.

Run the gcloud compute start-iap-tunnel command to the Cloud Workstations VM.

Click the preview link in the Code OSS panel.

Buy Now

Questions 20

You support a user-facing web application. When analyzing the application’s error budget over the previous six months, you notice that the application has never consumed more than 5% of its error budget in any given time window. You hold a Service Level Objective (SLO) review with business stakeholders and confirm that the SLO is set appropriately. You want your application’s SLO to more closely reflect its observed reliability. What steps can you take to further that goal while balancing velocity, reliability, and business needs? (Choose two.)

Options:

Add more serving capacity to all of your application’s zones.

Have more frequent or potentially risky application releases.

Tighten the SLO match the application’s observed reliability.

Implement and measure additional Service Level Indicators (SLIs) fro the application.

Announce planned downtime to consume more error budget, and ensure that users are not depending on a tighter SLO.

Buy Now

Questions 21

You are designing a continuous delivery (CD) strategy for a new serverless application. The application is packaged as a container image, stored in Artifact Registry, and deployed to Cloud Run. Your design requires a staging environment, a fully-managed Google Cloud service, mandatory manual approval for production deployments, and a phased rollout to production. Your solution should minimize administrative overhead. What should you do?

Options:

Use Cloud Deploy to define a single delivery pipeline that promotes a release between a staging target and a production target. Configure the production target to require approval and to automatically execute a phased rollout that incrementally shifts traffic.

Use a Cloud Build trigger to initiate a GitOps workflow. Configure the trigger to update a manifest in a Git repository, which a controller on a GKE Autopilot cluster then synchronizes to manage a phased traffic rollout to the new revision.

Use Cloud Build to create a multi-stage pipeline. Configure the trigger to require approval before starting the build. Use the deploy command with the --traffic flag to incrementally shift traffic to the new revision in production.

Define two separate Cloud Deploy pipelines. Configure the first pipeline to deploy to staging, and configure the second pipeline to trigger and execute a phased, canary rollout to the production Cloud Run service.

Buy Now

Questions 22

You are responsible for the reliability of a high-volume enterprise application. A large number of users report that an important subset of the application’s functionality – a data intensive reporting feature – is consistently failing with an HTTP 500 error. When you investigate your application’s dashboards, you notice a strong correlation between the failures and a metric that represents the size of an internal queue used for generating reports. You trace the failures to a reporting backend that is experiencing high I/O wait times. You quickly fix the issue by resizing the backend’s persistent disk (PD). How you need to create an availability Service Level Indicator (SLI) for the report generation feature. How would you define it?

Options:

As the I/O wait times aggregated across all report generation backends

As the proportion of report generation requests that result in a successful response

As the application’s report generation queue size compared to a known-good threshold

As the reporting backend PD throughout capacity compared to a known-good threshold

Buy Now

Questions 23

You are using Terraform to manage infrastructure as code within a Cl/CD pipeline You notice that multiple copies of the entire infrastructure stack exist in your Google Cloud project, and a new copy is created each time a change to the existing infrastructure is made You need to optimize your cloud spend by ensuring that only a single instance of your infrastructure stack exists at a time. You want to follow Google-recommended practices What should you do?

Options:

Create a new pipeline to delete old infrastructure stacks when they are no longer needed

Confirm that the pipeline is storing and retrieving the terraform. if state file from Cloud Storage with the Terraform gcs backend

Verify that the pipeline is storing and retrieving the terrafom.tfstat* file from a source control

Update the pipeline to remove any existing infrastructure before you apply the latest configuration

Buy Now

Questions 24

Your company runs an ecommerce website built with JVM-based applications and microservice architecture in Google Kubernetes Engine (GKE) The application load increases during the day and decreases during the night Your operations team has configured the application to run enough Pods to handle the evening peak load You want to automate scaling by only running enough Pods and nodes for the load What should you do?

Options:

Configure the Vertical Pod Autoscaler but keep the node pool size static

Configure the Vertical Pod Autoscaler and enable the cluster autoscaler

Configure the Horizontal Pod Autoscaler but keep the node pool size static

Configure the Horizontal Pod Autoscaler and enable the cluster autoscaler

Buy Now

Questions 25

Your company runs applications in Google Kubernetes Engine (GKE). Several applications rely on ephemeral volumes. You noticed some applications were unstable due to the DiskPressure node condition on the worker nodes. You need

to identify which Pods are causing the issue, but you do not have execute access to workloads and nodes. What should you do?

Options:

Check the node/ephemeral_storage/used_bytes metric by using Metrics Explorer.

Check the metric by using Metrics Explorer.

Locate all the Pods with emptyDir volumes. use the df-h command to measure volume disk usage.

Locate all the Pods with emptyDir volumes. Use the du -sh * command to measure volume disk usage.

Buy Now

Questions 26

You are running an experiment to see whether your users like a new feature of a web application. Shortly after deploying the feature as a canary release, you receive a spike in the number of 500 errors sent to users, and your monitoring reports show increased latency. You want to quickly minimize the negative impact on users. What should you do first?

Options:

Roll back the experimental canary release.

Start monitoring latency, traffic, errors, and saturation.

Record data for the postmortem document of the incident.

Trace the origin of 500 errors and the root cause of increased latency.

Buy Now

Questions 27

Your company runs services by using multiple globally distributed Google Kubernetes Engine (GKE) clusters Your operations team has set up workload monitoring that uses Prometheus-based tooling for metrics alerts: and generating dashboards This setup does not provide a method to view metrics globally across all clusters You need to implement a scalable solution to support global Prometheus querying and minimize management overhead What should you do?

Options:

Configure Prometheus cross-service federation for centralized data access

Configure workload metrics within Cloud Operations for GKE

Configure Prometheus hierarchical federation for centralized data access

Configure Google Cloud Managed Service for Prometheus

Buy Now

Questions 28

You are running an application in a virtual machine (VM) using a custom Debian image. The image has the Stackdriver Logging agent installed. The VM has the cloud-platform scope. The application is logging information via syslog. You want to use Stackdriver Logging in the Google Cloud Platform Console to visualize the logs. You notice that syslog is not showing up in the "All logs" dropdown list of the Logs Viewer. What is the first thing you should do?

Options:

Look for the agent's test log entry in the Logs Viewer.

Install the most recent version of the Stackdriver agent.

Verify the VM service account access scope includes the monitoring.write scope.

SSH to the VM and execute the following commands on your VM: ps ax I grep fluentd

Buy Now

Questions 29

You have deployed a fleet Of Compute Engine instances in Google Cloud. You need to ensure that monitoring metrics and logs for the instances are visible in Cloud Logging and Cloud Monitoring by your company's operations and cyber

security teams. You need to grant the required roles for the Compute Engine service account by using Identity and Access Management (IAM) while following the principle of least privilege. What should you do?

Options:

Grant the logging.editor and monitoring.metricwriter roles to the Compute Engine service accounts.

Grant the Logging. admin and monitoring . editor roles to the Compute Engine service accounts.

Grant the logging. logwriter and monitoring. editor roles to the Compute Engine service accounts.

Grant the logging. logWriter and monitoring. metricWriter roles to the Compute Engine service accounts.

Buy Now

Answer:

Explanation:

The correct answer is D. Grant the logging.logWriter and monitoring.metricWriter roles to the Compute Engine service accounts.

According to the Google Cloud documentation, the Compute Engine service account is a Google-managed service account that is automatically created when you enable the Compute Engine API1.This service account is used by default to run your Compute Engine instances and access other Google Cloud services on your behalf1.To ensure that monitoring metrics and logs for the instances are visible in Cloud Logging and Cloud Monitoring, you need to grant the following IAM roles to the Compute Engine service account23:

The logging.logWriter role allows the service account to write log entries to Cloud Logging4.

The monitoring.metricWriter role allows the service account to write custom metrics to Cloud Monitoring5.

These roles grant the minimum permissions that are needed for logging and monitoring, following the principle of least privilege. The other roles are either unnecessary or too broad for this purpose.For example, the logging.editor role grants permissions to create and update logs, log sinks, and log exclusions, which are not required for writing log entries6. The logging.admin role grants permissions to delete logs, log sinks, and log exclusions, which are not required for writing log entries and may pose a security risk if misused. The monitoring.editor role grants permissions to create and update alerting policies, uptime checks, notification channels, dashboards, and groups, which are not required for writing custom metrics.

[Reference:, Service accounts, Service accounts.Setting up Stackdriver Logging for Compute Engine, Setting up Stackdriver Logging for Compute Engine.Setting up Stackdriver Monitoring for Compute Engine, Setting up Stackdriver Monitoring for Compute Engine.Predefined roles, Predefined roles.Predefined roles, Predefined roles.Predefined roles, Predefined roles. [Predefined roles], Predefined roles. [Predefined roles], Predefined roles., , , , , ]

Questions 30

Your company is migrating its production systems to Google Cloud. You need to implement site reliability engineering (SRE) practices during the migration to minimize customer impact from potential future incidents. Which two SRE practices should you implement?

Choose 2 answers

Options:

Ensure that full autonomy and permissions are only granted to the on-call team.

Automate common tasks to analyze key impact information and intelligently suggest mitigating actions for the on-call team.

Ensure that all teams can modify the production environment to resolve issues.

Create an alerting mechanism for your SRE team based on your system's internal behavior.

Create up-to-date playbooks with instructions for debugging and mitigating issues.

Buy Now

Answer:

B, E

Explanation:

Comprehensive and Detailed Explanation From General SRE Principles and Google Cloud Knowledge:

Site Reliability Engineering (SRE) emphasizes reliability, automation, and a data-driven approach to operations. The goal is to minimize the "time to detect" (TTD) and "time to resolve" (TTR) for incidents.

Option A (Ensure that full autonomy and permissions are only granted to the on-call team): While the on-call team needs appropriate permissions to act decisively during an incident, granting full autonomy and only to them can be a bottleneck and goes against the principle of least privilege if not carefully scoped. Broader teams might need specific, controlled access for their responsibilities. SRE encourages empowering teams but within a structured framework.

Option B (Automate common tasks to analyze key impact information and intelligently suggest mitigating actions for the on-call team): This is a core SRE practice. Automation reduces toil, speeds up response, and ensures consistency. Analyzing impact and suggesting mitigations helps the on-call team resolve issues faster and more effectively.

Option C (Ensure that all teams can modify the production environment to resolve issues): This is generally a bad practice and against SRE principles of controlled changes and reducing the blast radius of errors. Production changes should be managed, audited, and ideally automated, not open to modification by all teams, as this increases the risk of unintended incidents.

Option D (Create an alerting mechanism for your SRE team based on your system's internal behavior): While alerting is crucial, SRE emphasizes alerting on symptoms that affect users (Service Level Objectives - SLOs) rather than just internal behavior or causes. Alerting solely on internal behavior can lead to alert fatigue and may not correlate directly with user impact. Good alerting focuses on user-facing impact first.

Option E (Create up-to-date playbooks with instructions for debugging and mitigating issues): Playbooks (or runbooks) are essential in SRE. They document known issues, troubleshooting steps, and mitigation procedures. Keeping them up-to-date ensures that on-call engineers can respond to incidents quickly and consistently, even for less common issues, thereby minimizing customer impact.

Therefore, automating incident response tasks (B) and maintaining clear, actionable playbooks (E) are two key SRE practices to implement for minimizing customer impact.

Reference (Based on SRE principles):

The SRE books by Google (e.g., "Site Reliability Engineering: How Google Runs Production Systems") heavily emphasize automation to reduce toil and the importance of playbooks for incident management.

Google Cloud SRE solutions: https://cloud.google.com/sre

Specifically, regarding playbooks and automation:"Playbooks should be living documents, updated regularly as systems change and new incidents provide new lessons."

"SREs aim to automate repetitive tasks (toil) to free up time for engineering projects that improve reliability."

Questions 31

You are running a web application that connects to an AlloyDB cluster by using a private IP address in your default VPC. You need to run a database schema migration in your CI/CD pipeline by using Cloud Build before deploying a new version of your application. You want to follow Google-recommended security practices. What should you do?

Options:

Set up a Cloud Build private pool to access the database through a static external IP address. Configure the database to only allow connections from this IP address. Execute the schema migration script in the private pool.

Create a service account that has permission to access the database. Configure Cloud Build to use this service account and execute the schema migration script in a private pool.

Add the database username and encrypted password to the application configuration file. Use these credentials in Cloud Build to execute the schema migration script.

Add the database username and password to Secret Manager. When running the schema migration script, retrieve the username and password from Secret Manager.

Buy Now

Answer:

Explanation:

To securely connect Cloud Build to an AlloyDB cluster using a private IP address and adhere to Google-recommended security practices, you need to address two main aspects:

Network Connectivity:Ensuring Cloud Build can reach the private IP of the AlloyDB cluster.

Authentication/Credential Management:Securely authenticating Cloud Build to the AlloyDB cluster.

Let's break down why Option B is the most suitable:

Cloud Build Private Pool:AlloyDB is accessed via a private IP in your VPC. Cloud Build's default build environment runs on Google-managed infrastructure outside your VPC and cannot directly access private IP addresses. To enable this, you must use aCloud Build private pool. A private pool can be configured with VPC peering to your default VPC, allowing build steps running within that pool to access resources like your AlloyDB cluster via their private IPs. Option B correctly includes "execute the schema migration script in a private pool."

Service Account with Permissions (IAM Database Authentication):AlloyDB supports IAM database authentication. This is a Google-recommended security practice because it allows you to manage database access using Google Cloud's Identity and Access Management (IAM) rather than relying on traditional database passwords.

You would create a dedicated service account for Cloud Build (or use the private pool's service account).

This service account would be granted the necessary IAM roles to connect to the AlloyDB instance (e.g., roles/alloydb.client) and a database-level IAM role for login (e.g., roles/alloydb.user or roles/alloydb.admin depending on the permissions needed for schema migration).

Cloud Build would then be configured to use this service account. The "permission to access the database" in Option B refers to these IAM permissions. This method avoids managing and distributing database passwords.

Analyzing the options:

A. Set up a Cloud Build private pool to access the database through a static external IP address...

While using a private pool is correct for network access, routing this through a staticexternalIP for a resource that has aprivateIP is generally not the first-choice secure pattern if direct private access is feasible. It adds complexity and a potential external exposure point, even if firewalled. The aim is to keep traffic within the private network as much as possible.

B. Create a service account that has permission to access the database. Configure Cloud Build to use this service account and execute the schema migration script in a private pool.

This option correctly combines the use of aprivate pool(for private IP network access) with aservice account having permissions(strongly implying IAM database authentication for AlloyDB, which is a best practice). This is a secure and robust approach.

C. Add the database username and encrypted password to the application configuration file...

Storing credentials, even if "encrypted" (the method and key management for encryption are unspecified and problematic), in application configuration files checked into source control or packaged with the application is a significant security risk and not a recommended practice.

D. Add the database username and password to Secret Manager. When running the schema migration script, retrieve the username and password from Secret Manager.

UsingSecret Managerto store database usernames and passwords is a Google-recommended practiceifyou are using password-based authentication. However, this optionalonedoes not solve the network connectivity issue for Cloud Build to reach the private IP of AlloyDB. You would still need a private pool. While D is good for secret management, B offers a more comprehensive solution that includes both the network aspect and implies a more modern authentication method (IAM database auth). If the question forced a choice between only doing secure credential storage (D) or doing IAM auth + private networking (B), B is more complete for the overall task.

Conclusion:Option B is the most aligned with Google-recommended security practices as it addresses both the necessary private network connectivity via a Cloud Build private pool and promotes the use of IAM-based database authentication for AlloyDB, which is generally preferred over managing passwords.

References (General Concepts):

Cloud Build Private Pools for VPC Access:Google Cloud documentation for Cloud Build explicitly details using private pools to connect to resources in a VPC network.

See:https://www.google.com/search?q=https://cloud.google.com/build/docs/private-pools/accessing-private-resources-with-private-pools

AlloyDB IAM Database Authentication:Google Cloud documentation for AlloyDB highlights IAM database authentication as a secure method.

See:https://www.google.com/search?q=https://cloud.google.com/alloydb/docs/iam-authentication

Secret Manager:If password authentication were the only option, Secret Manager would be the recommended way to store those credentials.

See:https://cloud.google.com/secret-manager

Option B synergizes the benefits of private networking and modern IAM-based authentication for a comprehensive secure solution.

Questions 32

Your organization is using Helm to package containerized applications Your applications reference both public and private charts Your security team flagged that using a public Helm repository as a dependency is a risk You want to manage all charts uniformly, with native access control and VPC Service Controls What should you do?

Options:

Store public and private charts in OCI format by using Artifact Registry

Store public and private charts by using GitHub Enterprise with Google Workspace as the identity provider

Store public and private charts by using Git repository Configure Cloud Build to synchronize contents of the repository into a Cloud Storage bucket Connect Helm to the bucket by using https: // [bucket] .srorage.googleapis.com/ [holnchart] as the Helm repository

Configure a Helm chart repository server to run in Google Kubernetes Engine (GKE) with Cloud Storage bucket as the storage backend

Buy Now

Questions 33

As a Site Reliability Engineer, you support an application written in GO that runs on Google Kubernetes Engine (GKE) in production. After releasing a new version Of the application, you notice the applicationruns for about 15 minutes and then restarts. You decide to add Cloud Profiler to your application and now notice that the heap usage grows constantly until the application restarts. What should you do?

Options:

Add high memory compute nodes to the cluster.

Increase the memory limit in the application deployment.

Add Cloud Trace to the application, and redeploy.

Increase the CPU limit in the application deployment.

Buy Now

Questions 34

Your company is using HTTPS requests to trigger a public Cloud Run-hosted service accessible at the https://booking-engine-abcdef .a.run.app URL You need to give developers the ability to test the latest revisions of the service before the service is exposed to customers What should you do?

Options:

Runthegcioud run deploy booking-engine —no-traffic —-ag dev command Use the https://dev----booking-engine-abcdef. a. run. app URL for testing

Runthegcioud run services update-traffic booking-engine —to-revisions LATEST*! command Use the ht tps: //booking-engine-abcdef. a. run. ape URL for testing

Pass the curl -K "Authorization: Hearer S(gclcud auth print-identity-token)" auth token Use the https: / /booking-engine-abcdef. a. run. app URL to test privately

Grant the roles/run. invoker role to the developers testing the booking-engine service Use the https: //booking-engine-abcdef. private. run. app URL for testing

Buy Now

Questions 35

You are using Stackdriver to monitor applications hosted on Google Cloud Platform (GCP). You recently deployed a new application, but its logs are not appearing on the Stackdriver dashboard.

You need to troubleshoot the issue. What should you do?

Options:

Confirm that the Stackdriver agent has been installed in the hosting virtual machine.

Confirm that your account has the proper permissions to use the Stackdriver dashboard.

Confirm that port 25 has been opened in the firewall to allow messages through to Stackdriver.

Confirm that the application is using the required client library and the service account key has proper permissions.

Buy Now

Questions 36

You need to introduce postmortems into your organization. You want to ensure that the postmortem process is well received. What should you do?

Choose 2 answers

Options:

Create a designated team that is responsible for conducting all postmortems.

Encourage new employees to conduct postmortems to learn through practice.

Ensure that writing effective postmortems is a rewarded and celebrated practice.

Encourage your senior leadership to acknowledge and participate in postmortems.

Provide your organization with a forum to critique previous postmortems.

Buy Now

Questions 37

Your company uses a CI/CD pipeline with Cloud Build and Artifact Registry to deploy container images to Google Kubernetes Engine (GKE). Images are tagged with the latest commit hash and promoted to production after successful testing in the development and pre-production environments. A recent production deployment caused the application to fail due to untested integration functionality, requiring a disruptive manual rollback. During the rollback, you noticed many old and unused container images accumulating in Artifact Registry. You need to improve rollout and rollback management and clean up the old container images. What should you do?

Options:

Adopt Cloud Deploy for managing deployments, and schedule a Cloud Build job for container image cleanup.

Deploy Cloud Service Mesh across the GKE clusters, and manually clean up Artifact Registry images.

Adopt Cloud Deploy for managing deployments, and implement an Artifact Registry cleanup policy.

Set up a rollback pipeline in Cloud Build, and implement an Artifact Registry cleanup policy.

Buy Now

Questions 38

Your team is designing a new application for deployment into Google Kubernetes Engine (GKE). You need to set up monitoring to collect and aggregate various application-level metrics in a centralized location. You want to use Google Cloud Platform services while minimizing the amount of work required to set up monitoring. What should you do?

Options:

Publish various metrics from the application directly to the Slackdriver Monitoring API, and then observe these custom metrics in Stackdriver.

Install the Cloud Pub/Sub client libraries, push various metrics from the application to various topics, and then observe the aggregated metrics in Stackdriver.

Install the OpenTelemetry client libraries in the application, configure Stackdriver as the export destination for the metrics, and then observe the application's metrics in Stackdriver.

Emit all metrics in the form of application-specific log messages, pass these messages from the containers to the Stackdriver logging collector, and then observe metrics in Stackdriver.

Buy Now

Questions 39

You recently migrated an ecommerce application to Google Cloud. You now need to prepare the application for the upcoming peak traffic season. You want to follow Google-recommended practices. What should you do first to prepare for the busy season?

Options:

Migrate the application to Cloud Run, and use autoscaling.

Load test the application to profile its performance for scaling.

Create a Terraform configuration for the application's underlying infrastructure to quickly deploy to additional regions.

Pre-provision the additional compute power that was used last season, and expect growth.

Buy Now

Answer:

Explanation:

The first thing you should do to prepare your ecommerce application for the upcoming peak traffic season is to load test the application to profile its performance for scaling. Load testing is a process of simulating high traffic or user demand on your application and measuring how it responds.Load testing can help you identify any bottlenecks, errors, or performance issues that might affect your application during the busy season1.Load testing can also help you determine the optimal scaling strategy for your application, such as horizontal scaling (adding more instances) or vertical scaling (adding more resources to each instance)2.

There are different tools and methods for load testing your ecommerce application on Google Cloud, depending on the type and complexity of your application.For example, you can use Cloud Load Balancing to distribute traffic across multiple instances of your application, and use Cloud Monitoring to measure the latency, throughput, and error rate of your application3.You can also use Cloud Functions or Cloud Run to create serverless load generators that can simulate user requests and send them to your application4. Alternatively, you can use third-party tools such as Apache JMeter or Locust to create and run load tests on your application.

By load testing your ecommerce application before the peak traffic season, you can ensure that your application is ready to handle the expected load and provide a good user experience. You can also use the results of your load tests to plan and implement other steps to prepare your application for the busy season, such as migrating to a more scalable platform, creating a Terraform configuration for deploying to additional regions, or pre-provisioning additional compute power.

Questions 40

You are working with a government agency that requires you to archive application logs for seven years. You need to configure Stackdriver to export and store the logs while minimizing costs of storage. What should you do?

Options:

Create a Cloud Storage bucket and develop your application to send logs directly to the bucket.

Develop an App Engine application that pulls the logs from Stackdriver and saves them in BigQuery.

Create an export in Stackdriver and configure Cloud Pub/Sub to store logs in permanent storage for seven years.

Create a sink in Stackdriver, name it, create a bucket on Cloud Storage for storing archived logs, and then select the bucket as the log export destination.

Buy Now

Questions 41

Your company recently migrated to Google Cloud. You need to design a fast, reliable, and repeatable solution for your company to provision new projects and basic resources in Google Cloud. What should you do?

Options:

Use the Google Cloud console to create projects.

Write a script by using the gcloud CLI that passes the appropriate parameters from the request. Save the script in a Git repository.

Write a Terraform module and save it in your source control repository. Copy and run the apply command to create the new project.

Use the Terraform repositories from the Cloud Foundation Toolkit. Apply the code with appropriate parameters to create the Google Cloud project and related resources.

Buy Now

Questions 42

Your development team has created a new version of their service’s API. You need to deploy the new versions of the API with the least disruption to third-party developers and end users of third-party installed applications. What should you do?

Options:

Introduce the new version of the API.Announce deprecation of the old version of the API.Deprecate the old version of the API.Contact remaining users of the old API.Provide best effort support to users of the old API.Turn down the old version of the API.

Announce deprecation of the old version of the API.Introduce the new version of the API.Contact remaining users on the old API.Deprecate the old version of the API.Turn down the old version of the API.Provide best effort support to users of the old API.

Announce deprecation of the old version of the API.Contact remaining users on the old API.Introduce the new version of the API.Deprecate the old version of the API.Provide best effort support to users of the old API.Turn down the old version of the API.

Introduce the new version of the API.Contact remaining users of the old API.Announce deprecation of the old version of the API.Deprecate the old version of the API.Turn down the old version of the API.Provide best effort support to users of the old API.

Buy Now

Questions 43

You need to run a business-critical workload on a fixed set of Compute Engine instances for several months. The workload is stable with the exact amount of resources allocated to it. You want to lower the costs for this workload without any performance implications. What should you do?

Options:

Purchase Committed Use Discounts.

Migrate the instances to a Managed Instance Group.

Convert the instances to preemptible virtual machines.

Create an Unmanaged Instance Group for the instances used to run the workload.

Buy Now

Questions 44

You need to create a Cloud Monitoring SLO for a service that will be published soon. You want to verify that requests to the service will be addressed in fewer than 300 ms at least 90% Of the time per calendar month. You need to identify the metric and evaluation method to use. What should you do?

Options:

Select a latency metric for a request-based method of evaluation.

Select a latency metric for a window-based method of evaluation.

Select an availability metric for a request-based method of evaluation.

Select an availability metric for a window-based method Of evaluation.

Buy Now

Questions 45

You need to define Service Level Objectives (SLOs) for a high-traffic multi-region web application. Customers expect the application to always be available and have fast response times. Customers are currently happy with the application performance and availability. Based on current measurement, you observe that the 90th percentile of latency is 120ms and the 95th percentile of latency is 275ms over a 28-day window. What latency SLO would you recommend to the team to publish?

Options:

90th percentile – 100ms95th percentile – 250ms

90th percentile – 120ms95th percentile – 275ms

90th percentile – 150ms95th percentile – 300ms

90th percentile – 250ms95th percentile – 400ms

Buy Now

Questions 46

You are reviewing your deployment pipeline in Google Cloud Deploy You must reduce toil in the pipeline and you want to minimize the amount of time it takes to complete an end-to-end deployment What should you do?

Choose 2 answers

Options:

Create a trigger to notify the required team to complete the next step when manual intervention is required

Divide the automation steps into smaller tasks

Use a script to automate the creation of the deployment pipeline in Google Cloud Deploy

Add more engineers to finish the manual steps.

Automate promotion approvals from the development environment to the test environment

Buy Now

Questions 47

You are configuring a CI pipeline. The build step for your CI pipeline integration testing requires access to APIs inside your private VPC network. Your security team requires that you do not expose API traffic publicly. You need to implement a solution that minimizes management overhead. What should you do?

Options:

Use Cloud Build private pools to connect to the private VPC.

Use Cloud Build to create a Compute Engine instance in the private VPC. Run the integration tests on the VM by using a startup script.

Use Cloud Build as a pipeline runner. Configure a cross-region internal Application Load Balancer for API access.

Use Cloud Build as a pipeline runner. Configure a global external Application Load Balancer with a Google Cloud Armor policy for API access.

Buy Now

Questions 48

You are managing an application that exposes an HTTP endpoint without using a load balancer. The latency of the HTTP responses is important for the user experience. You want to understand what HTTP latencies all of your users are experiencing. You use Stackdriver Monitoring. What should you do?

Options:

• In your application, create a metric with a metricKind set to DELTA and a valueType set to DOUBLE.• In Stackdriver's Metrics Explorer, use a Slacked Bar graph to visualize the metric.

• In your application, create a metric with a metricKind set to CUMULATIVE and a valueType set to DOUBLE.• In Stackdriver's Metrics Explorer, use a Line graph to visualize the metric.

• In your application, create a metric with a metricKind set to gauge and a valueType set to distribution.• In Stackdriver's Metrics Explorer, use a Heatmap graph to visualize the metric.

• In your application, create a metric with a metricKind. set toMETRlc_KIND_UNSPECIFIEDanda valueType set to INT64.• In Stackdriver's Metrics Explorer, use a Stacked Area graph to visualize the metric.

Buy Now

Questions 49

Your company allows teams to self-manage Google Cloud projects, including project-level Identity and Access Management (IAM). You are concerned that the team responsible for the Shared VPC project might accidentally delete the project, so a lien has been placed on the project. You need to design a solution to restrict Shared VPC project deletion to those with the resourcemanager.projects.updateLiens permission at the organization level. What should you do?

Options:

Enable VPC Service Controls for the container.googleapis.com API service.

Revoke the resourcemanager.projects.updateLiens permission from all users associated with the project.

Enable the compute.restrictXpnProjectLienRemoval organization policy constraint.

Instruct teams to only perform IAM permission management as code with Terraform.

Buy Now

Answer:

Explanation:

Comprehensive and Detailed Explanation From General Google Cloud IAM and Organization Policy Knowledge:

The core requirement is to prevent accidental deletion of a Shared VPC host project, even by project owners, by ensuring that only users with a specific permission at the organization level can remove the lien that protects the project.

A lien (resourcemanager.projects.delete) has already been placed on the project. This prevents its deletion. The challenge is to prevent the removal of this lien by project-level administrators.

The permission to remove a lien is resourcemanager.projectLiens.update (or resourcemanager.projects.updateLiens as stated in the question, which implies a broader update capability including liens).

Option A (Enable VPC Service Controls for the container.googleapis.com API service): VPC Service Controls are for data exfiltration prevention by creating service perimeters. They do not directly control IAM permissions for lien management or project deletion.

Option B (Revoke the resourcemanager.projects.updateLiens permission from all users associated with the project): While this would prevent project-level users from removing the lien, it doesn't enforce therequirement that only users with this permission at the organization level can remove it. A project owner could potentially re-grant themselves this permission at the project level if not otherwise restricted. The goal is a stronger, centrally enforced restriction.

Option C (Enable the compute.restrictXpnProjectLienRemoval organization policy constraint): This is specifically designed for the scenario described.Organization Policies allow centralized control over resource configurations across the organization.

The compute.restrictXpnProjectLienRemoval constraint, when enforced (set to True), restricts the removal of liens on Shared VPC host projects. Only users who have the resourcemanager.projectLiens.update permission (or resourcemanager.projects.updateLiens) granted at the organization level can then remove such liens. This prevents project owners or other project-level principals from removing the lien unless they also have this specific permission at the org level.

Option D (Instruct teams to only perform IAM permission management as code with Terraform): While Infrastructure as Code (IaC) is a good practice for managing IAM, it's an operational guideline and doesn't technically enforce the restriction on lien removal. A user with sufficient project-level IAM permissions could still manually remove the lien via the console or gcloud if not prevented by an organization policy.

Therefore, enabling the compute.restrictXpnProjectLienRemoval organization policy is the direct and most effective way to meet the requirement.

Reference (Based on Google Cloud Organization Policy and Shared VPC documentation):

Google Cloud documentation on Resource Manager Liens: https://cloud.google.com/resource-manager/docs/project-liens

Google Cloud documentation on Organization Policy Constraints: https://cloud.google.com/resource-manager/docs/organization-policy/org-policy-constraints

Specifically, the compute.restrictXpnProjectLienRemoval constraint: "When set to true, liens on Shared VPC host projects can only be removed by users that have resourcemanager.projectLiens.update permission on the organization." (or similar wording indicating org-level permission is required). This constraint ensures that the protection afforded by the lien on a critical Shared VPC host project cannot be easily circumvented at the project level.

Questions 50

You manage several production systems that run on Compute Engine in the same Google Cloud Platform (GCP) project. Each system has its own set of dedicated Compute Engine instances. You want to know how must it costs to run each of the systems. What should you do?

Options:

In the Google Cloud Platform Console, use the Cost Breakdown section to visualize the costs per system.

Assign all instances a label specific to the system they run. Configure BigQuery billing export and query costs per label.

Enrich all instances with metadata specific to the system they run. Configure Stackdriver Logging to export to BigQuery, and query costs based on the metadata.

Name each virtual machine (VM) after the system it runs. Set up a usage report export to a Cloud Storage bucket. Configure the bucket as a source in BigQuery to query costs based on VM name.

Buy Now

Questions 51

You are the Operations Lead for an ongoing incident with one of your services. The service usually runs at around 70% capacity. You notice that one node is returning 5xx errors for all requests. There has also been a noticeable increase in support cases from customers. You need to remove the offending node from the load balancer pool so that you can isolate and investigate the node. You want to follow Google-recommended practices to manage the incident and reduce the impact on users. What should you do?

Options:

1. Communicate your intent to the incident team.2. Perform a load analysis to determine if the remaining nodes can handle the increase in traffic offloaded from the removed node, and scale appropriately.3. When any new nodes report healthy, drain traffic from the unhealthy node, and remove the unhealthy node from service.

1. Communicate your intent to the incident team.2. Add a new node to the pool, and wait for the new node to report as healthy.3. When traffic is being served on the new node, drain traffic from the unhealthy node, and remove the old node from service.

1 . Drain traffic from the unhealthy node and remove the node from service.2. Monitor traffic to ensure that the error is resolved and that the other nodes in the pool are handling the traffic appropriately.3. Scale the pool as necessary to handle the new load.4. Communicate your actions to the incident team.

1 . Drain traffic from the unhealthy node and remove the old node from service.2. Add a new node to the pool, wait for the new node to report as healthy, and then serve traffic to the new node.3. Monitor traffic to ensure that the pool is healthy and is handling traffic appropriately.4. Communicate your actions to the incident team.

Buy Now

Answer:

Explanation:

The correct answer is A. Communicate your intent to the incident team. Perform a load analysis to determine if the remaining nodes can handle the increase in traffic offloaded from the removed node, and scale appropriately. When any new nodes report healthy, drain traffic from the unhealthy node, and remove the unhealthy node from service.

This answer follows the Google-recommended practices for incident management, as described in the Chapter 9 - Incident Response, Google SRE Book1. According to this source, some of the best practices are:

Maintain a clear line of command. Designate clearly defined roles. Keep a working record of debugging and mitigation as you go. Declare incidents early and often.

Communicate your intent before taking any action that might affect the service or the incident response. This helps to avoid confusion, duplication of work, or unintended consequences.

Perform a load analysis before removing a node from the load balancer pool, as this might affect the capacity and performance of the service. Scale the pool as necessary to handle the expected load.

Drain traffic from the unhealthy node before removing it from service, as this helps to avoid dropping requests or causing errors for users.

Answer A follows these best practices by communicating the intent to the incident team, performing a load analysis and scaling the pool, and draining traffic from the unhealthy node before removing it.

Answer B does not follow the best practice of performing a load analysis before adding or removing nodes, as this might cause overloading or underutilization of resources.

Answer C does not follow the best practice of communicating the intent before taking any action, as this might cause confusion or conflict with other responders.

Answer D does not follow the best practice of draining traffic from the unhealthy node before removing it, as this might cause errors for users.

[References:, 1:Chapter 9 - Incident Response, Google SRE Book, , , , , ]

Questions 52

Your organization is running multiple Google Kubernetes Engine (GKE) clusters in a project. You need to design a highly-available solution to collect and query both domain-specific workload metrics and GKE default metrics across all clusters, while minimizing operational overhead. What should you do?

Options:

Use Prometheus Operator to install Prometheus in every cluster and scrape the metrics. Ensure that a Thanos sidecar is enabled on every Prometheus instance. Configure Thanos in the central cluster. Query the central Thanos instance.

Use Prometheus Operator to install Prometheus in every cluster and scrape the metrics. Configure remote-write to one central Prometheus. Query the central Prometheus instance.

Enable managed collection on every GKE cluster. Query the metrics in Cloud Monitoring.

Enable managed collection on every GKE cluster. Query the metrics in BigQuery.

Buy Now

Questions 53

You currently store the virtual machine (VM) utilization logs in Stackdriver. You need to provide an easy-to-share interactive VM utilization dashboard that is updated in real time and contains information aggregated on a quarterly basis. You want to use Google Cloud Platform solutions. What should you do?

Options:

1. Export VM utilization logs from Stackdriver to BigOuery.2. Create a dashboard in Data Studio.3. Share the dashboard with your stakeholders.

1. Export VM utilization logs from Stackdriver to Cloud Pub/Sub.2. From Cloud Pub/Sub, send the logs to a Security Information and Event Management (SIEM) system.3. Build the dashboards in the SIEM system and share with your stakeholders.

1. Export VM utilization logs (rom Stackdriver to BigQuery.2. From BigQuery. export the logs to a CSV file.3. Import the CSV file into Google Sheets.4. Build a dashboard in Google Sheets and share it with your stakeholders.

1. Export VM utilization logs from Stackdriver to a Cloud Storage bucket.2. Enable the Cloud Storage API to pull the logs programmatically.3. Build a custom data visualization application.4. Display the pulled logs in a custom dashboard.

Buy Now

Questions 54

You are building and deploying a microservice on Cloud Run for your organization Your service is used by many applications internally You are deploying a new release, and you need to test the new version extensively in the staging and production environments You must minimize user and developer impact. What should you do?

Options:

Deploy the new version of the service to the staging environment Split the traffic, and allow 1 % of traffic through to the latest version Test the latest version If the test passes gradually roll out the latest version to the staging and production environments

Deploy the new version of the service to the staging environment Split the traffic, and allow 50% of traffic through to the latest version Test the latest version If the test passes, send all traffic to the latest version Repeat for the production environment

Deploy the new version of the service to the staging environment with a new-release tag without serving traffic Test the new-release version If the test passes; gradually roll out this tagged version Repeat for the production environment

Deploy a new environment with the green tag to use as the staging environment Deploy the new version of the service to the green environment and test the new version If the tests pass, send all traffic to the green environment and delete the existing staging environment Repeat for the production environment

Buy Now

Questions 55

You support a Node.js application running on Google Kubernetes Engine (GKE) in production. The application makes several HTTP requests to dependent applications. You want to anticipate which dependent applications might cause performance issues. What should you do?

Options:

Instrument all applications with Stackdriver Profiler.

Instrument all applications with Stackdriver Trace and review inter-service HTTP requests.

Use Stackdriver Debugger to review the execution of logic within each application to instrument all applications.

Modify the Node.js application to log HTTP request and response times to dependent applications. Use Stackdriver Logging to find dependent applications that are performing poorly.

Buy Now

Questions 56

You manage your company's primary revenue-generating application. You have an error budget policy in place that freezes production deployments when the application is close to breaching its SLO. A number of issues have recently occurred, and the application has exhausted its error budget. You need to deploy a new release to the application that includes a feature urgently required by your largest customer. You have been told that the release has passed all unit tests. What should you do?

Options:

Start the deployment of the feature immediately.

Delay the deployment of the feature until the error budget is replenished.

Re-run the unit tests, and start the deployment of the feature if the tests pass.

Deploy the feature to a subset of users, and gradually roll out to all users if there are no errors reported.

Buy Now

Answer:

Explanation:

Comprehensive and Detailed Explanation From SRE Principles:

This scenario presents a classic SRE conflict: maintaining reliability (as dictated by the exhausted error budget and deployment freeze) versus delivering an urgent business requirement. The error budget policy is there for a reason – to protect users from further instability.

A. Start the deployment of the feature immediately: This directly violates the established error budget policy and the deployment freeze. While the feature is urgent, deploying without caution when the system is already unstable (as indicated by the exhausted error budget) is highly risky and could exacerbate existing problems or introduce new ones, further impacting revenue and customer trust.

B. Delay the deployment of the feature until the error budget is replenished: This strictly adheres to the policy but might not be acceptable given the "urgently required by your largest customer" clause. SRE principles allow for reasoned exceptions and risk management, not just blind adherence if the business context is compelling enough and risks are managed.

C. Re-run the unit tests, and start the deployment of the feature if the tests pass: Unit tests are foundational but insufficient to guarantee a complex application will perform reliably in production, especially when the system is already indicating instability (exhausted error budget). Passing unit tests doesn't negate the risk signaled by the depleted error budget.

D. Deploy the feature to a subset of users, and gradually roll out to all users if there are no errors reported: This is the most balanced SRE approach in this situation. It acknowledges the urgency while attempting to mitigate risk:Risk Mitigation: A canary release (deploying to a small subset of users) limits the potential negative impact if the new feature introduces new errors or worsens existing instability.

Observation: It allows for careful monitoring of the new release in the production environment with real users.

Data-Driven Decision: The decision to proceed with a wider rollout is based on observed behavior ("if there are no errors reported"), not just assumptions.

Controlled Rollout: A gradual rollout allows for quick rollback if issues arise.

While an exhausted error budget signals a deployment freeze, critical business needs can sometimes necessitate a carefully managed exception. A canary release is a standard SRE technique for deploying changes with reduced risk, making it the most appropriate course of action when faced with such conflicting priorities. The team would also need to communicate clearly about the risks and the rationale for this exception. It's implied that this urgent feature might also fix existing issues or is critical enough to warrant the carefully managed risk.

Reference (Based on SRE principles from Google's SRE books and general practices):

Error Budgets: "The SRE Book" (Site Reliability Engineering: How Google Runs Production Systems) discusses error budgets and deployment freezes. An exhausted error budget typically means no more risky changes until reliability improves.

Canary Releases: This is a fundamental practice for safely deploying new versions. It's about testing in production with a small percentage of traffic.

Managing Risk: SRE is about managing risk, not eliminating it entirely. In situations like this, a calculated risk with strong mitigation (canary, monitoring, rollback plan) can be justified for critical business needs. The decision involves weighing the risk of deploying against the risk of not deploying the urgent feature.

Option D represents a pragmatic SRE approach to navigate this difficult situation by minimizing the blast radius of the change.

Questions 57

You are managing the production deployment to a set of Google Kubernetes Engine (GKE) clusters. You want to make sure only images which are successfully built by your trusted CI/CD pipeline are deployed to production. What should you do?

Options:

Enable Cloud Security Scanner on the clusters.

Enable Vulnerability Analysis on the Container Registry.

Set up the Kubernetes Engine clusters as private clusters.

Set up the Kubernetes Engine clusters with Binary Authorization.

Buy Now

Questions 58

You are running a real-time gaming application on Compute Engine that has a production and testing environment. Each environment has their own Virtual Private Cloud (VPC) network. The application frontend and backend servers are located on different subnets in the environment's VPC. You suspect there is a malicious process communicating intermittently in your production frontend servers. You want to ensure that network traffic is captured for analysis. What should you do?

Options:

Enable VPC Flow Logs on the production VPC network frontend and backend subnets only with a sample volume scale of 0.5.

Enable VPC Flow Logs on the production VPC network frontend and backend subnets only with a sample volume scale of 1.0.

Enable VPC Flow Logs on the testing and production VPC network frontend and backend subnets with a volume scale of 0.5. Apply changes intesting before production.

Enable VPC Flow Logs on the testing and production VPC network frontend and backend subnets with a volume scale of 1.0. Apply changes in testing before production.

Buy Now

Questions 59

You are investigating issues in your production application that runs on Google Kubernetes Engine (GKE). You determined that the source Of the issue is a recently updated container image, although the exact change in code was not identified. The deployment is currently pointing to the latest tag. You need to update your cluster to run a version of the container that functions as intended. What should you do?

Options:

Create a new tag called stable that points to the previously working container, and change the deployment to point to the new tag.

Apply the latest tag to the previous container image, and do a rolling update on the deployment.

Build a new container from a previous Git tag, and do a rolling update on the deployment to the new container.

Alter the deployment to point to the sha2 56 digest of the previously working container.

Buy Now

Questions 60

You are using Jenkins to orchestrate your CI/CD pipelines deploying your applications to GKE and on-premises VMs. After expanding your on-premises infrastructure, all resource utilization is normal, but you notice that initiating on-premises deployments is taking longer than expected. You need to accelerate the initiation of on-premises deployments. What should you do?

Options:

Implement a blue-green deployment strategy for your on-premises VMs.

Upgrade your on-premises VMs to have faster CPUs and more memory.

Increase the number of Jenkins executor threads.

Configure Jenkins to use an artifact repository in your on-premises environment.

Buy Now

Exam Code: Professional-Cloud-DevOps-Engineer

Exam Name: Google Cloud Certified - Professional Cloud DevOps Engineer Exam

Last Update: Mar 31, 2026

Questions: 201

Professional-Cloud-DevOps-Engineer PDF

$25.5 ~~$84.99~~

Add to Cart

Professional-Cloud-DevOps-Engineer Engine

Professional-Cloud-DevOps-Engineer Testing Engine

$30 ~~$99.99~~

Add to Cart

Professional-Cloud-DevOps-Engineer PDF + Engine

Professional-Cloud-DevOps-Engineer PDF + Testing Engine

$40.5 ~~$134.99~~

Add to Cart

Spring Sale 70% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: clap70

clapgeek logo

Professional-Cloud-DevOps-Engineer Google Cloud Certified - Professional Cloud DevOps Engineer Exam Questions and Answers

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Options:

Answer:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation: