A cook has to clean their kitchen at some point, right? It’s not just to humor the health inspector, but also to keep things running as smoothly and hygienically as possible. In the world of software engineering, this is no different: you’ll want to make sure that when you start your day, your pots and pans are clean.
In this tutorial, we’ll craft a low-cost, cloud-native tool to keep your Google Cloud projects shiny. And what’s more, after completing this, you’ll be able to automate many more tasks using the same toolset!
You can find a ready-to-go version of this setup called ZUNA (Zap Unused and Non-permanent Assets) on GitHub.
When using cloud services, you can create new assets in a breeze. Even when your project is fully terraformed, you may still encounter some dirty footprints in your environment. Maybe it was that one time you quickly had to verify something by creating a Cloud SQL instance, or those cleanup scripts that occasionally fail when the integration tests go crazy.
Indeed, a system can fail at any step: what if the instance running the tests breaks down? What if an unexpected exception occurs? What if the network is down? Any such failure can lead to resources not being cleaned up. In the end, all these dangling resources will cost you: either in direct resource cost, or in the form of toil¹.
I do recognize that resources not being cleaned up might be the last thing on your mind when a production setup fails. Nevertheless, it’s still an essential aspect of maintaining a healthy environment, whether for development or production purposes. But don’t let this keep you from building auto-healing production setups!
We will create a system responsible for the automatic cleanup of specific resources in a GCP project. We can translate this into the following task: check periodically for labeled resources, and remove them.
Ideally, the system is quick to set up, flexible, and low-cost. By the end of this post, our setup will look as follows:
We will use the following GCP services to achieve this:
Using these services will quickly get you up and running while allowing multiple resources to be added later on. Moreover, as you’ll see later in this tutorial, this entire solution costs less than $1 per month.
The Google Cloud SDK (the `gcloud` command) installed (you can also use Cloud Shell)
We’ll chop this up into multiple steps:
First, we create a topic and two subscriptions so we have something to clean up. We’ll attach the label `autodelete: true` to one of the subscriptions, so our script can automatically detect which resources are up for removal:
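A minimal sketch of this setup; all resource names here are illustrative:

```shell
# Create a topic and two subscriptions to experiment with.
gcloud pubsub topics create zuna-test-topic

# This subscription carries the autodelete label and is up for removal.
gcloud pubsub subscriptions create zuna-test-sub-delete \
  --topic=zuna-test-topic \
  --labels=autodelete=true

# This subscription is unlabeled and should be left alone.
gcloud pubsub subscriptions create zuna-test-sub-keep \
  --topic=zuna-test-topic
```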
When you list the resources, you should see the labels:
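For example, limiting the output to names and labels:

```shell
# List all subscriptions in the current project, showing their labels.
gcloud pubsub subscriptions list --format="table(name, labels)"
```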
When you go to the cloud console, you should see the label appear on your newly created Pub/Sub subscription:
Alright, we now have a resource that is up for deletion! When working with real resources, you can either label them manually or let your resource provisioning script take care of this. Next up: making sure we have permissions to delete these resources.
To facilitate development later on, it’s best to work with a Service Account from the get-go. This account will be bound to your script when it executes and will provide it with the correct permissions to manage (in our case, delete) the resources.
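A sketch of the service account setup; the account name `zuna` and the key file name are assumptions:

```shell
PROJECT_ID=$(gcloud config get-value project)

# Create the service account that the cleanup script will run as.
gcloud iam service-accounts create zuna \
  --display-name="ZUNA cleanup account"

# Generate a key pair and download the private key.
gcloud iam service-accounts keys create sa-key.json \
  --iam-account="zuna@${PROJECT_ID}.iam.gserviceaccount.com"
```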
These commands create a service account that lives in your project. Next, they craft a public-private key pair, of which the private part is downloaded into the file `sa-key.json`. This file can now be used to authenticate your script, as we will see in the next section.
First, let’s make sure that we have the correct permissions to list and remove subscriptions.
Create the following role definition file:
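A sketch of what this file could contain (saved as `zuna-role.yaml` here; the file name and role title are assumptions):

```yaml
title: ZUNA
description: Minimal permissions for automatic cleanup of labeled subscriptions
stage: GA
includedPermissions:
  - pubsub.subscriptions.list
  - pubsub.subscriptions.delete
```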
Next, execute the following script:
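A sketch of that script, assuming the role file and service account names used earlier (`zuna-role.yaml`, `zuna`):

```shell
PROJECT_ID=$(gcloud config get-value project)

# Create the custom role inside the project.
gcloud iam roles create zuna \
  --project="${PROJECT_ID}" \
  --file=zuna-role.yaml

# Bind the role to the service account at the project level.
gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
  --member="serviceAccount:zuna@${PROJECT_ID}.iam.gserviceaccount.com" \
  --role="projects/${PROJECT_ID}/roles/zuna"
```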
The script creates a new role, specifically for our application ZUNA, with the two permissions we need. The role definition lives inside our project (this is important when referencing the role). Next, the role is assigned to the service account on a project level. This means that the permissions apply to all the subscription resources that live inside our project.
It is time to remove our resource using a Python script! You can quickly set up a Python 3 virtual environment as follows:
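For example:

```shell
# Create and activate a fresh virtual environment,
# then install the Pub/Sub client library.
python3 -m venv venv
source venv/bin/activate
pip install google-cloud-pubsub
```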
Now you can create a Python file called `clean_subscriptions.py` with the following contents:
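A sketch of what such a script could look like; the structure and function names are my own, while the client calls come from the `google-cloud-pubsub` library:

```python
import os


def should_delete(labels):
    """A resource is up for removal when it carries the label autodelete: true."""
    return labels.get("autodelete", "") == "true"


def clean_subscriptions(project_id, delete_enabled=False):
    """List all subscriptions in the project and remove the labeled ones."""
    # Imported here so should_delete stays testable without the client library.
    from google.cloud import pubsub_v1

    client = pubsub_v1.SubscriberClient()
    for subscription in client.list_subscriptions(
        request={"project": f"projects/{project_id}"}
    ):
        if should_delete(subscription.labels):
            print(f"Deleting {subscription.name}")
            if delete_enabled:
                client.delete_subscription(
                    request={"subscription": subscription.name}
                )
        else:
            print(f"Keeping {subscription.name}")


if __name__ == "__main__" and "GOOGLE_CLOUD_PROJECT" in os.environ:
    # Deletion is disabled by default for safety; flip the flag to True
    # to actually remove the labeled subscriptions.
    clean_subscriptions(os.environ["GOOGLE_CLOUD_PROJECT"], delete_enabled=False)
```

The label check lives in its own small function so that additional resource types can reuse it later.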
Conceptually, the following happens:
Note that the actual removal is still disabled for safety reasons. You can enable it by changing the flag on the script’s last line.
You can run the script as follows:
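For instance (the `GOOGLE_CLOUD_PROJECT` variable is an assumption of my sketch above; the key file is the one downloaded earlier):

```shell
GOOGLE_APPLICATION_CREDENTIALS=sa-key.json \
GOOGLE_CLOUD_PROJECT=$(gcloud config get-value project) \
python clean_subscriptions.py
```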
Because we make use of Google’s Python client library, we can pass in our service account using the
GOOGLE_APPLICATION_CREDENTIALS environment variable. The script will then automatically
inherit the roles/permissions we assigned to the service account.
The output of the script should resemble the following:
That’s correct: only one of our two subscriptions is up for removal. Now let’s move this to GCP!
We can easily wrap the previous section’s script in a Cloud Function. A Cloud Function is a piece of code that can be triggered using an HTTP endpoint or a Pub/Sub message. We’ll choose the latter as Cloud Scheduler can directly post messages to Pub/Sub: an ideal combination!
This code should be placed in
main.py and is a simple wrapper for our function.
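A sketch of such a wrapper; it assumes the script from the previous section exposes a `clean_subscriptions` function and that the project id is provided through an environment variable set at deploy time:

```python
import base64
import os


def decode_payload(event):
    """Extract the UTF-8 payload from a Pub/Sub-triggered background event."""
    return base64.b64decode(event.get("data", b"")).decode("utf-8")


def app_zuna(event, context):
    """Entry point for the Cloud Function."""
    print(f"ZUNA triggered with payload: {decode_payload(event)}")

    # Reuse the cleanup logic from the previous section.
    from clean_subscriptions import clean_subscriptions

    clean_subscriptions(os.environ["GOOGLE_CLOUD_PROJECT"], delete_enabled=False)
    # TODO: clean up Dataflow jobs
    # TODO: clean up BigQuery datasets


if __name__ == "__main__" and "GOOGLE_CLOUD_PROJECT" in os.environ:
    # Local smoke test: simulate a Pub/Sub-triggered invocation.
    app_zuna({"data": base64.b64encode(b"manual trigger")}, None)
```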
You can test it locally by running
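For example, again with the key file and project variable from before:

```shell
GOOGLE_APPLICATION_CREDENTIALS=sa-key.json \
GOOGLE_CLOUD_PROJECT=$(gcloud config get-value project) \
python main.py
```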
You’ll notice from the output that our function from the previous step is executed;
we also reserved some space for future resources (the
# TODO lines).
The additional function
app_zuna will be the Cloud Function’s entry point.
Currently, it just prints the payload it receives from Pub/Sub and subsequently
calls the cleanup function. This makes it behave similarly to the local execution.
Deploying can be done with the following script:
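A sketch of such a deployment script; the function name, region, runtime, and trigger topic are assumptions:

```shell
PROJECT_ID=$(gcloud config get-value project)

gcloud functions deploy zuna \
  --region=europe-west1 \
  --runtime=python39 \
  --entry-point=app_zuna \
  --trigger-topic=zuna-trigger \
  --service-account="zuna@${PROJECT_ID}.iam.gserviceaccount.com" \
  --set-env-vars="GOOGLE_CLOUD_PROJECT=${PROJECT_ID}"
```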
Several important highlights:
The entry point is set to `app_zuna` to make sure this function is called when the Cloud Function is hit
When you run this script,
gcloud will package up your local resources and send them to the cloud.
Note that you can exclude specific resources using the
.gcloudignore file, which is created when you run the command the first time.
When the upload completes, a Cloud Function instance is created that will run your code for every message that appears in the trigger topic.
In case you get an error that resembles
Cloud Functions API has not been used in project ... before or it is disabled.
you still need to enable the Cloud Functions API in your project.
This can easily be done with the following commands
or using the Cloud Console (see this documentation):
gcloud services enable cloudfunctions.googleapis.com
gcloud services enable cloudbuild.googleapis.com
You can easily test the cloud function by sending a message to the newly created Pub/Sub topic:
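For example, assuming the trigger topic name used at deployment:

```shell
gcloud pubsub topics publish zuna-trigger \
  --message='{"trigger": "manual"}'
```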
Or in the Cloud Console using the “Publish Message” option directly on the topic:
You can view the logs using
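For instance, with the function name and region assumed earlier:

```shell
gcloud functions logs read zuna --region=europe-west1
```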
Or in the Cloud Console:
Notice how our print statements appear in the output: the Pub/Sub message payload is logged, as well as the informational messages about which subscriptions have been deleted.
We now have a fully functioning cleanup system; the only missing piece is automation. For this, we employ Cloud Scheduler, which is, in essence, a managed cron service.
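A sketch of the scheduler setup; the job name, schedule (Friday evening), and time zone are assumptions:

```shell
gcloud scheduler jobs create pubsub zuna-weekly \
  --schedule="0 17 * * FRI" \
  --topic=zuna-trigger \
  --message-body='{"trigger": "scheduled"}' \
  --time-zone="Europe/Brussels"
```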
In this script, we create a new scheduled “job” that will publish the specified message to the Pub/Sub topic that our Cloud Function is listening to.
Note that the time zone is set specifically for my use case; omitting it would make the job default to UTC. Hence, you might want to change this to accommodate your needs. The TZ database names on this page should be used.
When creating the job, you might get a message that your project does not have an App Engine app yet. You should create one before continuing², but make sure you choose the correct region.
Your output of the Cloud Scheduler job creation should look like this:
Every Friday, this scheduler will trigger. But, we get the additional benefit of manual triggering.
Option 1 is triggering the job from the command line:
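Assuming the job name from the previous step:

```shell
gcloud scheduler jobs run zuna-weekly
```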
Option 2 is via the UI, where we get a nice
RUN NOW button:
Both options are great when you’d like to perform that occasional manual cleanup. After execution, you should see the new output in your Cloud Function’s logs.
Well, when you’re done testing this, you should clean up, right? The following script contains the commands to clean up the resources that were created above:
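A sketch of that teardown, assuming the resource names used throughout this tutorial (some resources may already be gone if ZUNA ran with deletion enabled):

```shell
PROJECT_ID=$(gcloud config get-value project)

gcloud scheduler jobs delete zuna-weekly --quiet
gcloud functions delete zuna --region=europe-west1 --quiet
gcloud pubsub subscriptions delete zuna-test-sub-delete zuna-test-sub-keep --quiet
gcloud pubsub topics delete zuna-test-topic zuna-trigger --quiet

# Remove the role binding, the custom role, and the service account.
gcloud projects remove-iam-policy-binding "${PROJECT_ID}" \
  --member="serviceAccount:zuna@${PROJECT_ID}.iam.gserviceaccount.com" \
  --role="projects/${PROJECT_ID}/roles/zuna"
gcloud iam roles delete zuna --project="${PROJECT_ID}"
gcloud iam service-accounts delete \
  "zuna@${PROJECT_ID}.iam.gserviceaccount.com" --quiet
```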
That’s it. You’re all done now! Go and enjoy the time you gained, or continue reading to find out how much this setup will cost you.
At the beginning of this tutorial, we stated that we chose these specific services to help keep the costs low. Let’s investigate the cost model to verify this is the case.
256MB, which we can tune down to 128MB using the deployment flag `--memory=128`. This adjustment will make every invocation even cheaper. [source]
Hence, even for a setup where the free tiers are not applicable anymore, we don’t expect a cost that is higher than $0.20 per month.
Adding more resource types would definitely be useful. Check out the ZUNA repository³ for hooks to clean up Dataflow jobs, Pub/Sub topics, BigQuery datasets & tables, etc.
You could check when certain resources were created and build in an expiration time. This would significantly reduce the risk of interfering with test runs. It’s also good to know that some services have expiration built in⁴.
Terraforming this setup is also a good idea. In that case, it could automatically be part of your deployment pipeline.
We’ve set up a Cloud Function that scans your current project for resources labeled with `autodelete: true` and removes them. The Cloud Function only has limited permissions and is triggered periodically using Cloud Scheduler and a Pub/Sub topic.
We succeeded in building an automatic system that we can also trigger manually. It’s flexible, as we can easily add code to clean up other types of resources. It was quick to set up, especially since we kept to plain bash scripts (although terraforming it would be nice). The cost is low as the service usage is minimal, and all services use a pay-as-you-go model. Using this setup will probably keep you in the free tier.
Finally, since the components that we used are generic, the resulting setup translates directly into a helpful blueprint for countless other automation use cases.
² It seems Cloud Scheduler is still tied to App Engine, hence the requirement to have an App Engine app. ↩︎