Caching Agents

Clouddriver Caching Agents in Spinnaker

Clouddriver caching agents discover your infrastructure components and cache data for Spinnaker™ to use.

How Spinnaker discovers and caches infrastructure elements

Clouddriver is a Spinnaker service that discovers and caches cloud provider infrastructure, such as AWS EC2 instances, Kubernetes pods, and Docker images. A caching agent scheduler in Clouddriver runs each provider's caching agents in separate threads at scheduled intervals. These agents query the infrastructure, index the data, and save the results in a cache store, which is usually a Redis or SQL datastore.
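The query, index, and save steps of a single agent run can be sketched in Python. This is a minimal illustration, not Clouddriver code: the `fake_describe_instances` provider call, the `run_caching_agent` function, and the colon-separated cache key format are all hypothetical, and a plain dict stands in for the Redis or SQL cache store.

```python
from typing import Callable, Dict, List


def fake_describe_instances() -> List[dict]:
    """Hypothetical provider query; a real agent would call the cloud API."""
    return [
        {"id": "i-0a1", "account": "prod", "region": "us-east-1", "state": "running"},
        {"id": "i-0b2", "account": "prod", "region": "us-east-1", "state": "pending"},
    ]


def run_caching_agent(query: Callable[[], List[dict]],
                      resource_type: str,
                      cache: Dict[str, dict]) -> int:
    """One caching cycle: query the infrastructure, index the results, save them."""
    records = query()  # 1. infrastructure query
    indexed = {        # 2. index each record under a composite key (hypothetical format)
        f"{resource_type}:{r['account']}:{r['region']}:{r['id']}": r
        for r in records
    }
    # 3. Save to the cache store. This sketch only adds or overwrites entries;
    #    it does not evict resources that have disappeared from the provider.
    cache.update(indexed)
    return len(indexed)


cache: Dict[str, dict] = {}
run_caching_agent(fake_describe_instances, "instances", cache)
print(sorted(cache))
# ['instances:prod:us-east-1:i-0a1', 'instances:prod:us-east-1:i-0b2']
```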

Caching agent

Caching agents look for resources in your cloud infrastructure and store the results in the cache store. For each provider, Spinnaker creates caching agents per account and, in some cases, per region. Each caching agent specializes in a specific type of resource, such as server groups, load balancers, security groups, or instances.

The number of caching agents varies greatly between providers and Clouddriver configurations. For example, AWS may have 16 to 20 agents per region performing tasks like caching the status of IAM roles, instances, and VPCs. AWS may also have some agents operating on a global scale to perform tasks such as cleaning up detached instances. Kubernetes, on the other hand, may have a few agents per cluster that cache custom resources and Kubernetes manifests.
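Because agents are created per account, per region, and per resource type, their number grows multiplicatively. The following Python sketch enumerates a hypothetical agent set; the `AgentKey` class, its naming scheme, and the account, region, and resource-type values are illustrative assumptions rather than Clouddriver's actual agent types.

```python
from dataclasses import dataclass
from itertools import product
from typing import List


@dataclass(frozen=True)
class AgentKey:
    """One caching agent: which resource type it caches, and where (hypothetical model)."""
    provider: str
    resource_type: str
    account: str
    region: str

    @property
    def name(self) -> str:
        # Hypothetical naming scheme, not Clouddriver's real agent naming.
        return f"{self.account}/{self.region}/{self.provider}:{self.resource_type}"


def build_agents(provider: str, accounts: List[str], regions: List[str],
                 resource_types: List[str]) -> List[AgentKey]:
    """Create one agent per (account, region, resource type) combination."""
    return [
        AgentKey(provider, rtype, account, region)
        for account, region, rtype in product(accounts, regions, resource_types)
    ]


agents = build_agents(
    "aws",
    accounts=["prod", "staging"],
    regions=["us-east-1", "us-west-2", "eu-west-1"],
    resource_types=["instances", "serverGroups", "loadBalancers", "securityGroups"],
)
print(len(agents))     # 24, that is 2 accounts x 3 regions x 4 resource types
print(agents[0].name)  # prod/us-east-1/aws:instances
```

At the scale described above, a hypothetical setup of 10 AWS accounts across 4 regions with 18 agents per region would already mean 10 x 4 x 18 = 720 regional agents, before any global agents are counted.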

Cache store

Clouddriver stores cloud resources in the cache store. Several implementations are available:

  • Redis - The default and most widely used implementation.

  • SQL - A recommended database store. For instructions, refer to the guide on configuring Clouddriver to use a SQL database.

  • In memory - An in-memory cache that is not intended for production Spinnaker deployments.

Whichever implementation you choose, all Clouddriver instances share and update a single cache store.
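To illustrate why the shared store, not the individual instance, holds the cached state, here is a minimal Python sketch of a cache-store contract with two implementations. The `CacheStore`, `InMemoryStore`, and `SqlStore` classes and the `cache` table are hypothetical; the standard-library `sqlite3` module stands in for the production SQL database, and `INSERT OR REPLACE` is SQLite-specific upsert syntax.

```python
import json
import sqlite3
from abc import ABC, abstractmethod
from typing import Dict, Optional


class CacheStore(ABC):
    """Minimal cache-store contract (hypothetical; much simpler than Clouddriver's)."""

    @abstractmethod
    def put(self, key: str, value: dict) -> None: ...

    @abstractmethod
    def get(self, key: str) -> Optional[dict]: ...


class InMemoryStore(CacheStore):
    """Process-local dict: visible only inside the instance that owns it."""

    def __init__(self) -> None:
        self._data: Dict[str, dict] = {}

    def put(self, key: str, value: dict) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[dict]:
        return self._data.get(key)


class SqlStore(CacheStore):
    """SQL-backed store; sqlite3 stands in for the real database server."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        conn.execute("CREATE TABLE IF NOT EXISTS cache (id TEXT PRIMARY KEY, body TEXT)")

    def put(self, key: str, value: dict) -> None:
        # Upsert: a later caching cycle overwrites the earlier record.
        self._conn.execute("INSERT OR REPLACE INTO cache (id, body) VALUES (?, ?)",
                           (key, json.dumps(value)))
        self._conn.commit()

    def get(self, key: str) -> Optional[dict]:
        row = self._conn.execute("SELECT body FROM cache WHERE id = ?", (key,)).fetchone()
        return json.loads(row[0]) if row else None


# Two "Clouddriver instances" sharing one SQL database see each other's writes.
shared = sqlite3.connect(":memory:")
instance_a, instance_b = SqlStore(shared), SqlStore(shared)
instance_a.put("instances:prod:us-east-1:i-0a1", {"state": "running"})
print(instance_b.get("instances:prod:us-east-1:i-0a1"))  # {'state': 'running'}
```

Two `InMemoryStore` objects, by contrast, never see each other's data, which is consistent with the in-memory store not being suited to real multi-instance deployments.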

Caching agent scheduler

The caching agent scheduler runs caching agents across all Clouddriver instances at regular intervals. The following scheduler types are available:

  • Redis-backed scheduler: Locks agents by reading and writing a key in Redis.

  • Redis-backed sort scheduler: Locks agents by reading and writing a key in Redis, and also manages the execution order of agents.

  • SQL-backed scheduler: Locks agents by inserting a row into a table with a unique constraint. This approach is inefficient, so prefer one of the other schedulers.

  • Default scheduler: Does not lock. Do not use this scheduler if you intend to run more than one Clouddriver instance.
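The SQL-backed locking strategy can be sketched with Python's built-in `sqlite3` module: an instance holds an agent's lock if and only if its `INSERT` succeeds, because the unique constraint rejects every competing insert. The `agent_locks` table, its columns, and the helper functions are hypothetical; a real scheduler would also need lock expiry so that a crashed instance does not hold a lock forever.

```python
import sqlite3


def make_lock_table(conn: sqlite3.Connection) -> None:
    # PRIMARY KEY gives agent_name the unique constraint the lock relies on.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS agent_locks ("
        " agent_name TEXT PRIMARY KEY,"
        " owner TEXT NOT NULL)"
    )


def try_acquire(conn: sqlite3.Connection, agent_name: str, owner: str) -> bool:
    """Return True if this owner inserted the row and therefore holds the lock."""
    try:
        with conn:  # commits on success, rolls back if the insert fails
            conn.execute(
                "INSERT INTO agent_locks (agent_name, owner) VALUES (?, ?)",
                (agent_name, owner),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # another instance already holds this agent's row


def release(conn: sqlite3.Connection, agent_name: str, owner: str) -> None:
    """Delete the lock row, but only if this owner holds it."""
    with conn:
        conn.execute(
            "DELETE FROM agent_locks WHERE agent_name = ? AND owner = ?",
            (agent_name, owner),
        )


conn = sqlite3.connect(":memory:")
make_lock_table(conn)
agent = "prod/us-east-1/aws:instances"
print(try_acquire(conn, agent, "clouddriver-1"))  # True
print(try_acquire(conn, agent, "clouddriver-2"))  # False: unique constraint violated
release(conn, agent, "clouddriver-1")
print(try_acquire(conn, agent, "clouddriver-2"))  # True
```

The default scheduler skips this step entirely, so with two instances both could run the same agent at the same time.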

The cache store does not dictate the type of agent scheduler. For instance, you can use the SQL cache store with the Redis-backed scheduler.

If you read the Clouddriver source code, you will see references to cats (Cache All The Stuff), the framework that manages the agent scheduler, the agents, and the cache store.

How do the cache-store, scheduler, and agent work together?

When Clouddriver starts, it inspects its configuration and instantiates the cache store and the agent scheduler. For each enabled provider, Clouddriver instantiates agents per account and region and adds them to the scheduler. Each time the scheduler runs, it goes through the agents that are not already running on the current instance. For each one, it attempts to obtain a lock on the agent type/account/region so that only a single Clouddriver instance caches a given resource at any given time. If the instance obtains the lock, the agent runs in its own thread. When the agent completes, Clouddriver updates the lock's Time To Live (TTL) so that it expires at the agent's next desired execution time.
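The lock, run, and extend cycle described above can be sketched in Python as a deterministic simulation. It rests on several assumptions: `FakeRedis` is a hypothetical in-process stand-in that mimics Redis `SET key value NX PX` semantics (write only if the key is absent or expired, with an expiry), time is passed in explicitly instead of read from a clock, the agent runs inline rather than in its own thread, and the `interval` and `max_run` values are illustrative. It is not Clouddriver's scheduler code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


class FakeRedis:
    """In-process stand-in for the Redis behaviour the scheduler relies on:
    SET key value NX PX ttl (write only if absent or expired) plus key expiry.
    Time is passed in explicitly so the simulation is deterministic."""

    def __init__(self) -> None:
        self._keys: Dict[str, Tuple[str, float]] = {}  # key -> (owner, expires_at)

    def set_nx(self, key: str, owner: str, ttl: float, now: float) -> bool:
        entry = self._keys.get(key)
        if entry is not None and entry[1] > now:
            return False  # lock still held by someone
        self._keys[key] = (owner, now + ttl)
        return True

    def set_expiry(self, key: str, owner: str, expires_at: float) -> None:
        entry = self._keys.get(key)
        if entry is not None and entry[0] == owner:  # only the owner may extend
            self._keys[key] = (owner, expires_at)


@dataclass
class Instance:
    """One Clouddriver instance's scheduler loop (hypothetical simplification)."""
    name: str
    redis: FakeRedis
    interval: float = 30.0   # assumed seconds between runs of the same agent
    max_run: float = 300.0   # assumed upper bound on one agent run
    ran: List[str] = field(default_factory=list)

    def tick(self, agents: Dict[str, Callable[[], None]], now: float) -> None:
        for agent_name, run in agents.items():
            lock_key = f"lock:{agent_name}"
            # Initial TTL covers the run, so a crashed holder's lock still expires.
            if not self.redis.set_nx(lock_key, self.name, self.max_run, now):
                continue  # another instance holds the lock
            run()  # Clouddriver would run this in its own thread
            self.ran.append(agent_name)
            # Keep the lock until the agent's next desired execution time.
            self.redis.set_expiry(lock_key, self.name, now + self.interval)


redis = FakeRedis()
agents = {"prod/us-east-1/aws:instances": lambda: None}
a, b = Instance("clouddriver-a", redis), Instance("clouddriver-b", redis)
a.tick(agents, now=0)    # a acquires the lock and runs the agent
b.tick(agents, now=1)    # lock now expires at t=30, so b skips
b.tick(agents, now=31)   # lock has lapsed, so b runs the next cycle
print(a.ran, b.ran)
```

Instance `b` skips the agent at `t=1` because `a` set the lock to expire at its next execution time (`t=30`); once that TTL lapses, whichever instance asks first wins the next cycle. Using a longer initial TTL (`max_run`) so that a lock held by a crashed instance eventually expires is an assumption of this sketch.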

On-demand caching agents

Most cloud-mutating operations are not synchronous. For example, when Clouddriver sends a request to AWS to launch a new EC2 instance, the API call returns successfully, but the EC2 instance takes a while to become ready, and the cache does not reflect the change until an agent caches it again. This is where Spinnaker uses on-demand caching agents.

As their name implies, on-demand caching agents are created on demand by the client (Orca) in tasks such as Force Cache Refresh or Wait for Up Instances. They keep the cache fresh by updating it as soon as a resource is created or deleted.

When the cache store, such as Redis, is shared across multiple Clouddriver instances, Clouddriver waits for the next regular run of a caching agent of the same type before declaring the cache consistent. In the case of Redis, this gives the cache store one more chance to replicate its state to other replicas.
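The client-side half of this flow can be sketched in Python: trigger the on-demand agent, then report consistency only after a regular cycle of the same agent type has completed. The `refresh_and_wait` function, the scripted cycle counter, and the cache key are hypothetical; the sketch does not reproduce Orca's actual Force Cache Refresh task, and it omits the sleep a real poller would use between checks.

```python
from typing import Callable, Dict


def refresh_and_wait(on_demand: Callable[[], None],
                     completed_cycles: Callable[[], int],
                     max_polls: int = 10) -> bool:
    """Hypothetical client-side flow: trigger an on-demand caching agent, then
    report consistency only after the next regular cycle of the same agent type."""
    baseline = completed_cycles()  # regular cycles finished before the request
    on_demand()                    # cache the changed resource right away
    for _ in range(max_polls):
        # A real poller would sleep here; omitted to keep the sketch deterministic.
        if completed_cycles() > baseline:
            return True            # a full regular cycle has run since the request
    return False                   # gave up; the caller may retry or fail the task


cache: Dict[str, str] = {}
observed_cycles = iter([4, 4, 4, 5])  # scripted counter: the regular agent finishes once more

ok = refresh_and_wait(
    on_demand=lambda: cache.update({"instances:prod:us-east-1:i-0c3": "pending"}),
    completed_cycles=lambda: next(observed_cycles),
)
print(ok, cache)  # True {'instances:prod:us-east-1:i-0c3': 'pending'}
```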
