Cost Categories, Infrastructure and Tags

Looking at a minimum set of tags for the categorization of costs within infrastructure

Today I Explained

Tags are an extremely useful mechanism within AWS for managing permissions, categorization, and the associations and relationships between pieces of infrastructure. Because IAM supports policies that permit or restrict actions based on the presence of a tag, it can be tempting to use tags for access control. However, this approach is risky: it can produce IAM policies whose effective permissions are difficult to make attestations about.

When thinking about tags, a good approach is to consider which decisions justify applying a tag to a resource. The most common of these is traceability: the ability to assert that a resource belongs to a certain deployment, is provisioned by a specific module, or has runbooks associated with it for managing secrets or rotating credentials. Exposing these properties as tags makes connections easier to discover, both programmatically and during operations.

As a starting point, the first step towards traceability is simply some form of reference between the application and its infrastructure:

[ Resource ]
     |
     |
     V
  [ App ]
 - Runbooks
 - Documentation
 - Diagrams

The introduction of a single tag allows us to trace back properties about the infrastructure such as:

  • What is the purpose of this resource in the architecture?
  • How does data flow to & from this resource?
  • What operational actions are performed on this resource?
  • Where does this resource fit within the broader picture of the system?
  • Who is responsible for the resource?
  • Where do I allocate costs for this resource?
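To make this concrete, here is a minimal Python sketch, assuming a tag key named `application` and an in-memory dictionary standing in for the wiki or service catalog. The key name, the `AppRecord` fields and the `billing-api` entry are all hypothetical; the point is only that a single tag value is enough to reach the answers listed above.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical tag key; any agreed-upon "application" key would do.
APPLICATION_TAG = "application"


@dataclass
class AppRecord:
    """What a wiki or service catalog might hold for one application."""
    owner: str
    cost_centre: str
    runbooks: list[str] = field(default_factory=list)
    docs: list[str] = field(default_factory=list)


# Hypothetical catalog, standing in for an internal wiki or service registry.
CATALOG: dict[str, AppRecord] = {
    "billing-api": AppRecord(
        owner="payments-team",
        cost_centre="CC-1001",
        runbooks=["Rotate database credentials"],
        docs=["Architecture overview", "Data flow diagram"],
    ),
}


def lookup_application(tags: dict[str, str]) -> AppRecord | None:
    """Follow a resource's application tag back to its catalog entry."""
    app = tags.get(APPLICATION_TAG)
    if app is None:
        return None  # untagged resource: nothing to trace back to
    return CATALOG.get(app)  # None if the application isn't catalogued
```

Keeping the detail in the catalog rather than on the resource means ownership, runbooks and cost information can change without retagging anything.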

Although these properties aren’t directly available on the resource, a tag connecting the resource to an application makes them discoverable. With this “application” tag, the resource can be linked to an internal content management tool (such as a wiki). The first challenge this design encounters is handling multiple deployments of the same application: several copies of a similar resource can exist for one application, yet each is considered distinct.

[ Resource ]    [ Resources ]
     |               |
     |               |
     -----------------
              |
           [ App ]

This means the resource can no longer be uniquely identified from the application tag alone, as there is no longer a single resource. These cases call for a means of uniquely identifying each resource within its “scope”. Here the scope is an AWS account and region, but it could vary depending on the smallest granularity of deployment. To provide this uniqueness, one could add a direct link to the infrastructure’s configuration, or a globally unique identifier (GUID) that acts as a lookup key for the resources.
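A minimal sketch of the GUID approach, assuming two hypothetical tag keys, `application` and `deployment-id`, and representing resources as plain (identifier, tags) pairs rather than real AWS objects. Each deployment generates one UUID and stamps it onto every resource it creates, so resources can be grouped per deployment even when they share an application.

```python
from __future__ import annotations

import uuid
from collections import defaultdict

# Hypothetical tag keys.
APPLICATION_TAG = "application"
DEPLOYMENT_TAG = "deployment-id"


def deployment_tags(application: str) -> dict[str, str]:
    """Tags shared by every resource in one deployment of an application."""
    return {APPLICATION_TAG: application, DEPLOYMENT_TAG: str(uuid.uuid4())}


def group_by_deployment(
    resources: list[tuple[str, dict[str, str]]],
) -> dict[tuple[str | None, str | None], list[str]]:
    """Group resource identifiers by (application, deployment-id)."""
    groups: dict[tuple[str | None, str | None], list[str]] = defaultdict(list)
    for resource_id, tags in resources:
        key = (tags.get(APPLICATION_TAG), tags.get(DEPLOYMENT_TAG))
        groups[key].append(resource_id)
    return dict(groups)


# Two deployments (tenants) of the same application.
tenant_a = deployment_tags("billing-api")
tenant_b = deployment_tags("billing-api")
resources = [
    ("queue-1", tenant_a),
    ("queue-2", tenant_b),
    ("table-1", tenant_a),
]
groups = group_by_deployment(resources)
```

Both tenants share the `application` value, so the catalog lookup still works, while the `deployment-id` keeps their resources apart.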

  ( App )          ( App )
[ Resource ]    [ Resources ]
     |               |
     |               |
     |               |
     V               V
[ Tenant A ]    [ Tenant B ]

Although we’ve regained the ability to distinguish between the resources, this isn’t a viable way to allocate costs. Previously, as only one resource existed, the entire application could be bundled into a single cost centre. The unique property makes it possible to identify each resource, but it doesn’t support policy decisions about how to allocate costs, short of explicitly assigning each deployment to a given cost centre.

In some cases, it may be possible to generalize about costs from the environment where a resource lives, but this doesn’t always hold: experiments are conducted in production, and infrastructure may be accounted for differently from application code. For these cases, one can fall back to a cost centre tag that explicitly states where a resource contributes costs in the organization. The cost centre can then be resolved in priority order: the explicitly set value first, then the application, then the environment the resource operates within.
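The priority order can be sketched as a small resolver. This assumes hypothetical tag keys (`cost-centre`, `application`, `environment`) and hypothetical lookup tables mapping applications and environments to default cost centres; in practice these defaults would live wherever the organization records them.

```python
from __future__ import annotations

# Hypothetical tag keys.
COST_CENTRE_TAG = "cost-centre"
APPLICATION_TAG = "application"
ENVIRONMENT_TAG = "environment"

# Hypothetical defaults, e.g. sourced from a service catalog.
APPLICATION_COST_CENTRES = {"billing-api": "CC-1001"}
ENVIRONMENT_COST_CENTRES = {"production": "CC-PROD", "sandbox": "CC-RND"}
UNALLOCATED = "unallocated"


def resolve_cost_centre(tags: dict[str, str]) -> str:
    """Pick a cost centre in priority order: an explicit tag, then the
    application's default, then the environment's default."""
    explicit = tags.get(COST_CENTRE_TAG)
    if explicit:
        return explicit
    app = tags.get(APPLICATION_TAG)
    if app in APPLICATION_COST_CENTRES:
        return APPLICATION_COST_CENTRES[app]
    env = tags.get(ENVIRONMENT_TAG)
    if env in ENVIRONMENT_COST_CENTRES:
        return ENVIRONMENT_COST_CENTRES[env]
    return UNALLOCATED  # surfaces gaps in tagging instead of hiding them
```

Putting the explicit tag first lets an experiment running in production be charged to a different cost centre without changing the defaults for everything else in that environment.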