Restricted IAM Roles within continuous integration

Limiting the permissions of IAM Users and Roles used for continuous integration

Today I Explained

When setting up continuous integration, one of the first challenges faced by newcomers is providing credentials to the service to authenticate with AWS. Typically this will be done with IAM Access Keys, potentially something with IAM IoT Devices, or the more recent direction of OpenID Connect if your continuous integration service supports it ¹².

Sometimes while assisting with setting these up, the topic of securing the pipeline will come up. Specifically the case of:

What happens if these keys are leaked or compromised?

This is a surprisingly involved question, as it serves as an entry point into the topics of rotation, revocation, detection & response, and threat modelling in general. Here we’ll focus on one scenario: a set of IAM Access Keys that has been compromised.

The first challenge is even knowing that the keys have been compromised. Services exist for identifying & responding to leaked secrets, but they assume that the compromised secrets are visible to someone other than the attackers. More likely, you’ll find out because someone using the IAM User’s keys has scanned the infrastructure for opportunities and, where they found any, engaged in nefarious deeds.

Knowing that automated tooling will likely scan your infrastructure for opportunities, you may consider restricting the permissions of the IAM User to the bare minimum, and requiring an AssumeRole action to elevate to an IAM Role with practical permissions. This gives you an early detection mechanism: if the IAM User ever encounters Deny events when trying to List/Describe infrastructure, it may be compromised.

                     ┌─────────
   x x  x  x         │
  ┌──────────┐  +    │
x │ IAM User ├─────► IAM Role
  └──────────┘       │
  x  x  x  x         │
                     └─────────

A diagram showing an IAM User with deny events, and a single allowed permission of AssumeRole to an IAM Role with the baseline permissions
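
As a rough illustration of that detection idea, here is a minimal sketch in Python that counts denied calls made by the IAM User in a batch of CloudTrail-style records. It assumes the records have already been fetched and parsed into dictionaries. The user name `ci-user`, the sample records, the threshold, and the set of error codes are all assumptions for illustration and not an exhaustive list.

```python
from collections import Counter

# Error codes commonly recorded for authorization failures (assumed, not exhaustive).
DENIED_ERROR_CODES = {"AccessDenied", "Client.UnauthorizedOperation"}


def count_denied_calls(records, user_name):
    """Count denied API calls per event name for a single IAM User."""
    denied = Counter()
    for record in records:
        identity = record.get("userIdentity", {})
        if identity.get("type") != "IAMUser":
            continue
        if identity.get("userName") != user_name:
            continue
        if record.get("errorCode") in DENIED_ERROR_CODES:
            denied[record.get("eventName", "unknown")] += 1
    return denied


def looks_compromised(records, user_name, threshold=1):
    """Flag the user once the total number of denied calls reaches the threshold."""
    return sum(count_denied_calls(records, user_name).values()) >= threshold


# Hypothetical sample records, trimmed to the fields used above.
sample_records = [
    {"eventName": "AssumeRole",
     "userIdentity": {"type": "IAMUser", "userName": "ci-user"}},
    {"eventName": "ListBuckets", "errorCode": "AccessDenied",
     "userIdentity": {"type": "IAMUser", "userName": "ci-user"}},
    {"eventName": "DescribeInstances", "errorCode": "Client.UnauthorizedOperation",
     "userIdentity": {"type": "IAMUser", "userName": "ci-user"}},
]

print(looks_compromised(sample_records, "ci-user"))
```

Since a well-behaved pipeline should only ever call AssumeRole (and the identity check), a threshold of one denied call is a plausible starting point, though a real setup would need tuning against its own noise.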

An IAM Policy for such an IAM User would likely grant only ec2:DescribeRegions, as an identity check to assert that the credentials work, and sts:AssumeRole, for elevating to the IAM Role.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowIdentity",
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeRegions"
      ],
      "Resource": "*"
    },
    {
      "Sid": "AllowElevationToCIRole",
      "Effect": "Allow",
      "Action": [
        "sts:AssumeRole"
      ],
      "Resource": "arn:aws:iam::account-id:role/..."
    }
  ]
}
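
To show what the elevation step might look like inside the pipeline, here is a minimal Python sketch. It takes the STS client as a parameter so that it runs offline. In a real pipeline the client would be something like `boto3.client("sts")`. The `StubStsClient` class, the session name, and the returned values are hypothetical, and the role ARN is left elided exactly as in the policy above.

```python
def assume_ci_role(sts_client, role_arn, session_name="ci-pipeline"):
    """Exchange the IAM User's credentials for short-lived role credentials."""
    response = sts_client.assume_role(
        RoleArn=role_arn,
        RoleSessionName=session_name,
    )
    creds = response["Credentials"]
    # Map the temporary credentials to the environment variable names the AWS tooling reads.
    return {
        "AWS_ACCESS_KEY_ID": creds["AccessKeyId"],
        "AWS_SECRET_ACCESS_KEY": creds["SecretAccessKey"],
        "AWS_SESSION_TOKEN": creds["SessionToken"],
    }


class StubStsClient:
    """Hypothetical stand-in for an STS client, returning placeholder values."""

    def assume_role(self, RoleArn, RoleSessionName):
        return {
            "Credentials": {
                "AccessKeyId": "ASIAEXAMPLE",
                "SecretAccessKey": "example-secret",
                "SessionToken": "example-token",
            }
        }


env = assume_ci_role(StubStsClient(), "arn:aws:iam::account-id:role/...")
print(sorted(env))
```

Everything after this step runs with the role's permissions, so any List/Describe call made directly with the IAM User's keys stands out as unusual.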

With sufficient monitoring for scanning or probing, it would be possible to automate revocation of these credentials in the event of a potential compromise.
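
One possible shape for that automation is sketched below in Python. Given a single CloudTrail-style record, it marks the access key as Inactive when the record is a denied call from the watched IAM User. The IAM client is passed in so the sketch runs offline. With boto3 it would be `boto3.client("iam")`, whose `update_access_key` call takes these same arguments. The `RecordingIamClient` stub, the user name, the key ID, and the set of error codes are assumptions for illustration.

```python
# Error codes treated as a denial (assumed, not exhaustive).
DENIED_ERROR_CODES = {"AccessDenied", "Client.UnauthorizedOperation"}


def revoke_on_denial(iam_client, record, watched_user):
    """Deactivate the access key behind a denied call; return its ID, or None if nothing was done."""
    identity = record.get("userIdentity", {})
    if identity.get("userName") != watched_user:
        return None
    if record.get("errorCode") not in DENIED_ERROR_CODES:
        return None
    key_id = identity["accessKeyId"]
    iam_client.update_access_key(
        UserName=watched_user,
        AccessKeyId=key_id,
        Status="Inactive",
    )
    return key_id


class RecordingIamClient:
    """Hypothetical stand-in for an IAM client that only records the calls it receives."""

    def __init__(self):
        self.calls = []

    def update_access_key(self, **kwargs):
        self.calls.append(kwargs)


# Hypothetical denied record, trimmed to the fields used above.
denied_record = {
    "eventName": "ListBuckets",
    "errorCode": "AccessDenied",
    "userIdentity": {
        "type": "IAMUser",
        "userName": "ci-user",
        "accessKeyId": "AKIAEXAMPLE",
    },
}

client = RecordingIamClient()
print(revoke_on_denial(client, denied_record, "ci-user"))
```

Deactivating rather than deleting the key keeps it available for later investigation, at the cost of breaking the pipeline until a new key is issued.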

A note on Threat Models

In practice, this kind of approach is missing anchoring. The concept makes sense, in that it provides an early warning system for a potential compromise of a continuous integration system. What is missing is a threat model of the risks faced by the organization, against which to judge whether this approach actually reduces those risks.

For the credentials to be leaked, it requires one of:

  • An internal threat actor at the continuous integration service (SaaS) has breached the storage mechanism for these credentials and exfiltrated them to somewhere the attack can be carried out
  • The continuous integration service (SaaS) has had a misconfiguration or vulnerability that has resulted in exfiltration of the credentials
  • A threat actor internal to the organization has leaked the continuous integration credentials
  • The organization’s internal operating procedures have resulted in exposure of the credentials

The above isn’t a complete list, but it covers some of the possible situations.

Without identifying the risks, it is difficult to argue that this approach helps to mitigate risks. To illustrate it with an example, consider an organization that publishes container images from GitHub Actions. The risk of a compromised secret is that the threat actor would be able to publish malicious or compromised images into Elastic Container Registry (ECR).

If the infrastructure pulls from these repositories using the latest tag, then malicious software could be running the next time an image is pulled. With this context, it seems more prudent to recommend:

  • Using a fixed tag instead of the latest for pulling the container images
  • Using immutable tags within the repository, preventing overwriting existing in-use tags
  • Requiring that images are signed in GitHub Actions, and vetted at time of deployment
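
The first recommendation can be checked mechanically. Below is a minimal Python sketch that flags image references which resolve to latest, either explicitly or because no tag is given. The image names are hypothetical, the parsing is deliberately simple, and the sketch does not cover tag immutability or signature verification.

```python
def split_image_reference(reference):
    """Split an image reference into (name, tag, digest); tag and digest may be None."""
    name, _, digest = reference.partition("@")
    # Only a colon after the last slash marks a tag; earlier colons belong to a registry port.
    last_segment = name.rsplit("/", 1)[-1]
    tag = None
    if ":" in last_segment:
        name, _, tag = name.rpartition(":")
    return name, tag, digest or None


def is_pinned(reference):
    """True when the reference uses a digest, or an explicit tag other than latest."""
    _, tag, digest = split_image_reference(reference)
    if digest:
        return True
    return tag is not None and tag != "latest"


# Hypothetical image references for illustration.
references = [
    "registry.example.com:5000/app",
    "registry.example.com:5000/app:latest",
    "registry.example.com:5000/app:1.4.2",
    "registry.example.com:5000/app@sha256:0123abcd",
]

for reference in references:
    print(reference, is_pinned(reference))
```

A fixed tag only helps if the tag cannot be overwritten, which is why it pairs with immutable tags in the repository. Pinning by digest avoids that dependency.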

This is not a complete list, nor the only approach, but understanding the structure of the problem yields better reasoning for adopting a given pattern.