Mirroring external container images to internal registries

Using Dependabot & Crane for on-update mirroring of container images to internal registries

Today I Explained

Pulling container images from third-party registries (DockerHub, Quay, GitHub Container Registry) can sometimes fail. One example of such a failure is:

CannotPullContainerError: inspect image has been retried 5 time(s): httpReaderSeeker: failed open: unexpected status code https://registry-1.docker.io/v2/manifests/sha256:<....> Too Many Requests - Server message: toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit

This error is caused by the rate limits that DockerHub places on anonymous and free-tier users. It tends to surface when working with Terraform modules, Helm Charts or CloudFormation Templates that default to (or hardcode) DockerHub in the fully qualified domain name (FQDN) of the container image. Unless an alternative registry is configured, the image is pulled from DockerHub, and anonymous pulls often hit the rate limit.

In practice, addressing this often means a one-off copy of the external images into an internal registry using docker pull & docker push or crane, with a promise to automate it at a later date. The copy can be necessary because not all public images have set up alternative mirrors on other public registries such as GitHub Container Registry or Quay.
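
The one-off copy described above might look like the following sketch. The registry host registry.internal.example is a placeholder, not a real endpoint, and the function names are only illustrative. The first function goes through the local Docker daemon, the second copies registry to registry with crane.

```shell
#!/usr/bin/env bash
set -eu

# Placeholder for the internal registry host (assumption, replace with your own)
mirror_registry="${mirror_registry:-registry.internal.example}"

# Option 1: pull, retag, and push through the local Docker daemon
mirror_with_docker() {
  local src="$1" dst="$2"
  docker pull "$src"
  docker tag "$src" "$dst"
  docker push "$dst"
}

# Option 2: copy directly between registries with crane (no local daemon needed)
mirror_with_crane() {
  local src="$1" dst="$2"
  crane cp "$src" "$dst"
}

# Example (not run here):
#   mirror_with_crane docker.io/library/postgres:14 "$mirror_registry/library/postgres:14"
```

Both variants assume the shell is already authenticated against the internal registry, for example through docker login or crane auth login.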

If a build workflow already exists, the copying can be automated by extending that workflow to publish the container image using docker build and docker push. This could be accomplished by adding a Dockerfile containing only a FROM directive, with a build step like so:

FROM postgres:14

# > Mechanism for _mirroring_ docker hub image into internal registry
# docker build -t internal-registry/dbbackend:v1 . && docker push internal-registry/dbbackend:v1

An alternative approach is to leverage dependency updater bots such as Dependabot or Renovate. These regularly scan your Git repositories for outdated dependencies and, if any are found, automatically raise a pull request updating the dependency. In a repository that contains many single-line Dockerfiles, each holding only a FROM directive, these bots will continuously raise pull requests as new versions become available. A continuous integration pipeline, such as GitHub Actions, can then be responsible for mirroring these images into an internal registry.
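
For Dependabot, each directory holding a Dockerfile needs an entry in .github/dependabot.yml. A rough sketch of generating that file is below. It assumes the Dockerfiles live under a single directory (called images in the usage comment, which is an assumption), that the script is run from the repository root, and that a daily schedule is wanted. The function name is hypothetical.

```shell
#!/usr/bin/env bash
set -eu

# Print a Dependabot configuration with one "docker" entry per directory
# under the given root that contains a Dockerfile.
generate_dependabot_config() {
  local root="$1" dockerfile
  printf 'version: 2\nupdates:\n'
  find "$root" -name 'Dockerfile' | sort |
    while IFS= read -r dockerfile; do
      printf '  - package-ecosystem: "docker"\n'
      # Dependabot expects the directory relative to the repository root
      printf '    directory: "/%s"\n' "$(dirname "$dockerfile")"
      printf '    schedule:\n'
      printf '      interval: "daily"\n'
    done
}

# Example (not run here):
#   generate_dependabot_config images > .github/dependabot.yml
```

Regenerating the file whenever a Dockerfile is added keeps the bot's view in step with the repository. Renovate would need its own configuration, which is not shown here.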

A repository containing the Dockerfiles could be organized by registry & prefix for the images:

Repository
├─►docs
├─►...
└─►images
   ├─►docker.io
   │  │
   │  ├─►library/postgres/Dockerfile
   │  │
   │  └─►amazon/aws-lambda-python/Dockerfile
   ├─►quay.io
   │  │
   │  └─►prometheus/node-exporter/Dockerfile
   └─►ghcr.io
      └─►hadolint/hadolint/Dockerfile
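
Adding a new image to this layout could be scripted along these lines. This is a sketch only: the function name is hypothetical, and it assumes the image reference is fully qualified, ends in a tag, and uses a registry host without a port (a reference pinned by digest or to a host like localhost:5000 would need different handling).

```shell
#!/usr/bin/env bash
set -eu

# Write a single-line Dockerfile for a fully qualified image reference, e.g.
# docker.io/library/postgres:14 -> images/docker.io/library/postgres/Dockerfile
add_mirrored_image() {
  local ref="$1" root="${2:-images}"
  # Drop the tag (everything after the last ':') to get the directory path
  local path="${ref%:*}"
  mkdir -p "$root/$path"
  printf 'FROM %s\n' "$ref" > "$root/$path/Dockerfile"
  # Report where the Dockerfile was written
  echo "$root/$path/Dockerfile"
}

# Example (not run here):
#   add_mirrored_image docker.io/library/postgres:14
```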

As the dependency updater bot identifies new versions of the container images within the public registry, a continuous integration workflow can be responsible for mirroring the published images into the internal registries, using a simple loop over each of the Dockerfiles.

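# mirror_registry is expected to be set to the internal registry host
find images -name 'Dockerfile' |
  while IFS= read -r dockerfile; do
    # Read the image reference (second field) of the first 'FROM' line in the Dockerfile
    fqdn="$(grep -m1 '^FROM' "$dockerfile" | cut -d' ' -f2)"
    # Strip the registry host, keeping the repository path and tag
    image="$(echo "$fqdn" | cut -d'/' -f2-)"

    crane cp "$fqdn" "$mirror_registry/$image"
  done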

A note on first-publish

Not all container registries auto-create a new package/repository when a container image is first published. With Elastic Container Registry (ECR) within AWS, for example, the repository that will contain the image must be created before the image can be mirrored.

This would need to be factored into the workflow that is responsible for setting up the container image for mirroring, which could be contained within the repository, or included as a pre-step in the workflow before mirroring occurs.
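
As a pre-step for ECR, a check-then-create helper along these lines could run before mirroring. This is a sketch that assumes the AWS CLI is installed and already authenticated. The function name is hypothetical, and the repository name is taken to be the image path without registry host or tag (for example library/postgres).

```shell
#!/usr/bin/env bash
set -eu

# Create the ECR repository for an image if it does not exist yet.
ensure_ecr_repository() {
  local repository="$1"
  if aws ecr describe-repositories --repository-names "$repository" >/dev/null 2>&1; then
    echo "exists: $repository"
  else
    # Also reached when the lookup fails for another reason (e.g. credentials);
    # in that case the create call is expected to fail and stop the script.
    aws ecr create-repository --repository-name "$repository" >/dev/null
    echo "created: $repository"
  fi
}

# Example (not run here):
#   ensure_ecr_repository library/postgres
```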

A note on CVEs within the external images

The latest release of a third-party image is not guaranteed to be free of security alerts. Although manual review steps could be run through whenever a new dependency pull request is raised, another option is to take advantage of the continuous integration workflow.

In this workflow, the image that will be mirrored can be scanned and the results reported within the pull request. If the image fails scanning, someone would need to “sign off” on the alerts raised.
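
A scanning gate of this kind could be sketched as below. The choice of Trivy as the scanner is an assumption (the post does not name one), as are the severity threshold and the function name. Reporting the results back to the pull request is left out.

```shell
#!/usr/bin/env bash
set -eu

# Scan an image and fail when HIGH or CRITICAL findings are reported.
# Trivy is used purely as an example scanner (assumption).
scan_before_mirror() {
  local image="$1"
  if trivy image --severity HIGH,CRITICAL --exit-code 1 "$image"; then
    echo "scan passed: $image"
  else
    echo "scan failed: $image (needs sign-off before mirroring)"
    return 1
  fi
}

# Example (not run here):
#   scan_before_mirror docker.io/library/postgres:14
```

Running the scan against the public image before crane cp means an image that fails is never copied unless someone explicitly signs off on it.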