Precomputed APIs using AWS S3 buckets

Using AWS S3 buckets to provide precomputed APIs through the 's3' protocol

Today I Explained

Internal datasets in organizations can be tricky, because the internal nature of the data means that approaches for making it accessible tend to emerge ad hoc, as needs arise. This can mean that internal datasets are distributed in ways such as:

  • Baking the dataset into existing packaging mechanisms, such as container images, libraries or binaries
  • Publishing documents containing the data, resulting in copy & pasting throughout the organization
  • Embedding the data into existing API layers, fitting it within unstructured domain fields
  • Allowing open access to databases, granting network access as needed
  • Using git to store the dataset as files, performing data entry to update the fields (or treating the repository as the source of truth)
  • Building a one-off solution that provides API access using varied technologies, but is left unmaintained

Each of these approaches has its own benefits and trade-offs around data access, lifecycle and maintenance. For some datasets one of these approaches may make sense, while for others none will be suitable.

One case where this becomes frustrating is organization metadata that is high-read, low-write. These are datasets in which changes occur on the timescale of weeks or months, while reads of the dataset happen significantly more frequently. Propagating a change to all of the locations above can require a combination of software releases, data entry, or manually crafted update statements, none of which is a desirable approach.

An alternative approach that can be considered is the adoption of the filesystem as an API. Similar to treating the filesystem as a database, the folder layout can represent the route of a web request, with each route pointing to a file that holds the precomputed result of that request. In practice, this might look like:

.
└── api
    ├── v1
    |   ├── application
    |   |   ├─> webserver
    |   |   └─> infra
    |   └── team
    |       ├─> administrators
    |       └─> maintainers
    └── v2
        ├── application
        |   ├─> webserver
        |   └─> infra
        └── team
            ├─> administrators
            └─> maintainers

A folder structure with a top-level folder called ‘api’, and two child folders called ‘v1’ and ‘v2’. Within each of these is the same layout: two folders called ‘application’ and ‘team’. The ‘application’ and ‘team’ folders contain files named after applications and teams respectively.

In this approach, the files represent the results that would be returned when making requests to the API. They may contain duplicated data, as each file is generated independently from the domain model itself. Because the dataset is known at publish time, the response to every route of the API can be precomputed.
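As a sketch of what the precompute step might look like, the following Python script walks a small in-memory domain model and writes one file per API route. The model contents, folder names and output directory are illustrative assumptions, not taken from a real system:

```python
import json
from pathlib import Path

# Hypothetical domain model; in practice this would be loaded
# from the real source of truth.
APPLICATIONS = {
    "webserver": {"name": "webserver", "maintainers": [], "git": {"url": "https://..."}},
    "infra": {"name": "infra", "maintainers": [], "git": {"url": "https://..."}},
}

def precompute(root: Path) -> None:
    """Write the precomputed response body for every route of the API."""
    for version in ("v1", "v2"):
        out_dir = root / "api" / version / "application"
        out_dir.mkdir(parents=True, exist_ok=True)
        for name, record in APPLICATIONS.items():
            # Each file holds the exact body a client would receive
            # for the route /api/<version>/application/<name>.
            (out_dir / name).write_text(json.dumps(record, indent=2))

precompute(Path("build"))
```

The resulting `build/` directory can then be uploaded as-is, for example with `aws s3 sync build s3://<bucket>/`.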

Given this folder & file structure, these files can then be published into an AWS S3 bucket. With the API published into the storage bucket, requests can be made using the S3 API, or with the AWS CLI using commands like:

# The usage of '-' will print the file to stdout
aws s3 cp s3://metadata.aeydr.dev/api/v1/application/webserver -
{
  "name": "webserver",
  "maintainers": [],
  "git": {
    "url": "https://..."
  }
}

Although similar to running a lightweight HTTP server with the dataset bundled alongside it, using an S3 bucket allows access across the entire AWS Organization through bucket policies, without any cross-account networking requirements. As it is only a storage bucket, no running compute is required; the only costs are those of storing & transferring the data.
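As a sketch, a bucket policy granting read access across the organization might use the `aws:PrincipalOrgID` condition key. The bucket name matches the earlier example; the organization ID is a placeholder, and a real policy should be reviewed against the AWS documentation:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowOrganizationRead",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::metadata.aeydr.dev/*",
      "Condition": {
        "StringEquals": {
          "aws:PrincipalOrgID": "o-exampleorgid"
        }
      }
    }
  ]
}
```

With this in place, any principal in any account of the organization can read the precomputed responses, without per-account grants.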