
Validate Self-Managed Imports

Context

During self-managed imports, we should ensure that images moved into the database remain whole and valid. This will help us detect unforeseen issues and instill greater confidence in the import process for our users.

Possible Approaches

Crane

crane is a popular utility that can validate images. We could run it alongside the import process as an off-the-shelf tool.

Internal Tooling

crane performs its validation using Google's go-containerregistry library. We could use this library directly inside the import tool to perform validation.
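As a rough illustration of the kind of check involved, the sketch below verifies that a blob's bytes match the size and SHA-256 digest recorded for it, using only the Go standard library. This is not the library's API: `verifyBlob` is a hypothetical helper, and a real image validation would also cover the manifest, config, and every layer.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// verifyBlob returns an error if content does not match the expected size
// and "sha256:<hex>" digest. Hypothetical helper for illustration only.
func verifyBlob(content []byte, wantDigest string, wantSize int64) error {
	if int64(len(content)) != wantSize {
		return fmt.Errorf("size mismatch: got %d, want %d", len(content), wantSize)
	}
	sum := sha256.Sum256(content)
	got := "sha256:" + hex.EncodeToString(sum[:])
	if got != wantDigest {
		return fmt.Errorf("digest mismatch: got %s, want %s", got, wantDigest)
	}
	return nil
}

func main() {
	// Stand-in bytes; a real layer would be read from object storage.
	blob := []byte("example layer bytes")
	sum := sha256.Sum256(blob)
	digest := "sha256:" + hex.EncodeToString(sum[:])

	fmt.Println("intact blob:", verifyBlob(blob, digest, int64(len(blob))))
	fmt.Println("corrupt blob:", verifyBlob([]byte("corrupted layer byt"), digest, int64(len(blob))))
}
```

Using the library directly would replace this hand-rolled check with its own image-level validation, so the import tool would not need to reimplement digest handling.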

Scaling

It's possible that a single import will include too many container registry images to reasonably validate all of them. Therefore, we need to:

  • Perform the import through to the end of pre-import
  • Ensure the registry is in read-only mode
  • Randomly select a subset of tagged images and retrieve them from object storage for validation
  • Validate those images, saving only the ones that pass validation (data repair is outside of scope)
  • Persist the results
  • Continue the import to the end of step 2
  • Revalidate the same subset, retrieving them from the database
  • Continue the import on success
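
The steps above can be sketched as two passes over the same sample: a baseline pass against object storage that keeps only the images that validate, and a second pass against the database that reports any image that no longer validates. Everything here is hypothetical (`validateFunc`, `baseline`, `recheck`, and the `failOn` test double); persistence and the import steps themselves are left out.

```go
package main

import (
	"errors"
	"fmt"
)

// errInvalid stands in for any validation failure.
var errInvalid = errors.New("image failed validation")

// validateFunc checks one image reference against a single backend
// (object storage before the import, the database after it).
type validateFunc func(ref string) error

// baseline validates the sample before the import and returns only the
// references that pass. Images that are already broken are dropped, since
// data repair is out of scope.
func baseline(sample []string, fromStorage validateFunc) []string {
	var passed []string
	for _, ref := range sample {
		if err := fromStorage(ref); err == nil {
			passed = append(passed, ref)
		}
	}
	return passed
}

// recheck revalidates the baseline after the import and returns the
// references that passed before but fail now.
func recheck(passed []string, fromDatabase validateFunc) []string {
	var regressed []string
	for _, ref := range passed {
		if err := fromDatabase(ref); err != nil {
			regressed = append(regressed, ref)
		}
	}
	return regressed
}

// failOn is a test double that rejects exactly one reference.
func failOn(bad string) validateFunc {
	return func(ref string) error {
		if ref == bad {
			return errInvalid
		}
		return nil
	}
}

func main() {
	sample := []string{"group/app:1.0", "group/app:2.0", "group/lib:latest"}

	passed := baseline(sample, failOn("group/app:2.0"))
	regressed := recheck(passed, failOn("group/lib:latest"))

	fmt.Println("baseline:", passed)
	fmt.Println("regressed:", regressed)
	fmt.Println("continue import:", len(regressed) == 0)
}
```

Keeping the baseline separate from the recheck means an image that was already invalid in object storage does not count against the import.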

Open Questions

How to sample images?

Since we're waiting until after pre-import is completed, we'll likely be able to perform a random sample from the database directly to identify a list of manifests to validate from object storage. Assuming this works, we'll be able to generate a good sample as well as scale the sample up as the total number of manifests increases.
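
A minimal sketch of one way to size and draw such a sample, assuming the manifest IDs have already been read from the database. The per-thousand rate, the bounds, and the function names are illustrative assumptions; in practice the random selection might be done in the database query itself.

```go
package main

import (
	"fmt"
	"math/rand"
)

// sampleSize scales the sample with the manifest count: perThousand of the
// total (rounded up), clamped to [minN, maxN] and never more than total.
func sampleSize(total, perThousand, minN, maxN int) int {
	n := (total*perThousand + 999) / 1000
	if n < minN {
		n = minN
	}
	if n > maxN {
		n = maxN
	}
	if n > total {
		n = total
	}
	return n
}

// pickSample returns n distinct IDs chosen uniformly at random. The seed
// makes the selection reproducible for a given list of IDs.
func pickSample(ids []int64, n int, seed int64) []int64 {
	if n > len(ids) {
		n = len(ids)
	}
	if n < 0 {
		n = 0
	}
	r := rand.New(rand.NewSource(seed))
	picked := make([]int64, 0, n)
	for _, i := range r.Perm(len(ids))[:n] {
		picked = append(picked, ids[i])
	}
	return picked
}

func main() {
	// Stand-in IDs; real ones would come from the manifests table.
	ids := make([]int64, 10000)
	for i := range ids {
		ids[i] = int64(i + 1)
	}
	n := sampleSize(len(ids), 10, 50, 1000)
	sample := pickSample(ids, n, 42)
	fmt.Println("sample size:", n, "picked:", len(sample))
}
```

The lower bound keeps small registries from being under-sampled, while the upper bound caps the time spent validating very large ones.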

How long does it take to validate images?

Since this step needs to be done during a read-only period, we need to balance our confidence that the import worked as expected against the availability of the registry. Therefore, the speed at which we can perform imports is an important limiting factor, and it likely differs from environment to environment.

Edited by Hayley Swimelar