The Invisible Rewrite: Modernizing the Kubernetes Image Promoter

Every container image you pull from registry.k8s.io got there through kpromo, the Kubernetes image promoter. It copies images from staging registries to production, signs them with cosign, replicates signatures across more than 20 regional mirrors, and generates SLSA provenance attestations. If this tool breaks, no Kubernetes release ships. Over the past few weeks, we rewrote its core from scratch, deleted 20% of the codebase, made it dramatically faster, and nobody noticed. That was the whole point.

A bit of history

The image promoter started in late 2018 as an internal Google project by Linus Arver. The goal was simple: replace the manual, Googler-gated process of copying container images into k8s.gcr.io with a community-owned, GitOps-based workflow. Push to a staging registry, open a PR with a YAML manifest, get it reviewed and merged, and automation handles the rest. KEP-1734 formalized this proposal.

In early 2019, the code moved to kubernetes-sigs/k8s-container-image-promoter and grew quickly. Over the next few years, Stephen Augustus consolidated multiple tools (cip, gh2gcs, krel promote-images, promobot-files) into a single CLI called kpromo. The repository was renamed to promo-tools. Adolfo Garcia Veytia (Puerco) added cosign signing and SBOM support. Tyler Ferrara built vulnerability scanning. Carlos Panato kept the project in a healthy and releasable state. In all, 42 contributors made about 3,500 commits across more than 60 releases.

It worked. But by 2025 the codebase carried the weight of seven years of incremental additions from multiple SIGs and subprojects. The README said it plainly: "you will see duplicated code, multiple techniques for accomplishing the same thing, and several TODOs."

The problems we needed to solve

Production promotion jobs for Kubernetes core images regularly took over 30 minutes and frequently failed with rate limit errors. The core promotion logic had grown into a monolith that was hard to extend and difficult to test, making new features like provenance or vulnerability scanning painful to add.

On the SIG Release roadmap, two work items had been sitting for a while: "Rewrite artifact promoter" and "Make artifact validation more robust". We had discussed these at SIG Release meetings and KubeCons, and the open research spikes on project board #171 captured eight questions that needed answers before we could move forward.

One issue to answer them all

In February 2026, we opened issue #1701 ("Rewrite artifact promoter pipeline") and answered all eight spikes in a single tracking issue. The rewrite was deliberately phased so that each step could be reviewed, merged, and validated independently. Here is what we did:

Phase 1: Rate Limiting (#1702). Rewrote rate limiting to properly throttle all registry operations with adaptive backoff.
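
To give a feel for the adaptive part, here is a minimal sketch in the spirit of the change, built on golang.org/x/time/rate: a token-bucket limiter whose rate is halved when the registry answers 429 and nudged back up on success. The registryLimiter type and the tuning factors are illustrative, not kpromo's actual code.

    // Illustrative adaptive limiter, not kpromo's actual implementation.
    package ratelimit

    import (
        "context"
        "sync"

        "golang.org/x/time/rate"
    )

    // registryLimiter wraps a token bucket whose rate adapts to
    // feedback from the registry.
    type registryLimiter struct {
        mu      sync.Mutex
        limiter *rate.Limiter
    }

    func newRegistryLimiter(rps float64, burst int) *registryLimiter {
        return &registryLimiter{limiter: rate.NewLimiter(rate.Limit(rps), burst)}
    }

    // Wait blocks until the next operation is allowed at the current rate.
    func (l *registryLimiter) Wait(ctx context.Context) error {
        return l.limiter.Wait(ctx)
    }

    // Throttled halves the rate after the registry returns 429.
    func (l *registryLimiter) Throttled() {
        l.mu.Lock()
        defer l.mu.Unlock()
        l.limiter.SetLimit(l.limiter.Limit() / 2)
    }

    // Succeeded nudges the rate back up after a success, capped at a ceiling.
    func (l *registryLimiter) Succeeded(ceiling rate.Limit) {
        l.mu.Lock()
        defer l.mu.Unlock()
        if next := l.limiter.Limit() * 1.1; next < ceiling {
            l.limiter.SetLimit(next)
        }
    }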

Phase 2: Interfaces (#1704). Put registry and auth operations behind clean interfaces so they can be swapped out and tested independently.
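
As a sketch of the seam this creates, consider something like the following; the Client interface and fakeClient names are ours for this post, not kpromo's actual types. Promotion logic depends only on the interface, so tests can swap in the in-memory fake.

    // Illustrative interfaces; kpromo's actual types differ.
    package registry

    import "context"

    // Image identifies an image by repository and digest.
    type Image struct {
        Repository string
        Digest     string
    }

    // Client abstracts the registry operations the promoter needs.
    type Client interface {
        ListTags(ctx context.Context, repository string) ([]string, error)
        Copy(ctx context.Context, src, dst Image) error
    }

    // fakeClient is an in-memory stand-in for tests.
    type fakeClient struct {
        tags map[string][]string
    }

    var _ Client = (*fakeClient)(nil)

    func (f *fakeClient) ListTags(_ context.Context, repo string) ([]string, error) {
        return f.tags[repo], nil
    }

    func (f *fakeClient) Copy(_ context.Context, _, _ Image) error { return nil }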

Phase 3: Pipeline Engine (#1705). Built a pipeline engine that runs promotion as a sequence of distinct phases instead of one large function.
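
A minimal engine in this style might look like the sketch below, assuming each phase is a named function over shared state; the real engine's types and error handling are richer.

    // Illustrative phase runner; the real engine is richer.
    package pipeline

    import (
        "context"
        "fmt"
    )

    // State carries data between phases (parsed manifests, the plan, and so on).
    type State struct {
        DryRun bool
    }

    // Phase is one step of the promotion pipeline.
    type Phase struct {
        Name string
        Run  func(ctx context.Context, s *State) error
    }

    // Execute runs phases strictly in order, stopping at the first error,
    // so each phase has the full rate limit budget to itself.
    func Execute(ctx context.Context, s *State, phases []Phase) error {
        for _, p := range phases {
            if err := p.Run(ctx, s); err != nil {
                return fmt.Errorf("phase %s: %w", p.Name, err)
            }
        }
        return nil
    }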

Phase 4: Provenance (#1706). Added SLSA provenance verification for staging images.

Phase 5: Scanner and SBOMs (#1709). Added vulnerability scanning and SBOM support. Flipped the default to the new pipeline engine. At this point we cut v4.2.0 and let it soak in production before continuing.

Phase 6: Split Signing from Replication (#1713). Separated image signing from signature replication into their own pipeline phases, eliminating the rate limit contention that caused most production failures.

Phase 7: Remove Legacy Pipeline (#1712). Deleted the old code path entirely.

Phase 8: Remove Legacy Dependencies (#1716). Deleted the audit subsystem, deprecated tools, and e2e test infrastructure.

Phase 9: Delete the Monolith (#1718). Removed the old monolithic core and its supporting packages. Thousands of lines were deleted across phases 7 through 9.

Each phase shipped independently. v4.3.0 followed the next day with the legacy code fully removed.

With the new architecture in place, a series of follow-up improvements landed: parallelized registry reads (#1736), retry logic for all network operations (#1742), per-request timeouts to prevent pipeline hangs (#1763), HTTP connection reuse (#1759), local registry integration tests (#1746), the removal of deprecated credential file support (#1758), a rework of attestation handling to use cosign's OCI APIs and the removal of deprecated SBOM support (#1764), and a dedicated promotion record predicate type registered with the in-toto attestation framework (#1767). These would have been much harder to land without the clean separation the rewrite provided. v4.4.0 shipped all of these improvements and enabled provenance generation and verification by default.

The new pipeline

The promotion pipeline now has seven clearly separated phases:

Setup → Plan → Provenance → Validate → Promote → Sign → Attest
Phase        What it does
Setup        Validate options, prewarm the TUF cache.
Plan         Parse manifests, read registries, compute which images need promotion.
Provenance   Verify SLSA attestations on staging images.
Validate     Check cosign signatures; exit here for dry runs.
Promote      Copy images server-side, preserving digests.
Sign         Sign promoted images with keyless cosign.
Attest       Generate promotion provenance attestations using a dedicated in-toto predicate type.
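
For readers unfamiliar with the attestation framework, the statement emitted by the Attest phase follows the standard in-toto layout, roughly as sketched below. The PromotionRecord fields here are hypothetical stand-ins; the registered predicate type defines the real schema.

    // Hypothetical shapes; the registered predicate defines the real schema.
    package attest

    // Subject pins a promoted artifact by digest.
    type Subject struct {
        Name   string            `json:"name"`
        Digest map[string]string `json:"digest"`
    }

    // Statement follows the in-toto attestation framework layout.
    type Statement struct {
        Type          string    `json:"_type"` // https://in-toto.io/Statement/v1
        Subject       []Subject `json:"subject"`
        PredicateType string    `json:"predicateType"` // the registered promotion predicate URI
        Predicate     any       `json:"predicate"`
    }

    // PromotionRecord is an invented example predicate body.
    type PromotionRecord struct {
        SourceRegistry string `json:"sourceRegistry"`
        TargetRegistry string `json:"targetRegistry"`
        PromotedAt     string `json:"promotedAt"`
    }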

Phases run sequentially, so each one gets exclusive access to the full rate limit budget. No more contention. Signature replication to mirror registries is no longer part of this pipeline and runs as a dedicated periodic Prow job instead.

Making it fast

With the architecture in place, we turned to performance.

Parallel registry reads (#1736): The plan phase reads 1,350 registries. We parallelized those reads, and the phase dropped from about 20 minutes to about 2 minutes.
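
The pattern behind this is ordinary bounded fan-out. A sketch, assuming an errgroup with a concurrency cap and with readRegistry standing in for the real read call:

    // Illustrative bounded fan-out; readRegistry stands in for the real call.
    package plan

    import (
        "context"

        "golang.org/x/sync/errgroup"
    )

    func readAll(ctx context.Context, registries []string,
        readRegistry func(context.Context, string) error) error {
        g, ctx := errgroup.WithContext(ctx)
        g.SetLimit(32) // bound concurrency so the rate limiter stays in charge
        for _, r := range registries {
            r := r // capture loop variable (needed before Go 1.22)
            g.Go(func() error { return readRegistry(ctx, r) })
        }
        return g.Wait()
    }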

Two-phase tag listing (#1761): Instead of checking all 46,000 image groups across more than 20 mirrors, we first check only the source repositories. About 57% of images have no signatures at all because they were promoted before signing was enabled. We skip those entirely, cutting API calls roughly in half.
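
Conceptually, the first pass is just a filter over image groups, as in the sketch below; hasSignatureTags is an illustrative placeholder for a tag-listing call against the source repository.

    // Illustrative filter; hasSignatureTags is a placeholder.
    package replicate

    import "context"

    // groupsNeedingReplication keeps only image groups that actually have
    // signatures at the source, so unsigned groups never cost per-mirror calls.
    func groupsNeedingReplication(ctx context.Context, groups []string,
        hasSignatureTags func(context.Context, string) (bool, error)) ([]string, error) {
        var signed []string
        for _, g := range groups {
            ok, err := hasSignatureTags(ctx, g) // one source call per group, not one per mirror
            if err != nil {
                return nil, err
            }
            if ok {
                signed = append(signed, g)
            }
        }
        return signed, nil
    }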

Source check before replication (#1727): Before iterating over all mirrors for a given image, we check whether the signature exists on the primary registry first. In steady state, where most signatures are already replicated, this reduced the work from about 17 hours to about 15 minutes.
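
In sketch form, the early exit looks like this; the exists and copyTo callbacks are illustrative placeholders rather than kpromo functions.

    // Illustrative early exit; exists and copyTo are placeholders.
    package replicate

    import "context"

    func replicateSignature(ctx context.Context, image string, mirrors []string,
        exists func(ctx context.Context, registry, image string) (bool, error),
        copyTo func(ctx context.Context, mirror, image string) error) error {
        ok, err := exists(ctx, "primary", image)
        if err != nil {
            return err
        }
        if !ok {
            return nil // nothing to replicate, skip every mirror check
        }
        for _, m := range mirrors {
            if err := copyTo(ctx, m, image); err != nil {
                return err
            }
        }
        return nil
    }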

Per-request timeouts (#1763): We observed intermittent hangs where a stalled connection blocked the pipeline for over 9 hours. Every network operation now has its own timeout, and transient failures are retried automatically.
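
The shape of the fix is a per-attempt deadline wrapped in a small retry loop, roughly as below; the attempt count, timeout, and linear backoff are assumptions for illustration.

    // Illustrative retry wrapper; counts and durations are assumptions.
    package netutil

    import (
        "context"
        "time"
    )

    // withTimeoutRetry gives every attempt its own deadline, so a stalled
    // connection can no longer hang the pipeline, and retries transient
    // failures with a short linear backoff.
    func withTimeoutRetry(ctx context.Context, attempts int, perTry time.Duration,
        doRequest func(context.Context) error) error {
        var err error
        for i := 0; i < attempts; i++ {
            tryCtx, cancel := context.WithTimeout(ctx, perTry)
            err = doRequest(tryCtx)
            cancel()
            if err == nil {
                return nil
            }
            select {
            case <-ctx.Done():
                return ctx.Err()
            case <-time.After(time.Duration(i+1) * time.Second):
            }
        }
        return err
    }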

Connection reuse (#1759): We started reusing HTTP connections and auth state across operations, eliminating redundant token negotiations. This closed a long-standing request from 2023.
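
The underlying mechanism is standard net/http connection pooling: build one transport and share it across clients. A sketch with illustrative pool sizes:

    // Illustrative pooling; sizes are assumptions.
    package netutil

    import (
        "net/http"
        "time"
    )

    // sharedTransport is created once and reused for every registry call,
    // so TCP connections, TLS sessions, and with them auth exchanges are
    // amortized across operations instead of renegotiated per request.
    var sharedTransport = &http.Transport{
        MaxIdleConns:        100,
        MaxIdleConnsPerHost: 10,
        IdleConnTimeout:     90 * time.Second,
    }

    func newClient() *http.Client {
        return &http.Client{Transport: sharedTransport}
    }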

By the numbers

Here is what the rewrite looks like in aggregate.

  • Over 40 PRs merged, 3 releases shipped (v4.2.0, v4.3.0, v4.4.0)
  • Over 10,000 lines added and over 16,000 lines deleted, a net reduction of about 5,000 lines (20% smaller codebase)
  • Performance drastically improved: the plan phase went from about 20 minutes to about 2 minutes, and steady-state signature replication from about 17 hours to about 15 minutes
  • Robustness improved with retry logic, per-request timeouts, and adaptive rate limiting
  • 19 long-standing issues closed

The codebase shrank by a fifth while gaining provenance attestations, a pipeline engine, vulnerability scanning integration, parallelized operations, retry logic, integration tests against local registries, and a standalone signature replication mode.

No user-facing changes

This was a hard requirement. The kpromo cip command accepts the same flags and reads the same YAML manifests. The post-k8sio-image-promo Prow job continued working throughout. The promotion manifests in kubernetes/k8s.io did not change. Nobody had to update their workflows or configuration.

We caught two regressions early in production. One (#1731) caused a registry key mismatch that made every image appear as "lost" so that nothing was promoted. Another (#1733) set the default thread count to zero, blocking all goroutines. Both were fixed within hours. The phased release strategy (v4.2.0 with the new engine, v4.3.0 with legacy code removed) gave us a clear rollback path that we fortunately never needed.

What comes next

Signature replication across all mirror registries remains the most expensive part of the promotion cycle. Issue #1762 proposes eliminating it entirely by having archeio (the registry.k8s.io redirect service) route signature tag requests to a single canonical upstream instead of per-region backends. Another option would be to move signing closer to the registry infrastructure itself. Both approaches need further discussion with the SIG Release and infrastructure teams, but either one would remove thousands of API calls per promotion cycle and simplify the codebase even further.

Thank you

This project has been a community effort spanning seven years. Thank you to Linus, Stephen, Adolfo, Carlos, Ben, Marko, Lauri, Tyler, Arnaud, and many others who contributed code, reviews, and planning over the years. The SIG Release and Release Engineering communities provided the context, the discussions, and the patience for a rewrite of infrastructure that every Kubernetes release depends on.

If you want to get involved, join us in #release-management on the Kubernetes Slack or check out the repository.