01 What is this, in one breath?
The short version before the details.
Greentic builds digital workers — packaged pieces of automation. "Cloud deployment" is the machinery that takes one of those packages and runs it on real cloud servers (like Kubernetes or AWS), keeps track of which version is live, lets you shift users onto a new version a little at a time, and lets you undo instantly if something goes wrong — all while keeping secrets safe and recording exactly what happened.
02 Why we rebuilt it
The old way worked for one simple case and broke for everything real. Four problems forced a redesign.
🏷️"Environment" was just a label
The word "environment" (like dev or prod) was a sticky note, not a real thing the system understood. Where to store secrets, where to send logs, how to deploy — none of it was actually attached to the environment.
🔀Deploying and "going live" were the same act
Pushing version 2 simply overwrote version 1 on the spot. There was no way to put v2 up quietly, send it 1% of traffic, and keep v1 ready as a safety net. Every serious platform separates these two steps; we didn't.
📦"What plugs into the environment" vs "what we run" was mixed up
The thing that gives an environment its abilities (a secrets store, a deployer) and the thing we actually run on it (the customer's automation) have totally different lifecycles — but they were tangled together.
🔑Credentials and provider types were hard-coded
Customers' admins won't hand over master keys, and the list of supported clouds shouldn't require recompiling the core every time it grows. The old design fought both of these.
Four business goals set the direction: Zain (a telecom customer) goes live on Kubernetes — so Kubernetes is the first target; monthly hosted consulting needs repeatable, priceable deploys; Canonical/Ubuntu wants Snap + Juju deployers without recompiling; and reselling partners need a safe update store plus per-customer revenue tracking.
03 The big ideas (the six pillars)
Every feature we built rests on these six decisions.
🌍1 — The Environment is the real target
An environment is now a first-class thing the system fully understands — a "place" where workers run. Its abilities (how to deploy, where secrets live, where logs go, where sessions are stored) are plug-ins you snap in, one per slot. You push workloads to an environment.
🧩2 — Two kinds of things: plug-ins vs workloads
An env-pack is a capability plug-in for the environment (deployer, secrets, telemetry, sessions, state). An app bundle is the actual automation you run. One environment can host many bundles at once. The bundle never carries environment settings — it just asks for capabilities by name.
🔀3 — Deploying ≠ shifting traffic
A Revision is a frozen, never-changing snapshot of a deployed version. A Traffic Split is a separate dial that decides what share of users each revision gets. You can deploy quietly, then move the dial gradually — and roll the dial back instantly without redeploying.
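To make the separation concrete, here is a minimal Python sketch of a frozen revision next to a movable dial. The class names, fields and the `shift`/`rollback` methods are illustrative assumptions, not the actual greentic-deploy-spec types.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Revision:
    """An immutable snapshot of one deployed version (hypothetical shape)."""
    revision_id: str
    bundle_digest: str


@dataclass
class TrafficSplit:
    """The dial: percent of traffic per revision, kept apart from deploys."""
    weights: dict[str, int] = field(default_factory=dict)
    history: list[dict[str, int]] = field(default_factory=list)

    def shift(self, new_weights: dict[str, int]) -> None:
        if any(w < 0 for w in new_weights.values()):
            raise ValueError("weights must be non-negative")
        if sum(new_weights.values()) != 100:
            raise ValueError("traffic weights must add up to 100")
        self.history.append(dict(self.weights))
        self.weights = dict(new_weights)

    def rollback(self) -> None:
        """Restore the previous dial position; nothing is redeployed."""
        if not self.history:
            raise RuntimeError("no earlier split to roll back to")
        self.weights = self.history.pop()


split = TrafficSplit()
split.shift({"v1": 100})
split.shift({"v1": 99, "v2": 1})  # v2 is already deployed; only the dial moves
split.rollback()                  # undo without touching any revision
print(split.weights)              # {'v1': 100}
```

Because `Revision` is frozen, the only mutable piece is the split, which is what makes the undo cheap in this sketch.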
🪄4 — Setup wizards stay where they are
Rather than rebuild every setup wizard into one giant engine, each existing wizard just learned to work per-environment, and secret answers now route straight into the environment's secret store instead of being written to disk in plain text.
🔑5 — Credentials are first-class, with two honest modes
Mode A: you hand the deployer a minimum-privilege key and it checks that key actually has exactly the permissions it needs — no more, no less. Mode B: you give it a master key just once; it works out the smallest set of permissions and produces a "rules pack" your admin can review and apply offline. Master keys are never stored.
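The "no more, no less" check in Mode A amounts to comparing two sets. The sketch below assumes permissions can be written as plain strings; the permission names and function names are hypothetical and only illustrate the idea.

```python
def audit_key(required: set[str], granted: set[str]) -> dict[str, list[str]]:
    """Compare what the deployer needs with what the key actually allows."""
    return {
        "missing": sorted(required - granted),  # the deploy would fail part-way
        "excess": sorted(granted - required),   # the key is broader than needed
    }


def key_is_acceptable(required: set[str], granted: set[str]) -> bool:
    """Accept only a key with exactly the required permissions."""
    report = audit_key(required, granted)
    return not report["missing"] and not report["excess"]


# Hypothetical permission names, for illustration only.
REQUIRED = {"deployments:apply", "configmaps:apply", "networkpolicies:apply"}
granted = {"deployments:apply", "configmaps:apply", "secrets:read"}

print(audit_key(REQUIRED, granted))
# {'missing': ['networkpolicies:apply'], 'excess': ['secrets:read']}
print(key_is_acceptable(REQUIRED, REQUIRED))  # True
```

In Mode B, the same `REQUIRED` set would be the input for generating the reviewable rules pack instead of being checked against a supplied key.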
🧮6 — Providers grow by publishing, not recompiling; usage is tracked per customer
Adding a new cloud (greentic.deployer.k8s, greentic.deployer.aws-ecs, …) is just publishing a new env-pack with a name — the core never changes. And every piece of usage is stamped with who it was for, so reselling partners can be billed and given revenue share.
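One way to picture "publishing, not recompiling" is a registry keyed by name. This is a rough Python sketch under that assumption; the `register`/`resolve` functions and the `K8sDeployer` class are invented for illustration, and only the name `greentic.deployer.k8s` comes from the text above.

```python
from typing import Callable, Protocol


class Deployer(Protocol):
    def deploy(self, revision_id: str) -> str: ...


# Name -> factory. The core only ever looks providers up by name.
_REGISTRY: dict[str, Callable[[], Deployer]] = {}


def register(name: str, factory: Callable[[], Deployer]) -> None:
    if name in _REGISTRY:
        raise ValueError(f"env-pack {name!r} is already registered")
    _REGISTRY[name] = factory


def resolve(name: str) -> Deployer:
    try:
        return _REGISTRY[name]()
    except KeyError:
        raise LookupError(f"no env-pack registered as {name!r}") from None


class K8sDeployer:
    """Hypothetical provider; a real one would talk to a cluster."""
    def deploy(self, revision_id: str) -> str:
        return f"k8s:{revision_id}"


# A new provider is added by registering it; resolve() itself never changes.
register("greentic.deployer.k8s", K8sDeployer)
print(resolve("greentic.deployer.k8s").deploy("rev-1"))  # k8s:rev-1
```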
04 The journey: what we actually built
The work shipped as a long series of small, safe steps. Here is the story in order.
Phase 0 · Lock the doors. Before rewriting anything, we closed security holes in the existing system: we stopped secrets from accidentally getting packed into shipped archives, and hardened the "unzip" code so a malicious package can't escape its folder or trick us with symlinks. You don't renovate a house while the doors are unlocked.
Phase A · Foundations. We created the new object model in a dedicated greentic-deploy-spec library: Environment, Revision, Traffic Split, Deployment, Credentials. We added the capability-slot system, an env-pack registry (so providers plug in by name), the gtc op operator command surface, a "deny everything unless explicitly allowed" security stance, and the design of the HTTP contract a remote store would later speak.
Phase B · The moving parts. This is the machinery that makes rollouts real: an in-process router that holds all ready revisions and enforces the traffic dial; blue-green & drain so old versions keep serving in-flight users while new ones warm up; health gates that only mark a revision "ready" once it passes its checks; signed artifacts (cryptographic signatures so you can trust what you're running); and rollout telemetry so every action is traceable. Multiple bundles can now live in one environment side by side.
Phase C · Credentials, runtime config & wizards. The two-mode credentials contract became real, plus "preflight" checks that confirm an environment is ready before you deploy, env-pack setup wizards, and scoped message routing (so an incoming chat message reaches exactly the right deployed worker). This is what made the model usable end-to-end, not just on paper.
Phase D · The real cloud. This is where the model meets a real cloud. Kubernetes is first because the customer Zain runs there. Phase D is itself a train of small PRs, detailed in the next section. AWS comes second, reusing the exact same contract.
Next. Finish the Kubernetes slice end-to-end on a real test cluster, then bring AWS, then GCP, Azure and Snap/Juju online. Each one is just another env-pack against the same proven contract.
05 Where we are right now: the Kubernetes slice
Phase D ships as a careful sequence. Two big foundations landed first, then the actual Kubernetes deployer.
Foundation 1 — A "driving test" for every deployer
Before building any cloud deployer, we wrote a shared conformance suite: a set of automated checks that every deployer (Kubernetes, AWS, anything future) must pass — things like "deploying the same thing twice is safe," "traffic always adds up to 100%," and "you can't accidentally affect another customer's deployment." A reference local-process deployer proves the test works. Kubernetes and AWS both have to pass the identical test, so they can't quietly behave differently.
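A shared conformance suite can be pictured as a list of checks that take a deployer factory and assert on its behaviour. The Python sketch below is a toy version under that assumption: the deployer interface (`deploy`, `remove`, `running`) and the check names are hypothetical, not the real suite.

```python
class LocalProcessDeployer:
    """Toy stand-in for a reference deployer, keyed by (customer, revision)."""

    def __init__(self) -> None:
        self.running: dict[tuple[str, str], str] = {}

    def deploy(self, customer: str, revision: str, digest: str) -> None:
        self.running[(customer, revision)] = digest

    def remove(self, customer: str, revision: str) -> None:
        self.running.pop((customer, revision), None)


def check_idempotent_deploy(make_deployer) -> None:
    d = make_deployer()
    d.deploy("acme", "rev-1", "sha256:aaa")
    once = dict(d.running)
    d.deploy("acme", "rev-1", "sha256:aaa")  # same request again
    assert d.running == once, "second deploy changed state"


def check_customer_isolation(make_deployer) -> None:
    d = make_deployer()
    d.deploy("acme", "rev-1", "sha256:aaa")
    d.deploy("globex", "rev-1", "sha256:bbb")
    d.remove("acme", "rev-1")
    assert ("globex", "rev-1") in d.running, "removal touched another customer"


CONFORMANCE_CHECKS = [check_idempotent_deploy, check_customer_isolation]


def run_conformance(make_deployer) -> list[str]:
    """Run every check against a fresh deployer; return the names that passed."""
    passed = []
    for check in CONFORMANCE_CHECKS:
        check(make_deployer)
        passed.append(check.__name__)
    return passed


print(run_conformance(LocalProcessDeployer))
# ['check_idempotent_deploy', 'check_customer_isolation']
```

Any other deployer would be passed to the same `run_conformance` call, which is the point: the checks do not know which cloud they are exercising.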
Foundation 2 — A safe "filing cabinet" for environment state
Production can't keep its truth in plain local files. So we built the operator store server — a small service that records the real state of every environment with the guarantees production needs: locks so two people editing at once can't corrupt anything, idempotency so a retried request doesn't double-apply, role-based access so only authorized people can change things, an append-only audit log, and backup & restore. The deployer talks to it over a clean HTTP contract, so the same commands work whether state lives on your laptop or in the cloud.
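Two of those guarantees, safe concurrent edits and safe retries, can be sketched in a few lines of Python. This assumes a version counter for compare-and-swap and a client-supplied idempotency key; the `EnvStore` class and its method are hypothetical, and role-based access and backups are left out.

```python
class VersionConflict(Exception):
    """Raised when a writer's view of the state is out of date."""


class EnvStore:
    def __init__(self) -> None:
        self.state: dict = {}
        self.version = 0
        self.audit: list[tuple[int, str, dict]] = []  # append-only log
        self._replies: dict[str, int] = {}            # idempotency key -> result

    def update(self, actor: str, expected_version: int,
               idempotency_key: str, changes: dict) -> int:
        # A retried request replays the stored result instead of re-applying.
        if idempotency_key in self._replies:
            return self._replies[idempotency_key]
        # Compare-and-swap: refuse writers working from a stale version.
        if expected_version != self.version:
            raise VersionConflict(
                f"expected version {expected_version}, store is at {self.version}")
        self.state.update(changes)
        self.version += 1
        self.audit.append((self.version, actor, dict(changes)))
        self._replies[idempotency_key] = self.version
        return self.version


store = EnvStore()
print(store.update("alice", 0, "req-1", {"traffic": {"v1": 100}}))  # 1
print(store.update("alice", 0, "req-1", {"traffic": {"v1": 100}}))  # 1 (replay)
print(len(store.audit))                                             # 1
```

A second writer who still believes the store is at version 0 and sends a new key gets `VersionConflict` and has to re-read before trying again.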
The Kubernetes deployer itself (the PR-5 train)
| Step | What it delivers | Status |
|---|---|---|
| 5.0 · Scaffold | The deterministic half: code that turns an environment into the exact Kubernetes blueprints (namespace, the always-on "router," a worker per version, networking rules, security hardening), plus the permission checks and a minimal-rights "rules pack" for the cluster admin. No live cluster calls yet, but every blueprint is computed and tested. | Merged |
| 5.1 · op env render | A command that prints those Kubernetes blueprints to a folder without applying them. This lets the customer choose how to deploy: apply directly, hand the files to their own GitOps pipeline, or review them first. It safely cleans up stale files so old, retired versions can't accidentally come back. | Merged |
| 5.2 · Typed Kubernetes client | The piece that actually talks to a real Kubernetes cluster: it applies the blueprints (server-side, declaratively), deletes cleanly, and runs real permission probes ("am I allowed to do exactly these operations?"). It's fully unit-tested against a mock cluster at the network level, so no real cluster is needed to prove it's correct. | This PR (#304) |
| 5.3 · Wire it up + real-cluster test | Connect the client to the deploy commands (so warm, traffic set etc. actually drive the cluster), thread the wizard answers through, and prove the whole thing end-to-end on a throwaway local Kubernetes cluster (kind). | Next |
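The stale-file cleanup in step 5.1 can be illustrated with a small Python sketch. It assumes the renderer keeps an index of the files it wrote last time and only ever deletes files from that index; the index file name, the manifest names and the `render_to_dir` function are all hypothetical.

```python
import json
import tempfile
from pathlib import Path

INDEX = ".render-index.json"  # hypothetical bookkeeping file


def render_to_dir(out_dir: Path, manifests: dict[str, str]) -> list[str]:
    """Write manifests; delete files from the previous render that are gone now."""
    out_dir.mkdir(parents=True, exist_ok=True)
    index_path = out_dir / INDEX
    previous = set(json.loads(index_path.read_text())) if index_path.exists() else set()
    for name, body in manifests.items():
        (out_dir / name).write_text(body)
    # Only files this tool rendered earlier are candidates for removal.
    stale = sorted(previous - set(manifests))
    for name in stale:
        (out_dir / name).unlink(missing_ok=True)
    index_path.write_text(json.dumps(sorted(manifests)))
    return stale


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    render_to_dir(out, {"router.yaml": "# router", "worker-v1.yaml": "# v1"})
    (out / "notes.txt").write_text("hand-written, not ours")
    removed = render_to_dir(out, {"router.yaml": "# router", "worker-v2.yaml": "# v2"})
    remaining = sorted(p.name for p in out.iterdir() if p.name != INDEX)

print(removed)    # ['worker-v1.yaml']
print(remaining)  # ['notes.txt', 'router.yaml', 'worker-v2.yaml']
```

The retired `worker-v1.yaml` is gone, so applying the whole folder cannot bring v1 back, while the hand-written file is left alone.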
👉 The PR open right now (#304)
It adds the real Kubernetes client behind the seams the scaffold left open: a cluster half that applies and deletes Kubernetes objects safely and idempotently, and a validator half that asks the cluster "do I have exactly the permissions I claim to need?" A deliberately small, closed routing table maps each blueprint type to the right Kubernetes address — so an unexpected object type is caught as a bug instead of guessing a wrong URL. Everything is proven with mock-network tests; binding it to live deploy commands and the real-cluster run is the very next step (5.3).
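A closed routing table of this kind might look like the Python sketch below. The URL prefixes are the standard Kubernetes REST paths for these resource types, but the exact set of kinds and the `object_url` helper are assumptions made for illustration, not the contents of PR #304.

```python
# Closed table: every kind the deployer may touch is listed explicitly.
ROUTES = {
    "Namespace": "/api/v1/namespaces",
    "ConfigMap": "/api/v1/namespaces/{namespace}/configmaps",
    "Service": "/api/v1/namespaces/{namespace}/services",
    "Deployment": "/apis/apps/v1/namespaces/{namespace}/deployments",
    "NetworkPolicy":
        "/apis/networking.k8s.io/v1/namespaces/{namespace}/networkpolicies",
    "PodDisruptionBudget":
        "/apis/policy/v1/namespaces/{namespace}/poddisruptionbudgets",
}


def object_url(kind: str, namespace: str, name: str) -> str:
    """Return the API path for one object, or fail loudly on an unknown kind."""
    try:
        collection = ROUTES[kind]
    except KeyError:
        raise ValueError(
            f"unroutable kind {kind!r}: add it to ROUTES explicitly") from None
    return f"{collection.format(namespace=namespace)}/{name}"


print(object_url("Deployment", "zain-prod", "router"))
# /apis/apps/v1/namespaces/zain-prod/deployments/router
print(object_url("Namespace", "", "zain-prod"))
# /api/v1/namespaces/zain-prod
```

There is deliberately no fallback that builds a path from the kind name: an unlisted kind raises instead of producing a plausible but wrong URL.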
06 Every real change, by phase
The concrete work — actual pull requests, in plain words. Numbers link to GitHub. Many milestones were several PRs across several repos; the key ones are listed.
Most of the work lives in greentic-deployer (the engine) and its inner greentic-deploy-spec library (the shared vocabulary). The rest threads the new model through greentic-operator, greentic-start, greentic-runner, greentic-bundle, greentic-setup, greentic-config, greentic-distributor-client, greentic-telemetry, greentic-types, greentic-interfaces, greentic-qa, greentic-secrets and greentic-dw.
Phase 0 — Lock the doors (security hotfix)
Close existing leaks before rebuilding on top of them.
| What it did (plain words) | PRs |
|---|---|
| Stop secrets from ever being written into a shipped bundle archive. | bundle#112, setup#109 |
| A scanner + automated build gate that fails if a plaintext secret leaks into a bundle. | bundle#115 |
| Keep non-secret config working, but mark the old plaintext path as deprecated. | setup#111, start#160, operator#70 |
| Harden the "unzip" code so a malicious package can't escape its folder or abuse symlinks, and replace a risky external unsquashfs tool with safe in-process extraction. | bundle#116, start#158, setup#110 |
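The "can't escape its folder" rule boils down to validating every archive entry's path before anything is written. Here is a minimal Python sketch of such a check; the `ArchiveEntry` type is hypothetical, and refusing every symlink outright is an assumed policy rather than the documented one.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class ArchiveEntry:
    """Hypothetical view of one archive member."""
    name: str
    is_symlink: bool = False


def checked_relative_path(entry: ArchiveEntry) -> PurePosixPath:
    """Return a path that is safe to join under the extraction root, or raise."""
    if entry.is_symlink:
        raise ValueError(f"symlink entries are refused: {entry.name}")
    path = PurePosixPath(entry.name)
    if not path.parts:
        raise ValueError("empty entry name")
    # Absolute paths and any '..' component could land outside the root.
    if path.is_absolute() or ".." in path.parts:
        raise ValueError(f"entry escapes the extraction folder: {entry.name}")
    return path


print(checked_relative_path(ArchiveEntry("flows/main.yaml")))  # flows/main.yaml

for bad in (ArchiveEntry("../etc/passwd"),
            ArchiveEntry("/etc/passwd"),
            ArchiveEntry("link", is_symlink=True)):
    try:
        checked_relative_path(bad)
    except ValueError as err:
        print("rejected:", err)
```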
Phase A — Foundations (the new vocabulary)
The object model, storage, command surface and security stance that everything else builds on.
| Milestone: what it did (plain words) | Key PRs |
|---|---|
| A1 · The vocabulary. A dedicated greentic-deploy-spec library defining Environment, Revision, Traffic Split, Deployment and Credentials: the shared language for everything. | deployer#196 |
| A2 · Safe storage. How environment state is written to disk safely: atomic writes, a lock per environment, and a backup before every change. | #197, #198 |
| A3 · The operator commands. The whole gtc op surface: env, env-packs, bundles, revisions, traffic, credentials, secrets, config. | deployer#200, operator#71, greentic#220 |
| A4 · A default "local" environment. Auto-create a local environment with sensible default plug-ins on first setup; heal it if plug-ins are missing. | #204, #206, #216 |
| A4b · Retire the old "dev" label. A checked, one-shot migration from the old dev string to local, with a temporary warning alias if anything still uses it. | deployer#207, config#43 +4 |
| A5 · Revision state machine. The staged→warming→ready→draining→archived lifecycle with safe transitions, plus a guard that refuses to archive a revision still serving traffic. | dist-client#146, #208, #210 |
| A6 · State layout migration. A one-shot, fail-loud migration of the old on-disk state layout to the new env-pack layout. | deployer#211 |
| A7 · Audit + deny-by-default. An append-only audit log and an authorization gate on every state-changing command; non-local changes fail closed until allowed. | deployer#213 |
| A8 · The remote contract (design). The HTTP contract a production store would speak: compare-and-swap, idempotency replay, role-based access, audit shape, backup/restore, corruption detection. | deployer#214, operator#74 |
| A9 · The plug-in registry. Providers (deployers, secret stores, etc.) register by name; adding one never changes core code. | deployer#215 |
| A10 · Environment-aware wizards. Every setup wizard learned the environment it runs in, and secret answers now route to the environment's secret store. | qa#46, bundle#117 +3 |
| Preflight checks. Verify the right CLI versions, cloud auth and cluster access (not just "is the binary installed") before deploying, with honest "install X with Y" errors. | deployer#217 |
| Slim, secure images. Tiny "distroless" container images: non-root, statically linked, under 30 MB. | start#166, deployer#219 +2 |
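The A5 lifecycle and its archive guard can be sketched as a small transition table in Python. The state names come from the table above; treating the lifecycle as strictly linear, and the `advance` function itself, are simplifying assumptions.

```python
from enum import Enum


class State(Enum):
    STAGED = "staged"
    WARMING = "warming"
    READY = "ready"
    DRAINING = "draining"
    ARCHIVED = "archived"


# Assumed strictly linear lifecycle: each state has exactly one successor.
NEXT = {
    State.STAGED: State.WARMING,
    State.WARMING: State.READY,
    State.READY: State.DRAINING,
    State.DRAINING: State.ARCHIVED,
}


def advance(current: State, target: State, traffic_percent: int = 0) -> State:
    """Move a revision one step forward, refusing unsafe transitions."""
    if NEXT.get(current) is not target:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    if target is State.ARCHIVED and traffic_percent > 0:
        raise ValueError("refusing to archive a revision that still serves traffic")
    return target


state = advance(State.STAGED, State.WARMING)
state = advance(state, State.READY)
state = advance(state, State.DRAINING)
try:
    advance(state, State.ARCHIVED, traffic_percent=5)
except ValueError as err:
    print(err)  # refusing to archive a revision that still serves traffic
print(advance(state, State.ARCHIVED, traffic_percent=0).value)  # archived
```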
Phase B — The moving parts (rollouts become real)
The machinery that actually serves traffic across versions, plus signing and telemetry.
| Milestone: what it did (plain words) | Key PRs |
|---|---|
| B0 · Multi-bundle runtime config. Load runtime config that holds many deployments/bundles in one environment; boot without a bundle folder. | start#167 |
| B1 · The dispatcher (the dial). The in-process router that picks which revision each request hits: by trusted header, signed cookie, session pin, or weighted random. | start#168 |
| B2 · Many revisions side by side. Re-key the running-pack index by (tenant, deployment, bundle, revision) so multiple revisions coexist in one runtime. | runner#345 |
| B3 · Per-revision routing. The ingress actually routes each incoming request to the chosen revision. | start#170 |
| B4 · Make it live + Ready-only traffic. The producer that turns traffic splits into live runtime config, and enforces that only Ready revisions receive traffic. | deployer#221, operator#80 |
| B5–B6 · Hand-typeable traffic + sticky sessions. Make gtc op traffic ergonomic; add a pluggable session-pin store (in-memory or Redis). | deployer#222, start#171 |
| B7 · Blue-green drain. A drain/evict model so an old revision finishes its in-flight sessions before it's torn down. | runner#346, start#172 |
| B8–B9 · Static routing + health gate. Mirror revision routing on the static route table; only mark a revision Ready after it passes health checks. | operator#81, deployer#223, start#175 |
| B10 · Per-customer billing + revenue share. Each deployment carries a billing customer; revenue-share changes write a signed, versioned policy document. | deployer#224 |
| B12 · Secret references, not values. Bundles carry secret:// references; secret answers are dropped from plaintext setup files and written as refs. | bundle#121, start#179, deployer#225 |
| Signed artifacts. Real cryptographic (DSSE / Ed25519) signature verification of bundles and pack lists, with a per-environment trust root that fails closed. | dist-client#152, bundle#122, deployer#226 |
| Rollout telemetry. Stamp every execution with pack/rollout IDs and export them to tracing, so every action is traceable. | telemetry#72, deployer#228, runner#366 |
Phase C — Credentials, runtime config & wizards
What made the model usable end-to-end, not just on paper.
| Milestone: what it did (plain words) | Key PRs |
|---|---|
| C1 · Credentials contract. The two-mode credentials model + gtc op credentials commands (requirements / bootstrap / rotate). | deployer#251 |
| C2 · Reference deployer creds. The local-process credentials implementation other deployers copy. | deployer#252 |
| C3 · AWS credentials. AWS-ECS credentials: a real identity check (STS) plus minimal-permission IAM bootstrap. | deployer#253 |
| C4 · Runtime-config channel. A channel so running components can read non-secret runtime config provided by the environment. | interfaces#154, runner#421, start#240 |
| C5 · Resolve runtime:// refs. Resolve runtime references (e.g. a load-balancer URL discovered after deploy) against the environment's discovered values, with hot reload. | runner#424, start#241, deployer#254 |
| C6 · Wizard questions per env-pack. Each env-pack can ship its own setup questions, surfaced through one shared seam. | deployer#255 |
| C7 · Secret vs non-secret answers. Split wizard answers into secret-references and non-secret channels, finalized into per-pack config when a revision is staged. | bundle#127, setup#135, deployer#256 +1 |
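The C7 split can be pictured as one pass over the wizard answers. In this Python sketch the `secret://` scheme comes from the tables above, but the layout of the reference path, the function name and the example keys are assumptions made for illustration.

```python
def split_answers(env: str, pack: str, answers: dict[str, str],
                  secret_keys: set[str]) -> tuple[dict[str, str], dict[str, str]]:
    """Return (config, secrets): config holds only references for secret keys."""
    config: dict[str, str] = {}
    secrets: dict[str, str] = {}
    for key, value in answers.items():
        if key in secret_keys:
            ref = f"secret://{env}/{pack}/{key}"  # hypothetical reference layout
            secrets[ref] = value  # destined for the environment's secret store
            config[key] = ref     # only the reference reaches config on disk
        else:
            config[key] = value
    return config, secrets


config, secrets = split_answers(
    env="zain-prod",
    pack="chat-worker",
    answers={"region": "eu-west-1", "api_token": "s3cr3t"},
    secret_keys={"api_token"},
)
print(config)
# {'region': 'eu-west-1', 'api_token': 'secret://zain-prod/chat-worker/api_token'}
print(list(secrets))
# ['secret://zain-prod/chat-worker/api_token']
```

The plaintext value appears only in the `secrets` mapping, never in the config that would be written to a setup file.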
Phase D — The real cloud (Kubernetes, for Zain)
First the two foundations every production cloud needs, then the actual Kubernetes deployer.
Foundation A — the shared "driving test" + the production store
| What it did (plain words) | Key PRs |
|---|---|
| Conformance suite. A shared safety test every deployer must pass (idempotency, traffic sums to 100%, no cross-customer interference) + a reference local-process deployer. | deployer#257 |
| Typed verbs over HTTP (PR-3 train). Turn the on-disk store's internal closures into ~28 typed verb methods, add an HTTP client that speaks the A8 contract, and pick local-vs-remote store at runtime. | #259…#275, #276 |
| The store server (PR-4 train). The production "filing cabinet" service: every verb group remote end-to-end, an idempotency replay ledger, a durable audit log, role-based access, and backup/restore, covering all six A8 guarantees. | #278, #286…#295, #296 |
| Declarative gtc op env apply. Describe a whole environment in one manifest file and apply it, with a --check gate for CI convergence. | #279, #282, #284 |
Foundation B — the Kubernetes deployer itself (the PR-5 train)
| Step: what it did (plain words) | PR | Status |
|---|---|---|
| 5.0 · Scaffold. Turn an environment into the exact Kubernetes blueprints (namespace, an always-on router, a worker per version, networking rules, "Restricted" security hardening), plus permission checks and a minimal-rights rules pack for the cluster admin. All computed and tested; no live cluster calls yet. | #297 | Merged |
| 5.1 · op env render. Print those blueprints to a folder without applying, so the customer can apply directly, hand them to GitOps, or review first. Safely deletes stale files so a retired version can't sneak back via kubectl apply -f. | #298 | Merged |
| 5.2 · Typed Kubernetes client. The piece that talks to a real cluster: applies blueprints (declarative server-side apply), deletes cleanly and idempotently, and runs real permission probes ("am I allowed exactly these operations?"). Fully tested against a mock cluster at the network level. A small closed routing table maps each blueprint type to the right address, so an unexpected type is caught as a bug rather than guessing a URL. | #304 | Open now |
| 5.3 · Wire it up + real-cluster test. Connect the client to the deploy commands (so warm, traffic set drive the cluster), thread the wizard answers through, and prove the whole flow end-to-end on a throwaway local Kubernetes cluster (kind). | - | Next |
What the Kubernetes deployer produces
For one environment it renders a namespace, one always-on router (2+ replicas, with a disruption budget) that owns the traffic dial, one worker per revision (labelled with its revision ID and carrying its identity as environment variables), a runtime-config map the router reloads, and network policies that deny everything except the allow-list. Every pod is hardened to the "Restricted" security profile: non-root, no privilege escalation, read-only filesystem, dropped capabilities, resource limits. The router — not Kubernetes' own routing — is the single source of truth for the split.
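The hardening list above maps onto well-known fields of a Kubernetes container spec. The Python sketch below checks a container, represented as a plain dict, against that list; the field names are standard Kubernetes ones, while the checker function, the example values and the container-level-only check are simplifying assumptions.

```python
def restricted_violations(container: dict) -> list[str]:
    """List which of the hardening rules from the text a container spec breaks."""
    sc = container.get("securityContext", {})
    problems = []
    if sc.get("runAsNonRoot") is not True:
        problems.append("must run as non-root")
    if sc.get("allowPrivilegeEscalation") is not False:
        problems.append("privilege escalation must be disabled")
    if sc.get("readOnlyRootFilesystem") is not True:
        problems.append("root filesystem must be read-only")
    if "ALL" not in sc.get("capabilities", {}).get("drop", []):
        problems.append("all capabilities must be dropped")
    if not container.get("resources", {}).get("limits"):
        problems.append("resource limits must be set")
    return problems


# Example values are illustrative, not the deployer's real output.
worker = {
    "name": "worker",
    "securityContext": {
        "runAsNonRoot": True,
        "allowPrivilegeEscalation": False,
        "readOnlyRootFilesystem": True,
        "capabilities": {"drop": ["ALL"]},
    },
    "resources": {"limits": {"cpu": "500m", "memory": "256Mi"}},
}

print(restricted_violations(worker))               # []
print(len(restricted_violations({"name": "x"})))   # 5
```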
07 Glossary: the words in plain English
Keep this handy; every term above maps to something simple.
- Environment (where): A "place" where workers run, such as local, dev, or zain-prod. It now fully understands its own abilities through plug-ins.
- Env-pack (plug-in): A capability you snap into an environment: a deployer, a secrets store, a telemetry exporter, a sessions backend. One per slot. New ones are added by publishing, not recompiling.
- Bundle (workload): The actual automation you run, that is, the customer's digital worker. Many bundles can share one environment.
- Revision (snapshot): A frozen, never-changing record of one deployed version. Like a numbered photograph: you never edit it, you just take a new one.
- Traffic Split (the dial): The setting that decides what share of users each revision receives. Moving it is separate from deploying, and it can be reverted instantly.
- Deployment (per customer): One customer's specific running copy of a bundle in an environment. Two customers can run the same bundle with separate histories and rollback.
- Credentials (keys): The access the deployer needs. Either you give a minimal key it verifies, or a master key once that it uses to mint a minimal one + a rules pack for your admin.
- Conformance suite (driving test): A shared set of automated checks every cloud deployer must pass, so Kubernetes and AWS can't behave differently in dangerous ways.
- Operator store server (filing cabinet): The production-grade service that safely records environment state, with locks, audit log, access control, and backups.
- Kubernetes (K8s) (the cloud): A widely-used system for running containerized apps. It's Greentic's first real cloud target, because the customer Zain runs there.
- Router & worker (K8s shape): On Kubernetes, one always-on router receives all traffic and applies the dial; each version runs as its own worker. The router, not Kubernetes' own routing, is the source of truth for the split.
The one-sentence takeaway
We replaced a system that could only overwrite one version in place with a real, cloud-agnostic deployment platform — where environments are first-class, deploying and going-live are separate dials, every provider passes the same safety test, and Kubernetes (for Zain) is the first real cloud now coming online.