Greentic Cloud Deployment — Explained Simply

01 What is this, in one breath?

The short version before the details.

Greentic builds digital workers: packaged pieces of automation. "Cloud deployment" is the machinery that takes one of those packages and runs it on real cloud infrastructure (such as Kubernetes or AWS), keeps track of which version is live, lets you shift users onto a new version a little at a time, and lets you undo instantly if something goes wrong, all while keeping secrets safe and recording exactly what happened.

Analogy. Think of an airport. A plane is the workload you want to fly (the digital worker). The airport is the environment with all its facilities — fuel, control tower, gates (these are the "plug-ins"). Deploying a new plane to the airport is one act; letting passengers board it is a separate act. You can park the new plane, test it, board 1% of passengers, and if it misbehaves, send everyone back to the old plane that's still sitting at the gate. That separation is the heart of everything below.

02 Why we rebuilt it

The old way worked for one simple case and broke for everything real. Four problems forced a redesign.

🏷️"Environment" was just a label

The word "environment" (like dev or prod) was a sticky note, not a real thing the system understood. Where to store secrets, where to send logs, how to deploy — none of it was actually attached to the environment.

🔀Deploying and "going live" were the same act

Pushing version 2 simply overwrote version 1 on the spot. There was no way to put v2 up quietly, send it 1% of traffic, and keep v1 ready as a safety net. Mature deployment platforms treat these as two separate steps; ours didn't.

📦"What plugs into the environment" vs "what we run" was mixed up

The thing that gives an environment its abilities (a secrets store, a deployer) and the thing we actually run on it (the customer's automation) have totally different lifecycles — but they were tangled together.

🔑Credentials and provider types were hard-coded

Customers' admins won't hand over master keys, and the list of supported clouds shouldn't require recompiling the core every time it grows. The old design fought both of these.

Four business goals set the direction: Zain (a telecom customer) goes live on Kubernetes — so Kubernetes is the first target; monthly hosted consulting needs repeatable, priceable deploys; Canonical/Ubuntu wants Snap + Juju deployers without recompiling; and reselling partners need a safe update store plus per-customer revenue tracking.

03 The big ideas (the six pillars)

Every feature we built rests on these six decisions.

🌍1 — The Environment is the real target

An environment is now a first-class thing the system fully understands — a "place" where workers run. Its abilities (how to deploy, where secrets live, where logs go, where sessions are stored) are plug-ins you snap in, one per slot. You push workloads to an environment.

🧩2 — Two kinds of things: plug-ins vs workloads

An env-pack is a capability plug-in for the environment (deployer, secrets, telemetry, sessions, state). An app bundle is the actual automation you run. One environment can host many bundles at once. The bundle never carries environment settings — it just asks for capabilities by name.
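
The "one plug-in per slot" rule can be pictured with a small sketch. The slot names come from the text; the `Environment` class, its methods and the `example.secrets.file` pack name are hypothetical and only illustrate the idea, not Greentic's actual types.

```python
from dataclasses import dataclass, field

# Slot names taken from the text above; everything else is illustrative.
SLOTS = {"deployer", "secrets", "telemetry", "sessions", "state"}


@dataclass
class Environment:
    name: str
    packs: dict[str, str] = field(default_factory=dict)  # slot -> env-pack name

    def install(self, slot: str, pack: str) -> None:
        """Snap an env-pack into a slot; each slot holds at most one pack."""
        if slot not in SLOTS:
            raise ValueError(f"unknown slot: {slot}")
        if slot in self.packs:
            raise ValueError(f"slot {slot!r} already filled by {self.packs[slot]}")
        self.packs[slot] = pack

    def resolve(self, capability: str) -> str:
        """A bundle asks for a capability by name and gets the environment's pack."""
        if capability not in self.packs:
            raise LookupError(f"{self.name} has no {capability!r} plug-in")
        return self.packs[capability]


env = Environment("local")
env.install("deployer", "greentic.deployer.k8s")
env.install("secrets", "example.secrets.file")  # hypothetical pack name
```

The bundle side only ever calls something like `env.resolve("secrets")`, which is why it never needs to carry environment settings of its own.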

🔀3 — Deploying ≠ shifting traffic

A Revision is a frozen, never-changing snapshot of a deployed version. A Traffic Split is a separate dial that decides what share of users each revision gets. You can deploy quietly, then move the dial gradually — and roll the dial back instantly without redeploying.
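
A minimal sketch of the two objects, assuming weights are whole percentages (the text does not say how the dial is encoded). `Revision`, `set_traffic` and the digest strings are illustrative names, not the real API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: a revision is never edited after creation
class Revision:
    revision_id: str
    bundle_digest: str


def set_traffic(weights: dict[str, int]) -> dict[str, int]:
    """Validate a traffic split: non-negative percentages that sum to exactly 100."""
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    if sum(weights.values()) != 100:
        raise ValueError("weights must sum to 100")
    return dict(weights)


v1 = Revision("r1", "sha256:aaa")
v2 = Revision("r2", "sha256:bbb")

split = set_traffic({v1.revision_id: 100, v2.revision_id: 0})  # v2 deployed, no traffic yet
split = set_traffic({v1.revision_id: 99, v2.revision_id: 1})   # canary: 1% of users on v2
split = set_traffic({v1.revision_id: 100, v2.revision_id: 0})  # rollback: only the dial moves
```

Note that the rollback line touches neither `v1` nor `v2`; both revisions stay deployed and only the split changes.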

🪄4 — Setup wizards stay where they are

Rather than rebuild every setup wizard into one giant engine, each existing wizard just learned to work per-environment, and secret answers now route straight into the environment's secret store instead of being written to disk in plain text.

🔑5 — Credentials are first-class, with two honest modes

Mode A: you hand the deployer a minimum-privilege key and it checks that key actually has exactly the permissions it needs — no more, no less. Mode B: you give it a master key just once; it works out the smallest set of permissions and produces a "rules pack" your admin can review and apply offline. Master keys are never stored.
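
The Mode A check ("no more, no less") amounts to a set comparison in both directions. The sketch below assumes permissions can be written as simple `verb:resource` strings, which is an invented format for illustration only.

```python
def check_exact_permissions(required: set[str], granted: set[str]) -> dict[str, set[str]]:
    """Compare what the key can do with what the deployer needs."""
    return {
        "missing": required - granted,  # the deployer would fail without these
        "excess": granted - required,   # the key is broader than necessary
    }


# Hypothetical permission strings in "verb:resource" form.
required = {"create:deployments", "patch:deployments", "get:configmaps"}
granted = {"create:deployments", "patch:deployments", "get:configmaps", "delete:namespaces"}

report = check_exact_permissions(required, granted)
key_is_acceptable = not report["missing"] and not report["excess"]
```

Here the key would be rejected: nothing is missing, but `delete:namespaces` is more than the deployer asked for.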

🧮6 — Providers grow by publishing, not recompiling; usage is tracked per customer

Adding a new cloud (greentic.deployer.k8s, greentic.deployer.aws-ecs, …) is just publishing a new env-pack with a name — the core never changes. And every piece of usage is stamped with who it was for, so reselling partners can be billed and given revenue share.
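
"Publishing, not recompiling" is essentially a lookup by name. A rough sketch, assuming a hypothetical in-process registry (`register`, `lookup` and the `Deployer` protocol are invented here; the real env-pack mechanism is not described in this document):

```python
from typing import Callable, Protocol


class Deployer(Protocol):
    def deploy(self, revision_id: str) -> str: ...


_REGISTRY: dict[str, Callable[[], Deployer]] = {}


def register(name: str, factory: Callable[[], Deployer]) -> None:
    """Make a provider available under its published name."""
    if name in _REGISTRY:
        raise ValueError(f"{name} already registered")
    _REGISTRY[name] = factory


def lookup(name: str) -> Deployer:
    """Core code only ever sees the name; it never imports a provider directly."""
    return _REGISTRY[name]()


class K8sDeployer:
    def deploy(self, revision_id: str) -> str:
        return f"k8s:{revision_id}"


register("greentic.deployer.k8s", K8sDeployer)
result = lookup("greentic.deployer.k8s").deploy("r1")
```

Adding `greentic.deployer.aws-ecs` would be one more `register` call from the new pack, with `lookup` left untouched.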

04 The journey: what we actually built

The work shipped as a long series of small, safe steps. Here is the story in order.

Phase 0: Lock the doors first [Done]

Before rewriting anything, we closed security holes in the existing system: stop secrets from accidentally getting packed into shipped archives, and harden the "unzip" code so a malicious package can't escape its folder or trick us with symlinks. You don't renovate a house while the doors are unlocked.
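
The usual shape of such a hardening check is sketched below. It is not the actual Greentic extraction code; `safe_target` is a hypothetical helper, and rejecting every link entry outright is an assumption about how "can't trick us with symlinks" is enforced.

```python
import os


def safe_target(dest_dir: str, member_name: str, is_link: bool = False) -> str:
    """Return the absolute path an archive entry may be written to, or raise."""
    if is_link:
        raise ValueError(f"refusing link entry: {member_name}")
    root = os.path.realpath(dest_dir)
    target = os.path.realpath(os.path.join(root, member_name))
    # commonpath equals root only when the target stays inside the folder.
    if os.path.commonpath([root, target]) != root:
        raise ValueError(f"entry escapes destination: {member_name}")
    return target


ok_path = safe_target("/tmp/gt-demo", "pack/flow.yaml")
```

Entries such as `../evil`, `a/../../evil` or an absolute `/etc/passwd` all resolve outside the destination and are refused before anything is written.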

Phase A: The foundations (the new vocabulary) [Done]

We created the new object model in a dedicated greentic-deploy-spec library: Environment, Revision, Traffic Split, Deployment, Credentials. We added the capability-slot system, an env-pack registry (so providers plug in by name), the gtc op operator command surface, a "deny everything unless explicitly allowed" security stance, and the design of the HTTP contract a remote store would later speak.

Phase B: The moving parts [Done]

The machinery that makes rollouts real: an in-process router that holds all ready revisions and enforces the traffic dial; blue-green & drain so old versions keep serving in-flight users while new ones warm up; health gates that only mark a revision "ready" once it passes its checks; signed artifacts (cryptographic signatures so you can trust what you're running); and rollout telemetry so every action is traceable. Multiple bundles can now live in one environment side by side.

Phase C: Credentials, preflight & messaging [Done]

The two-mode credentials contract became real, plus "preflight" checks that confirm an environment is ready before you deploy, env-pack setup wizards, and scoped message routing (so an incoming chat message reaches exactly the right deployed worker). This is what made the model usable end-to-end, not just on paper.

Phase D: The real cloud (Kubernetes, for Zain) [In progress]

This is where the model meets a real cloud. Kubernetes is first because the customer Zain runs there. Phase D is itself a train of small PRs — detailed in the next section. AWS comes second, reusing the exact same contract.

Still ahead [Next]

Finish the Kubernetes slice end-to-end on a real test cluster, then bring AWS, then GCP, Azure and Snap/Juju online — each one is just another env-pack against the same proven contract.

05 Where we are right now: the Kubernetes slice

Phase D ships as a careful sequence. Two big foundations landed first, then the actual Kubernetes deployer.

Foundation 1 — A "driving test" for every deployer

Before building any cloud deployer, we wrote a shared conformance suite: a set of automated checks that every deployer (Kubernetes, AWS, anything future) must pass — things like "deploying the same thing twice is safe," "traffic always adds up to 100%," and "you can't accidentally affect another customer's deployment." A reference local-process deployer proves the test works. Kubernetes and AWS both have to pass the identical test, so they can't quietly behave differently.
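
As a rough picture of how one suite can be shared, each check below takes a deployer factory and knows nothing about the cloud behind it. The `InMemoryDeployer` stand-in and the check names are hypothetical; only the two properties being checked come from the text.

```python
class InMemoryDeployer:
    """Stand-in deployer used only to exercise the checks below."""

    def __init__(self) -> None:
        self.revisions: set[str] = set()
        self.traffic: dict[str, int] = {}

    def deploy(self, revision_id: str) -> None:
        self.revisions.add(revision_id)  # set semantics make repeats harmless

    def set_traffic(self, weights: dict[str, int]) -> None:
        if sum(weights.values()) != 100:
            raise ValueError("traffic must add up to 100")
        self.traffic = dict(weights)


def check_idempotent_deploy(make_deployer) -> None:
    d = make_deployer()
    d.deploy("r1")
    snapshot = set(d.revisions)
    d.deploy("r1")  # deploying the same thing twice must change nothing
    assert d.revisions == snapshot, "second deploy changed state"


def check_traffic_sums_to_100(make_deployer) -> None:
    d = make_deployer()
    d.deploy("r1")
    try:
        d.set_traffic({"r1": 90})
    except ValueError:
        return  # the deployer correctly refused the split
    raise AssertionError("deployer accepted a split that is not 100%")


CONFORMANCE_CHECKS = [check_idempotent_deploy, check_traffic_sums_to_100]

for check in CONFORMANCE_CHECKS:
    check(InMemoryDeployer)
```

A Kubernetes or AWS deployer would be passed to the same `CONFORMANCE_CHECKS` list through its own factory, which is what keeps the providers from drifting apart.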

Foundation 2 — A safe "filing cabinet" for environment state

Production can't keep its truth in plain local files. So we built the operator store server — a small service that records the real state of every environment with the guarantees production needs: locks so two people editing at once can't corrupt anything, idempotency so a retried request doesn't double-apply, role-based access so only authorized people can change things, an append-only audit log, and backup & restore. The deployer talks to it over a clean HTTP contract, so the same commands work whether state lives on your laptop or in the cloud.
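
Two of those guarantees, the version check that stops concurrent edits from clobbering each other and the idempotency replay, fit in a few lines. This is an in-memory sketch with invented names (`EnvStore`, `update`, `StaleVersion`); the real server speaks HTTP and persists its state.

```python
class StaleVersion(Exception):
    """Raised when the caller's view of the environment is out of date."""


class EnvStore:
    def __init__(self) -> None:
        self.version = 0
        self.state: dict = {}
        self._replays: dict[str, int] = {}  # idempotency key -> resulting version
        self.audit: list[tuple[int, str, dict]] = []  # only ever appended to

    def update(self, expected_version: int, key: str, actor: str, patch: dict) -> int:
        if key in self._replays:  # retried request: replay the result, do not re-apply
            return self._replays[key]
        if expected_version != self.version:  # compare-and-swap guard
            raise StaleVersion(f"expected {expected_version}, store is at {self.version}")
        self.state.update(patch)
        self.version += 1
        self.audit.append((self.version, actor, dict(patch)))
        self._replays[key] = self.version
        return self.version


store = EnvStore()
first = store.update(0, "req-1", "alice", {"traffic": {"r1": 100}})
retry = store.update(0, "req-1", "alice", {"traffic": {"r1": 100}})  # same key: replayed
```

The retry returns the same version as the first call and leaves a single audit entry, while a second writer still holding version 0 under a new key would get `StaleVersion`.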

The Kubernetes deployer itself (the PR-5 train)

Step	What it delivers	Status
5.0 · Scaffold	The deterministic half: code that turns an environment into the exact Kubernetes blueprints (namespace, the always-on "router," a worker per version, networking rules, security hardening), plus the permission checks and a minimal-rights "rules pack" for the cluster admin. No live cluster calls yet, but every blueprint is computed and tested.	Merged
5.1 · `op env render`	A command that prints those Kubernetes blueprints to a folder without applying them. This lets the customer choose how to deploy: apply directly, hand the files to their own GitOps pipeline, or review them first. It safely cleans up stale files so old, retired versions can't accidentally come back.	Merged
5.2 · Typed Kubernetes client	The piece that actually talks to a real Kubernetes cluster: it applies the blueprints (server-side, declaratively), deletes cleanly, and runs real permission probes ("am I allowed to do exactly these operations?"). It is fully unit-tested against a mock cluster at the network level, so the tests need no real cluster.	Open (#304)
5.3 · Wire it up + real-cluster test	Connect the client to the deploy commands (so `warm`, `traffic set` etc. actually drive the cluster), thread the wizard answers through, and prove the whole thing end-to-end on a throwaway local Kubernetes cluster (kind).	Next

👉 The PR open right now (#304)

It adds the real Kubernetes client behind the seams the scaffold left open: a cluster half that applies and deletes Kubernetes objects safely and idempotently, and a validator half that asks the cluster "do I have exactly the permissions I claim to need?" A deliberately small, closed routing table maps each blueprint type to the right Kubernetes address — so an unexpected object type is caught as a bug instead of guessing a wrong URL. Everything is proven with mock-network tests; binding it to live deploy commands and the real-cluster run is the very next step (5.3).
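
The closed routing table idea can be sketched as a fixed dictionary with no fallback. The paths follow the standard Kubernetes REST layout for these kinds; which kinds the real client covers, and the `collection_url` helper itself, are assumptions.

```python
# Collection paths per kind, following the standard Kubernetes REST layout.
_ROUTES = {
    "Namespace": "/api/v1/namespaces",
    "ConfigMap": "/api/v1/namespaces/{ns}/configmaps",
    "Service": "/api/v1/namespaces/{ns}/services",
    "Deployment": "/apis/apps/v1/namespaces/{ns}/deployments",
    "NetworkPolicy": "/apis/networking.k8s.io/v1/namespaces/{ns}/networkpolicies",
    "PodDisruptionBudget": "/apis/policy/v1/namespaces/{ns}/poddisruptionbudgets",
}


def collection_url(kind: str, namespace: str) -> str:
    """Map a blueprint kind to its API path; unknown kinds are an error, never a guess."""
    if kind not in _ROUTES:
        raise ValueError(f"unsupported kind {kind!r}: add it to the routing table")
    return _ROUTES[kind].format(ns=namespace)


url = collection_url("Deployment", "zain-prod")
```

Because there is no "derive the URL from the kind name" branch, a new blueprint type fails loudly in tests until someone adds its route on purpose.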

06 Every real change, by phase

The concrete work: actual pull requests, in plain words. PR numbers refer to GitHub pull requests. Many milestones were several PRs across several repos; the key ones are listed.

Phases: 4 done, 1 in progress
Pull requests merged: 100+
Repositories touched: ~15
First cloud target: K8s (for Zain)

Most of the work lives in greentic-deployer (the engine) and its inner greentic-deploy-spec library (the shared vocabulary). The rest threads the new model through greentic-operator, greentic-start, greentic-runner, greentic-bundle, greentic-setup, greentic-config, greentic-distributor-client, greentic-telemetry, greentic-types, greentic-interfaces, greentic-qa, greentic-secrets and greentic-dw.

Done

Phase 0 — Lock the doors (security hotfix)

Close existing leaks before rebuilding on top of them.

What it did (plain words)	PRs
Stop secrets from ever being written into a shipped bundle archive.	bundle#112 setup#109
A scanner + automated build gate that fails if a plaintext secret leaks into a bundle.	bundle#115
Keep non-secret config working, but mark the old plaintext path as deprecated.	setup#111 start#160 operator#70
Harden the "unzip" code so a malicious package can't escape its folder or abuse symlinks — and replace a risky external `unsquashfs` tool with safe in-process extraction.	bundle#116 start#158 setup#110

Done

Phase A — Foundations (the new vocabulary)

The object model, storage, command surface and security stance that everything else builds on.

Milestone — what it did (plain words)	Key PRs
A1 · The vocabulary. A dedicated `greentic-deploy-spec` library defining Environment, Revision, Traffic Split, Deployment and Credentials — the shared language for everything.	deployer#196
A2 · Safe storage. How environment state is written to disk safely: atomic writes, a lock per environment, and a backup before every change.	#197 #198
A3 · The operator commands. The whole `gtc op` surface: env, env-packs, bundles, revisions, traffic, credentials, secrets, config.	deployer#200 operator#71 greentic#220
A4 · A default "local" environment. Auto-create a `local` environment with sensible default plug-ins on first setup; heal it if plug-ins are missing.	#204 #206 #216
A4b · Retire the old "dev" label. A checked, one-shot migration from the old `dev` string to `local`, with a temporary warning alias if anything still uses it.	deployer#207 config#43 +4
A5 · Revision state machine. The staged→warming→ready→draining→archived lifecycle with safe transitions, plus a guard that refuses to archive a revision still serving traffic.	dist-client#146 #208 #210
A6 · State layout migration. A one-shot, fail-loud migration of the old on-disk state layout to the new env-pack layout.	deployer#211
A7 · Audit + deny-by-default. An append-only audit log and an authorization gate on every state-changing command — non-local changes fail closed until allowed.	deployer#213
A8 · The remote contract (design). The HTTP contract a production store would speak: compare-and-swap, idempotency replay, role-based access, audit shape, backup/restore, corruption detection.	deployer#214 operator#74
A9 · The plug-in registry. Providers (deployers, secret stores, etc.) register by name — adding one never changes core code.	deployer#215
A10 · Environment-aware wizards. Every setup wizard learned the environment it runs in, and secret answers now route to the environment's secret store.	qa#46 bundle#117 +3
Preflight checks. Verify the right CLI versions, cloud auth and cluster access (not just "is the binary installed") before deploying — with honest "install X with Y" errors.	deployer#217
Slim, secure images. Tiny "distroless" container images — non-root, statically linked, under 30 MB.	start#166 deployer#219 +2
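
The A5 lifecycle and its archive guard can be sketched as a small transition table. This assumes a strictly linear path through the five states named above; the real state machine may allow other transitions (for example on failure), and `advance` is a hypothetical helper.

```python
# The five states named in A5, in order.
NEXT_STATE = {
    "staged": "warming",
    "warming": "ready",
    "ready": "draining",
    "draining": "archived",
}


def advance(state: str, traffic_percent: int = 0) -> str:
    """Move a revision one step forward, refusing unsafe transitions."""
    if state not in NEXT_STATE:
        raise ValueError(f"no transition out of {state!r}")
    target = NEXT_STATE[state]
    if target == "archived" and traffic_percent > 0:
        raise ValueError("refusing to archive a revision that still serves traffic")
    return target


state = "staged"
for _ in range(4):
    state = advance(state)  # staged -> warming -> ready -> draining -> archived
```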

Done

Phase B — The moving parts (rollouts become real)

The machinery that actually serves traffic across versions, plus signing and telemetry.

Milestone — what it did (plain words)	Key PRs
B0 · Multi-bundle runtime config. Load runtime config that holds many deployments/bundles in one environment; boot without a bundle folder.	start#167
B1 · The dispatcher (the dial). The in-process router that picks which revision each request hits — by trusted header, signed cookie, session pin, or weighted random.	start#168
B2 · Many revisions side by side. Re-key the running-pack index by (tenant, deployment, bundle, revision) so multiple revisions coexist in one runtime.	runner#345
B3 · Per-revision routing. The ingress actually routes each incoming request to the chosen revision.	start#170
B4 · Make it live + Ready-only traffic. The producer that turns traffic splits into live runtime config, and enforces that only Ready revisions receive traffic.	deployer#221 operator#80
B5–B6 · Hand-typeable traffic + sticky sessions. Make `gtc op traffic` ergonomic; add a pluggable session-pin store (in-memory or Redis).	deployer#222 start#171
B7 · Blue-green drain. A drain/evict model so an old revision finishes its in-flight sessions before it's torn down.	runner#346 start#172
B8–B9 · Static routing + health gate. Mirror revision routing on the static route table; only mark a revision Ready after it passes health checks.	operator#81 deployer#223 start#175
B10 · Per-customer billing + revenue share. Each deployment carries a billing customer; revenue-share changes write a signed, versioned policy document.	deployer#224
B12 · Secret references, not values. Bundles carry `secret://` references; secret answers are dropped from plaintext setup files and written as refs.	bundle#121 start#179 deployer#225
Signed artifacts. Real cryptographic (DSSE / Ed25519) signature verification of bundles and pack lists, with a per-environment trust root that fails closed.	dist-client#152 bundle#122 deployer#226
Rollout telemetry. Stamp every execution with pack/rollout IDs and export them to tracing, so every action is traceable.	telemetry#72 deployer#228 runner#366
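
For B1, a minimal sketch of how a dispatcher might choose a revision. Treating the four mechanisms as a precedence order (header, then signed cookie, then session pin, then weighted random) is an assumption drawn from the order they are listed in; the cookie format and all function names are invented for illustration.

```python
import hashlib
import hmac
import random
from typing import Optional


def sign(revision_id: str, key: bytes) -> str:
    """Build a cookie value of the form '<revision>.<hex HMAC>' (hypothetical format)."""
    mac = hmac.new(key, revision_id.encode(), hashlib.sha256).hexdigest()
    return f"{revision_id}.{mac}"


def verify(cookie: str, key: bytes) -> Optional[str]:
    """Return the revision named in the cookie, or None if the signature is wrong."""
    revision_id, _, mac = cookie.rpartition(".")
    expected = hmac.new(key, revision_id.encode(), hashlib.sha256).hexdigest()
    return revision_id if hmac.compare_digest(mac.encode(), expected.encode()) else None


def pick_revision(split, *, header=None, cookie=None, pinned=None, key=b"", rng=random):
    """Return the revision for one request, given the current traffic split."""
    candidates = [header, verify(cookie, key) if cookie else None, pinned]
    for choice in candidates:
        if choice in split:  # explicit choices must name a revision in the split
            return choice
    ids = list(split)
    return rng.choices(ids, weights=[split[i] for i in ids], k=1)[0]


key = b"demo-key"
split = {"r1": 100, "r2": 0}
by_header = pick_revision(split, header="r2")                        # explicit override
by_cookie = pick_revision(split, cookie=sign("r2", key), key=key)    # valid signature
by_weight = pick_revision(split, cookie="r2.forged", key=key)        # falls through to the dial
```

With the dial at 100/0, only an explicit and valid override reaches `r2`; a forged cookie is ignored and the request lands on `r1`.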

Done

Phase C — Credentials, runtime config & wizards

What made the model usable end-to-end, not just on paper.

Milestone — what it did (plain words)	Key PRs
C1 · Credentials contract. The two-mode credentials model + `gtc op credentials` commands (requirements / bootstrap / rotate).	deployer#251
C2 · Reference deployer creds. The local-process credentials implementation other deployers copy.	deployer#252
C3 · AWS credentials. AWS-ECS credentials: a real identity check (STS) plus minimal-permission IAM bootstrap.	deployer#253
C4 · Runtime-config channel. A channel so running components can read non-secret runtime config provided by the environment.	interfaces#154 runner#421 start#240
C5 · Resolve `runtime://` refs. Resolve runtime references (e.g. a load-balancer URL discovered after deploy) against the environment's discovered values, with hot reload.	runner#424 start#241 deployer#254
C6 · Wizard questions per env-pack. Each env-pack can ship its own setup questions, surfaced through one shared seam.	deployer#255
C7 · Secret vs non-secret answers. Split wizard answers into secret-references and non-secret channels, finalized into per-pack config when a revision is staged.	bundle#127 setup#135 deployer#256 +1

In progress

Phase D — The real cloud (Kubernetes, for Zain)

First the two foundations every production cloud needs, then the actual Kubernetes deployer.

Foundations: the shared "driving test" and the production store

What it did (plain words)	Key PRs
Conformance suite. A shared safety test every deployer must pass (idempotency, traffic sums to 100%, no cross-customer interference) + a reference local-process deployer.	deployer#257
Typed verbs over HTTP (PR-3 train). Turn the on-disk store's internal closures into ~28 typed verb methods, add an HTTP client that speaks the A8 contract, and pick local-vs-remote store at runtime.	#259…#275 #276
The store server (PR-4 train). The production "filing cabinet" service: every verb group remote end-to-end, an idempotency replay ledger, a durable audit log, role-based access, and backup/restore, covering all six A8 guarantees.	#278 #286…#295 #296
Declarative `gtc op env apply`. Describe a whole environment in one manifest file and apply it, with a `--check` gate for CI convergence.	#279 #282 #284

The Kubernetes deployer itself (the PR-5 train)

Step — what it did (plain words)	PR	Status
5.0 · Scaffold. Turn an environment into the exact Kubernetes blueprints — namespace, an always-on router, a worker per version, networking rules, "Restricted" security hardening — plus permission checks and a minimal-rights rules pack for the cluster admin. All computed and tested; no live cluster calls yet.	#297	Merged
5.1 · `op env render`. Print those blueprints to a folder without applying — so the customer can apply directly, hand them to GitOps, or review first. Safely deletes stale files so a retired version can't sneak back via `kubectl apply -f`.	#298	Merged
5.2 · Typed Kubernetes client. The piece that talks to a real cluster: applies blueprints (declarative server-side apply), deletes cleanly and idempotently, and runs real permission probes ("am I allowed exactly these operations?"). Fully tested against a mock cluster at the network level. A small closed routing table maps each blueprint type to the right address, so an unexpected type is caught as a bug rather than guessing a URL.	#304	Open now
5.3 · Wire it up + real-cluster test. Connect the client to the deploy commands (so `warm`, `traffic set` drive the cluster), thread the wizard answers through, and prove the whole flow end-to-end on a throwaway local Kubernetes cluster (kind).	—	Next
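
The stale-file cleanup in step 5.1 matters because a leftover manifest for a retired revision would be re-applied along with the rest of the folder. A minimal sketch, assuming one `.yaml` file per object and a hypothetical `render` function (the real command's file layout is not described here):

```python
from pathlib import Path


def render(out_dir: Path, manifests: dict[str, str]) -> list[str]:
    """Write manifests to out_dir and delete previously rendered .yaml files
    that are no longer part of the environment. Returns the pruned file names."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, text in manifests.items():
        (out_dir / name).write_text(text)
    pruned = []
    for path in sorted(out_dir.glob("*.yaml")):
        if path.name not in manifests:
            path.unlink()  # a retired revision must not come back on the next apply
            pruned.append(path.name)
    return pruned
```

Rendering `router.yaml` and `worker-r1.yaml`, then rendering again with `worker-r2.yaml` in place of `worker-r1.yaml`, leaves only the router and the `r2` worker in the folder.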

What the Kubernetes deployer produces

For one environment it renders a namespace, one always-on router (2+ replicas, with a disruption budget) that owns the traffic dial, one worker per revision (labelled with its revision ID and carrying its identity as environment variables), a runtime-config map the router reloads, and network policies that deny everything except the allow-list. Every pod is hardened to the "Restricted" security profile: non-root, no privilege escalation, read-only filesystem, dropped capabilities, resource limits. The router — not Kubernetes' own routing — is the single source of truth for the split.
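
For orientation, the hardening listed above maps onto standard Kubernetes container fields roughly as follows. The field names are real Kubernetes ones; the label key, environment variable names, image and resource numbers are placeholders, and `seccompProfile` is included because the upstream "Restricted" profile requires it even though the text does not list it.

```python
def worker_pod_template(env_name: str, revision_id: str, image: str) -> dict:
    """Build a pod template for one revision's worker (illustrative values only)."""
    container = {
        "name": "worker",
        "image": image,
        # Identity passed as environment variables; these variable names are made up.
        "env": [
            {"name": "EXAMPLE_ENVIRONMENT", "value": env_name},
            {"name": "EXAMPLE_REVISION_ID", "value": revision_id},
        ],
        "securityContext": {
            "runAsNonRoot": True,               # non-root
            "allowPrivilegeEscalation": False,  # no privilege escalation
            "readOnlyRootFilesystem": True,     # read-only filesystem
            "capabilities": {"drop": ["ALL"]},  # dropped capabilities
            "seccompProfile": {"type": "RuntimeDefault"},
        },
        "resources": {"limits": {"cpu": "500m", "memory": "256Mi"}},  # placeholder limits
    }
    return {
        "metadata": {"labels": {"example.io/revision": revision_id}},  # hypothetical label key
        "spec": {"containers": [container]},
    }


template = worker_pod_template("zain-prod", "r2", "registry.example/worker:r2")
```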

07 Glossary: the words in plain English

Keep this handy; every term above maps to something simple.

Environment (where): A "place" where workers run, such as local, dev, or zain-prod. The system now fully understands its abilities, which are supplied by plug-ins.
Env-pack (plug-in): A capability you snap into an environment: a deployer, a secrets store, a telemetry exporter, a sessions backend. One per slot. New ones are added by publishing, not recompiling.
Bundle (workload): The actual automation you run: the customer's digital worker. Many bundles can share one environment.
Revision (snapshot): A frozen, never-changing record of one deployed version. Like a numbered photograph: you never edit it, you just take a new one.
Traffic Split (the dial): The setting that decides what share of users each revision receives. Moving it is separate from deploying, and it can be reverted instantly.
Deployment (per customer): One customer's specific running copy of a bundle in an environment. Two customers can run the same bundle with separate histories and rollback.
Credentials (keys): The access the deployer needs. Either you supply a minimal key that it verifies, or you supply a master key once, which it uses to mint a minimal key and produce a rules pack for your admin.
Conformance suite (driving test): A shared set of automated checks every cloud deployer must pass, so Kubernetes and AWS can't behave differently in dangerous ways.
Operator store server (filing cabinet): The production-grade service that safely records environment state, with locks, audit log, access control, and backups.
Kubernetes, K8s (the cloud): A widely used system for running containerized apps. It's Greentic's first real cloud target, because the customer Zain runs there.
Router & worker (K8s shape): On Kubernetes, one always-on router receives all traffic and applies the dial; each version runs as its own worker. The router, not Kubernetes' own routing, is the source of truth for the split.

The one-sentence takeaway

We replaced a system that could only overwrite one version in place with a real, cloud-agnostic deployment platform — where environments are first-class, deploying and going-live are separate dials, every provider passes the same safety test, and Kubernetes (for Zain) is the first real cloud now coming online.