From the Trenches: Onboarding as Code
From the Trenches is a series about real distributed systems, built under real production load, that I’ve been fortunate enough to work on as part of the team. This first post is about a system that lets developers and PMs spin up new onboarding workflows with simple JSON-based configs.
This story comes from a payments company I used to work at, where we operated multiple products across different geographies, each with variations in its onboarding flow. The system needed to support shared components across journeys while allowing region-specific customizations. The bigger constraint came from our product managers, who wanted to reorder onboarding steps and run A/B tests without waiting for engineering cycles.
Onboarding as Code — why this exists
Hard-coding workflows meant every experiment required developer time and deployment overhead. Hence, we built a JSON-based configuration system where PMs could modify component order, swap in different variants, and adjust copy through config files. This moved most onboarding changes from the engineering backlog to self-service configuration.
Design goals (constraints we held the line on)
- Change config, not code. Flip a template, not a cluster.
- Deterministic & testable. Every transition is explicit; no “magic ifs” buried in handlers.
- Blame-free retries. Idempotent writes; zero duplicate KYC calls.
- Small blast radius. Versioned templates, canary rollout, instant rollback.
- Observability by default. Per-component funnels, latency SLIs, audit events.
System at a glance
A JSON-templated onboarding engine backed by a reusable SDK:
- Versioned templates define workflows, components, and milestones; the SDK compiles them into a state machine.
- Explicit navigation (`next`, `next_if`) keeps paths clear and verifiable.
- Actions via outbox (workers, retries, timeouts) keep side-effects off the request path.
- Separation of concerns. The onboarding service owns flow/state; domain services own verification.
- First-class telemetry. Consistent events/metrics to spot drop-offs and regressions fast.
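To make the "explicit navigation" point concrete, here is a minimal sketch of how `next` and `next_if` could be resolved and statically checked. The component ids and the `field` / `equals` / `goto` shape of a `next_if` rule are assumptions for illustration; the original template schema may well differ.

```python
from typing import Any, Optional

# Hypothetical template fragment. The component ids and the exact shape of
# `next_if` (field / equals / goto) are assumptions, not the original schema.
COMPONENTS: dict[str, dict[str, Any]] = {
    "personal_details": {"next": "business_type"},
    "business_type": {
        "next_if": [
            {"field": "business_type", "equals": "company", "goto": "company_documents"},
        ],
        "next": "bank_account",  # fallback when no condition matches
    },
    "company_documents": {"next": "bank_account"},
    "bank_account": {},  # terminal component: no outgoing transition
}


def resolve_next(components: dict, component_id: str, data: dict) -> Optional[str]:
    """Return the next component id, or None when the flow ends."""
    component = components[component_id]
    for rule in component.get("next_if", []):
        if data.get(rule["field"]) == rule["equals"]:
            return rule["goto"]
    return component.get("next")


def validate_navigation(components: dict) -> list[str]:
    """Static check: every transition must point at a known component."""
    errors = []
    for component_id, component in components.items():
        targets = [rule["goto"] for rule in component.get("next_if", [])]
        if "next" in component:
            targets.append(component["next"])
        for target in targets:
            if target not in components:
                errors.append(f"{component_id} -> {target}: unknown component")
    return errors
```

Because every transition is plain data, a check like `validate_navigation` can run in CI whenever a template changes, which is one way to get the "verifiable" property without deploying anything.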
Architecture
Happy-path flow
- Client → LB → PGOS (API service embedding the OBS SDK).
- SDK loads a workflow template, creates an instance, persists state in PGOS DB.
- `PUT` saves data, validates, computes next component, enqueues side-effects (if any).
- Workers execute outbox actions (e.g., KYC/bank verification) against domain services.
- Repeat until all milestones complete; audit + metrics emitted throughout.
Core concepts
- Workflow: Named template parameterized by `metadata` (country, product, locale, version), composed of milestones and components.
- Component: Smallest interactive unit (form/upload/review/custom). Holds fields, validation, optional `meta` (defaults/options), and `steps` that trigger actions when completed.
- Milestone: Labeled group of components used for progress and gating.
- Navigation: `start`, `next`, or `next_if` (conditional) control transitions.
- Actions: `event`, `service_call`, or `job` executed via outbox.
- Sync: `sync_to_account_service` toggles writes to external profile systems.
Final API Endpoints
Endpoint | Method | Description |
---|---|---|
`/api/workflows` | `POST` | Creates a new workflow and returns the initial state. |
`/api/workflows/{workflow_id}` | `GET` | Retrieves the current state of a workflow, including progress and the current component. |
`/api/workflows/{workflow_id}` | `PUT` | Saves the state of the current component, validates fields, and returns the next component or errors. |
SDK flow with initialization (tiny samples)
1) Init
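A minimal sketch of what initialization might look like, assuming the SDK is handed a set of templates and a state store. `OnboardingSDK`, `WorkflowInstance`, `create_workflow`, and the template values are hypothetical names chosen for illustration; a plain dict stands in for the PGOS DB.

```python
import uuid
from dataclasses import dataclass, field

# Hypothetical template; only the key names (metadata, start) come from the
# concepts described above, the values are placeholders.
TEMPLATE = {
    "metadata": {"country": "XX", "product": "example_product", "locale": "en", "version": "v1"},
    "start": "personal_details",
}


@dataclass
class WorkflowInstance:
    workflow_id: str
    template_key: tuple
    current_component: str
    data: dict = field(default_factory=dict)


class OnboardingSDK:
    def __init__(self, templates, store):
        # Index templates by their full metadata so lookup is a dict access.
        self.templates = {self._key(t["metadata"]): t for t in templates}
        self.store = store  # a dict here; a database in a real service

    @staticmethod
    def _key(metadata: dict) -> tuple:
        return tuple(sorted(metadata.items()))

    def create_workflow(self, metadata: dict) -> WorkflowInstance:
        """Pick the matching template and persist a fresh instance."""
        key = self._key(metadata)
        template = self.templates[key]  # KeyError if no template matches
        instance = WorkflowInstance(
            workflow_id=str(uuid.uuid4()),
            template_key=key,
            current_component=template["start"],
        )
        self.store[instance.workflow_id] = instance
        return instance
```

In this sketch, a `POST /api/workflows` handler would call `create_workflow` with the request's metadata and return the instance's initial state.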
2) Save workflow (pseudocode)
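The save path described for `PUT` (validate, persist, enqueue side-effects, compute the next component) could be sketched like this. The template shape, the `save_workflow` name, and the error strings are assumptions; a list stands in for the outbox and only the unconditional `next` transition is handled.

```python
import re

# Hypothetical template fragment; field rules and action names are illustrative.
TEMPLATE = {
    "components": {
        "personal_details": {
            "fields": {"email": {"required": True, "regex": r"^[^@\s]+@[^@\s]+$"}},
            "steps": [{"action_on_complete": {"type": "service_call", "name": "kyc_check"}}],
            "next": "bank_account",
        },
        "bank_account": {"fields": {}, "steps": []},
    },
}


def validate(component: dict, payload: dict) -> dict:
    """Return a field -> message map; empty when the payload is valid."""
    errors = {}
    for name, rule in component["fields"].items():
        value = payload.get(name)
        if rule.get("required") and value in (None, ""):
            errors[name] = "required"
        elif value is not None and "regex" in rule:
            if not re.fullmatch(rule["regex"], str(value)):
                errors[name] = "invalid format"
    return errors


def save_workflow(state: dict, payload: dict, outbox: list) -> dict:
    component_id = state["current_component"]
    component = TEMPLATE["components"][component_id]

    errors = validate(component, payload)
    if errors:
        # Stay on the same component and report what failed.
        return {"current_component": component_id, "validation_errors": errors}

    state["data"][component_id] = payload
    for step in component["steps"]:
        # Side-effects are only enqueued here; workers run them later.
        outbox.append({"workflow_id": state["workflow_id"], **step["action_on_complete"]})

    state["current_component"] = component.get("next")
    return {"current_component": state["current_component"], "validation_errors": {}}
```

In a real service the state write and the outbox insert would share one database transaction, so a failed request cannot leave an enqueued action without the matching state change.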
Example workflow config (trimmed)
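As a rough illustration of how the concepts fit together in one template, here is a hypothetical config written as a Python dict and serialized to JSON. The key names (`metadata`, `start`, `next`, `meta`, `steps`, `action_on_complete`, `sync_to_account_service`) follow the concepts described earlier; the nesting, component ids, and all values are assumptions.

```python
import json

# Hypothetical template: every value below is a placeholder.
WORKFLOW_TEMPLATE = {
    "metadata": {"country": "XX", "product": "example_product", "locale": "en", "version": "v2"},
    "sync_to_account_service": True,
    "start": "personal_details",
    "milestones": [
        {"name": "identity", "components": ["personal_details", "id_upload"]},
        {"name": "payout", "components": ["bank_account"]},
    ],
    "components": [
        {
            "id": "personal_details",
            "component_name": "form",
            "fields": [{"name": "full_name", "required": True}],
            "meta": {"defaults": {"country": "XX"}},
            "next": "id_upload",
        },
        {
            "id": "id_upload",
            "component_name": "upload",
            "fields": [{"name": "id_document", "required": True}],
            "steps": [{"action_on_complete": {"type": "service_call", "name": "kyc_check"}}],
            "next": "bank_account",
        },
        {
            "id": "bank_account",
            "component_name": "form",
            "fields": [{"name": "account_number", "required": True, "regex": "^[0-9]{9,18}$"}],
            "steps": [{"action_on_complete": {"type": "job", "name": "bank_verification"}}],
        },
    ],
}

# The JSON form is what would live in the config file PMs edit.
TEMPLATE_JSON = json.dumps(WORKFLOW_TEMPLATE, indent=2)
```

Reordering steps in a layout like this means editing `start` and the `next` pointers, with no handler code involved.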
Sample GET workflow response
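The `GET` endpoint is described as returning progress and the current component. A minimal sketch of how such a response could be assembled from milestones is below; the response field names (`progress`, `percent`, `milestones`) and the function name are assumptions, not the original payload.

```python
def build_workflow_response(workflow_id, milestones, completed, current_component):
    """Assemble a state response with overall and per-milestone progress.

    `milestones` is a list of {"name", "components"} dicts and `completed`
    is the set of component ids the user has already finished.
    """
    total = sum(len(m["components"]) for m in milestones)
    done = sum(1 for m in milestones for c in m["components"] if c in completed)
    return {
        "workflow_id": workflow_id,
        "current_component": current_component,
        "progress": {
            "completed_components": done,
            "total_components": total,
            "percent": round(100 * done / total) if total else 0,
        },
        "milestones": [
            {
                "name": m["name"],
                "completed": all(c in completed for c in m["components"]),
            }
            for m in milestones
        ],
    }
```

For example, with an `identity` milestone of two finished components and a `payout` milestone of one unfinished component, this reports 2 of 3 components done and marks only `identity` as completed.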
Sample PUT workflow response
Scenario: Saving the state of the current component
Request Payload:
Response
Runtime behavior & extensibility (quick notes)
- SDK enforces required fields/regex, applies `meta.defaults`, and resolves `meta.options` dynamically.
- On `PUT` success, SDK computes `next` or evaluates `next_if` to decide transitions.
- Transitions update progress and emit `component.saved`, `component.validated`, `milestone.completed`.
- Failures return `validation_errors` and keep the user on the same component.
- Add a new component by implementing it in the SDK and referencing its `id`/`component_name` in the workflow JSON.
- Introduce a new action type by adding an adapter (e.g., to call a domain service) and referencing it in `steps[].action_on_complete`.
- Version templates via `metadata.version` and keep multiple templates hot-loaded for A/B or country variants.
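With several template versions loaded at once, something has to decide which version a given user sees. One common approach, sketched here as an assumption rather than a description of the original system, is deterministic hash bucketing: the same user always lands on the same version, and the weights control the split. The function name, the experiment label, and the weight format are hypothetical.

```python
import hashlib


def pick_version(user_id: str, experiment: str, weights: dict[str, int]) -> str:
    """Deterministically map a user to a template version.

    `weights` maps a version label (e.g. a `metadata.version` value) to its
    relative share of traffic; shares do not need to sum to 100.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")

    # Hash experiment + user so different experiments bucket independently.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % total

    cumulative = 0
    for version, weight in sorted(weights.items()):  # sorted for a stable order
        cumulative += weight
        if bucket < cumulative:
            return version
    raise AssertionError("unreachable: bucket is always below the total weight")
```

A canary rollout under this scheme is a weight change such as `{"v1": 95, "v2": 5}`, and rolling back is setting the new version's weight to 0, which fits the "change config, not code" goal.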