Build Strategy

Top-Down vs Bottom-Up

Two approaches to building enterprise RL environments. One starts from reality and strips away what's sensitive. The other starts from imagination and tries to add realism. The difference matters.

01 — Two Approaches

Starting from reality vs. starting from scratch.

Top-Down

Recommended — start here

Take a real project you've already delivered. Strip all PII, PHI, client names, and proprietary data. What remains — the PRDs, stories, Slack debates, code, reviews, test plans — becomes the environment. It's real because it came from reality.

Pros

  • Artifacts are authentically messy — real debates, real pushback, real constraints
  • Contains the "unknown unknowns" — edge cases that are hard to invent from scratch
  • Much faster to produce at scale — the content already exists; you're editing, not creating
  • Genuinely out-of-distribution — these patterns don't exist in public training data
  • Higher value to AI labs because the texture is real, not synthetic

Cons

  • Requires access to past project artifacts (Jira, Confluence, Slack, repos)
  • Sanitization is non-trivial — must be thorough to avoid IP/privacy violations
  • Needs legal review to confirm sanitized output is safe to commercialize
  • Some projects may not survive sanitization — if the domain details ARE the interesting part, removing them removes the value
  • Client contracts may restrict even sanitized derivative use

Bottom-Up

Use to fill gaps

Start from scratch. Domain experts and engineers design what an enterprise environment should look like and craft every artifact — original PRDs, synthetic code, authored team discussions — drawing on their collective experience of how enterprises operate.

Pros

  • Can start immediately without needing access to past project data
  • Full creative control over what the environment tests
  • Zero risk of accidentally leaking real client information
  • Can target known model weaknesses by design
  • No legal review needed — everything is original creation

Cons

  • Artifacts can feel too polished — may lack the organic texture of real projects
  • Harder to capture "unknown unknowns" — the unexpected workarounds and institutional knowledge
  • More time per environment — creating from scratch takes more effort than sanitizing existing work
  • Scaling is slower — each environment requires significant creative investment
  • AI-assisted content generation can produce patterns already in training data, which may reduce out-of-distribution value

02 — Top-Down Process

From real project to packaged environment.

The top-down process takes a completed enterprise engagement and systematically transforms it into a sellable RL training environment.

Top-Down Pipeline

5 Steps
01
Select Project
Pick a completed engagement that represents a common enterprise pattern. Ideally one with good artifact retention — Jira history, Confluence pages, Slack exports, git repos, design docs.
02
Inventory Artifacts
Catalog everything: PRDs, technical scoping docs, ADRs, Jira exports, Slack/Teams threads, design documents, API specs, code repositories, PR reviews, test plans, deployment runbooks, incident reports.
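As a rough illustration of the cataloging step, a short script can group on-disk artifacts by type. The extension map below is invented, and a real inventory would also cover Jira, Confluence, and Slack exports that never touch the filesystem:

```python
from collections import defaultdict
from pathlib import Path

# Hypothetical extension-to-category map; adjust per project.
CATEGORIES = {
    ".md": "docs", ".json": "exports", ".py": "code",
    ".yaml": "infra", ".yml": "infra",
}

def inventory(root: str) -> dict[str, list[str]]:
    """Group every file under `root` by artifact category."""
    catalog = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            catalog[CATEGORIES.get(path.suffix, "other")].append(str(path))
    return dict(catalog)

print({category: len(files) for category, files in inventory(".").items()})
```

The output of this pass becomes the checklist for the sanitization step: every cataloged file must be accounted for before packaging.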
03
Sanitize
Remove all PII (names, emails, SSNs), PHI (medical records, patient data), and client identifiers (company names, project codes, internal URLs, proprietary business logic). Replace each with synthetic equivalents that preserve shape and complexity.
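A minimal sketch of the mechanical part of this step, assuming a regex-based first pass. The patterns and the "AcmeCorp" placeholder are invented; real sanitization also needs NER tooling, allow-lists, and manual review, since regexes alone will miss plenty:

```python
import re

# Hypothetical patterns; a production blocklist would be far broader.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CLIENT": re.compile(r"\bAcmeCorp\b"),  # stand-in client identifier
}

def sanitize(text: str) -> str:
    """Replace each match with a typed placeholder so reviewers can
    see what kind of data was removed."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(sanitize("Ping jane.doe@acmecorp.com about AcmeCorp ticket 123-45-6789"))
```

Typed placeholders (rather than blank deletions) make the later independent review faster: a reviewer can see at a glance what class of data sat where.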
04
Generalize
Abstract client-specific details into industry patterns. "Acme Bank" instead of the real name. Generated account numbers in the same format. Synthetic data with identical structure. The patterns are real; the specifics are not.
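The "same format, different specifics" replacement described above can be sketched as a character-class substitution: digits map to random digits, letters to random letters, and punctuation stays put, so the synthetic value keeps the original's shape and complexity. A seeded generator keeps the output reproducible across builds:

```python
import random

def synthesize_like(value: str, seed: int = 0) -> str:
    """Produce a synthetic value with the same shape as `value`:
    digits stay digits, letters stay letters (case preserved),
    punctuation and separators are untouched."""
    rng = random.Random(seed)
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(str(rng.randrange(10)))
        elif ch.isalpha():
            pool = "ABCDEFGHIJKLMNOPQRSTUVWXYZ" if ch.isupper() else "abcdefghijklmnopqrstuvwxyz"
            out.append(rng.choice(pool))
        else:
            out.append(ch)
    return "".join(out)

print(synthesize_like("GB29-NWBK-6016"))  # same shape, different specifics
```

The same seed always yields the same replacement, which matters when one account number appears in the PRD, the Slack export, and a test fixture and must stay consistent across all three.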
05
Package & Verify
Build Docker environment with all artifacts organized by SDLC phase. Write verification scripts for each stage. Independent review to confirm nothing identifiable remains. Validate that the environment actually tests what matters.
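One piece of the verification step can be automated as a blocklist scan over the packaged artifacts. The patterns below are invented examples; an empty result only means this necessarily incomplete automated check passed, not that the independent human review can be skipped:

```python
import re

# Hypothetical blocklist: identifiers that must never appear in any artifact.
BLOCKLIST = [
    re.compile(r"real-client-name", re.IGNORECASE),  # placeholder for actual names
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),            # SSN-shaped strings
]

def verify_artifacts(artifacts: dict[str, str]) -> list[str]:
    """Return a list of 'path: pattern' findings; an empty list means
    this automated check passed."""
    findings = []
    for path, text in artifacts.items():
        for pattern in BLOCKLIST:
            if pattern.search(text):
                findings.append(f"{path}: matches {pattern.pattern}")
    return findings

sample = {"docs/prd.md": "Acme Bank migration plan", "ops/runbook.md": "contact 123-45-6789"}
print(verify_artifacts(sample))
```

Run as a CI gate on the packaged environment, this turns "nothing identifiable remains" from a hope into a repeatable check that blocks the build on any hit.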

03 — Bottom-Up Process

Building from domain expertise when no project exists.

Use bottom-up for scenarios you know exist in the market but haven't delivered as projects. The quality depends entirely on the depth of domain knowledge applied.

Bottom-Up Pipeline

6 Steps
01
Define Scenario
Domain experts describe the enterprise pattern from experience: "At banks like X, migrating Y typically involves Z." No specific client, just the pattern observed across multiple engagements.
02
Draft Artifacts
Write the PRD, scoping doc, team discussion threads, architecture decisions. Domain experts drive content; engineers handle structure. AI tools help generate volume but experts review for authenticity.
03
Build Codebase
Scaffold the multi-service architecture, database schemas, infrastructure configs. Include realistic technical debt, existing tests (some flaky), and the kind of workarounds that real codebases accumulate.
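To make the "some flaky" point concrete, here is a sketch of the kind of deliberately flaky test one might plant in a synthetic codebase. The function name and the 120 ms deadline are invented; the point is a test whose pass rate depends on a simulated race, not a test that is simply broken:

```python
import random

# A deliberately flaky "existing test": it passes only when simulated
# network latency beats a deadline the original authors hard-coded.
def fetch_with_timeout(rng: random.Random) -> bool:
    latency_ms = rng.uniform(50, 150)  # simulated network latency
    return latency_ms < 120            # hard-coded deadline: passes most runs

# Run it across 100 seeded "CI runs" to show it is flaky, not broken.
passes = sum(fetch_with_timeout(random.Random(seed)) for seed in range(100))
print(f"{passes}/100 simulated runs pass")
```

An agent working in the environment then has to diagnose whether a red build is its own regression or this pre-existing flake, which is exactly the judgment real enterprise work demands.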
04
Add Organizational Context
Create team structure docs, approval workflows, change management process, cross-team dependencies. This is where domain knowledge matters most — generic org context is obvious and low-value.
05
Expert Review
Domain experts review every artifact for authenticity: "A real DBA wouldn't approve this schema," "This Slack thread is too polite — real teams push back harder." Iterate until it passes the smell test.
06
Package & Verify
Same packaging as top-down: Docker environment, verification scripts for each SDLC stage, metadata, documentation.

04 — Recommendation

Start top-down. Fill gaps bottom-up.


Start with top-down for the first 10–20 environments. Real projects give you authentic artifacts fast. The messy reality is the product — the Slack thread where someone says "this won't work because the batch job locks the table" is worth more than a perfectly crafted synthetic scenario.

Then use bottom-up to fill specific gaps — scenarios you know exist in the market but haven't delivered as projects. These bottom-up environments will be informed by patterns you've already seen across the top-down ones, so they'll be more realistic than if you started bottom-up from day one.

The key question for the team: Do we have a completed project — ideally in financial services — where we still have access to the artifacts (Jira, Confluence, Slack, repo, design docs)? That becomes our first environment.

05 — The Product Vision

A machine that turns past work into a new revenue stream.

The end state is a repeatable process — a pipeline that takes a real enterprise project and produces a packaged RL training environment that AI labs will buy.

Input

Completed enterprise project with full artifact history (PRDs, Jira, Slack, code, reviews, tests, runbooks)

Our Pipeline

Inventory, sanitize PII/PHI/IP, generalize, build Docker sandbox, write verification scripts

Output

Packaged RL environment with 8 SDLC layers, multi-stage verification, ready to sell to AI labs