Our Methodology

The Full Enterprise RL Environment

Current RL environments test whether a model can edit code and pass a test. Our environments test whether a model can operate as an enterprise software engineer — navigating requirements, making architectural decisions, and delivering within organizational constraints.

01 — The Gap

What exists today vs. what enterprises need.

Existing RL environments do a great job of training models on coding tasks. We take it further — adding the full enterprise context so models learn to operate as software engineers within real organizational constraints.

Standard RL Environments

The foundation
  • Single Dockerfile with a focused codebase
  • Clear instruction file describing the task
  • Model edits code, runs tests, pass/fail
  • Great for training core coding ability
  • Efficient to produce and evaluate at scale
  • Well-suited for general-purpose model improvement
  • Trains models to write code effectively

Full Enterprise RL Environments

Where we add value
  • Multi-service architecture with real infrastructure
  • Complete SDLC artifacts: PRDs, designs, stories, discussions
  • Model navigates ambiguity, makes decisions, implements
  • Business context drives every technical choice
  • Team discussions, review comments, architectural debates
  • Regulatory and compliance verification built into tests
  • Trains models to be enterprise software engineers
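As a rough sketch of what "multi-service architecture with real infrastructure" means in practice, an environment's docker-compose.yml might wire the services together like this (service names are taken from the anatomy example below; images and settings are illustrative placeholders, not a published spec):

```yaml
# Illustrative sketch only — images and settings are placeholders.
services:
  payment-api:            # the main service the model edits
    build: ./services/payment-api
    depends_on: [postgres, kafka, redis]
  fraud-detection:        # downstream consumer the change must not break
    build: ./services/fraud-detection
    depends_on: [kafka]
  postgres:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: example
  kafka:
    image: bitnami/kafka:latest
  redis:
    image: redis:7
```

The point is that a change to one service is verified against its real neighbors, not in isolation.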

02 — The 8 Layers

Every environment contains the full picture.

A real enterprise project is not just code. It's requirements, debates, decisions, constraints, reviews, and organizational reality. Our environments capture all of it.

01 Business Context

  • PRD
  • Business case
  • Regulatory requirements brief
  • Stakeholder map
  • Success metrics
  • Competitive context

02 Discovery & Scoping

  • Technical scoping doc
  • Architecture Decision Records
  • Team discussion threads
  • Meeting notes
  • Risk assessment
  • Vendor evaluation
  • Dependency map

03 Planning & Design

  • Epics & stories
  • Acceptance criteria
  • Technical design doc
  • API contracts (OpenAPI)
  • Data model / ERD
  • Sequence diagrams
  • Implementation plan
  • Security threat model

04 The Codebase

  • Multi-service repo
  • Infrastructure as Code
  • CI/CD pipeline configs
  • Database migrations
  • Environment configs
  • Existing technical debt
  • Existing tests (some flaky)

05 Review & Quality

  • PR review history
  • Code style guide
  • Security review checklist
  • Architecture board feedback
  • Performance benchmarks
  • Quality gates

06 Testing

  • Test plan
  • Test cases (unit, integration, E2E)
  • UAT scenarios
  • Regression suite
  • Performance test criteria
  • Test data specs
  • Known bugs / tech debt

07 Operations

  • Deployment runbook
  • Rollback plan
  • Monitoring dashboards
  • Alerting rules
  • On-call procedures
  • SLA definitions

08 Organizational Context

  • Team structure
  • Approval workflows
  • Change management process
  • Cross-team dependencies
  • Institutional knowledge
  • Escalation paths

03 — Anatomy

What an environment looks like on disk.

payment-gateway-migration/
  context/
    business/
      prd.md                        # Product requirements
      regulatory-brief.md           # Compliance requirements driving this
      stakeholders.md               # Who owns what, who approves
    discovery/
      scoping-doc.md                # Technical feasibility analysis
      adr-001-event-sourcing.md     # Architecture decision records
      adr-002-kafka-vs-rabbitmq.md
      team-discussion-thread.md     # Sanitized Slack/Teams thread
      risk-assessment.md
    planning/
      epics.md                      # Jira-style epics with acceptance criteria
      stories.md                    # User stories with story points
      technical-design.md           # Component architecture, data flow
      api-contract.yaml             # OpenAPI spec
      data-model.sql                # Schema design
      implementation-plan.md        # Phased rollout, feature flags
    reviews/
      pr-review-history.md          # Past PR reviews showing team standards
      security-checklist.md         # Security team's review template
      style-guide.md                # Code conventions
    operations/
      deployment-runbook.md         # Step-by-step prod deploy
      monitoring.md                 # What to watch, alerting rules
      sla.md                        # 99.95% uptime, 200ms p95
  environment/
    Dockerfile                      # Multi-service sandbox
    docker-compose.yml              # Services + Kafka + Postgres + Redis
    services/
      payment-api/                  # The main service codebase
      fraud-detection/              # Downstream consumer
      settlement/                   # Batch settlement service
      legacy-gateway/               # The system being replaced
    infra/
      terraform/                    # IaC configs
      ci-cd/                        # Pipeline definitions
    db/
      migrations/                   # Existing schema migrations
      seed-data.sql                 # Synthetic test data
  tests/
    test.sh                         # Test harness entry point
    test_functional.py              # Does it work correctly?
    test_regression.py              # Did anything else break?
    test_compliance.py              # Audit trail, PII masking, regulatory format
    test_performance.py             # Latency, throughput, no degradation
    test_security.py                # Auth, encryption, input validation
  task.toml                         # Metadata: difficulty, timeouts, resources
  instruction.md                    # What the model sees
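The task.toml metadata file above might look something like this (a minimal sketch — the field names are assumptions for illustration, not a published schema):

```toml
# Illustrative only — keys and values are assumptions, not a fixed schema.
[task]
id = "payment-gateway-migration"
difficulty = "hard"
entry_point = "stories-to-implementation"   # which SDLC stage the task starts at

[limits]
timeout_seconds = 3600
memory_mb = 8192
cpus = 4

[verification]
tests = ["test_functional.py", "test_regression.py", "test_compliance.py"]
```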

04 — Multi-Skill Verification

Test the model at any stage of the SDLC.

Because the environment contains the full project lifecycle, we can set the task entry point at any stage — and verify the model's output against what actually matters at that stage.

Task Type | What the Model Does | Verification
----------|---------------------|-------------
Requirements → Design | Given PRD + team discussions, produce a technical design document | All requirements addressed, architecture is sound, edge cases covered
Design → Stories | Given design doc, break into epics and stories with acceptance criteria | Completeness, dependency ordering, story sizing, no gaps
Stories → Implementation | Given stories + existing codebase, implement the feature | Tests pass, code review standards met, no regression, compliance checks
Code → Review | Given a PR with planted issues, provide a thorough review | Catches bugs, security issues, style violations; suggests improvements
Incident → Fix | Given an incident report + codebase, find root cause and fix | Fix resolves the issue, no new problems, includes post-mortem
Migration → Delivery | Given legacy system + target spec, plan and execute migration | Functional parity, backward compatibility, no data loss, performance maintained
Full SDLC | Given just the PRD, produce everything through to working code | Multi-stage verification at each phase of delivery
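To make the compliance-verification idea concrete, here is a minimal sketch of the kind of check a test_compliance.py might run. Everything here is illustrative: the helper name, the log format, and the regex are assumptions, not the actual harness.

```python
import re

# Illustrative check: no unmasked 16-digit card number (PAN) may appear in
# application logs. Real environments would also verify audit-trail format,
# PII field masking, and regulator-specific report layouts.
PAN_PATTERN = re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b")

def line_is_compliant(log_line: str) -> bool:
    """Return True if the line contains no unmasked card number."""
    return PAN_PATTERN.search(log_line) is None

# A masked entry passes; a raw PAN fails the check.
assert line_is_compliant("charge ok card=****-****-****-4242")
assert not line_is_compliant("charge ok card=4111-1111-1111-4242")
```

Checks like this are what make "Regulatory and compliance verification built into tests" machine-gradable rather than a matter of reviewer judgment.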

05 — Our Approach

Built top-down from real projects.

We don't imagine what enterprise environments should look like. We take real projects we've delivered, strip all identifying and sensitive information, and what remains becomes the environment. The artifacts are authentic because they came from reality.

01

Select Project

Pick a completed enterprise engagement — a migration, integration, or platform build that represents a common pattern.

02

Inventory Artifacts

Catalog every artifact: PRDs, design docs, Jira exports, discussion threads, code, reviews, test plans, runbooks.

03

Sanitize

Strip all PII, PHI, client names, and proprietary business data. Replace them with synthetic equivalents that preserve the shape and complexity of the original.

04

Generalize

Abstract client-specific details into industry patterns. "Acme Bank" instead of the real name, synthetic data with the same structure.

05

Package & Verify

Build Docker environment, write verification scripts for each SDLC stage, validate that the environment tests what matters.

06 — Our Differentiation

What sets our environments apart.

Deep Domain Experience

Years of hands-on enterprise implementation give us an intuitive understanding of how these projects actually unfold — the change management processes, the DBA pushback, the compliance requirements. That lived experience shapes every environment we build.

Rooted in Real Work

Our environments are derived from real project patterns, not imagined from scratch. The messy discussion thread where someone says "this won't work because the batch job locks the table" — that texture comes from having been there.

Full SDLC Verification

We go beyond testing whether code passes a unit test. Our verification spans the entire delivery lifecycle — requirements understanding, architectural decisions, code quality, compliance, and operational readiness.

Compounding Value

Each environment teaches models how enterprises actually work, which makes them better at helping enterprises — exactly what AI labs want to offer their customers. Better environments lead to better models, and better models lead to stronger demand.