Enterprise Data Glossary Platform

01 Automated Data Cataloging (Code & DB Scanners)

Technical Team

Data attributes are auto-discovered from source code, database schemas, and running systems. Engineers do not catalog their data as a separate step; the platform catalogs it as code is written and deployed.

In-Code Annotations (Java / Python / TypeScript)

Engineers annotate fields directly in their code. The scanner extracts these during the build process.

// Java example — Portfolio Accounting service
@DataAttribute(
  name = "trade_settlement_date",
  description = "T+1 settlement date for US equity trades per DTCC rules (T+2 before May 2024)",
  owner = "portfolio.accounting@bank.com",
  domain = "TRADE_SETTLEMENT",
  pii = false,
  lineage = { "trade_capture.execution_date", "calendar.settlement_rules" }
)
private LocalDate settlementDate;
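The text lists Python alongside Java as an annotated language. A comparable annotation and a build-time scan might look like the sketch below. The `data_attribute` helper and the `scan` function are hypothetical names made up for illustration; they are not the Glossary SDK's actual API, which the source does not show.

```python
from dataclasses import dataclass, field, fields
from datetime import date


def data_attribute(**meta):
    """Hypothetical helper: attach glossary metadata to a dataclass field."""
    return field(default=None, metadata={"glossary": meta})


@dataclass
class SettlementRecord:
    settlement_date: date = data_attribute(
        name="trade_settlement_date",
        owner="portfolio.accounting@bank.com",
        domain="TRADE_SETTLEMENT",
        pii=False,
        lineage=["trade_capture.execution_date", "calendar.settlement_rules"],
    )
    internal_note: str = ""  # not annotated, so the scanner skips it


def scan(cls):
    """Collect glossary metadata from every annotated field of a dataclass."""
    return [dict(f.metadata["glossary"]) for f in fields(cls) if "glossary" in f.metadata]


found = scan(SettlementRecord)
print(found[0]["name"], found[0]["domain"])
```

Keeping the metadata on the field itself means the scanner needs no separate registry file, which matches the idea that engineers do not catalog as a separate step.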

Jenkins / CI-CD Pipeline Integration

Build stage — glossary-plugin scans source code for @DataAttribute annotations

Validation stage — checks for duplicate attribute names, missing ownership, PII without classification

Publish stage — attributes pushed to Glossary API; delta from previous build shown in PR

Fail-fast — build fails if new PII attribute is defined without owner; forces compliance upfront
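The validation and fail-fast stages above can be sketched as two small checks. This is a minimal illustration, assuming each scanned attribute arrives as a dictionary with `name`, `owner`, `pii`, and `pii_classification` keys (hypothetical field names) and that every attribute in the list is new in this build.

```python
from collections import Counter


def validate(attributes):
    """Return a list of error strings; an empty list means validation passed."""
    errors = []
    counts = Counter(a["name"] for a in attributes)
    for name, n in counts.items():
        if n > 1:
            errors.append(f"duplicate attribute name: {name}")
    for a in attributes:
        if not a.get("owner"):
            errors.append(f"missing owner: {a['name']}")
        if a.get("pii") and not a.get("pii_classification"):
            errors.append(f"PII without classification: {a['name']}")
    return errors


def should_fail_build(attributes):
    """Fail-fast rule from the text: a PII attribute with no owner blocks the build."""
    return any(a.get("pii") and not a.get("owner") for a in attributes)


sample = [
    {"name": "client_ssn", "pii": True, "owner": ""},
    {"name": "trade_date", "pii": False, "owner": "a@bank.com"},
    {"name": "trade_date", "pii": False, "owner": "b@bank.com"},
]
errors = validate(sample)
```

In a pipeline, a non-empty `errors` list would typically be reported in the PR, while `should_fail_build` returning true would stop the build outright.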

Database Schema Scanner

Scheduled scan — agents connect to target databases (Oracle, Postgres, SQL Server, Snowflake, DB2)

Metadata extraction — tables, columns, data types, nullability, foreign keys, indexes

Sampling — reads sample rows (anonymized) to improve classification accuracy

Change detection — compares against previous scan; flags new columns, dropped tables, type changes
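Change detection between two scans can be sketched as a dictionary comparison. The snapshot shape used here, `{table: {column: data_type}}`, is an assumption for illustration. The sketch reports only the three change kinds the text names (new columns, dropped tables, type changes).

```python
def diff_schema(previous, current):
    """Compare two schema snapshots shaped as {table: {column: data_type}}."""
    changes = []
    for table in sorted(previous.keys() - current.keys()):
        changes.append(("dropped_table", table))
    for table in sorted(previous.keys() & current.keys()):
        old_cols, new_cols = previous[table], current[table]
        for col in sorted(new_cols.keys() - old_cols.keys()):
            changes.append(("new_column", f"{table}.{col}"))
        for col in sorted(old_cols.keys() & new_cols.keys()):
            if old_cols[col] != new_cols[col]:
                changes.append(("type_change", f"{table}.{col}"))
    return changes


previous = {
    "trades": {"id": "NUMBER", "px": "NUMBER(10,2)"},
    "legacy": {"id": "NUMBER"},
}
current = {
    "trades": {"id": "NUMBER", "px": "NUMBER(18,6)", "venue": "VARCHAR"},
}
changes = diff_schema(previous, current)
```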

Confluence / Wiki Crawler

Crawls configured Confluence spaces on a schedule

LLM extraction of data attribute definitions from narrative text

Links documentation to corresponding code/DB attributes

Flags drift between documented definition and actual implementation

Systems & Integrations

Glossary SDK (Java, Python, TS, C#), Jenkins Plugin, GitHub Actions, GitLab CI, Azure DevOps, Bitbucket Pipelines, Confluence API, JDBC / ODBC Connectors

02 Manual Data Entry Interface

Data Owners

Not all data is in code. Fund NAV definitions, trading rules, counterparty reference data, and operational SOPs often live in Excel files or in people's heads. Data owners need a simple, non-technical UI to maintain this data.

Entry Surfaces

Web form UI — guided forms with required fields, auto-suggestions from IB domain ontology, validation

Bulk Excel upload — downloadable template; uploaded file validated; errors shown inline; partial loads supported

Excel add-in — edit directly in Excel; changes sync to the platform on save

Mobile-friendly UI — senior data owners can approve changes on the go

Guided Data Capture

Domain-aware prompts — if owner selects "Fund Services" domain, system offers templates for NAV, AUM, subscriptions

Relationship suggestions — "This looks similar to 'fund.nav_daily' in Fund Accounting. Is it the same?"

Inline PII detection — real-time warning if entered value patterns match PII (SSN, email, phone)

Auto-save drafts — work-in-progress never lost; multi-session editing
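The inline PII detection listed above can be approximated with pattern matching. The regular expressions below are illustrative assumptions (US-style SSN and phone formats, a simplified email pattern), not the platform's actual rules, and would need tuning for other locales.

```python
import re

# Illustrative patterns only; real deployments need locale-specific rules.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "phone": re.compile(
        r"(?<!\d)(?:\+?\d{1,3}[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}(?!\d)"
    ),
}


def detect_pii(text):
    """Return the sorted names of all PII patterns found in the entered text."""
    return sorted(kind for kind, pattern in PII_PATTERNS.items() if pattern.search(text))


print(detect_pii("reach jane.doe@example.com or +1 212-555-0100"))
```

A form field could call `detect_pii` on each change and show a warning when the result is non-empty.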

Required Metadata

Attribute name, Business definition, Data type, Allowed values, Source system(s), Owner, Domain, PII classification, Regulatory tags, Update cadence

03 Ownership & Change Management Workflow

Admins & Owners

People leave, teams reorganize, business responsibilities shift. Without an ownership transition workflow, data becomes orphaned and governance falls apart.

Ownership Transfer Workflow

Initiation — outgoing owner, admin, or manager initiates transfer of N attributes

Proposed successor — new owner proposed; system validates they have relevant domain access

Acceptance — proposed owner must explicitly accept; receives email + in-app notification

Handover brief — system auto-generates summary: definitions, downstream consumers, recent changes, open issues

Manager approval — both outgoing and incoming managers sign off

Effective date — transfer takes effect; audit log preserves history; consumers notified
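The six transfer steps above read naturally as a linear state machine. The sketch below is one way to model it; the class and step names are hypothetical, and it assumes the steps always occur in the listed order, with the manager sign-off gate requiring both sides.

```python
from enum import Enum


class Step(Enum):
    INITIATED = 1
    SUCCESSOR_PROPOSED = 2
    ACCEPTED = 3
    BRIEF_GENERATED = 4
    MANAGERS_APPROVED = 5
    EFFECTIVE = 6


class OwnershipTransfer:
    def __init__(self, attributes, outgoing_owner, proposed_owner):
        self.attributes = list(attributes)
        self.outgoing_owner = outgoing_owner
        self.proposed_owner = proposed_owner
        self.step = Step.INITIATED
        self.manager_approvals = set()
        self.audit_log = [Step.INITIATED]  # history is appended to, never rewritten

    def approve(self, manager_side):
        """Record a sign-off from the 'outgoing' or 'incoming' manager."""
        if manager_side not in {"outgoing", "incoming"}:
            raise ValueError(f"unknown manager side: {manager_side}")
        self.manager_approvals.add(manager_side)

    def advance(self):
        """Move to the next step, enforcing the dual manager sign-off."""
        if self.step is Step.EFFECTIVE:
            raise ValueError("transfer is already effective")
        next_step = Step(self.step.value + 1)
        needs_both = next_step is Step.MANAGERS_APPROVED
        if needs_both and self.manager_approvals != {"outgoing", "incoming"}:
            raise ValueError("both outgoing and incoming managers must sign off")
        self.step = next_step
        self.audit_log.append(next_step)
        return next_step


transfer = OwnershipTransfer(["fund.nav_daily"], "a@bank.com", "b@bank.com")
for _ in range(3):
    transfer.advance()
```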

Orphaned Data Workflow

System detects orphaned attributes (owner left, no successor, no activity for N days)

Orphan report generated weekly; escalated to domain head

Domain head assigns interim owner; 30-day SLA for permanent reassignment

If SLA breached: escalation to divisional CTO / CDO
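Orphan detection and the 30-day escalation rule can be sketched as follows. The record fields are hypothetical, and the sketch assumes all three conditions in the text (owner left, no successor, no recent activity) must hold together; the source does not say whether they combine with "and" or "or".

```python
from datetime import date, timedelta

REASSIGNMENT_SLA = timedelta(days=30)  # from the text: 30-day SLA


def find_orphans(attributes, active_owners, today, inactivity_days):
    """Return names of attributes whose owner left, with no successor and no recent activity."""
    orphans = []
    for attr in attributes:
        owner_left = attr["owner"] not in active_owners
        no_successor = attr.get("successor") is None
        inactive = (today - attr["last_activity"]).days >= inactivity_days
        if owner_left and no_successor and inactive:
            orphans.append(attr["name"])
    return orphans


def needs_escalation(interim_assigned_on, today, permanent_owner=None):
    """True when the interim assignment has outlived the SLA without a permanent owner."""
    return permanent_owner is None and today - interim_assigned_on > REASSIGNMENT_SLA


today = date(2024, 6, 30)
attrs = [
    {"name": "margin_account", "owner": "left@bank.com", "successor": None,
     "last_activity": date(2024, 1, 5)},
    {"name": "fill_price", "owner": "active@bank.com", "successor": None,
     "last_activity": date(2024, 1, 5)},
    {"name": "loan_id", "owner": "left@bank.com", "successor": "new@bank.com",
     "last_activity": date(2024, 1, 5)},
]
orphans = find_orphans(attrs, {"active@bank.com"}, today, inactivity_days=90)
```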

Change Request Workflow

Any user can propose a change to an attribute (definition, classification, lineage)

Owner receives notification; approves, rejects, or requests clarification

For high-impact changes (PII classification, regulatory attribute), compliance review required

Approved changes create new version; downstream consumers notified; old version preserved for audit
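The versioning behaviour in the change request workflow can be illustrated with an append-only version list. This is a sketch under assumptions: the `Attribute` class and the set of high-impact fields are hypothetical stand-ins for "PII classification, regulatory attribute" in the text.

```python
# Assumed field names for the high-impact changes named in the text.
HIGH_IMPACT_FIELDS = {"pii_classification", "regulatory_tags"}


class Attribute:
    def __init__(self, name, **values):
        self.name = name
        self.versions = [dict(values)]  # version 1; older versions are never overwritten

    @property
    def current(self):
        return self.versions[-1]

    def apply_change(self, changes, owner_approved, compliance_approved=False):
        """Append a new version if the required approvals are present."""
        if not owner_approved:
            return False
        if changes.keys() & HIGH_IMPACT_FIELDS and not compliance_approved:
            return False
        self.versions.append({**self.current, **changes})
        return True


attr = Attribute("client_name", definition="Legal name", pii_classification="name")
ok_plain = attr.apply_change({"definition": "Full legal name"}, owner_approved=True)
ok_pii = attr.apply_change({"pii_classification": "none"}, owner_approved=True)
```

Here `ok_plain` succeeds with owner approval alone, while `ok_pii` is refused until a compliance approval is also supplied.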

04 Data Hygiene Engine

Platform Automation

The most important module: without automated hygiene, the glossary rots within a year. Automated hygiene is what lets the platform keep its value over time.

Duplicate Detection

Exact match — same attribute name in same domain; flag immediately

Semantic similarity — ML model compares definitions; "trade_date" vs "execution_date" vs "transaction_date"

Value-pattern match — sampled values follow identical patterns (e.g., two attributes both holding ISINs)

Resolution workflow — owners of potential duplicates collaborate to merge, alias, or declare distinct
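The first two detection modes above can be sketched in a few lines. Note the substitution: the text describes an ML model for semantic similarity, whereas this sketch uses plain word-overlap (Jaccard) similarity as a simple stand-in. The 0.5 threshold and the record fields are assumptions.

```python
from itertools import combinations


def jaccard(a, b):
    """Word-overlap similarity between two definitions (stand-in for an ML model)."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def find_duplicates(attributes, threshold=0.5):
    """Flag exact name+domain matches first, then definition look-alikes."""
    flagged = []
    for a, b in combinations(attributes, 2):
        if a["name"] == b["name"] and a["domain"] == b["domain"]:
            flagged.append(("exact", a["name"], b["name"]))
        elif jaccard(a["definition"], b["definition"]) >= threshold:
            flagged.append(("semantic", a["name"], b["name"]))
    return flagged


attrs = [
    {"name": "trade_date", "domain": "Equity Trading",
     "definition": "date the trade was executed"},
    {"name": "execution_date", "domain": "Equity Trading",
     "definition": "date the order was executed"},
    {"name": "trade_date", "domain": "Equity Trading",
     "definition": "calendar date of the trade"},
    {"name": "haircut", "domain": "Securities Lending",
     "definition": "discount applied to collateral value"},
]
flagged = find_duplicates(attrs)
```

Flagged pairs would then feed the resolution workflow, where owners decide whether to merge, alias, or declare the attributes distinct.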

Broken Lineage Detection

Attribute declares lineage to a source that no longer exists (deleted table, renamed column)

Chain interruptions: A → B → C, but B has no evidence of producing C

Orphaned downstream consumers: a consumer still depends on an attribute that has been deprecated

Weekly broken-lineage report; ticketed to owners with 14-day SLA
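Two of the checks above (a declared source that no longer exists, and a dependency on a deprecated attribute) can be sketched as a lookup over declared lineage. The catalog shape is an assumption. Chain-interruption checks are left out because they need runtime evidence that this sketch does not model.

```python
def find_broken_lineage(catalog):
    """catalog: {attribute: {"lineage": [upstream names], "deprecated": bool}}."""
    issues = []
    for name, attr in sorted(catalog.items()):
        for upstream in attr.get("lineage", []):
            if upstream not in catalog:
                issues.append(("missing_source", name, upstream))
            elif catalog[upstream].get("deprecated"):
                issues.append(("deprecated_source", name, upstream))
    return issues


catalog = {
    "trade_capture.execution_date": {"lineage": [], "deprecated": True},
    "settlement.settlement_date": {
        "lineage": ["trade_capture.execution_date", "calendar.settlement_rules"],
        "deprecated": False,
    },
}
issues = find_broken_lineage(catalog)
```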

Unclear Ownership Detection

Owner email address bounces

Owner hasn't logged into platform for 90 days

Group ownership with no named escalation point

Ownership claimed by a role that no longer exists in the HR system

Data Quality Scoring

Every attribute gets a quality score (0-100) based on:

Completeness of metadata, Ownership clarity, Lineage completeness, Documentation quality, PII classification, Downstream usage, Freshness of updates, Dispute history
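One plausible way to combine the eight criteria into a 0-100 score is a weighted average. The source does not give weights or a formula, so the equal default weights, the 0.0 to 1.0 rating scale, and the criterion keys below are all assumptions.

```python
CRITERIA = [
    "metadata_completeness",
    "ownership_clarity",
    "lineage_completeness",
    "documentation_quality",
    "pii_classification",
    "downstream_usage",
    "update_freshness",
    "dispute_history",
]


def quality_score(ratings, weights=None):
    """Weighted average of per-criterion ratings (each 0.0 to 1.0), scaled to 0-100."""
    weights = weights or {c: 1.0 for c in CRITERIA}
    missing = [c for c in CRITERIA if c not in ratings]
    if missing:
        raise ValueError(f"missing ratings: {missing}")
    total_weight = sum(weights[c] for c in CRITERIA)
    weighted_sum = sum(weights[c] * ratings[c] for c in CRITERIA)
    return round(100 * weighted_sum / total_weight)


half = {c: (1.0 if i < 4 else 0.0) for i, c in enumerate(CRITERIA)}
score = quality_score(half)
```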

05 Lineage Visualization & PII Heatmap

Executive + Compliance

Interactive graph visualization of how data flows through the organization. PII is visually highlighted at every node so compliance can quickly assess exposure.

Example: Trade Settlement Lineage

order_management.client_order → trade_capture.execution → clearing.matched_trade → settlement.settled_trade

client.account_id → client.account_id → clearing.counterparty_ref → settlement.beneficiary_id

market_data.price_feed → trade_capture.notional → risk.position_exposure

Lineage Features

Upstream view — "Where does this attribute come from?" Trace back to originating system

Downstream view — "Who uses this attribute?" See all consumers including reports, regulatory filings

Column-level lineage — not just table-to-table; specific column transformations

Transformation detail — click an edge to see the SQL or code that produces the derivation

Impact analysis — "If I change this, what breaks?" Cascade view with affected systems
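The upstream, downstream, and impact views above all reduce to graph traversal. The sketch below runs a breadth-first search over the edges of the trade settlement example (omitting the repeated client.account_id hop). The function name and edge-list representation are assumptions.

```python
from collections import deque

EDGES = [
    ("order_management.client_order", "trade_capture.execution"),
    ("trade_capture.execution", "clearing.matched_trade"),
    ("clearing.matched_trade", "settlement.settled_trade"),
    ("client.account_id", "clearing.counterparty_ref"),
    ("clearing.counterparty_ref", "settlement.beneficiary_id"),
    ("market_data.price_feed", "trade_capture.notional"),
    ("trade_capture.notional", "risk.position_exposure"),
]


def reachable(start, edges, direction="downstream"):
    """Breadth-first list of nodes affected by (or feeding into) `start`."""
    graph = {}
    for src, dst in edges:
        a, b = (src, dst) if direction == "downstream" else (dst, src)
        graph.setdefault(a, []).append(b)
    seen, queue, order = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:  # guards against cycles
                seen.add(nxt)
                queue.append(nxt)
                order.append(nxt)
    return order


impact = reachable("trade_capture.execution", EDGES)
sources = reachable("risk.position_exposure", EDGES, direction="upstream")
```

`impact` answers "If I change this, what breaks?", while `sources` answers "Where does this attribute come from?".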

PII Heatmap

Organization-wide map of where PII is stored, processed, and transmitted

Classification levels: SSN/TIN, name, email, phone, address, DOB, account numbers, biometric

Filter by jurisdiction: GDPR scope, CCPA scope, India DPDP, Singapore PDPA

Risk scoring by PII volume × system criticality × access controls

Regulatory report export for audits (e.g., Article 30 records under GDPR)
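The risk scoring factor "PII volume × system criticality × access controls" can be read as a simple product. The scales are not given in the source. This sketch assumes criticality is a 1 to 5 rating and access controls are expressed as a strength between 0.0 and 1.0, so that stronger controls reduce the score.

```python
def pii_risk(pii_record_count, system_criticality, control_strength):
    """Risk = volume x criticality x control weakness (1 - control strength)."""
    if not 0.0 <= control_strength <= 1.0:
        raise ValueError("control_strength must be between 0.0 and 1.0")
    return pii_record_count * system_criticality * (1.0 - control_strength)


def rank_systems(systems):
    """Order systems from highest to lowest PII risk."""
    return sorted(
        systems,
        key=lambda s: pii_risk(s["pii_records"], s["criticality"], s["control_strength"]),
        reverse=True,
    )


systems = [
    {"name": "archive", "pii_records": 5000, "criticality": 1, "control_strength": 0.9},
    {"name": "crm", "pii_records": 1000, "criticality": 3, "control_strength": 0.5},
    {"name": "ledger", "pii_records": 200, "criticality": 5, "control_strength": 0.0},
]
ranked = [s["name"] for s in rank_systems(systems)]
```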

06 Investment Banking Domain Knowledge

Core Intelligence

The differentiator. The platform ships with built-in ontologies for investment banking domains. When scanning code or databases, it automatically recognizes attributes based on domain patterns — not just generic metadata extraction.

Wealth Management

Client lifecycle, portfolio management, financial planning, advisory

client_id ssn_tin client_name aum risk_tolerance investment_objective portfolio_id asset_allocation holdings benchmark suitability_score ips_version beneficiary_info fee_schedule

Equity Trading

Order management, execution, allocation, reporting

order_id ticker isin cusip sedol side order_type quantity limit_price fill_price venue execution_id commission settlement_date corporate_action

Derivatives

Options, futures, swaps, structured products

contract_id underlying strike expiry option_type notional premium delta gamma vega theta implied_vol cva xva isda_agreement csa_terms

Fund Services

NAV calculation, transfer agency, fund accounting, distribution

fund_id nav_per_unit total_assets aum subscription redemption distribution management_fee performance_fee expense_ratio unitholder_id valuation_date cutoff_time

Prime Brokerage

Margin, financing, securities lending, consolidated reporting

margin_account initial_margin variation_margin excess_equity buying_power financing_rate hypothecation_flag rehypo_limit concentration_risk stress_test_result

Securities Lending

Loans, collateral, recalls, corporate actions on loaned securities

loan_id lender borrower loaned_quantity rebate_rate fee_rate collateral_type collateral_value haircut recall_date indemnification

Customer & Account Onboarding

KYC, AML, CIP, documentation, approvals

customer_id legal_name dob tax_id address citizenship tax_residency pep_status sanctions_screen_result kyc_risk_tier cdd_edd_status source_of_wealth beneficial_owner fatca_status crs_classification

Post-Trade & Settlement

Clearing, settlement, reconciliation, regulatory reporting

trade_id settlement_date trade_date clearing_house settlement_cycle cash_account security_account failed_trade_flag reconciliation_break cat_report_id mifir_report_status emir_uti

How Domain Knowledge Powers Auto-Discovery

Scanner encounters column exec_px in trading database

Ontology matches pattern: equity trading → execution → fill_price alias

Auto-classification: domain = Equity Trading, standard_name = fill_price, PII = false

Suggested owner = team registered for Equity Trading domain; requires confirmation

Linked to existing canonical definition of fill_price in glossary
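The exec_px walkthrough above can be sketched as an alias lookup against the ontology. The alias sets (other than exec_px), the owner address, and the function name are hypothetical. A real ontology would likely use richer pattern matching than exact aliases.

```python
# Hypothetical ontology fragment; only the exec_px -> fill_price alias comes from the text.
ONTOLOGY = {
    "fill_price": {
        "domain": "Equity Trading",
        "pii": False,
        "aliases": {"exec_px", "fill_px", "execution_price"},
    },
    "ssn_tin": {
        "domain": "Wealth Management",
        "pii": True,
        "aliases": {"ssn", "tin", "tax_id_no"},
    },
}
DOMAIN_OWNERS = {"Equity Trading": "equity.trading.data@bank.com"}  # hypothetical


def classify(column_name):
    """Map a scanned column name to its canonical glossary entry, if any."""
    key = column_name.strip().lower()
    for standard_name, entry in ONTOLOGY.items():
        if key == standard_name or key in entry["aliases"]:
            return {
                "standard_name": standard_name,
                "domain": entry["domain"],
                "pii": entry["pii"],
                "suggested_owner": DOMAIN_OWNERS.get(entry["domain"]),
                "owner_confirmed": False,  # the suggestion still requires confirmation
            }
    return None


result = classify("EXEC_PX")
```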

07 Executive Dashboards & Search

Senior Leadership

What senior executives actually see. The platform synthesizes the technical metadata into business-meaningful views for regulatory, compliance, and technology leadership.

Regulatory Readiness Dashboard

% of regulatory-impact attributes with verified ownership

% with complete lineage to source

Open audit findings by severity

Time-to-respond benchmark for regulatory data requests

Coverage by regulation: GDPR, CCPA, MiFIR, EMIR, CAT, SOX, DPDP

Data Health Scorecard (by Business Unit)

Wealth Management: 87% coverage, 12 orphans, 34 duplicates pending resolution

Equity Trading: 93% coverage, 3 orphans, lineage 91% complete

Fund Services: 76% coverage, flag: significant manual data not yet cataloged

Trend analysis: quarter-over-quarter improvement/regression

Natural Language Search (LLM-powered)

Users search in plain English; LLM interprets against the metadata graph:

# Example queries
"Where do we process EU client PII for derivatives trading?"
# Returns: 14 systems, 47 attributes, ownership matrix, GDPR Art. 30 record

"Who owns NAV calculation for Alternative Funds?"
# Returns: Fund Accounting team, specific owner, last updated 3 days ago

"Show me orphaned attributes in Prime Brokerage"
# Returns: 8 attributes, last owners, suggested new owners

"What regulatory reports use trade_settlement_date?"
# Returns: MiFIR Transaction Reporting, CAT, internal Trade Blotter

08 Publishing & External Data Exchange

Technical + Compliance

Reference data, regulatory reports, and client feeds all need to be published to external parties. The platform manages publishing as a governed, auditable function.

Internal Publishing

Downstream teams subscribe to attribute "channels" (e.g., "wealth.client_reference")

Changes publish as events to Kafka topics; downstream systems consume in real-time

Schema evolution managed; breaking changes require consumer sign-off before release

Versioned APIs (v1, v2) with deprecation schedules

External Publishing (Regulators, Clients, Counterparties)

Regulatory submissions — MiFIR, EMIR, CAT, FR Y-14 — data mapped from glossary to regulatory schemas

Client reference data — daily position feeds, consolidated reports, tax packages; mapped to client-specific formats

Counterparty exchange — trade confirmations, collateral statements via SWIFT, FpML, FIX

Partner integrations — API-based sharing of reference data with fund administrators, custodians

Governance Controls

PII attributes blocked from external publishing unless explicitly approved

Data contracts define exactly what each consumer can access

Complete audit trail: who accessed what, when, via what channel

Right-to-erasure workflow cascades through all external consumers
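The first two governance controls above (PII blocked unless approved, and data contracts limiting what a consumer can access) can be sketched as a filter applied before any external release. The data shapes and the function name are assumptions for illustration.

```python
def filter_for_external(requested, catalog, contract, pii_approvals):
    """Split requested attributes into those released and those blocked, with reasons."""
    released, blocked = [], []
    for name in requested:
        if name not in contract:
            blocked.append((name, "not in data contract"))
        elif catalog[name]["pii"] and name not in pii_approvals:
            blocked.append((name, "PII not approved for external publishing"))
        else:
            released.append(name)
    return released, blocked


catalog = {
    "trade_id": {"pii": False},
    "client_name": {"pii": True},
    "ssn_tin": {"pii": True},
    "fee_schedule": {"pii": False},
}
contract = {"trade_id", "client_name", "ssn_tin"}  # what this consumer may access
pii_approvals = {"client_name"}                    # explicitly approved PII

released, blocked = filter_for_external(
    ["trade_id", "client_name", "ssn_tin", "fee_schedule"],
    catalog, contract, pii_approvals,
)
```

Returning the blocked list with reasons gives the audit trail something concrete to record for each refused attribute.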

Protocols & Standards Supported

REST API, GraphQL, Kafka Streams, SWIFT MT/MX, FpML (Derivatives), FIX Protocol, ISO 20022, SFTP Batch, OData, Parquet / Iceberg Tables

Data fragmentation at scale.

Data Scattered Everywhere

Unclear Ownership

Duplicate Definitions

PII Blindness

Broken Lineage

Stale Documentation

No Discovery

External Data Sharing

Three distinct personas, one platform.

Senior Executive Leadership

Technical Team

Data Owners

System landscape at a glance.

8 modules that solve the full problem.