sitkastack

A worked example

What audit-ready AI vendor risk classification actually looks like.

The document below is the rendered output of the sitkastack vendor-risk-triage framework v1.0.5, run against a third-party AI vendor submission with OSFI Guideline E-23 (2027) as the regulatory corpus.

It is not a mockup. It is the real artifact a regulated Canadian financial institution would hand to an auditor when defending a third-party AI vendor decision in 2027.

Why this matters now. As of today, federally regulated Canadian financial institutions have roughly 11 months to be able to defend AI/ML model risk decisions to OSFI with documented evidence. This is what that defense should look like.

↓ Framework output · rendered audit pack ↓

Vendor ID: demo-vendor-003-textlens
Jurisdiction: CA
Classification: SaaS
AI usage level: Operational decisions
Decision ID: d-2796d7d2-9df0-45aa-9df1-f9e3a24cc13a
Decision timestamp: 2026-05-28 17:51:38 UTC

Classification rationale

The vendor processes financial PII through a third-party model with cross-border data flow, has incomplete training data provenance disclosures, retains production decision logs for only 90 days, and has not documented adversarial-input controls. These factors compound to warrant an elevated tier under the cited regulatory guidance. Conditional approval pending the listed mitigations.

Evidence cited

Input field | Reasoning
$.data_residency | Cross-border residency triggers heightened scrutiny per retrieved guidance (osfi-e23:guideline-2027:page-12).
$.training_data_provenance | Training data sourced from external aggregators without documented lineage falls short of OSFI E-23's requirement that model data be "traceable (that is, having documented lineage and provenance)" (osfi-e23:guideline-2027:page-14).
$.production_log_retention_days | Production decision logs retained 90 days only. OSFI E-23 expects decommissioned models and documentation to be retained "for a set period as a benchmark or fallback"; the deploying institution should verify 90 days satisfies its own retention policy (osfi-e23:guideline-2027:page-19).
$.input_validation_controls | Vendor disclosed no formal prompt-injection or adversarial-input controls. OSFI E-23 requires "performance of risk assessments for related risks prior to deployment such as cybersecurity risk, infrastructure vulnerabilities, and other potential operational risks" (osfi-e23:guideline-2027:page-18).

Required mitigations

  1. Obtain a signed data-processing addendum covering cross-border processing before go-live.
  2. Require vendor to document training data lineage and refresh provenance disclosures annually.
  3. Negotiate extension of production decision log retention to a minimum of 18 months, or document the institution's own retention as compensating control.
  4. Require vendor attestation of prompt-injection and adversarial-input controls; re-validate at next contract renewal.

Accountable owner

Senior Vendor Risk Manager

Confidence

Confidence: 0.74 (moderate)

Audit trail

Decision ID: d-2796d7d2-9df0-45aa-9df1-f9e3a24cc13a
Decision timestamp: 2026-05-28 17:51:38 UTC
Input submission ID: demo-vendor-003-textlens
Input schema version: 1.0.0
Output schema version: 1.3.0
Agent version: vrt-agent-v1.0.5-function-function:_call:-prompt-69ef583c6dbe
Generated by sitkastack vendor-risk-triage framework v1.0.5. Document content reflects the deploying organization's submission and the framework's classification at decision time.

↑ End of framework output ↑

What you are looking at

  1. The tier classification is grounded, not vibes. "Tier 3 (elevated)" is not a hand-waved label. It is a structured classification driven by the evidence cited in the audit pack above and the relevant passages retrieved from OSFI E-23 itself. Given the same input and the same retrieved corpus, the output is byte-for-byte reproducible.
  2. The evidence citations are audit-traceable. Each evidence row references both a specific input field (for example, $.data_residency) and the regulatory passage it triggered (for example, osfi-e23:guideline-2027:page-12). Each citation points to a specific page of the published OSFI guideline, so an auditor can verify it in seconds.
  3. Required mitigations are auditable conditions. "Conditional approval" is not a soft maybe. It means the vendor is approved on the condition that each listed mitigation is verified as satisfied before go-live. This is the pattern regulated industries already use for vendor onboarding; the framework just makes the AI-specific conditions explicit.
  4. An accountable owner is named. Every audit eventually asks "who signed off?" Each decision is assigned to a role (here: Senior Vendor Risk Manager). The framework does not let an AI decision exist without a human owner.
  5. The confidence signal is calibrated, not invented. The 0.74 / moderate band comes from a confidence model that is independently evaluated using Brier score, ECE, and reliability bins in the framework's eval suite. If the framework reports 0.74, that means historically 0.74-band predictions have been correct about 74% of the time. Not a vibe number.
    [Reliability diagram: predicted confidence (x-axis, 0.0 to 1.0) against observed accuracy (y-axis, 0 to 1), with this decision marked at 0.74.]

    Illustrative reliability diagram from the framework's eval suite. Dashed diagonal is perfect calibration. Dots are measured accuracy per confidence bin. The dotted vertical line marks this decision's 0.74 confidence; the framework's measured accuracy in that band is 0.73, so the prediction is well calibrated.

  6. The audit trail is the entire point. Decision ID, timestamp, schema versions, agent version with a content-hash of the system prompt. Two years from now an auditor can reproduce this exact decision and verify nothing in the prompt changed silently. This is what "audit-ready" means in practice.
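
Point 5 above names Brier score, ECE, and reliability bins without showing how they are computed. The sketch below is one common way to compute them, assuming equal-width confidence bins and binary correct/incorrect outcomes. The function names and the toy data are invented for illustration; they are not taken from the framework's eval suite and are not its results.

```python
from typing import List, Tuple


def brier_score(confidences: List[float], outcomes: List[int]) -> float:
    """Mean squared gap between predicted confidence and the 0/1 outcome."""
    return sum((c - y) ** 2 for c, y in zip(confidences, outcomes)) / len(confidences)


def reliability_bins(
    confidences: List[float], outcomes: List[int], n_bins: int = 10
) -> List[Tuple[float, float, int]]:
    """Return (mean confidence, observed accuracy, count) per non-empty bin."""
    buckets: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for c, y in zip(confidences, outcomes):
        idx = min(int(c * n_bins), n_bins - 1)  # a confidence of 1.0 goes in the top bin
        buckets[idx].append((c, y))
    bins = []
    for bucket in buckets:
        if bucket:
            mean_conf = sum(c for c, _ in bucket) / len(bucket)
            accuracy = sum(y for _, y in bucket) / len(bucket)
            bins.append((mean_conf, accuracy, len(bucket)))
    return bins


def expected_calibration_error(
    confidences: List[float], outcomes: List[int], n_bins: int = 10
) -> float:
    """Count-weighted average of |accuracy - mean confidence| across bins."""
    total = len(confidences)
    return sum(
        n / total * abs(acc - conf)
        for conf, acc, n in reliability_bins(confidences, outcomes, n_bins)
    )


# Toy data (invented): four predictions at 0.75 confidence, three of them correct.
toy_confidences = [0.75, 0.75, 0.75, 0.75]
toy_outcomes = [1, 1, 1, 0]

print(brier_score(toy_confidences, toy_outcomes))                 # 0.1875
print(reliability_bins(toy_confidences, toy_outcomes))            # [(0.75, 0.75, 4)]
print(expected_calibration_error(toy_confidences, toy_outcomes))  # 0.0
```

In the toy data the single populated bin has mean confidence 0.75 and accuracy 0.75, so its calibration gap is zero. That is the same reading the reliability diagram gives per bin: a dot on the diagonal means stated confidence matched observed accuracy.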

How this was generated

$ vrt triage demo-vendor.json --corpus osfi-e23 --render
[INFO] Loaded corpus osfi-e23 from cache (23 chunks indexed via BM25)
[INFO] Retrieved top-5 chunks for vendor submission
[INFO] Agent classification complete: tier_3_elevated / conditional_approve
[INFO] Audit pack rendered to ./output/audit-pack.html
[INFO] Record validated against output-contract 1.3.0

One command. The framework reads the vendor submission, retrieves the relevant passages from the OSFI guideline already in its corpus, runs the agent, validates the output against a frozen JSON Schema, and renders the audit pack. The framework currently supports four anchor regulations: osfi-e23, nist-ai-rmf, sox-pl-107-204, and eu-ai-act. Each runs the same pipeline with corpus-specific retrieval. Full source on GitHub under Apache 2.0.
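
The log above reports that corpus chunks are indexed via BM25 and that the top 5 are retrieved. As a rough illustration of that retrieval step, here is a minimal sketch of Okapi BM25 ranking in plain Python. It assumes a lowercase alphanumeric tokenizer, the common defaults k1=1.5 and b=0.75, and a non-negative IDF variant. The function names, chunk IDs, and chunk text are all hypothetical; the chunk text is placeholder wording, not quoted from any guideline, and the framework's actual tokenizer and parameters may differ.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase and split on non-alphanumerics (an assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(
    query: str,
    chunks: Dict[str, str],
    k1: float = 1.5,
    b: float = 0.75,
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Score every chunk against the query with Okapi BM25; return the top_k."""
    docs = {cid: tokenize(text) for cid, text in chunks.items()}
    n_docs = len(docs)
    avg_len = sum(len(tokens) for tokens in docs.values()) / n_docs
    # Document frequency: number of chunks containing each term.
    doc_freq = Counter(term for tokens in docs.values() for term in set(tokens))
    query_terms = set(tokenize(query))

    scores = {}
    for cid, tokens in docs.items():
        tf = Counter(tokens)
        # Length normalization: longer-than-average chunks are penalized.
        norm = k1 * (1 - b + b * len(tokens) / avg_len)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores[cid] = score
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]


# Hypothetical chunks with placeholder text (not real guideline wording).
toy_chunks = {
    "toy:residency": "data residency and cross border processing of customer data",
    "toy:lineage": "training data lineage and provenance documentation",
    "toy:retention": "retention of model documentation and decision logs",
}

for chunk_id, score in bm25_rank("cross border data residency", toy_chunks):
    print(f"{chunk_id}: {score:.3f}")
```

With this toy corpus the residency chunk ranks first because it matches all four query terms, and the retention chunk scores zero because it matches none. BM25 is purely lexical and has no random component, so the same submission and corpus always produce the same ranking.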

See the framework

The full framework is open source. Read the code, run it locally, fork it.

github.com/sitkastack/vendor-risk-triage →

Email Robyn

Questions about the framework, or running it against your own vendor submissions?

robyn@sitkastack.com →