We help you bring large language models into your product safely — with grounded retrieval, eval harnesses, and the guardrails that keep hallucinations out of production.
What we deliver
RAG pipelines with vector stores and semantic retrievers tuned for your domain.
Anthropic (Claude) and OpenAI SDK integration with streaming and cost tracking.
Golden-set evals and acceptance criteria before scaling AI features to users.
Cost and latency reduction through prompt caching.
Hallucination guardrails: grounded retrieval, output validation, and safety classifiers tuned per use-case.
Agent orchestration: multi-step workflows with tool-use, state handoff, repair loops, and audit trails — built on LangChain / MCP / Temporal depending on your stack.
Spend caps and usage monitoring with alerts on cost anomalies.
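A golden-set gate can be as simple as the sketch below: run a fixed set of prompts with known-good answers through the model, and hold the feature back unless the pass rate clears an agreed bar. The exact-match grader, the `GoldenCase` type and the 0.95 threshold are illustrative assumptions; real acceptance criteria are usually rubric- or model-graded and set per feature.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class GoldenCase:
    prompt: str
    expected: str  # known-good reference answer


def normalise(text: str) -> str:
    """Case- and whitespace-insensitive form used by this toy exact-match grader."""
    return " ".join(text.lower().split())


def run_golden_set(
    cases: list[GoldenCase],
    model: Callable[[str], str],
    min_pass_rate: float = 0.95,  # hypothetical acceptance threshold
) -> tuple[float, bool, list[GoldenCase]]:
    """Return (pass rate, whether the feature may scale, failing cases)."""
    if not cases:
        raise ValueError("golden set is empty")
    failures = [c for c in cases if normalise(model(c.prompt)) != normalise(c.expected)]
    pass_rate = 1 - len(failures) / len(cases)
    return pass_rate, pass_rate >= min_pass_rate, failures


if __name__ == "__main__":
    golden = [GoldenCase("Capital of France?", "Paris"), GoldenCase("2 + 2?", "4")]
    stub_answers = {"Capital of France?": "paris", "2 + 2?": "5"}  # stands in for an LLM
    rate, ship, failed = run_golden_set(golden, stub_answers.__getitem__)
    print(rate, ship, [c.prompt for c in failed])  # 0.5 False ['2 + 2?']
```

Swapping the stub for each vendor's client turns the same harness into a side-by-side vendor comparison.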
When this fits
You're adding LLM features to an existing SaaS product but aren't sure how to control hallucinations.
You need semantic search to replace brittle keyword matching.
You're evaluating multiple LLM vendors and need a harness to pick between them.
Proven in practice
Reference builds from our own work that exercise this capability end to end.
Healthcare — hospital medical affairs
Clinical Education Operations Platform
The problem
Multi-institution clinical education ran on spreadsheets and email: manual tutor matching, untracked teaching hours, and sensitive trainee documents processed by hand. None of it was auditable, and student data crossed institutional boundaries it shouldn’t have.
What we built
A unified platform for a hospital group’s clinical-education operations — managing students, tutors, and teaching-hour billing across multiple institutions, with AI-assisted document processing and matchmaking.
▸Three-tier LLM orchestration (Claude Haiku / Sonnet / Opus) routed per task class, with per-request and per-institution daily cost ceilings enforced in Redis.
▸Six-stage document pipeline that tokenises Singapore NRICs before any AI call, then routes by confidence — auto-commit above 0.90, human review between 0.70 and 0.90, blocked below.
▸AI-assisted matchmaking, plus a clinical-curriculum RAG that retrieves candidate learning objectives and then re-ranks them with an LLM; every model output is schema-validated.
▸Per-institution row-level security enforced in Postgres via a per-request session GUC, isolating tenants at the database layer.
▸PII-masked, append-only audit logging — every payload redacted and tenant-scoped before it is written.
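The tier routing and cost ceilings in the first bullet can be sketched as follows. This is a minimal, single-process version under stated assumptions: the task-class-to-tier table, the tier labels and the ceiling values are hypothetical, and a plain dict stands in for Redis. In Redis, the same per-institution, per-day key would typically be incremented with `INCRBYFLOAT` and given an expiry so that old counters age out.

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical task-class -> model-tier table; real routing rules are domain-specific.
TIER_FOR_TASK = {"extract": "haiku", "match": "sonnet", "review": "opus"}


def route(task_class: str) -> str:
    """Pick a model tier for a task class, falling back to the middle tier."""
    return TIER_FOR_TASK.get(task_class, "sonnet")


@dataclass
class CostGuard:
    per_request_ceiling: float  # max estimated spend for a single call
    daily_ceiling: float  # max spend per institution per calendar day
    # In-memory stand-in for Redis, keyed like "cost:{institution}:{YYYY-MM-DD}".
    spent: dict[str, float] = field(default_factory=dict)

    def try_charge(self, institution: str, estimated_cost: float, day: date) -> bool:
        """Reserve budget before the model call; return False if a ceiling would be hit."""
        if estimated_cost > self.per_request_ceiling:
            return False
        key = f"cost:{institution}:{day.isoformat()}"
        current = self.spent.get(key, 0.0)
        if current + estimated_cost > self.daily_ceiling:
            return False
        self.spent[key] = current + estimated_cost
        return True


if __name__ == "__main__":
    guard = CostGuard(per_request_ceiling=0.60, daily_ceiling=1.00)
    today = date(2024, 1, 1)
    print(route("review"))  # opus
    print(guard.try_charge("hospital-a", 0.50, today))  # True
    print(guard.try_charge("hospital-a", 0.50, today))  # True (exactly at the ceiling)
    print(guard.try_charge("hospital-a", 0.25, today))  # False (over the daily ceiling)
    print(guard.try_charge("hospital-b", 0.75, today))  # False (over the per-request ceiling)
```

The check-then-write above is not atomic across workers. With a shared Redis counter, an atomic increment followed by a compare-and-refund (or a small Lua script) keeps two concurrent requests from both slipping under the ceiling.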
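The first and last stages of the document pipeline, tokenising NRICs before any AI call and routing the result by confidence, might look like the sketch below. The regex checks only the NRIC/FIN shape (prefix letter, seven digits, suffix letter) and does not validate the checksum; the `<NRIC_n>` placeholder format is hypothetical; and because the description does not say which band owns the exact boundaries, this sketch sends a confidence of exactly 0.90 or 0.70 to human review.

```python
import re

# NRIC/FIN shape only: S/T/F/G/M, seven digits, a suffix letter (checksum not validated).
NRIC_PATTERN = re.compile(r"\b[STFGM]\d{7}[A-Z]\b")


def tokenise_nrics(text: str) -> tuple[str, dict[str, str]]:
    """Swap each distinct NRIC for a stable placeholder; return masked text and token map."""
    forward: dict[str, str] = {}

    def _swap(match: re.Match) -> str:
        value = match.group(0)
        if value not in forward:
            forward[value] = f"<NRIC_{len(forward) + 1}>"
        return forward[value]

    masked = NRIC_PATTERN.sub(_swap, text)
    # The token map stays server-side; only `masked` is sent to the model.
    return masked, {token: value for value, token in forward.items()}


def detokenise(text: str, token_map: dict[str, str]) -> str:
    """Restore original identifiers in validated model output, where needed."""
    for token, value in token_map.items():
        text = text.replace(token, value)
    return text


def route_by_confidence(confidence: float) -> str:
    # Band edges are an assumption: exactly 0.90 and exactly 0.70 go to review.
    if confidence > 0.90:
        return "auto_commit"
    if confidence >= 0.70:
        return "human_review"
    return "blocked"


if __name__ == "__main__":
    masked, token_map = tokenise_nrics("Trainee S1234567D, guardian T7654321Z; trainee S1234567D")
    print(masked)  # Trainee <NRIC_1>, guardian <NRIC_2>; trainee <NRIC_1>
    print([route_by_confidence(c) for c in (0.95, 0.90, 0.70, 0.42)])
    # ['auto_commit', 'human_review', 'human_review', 'blocked']
```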
The problem
Compliance teams can’t read every regulator notice, circular, and guideline in time, and generic AI invents clause numbers and obligations that no auditor will accept. Worse, when a rule quietly changes, no one can see what moved, or whether the firm’s own internal policy still covers it, until an inspection finds the gap.
What we built
A RegTech platform that crawls financial-regulator publications, extracts binding obligations, and answers compliance questions through a hallucination-guarded RAG interface — then continuously checks those obligations against the firm’s own policies and alerts on every gap.
▸Source-traced RAG: every answer cites its regulatory source chunk; ungrounded responses are rejected before they reach the user.
▸Regulatory-change alerting: a scheduled job re-checks every published obligation and, on each run, raises an alert (webhook or email) for any new or amended rule that is no longer covered; each gap is tracked through to resolution.
▸Rule-version redlining: every version of a regulator document is retained, so the current source-of-truth can be shown side-by-side with any prior version as a word-level diff of exactly what changed.
▸Company-policy gap analysis: each regulatory obligation is semantically scored against the firm’s own internal policies, and any obligation with no adequate policy coverage is flagged as a tracked gap.
▸Obligation extraction with binding-weight classification across seven regulatory document types.
▸Append-only, banking-grade audit logging with OIDC- or SAML-verified identity; on-premises and air-gapped deployment with local LLM serving (vLLM) is also supported.
FastAPI
Next.js 16
PostgreSQL + pgvector
Celery / Redis
AWS Bedrock / Claude
vLLM
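A minimal version of the source-tracing rule (every answer cites its chunk, and ungrounded answers never reach the user) is a post-generation check like the one below. It assumes the model has been prompted to tag each sentence with a citation in a hypothetical `[chunk:ID]` syntax, and it verifies only that each sentence cites a chunk that was actually retrieved; a stricter guard would also test whether the cited text supports the claim.

```python
import re
from dataclasses import dataclass

# Hypothetical citation syntax the model is prompted to emit, e.g. "[chunk:n1]".
CITATION = re.compile(r"\[chunk:([\w.-]+)\]")
SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+")


@dataclass(frozen=True)
class GroundingResult:
    accepted: bool
    reason: str


def check_grounding(answer: str, retrieved_ids: set[str]) -> GroundingResult:
    """Reject answers with an uncited sentence or a citation to a chunk never retrieved."""
    sentences = [s for s in SENTENCE_BREAK.split(answer.strip()) if s]
    if not sentences:
        return GroundingResult(False, "empty answer")
    for sentence in sentences:
        cited = CITATION.findall(sentence)  # captured chunk IDs in this sentence
        if not cited:
            return GroundingResult(False, f"uncited sentence: {sentence!r}")
        unknown = sorted(set(cited) - retrieved_ids)
        if unknown:
            return GroundingResult(False, f"unretrieved chunk(s): {unknown}")
    return GroundingResult(True, "ok")


if __name__ == "__main__":
    retrieved = {"n1", "n2"}
    print(check_grounding("Rule A applies to all firms [chunk:n1].", retrieved).accepted)  # True
    print(check_grounding("Rule A applies [chunk:n1]. Fines apply.", retrieved).reason)
```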
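The word-level redline between two retained versions of a regulator document can be produced with Python's standard-library `difflib`, as in this sketch. The `[-removed-]` / `{+added+}` markup is an illustrative choice (the same convention as `git diff --word-diff`), and splitting on whitespace is a simplifying assumption; a production redline would likely preserve paragraph structure and render to HTML.

```python
import difflib


def word_redline(old: str, new: str) -> str:
    """Mark words removed from `old` as [-...-] and words added in `new` as {+...+}."""
    a, b = old.split(), new.split()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    out: list[str] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(a[i1:i2])
            continue
        if tag in ("replace", "delete"):
            out.append("[-" + " ".join(a[i1:i2]) + "-]")
        if tag in ("replace", "insert"):
            out.append("{+" + " ".join(b[j1:j2]) + "+}")
    return " ".join(out)


if __name__ == "__main__":
    # Toy example text, not a real regulation.
    v1 = "Firms must notify the regulator within 14 days of the incident."
    v2 = "Firms must notify the regulator within 7 days of the incident."
    print(word_redline(v1, v2))
    # Firms must notify the regulator within [-14-] {+7+} days of the incident.
```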
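Policy-gap scoring and the scheduled re-check reduce to a nearest-neighbour comparison plus a set difference between runs. The sketch below assumes obligations and policies are already embedded as vectors (toy three-dimensional ones here), uses cosine similarity with a hypothetical coverage threshold of 0.75, and leaves out alert delivery. With pgvector in the stack, the inner loop would normally become a query ordered by the `<=>` cosine-distance operator.

```python
import math
from typing import Optional

# (best-matching policy id, or None if nothing scored above 0, and its similarity)
Gap = tuple[Optional[str], float]


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def find_gaps(
    obligations: dict[str, list[float]],
    policies: dict[str, list[float]],
    threshold: float = 0.75,  # hypothetical; would be calibrated on labelled examples
) -> dict[str, Gap]:
    """Flag every obligation whose closest internal policy scores below `threshold`."""
    gaps: dict[str, Gap] = {}
    for ob_id, ob_vec in obligations.items():
        best_id, best_score = None, 0.0
        for pol_id, pol_vec in policies.items():
            score = cosine(ob_vec, pol_vec)
            if score > best_score:
                best_id, best_score = pol_id, score
        if best_score < threshold:
            gaps[ob_id] = (best_id, best_score)
    return gaps


def new_alerts(previous_gaps: set[str], current_gaps: dict[str, Gap]) -> list[str]:
    """Obligations that became uncovered since the last scheduled run."""
    return sorted(set(current_gaps) - previous_gaps)


if __name__ == "__main__":
    obligations = {"ob-1": [1.0, 0.0, 0.0], "ob-2": [0.0, 0.0, 1.0]}
    policies = {"pol-a": [0.9, 0.1, 0.0]}
    gaps = find_gaps(obligations, policies)
    print(gaps)  # {'ob-2': (None, 0.0)}
    print(new_alerts(set(), gaps))  # ['ob-2']
```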
Public sector — HR & recruitment
Multi-Tenant Recruitment Platform (Gov-Cloud)
The problem
Government hiring is slow, manual, and spread across disconnected tools, while strict data-isolation and in-country data-residency requirements rule out most off-the-shelf recruitment platforms. Recruiters drown in resume screening that AI could triage.
What we built
A multi-tenant applicant-tracking SaaS for government agencies, hosted on a government commercial cloud, with in-region AI for resume parsing and semantic candidate search.
▸Hard tenant isolation: a per-tenant database connection plus per-tenant KMS key aliases, with separate keys for general data, interview notes, and exports.
▸Enterprise SSO implemented end-to-end — SAML 2.0, OIDC (PKCE), SingPass NDI, and SCIM 2.0 directory sync — with SMS OTP and step-up MFA on sensitive actions.
▸In-region AI on Bedrock — resume parsing, semantic candidate search, and interview-note summarisation — each gated by a DLP scanner (NRIC / FIN / passport / contact patterns) before any model call.
▸Dual-control bulk export: a second approver and a step-up token are both required before the system releases a 15-minute presigned, tenant-key-encrypted download.
▸Fully provisioned with Terraform (17 modules) across multiple availability zones, with a WORM audit bucket (S3 Object Lock in COMPLIANCE mode) and time-ordered UUIDv7 audit rows.
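The DLP gate in front of each Bedrock call can be pictured as a scan-then-decide wrapper like the one below. The patterns are simplified assumptions (NRIC/FIN shape without checksum validation, a loose email pattern, and Singapore-style eight-digit phone numbers starting with 6, 8 or 9); passport formats vary by issuing country and are omitted; and `model` is any callable standing in for the actual Bedrock client. Whether a hit blocks the call or forwards a redacted copy is a per-feature policy choice, exposed here as the `block` flag.

```python
import re
from typing import Callable

# Simplified detectors (assumptions): NRIC/FIN shape without checksum validation,
# a loose email pattern, and eight-digit Singapore numbers starting 6, 8 or 9.
PATTERNS = {
    "nric_fin": re.compile(r"\b[STFGM]\d{7}[A-Z]\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "sg_phone": re.compile(r"(?<!\d)(?:\+65[ -]?)?[689]\d{3}[ -]?\d{4}(?!\d)"),
}


class SensitiveDataBlocked(Exception):
    pass


def scan(text: str) -> dict[str, list[str]]:
    """Return every detector that fired, with the matched strings."""
    hits = {name: pattern.findall(text) for name, pattern in PATTERNS.items()}
    return {name: found for name, found in hits.items() if found}


def redact(text: str) -> str:
    """Replace each match with a label such as [EMAIL]."""
    for name, pattern in PATTERNS.items():
        text = pattern.sub(f"[{name.upper()}]", text)
    return text


def gated_call(text: str, model: Callable[[str], str], block: bool = False) -> str:
    """Scan before any model call: refuse outright, or forward a redacted copy."""
    findings = scan(text)
    if findings and block:
        raise SensitiveDataBlocked(", ".join(sorted(findings)))
    return model(redact(text) if findings else text)


if __name__ == "__main__":
    resume = "Candidate S1234567D, jane.doe@example.com, +65 9123 4567"
    print(scan(resume))
    print(gated_call(resume, model=lambda prompt: prompt))
    # Candidate [NRIC_FIN], [EMAIL], [SG_PHONE]
```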
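The dual-control rule for bulk exports can be expressed as a small authorisation check that runs before any download link is minted. In this sketch the `Approval` record, the five-minute step-up freshness window and the `presign` callable are hypothetical. In an AWS deployment, `presign` might wrap boto3's `generate_presigned_url` with `ExpiresIn=900`, while tenant-key encryption would come from the object's SSE-KMS settings rather than from the URL itself.

```python
from dataclasses import dataclass
from typing import Callable, Optional

PRESIGN_TTL_SECONDS = 15 * 60  # the 15-minute download window
STEP_UP_MAX_AGE_SECONDS = 5 * 60  # hypothetical freshness window for step-up MFA


@dataclass(frozen=True)
class ExportRequest:
    tenant: str
    requested_by: str


@dataclass(frozen=True)
class Approval:
    approver: str
    tenant: str
    step_up_at: float  # epoch seconds when the approver last completed step-up MFA


class ExportDenied(Exception):
    pass


def check_dual_control(req: ExportRequest, approval: Approval, now: float) -> Optional[str]:
    """Return the reason the export must be refused, or None if dual control is satisfied."""
    if approval.tenant != req.tenant:
        return "cross-tenant approval"
    if approval.approver == req.requested_by:
        return "approver must differ from requester"
    if now - approval.step_up_at > STEP_UP_MAX_AGE_SECONDS:
        return "step-up token expired"
    return None


def release_export(
    req: ExportRequest,
    approval: Approval,
    presign: Callable[[str, int], str],
    now: float,
) -> str:
    """Mint a short-lived download link only after both controls pass."""
    reason = check_dual_control(req, approval, now)
    if reason is not None:
        raise ExportDenied(reason)
    return presign(req.tenant, PRESIGN_TTL_SECONDS)


def fake_presign(tenant: str, ttl: int) -> str:
    # Stand-in for a real presigning call.
    return f"https://example.invalid/{tenant}?ttl={ttl}"


if __name__ == "__main__":
    req = ExportRequest(tenant="agency-a", requested_by="alice")
    print(release_export(req, Approval("bob", "agency-a", 1000.0), fake_presign, now=1060.0))
    # https://example.invalid/agency-a?ttl=900
    print(check_dual_control(req, Approval("alice", "agency-a", 1000.0), now=1060.0))
    # approver must differ from requester
```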
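UUIDv7 audit IDs sort by creation time because their first 48 bits hold a Unix timestamp in milliseconds (RFC 9562). The generator below is a minimal sketch of that layout using only the standard library. It omits the optional within-millisecond counter, so IDs minted in the same millisecond are not guaranteed to be monotonic. Python 3.14 also ships `uuid.uuid7()` in the standard library.

```python
import os
import time
import uuid
from typing import Optional


def uuid7(timestamp_ms: Optional[int] = None) -> uuid.UUID:
    """Build an RFC 9562 version-7 UUID: 48-bit ms timestamp, version, variant, random bits."""
    ms = time.time_ns() // 1_000_000 if timestamp_ms is None else timestamp_ms
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits, 74 of which are used
    rand_a = (rand >> 68) & 0xFFF  # 12 bits
    rand_b = rand & ((1 << 62) - 1)  # 62 bits
    value = (ms & ((1 << 48) - 1)) << 80  # unix_ts_ms in the top 48 bits
    value |= 0x7 << 76  # version 7
    value |= rand_a << 64
    value |= 0b10 << 62  # RFC variant bits
    value |= rand_b
    return uuid.UUID(int=value)


if __name__ == "__main__":
    first, second = uuid7(1_700_000_000_000), uuid7(1_700_000_000_001)
    print(first.version, first < second)  # 7 True
```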