02·THERMOCLINE·200–800 m

How it thinks, stage by stage.

A fifteen-stage reasoning pipeline engineered for maritime safety. Hybrid retrieval, neural reranking, authority hierarchy, self-correcting search, citation validation, and a sufficient-context autorater — all running offline, on vessel hardware.

15 · Pipeline stages · per question
290+ · Tests · gate every build
< 25 s · Average latency · offline
27 MB · Runtime binary · single file

Pipeline · fifteen stages

Every question. Every stage.

No single model answers your crew. Each question passes through an engineered sequence before a word reaches the bridge.

01

Question Sanitisation

Production-hardened input handling

Every question is cleaned of typographic characters that can break downstream processing — smart quotes, em dashes, ellipsis characters. This stage exists because a real crew member's question once crashed the system with a single special character.
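
A minimal sketch of this kind of clean-up, assuming a simple character-for-character substitution. The mapping and the function name are illustrative only; the source does not list the exact characters handled.

```python
# Hypothetical replacement table: typographic characters mapped to plain ASCII.
TYPOGRAPHIC_REPLACEMENTS = {
    "\u2018": "'",    # left single quote
    "\u2019": "'",    # right single quote / apostrophe
    "\u201c": '"',    # left double quote
    "\u201d": '"',    # right double quote
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # ellipsis character
}
_TABLE = str.maketrans(TYPOGRAPHIC_REPLACEMENTS)


def sanitise_question(text: str) -> str:
    """Replace typographic characters and collapse runs of whitespace."""
    return " ".join(text.translate(_TABLE).split())


print(sanitise_question("What\u2019s the \u201cMOB\u201d drill\u2026  now?"))
```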

02

Query Decomposition

Compound questions split into retrieval paths

"Fire drill procedures while transiting a TSS" becomes two separate searches — one for fire drills, one for traffic separation — then results are merged. Compound maritime questions are the norm, not the exception.

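The split-then-merge idea can be sketched as follows. This is an assumption-heavy illustration: the connective list, the `decompose` and `search_compound` names, and the `search` callable are hypothetical, and the real stage may well use a different decomposition method.

```python
import re

# Hypothetical connectives that often join two topics in one question.
_CONNECTIVES = re.compile(r"\s+(?:while|during|and)\s+", flags=re.IGNORECASE)


def decompose(question):
    """Split a compound question into sub-queries on simple connectives."""
    parts = [p.strip() for p in _CONNECTIVES.split(question) if p.strip()]
    return parts or [question]


def search_compound(question, search):
    """Run `search` once per sub-query and merge results without duplicates.

    `search` is any callable mapping a query string to a list of document IDs.
    """
    merged, seen = [], set()
    for sub_query in decompose(question):
        for doc_id in search(sub_query):
            if doc_id not in seen:
                seen.add(doc_id)
                merged.append(doc_id)
    return merged


print(decompose("Fire drill procedures while transiting a TSS"))
```
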
03

Domain Vocabulary Expansion

Bridging crew language to regulatory text

"Night transit" expands to include "hours of darkness, sunset, sunrise, navigation lights." Over twenty curated maritime vocabulary mappings ensure the right documents are found regardless of how the question is phrased.

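A minimal sketch of phrase-to-vocabulary expansion, using the single mapping quoted above. The dictionary structure and the `expand_query` name are assumptions; the remaining curated mappings are not given in the source.

```python
# One mapping taken from the example above; the rest of the set is not published.
VOCABULARY = {
    "night transit": ["hours of darkness", "sunset", "sunrise", "navigation lights"],
}


def expand_query(question, vocabulary=VOCABULARY):
    """Return the original question followed by any mapped regulatory terms."""
    lowered = question.lower()
    extra = []
    for phrase, related_terms in vocabulary.items():
        if phrase in lowered:
            for term in related_terms:
                # Skip terms already present in the question or already added.
                if term not in lowered and term not in extra:
                    extra.append(term)
    return [question] + extra


print(expand_query("Lookout duties during night transit"))
```
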
04

Multi-Signal Retrieval

Semantic + keyword search, mathematically fused

Two independent search systems run in parallel. Semantic search catches conceptual matches. Keyword search catches exact terms. Reciprocal Rank Fusion automatically promotes documents that both systems agree on.
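
Reciprocal Rank Fusion scores each document as the sum of 1 / (k + rank) over every ranking it appears in, so a document placed well by both searches rises to the top. A minimal sketch, assuming the conventional constant k = 60 (the source does not state the value used) and hypothetical document IDs:

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranked list.

    `k` dampens the influence of top ranks; 60 is a common default and is an
    assumption here, not a value taken from the source.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


semantic = ["a", "b", "c"]  # hypothetical semantic-search ranking
keyword = ["b", "d", "a"]   # hypothetical keyword-search ranking
print(reciprocal_rank_fusion([semantic, keyword]))
```

Documents "a" and "b" appear in both lists, so they outrank "c" and "d", which each appear in only one.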

05

Relevance Feedback Expansion

Second-pass retrieval at zero LLM cost

The system examines its own top results and uses their mathematical representations to expand the search, catching documents whose vocabulary differs too much from the question for the initial search to find them. This is a purely mathematical operation that completes in milliseconds.
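
One common way to do this is a Rocchio-style update: blend the query vector with the centroid of the top results' vectors, then search again with the blended vector. The sketch below assumes that approach; the function name and the `alpha` weight are hypothetical, and the source does not say which formula is used.

```python
def expand_query_vector(query_vec, top_doc_vecs, alpha=0.7):
    """Blend the query vector with the centroid of the top-result vectors.

    `alpha` is the weight kept on the original query (an assumed value).
    """
    if not top_doc_vecs:
        return list(query_vec)
    dim = len(query_vec)
    centroid = [
        sum(vec[i] for vec in top_doc_vecs) / len(top_doc_vecs)
        for i in range(dim)
    ]
    return [alpha * q + (1 - alpha) * c for q, c in zip(query_vec, centroid)]


# Toy two-dimensional vectors, for illustration only.
print(expand_query_vector([1.0, 0.0], [[0.0, 1.0], [0.0, 1.0]], alpha=0.5))
```

No language model is involved: the second pass costs only a vector average and one more nearest-neighbour search.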

06

Neural Reranking

Cross-attention precision scoring

Twenty candidates pass through a specialised neural model that examines the question and each document together — not independently. This is the single most important quality step, producing calibrated confidence scores for every candidate.

07

Authority Hierarchy

SMS outranks regulation, by design

Every document carries its authority tier: vessel-specific, company SMS, fleet-wide, regulation, or reference. A tier multiplier nudges rerank scores so that your procedures lead and regulations corroborate. Near-ties break in favour of higher authority.
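
A minimal sketch of the tier multiplier, using the multipliers listed in the Architecture tier table (×1.3 down to ×0.9). The function name, document IDs, and rerank scores are hypothetical.

```python
# Multipliers as listed in the Architecture tier table.
TIER_MULTIPLIER = {
    "vessel": 1.3,
    "sms": 1.2,
    "fleet": 1.1,
    "regulation": 1.0,
    "reference": 0.9,
}


def apply_authority(candidates):
    """Scale each rerank score by its tier multiplier and sort best-first.

    `candidates` is a list of (doc_id, rerank_score, tier) tuples.
    """
    weighted = [
        (doc_id, score * TIER_MULTIPLIER[tier])
        for doc_id, score, tier in candidates
    ]
    return sorted(weighted, key=lambda item: -item[1])


candidates = [
    ("colregs_rule_5", 0.80, "regulation"),  # hypothetical IDs and scores
    ("sms_watchkeeping", 0.78, "sms"),
    ("btm_guide", 0.85, "reference"),
]
for doc_id, score in apply_authority(candidates):
    print(doc_id, round(score, 3))
```

In this example the SMS procedure starts slightly behind the regulation on raw score but leads after weighting, which is the near-tie behaviour described above.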

08

Confidence Gating

Multi-threshold quality control

Scores are normalised to a 0-to-1 scale and filtered. Below the hard threshold, a passage is discarded. Below the soft threshold, it is kept but the language model is instructed to flag uncertainty. This prevents overconfident answers from marginally relevant documents.
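
A minimal sketch of two-threshold gating. It assumes raw reranker scores are squashed with a sigmoid, and the threshold values 0.3 and 0.6 are placeholders; the source states neither the normalisation method nor the thresholds.

```python
import math

HARD_THRESHOLD = 0.3  # assumed value: below this, discard
SOFT_THRESHOLD = 0.6  # assumed value: below this, keep but flag uncertainty


def gate(raw_scores, hard=HARD_THRESHOLD, soft=SOFT_THRESHOLD):
    """Normalise raw scores to 0..1 and apply the hard and soft thresholds.

    `raw_scores` is a list of (doc_id, raw_score) pairs.
    Returns (doc_id, normalised_score, flag_uncertainty) for kept passages.
    """
    kept = []
    for doc_id, raw in raw_scores:
        score = 1.0 / (1.0 + math.exp(-raw))  # sigmoid normalisation (assumed)
        if score < hard:
            continue  # hard threshold: drop the passage entirely
        kept.append((doc_id, score, score < soft))
    return kept


for doc_id, score, uncertain in gate([("a", 2.0), ("b", 0.0), ("c", -2.0)]):
    print(doc_id, round(score, 2), "flag uncertainty" if uncertain else "ok")
```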

09

Self-Correcting Search

Detects its own failures and retries

When retrieval returns insufficient results, the system reformulates the question into formal regulatory vocabulary and retries. Results are re-scored against the original question to preserve intent. When this happens, the crew is told — the confidence badge reflects that we had to work harder.

10

Intelligent Context Assembly

Parent-child expansion and diversity enforcement

When a retrieved chunk is part of a longer procedure, the parent section is pulled in automatically — step 7 of a 9-step man-overboard procedure never arrives in the model's context without the other eight. Source diversity is enforced.
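
Parent-child expansion can be sketched as a lookup from each retrieved chunk to its parent section, followed by pulling in all of that section's chunks in order. The data structures and names below are hypothetical; source diversity enforcement is not shown.

```python
def expand_to_parents(retrieved_chunk_ids, chunk_parent, parent_children):
    """Replace each retrieved chunk with every chunk of its parent section.

    `chunk_parent` maps chunk ID -> parent section ID (absent if standalone).
    `parent_children` maps parent section ID -> ordered list of chunk IDs.
    """
    context, seen = [], set()
    for chunk_id in retrieved_chunk_ids:
        parent = chunk_parent.get(chunk_id)
        siblings = parent_children[parent] if parent is not None else [chunk_id]
        for sibling in siblings:
            if sibling not in seen:
                seen.add(sibling)
                context.append(sibling)
    return context


chunk_parent = {"mob_step_7": "mob_procedure"}
parent_children = {"mob_procedure": ["mob_step_6", "mob_step_7", "mob_step_8"]}
print(expand_to_parents(["mob_step_7", "solas_ch3"], chunk_parent, parent_children))
```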

11

Attention-Optimised Ordering

Matching how language models actually read

Research shows language models pay most attention to the start and end of their input. The system reorders passages so the most relevant content occupies these high-attention positions, with weaker material in the middle.
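
One simple way to achieve this ordering is to alternate passages between the front and the back of the context. A minimal sketch, assuming the input is already sorted from most to least relevant; the function name is illustrative.

```python
def reorder_for_attention(passages_by_relevance):
    """Place the strongest passages at both ends and the weakest in the middle.

    Input must be sorted most relevant first.
    """
    front, back = [], []
    for index, passage in enumerate(passages_by_relevance):
        # Even positions fill the front, odd positions fill the back.
        (front if index % 2 == 0 else back).append(passage)
    return front + back[::-1]


# Ranks 1..5, where 1 is the most relevant passage.
print(reorder_for_attention([1, 2, 3, 4, 5]))  # [1, 3, 5, 4, 2]
```

The two best passages (1 and 2) end up first and last, and the weakest (5) sits in the middle.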

12

Cited Answer Generation

Every claim traced to its source

A compact language model generates answers with inline numeric citations, resolved after generation to document name, section, and page number. Domain completeness rules enforce that safety-critical procedures always return every mandatory step — no summaries.

13

Citation Validation

The hallucination guard

Every bracketed source number in the answer is checked against the chunks that were actually retrieved. If the model invents "[47]" when only [1]–[5] exist, we catch it — the crew sees a warning instead of a confident fabrication.
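
A minimal sketch of the check, assuming citations appear as bracketed integers such as [3] and that retrieved chunks are numbered from 1. The function name is hypothetical.

```python
import re


def find_invalid_citations(answer, num_chunks):
    """Return cited source numbers that do not match any retrieved chunk."""
    cited = {int(number) for number in re.findall(r"\[(\d+)\]", answer)}
    return sorted(n for n in cited if not 1 <= n <= num_chunks)


answer = "Muster at station B [2]. Launch the rescue boat within 5 minutes [47]."
invalid = find_invalid_citations(answer, num_chunks=5)
if invalid:
    print("Warning: unverified citations", invalid)
```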

14

Sufficient-Context Autorater

Based on Google ICLR 2025 research

A second, short model call checks whether the retrieved context actually contained enough information to answer the question, independent of the answer the model wrote. It returns one of three verdicts: sufficient, partial, or insufficient. The verdict corrects the keyword-based refusal detector in both directions.

15

Confidence Scoring and Safety Gates

High / Medium / Low with transparent reasoning

Every answer gets a composite confidence tier derived from reranker scores, source diversity, citation verification rate, and whether the pipeline had to self-correct. Refusal detection is length-aware. Verified answers are cached semantically with re-verification on every cache hit.
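
A sketch of how four signals might combine into a tier with readable reasons. Every threshold and the one-point-per-signal scheme are assumptions made for illustration; the source does not describe the actual weighting.

```python
def confidence_tier(top_rerank_score, distinct_sources,
                    citation_verified_rate, self_corrected):
    """Combine four signals into High / Medium / Low plus the reasons behind it.

    All cut-offs below are hypothetical placeholders.
    """
    points, reasons = 0, []
    if top_rerank_score >= 0.7:
        points += 1
        reasons.append("strong reranker score")
    if distinct_sources >= 2:
        points += 1
        reasons.append("multiple independent sources")
    if citation_verified_rate == 1.0:
        points += 1
        reasons.append("all citations verified")
    if not self_corrected:
        points += 1
        reasons.append("no retry needed")
    if points == 4:
        tier = "High"
    elif points >= 2:
        tier = "Medium"
    else:
        tier = "Low"
    return tier, reasons


print(confidence_tier(0.9, 3, 1.0, self_corrected=True))
```

Returning the reasons alongside the tier is what lets the interface show which signals produced the result.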

Architecture

Ranked by authority. Never by popularity.

Every document carries an authority tier. Retrieval fuses semantic and keyword signals; reranking weighs cross-attention scores; a tier multiplier breaks near-ties in favour of your own procedures.

Vessel · Vessel-specific procedures: bridge, engine room, muster · ×1.3
SMS · Company Safety Management System manuals · ×1.2
Fleet · Fleet-wide standing orders and circulars · ×1.1
Regulation · SOLAS, MARPOL, COLREGs, flag-state rules · ×1.0
Reference · Bridge team management, maritime English, guidance · ×0.9

Principles

Six non-negotiables. Every build.

Offline-first

No cloud calls. Every stage runs on vessel hardware. The only shore-side component is bundle update delivery.

Deterministic

Near-zero temperature in generation. Tie-breakers follow authority hierarchy. Same question, same context, same answer.

Auditable

Every citation resolves to a document, section, and page. Every confidence score explains which of four signals produced it.

Length-aware

Refusal detection distinguishes a one-line "not in the bundle" from a long answer that merely acknowledges a scope gap.

Tested

290+ automated tests gate every build. 35-case adversarial maritime corpus runs on every release.

Transparent

Source tier, confidence tier, and verification status are visible to crew on every answer. No black box.

— The deep end —

Engineered for the moment it matters. Bring it onboard.