How it thinks, stage by stage.
A fifteen-stage reasoning pipeline engineered for maritime safety. Hybrid retrieval, neural reranking, authority hierarchy, self-correcting search, citation validation, and a sufficient-context autorater — all running offline, on vessel hardware.
Every question. Every stage.
No single model answers your crew. Each question passes through an engineered sequence before a word reaches the bridge.
Question Sanitisation
Production-hardened input handling
Every question is cleaned of typographic characters that can break downstream processing — smart quotes, em dashes, ellipsis characters. This stage exists because a real crew member's question once crashed the system with a single special character.
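As an illustration only, this kind of clean-up can be sketched as a character mapping. The function name `sanitise_question` and the exact character set below are assumptions for the example, not the product's actual list.

```python
import unicodedata

# Hypothetical mapping: typographic characters -> plain ASCII equivalents.
TYPOGRAPHIC_MAP = str.maketrans({
    "\u2018": "'",    # left single quote
    "\u2019": "'",    # right single quote
    "\u201c": '"',    # left double quote
    "\u201d": '"',    # right double quote
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # ellipsis character
    "\u00a0": " ",    # non-breaking space
})

def sanitise_question(text: str) -> str:
    """Normalise Unicode, replace typographic characters, collapse whitespace."""
    text = unicodedata.normalize("NFC", text)
    text = text.translate(TYPOGRAPHIC_MAP)
    return " ".join(text.split())
```

Running the mapping before any other stage means every later component only ever sees plain punctuation.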
Query Decomposition
Compound questions split into retrieval paths
"Fire drill procedures while transiting a TSS" becomes two separate searches — one for fire drills, one for traffic separation — then results are merged. Compound maritime questions are the norm, not the exception.
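The split-then-merge idea can be sketched in a few lines. This is a deliberately naive version that splits on a small set of assumed connectives and merges results round-robin; the real decomposition step may work quite differently, and the names `decompose` and `merge_results` are hypothetical.

```python
import re

# Assumed connectives that join two topics in one question.
CONNECTIVES = re.compile(r"\s+(?:while|whilst|during)\s+", flags=re.IGNORECASE)

def decompose(question: str) -> list[str]:
    """Split a compound question into sub-queries at the assumed connectives."""
    parts = [p.strip() for p in CONNECTIVES.split(question)]
    return [p for p in parts if p]

def merge_results(result_lists: list[list[str]]) -> list[str]:
    """Round-robin merge of per-sub-query results, dropping duplicates."""
    merged, seen = [], set()
    longest = max((len(r) for r in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results) and results[rank] not in seen:
                seen.add(results[rank])
                merged.append(results[rank])
    return merged
```

The round-robin merge gives each sub-query an equal share of the top positions, so neither topic crowds out the other.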
Domain Vocabulary Expansion
Bridging crew language to regulatory text
"Night transit" expands to include "hours of darkness, sunset, sunrise, navigation lights." Over twenty curated maritime vocabulary mappings ensure the right documents are found regardless of how the question is phrased.
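A minimal sketch of a vocabulary mapping, using the one example given above. The dictionary layout and the function name `expand_query` are assumptions; the curated mappings themselves are not reproduced here.

```python
import re

# Only this entry comes from the text above; the structure is an assumption.
VOCABULARY = {
    "night transit": ["hours of darkness", "sunset", "sunrise", "navigation lights"],
}

def expand_query(question: str) -> str:
    """Append related regulatory terms for any crew phrase found in the question."""
    lowered = question.lower()
    extra = []
    for phrase, related in VOCABULARY.items():
        # Word boundaries stop a short phrase matching inside a longer word.
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
            extra.extend(term for term in related if term not in lowered)
    if not extra:
        return question
    return f"{question} ({', '.join(extra)})"
```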
Multi-Signal Retrieval
Semantic + keyword search, mathematically fused
Two independent search systems run in parallel. Semantic search catches conceptual matches. Keyword search catches exact terms. Reciprocal Rank Fusion automatically promotes documents that both systems agree on.
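Reciprocal Rank Fusion itself is a short formula: each document scores the sum of 1 / (k + rank) over every ranking it appears in. The sketch below assumes the commonly used constant k = 60; whether the product uses that value is not stated.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several best-first rankings of document ids into one ranking."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))

semantic = ["A", "B", "C"]   # illustrative ids only
keyword = ["C", "B", "D"]
fused = reciprocal_rank_fusion([semantic, keyword])
```

Documents B and C appear in both lists, so they rise above A and D, which each appear in only one.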
Relevance Feedback Expansion
Second-pass retrieval at zero LLM cost
The system examines its own top results and uses their mathematical representations to expand the search, catching documents whose vocabulary differs too much for the initial search to find. It is a purely mathematical operation, completed in milliseconds.
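One common way to do this is a Rocchio-style blend: move the query embedding part of the way towards the centroid of the top results, then search again. The sketch below assumes that approach; the weight `alpha` and the function name are illustrative.

```python
def expand_query_vector(query_vec, top_doc_vecs, alpha=0.7):
    """Blend the query embedding with the centroid of the top result embeddings.

    alpha is the weight kept on the original query (assumed value).
    """
    if not top_doc_vecs:
        return list(query_vec)
    dims = len(query_vec)
    centroid = [sum(vec[i] for vec in top_doc_vecs) / len(top_doc_vecs) for i in range(dims)]
    return [alpha * q + (1 - alpha) * c for q, c in zip(query_vec, centroid)]
```

The expanded vector feeds a second nearest-neighbour search, which costs only vector arithmetic and no language-model call.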
Neural Reranking
Cross-attention precision scoring
Twenty candidates pass through a specialised neural model that examines the question and each document together — not independently. This is the single most important quality step, producing calibrated confidence scores for every candidate.
Authority Hierarchy
SMS outranks regulation, by design
Every document carries its authority tier: vessel-specific, company SMS, fleet-type, regulation, or reference. A tier multiplier nudges rerank scores so your procedures lead and regulations corroborate. Near-ties break in favour of higher authority.
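A sketch of how a tier multiplier can break near-ties without overriding a clearly better match. The tier order follows the text; the multiplier values and the document ids are invented for the example.

```python
# Tier order comes from the text above; the multiplier values are assumptions.
TIER_MULTIPLIER = {
    "vessel": 1.10,
    "company_sms": 1.08,
    "fleet_type": 1.05,
    "regulation": 1.02,
    "reference": 1.00,
}

def apply_authority(candidates):
    """candidates: list of (doc_id, tier, rerank_score). Returns best-first list."""
    adjusted = [
        (doc_id, tier, score * TIER_MULTIPLIER[tier])
        for doc_id, tier, score in candidates
    ]
    return sorted(adjusted, key=lambda c: c[2], reverse=True)

ranked = apply_authority([
    ("SOLAS-III", "regulation", 0.80),
    ("SMS-4.2", "company_sms", 0.78),
    ("Textbook", "reference", 0.95),
])
```

Here the SMS section overtakes the regulation it nearly tied with, while the much stronger reference match still leads, because the multipliers are small.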
Confidence Gating
Multi-threshold quality control
Scores are normalised to a 0-to-1 scale and filtered. Below the hard threshold: discarded. Below the soft threshold: the language model is instructed to flag uncertainty. This prevents overconfident answers from marginally relevant documents.
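The two-threshold logic can be sketched as follows. The logistic normalisation and both threshold values are assumptions chosen for the example, as is the per-document labelling.

```python
import math

HARD_THRESHOLD = 0.30  # assumed value: below this, the document is discarded
SOFT_THRESHOLD = 0.60  # assumed value: below this, uncertainty is flagged

def gate(raw_scores):
    """raw_scores: dict of doc_id -> raw reranker score. Returns kept documents."""
    kept = []
    for doc_id, raw in raw_scores.items():
        score = 1.0 / (1.0 + math.exp(-raw))  # logistic squashing to (0, 1)
        if score < HARD_THRESHOLD:
            continue
        label = "confident" if score >= SOFT_THRESHOLD else "uncertain"
        kept.append((doc_id, round(score, 3), label))
    return kept

result = gate({"a": 2.0, "b": 0.0, "c": -2.0})
```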
Self-Correcting Search
Detects its own failures and retries
When retrieval returns insufficient results, the system reformulates the question into formal regulatory vocabulary and retries. Results are re-scored against the original question to preserve intent. When this happens, the crew is told — the confidence badge reflects that we had to work harder.
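The retry loop can be sketched with the search, reformulation, and scoring steps passed in as functions. All four parameter names and the `min_results` cut-off are hypothetical; only the overall flow comes from the description above.

```python
def search_with_retry(question, search, reformulate, rescore, min_results=3):
    """Return (results, self_corrected).

    search(query) -> list of docs; reformulate(question) -> rewritten query;
    rescore(question, doc) -> relevance score. All three are supplied by the caller.
    """
    results = search(question)
    if len(results) >= min_results:
        return results, False
    retry_results = search(reformulate(question))
    # Re-score against the ORIGINAL question so the rewrite cannot drift from intent.
    rescored = sorted(retry_results, key=lambda doc: rescore(question, doc), reverse=True)
    return rescored, True
```

The returned flag is what lets a later stage lower the confidence badge when a retry was needed.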
Intelligent Context Assembly
Parent-child expansion and diversity enforcement
When a retrieved chunk is part of a longer procedure, the parent section is pulled in automatically — step 7 of a 9-step man-overboard procedure never arrives in the model's context without the other eight. Source diversity is enforced.
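A sketch of parent expansion with a simple diversity cap, assuming each chunk records its parent section and its source document. The data shapes and the `max_per_source` limit are illustrative.

```python
def assemble_context(hits, parents, max_per_source=2):
    """hits: best-first list of (chunk_id, parent_id, source).
    parents: dict of parent_id -> full section text.
    Returns the full parent sections to place in the model's context.
    """
    context, seen_parents, per_source = [], set(), {}
    for chunk_id, parent_id, source in hits:
        if parent_id in seen_parents:
            continue  # the whole parent section is already included
        if per_source.get(source, 0) >= max_per_source:
            continue  # diversity cap: do not let one source dominate
        seen_parents.add(parent_id)
        per_source[source] = per_source.get(source, 0) + 1
        context.append(parents[parent_id])
    return context
```

Because the parent section replaces the chunk, a hit on one step of a procedure brings every step with it.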
Attention-Optimised Ordering
Matching how language models actually read
Research shows language models pay most attention to the start and end of their input. The system reorders passages so the most relevant content occupies these high-attention positions, with weaker material in the middle.
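One simple way to achieve this ordering is to deal the ranked passages alternately to the front and the back of the context. The sketch below assumes the input is already sorted best-first; the function name is hypothetical.

```python
def order_for_attention(ranked):
    """Place the strongest passages at the start and end, the weakest in the middle."""
    front, back = [], []
    for i, passage in enumerate(ranked):
        (front if i % 2 == 0 else back).append(passage)
    return front + back[::-1]

ordered = order_for_attention(["p1", "p2", "p3", "p4", "p5"])
```

With five passages the best one opens the context, the second best closes it, and the fifth sits in the middle.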
Cited Answer Generation
Every claim traced to its source
A compact language model generates answers with inline numeric citations, which are resolved after generation to document name, section, and page number. Domain completeness rules ensure that answers on safety-critical procedures always include every mandatory step, never a summary.
Citation Validation
The hallucination guard
Every bracketed source number in the answer is checked against the chunks that were actually retrieved. If the model invents "[47]" when only [1]–[5] exist, we catch it — the crew sees a warning instead of a confident fabrication.
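The check itself is small. This sketch assumes citations appear as bracketed integers numbered from 1 up to the number of retrieved chunks; the function name is illustrative.

```python
import re

def invalid_citations(answer: str, retrieved_count: int) -> list[int]:
    """Return cited source numbers that do not match any retrieved chunk."""
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", answer)}
    return sorted(n for n in cited if not 1 <= n <= retrieved_count)
```

A non-empty result is the trigger for showing the crew a warning instead of the unverified claim.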
Sufficient-Context Autorater
Based on Google ICLR 2025 research
A second, short model call checks whether the retrieved context actually contained enough information to answer the question — independent of the answer the model wrote. Sufficient, partial, or insufficient. The verdict corrects the keyword-based refusal detector in both directions.
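How the verdict might correct the keyword detector in both directions is sketched below. The reconciliation rules are one plausible reading of the description, not a statement of the actual logic, and the function name is hypothetical.

```python
def reconcile_refusal(keyword_refusal: bool, verdict: str) -> bool:
    """Combine the keyword detector's flag with the autorater verdict.

    Assumed rules: 'insufficient' forces a refusal, 'sufficient' clears one,
    and 'partial' leaves the keyword detector's decision unchanged.
    """
    if verdict not in {"sufficient", "partial", "insufficient"}:
        raise ValueError(f"unknown verdict: {verdict}")
    if verdict == "insufficient":
        return True   # the answer sounded confident but the context could not support it
    if verdict == "sufficient":
        return False  # the detector fired on wording, but the context was adequate
    return keyword_refusal
```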
Confidence Scoring and Safety Gates
High / Medium / Low with transparent reasoning
Every answer gets a composite confidence tier derived from reranker scores, source diversity, citation verification rate, and whether the pipeline had to self-correct. Refusal detection is length-aware. Verified answers are cached semantically with re-verification on every cache hit.
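A sketch of a composite tier built from the four signals named above. The weights, the self-correction penalty, and the tier cut-offs are all invented for illustration; only the choice of signals comes from the text.

```python
def confidence_tier(top_rerank_score, distinct_sources, citation_verified_rate, self_corrected):
    """Return (tier, reasons). Scores and rates are expected on a 0-to-1 scale."""
    diversity = min(distinct_sources, 3) / 3  # assumed: three sources counts as fully diverse
    score = 0.5 * top_rerank_score + 0.3 * citation_verified_rate + 0.2 * diversity
    reasons = [
        f"reranker score {top_rerank_score:.2f}",
        f"citations verified {citation_verified_rate:.0%}",
        f"{distinct_sources} distinct source(s)",
    ]
    if self_corrected:
        score -= 0.15  # assumed penalty when the search had to retry
        reasons.append("search was reformulated")
    tier = "High" if score >= 0.75 else "Medium" if score >= 0.5 else "Low"
    return tier, reasons
```

Returning the reasons alongside the tier is what allows each badge to show which signals produced it.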
Ranked by authority. Never by popularity.
Every document carries an authority tier. Retrieval fuses semantic and keyword signals; reranking weighs cross-attention scores; a tier multiplier breaks near-ties in favour of your own procedures.
Six non-negotiables. Every build.
No cloud calls. Every stage runs on vessel hardware. The only shore-side component is bundle update delivery.
Near-zero temperature in generation. Tie-breakers follow authority hierarchy. Same question, same context, same answer.
Every citation resolves to a document, section, and page. Every confidence score explains which of four signals produced it.
Refusal detection distinguishes a one-line "not in the bundle" from a long answer that merely acknowledges a scope gap.
290+ automated tests gate every build. A 35-case adversarial maritime corpus runs on every release.
Source tier, confidence tier, and verification status are visible to crew on every answer. No black box.
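The length-aware refusal check listed among the non-negotiables above can be sketched as a phrase match combined with a word-count limit. Only the phrase "not in the bundle" comes from the text; the other phrases and the cut-off are assumptions.

```python
# Only the first phrase appears in the text above; the rest are assumed examples.
REFUSAL_PHRASES = ("not in the bundle", "cannot find", "no information")
MAX_REFUSAL_WORDS = 40  # assumed cut-off between a refusal and a full answer

def is_refusal(answer: str) -> bool:
    """True only for short answers that contain a refusal phrase."""
    lowered = answer.lower()
    has_phrase = any(phrase in lowered for phrase in REFUSAL_PHRASES)
    return has_phrase and len(answer.split()) <= MAX_REFUSAL_WORDS
```

A long, cited answer that mentions a scope gap in passing therefore stays an answer rather than being misread as a refusal.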
— The deep end —