Engineering for maritime safety

Not just AI. Thirteen stages of engineered reliability.

Every question passes through a multi-stage pipeline that decomposes, retrieves, reranks, self-corrects, assembles, generates, and verifies — before a single word reaches your crew.

13
Pipeline stages
per question
215+
Automated tests
across every stage
27
Megabytes
entire runtime binary
<2s
Cold start
ready to serve

Why “just use AI” doesn't work at sea

The basic approach to AI document search is straightforward: convert a question into a vector, find the closest text, feed it to a language model. For simple questions against simple documents, it works.

But maritime documentation isn't simple.

A single vessel carries thousands of pages across dozens of documents. Your SMS uses “fire and emergency drill” while SOLAS says “fire drill and practice.” A crew member asking about fire procedures while transiting a traffic separation scheme is asking two questions — and needs answers from both your company procedures and international regulations.

A basic system retrieves text about fire drills OR traffic separation. Not both. Not cited. Not verified.

The answer pipeline

Every stage exists because we encountered a real failure mode — with real maritime documents, real crew questions, and real safety implications.

01

Question Sanitisation

Production-hardened input handling

Every question is cleaned of typographic characters that can break downstream processing — smart quotes, em dashes, ellipsis characters. This stage exists because a real crew member's question once crashed the system with a single special character.
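
The normalisation this stage performs can be sketched as a simple character map. The specific characters and replacements below are an illustrative subset, not the production list:

```rust
/// Replace typographic characters that can break downstream parsing
/// with plain ASCII equivalents. Illustrative subset of the real map.
fn sanitise(question: &str) -> String {
    let mut out = String::with_capacity(question.len());
    for c in question.chars() {
        match c {
            '\u{2018}' | '\u{2019}' => out.push('\''), // smart single quotes
            '\u{201C}' | '\u{201D}' => out.push('"'),  // smart double quotes
            '\u{2013}' | '\u{2014}' => out.push('-'),  // en/em dashes
            '\u{2026}' => out.push_str("..."),         // ellipsis
            _ => out.push(c),
        }
    }
    out
}

fn main() {
    println!("{}", sanitise("What\u{2019}s the \u{201C}fire drill\u{201D} interval\u{2026}"));
}
```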

02

Query Decomposition

Compound questions split into retrieval paths

"Fire drill procedures while transiting a TSS" becomes two separate searches — one for fire drills, one for traffic separation — then results are merged. Compound maritime questions are the norm, not the exception.
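
A minimal sketch of splitting on connective phrases; the connective list here is hypothetical, and the real decomposition is more sophisticated:

```rust
/// Split a compound question into independent retrieval paths at the
/// first connective phrase found. Illustrative connective list only.
fn decompose(question: &str) -> Vec<String> {
    for conn in [" while ", " during ", " and "] {
        if let Some((left, right)) = question.split_once(conn) {
            return vec![left.trim().to_string(), right.trim().to_string()];
        }
    }
    vec![question.trim().to_string()]
}

fn main() {
    for q in decompose("fire drill procedures while transiting a TSS") {
        println!("search: {q}");
    }
}
```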

03

Domain Vocabulary Expansion

Bridging crew language to regulatory text

"Night transit" expands to include "hours of darkness, sunset, sunrise, navigation lights." Over twenty curated maritime vocabulary mappings ensure the right documents are found regardless of how the question is phrased.
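
In outline, the expansion is a lookup against a curated synonym table. The two mappings below are illustrative stand-ins for the 20+ real entries:

```rust
/// Append curated synonym terms when a known phrase appears in the
/// question. Two illustrative mappings; the real table holds 20+.
fn expand(question: &str) -> String {
    let vocab = [
        ("night transit", "hours of darkness sunset sunrise navigation lights"),
        ("abandon ship", "emergency evacuation muster lifeboat"),
    ];
    let lower = question.to_lowercase();
    let mut expanded = question.to_string();
    for (term, synonyms) in vocab {
        if lower.contains(term) {
            expanded.push(' ');
            expanded.push_str(synonyms);
        }
    }
    expanded
}

fn main() {
    println!("{}", expand("checks for night transit"));
}
```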

04

Multi-Signal Retrieval

Semantic + keyword search, mathematically fused

Two independent search systems run in parallel. Semantic search catches conceptual matches. Keyword search catches exact terms. Reciprocal Rank Fusion automatically promotes documents that both systems agree on.
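
Reciprocal Rank Fusion is a published formula: each ranked list contributes 1 / (k + rank) per document, so a document ranked by both systems accumulates score from both. A minimal sketch, using the conventional k = 60:

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank).
/// Documents that both search systems rank accumulate more score.
fn rrf_fuse(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (rank, doc) in ranking.iter().enumerate() {
            // ranks are 1-based in the formula, hence the +1
            *scores.entry((*doc).to_string()).or_insert(0.0) += 1.0 / (k + rank as f64 + 1.0);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.total_cmp(&a.1));
    fused
}

fn main() {
    // Hypothetical document IDs for illustration.
    let semantic = vec!["SMS-07", "SOLAS-II", "COLREG-10"];
    let keyword = vec!["SOLAS-II", "SMS-12"];
    for (doc, score) in rrf_fuse(&[semantic, keyword], 60.0) {
        println!("{doc}: {score:.4}");
    }
}
```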

05

Relevance Feedback Expansion

Second-pass retrieval at zero LLM cost

The system examines its own top results and uses their mathematical representations to expand the search — catching documents too different in vocabulary for the initial search. A purely mathematical operation completed in milliseconds.
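
The classic form of this idea is a Rocchio-style update: nudge the query embedding toward the centroid of the top results, then search again. A sketch under that assumption, with illustrative weights:

```rust
/// Rocchio-style pseudo-relevance feedback: move the query embedding
/// toward the centroid of the top-ranked results. Weights alpha and
/// beta are illustrative tuning knobs.
fn feedback_expand(query: &[f32], top_docs: &[Vec<f32>], alpha: f32, beta: f32) -> Vec<f32> {
    let n = top_docs.len() as f32;
    (0..query.len())
        .map(|i| {
            let centroid: f32 = top_docs.iter().map(|d| d[i]).sum::<f32>() / n;
            alpha * query[i] + beta * centroid
        })
        .collect()
}

fn main() {
    let query = vec![1.0, 0.0];
    let top_docs = vec![vec![0.0, 1.0], vec![0.0, 1.0]];
    println!("{:?}", feedback_expand(&query, &top_docs, 1.0, 0.5));
}
```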

06

Neural Reranking

Cross-attention precision scoring

Twenty candidates pass through a specialised neural model that examines the question and each document together — not independently. This is the single most important quality step, producing calibrated confidence scores for every candidate.

07

Confidence Gating

Multi-threshold quality control

Scores are normalised to a 0-to-1 scale and filtered. Below the hard threshold: discarded. Below the soft threshold: the language model is instructed to flag uncertainty. This prevents marginally relevant documents from producing overconfident answers.
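
The two-threshold gate can be sketched as below. Min-max normalisation and the specific threshold values are illustrative assumptions:

```rust
#[derive(Debug, PartialEq)]
enum Gate {
    Pass,          // used as-is
    FlagUncertain, // kept, but the model is told to hedge
    Discard,       // dropped from the context
}

/// Min-max normalise raw reranker scores to 0..1, then apply the hard
/// and soft thresholds. Threshold values here are illustrative.
fn gate_candidates(raw: &[f32], hard: f32, soft: f32) -> Vec<Gate> {
    let min = raw.iter().copied().fold(f32::INFINITY, f32::min);
    let max = raw.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let span = (max - min).max(f32::EPSILON); // avoid divide-by-zero
    raw.iter()
        .map(|s| {
            let norm = (s - min) / span;
            if norm < hard {
                Gate::Discard
            } else if norm < soft {
                Gate::FlagUncertain
            } else {
                Gate::Pass
            }
        })
        .collect()
}

fn main() {
    println!("{:?}", gate_candidates(&[-4.0, 1.0, 6.0], 0.2, 0.6));
}
```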

08

Self-Correcting Search

Detects its own failures and retries

When retrieval returns insufficient results, the system reformulates the question and retries the entire pipeline. Results are reranked against the original question, preserving intent. Essential for multinational crews using English as a second language.
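
The control flow amounts to a guarded retry. In this sketch, `retrieve` and `reformulate` are hypothetical stand-ins for the real pipeline stages:

```rust
/// Retry retrieval with a reformulated query when the first pass
/// returns too few results. `retrieve` and `reformulate` are
/// stand-ins for the real pipeline stages.
fn search_with_retry(
    question: &str,
    retrieve: impl Fn(&str) -> Vec<String>,
    reformulate: impl Fn(&str) -> String,
    min_results: usize,
) -> Vec<String> {
    let first = retrieve(question);
    if first.len() >= min_results {
        return first;
    }
    // Second pass with the reformulated query; upstream, results are
    // still reranked against the original question to preserve intent.
    retrieve(&reformulate(question))
}

fn main() {
    // Toy retriever that only recognises the expanded phrasing.
    let retrieve = |q: &str| {
        if q.contains("man overboard") {
            vec!["MOB recovery procedure".to_string()]
        } else {
            vec![]
        }
    };
    let reformulate = |_q: &str| "man overboard recovery".to_string();
    println!("{:?}", search_with_retry("MOB drill", retrieve, reformulate, 1));
}
```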

09

Intelligent Context Assembly

Parent expansion, sibling retrieval, diversity enforcement

Retrieved passages are expanded with surrounding procedural context. Source diversity is enforced — no single document can dominate. Company procedures and international regulations are guaranteed to appear together when relevant.
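
The diversity rule can be sketched as a per-document cap over the ranked passage list. The cap value and tuple shape are illustrative:

```rust
use std::collections::HashMap;

/// Cap how many passages any single document may contribute to the
/// assembled context. Passages arrive ranked best-first as
/// (document, passage) pairs; `max_per_doc` is an illustrative knob.
fn enforce_diversity<'a>(
    ranked: Vec<(&'a str, &'a str)>,
    max_per_doc: usize,
) -> Vec<(&'a str, &'a str)> {
    let mut counts: HashMap<&str, usize> = HashMap::new();
    ranked
        .into_iter()
        .filter(|(doc, _)| {
            let n = counts.entry(*doc).or_insert(0);
            *n += 1;
            *n <= max_per_doc
        })
        .collect()
}

fn main() {
    let ranked = vec![("SMS", "p1"), ("SMS", "p2"), ("SMS", "p3"), ("SOLAS", "p4")];
    println!("{:?}", enforce_diversity(ranked, 2));
}
```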

10

Attention-Optimised Ordering

Matching how language models actually read

Research shows language models pay most attention to the start and end of their input. The system reorders passages so the most relevant content occupies these high-attention positions, with weaker material in the middle.
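
One common way to implement this "lost in the middle" mitigation is to alternate passages toward the two ends of the context. A sketch of that interleaving (the real ordering logic may differ):

```rust
/// Reorder passages (best-first on input) so the strongest candidates
/// occupy the start and end of the context window, with the weakest
/// material in the middle.
fn attention_order<T>(ranked: Vec<T>) -> Vec<T> {
    let mut front = Vec::new();
    let mut back = Vec::new();
    for (i, passage) in ranked.into_iter().enumerate() {
        if i % 2 == 0 {
            front.push(passage); // even ranks fill from the start
        } else {
            back.push(passage); // odd ranks fill from the end
        }
    }
    back.reverse();
    front.extend(back);
    front
}

fn main() {
    // Ranks 1..5, best first: 1 opens the context, 2 closes it,
    // and the weakest passage (5) lands in the middle.
    println!("{:?}", attention_order(vec![1, 2, 3, 4, 5]));
}
```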

11

Cited Answer Generation

Every claim traced to its source

A compact language model generates answers with inline citations — document name, section, and page number for every claim. Operating at near-deterministic settings appropriate for safety-critical operations where creativity is a liability.

12

Citation Guarantee

Post-generation verification and injection

Every answer is checked for source citations. If the language model omitted them — which happens occasionally with any AI — the system automatically appends citations from the documents used. Every non-refusal answer is guaranteed to cite its sources.
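
A minimal sketch of the check-and-inject step. The `[doc, section, p.N]` marker format and the `Source` struct are illustrative assumptions:

```rust
struct Source {
    doc: &'static str,
    section: &'static str,
    page: u32,
}

/// If the generated answer carries no inline citation markers, append
/// citations built from the retrieved sources. The bracket marker
/// format here is illustrative.
fn ensure_citations(answer: &str, sources: &[Source]) -> String {
    if answer.contains('[') {
        return answer.to_string(); // model already cited its sources
    }
    let cites: Vec<String> = sources
        .iter()
        .map(|s| format!("[{}, {}, p.{}]", s.doc, s.section, s.page))
        .collect();
    format!("{}\n\nSources: {}", answer, cites.join(" "))
}

fn main() {
    let sources = [Source { doc: "SMS", section: "7.2", page: 41 }];
    println!("{}", ensure_citations("Drills are held weekly.", &sources));
}
```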

13

Safety Gates and Caching

Refusal detection, logging, instant repeat answers

Refusals are detected and flagged for shore-side knowledge gap analytics. Verified answers are cached semantically so near-identical questions return instantly. Every query is logged for operational intelligence — no personal data, just the question and the answer.
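
A semantic cache differs from a plain key-value cache in that lookup is by embedding similarity, not exact string match. A sketch assuming cosine similarity with an illustrative 0.95 threshold:

```rust
/// Cache answers keyed by question embedding; a lookup hits when
/// cosine similarity to a stored question exceeds the threshold.
/// The 0.95 threshold is illustrative.
struct SemanticCache {
    entries: Vec<(Vec<f32>, String)>,
    threshold: f32,
}

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}

impl SemanticCache {
    fn get(&self, query_vec: &[f32]) -> Option<&str> {
        self.entries
            .iter()
            .find(|(v, _)| cosine(v, query_vec) >= self.threshold)
            .map(|(_, answer)| answer.as_str())
    }
}

fn main() {
    let cache = SemanticCache {
        entries: vec![(vec![1.0, 0.0], "Drills are held weekly.".to_string())],
        threshold: 0.95,
    };
    // A near-identical question embedding hits the cache.
    println!("{:?}", cache.get(&[0.99, 0.01]));
}
```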

What this means in practice

Compound questions work

"Engine room fire while transiting a narrow channel" retrieves fire procedures AND navigation rules AND your SMS guidance — not just whichever topic matches first.

Vocabulary mismatches bridged

A Filipino cadet asking about "abandoning ship" finds the same answers as a British chief officer asking about "emergency evacuation protocol."

Every answer cites its source

Document name, section, and page number for every claim. Verifiable against the actual manual before a Port State Control inspection.

Low confidence is flagged

When the system isn't confident, it says so — clearly and explicitly. In safety-critical operations, an honest "I'm not sure" is worth more than a plausible guess.

Repeated questions are instant

The semantic cache eliminates the 15-20 second processing delay for questions already answered reliably. Real crews ask similar questions repeatedly.

Self-correcting under the hood

When retrieval fails — crew jargon, abbreviations, second-language phrasing — the system detects it, reformulates, and retries before the crew member notices.

Runs on vessel hardware. No internet required.

Everything described above — the multi-stage retrieval, the neural reranking, the language model, the caching — runs on a single mini-PC that costs less than a day of PSC detention.

           Minimum            Recommended
CPU        x86_64, 4 cores    x86_64, 8+ cores
RAM        16 GB              32 GB
Storage    50 GB SSD          100 GB SSD
GPU        Not required       Not required
Network    Vessel LAN only    Vessel LAN only

No GPU. No cloud. No internet connection during operation. Crew members access it through a web browser on any device connected to the vessel network.

Built in Rust. 27 megabytes.

The runtime is written in Rust — a systems programming language designed for reliability and performance. The compiled binary is 27 megabytes. It starts in under two seconds.

There is no Python. No Docker requirement. No dependency management. No package updates. No runtime that needs patching.

A single file, copied to a vessel, that works.

fullfathom — v2.0

$ ./fullfathom --bundle ./maritime-docs
Loading bundle... ok (1.2s)
Embedding model... ok (0.4s)
Reranker model... ok (0.3s)
USearch index... 4,808 chunks
BM25 index... 4,808 entries
Ready on http://ship-ai.local:8080

215 tests passed. 0 failures.

This is not a wrapper around ChatGPT. It's not a prompt template connected to a vector database. It is a purpose-built system engineered for the specific constraints of shipboard operations where accuracy is non-negotiable and connectivity is unavailable.

If it sounds like more than your IT team would build in-house — that's because it is.