Methodology

How the record was built

Ingest

We pulled every full event-level dump from the seven major historical catalogs and three contemporary collectors. Each was normalized to a common event schema with provenance preserved: every event row carries the original record's source name and catalog entry, and the raw JSON is retained alongside.
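A minimal sketch of what that normalization step could look like, assuming each raw record arrives as a JSON object. The `EventRow` fields, the `normalize` helper, and the `field_map` argument are illustrative names, not the project's actual schema.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class EventRow:
    """Hypothetical common event schema; field names are assumptions."""
    source_name: str         # catalog or collector the record came from
    catalog_entry: str       # the record's identifier within that catalog
    date: Optional[str]
    location: Optional[str]
    description: str
    raw_json: str            # original record, retained alongside the row


def normalize(source_name: str, raw: dict[str, Any],
              field_map: dict[str, str]) -> EventRow:
    """Map one raw catalog record onto the common schema.

    field_map translates schema field names to the source's own keys.
    """
    def pick(field: str) -> Optional[str]:
        value = raw.get(field_map.get(field, field))
        return None if value is None else str(value).strip()

    return EventRow(
        source_name=source_name,
        catalog_entry=pick("catalog_entry") or "",
        date=pick("date"),
        location=pick("location"),
        description=pick("description") or "",
        # Serialize the untouched input so provenance survives normalization.
        raw_json=json.dumps(raw, sort_keys=True, ensure_ascii=False),
    )
```

Keeping the raw JSON as a string next to the normalized fields means any later question about a row can be checked against what the source catalog originally said.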

Deduplication

Records were joined across catalogs on date, location, and witness-description signatures. When more than one catalog described the same event, we collapsed the records into a single canonical event, assigned it a Phenomainon Case File ID (PCF-NNNNNN), and recorded the contributing catalog entries in the event's sources field. 26,499 events survive multi-source corroboration.
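The join can be pictured roughly as follows. This is a sketch under stated assumptions, not the production matcher: it blocks on exact date plus normalized location, treats the description signature as a token set compared by Jaccard similarity, and uses an arbitrary 0.5 threshold. The `canonicalize` function and the record keys are hypothetical.

```python
import re
from collections import defaultdict


def tokens(text):
    """Lowercased word set used as a crude description signature."""
    return frozenset(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def canonicalize(records, threshold=0.5):
    """Collapse records describing the same event into canonical events.

    Only clusters backed by at least two distinct catalogs are kept.
    The threshold value is an assumption chosen for illustration.
    """
    blocks = defaultdict(list)
    for rec in records:
        key = (rec["date"], " ".join(rec["location"].lower().split()))
        blocks[key].append(rec)

    events = []
    for key in sorted(blocks):
        clusters = []
        for rec in blocks[key]:
            sig = tokens(rec["description"])
            for cluster in clusters:
                # Compare against the first record in each existing cluster.
                if jaccard(sig, tokens(cluster[0]["description"])) >= threshold:
                    cluster.append(rec)
                    break
            else:
                clusters.append([rec])
        for cluster in clusters:
            if len({r["source_name"] for r in cluster}) < 2:
                continue  # single-catalog report: not corroborated
            events.append({
                "pcf_id": f"PCF-{len(events) + 1:06d}",
                "date": key[0],
                "location": key[1],
                "sources": [(r["source_name"], r["catalog_entry"]) for r in cluster],
            })
    return events
```

Blocking on date and location first keeps the pairwise description comparison small, since only records that already agree on when and where are compared.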

Scoring

Every event receives four scores, each on a 0–100 scale and each driven by a separate signal:

Evidence — corroboration count, presence of instrumentation, named witnesses.
Narrative — coherence and detail of the incident description.
Cinematic — visual specificity, suitability for reconstruction.
Famous — historical recognition and downstream citation.
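To make the shape of the scores concrete, here is a small sketch of a score container plus one possible Evidence calculation. The point weights and caps are invented for illustration only; the real weighting is not described here, and `Scores` and `evidence_score` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scores:
    """The four per-event scores, each constrained to 0-100."""
    evidence: int
    narrative: int
    cinematic: int
    famous: int

    def __post_init__(self):
        for name, value in vars(self).items():
            if not 0 <= value <= 100:
                raise ValueError(f"{name} must be in 0-100, got {value}")


def evidence_score(corroboration_count, has_instrumentation, named_witnesses):
    """Combine the three Evidence signals into a 0-100 score.

    Hypothetical weights: 15 points per corroborating catalog (capped at 60),
    25 points for instrumentation, 5 points per named witness (capped at 15).
    """
    points = min(corroboration_count * 15, 60)
    points += 25 if has_instrumentation else 0
    points += min(named_witnesses * 5, 15)
    return max(0, min(100, points))
```

Keeping each score a function of its own signals, as in this sketch, lets a well-documented but obscure case rate high on Evidence without that affecting Famous.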

Embeddings

Every event's incident description is embedded into a 1024-dim vector using Voyage AI's voyage-3.5 model. The vectors live in a pgvector HNSW index inside Railway Postgres, which powers semantic search and the related-cases tool.
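The index and a related-cases lookup might look something like the sketch below. The table and column names (`events`, `embedding`, `pcf_id`) and the choice of cosine distance are assumptions; `hnsw`, `vector_cosine_ops`, and the `<=>` operator are standard pgvector features. The SQL is held in Python strings with psycopg-style named parameters, and no database connection is made here.

```python
EMBEDDING_DIM = 1024  # vector size stated above

# Assumed schema: events(pcf_id text, embedding vector(1024)).
CREATE_INDEX_SQL = """
CREATE INDEX IF NOT EXISTS events_embedding_hnsw
ON events USING hnsw (embedding vector_cosine_ops);
"""

# Nearest neighbours by cosine distance, excluding the query case itself.
RELATED_CASES_SQL = """
SELECT pcf_id, embedding <=> %(query)s::vector AS cosine_distance
FROM events
WHERE pcf_id <> %(pcf_id)s
ORDER BY embedding <=> %(query)s::vector
LIMIT %(limit)s;
"""


def to_pgvector_literal(vector):
    """Render a sequence of floats in pgvector's text form, e.g. '[0.1,0.2]'."""
    if len(vector) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} dimensions, got {len(vector)}")
    return "[" + ",".join(repr(float(x)) for x in vector) + "]"
```

Semantic search would work the same way, except that the query vector comes from embedding the user's search text rather than from an existing case.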

What we DON'T do

We don't ascribe explanations to events. We don't reveal private witness names or identifying details. We don't generate visual material for cases whose source record lacks a specific physical description. The record is what was reported: nothing more, nothing less.