NewsForge
The wires go in.
One original story comes out.
NewsForge clusters duplicate coverage with multilingual vector embeddings, synthesizes one original article through your own AI provider, holds it for a human editor's approval, and publishes the approved story to up to ten CMS and social channels on its own schedule. Synthesis, not scraping. Humans approve, machines distribute.
- Lifecycle states: 6
- Publish channels: 10
- BYO AI providers: 4
- Go test functions: 848
- 01 ingested (story.ingested): Source item embedded with the multilingual ONNX model
- 02 clustered (story.clustered): pgvector matched an open cluster inside the collection window
- 03 synthesized (story.synthesized): Tenant LLM drafted one original article; originality guard passed
- 04 in_review (review queue): Assigned to an editor; synthesis open for editing
- 05 approved (story.approved): Distribution schedule set; empty schedules are rejected
- 06 published (story.published.*): Connector hub publishes to exactly the scheduled targets
001 / What is NewsForge
NewsForge is KaritKarma's synthesis-first newsroom platform: seven Go services and a Next.js web app on Postgres 18 with pgvector and NATS JetStream. It ingests raw sources, clusters duplicate coverage by vector embedding, and synthesizes one original article per story through the tenant's own AI provider: Groq, OpenAI, Anthropic, or Gemini. A configurable originality guard scores every draft against its sources, a human editor must approve every story, and one approval publishes to up to ten channels, from Loom and WordPress to X, Facebook, and ViewCasta. Multi-tenant with row-level security, AES-256-GCM encrypted credentials, and an audit event on every gated action.
002 / What it does
Four jobs of a synthesis newsroom.
Each pillar maps to a real package in the codebase. The product is the integration of these four, not a list of feature ticks.
Wires and reporters, one front door.
01 / Source
A Go crawler parses feeds with gofeed and extracts article bodies with goquery, scheduled from a DB-backed source registry with a seen-URL dedup ledger. Per-source circuit breakers (fail threshold, cooldown, half-open probes) let a dead source back off without blocking the rest. Reporters file directly through the desk API with drafts, asset attachment, and an atomic advance of the filing into the lifecycle.
Source / internal/ingestion/
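To make the breaker behaviour concrete, here is a minimal sketch of one per-source circuit breaker with a fail threshold, a cooldown, and a single half-open probe. It is an illustration under stated assumptions, not the code in internal/ingestion/: the type and function names (Breaker, Allow, Record, demo) and the values in demo are hypothetical.

```go
package main

import "time"

// breakerState is the state of one source's breaker (hypothetical names).
type breakerState int

const (
	closed   breakerState = iota // source is healthy; fetches are allowed
	open                         // source is failing; fetches are skipped until the cooldown ends
	halfOpen                     // cooldown elapsed; exactly one probe fetch is allowed
)

// Breaker tracks consecutive failures for a single source, so one dead
// source never affects the breakers of the others.
type Breaker struct {
	failThreshold int
	cooldown      time.Duration
	failures      int
	state         breakerState
	openedAt      time.Time
}

func NewBreaker(failThreshold int, cooldown time.Duration) *Breaker {
	return &Breaker{failThreshold: failThreshold, cooldown: cooldown}
}

// Allow reports whether a fetch may run at time now.
func (b *Breaker) Allow(now time.Time) bool {
	switch b.state {
	case closed:
		return true
	case open:
		if now.Sub(b.openedAt) >= b.cooldown {
			b.state = halfOpen // let one probe through
			return true
		}
		return false
	default: // halfOpen: a probe is already in flight
		return false
	}
}

// Record feeds the outcome of a fetch back into the breaker.
func (b *Breaker) Record(now time.Time, ok bool) {
	if ok {
		b.failures = 0
		b.state = closed
		return
	}
	b.failures++
	if b.state == halfOpen || b.failures >= b.failThreshold {
		b.state = open
		b.openedAt = now
	}
}

// demo walks one source through trip, cooldown, probe, and recovery.
func demo() []bool {
	t0 := time.Unix(0, 0)
	b := NewBreaker(2, time.Minute) // illustrative values only
	var out []bool
	out = append(out, b.Allow(t0)) // closed: allowed
	b.Record(t0, false)
	b.Record(t0, false)                                // second failure trips the breaker
	out = append(out, b.Allow(t0.Add(30*time.Second))) // still cooling down
	out = append(out, b.Allow(t0.Add(2*time.Minute)))  // half-open probe allowed
	out = append(out, b.Allow(t0.Add(2*time.Minute)))  // probe already in flight
	b.Record(t0.Add(2*time.Minute), true)              // probe succeeded
	out = append(out, b.Allow(t0.Add(3*time.Minute)))  // closed again
	return out
}
```

Passing the clock in as an argument keeps the sketch deterministic; a scheduler would hold one Breaker per source and consult Allow before each fetch.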
Synthesis, not scraping.
02 / Synthesize
Duplicate coverage is clustered by multilingual ONNX sentence embeddings over pgvector. Once a cluster meets its configured minimum source count and collection window, the engine synthesizes one original article and persists it before publishing the event. LLM failures trip a circuit breaker and surface on a gauge; the engine never silently continues past them.
Source / internal/synthesis/engine/
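The two rules in this pillar, the readiness gates and persist-before-publish, can be sketched as below. This is a simplified illustration, not the engine's code: Cluster, Policy, Ready, Synthesize, and the demo values are assumed names and numbers, and the store and event bus are stood in for by plain function values.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Cluster is a hypothetical in-memory view of an open cluster.
type Cluster struct {
	OpenedAt    time.Time
	SourceCount int
}

// Policy holds the two configured gates (field names are assumptions).
type Policy struct {
	MinSources       int
	CollectionWindow time.Duration
}

// Ready reports whether the cluster has enough sources and its
// collection window has elapsed.
func Ready(c Cluster, p Policy, now time.Time) bool {
	return c.SourceCount >= p.MinSources && now.Sub(c.OpenedAt) >= p.CollectionWindow
}

// Synthesize drafts, persists, then announces, in that order. Any failure
// is returned to the caller instead of being swallowed.
func Synthesize(draft func() (string, error), save func(article string) error, publish func(subject string) error) error {
	article, err := draft()
	if err != nil {
		return fmt.Errorf("draft: %w", err)
	}
	if err := save(article); err != nil {
		return fmt.Errorf("persist: %w", err)
	}
	return publish("story.synthesized")
}

// demoReady exercises both gates with illustrative values.
func demoReady() []bool {
	opened := time.Unix(0, 0)
	p := Policy{MinSources: 3, CollectionWindow: 15 * time.Minute}
	return []bool{
		Ready(Cluster{OpenedAt: opened, SourceCount: 2}, p, opened.Add(20*time.Minute)), // too few sources
		Ready(Cluster{OpenedAt: opened, SourceCount: 3}, p, opened.Add(5*time.Minute)),  // window still open
		Ready(Cluster{OpenedAt: opened, SourceCount: 3}, p, opened.Add(20*time.Minute)), // both gates met
	}
}

// demoSynthesize records the order of side effects for a success and a failure.
func demoSynthesize() []string {
	var calls []string
	save := func(string) error { calls = append(calls, "save"); return nil }
	publish := func(subject string) error { calls = append(calls, "publish "+subject); return nil }
	_ = Synthesize(func() (string, error) { return "draft", nil }, save, publish)
	err := Synthesize(func() (string, error) { return "", errors.New("llm down") }, save, publish)
	if err != nil {
		calls = append(calls, "error surfaced")
	}
	return calls
}
```

Saving before publishing means a consumer that reacts to the event can always load the article it refers to.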
Original by measurement, approved by humans.
03 / Guard
An n-gram shingle-overlap guard scores every draft against its sources, from 0 (fully original) to 1 (fully copied), and re-prompts until the draft clears the tenant's threshold. Mandatory human review follows: no story reaches a front page without an editor's approval and an explicit distribution schedule.
Source / internal/synthesis/synth/ + internal/editorial/
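As an illustration of how a shingle-overlap score can be computed, the sketch below uses word n-grams and reports the share of the draft's shingles that also appear in any source. The tokenization, the shingle size, the "at or below threshold" comparison, and the maxAttempts cap are all assumptions made for the example; the guard in internal/synthesis/synth/ may differ on each point.

```go
package main

import "strings"

// shingles returns the set of lowercase word n-grams in text.
func shingles(text string, n int) map[string]struct{} {
	words := strings.Fields(strings.ToLower(text))
	set := make(map[string]struct{})
	for i := 0; i+n <= len(words); i++ {
		set[strings.Join(words[i:i+n], " ")] = struct{}{}
	}
	return set
}

// Overlap returns the fraction of the draft's shingles found in any source:
// 0 means fully original, 1 means fully copied.
func Overlap(draft string, sources []string, n int) float64 {
	d := shingles(draft, n)
	if len(d) == 0 {
		return 0 // draft shorter than one shingle
	}
	seen := make(map[string]struct{})
	for _, s := range sources {
		for sh := range shingles(s, n) {
			seen[sh] = struct{}{}
		}
	}
	hits := 0
	for sh := range d {
		if _, ok := seen[sh]; ok {
			hits++
		}
	}
	return float64(hits) / float64(len(d))
}

// Guard asks write for a new draft until one scores at or below threshold.
// maxAttempts is an assumed safety cap so the loop always terminates.
func Guard(write func(attempt int) string, sources []string, n int, threshold float64, maxAttempts int) (string, float64, bool) {
	var draft string
	var score float64
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		draft = write(attempt)
		score = Overlap(draft, sources, n)
		if score <= threshold {
			return draft, score, true
		}
	}
	return draft, score, false
}
```

In this sketch, write stands in for the call to the tenant's LLM; a caller would pass the attempt number through so the re-prompt can ask for a less derivative draft.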
Translate once, serve forever.
04 / Localize
Stories are synthesized once in a canonical language and served verbatim with zero LLM calls. Any other language is translated on demand and cached per story and language. Failed translations are never cached, so one bad call never poisons the archive.
Source / internal/synthesis/localize/
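A minimal sketch of that caching rule, assuming an in-memory map keyed by story and language: the canonical language bypasses the translator, successful translations are cached, and errors are returned without touching the cache. Localizer, Serve, and the demo are hypothetical names, and the real package presumably persists its cache rather than holding it in memory.

```go
package main

import (
	"errors"
	"sync"
)

// key identifies one cached translation.
type key struct{ storyID, lang string }

// Localizer serves a story body in a requested language.
type Localizer struct {
	canonical string
	translate func(text, lang string) (string, error) // stands in for the LLM call
	mu        sync.Mutex
	cache     map[key]string
}

func NewLocalizer(canonical string, translate func(text, lang string) (string, error)) *Localizer {
	return &Localizer{canonical: canonical, translate: translate, cache: make(map[key]string)}
}

// Serve returns body verbatim for the canonical language, a cached
// translation if one exists, or a fresh translation that is cached
// only when it succeeds.
func (l *Localizer) Serve(storyID, body, lang string) (string, error) {
	if lang == l.canonical {
		return body, nil // zero translator calls
	}
	k := key{storyID, lang}
	l.mu.Lock()
	cached, ok := l.cache[k]
	l.mu.Unlock()
	if ok {
		return cached, nil
	}
	out, err := l.translate(body, lang)
	if err != nil {
		return "", err // nothing is written to the cache on failure
	}
	l.mu.Lock()
	l.cache[k] = out
	l.mu.Unlock()
	return out, nil
}

// demo counts translator calls across a canonical read, a failure,
// a successful retry, and a cache hit.
func demo() (int, []string) {
	calls := 0
	fail := true
	l := NewLocalizer("en", func(text, lang string) (string, error) {
		calls++
		if fail {
			return "", errors.New("provider timeout")
		}
		return "[" + lang + "] " + text, nil
	})
	var out []string
	a, _ := l.Serve("s1", "Hello", "en") // canonical: no call
	out = append(out, a)
	if _, err := l.Serve("s1", "Hello", "fr"); err != nil { // call 1 fails, not cached
		out = append(out, "error")
	}
	fail = false
	b, _ := l.Serve("s1", "Hello", "fr") // call 2 succeeds and is cached
	c, _ := l.Serve("s1", "Hello", "fr") // cache hit: no call
	out = append(out, b, c)
	return calls, out
}
```

Because the failed call leaves no entry behind, the retry reaches the translator again instead of serving an error from the cache.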
003 / The lifecycle
Six states. Forward only.
Every story moves through the same six states in order. The transition function rejects illegal moves atomically and emits the matching event on NATS JetStream. The audit trail is the database, not a log file.
- State 01
ingested
A source item lands from the crawler or a reporter filing and is embedded with the multilingual ONNX sentence model. Per-source circuit breakers let a dead feed back off without blocking healthy ones.
- State 02
clustered
pgvector ANN search matches the item against open clusters, scoped by similarity threshold and category. Clusters gather sources for a configured collection window before synthesis fires.
- State 03
synthesized
The tenant's own LLM writes one original article per cluster. The originality guard measures n-gram shingle overlap against every source and re-prompts until the draft clears the configured threshold.
- State 04
in_review
Every synthesized story enters the human editorial queue. Editors pick up or are assigned stories and can edit the synthesis before deciding. Nothing skips this state.
- State 05
approved
Approval carries an explicit per-story distribution schedule of named targets. An approval with an empty schedule is rejected. Every decision lands in the audit log.
- State 06
published
The connector hub publishes to exactly the scheduled targets, choosing publish or update from the prior publication record, and emits a per-channel published event.
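The forward-only rule can be sketched as a small transition function. This is an in-memory illustration under assumptions: the names Story, Advance, and SingleSource are hypothetical, the single-source shortcut is assumed to go from ingested straight to synthesized, and the atomicity and event emission described above (a database write plus a NATS JetStream publish) are left out.

```go
package main

import (
	"errors"
	"fmt"
)

// State is one of the six lifecycle states.
type State string

const (
	Ingested    State = "ingested"
	Clustered   State = "clustered"
	Synthesized State = "synthesized"
	InReview    State = "in_review"
	Approved    State = "approved"
	Published   State = "published"
)

// next maps each state to the only state that may follow it.
var next = map[State]State{
	Ingested:    Clustered,
	Clustered:   Synthesized,
	Synthesized: InReview,
	InReview:    Approved,
	Approved:    Published,
}

// Story carries just enough to decide a transition.
type Story struct {
	State        State
	SingleSource bool     // assumed flag for a single-source reporter filing
	Schedule     []string // named distribution targets set at approval
}

// Advance moves the story to the requested state or returns an error
// and leaves the story unchanged.
func Advance(s *Story, to State) error {
	n, ok := next[s.State]
	allowed := ok && n == to
	// Assumption: a single-source filing may skip clustering.
	if s.State == Ingested && to == Synthesized && s.SingleSource {
		allowed = true
	}
	if !allowed {
		return fmt.Errorf("illegal transition %s -> %s", s.State, to)
	}
	if to == Approved && len(s.Schedule) == 0 {
		return errors.New("approval requires a non-empty distribution schedule")
	}
	s.State = to
	return nil
}
```

In a database-backed version, the same check would typically be a conditional update on the current state, so that two concurrent callers cannot both win the same transition.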
004 / Architecture
Seven Go services. One web app.
The whole platform is Go 1.26.4 on Postgres 18 with pgvector, NATS JetStream for events, and Prometheus across every service. The Next.js app serves four surfaces: the public site, /app for editorial, /dash for tenant admin, /console for operators, plus a per-tenant portal at /np/{slug}.
cmd/backbone
cmd/crawler
cmd/reporter
cmd/synthesis-engine
cmd/editorial
cmd/connector-hub
cmd/tenant
web/
005 / Bring your own AI
Your newsroom, your model.
Every tenant picks its own provider and supplies its own key: Groq, OpenAI, Anthropic, or Gemini. The engine resolves that tenant's provider, key, and originality threshold at synthesis time. An unsupported provider is an explicit logged refusal, never a silent fallback to a platform key.
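A sketch of that resolution step, assuming a simple in-memory lookup: the tenant's provider, key, and threshold are returned together, and anything unsupported is logged and refused rather than replaced with a default. TenantAI, Resolve, and the lowercase provider identifiers are illustrative names, not the platform's actual types.

```go
package main

import (
	"fmt"
	"log"
)

// TenantAI is the per-tenant synthesis configuration (hypothetical shape).
type TenantAI struct {
	Provider             string
	APIKey               string
	OriginalityThreshold float64
}

// supported lists the four providers named above; the identifiers are assumed.
var supported = map[string]bool{
	"groq":      true,
	"openai":    true,
	"anthropic": true,
	"gemini":    true,
}

// Resolve returns the tenant's own configuration or an error.
// There is deliberately no platform-key fallback branch.
func Resolve(tenantID string, cfg map[string]TenantAI) (TenantAI, error) {
	c, ok := cfg[tenantID]
	if !ok {
		return TenantAI{}, fmt.Errorf("tenant %q has no AI configuration", tenantID)
	}
	if !supported[c.Provider] {
		log.Printf("refusing synthesis for tenant %s: unsupported provider %q", tenantID, c.Provider)
		return TenantAI{}, fmt.Errorf("unsupported provider %q", c.Provider)
	}
	if c.APIKey == "" {
		return TenantAI{}, fmt.Errorf("tenant %q has no key for provider %q", tenantID, c.Provider)
	}
	return c, nil
}
```

Returning the threshold alongside the provider and key means the originality guard and the LLM call are always configured from the same tenant record.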
Keys and channel credentials are encrypted at rest with AES-256-GCM. Provider endpoints, retries, timeouts, token limits, collection windows, and guard thresholds are all configuration: there are 198 distinct keys, and a missing required key fails boot loudly instead of defaulting quietly.
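For readers unfamiliar with AES-256-GCM, the sketch below shows the usual pattern with Go's standard crypto/aes and crypto/cipher packages: a 32-byte key, a fresh random nonce per message, and the nonce stored in front of the ciphertext. Seal and Open are hypothetical helper names, and the storage layout is an assumption; how NewsForge manages its master key is not described here.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
)

// newGCM builds an AES-256-GCM AEAD from a 32-byte key.
func newGCM(key []byte) (cipher.AEAD, error) {
	if len(key) != 32 {
		return nil, errors.New("AES-256 requires a 32-byte key")
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// Seal encrypts plaintext and returns nonce || ciphertext || tag.
func Seal(key, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// Passing nonce as dst appends the ciphertext after the nonce.
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// Open verifies and decrypts a value produced by Seal.
func Open(key, sealed []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(sealed) < gcm.NonceSize() {
		return nil, errors.New("ciphertext too short")
	}
	nonce, ciphertext := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ciphertext, nil)
}
```

GCM is authenticated, so Open returns an error if the stored value has been altered, instead of returning corrupted plaintext.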
006 / Humans in command
Humans approve. Machines distribute.
The pipeline is automated; the judgment is not. Every synthesized story waits in the review queue until an editor approves it with an explicit distribution schedule. Field reporters file straight into the same lifecycle, and the transition rules allow a single-source filing to skip clustering.
Story assets are designed to ride on Foveio, KaritKarma's media layer: reporters attach assets by reference, and each publishing channel gets its image renditions from operator-configured named presets. The media client ships in NewsForge today; the pairing goes live as Foveio launches. NewsForge stores asset IDs, never media bytes.
Reporter desk
Filings with drafts, asset attachment by reference through the built-in media client, and an atomic advance into the lifecycle.
internal/ingestion/reporter/
AI assist, three kinds
Notes to draft body, body to headline plus summary, body to typed entities. Assist drafts; humans file.
reporter/assist.go
Editorial review queue
Assignment and pickup, synthesis editing, approve or hold. An approval with an empty schedule is rejected.
internal/editorial/
Audit on every gate
Filings, transitions, approvals, config and credential changes, invites. Metrics for channels that have never been hit are pre-seeded, so they show up as visible zeros.
internal/backbone/audit/
007 / Distribution
One approval. Ten front doors.
The connector hub registers ten channels at boot and publishes to exactly the targets the editor scheduled, never "all tenant targets". Publish versus update is decided by the prior publication record per channel.
- Capability guards: a connector whose declared capabilities reject a story performs zero network I/O, proven by a parity test over every registered connector.
- Per-channel image renditions are requested from Foveio, the planned media layer, via operator-configured named presets.
- Every published channel emits its own per-tenant event on NATS JetStream.
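The distribution rules above can be sketched as one loop over the editor's schedule. This is an illustration, not the connector hub itself: Connector, Hub, Distribute, and the action strings are hypothetical, the prior publication record is an in-memory map, and per-channel event emission is omitted.

```go
package main

// Story is the minimum needed to key a publication record.
type Story struct{ ID string }

// Connector is one registered channel. Send is the only place where
// network I/O would happen.
type Connector struct {
	Accepts func(Story) bool                 // declared capability check
	Send    func(s Story, update bool) error // publish (false) or update (true)
}

// Result reports what happened for one scheduled target.
type Result struct {
	Channel string
	Action  string // "publish", "update", "skipped", "unknown", or "failed"
}

// Hub holds the registered connectors and the prior publication record.
type Hub struct {
	connectors map[string]Connector
	published  map[string]bool // key: story ID + "/" + channel
}

func NewHub(connectors map[string]Connector) *Hub {
	return &Hub{connectors: connectors, published: make(map[string]bool)}
}

// Distribute visits exactly the scheduled targets, never every
// registered connector.
func (h *Hub) Distribute(s Story, schedule []string) []Result {
	results := make([]Result, 0, len(schedule))
	for _, channel := range schedule {
		c, ok := h.connectors[channel]
		if !ok {
			results = append(results, Result{channel, "unknown"})
			continue
		}
		if !c.Accepts(s) {
			// Capability guard: return before Send, so no network I/O occurs.
			results = append(results, Result{channel, "skipped"})
			continue
		}
		key := s.ID + "/" + channel
		update := h.published[key] // a prior record means update, not publish
		if err := c.Send(s, update); err != nil {
			results = append(results, Result{channel, "failed"})
			continue
		}
		h.published[key] = true
		action := "publish"
		if update {
			action = "update"
		}
		results = append(results, Result{channel, action})
	}
	return results
}
```

Counting Send calls against a connector whose Accepts returns false is one way to express the zero-I/O property as a test.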
008 / Questions
Frequently asked.
Mirrored in FAQPage JSON-LD so search and answer engines can lift these verbatim.
What is NewsForge?
Does NewsForge replace my CMS, or sit alongside it?
How does the editorial lifecycle work?
How does NewsForge keep articles original?
Which AI providers does NewsForge support?
How does NewsForge handle languages?
Launch your newsroom
One desk. Ten front doors. Tomorrow's edition, already on press.
Brief us on the desk, the language, and the publish targets. You get a NewsForge tenant with the six-state lifecycle live, a reporter desk, your own AI provider key encrypted at rest, and a connector hub wired to your CMS and social channels.