Vol. III / Folio 2026 / Dhaka to Global / Edition 7c9155a
NewsForge

Dateline / Dhaka / 2026-06-12
Synthesis-first newsroom

The wires go in.
One original story comes out.

NewsForge clusters duplicate coverage with multilingual vector embeddings, synthesizes one original article through your own AI provider, holds it for a human editor's approval, and publishes the approved story to up to ten CMS and social channels on a per-story schedule. Synthesis, not scraping. Humans approve, machines distribute.

Lifecycle states: 6
Publish channels: 10
BYO AI providers: 4
Go test functions: 848
Story lifecycle / Forward only

  1. ingested / story.ingested

    Source item embedded with the multilingual ONNX model

  2. clustered / story.clustered

    pgvector matched an open cluster inside the collection window

  3. synthesized / story.synthesized

    Tenant LLM drafted one original article; originality guard passed

  4. in_review / review queue

    Assigned to an editor; synthesis open for editing

  5. approved / story.approved

    Distribution schedule set; empty schedules are rejected

  6. published / story.published.*

    Connector hub publishes to exactly the scheduled targets

Output: 10 channels / Human approved

001 / What is NewsForge

NewsForge is KaritKarma's synthesis-first newsroom platform: seven Go services and a Next.js web app on Postgres 18 with pgvector and NATS JetStream. It ingests raw sources, clusters duplicate coverage by vector embedding, and synthesizes one original article per story through the tenant's own AI provider: Groq, OpenAI, Anthropic, or Gemini. A configurable originality guard scores every draft against its sources, a human editor must approve every story, and one approval publishes to up to ten channels, from Loom and WordPress to X, Facebook, and ViewCasta. Multi-tenant with row-level security, AES-256-GCM encrypted credentials, and an audit event on every gated action.

002 / What it does

Four jobs of a synthesis newsroom.

Each pillar maps to a real package in the codebase. The product is the integration of these four, not a list of feature ticks.

  1. Wires and reporters, one front door.

    01 / Source

    A Go crawler parses feeds with gofeed and extracts article bodies with goquery, scheduled from a DB-backed source registry with a seen-URL dedup ledger. Per-source circuit breakers (fail threshold, cooldown, half-open probes) let a dead source back off without blocking the rest. Reporters file directly through the desk API with drafts, asset attachment, and an atomic filing advance.

    Source / internal/ingestion/

  2. Synthesis, not scraping.

    02 / Synthesize

    Duplicate coverage is clustered by multilingual ONNX sentence embeddings over pgvector. Once a cluster meets its configured minimum source count and collection window, the engine synthesizes one original article and persists it before emitting the event. LLM failures trip a circuit breaker and surface on a gauge; the engine never silently continues past them.

    Source / internal/synthesis/engine/

  3. Original by measurement, approved by humans.

    03 / Guard

    An n-gram shingle-overlap guard scores every draft against its sources, from 0 (fully original) to 1 (fully copied), and re-prompts until the draft clears the tenant's threshold. Then mandatory human review: no story reaches a front page without an editor's approval and an explicit distribution schedule.

    Source / internal/synthesis/synth/ + internal/editorial/

  4. Translate once, serve forever.

    04 / Localize

    Stories are synthesized once in a canonical language and served verbatim with zero LLM calls. Any other language is translated on demand and cached per story and language. Failed translations are never cached, so one bad call never poisons the archive.

    Source / internal/synthesis/localize/
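The localize pillar's caching rule can be sketched in a few lines of Go. This is a minimal in-memory illustration, not the contents of internal/synthesis/localize/: the `Story`, `Translator`, and `Localizer` names are hypothetical, the cache is a plain map rather than a database table, and treating an empty translator reply as a failure is an added assumption.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrEmptyTranslation marks a translator reply with no text (assumed rule).
var ErrEmptyTranslation = errors.New("translator returned empty text")

// Story is a hypothetical shape: one body in one canonical language.
type Story struct {
	ID        string
	Canonical string // language code of the synthesized text
	Body      string
}

// Translator stands in for the on-demand LLM translation call.
type Translator func(body, from, to string) (string, error)

type cacheKey struct{ storyID, lang string }

// Localizer caches successful translations per story and language.
type Localizer struct {
	mu        sync.Mutex
	cache     map[cacheKey]string
	translate Translator
}

func NewLocalizer(t Translator) *Localizer {
	return &Localizer{cache: make(map[cacheKey]string), translate: t}
}

// Serve returns the story body in the requested language.
func (l *Localizer) Serve(s Story, lang string) (string, error) {
	if lang == s.Canonical {
		return s.Body, nil // canonical text is served verbatim, no translator call
	}
	key := cacheKey{storyID: s.ID, lang: lang}

	l.mu.Lock()
	hit, ok := l.cache[key]
	l.mu.Unlock()
	if ok {
		return hit, nil
	}

	out, err := l.translate(s.Body, s.Canonical, lang)
	if err == nil && out == "" {
		err = ErrEmptyTranslation
	}
	if err != nil {
		// Failures go back to the caller and are never written to the cache.
		return "", fmt.Errorf("translate story %s to %s: %w", s.ID, lang, err)
	}

	l.mu.Lock()
	l.cache[key] = out
	l.mu.Unlock()
	return out, nil
}
```

The lock is released around the translator call, so two concurrent first requests for the same language may both translate. A real service would likely collapse those into one call.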

003 / The lifecycle

Six states. Forward only.

Every story moves through the same six states in order. The transition function rejects illegal moves atomically and emits the matching event on NATS JetStream. The audit trail is the database, not a log file.
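A forward-only transition function of this kind might look like the sketch below. It is an in-memory illustration under stated assumptions: the type and function names are hypothetical, a mutex stands in for whatever atomic database update the backbone uses, and the reporter path that skips clustering is left out because the page does not say which state it lands on. The subject strings are the ones shown in the lifecycle panel, where in_review is a queue rather than an event.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// State is one of the six lifecycle states, in order.
type State int

const (
	Ingested State = iota
	Clustered
	Synthesized
	InReview
	Approved
	Published
)

var stateNames = [...]string{"ingested", "clustered", "synthesized", "in_review", "approved", "published"}

// subjects is indexed by State; the empty entry is in_review, which has no event.
var subjects = [...]string{
	"story.ingested", "story.clustered", "story.synthesized",
	"", "story.approved", "story.published.*",
}

func (s State) String() string {
	if s < Ingested || s > Published {
		return fmt.Sprintf("State(%d)", int(s))
	}
	return stateNames[s]
}

// ErrIllegalTransition is returned for any move other than one step forward.
var ErrIllegalTransition = errors.New("illegal lifecycle transition")

// Story holds only the lifecycle state in this sketch; it starts as ingested.
type Story struct {
	mu    sync.Mutex
	state State
}

// Current returns the story's present state.
func (s *Story) Current() State {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.state
}

// Advance moves the story exactly one state forward and returns the subject
// pattern to emit. Check and write happen under one lock, so a rejected move
// leaves the state untouched.
func (s *Story) Advance(to State) (string, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if to > Published || to != s.state+1 {
		return "", fmt.Errorf("%w: %s -> %s", ErrIllegalTransition, s.state, to)
	}
	s.state = to
	return subjects[to], nil
}
```

In a database-backed version the same check would typically be a conditional update that only succeeds when the stored state is the expected predecessor.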

  1. State 01

    ingested

    A source item lands from the crawler or a reporter filing and is embedded with the multilingual ONNX sentence model. Per-source circuit breakers let a dead feed back off without blocking healthy ones.

  2. State 02

    clustered

    pgvector ANN search matches the item against open clusters, scoped by similarity threshold and category. Clusters gather sources for a configured collection window before synthesis fires.

  3. State 03

    synthesized

    The tenant's own LLM writes one original article per cluster. The originality guard measures n-gram shingle overlap against every source and re-prompts until the draft clears the configured threshold.

  4. State 04

    in_review

    Every synthesized story enters the human editorial queue. Editors pick up or are assigned stories and can edit the synthesis before deciding. Nothing skips this state.

  5. State 05

    approved

    Approval carries an explicit per-story distribution schedule of named targets. An approval with an empty schedule is rejected. Every decision lands in the audit log.

  6. State 06

    published

    The connector hub publishes to exactly the scheduled targets, choosing publish or update from the prior publication record, and emits a per-channel published event.
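The matching rule in State 02 (best open cluster in the same category, above a similarity threshold) can be illustrated without a database. The sketch below is a brute-force stand-in for the pgvector ANN query, not the query itself. The `Cluster` shape, the use of a single centroid vector per cluster, and cosine similarity as the metric are all assumptions made for illustration.

```go
package main

import "math"

// Cluster is a hypothetical shape; the centroid vector is an assumption.
type Cluster struct {
	ID       string
	Category string
	Open     bool
	Centroid []float64
}

// cosine returns the cosine similarity of a and b, or 0 for unusable input.
func cosine(a, b []float64) float64 {
	if len(a) == 0 || len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// Match returns the most similar open cluster in the item's category whose
// similarity is at least threshold. ok is false when nothing qualifies, in
// which case a caller would open a new cluster.
func Match(item []float64, category string, clusters []Cluster, threshold float64) (best Cluster, ok bool) {
	bestSim := threshold
	for _, c := range clusters {
		if !c.Open || c.Category != category {
			continue // scope: open clusters in the same category only
		}
		if sim := cosine(item, c.Centroid); sim >= bestSim {
			best, bestSim, ok = c, sim, true
		}
	}
	return best, ok
}
```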

Transitions: forward only. Illegal moves are rejected atomically.
Ingestion: two paths. Crawled items are clustered; reporter filings may skip clustering.
Events: 5 NATS subjects, per tenant, on JetStream.
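The per-source circuit breaker named under State 01 has three ingredients: a fail threshold, a cooldown, and half-open probes. The sketch below shows one common way to combine them. It is not the crawler's implementation: every identifier is hypothetical, one probe at a time is an assumed policy, and `ManualClock` exists only so the behavior can be exercised without sleeping.

```go
package main

import (
	"sync"
	"time"
)

type breakerState int

const (
	closed breakerState = iota
	open
	halfOpen
)

// Breaker guards a single source; a crawler would keep one per source ID.
type Breaker struct {
	mu            sync.Mutex
	state         breakerState
	failures      int
	openedAt      time.Time
	failThreshold int
	cooldown      time.Duration
	now           func() time.Time
}

func NewBreaker(failThreshold int, cooldown time.Duration, now func() time.Time) *Breaker {
	return &Breaker{failThreshold: failThreshold, cooldown: cooldown, now: now}
}

// Allow reports whether a fetch may be attempted right now.
func (b *Breaker) Allow() bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	switch b.state {
	case open:
		if b.now().Sub(b.openedAt) < b.cooldown {
			return false // still cooling down
		}
		b.state = halfOpen // cooldown over: admit one probe
		return true
	case halfOpen:
		return false // a probe is already in flight
	}
	return true
}

// Record feeds the outcome of an attempted fetch back into the breaker.
func (b *Breaker) Record(ok bool) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if ok {
		b.state, b.failures = closed, 0
		return
	}
	b.failures++
	if b.state == halfOpen || b.failures >= b.failThreshold {
		b.state, b.openedAt = open, b.now()
	}
}

// ManualClock is a test clock that only moves when told to.
type ManualClock struct{ t time.Time }

func (c *ManualClock) Now() time.Time          { return c.t }
func (c *ManualClock) Advance(d time.Duration) { c.t = c.t.Add(d) }
```

Because each source owns its own `Breaker`, an open breaker on one feed has no effect on `Allow` for any other feed. In production code `time.Now` would be passed as the clock.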

004 / Architecture

Seven Go services. One web app.

The whole platform is Go 1.26.4 on Postgres 18 with pgvector, NATS JetStream for events, and Prometheus across every service. The Next.js app serves four surfaces: the public site, /app for editorial, /dash for tenant admin, and /console for operators, plus a per-tenant portal at /np/{slug}.

Go 1.26.4 / Postgres 18 + pgvector / NATS JetStream / ONNX Runtime / Prometheus / Next.js 16
Service | Role | Stack
Backbone (cmd/backbone) | Story store, lifecycle, tenancy RLS | Go, Postgres 18 + pgvector
Crawler (cmd/crawler) | Feed ingestion + extraction | Go, gofeed, goquery
Reporter (cmd/reporter) | Filings, drafts, AI assist | Go, Foveio media client
Synthesis engine (cmd/synthesis-engine) | Clustering, synthesis, originality guard | Go, ONNX Runtime, pgvector
Editorial (cmd/editorial) | Review queue, approval, scheduling | Go, audit, Prometheus
Connector hub (cmd/connector-hub) | 10-channel publishing fan-out | Go, capability registry
Tenant (cmd/tenant) | Onboarding, secrets, billing | Go, AES-256-GCM vault
Web (web/) | Public site, /app, /dash, /console | Next.js 16, React 19, Tailwind 4
Go test functions: 848
Go files: 377
SQL migrations: 39
NF_* config keys: 198

005 / Bring your own AI

Your newsroom, your model.

Every tenant picks its own provider and supplies its own key: Groq, OpenAI, Anthropic, or Gemini. The engine resolves that tenant's provider, key, and originality threshold at synthesis time. An unsupported provider is an explicit logged refusal, never a silent fallback to a platform key.
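The "explicit refusal, never a silent fallback" rule is easy to show in code. The sketch below is a guess at the shape of such a resolver, not the contents of factory.go: `TenantAIConfig`, `Resolve`, and the `client` placeholder are hypothetical names, and refusing an empty key is an added assumption that follows from never falling back to a platform key.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrUnsupportedProvider is returned instead of falling back to any default.
var ErrUnsupportedProvider = errors.New("unsupported AI provider")

// ErrMissingKey is returned when the tenant has not supplied a key (assumed rule).
var ErrMissingKey = errors.New("tenant API key is empty")

// TenantAIConfig mirrors the fields listed for tenant_ai_config.
type TenantAIConfig struct {
	Provider          string
	APIKey            string // already decrypted for this call
	MaxShingleOverlap float64
}

// Provider is a placeholder for a real LLM client interface.
type Provider interface{ Name() string }

type client struct{ name, key string }

func (c client) Name() string { return c.name }

var supported = map[string]bool{
	"groq": true, "openai": true, "anthropic": true, "gemini": true,
}

// Resolve builds the tenant's provider at synthesis time. An unknown provider
// is logged through logf and refused; there is no platform-key fallback path.
func Resolve(cfg TenantAIConfig, logf func(format string, args ...any)) (Provider, error) {
	if !supported[cfg.Provider] {
		logf("synthesis refused: provider %q is not supported", cfg.Provider)
		return nil, fmt.Errorf("%w: %q", ErrUnsupportedProvider, cfg.Provider)
	}
	if cfg.APIKey == "" {
		logf("synthesis refused: no API key configured for provider %q", cfg.Provider)
		return nil, ErrMissingKey
	}
	return client{name: cfg.Provider, key: cfg.APIKey}, nil
}
```

Passing `log.Printf` as `logf` gives the logged refusal; a test can pass a counting closure instead.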

Keys and channel credentials are encrypted at rest with AES-256-GCM. Provider endpoints, retries, timeouts, token limits, collection windows, and guard thresholds are all configuration: 198 distinct keys, and the required ones fail boot loudly instead of defaulting quietly.
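For readers unfamiliar with AES-256-GCM, the following sketch shows the usual Go standard-library pattern for sealing a credential. It illustrates the cipher only, under stated assumptions: `Seal` and `Open` are hypothetical names, prepending the random nonce to the ciphertext is one common storage layout rather than a documented NewsForge format, and key management is out of scope.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
)

// newGCM builds an AES-256-GCM AEAD; a 32-byte key selects AES-256.
func newGCM(key []byte) (cipher.AEAD, error) {
	if len(key) != 32 {
		return nil, errors.New("AES-256-GCM requires a 32-byte key")
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// Seal encrypts plaintext and returns nonce || ciphertext || tag.
func Seal(key, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// Passing nonce as dst appends the ciphertext to it.
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// Open reverses Seal and fails if the data was modified or the key is wrong.
func Open(key, sealed []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	n := gcm.NonceSize()
	if len(sealed) < n {
		return nil, errors.New("sealed value is shorter than the nonce")
	}
	return gcm.Open(nil, sealed[:n], sealed[n:], nil)
}
```

GCM is authenticated, so a tampered or truncated value fails in `Open` instead of decrypting to garbage.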

Groq / OpenAI / Anthropic / Gemini
tenant_ai_config / resolved at synthesis time / encrypted
provider: groq | openai | anthropic | gemini
api_key: AES-256-GCM, tenant supplied
max_shingle_overlap: 0 original .. 1 copied, per tenant
NF_SYNTHESIS_*: window, min sources, attempts
NF_EMBED_DIM: required, fails boot if unset
unsupported provider: explicit refusal, logged
Source / internal/synthesis/provider/factory.go
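"Fails boot if unset" can be sketched as a loader that returns an error instead of a default. The helper below is hypothetical; only the key name NF_EMBED_DIM comes from the panel above, and requiring a positive integer is an assumption about what an embedding dimension must look like.

```go
package main

import (
	"fmt"
	"strconv"
)

// RequiredInt reads a required integer key. lookup has the same signature as
// os.LookupEnv so tests can pass a map-backed function instead.
func RequiredInt(lookup func(string) (string, bool), key string) (int, error) {
	raw, ok := lookup(key)
	if !ok || raw == "" {
		// No default on purpose: a missing required key is an error.
		return 0, fmt.Errorf("config: required key %s is not set", key)
	}
	n, err := strconv.Atoi(raw)
	if err != nil || n <= 0 {
		return 0, fmt.Errorf("config: %s must be a positive integer, got %q", key, raw)
	}
	return n, nil
}
```

At startup a service would call `RequiredInt(os.LookupEnv, "NF_EMBED_DIM")` and pass any error to `log.Fatal`, so a misconfigured process exits before it serves traffic.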

006 / Humans in command

Humans approve. Machines distribute.

The pipeline is automated; the judgment is not. Every synthesized story waits in the review queue until an editor approves it with an explicit distribution schedule. Field reporters file straight into the same lifecycle, and a single-source filing is allowed to skip clustering.
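The approval gate can be sketched as a validation step plus an audit write. The code below is an illustration only: the struct shapes, the audit action strings, and the extra checks for unknown or duplicate channels are assumptions. The one rule taken directly from this page is that an approval with an empty schedule is rejected and that every decision is audited.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// ErrEmptySchedule rejects an approval that names no targets.
var ErrEmptySchedule = errors.New("approval rejected: distribution schedule is empty")

// Target is one named channel and when to publish to it.
type Target struct {
	Channel   string
	PublishAt time.Time
}

// Decision is an editor's approval request for one story.
type Decision struct {
	StoryID  string
	Editor   string
	Schedule []Target
}

// AuditEvent is a hypothetical audit row; Action values are made up here.
type AuditEvent struct {
	StoryID, Editor, Action string
}

func validate(d Decision, registered map[string]bool) error {
	if len(d.Schedule) == 0 {
		return ErrEmptySchedule
	}
	seen := make(map[string]bool, len(d.Schedule))
	for _, t := range d.Schedule {
		if !registered[t.Channel] {
			return fmt.Errorf("approval rejected: unknown channel %q", t.Channel)
		}
		if seen[t.Channel] {
			return fmt.Errorf("approval rejected: channel %q scheduled twice", t.Channel)
		}
		seen[t.Channel] = true
	}
	return nil
}

// Approve validates the decision and audits the outcome either way.
func Approve(d Decision, registered map[string]bool, audit func(AuditEvent)) error {
	err := validate(d, registered)
	action := "approval.accepted"
	if err != nil {
		action = "approval.rejected"
	}
	audit(AuditEvent{StoryID: d.StoryID, Editor: d.Editor, Action: action})
	return err
}
```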

Story assets are designed to ride on Foveio, KaritKarma's media layer: reporters attach assets by reference, and each publishing channel gets its image renditions from operator-configured named presets. The media client ships in NewsForge today; the pairing goes live as Foveio launches. NewsForge stores asset IDs, never media bytes.

Reporter desk

Filings with drafts, asset attachment by reference through the built-in media client, and an atomic advance into the lifecycle.

internal/ingestion/reporter/

AI assist, three kinds

Notes to draft body, body to headline plus summary, body to typed entities. Assist drafts; humans file.

reporter/assist.go

Editorial review queue

Assignment and pickup, synthesis editing, approve or hold. An approval with an empty schedule is rejected.

internal/editorial/

Audit on every gate

Filings, transitions, approvals, config and credential changes, invites. Metrics are pre-seeded so that channels which have never been hit still show up as visible zeros.

internal/backbone/audit/

007 / Distribution

One approval. Ten front doors.

The connector hub registers ten channels at boot and publishes to exactly the targets the editor scheduled, never "all tenant targets". Publish versus update is decided by the prior publication record per channel.

  • Capability guards: a connector whose declared capabilities reject a story performs zero network I/O, proven by a parity test over every registered connector.
  • Per-channel image renditions are requested from Foveio, the planned media layer, via operator-configured named presets.
  • Every published channel emits its own per-tenant event on NATS JetStream.
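Three of the rules above fit in one small loop: visit only the scheduled targets, skip a connector without touching the network when its capabilities reject the story, and choose publish or update from the prior publication record. The sketch below assumes hypothetical `Story`, `Capabilities`, and `Connector` shapes, and `FakeConnector` is a counting stub for illustration, not a real channel.

```go
package main

// Story and Capabilities are hypothetical shapes for illustration.
type Story struct {
	ID       string
	BodyLen  int
	HasImage bool
}

type Capabilities struct {
	MaxBodyLen    int // 0 means unlimited
	RequiresImage bool
}

// Accepts reports whether a story fits the declared capabilities.
func (c Capabilities) Accepts(s Story) bool {
	if c.MaxBodyLen > 0 && s.BodyLen > c.MaxBodyLen {
		return false
	}
	return !c.RequiresImage || s.HasImage
}

type Connector interface {
	Capabilities() Capabilities
	Publish(s Story) (remoteID string, err error)
	Update(remoteID string, s Story) error
}

// FakeConnector counts the calls that would reach the network.
type FakeConnector struct {
	Caps  Capabilities
	Calls int
}

func (f *FakeConnector) Capabilities() Capabilities      { return f.Caps }
func (f *FakeConnector) Publish(s Story) (string, error) { f.Calls++; return "remote-" + s.ID, nil }
func (f *FakeConnector) Update(string, Story) error      { f.Calls++; return nil }

type Result struct{ Channel, Action string }

// Distribute walks the scheduled channels only. prior maps channel name to
// the remote ID of an earlier publication and is updated on first publish.
func Distribute(s Story, scheduled []string, hub map[string]Connector, prior map[string]string) []Result {
	results := make([]Result, 0, len(scheduled))
	for _, ch := range scheduled {
		c, ok := hub[ch]
		switch {
		case !ok:
			results = append(results, Result{ch, "unregistered"})
		case !c.Capabilities().Accepts(s):
			results = append(results, Result{ch, "skipped"}) // no connector call made
		default:
			results = append(results, Result{ch, send(c, ch, s, prior)})
		}
	}
	return results
}

func send(c Connector, ch string, s Story, prior map[string]string) string {
	if remoteID, seen := prior[ch]; seen {
		if err := c.Update(remoteID, s); err != nil {
			return "failed"
		}
		return "updated"
	}
	remoteID, err := c.Publish(s)
	if err != nil {
		return "failed"
	}
	prior[ch] = remoteID
	return "published"
}
```

A parity test in this style would loop over every registered connector with a story its capabilities reject and assert that the call counter stays at zero.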
Registered at boot / cmd/connector-hub / 10 channels
01 / Loom / KaritKarma CMS
02 / WordPress / CMS
03 / Drupal / CMS
04 / Ghost / CMS
05 / Contentful / Headless CMS
06 / Joomla / CMS
07 / Strapi / Headless CMS
08 / Pressable / Publishing
09 / X + Facebook / Social
10 / ViewCasta / Streaming

008 / Questions

Frequently asked.

Mirrored in FAQPage JSON-LD so search and answer engines can lift these verbatim.

What is NewsForge?

NewsForge is KaritKarma's multi-tenant newsroom platform, with its backend services written in Go and a Next.js web app on top. It ingests raw news sources, clusters duplicate coverage by vector embedding with pgvector, synthesizes one original article per story through the tenant's own AI provider, routes every story through mandatory human editorial review, and publishes approved stories to up to ten CMS and social channels on a per-story schedule.

Does NewsForge replace my CMS, or sit alongside it?

Alongside it. NewsForge is the content engine; its connector hub publishes into ten channels: Loom, WordPress, Drupal, Ghost, Contentful, Joomla, Strapi, Pressable, X and Facebook, and ViewCasta. One editorial approval fans out to exactly the targets the editor scheduled, and the hub decides between publish and update by checking the prior publication record for each channel.

How does the editorial lifecycle work?

Every story moves through six forward-only states: ingested, clustered, synthesized, in_review, approved, published. Transitions are enforced atomically and illegal moves are rejected. Human review is mandatory: every synthesized story waits in the review queue, and approval requires an explicit distribution schedule. An approval with an empty schedule is rejected, and every gated action lands in the audit log.

How does NewsForge keep articles original?

An originality guard measures n-gram shingle overlap between the synthesized draft and every source, on a scale from 0 (fully original) to 1 (fully copied). If the draft overlaps beyond the configured threshold, the engine re-prompts the model and measures again, up to a configured number of attempts. The threshold, shingle size, and attempt limit are all per-tenant configuration, and a human editor still reviews every story before it can publish.
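A shingle-overlap score and the re-prompt loop around it can be sketched as follows. This page does not give the exact formula, so the definition used here is an assumption: the score is the fraction of the draft's word n-grams that also appear in any source, which yields 0 for a fully original draft and 1 for a fully copied one. `Overlap`, `Guard`, and the `generate` callback are hypothetical names.

```go
package main

import (
	"errors"
	"strings"
)

// ErrNotOriginal is returned when no attempt clears the threshold.
var ErrNotOriginal = errors.New("draft did not clear the originality threshold")

// shingles returns the set of lowercase word n-grams in text.
func shingles(text string, n int) map[string]struct{} {
	set := make(map[string]struct{})
	if n < 1 {
		return set
	}
	words := strings.Fields(strings.ToLower(text))
	for i := 0; i+n <= len(words); i++ {
		set[strings.Join(words[i:i+n], " ")] = struct{}{}
	}
	return set
}

// Overlap is the share of draft shingles found in any source (assumed formula).
func Overlap(draft string, sources []string, n int) float64 {
	d := shingles(draft, n)
	if len(d) == 0 {
		return 0
	}
	src := make(map[string]struct{})
	for _, s := range sources {
		for k := range shingles(s, n) {
			src[k] = struct{}{}
		}
	}
	hits := 0
	for k := range d {
		if _, ok := src[k]; ok {
			hits++
		}
	}
	return float64(hits) / float64(len(d))
}

// Guard asks generate for a draft, scores it, and re-prompts until the score
// is at or below threshold or maxAttempts is used up. attempt starts at 1.
func Guard(generate func(attempt int) (string, error), sources []string,
	n int, threshold float64, maxAttempts int) (string, float64, error) {
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		draft, err := generate(attempt)
		if err != nil {
			return "", 0, err // provider failures surface; they are not retried here
		}
		if score := Overlap(draft, sources, n); score <= threshold {
			return draft, score, nil
		}
	}
	return "", 0, ErrNotOriginal
}
```

With shingle size 3, a draft that repeats a source sentence word for word scores 1, and a draft sharing no three-word run with any source scores 0.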

Which AI providers does NewsForge support?

Four, bring your own: Groq, OpenAI, Anthropic, and Gemini. Each tenant picks its provider and supplies its own key, stored encrypted with AES-256-GCM. The engine resolves that tenant's provider, key, and originality threshold at synthesis time. An unsupported provider is an explicit logged refusal, never a silent fallback to a platform key, and provider endpoints, retries, timeouts, and token limits are all configuration across 198 distinct config keys.

How does NewsForge handle languages?

Synthesize once, localize on demand. Each story is written once in its canonical language and served verbatim with zero LLM calls. Any other requested language is translated once and cached per story and language, and failed translations are never cached. Clustering itself is multilingual, because the embedding model groups the same story across languages before synthesis.

Launch your newsroom

One desk. Ten front doors. Tomorrow's edition, already on press.

Brief us on the desk, the language, and the publish targets. You get a NewsForge tenant with the six-state lifecycle live, a reporter desk, your own AI provider key encrypted at rest, and a connector hub wired to your CMS and social channels.

6-state lifecycle, forward only
10 publishing channels
BYO AI: 4 providers
Human approval, always