
NewsForge
synthesis, not scraping.

An all-Go newsroom platform. pgvector clusters multi-source coverage, a synthesis engine writes one original article per story behind an originality guard, humans approve, and a connector hub distributes to 10 channels. Postgres 18, NATS JetStream, 848 Go tests.

6 lifecycle states
10 publishing channels
8 services (all Go + web)
848 Go test functions
Counts measured from the NewsForge repository. Deployed at newsforge.news; first-tenant onboarding in progress.
What is the NewsForge case study?

A newsroom where machines synthesize and humans approve.

The NewsForge case study documents an all-Go newsroom platform with a forward-only six-state story lifecycle: ingested, clustered, synthesized, in review, approved, published. Duplicate coverage is clustered by multilingual vector embeddings, one original article is synthesized per story through the tenant's own AI provider, and a mandatory editorial review gates every publish.

The platform's first generation ran the Bengali-first portal khoboria.com as its founding deployment and proved Bengali as a first-class output language. The current stack is a complete re-engineering: synthesis instead of template rewriting, bring-your-own AI, and row-level-secured multi-tenancy. We label that history honestly rather than presenting old-generation claims as current.

The Challenge

Rewriting wires is not journalism, and manual desks cannot scale.

Most automated news tools paraphrase a single source, which is plagiarism with extra steps. Most newsrooms burn their desk hours rewriting agency copy by hand. The engineering problem is to synthesize one original story from many sources, prove the originality, and still keep a human accountable for what publishes.

Duplicate coverage everywhere

The same event arrives from many sources in many languages. Without embedding-level clustering, a newsroom rewrites the same story five times.

Single-source paraphrase risk

LLM rewrites of one source inherit its errors and its copyright. Synthesis across a cluster, gated by an n-gram originality guard, is the defensible alternative.

Accountability at speed

Regulator-grade publishing needs a named human approval and an audit trail on every story, without giving up machine-speed ingestion and distribution.

The Lifecycle

Six states, forward only, enforced in code.

Every story moves through six states with atomic, forward-only transitions. Illegal transitions are rejected, and every transition emits its matching event on NATS JetStream. Two ingestion paths converge: the crawler path clusters before synthesis, while the reporter path may legally skip clustering.
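The forward-only rule can be sketched as a small transition check. This is a minimal illustration, not the NewsForge code: the names State, CheckTransition, and ErrIllegalTransition are hypothetical, and it assumes the reporter path moves from ingested straight to synthesized. Atomicity (for example a conditional database update) and event emission are outside the sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// State is one of the six lifecycle states named in the text.
type State int

const (
	Ingested State = iota
	Clustered
	Synthesized
	InReview
	Approved
	Published
)

var stateNames = [...]string{"ingested", "clustered", "synthesized", "in_review", "approved", "published"}

func (s State) String() string {
	if s < Ingested || s > Published {
		return "unknown"
	}
	return stateNames[s]
}

// ErrIllegalTransition is a hypothetical sentinel error for rejected moves.
var ErrIllegalTransition = errors.New("illegal lifecycle transition")

// CheckTransition allows exactly one forward step, plus (as an assumption)
// ingested -> synthesized for reporter filings that skip clustering.
func CheckTransition(from, to State, reporterFiled bool) error {
	valid := from >= Ingested && to <= Published
	switch {
	case valid && to == from+1:
		return nil
	case reporterFiled && from == Ingested && to == Synthesized:
		return nil
	}
	return fmt.Errorf("%w: %s -> %s", ErrIllegalTransition, from, to)
}

func main() {
	fmt.Println(CheckTransition(Ingested, Clustered, false))   // crawler path, one step forward
	fmt.Println(CheckTransition(Approved, InReview, false))    // backward move is rejected
	fmt.Println(CheckTransition(Ingested, Synthesized, true))  // reporter path skips clustering
	fmt.Println(CheckTransition(Ingested, Synthesized, false)) // crawler path may not skip
}
```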

01

Ingested

The crawler parses configured RSS and HTML sources (gofeed plus goquery) with a per-source circuit breaker, so a dead source backs off without blocking healthy ones. A seen-URL ledger deduplicates fetches. Reporters file directly through the reporter desk with drafts, AI assist, and asset attachment.

02

Clustered

Each item is embedded with a multilingual ONNX sentence model and matched against open clusters by pgvector ANN search, threshold- and category-scoped. One event becomes one cluster across sources and languages, not five rewrites. Reporter filings may legally skip this state.
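As a rough sketch of threshold- and category-scoped matching: the query string below uses pgvector's cosine-distance operator (<=>), but the table and column names are hypothetical, and the in-memory Match function is only a stand-in that mirrors the same rule without a database. The threshold value and the use of a cluster centroid are assumptions.

```go
package main

import (
	"fmt"
	"math"
)

// matchQuery shows the shape such a lookup could take in SQL.
// Table and column names are hypothetical; <=> is pgvector's
// cosine-distance operator, so 1 - distance is cosine similarity.
const matchQuery = `
SELECT id, 1 - (centroid <=> $1) AS similarity
FROM clusters
WHERE category = $2 AND status = 'open'
ORDER BY centroid <=> $1
LIMIT 1`

// Cluster is an open cluster with a representative embedding.
type Cluster struct {
	ID       string
	Category string
	Centroid []float64
}

// cosine returns the cosine similarity of two equal-length vectors,
// or 0 if the lengths differ or either vector is all zeros.
func cosine(a, b []float64) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// Match returns the most similar open cluster in the same category whose
// similarity meets the threshold. ok is false when the item should start
// a new cluster instead.
func Match(item []float64, category string, open []Cluster, threshold float64) (id string, ok bool) {
	best := threshold
	for _, c := range open {
		if c.Category != category {
			continue
		}
		if s := cosine(item, c.Centroid); s >= best {
			best, id, ok = s, c.ID, true
		}
	}
	return id, ok
}

func main() {
	open := []Cluster{
		{ID: "c1", Category: "politics", Centroid: []float64{1, 0}},
		{ID: "c2", Category: "sports", Centroid: []float64{1, 0}},
	}
	id, ok := Match([]float64{0.9, 0.1}, "politics", open, 0.8)
	fmt.Println(id, ok)
}
```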

03

Synthesized

When a cluster meets its configured minimum source count and collection window, the synthesis engine writes one original article through the tenant's own LLM provider. An n-gram shingle-overlap originality guard re-prompts until the draft stops overlapping its sources. LLM failures feed a circuit breaker; nothing fails silently.
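One way to picture the originality guard is as a word-shingle overlap ratio checked inside a bounded re-prompt loop. The sketch below is illustrative only: the tokenization, the overlap formula (share of the draft's n-grams that also appear in any source), the attempt cap, and all names are assumptions rather than the platform's actual algorithm.

```go
package main

import (
	"fmt"
	"strings"
)

// shingles returns the set of word n-grams in text (lowercased,
// split on whitespace).
func shingles(text string, n int) map[string]struct{} {
	words := strings.Fields(strings.ToLower(text))
	set := make(map[string]struct{})
	for i := 0; i+n <= len(words); i++ {
		set[strings.Join(words[i:i+n], " ")] = struct{}{}
	}
	return set
}

// Overlap is the fraction of the draft's shingles that also occur
// in at least one source. A draft with no shingles scores 0.
func Overlap(draft string, sources []string, n int) float64 {
	d := shingles(draft, n)
	if len(d) == 0 {
		return 0
	}
	src := make(map[string]struct{})
	for _, s := range sources {
		for sh := range shingles(s, n) {
			src[sh] = struct{}{}
		}
	}
	shared := 0
	for sh := range d {
		if _, ok := src[sh]; ok {
			shared++
		}
	}
	return float64(shared) / float64(len(d))
}

// Generate stands in for the tenant's LLM call. The attempt number
// lets a caller strengthen the prompt on each retry.
type Generate func(attempt int) (string, error)

// Synthesize re-prompts until the draft's overlap falls under maxOverlap,
// giving up with an error after maxAttempts (the cap is an assumption).
func Synthesize(gen Generate, sources []string, n int, maxOverlap float64, maxAttempts int) (string, error) {
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		draft, err := gen(attempt)
		if err != nil {
			return "", fmt.Errorf("generate attempt %d: %w", attempt, err)
		}
		if Overlap(draft, sources, n) < maxOverlap {
			return draft, nil
		}
	}
	return "", fmt.Errorf("draft still overlaps sources after %d attempts", maxAttempts)
}

func main() {
	sources := []string{"the quick brown fox jumps over the lazy dog"}
	gen := func(attempt int) (string, error) {
		if attempt == 1 {
			return sources[0], nil // first draft copies the source
		}
		return "a fast auburn fox cleared a sleeping hound in one leap", nil
	}
	draft, err := Synthesize(gen, sources, 3, 0.2, 3)
	fmt.Println(draft, err)
}
```

Returning an error when the cap is reached, rather than passing the last draft through, keeps the failure visible to whatever breaker or alerting sits around the call.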

04

In review

Every story lands in the human editorial queue. Editors are assigned or pick up stories, edit the synthesis, and work against authz, audit, and metrics wired into the editorial service.

05

Approved

Approval carries an explicit per-story distribution schedule of named targets. Approval with an empty schedule is rejected by contract. The approval, like every gated action, is written to the audit log.
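The empty-schedule rule is simple enough to show as a guard in a constructor. This is a sketch under assumptions: the Approval type, its fields, and the error names are hypothetical, and writing the audit-log entry is left out.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrEmptySchedule is a hypothetical error for approvals without targets.
var ErrEmptySchedule = errors.New("approval requires at least one distribution target")

// ErrNoEditor is a hypothetical error for approvals without a named human.
var ErrNoEditor = errors.New("approval requires a named editor")

// Approval records who approved a story and where it may be published.
type Approval struct {
	StoryID string
	Editor  string
	Targets []string
}

// Approve builds an Approval, rejecting a missing editor or empty schedule.
func Approve(storyID, editor string, targets []string) (Approval, error) {
	if editor == "" {
		return Approval{}, ErrNoEditor
	}
	if len(targets) == 0 {
		return Approval{}, ErrEmptySchedule
	}
	// Copy the slice so later changes by the caller cannot alter the schedule.
	schedule := append([]string(nil), targets...)
	return Approval{StoryID: storyID, Editor: editor, Targets: schedule}, nil
}

func main() {
	_, err := Approve("story-1", "editor@example.com", nil)
	fmt.Println(err)
	a, err := Approve("story-1", "editor@example.com", []string{"wordpress", "social"})
	fmt.Println(a.Targets, err)
}
```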

06

Published

The connector hub publishes to exactly the scheduled targets across 10 channels: Loom, WordPress, Drupal, Ghost, Contentful, Joomla, Strapi, Pressable, social (X and Facebook), and ViewCasta. It requests per-channel image renditions through its built-in Foveio media client. Localized editions are translated once and cached per language.
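The hub's dispatch rules (publish only to scheduled targets, check declared capabilities before any network call, and choose create or update from a prior-publication lookup) could be sketched as below. The Connector interface, the Hub type, the in-memory publication map, and the image-based capability check are all hypothetical stand-ins, not the platform's API.

```go
package main

import "fmt"

// Story carries only what this sketch needs.
type Story struct {
	ID       string
	HasImage bool
}

// Connector is a hypothetical channel adapter. Accepts is a pure check of
// declared capabilities and must not perform network I/O.
type Connector interface {
	Name() string
	Accepts(s Story) bool
	Create(s Story) error
	Update(s Story) error
}

// Hub holds registered connectors and remembers prior publications.
type Hub struct {
	connectors map[string]Connector
	published  map[string]bool // key: storyID + "/" + target
}

func NewHub(cs ...Connector) *Hub {
	h := &Hub{connectors: map[string]Connector{}, published: map[string]bool{}}
	for _, c := range cs {
		h.connectors[c.Name()] = c
	}
	return h
}

// Publish sends the story to exactly the scheduled targets and reports
// one outcome per target.
func (h *Hub) Publish(s Story, schedule []string) map[string]string {
	out := make(map[string]string, len(schedule))
	for _, target := range schedule {
		c, ok := h.connectors[target]
		if !ok {
			out[target] = "unknown target"
			continue
		}
		if !c.Accepts(s) {
			out[target] = "rejected by capabilities" // no Create/Update call made
			continue
		}
		key := s.ID + "/" + target
		action, call := "created", c.Create
		if h.published[key] {
			action, call = "updated", c.Update
		}
		if err := call(s); err != nil {
			out[target] = "failed: " + err.Error()
			continue
		}
		h.published[key] = true
		out[target] = action
	}
	return out
}

// fake is a test double that counts simulated network calls.
type fake struct {
	name       string
	needsImage bool
	calls      int
}

func (f *fake) Name() string         { return f.name }
func (f *fake) Accepts(s Story) bool { return !f.needsImage || s.HasImage }
func (f *fake) Create(Story) error   { f.calls++; return nil }
func (f *fake) Update(Story) error   { f.calls++; return nil }

func main() {
	wp, social, ghost := &fake{name: "wordpress"}, &fake{name: "social", needsImage: true}, &fake{name: "ghost"}
	hub := NewHub(wp, social, ghost)
	story := Story{ID: "story-1"}
	fmt.Println(hub.Publish(story, []string{"wordpress", "social"}))
	fmt.Println(hub.Publish(story, []string{"wordpress"}))
	fmt.Println("social calls:", social.calls, "ghost calls:", ghost.calls)
}
```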

Engineering substrate: Go 1.26, PostgreSQL 18 with pgvector and row-level security on tenant tables, NATS JetStream with five per-tenant story subjects, ONNX multilingual embeddings with a configurable dimension, AES-256-GCM encrypted tenant secrets, Prometheus metrics across services, and 198 distinct configuration keys with required keys failing boot loudly. 848 Go test functions across 377 Go files.
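For the encrypted tenant secrets, a minimal AES-256-GCM sketch using Go's standard library follows. The function names, the nonce-prefixed storage layout, and binding the tenant ID as additional authenticated data are assumptions for illustration; key management and rotation are out of scope.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// newGCM builds an AES-GCM AEAD and insists on a 32-byte (AES-256) key.
func newGCM(key []byte) (cipher.AEAD, error) {
	if len(key) != 32 {
		return nil, errors.New("AES-256 requires a 32-byte key")
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// Seal encrypts secret and returns nonce || ciphertext. The tenant ID is
// authenticated but not encrypted, so a value sealed for one tenant fails
// to open for another (an assumption of this sketch).
func Seal(key, secret, tenantID []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, secret, tenantID), nil
}

// Open reverses Seal and fails if the data or tenant ID was altered.
func Open(key, sealed, tenantID []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	n := gcm.NonceSize()
	if len(sealed) < n {
		return nil, errors.New("sealed value too short")
	}
	return gcm.Open(nil, sealed[:n], sealed[n:], tenantID)
}

func main() {
	key := make([]byte, 32)
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	sealed, err := Seal(key, []byte("sk-demo-key"), []byte("tenant-a"))
	if err != nil {
		panic(err)
	}
	plain, err := Open(key, sealed, []byte("tenant-a"))
	fmt.Println(string(plain), err)
	_, err = Open(key, sealed, []byte("tenant-b"))
	fmt.Println("wrong tenant fails:", err != nil)
}
```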

NewsForge vs the alternatives

Why synthesis beats a content mill.

Versus a manual newsroom, a generic content mill, or a wire-only feed, here is what the synthesis lifecycle does differently.

Capability | NewsForge | Manual newsroom | Content mill | Wire service only
Original synthesis from multi-source coverage | Slow
Originality guard before any human sees the draft | N/A
Cross-language deduplication (pgvector clustering)
Mandatory human editorial approval
Multi-channel native publishing | 10 connectors | Manual upload | API | Wire feed
Bring-your-own AI provider per tenant | 4 providers | N/A | N/A
Audit log on every gated action | Varies
Deployment status

Live platform, honest tenant story.

The all-Go stack is deployed at newsforge.news with editorial, tenant, and operator surfaces. khoboria.com was the founding deployment of the first generation and proved Bengali-first publishing; the current platform is its complete re-engineering, and first-tenant onboarding on the new stack is in progress. We will name tenants only when they are live.

4 BYO AI providers per tenant
198 configuration keys, zero hardcoded
39 SQL migrations
5 per-tenant event subjects

Frequently asked

NewsForge, asked plainly.

What is the NewsForge case study?
The NewsForge case study documents how an AI synthesis newsroom is engineered: a multi-tenant, all-Go platform that ingests raw news sources, clusters duplicate coverage by vector embedding, synthesizes one original article per story through a configurable LLM provider, routes it through mandatory human editorial review, and publishes it to ten CMS and social channels on a per-story schedule. The stack is Go 1.26, PostgreSQL 18 with pgvector and per-tenant row-level security, NATS JetStream for events, ONNX multilingual embeddings, and a Next.js 16 web app serving the public site plus editorial, tenant, and operator surfaces.
What happened to khoboria.com and the first-generation pipeline?
khoboria.com, a Bengali-first news portal, was the founding deployment of NewsForge's first generation (.NET plus Python). That generation proved the demand and the hard part, Bengali as a first-class output language, and was then retired. The platform was re-engineered from scratch as the current all-Go stack, which replaces the old template-rewrite approach with true multi-source synthesis. We no longer present khoboria as a customer of the current stack; first-tenant onboarding on the new platform is in progress and will be named only when live.
How does the NewsForge story lifecycle work?
Every story moves through a forward-only six-state lifecycle: ingested, clustered, synthesized, in review, approved, published. Transitions are enforced atomically and illegal transitions are rejected. The crawler path clusters multi-source coverage before synthesis; the reporter path lets a journalist's single-source filing legally skip clustering. Synthesis fires only when a cluster meets the configured minimum source count and collection window, and the draft must pass an n-gram originality guard (re-prompting until overlap with sources falls under the configured threshold) before it can reach the review queue.
Is NewsForge autonomous, or does a human review every story?
Humans approve, machines distribute. Every story passes a mandatory editorial review queue: an editor picks up the story, can edit the synthesis, and approval requires an explicit distribution schedule (approval with an empty schedule is rejected). After approval, the connector hub publishes to exactly the scheduled targets. Every gated action lands in the audit log. We rebuilt the platform around this human gate deliberately; the first generation's fully autonomous default is gone.
Which AI provider does NewsForge use?
Whichever the tenant chooses. Four providers are wired as real bring-your-own choices: Groq, OpenAI, Anthropic, and Gemini, with per-provider base URLs, retries, timeouts, and token limits as configuration. Each tenant's provider, model, and API key are resolved at synthesis time from its own encrypted configuration (AES-256-GCM at rest); an unsupported provider is an explicit logged refusal, never a silent fallback to a platform key. Nothing is hardcoded: the platform exposes 198 distinct configuration keys, and required keys fail boot loudly.
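The "explicit refusal, never a silent fallback" behavior can be illustrated with a small resolver. The type and function names are hypothetical and the provider identifiers are assumed lowercase spellings of the four providers named above; decryption and logging are omitted.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrUnsupportedProvider is a hypothetical error for unknown providers.
var ErrUnsupportedProvider = errors.New("unsupported AI provider")

// ErrIncompleteConfig is a hypothetical error for a missing model or key.
var ErrIncompleteConfig = errors.New("tenant AI configuration is incomplete")

// supported lists the four providers named in the text; the lowercase
// identifiers are an assumption.
var supported = map[string]bool{"groq": true, "openai": true, "anthropic": true, "gemini": true}

// TenantAI is a tenant's already-decrypted AI configuration.
type TenantAI struct {
	Provider string
	Model    string
	APIKey   string
}

// Resolve validates the tenant's own configuration. There is deliberately
// no platform-level default: any problem is returned as an error.
func Resolve(cfg TenantAI) (TenantAI, error) {
	if !supported[cfg.Provider] {
		return TenantAI{}, fmt.Errorf("%w: %q", ErrUnsupportedProvider, cfg.Provider)
	}
	if cfg.Model == "" || cfg.APIKey == "" {
		return TenantAI{}, ErrIncompleteConfig
	}
	return cfg, nil
}

func main() {
	_, err := Resolve(TenantAI{Provider: "mistral", Model: "m", APIKey: "k"})
	fmt.Println(err)
	cfg, err := Resolve(TenantAI{Provider: "groq", Model: "example-model", APIKey: "k"})
	fmt.Println(cfg.Provider, err)
}
```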
Can NewsForge publish into an existing CMS or do publishers have to migrate?
Publish into what you already run. Ten connectors register at boot: Loom, WordPress, Drupal, Ghost, Contentful, Joomla, Strapi, Pressable, social (X and Facebook), and ViewCasta. The hub publishes to exactly the targets editorial scheduled and decides publish-versus-update by prior publication lookup. A connector whose declared capabilities reject a story performs zero network I/O (proven by a parity test over every registered connector). Localization is on demand: the canonical language is served verbatim with zero LLM calls, and any other language is translated once and cached per story and language.

Explore NewsForge

Synthesize once. Approve once. Publish everywhere.

From clustered multi-source coverage to ten publishing channels, with your own AI provider and a human approval on every story. See how NewsForge maps to your CMS and your language plan.