ContentForge — Automated Content Pipeline

A distributed, AI-powered content automation platform that turns long-form source stories into multi-language short-form videos through an eight-stage pipeline running across three coordinated machines.

Python, Django, Strawberry, GraphQL, Nuxt, Vue.js, TypeScript, asyncio, SQLAlchemy, Celery, RabbitMQ, Redis, Ollama, FFmpeg, Piper, sentence-transformers, PRAW, Tailscale, Syncthing, Docker

2026

Overview

ContentForge is an end-to-end content automation platform that ingests long-form stories, rewrites and quality-gates them with self-hosted language models, translates them into multiple languages, narrates them with text-to-speech, generates timestamp-accurate captions, composites short-form videos, and publishes them across platforms — with no manual editing in the loop. The system is intentionally distributed across three machines, each running the workload it is physically best suited to.

Backend Framework Decision — FastAPI to Django + Strawberry

The backend was first built on FastAPI, but as the platform grew it became clear FastAPI alone was not enough for what the project needed. FastAPI is deliberately minimal — it gives you routing and validation, but everything else (an ORM, database migrations, an admin interface, authentication, and a multi-app project structure) has to be assembled and maintained by hand. For a platform spanning many domains — sources, drafts, narration, rendering, publishing, analytics, accounts — that hand-assembly became the bottleneck. The backend was migrated to Django, which provides those capabilities out of the box and enforces a clean modular app structure, with Strawberry layering a type-safe GraphQL API on top. The result kept the type-safe API surface while gaining Django's mature ORM, migrations, admin, and ecosystem. The lesson: choose the framework that matches the system's breadth — minimal frameworks are excellent for narrow services, but batteries-included frameworks pay off for large multi-domain platforms.

The Eight-Stage Pipeline

Content flows through eight discrete, independently triggerable stages: (1) scrape source stories, (2) rewrite with a large language model, (3) quality-gate the rewrite, (4) translate into target languages, (5) generate text-to-speech narration, (6) generate captions from the narration timestamps, (7) composite the short-form video with FFmpeg, and (8) publish to the target platforms. Stages 1–3 produce a single approved source; from stage 4 onward the pipeline fans out per language, so one story becomes many localised videos. Each stage is its own task, which makes the pipeline easy to retry, resume, and reason about.
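The stage list and the per-language fan-out can be sketched as a small planning function. This is an illustrative sketch only: the `Stage` enum, the `Task` dataclass, and `plan_tasks` are hypothetical names chosen here, not identifiers from the ContentForge codebase.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import IntEnum
from typing import List, Optional


class Stage(IntEnum):
    """The eight stages, numbered as in the description above."""
    SCRAPE = 1
    REWRITE = 2
    QUALITY_GATE = 3
    TRANSLATE = 4
    NARRATE = 5
    CAPTION = 6
    COMPOSITE = 7
    PUBLISH = 8


# Stages 1-3 run once per story; stages 4-8 run once per target language.
FAN_OUT_FROM = Stage.TRANSLATE


@dataclass(frozen=True)
class Task:
    story_id: str
    stage: Stage
    language: Optional[str] = None  # None until the pipeline fans out


def plan_tasks(story_id: str, languages: List[str]) -> List[Task]:
    """List every task one story needs: shared stages first, then per language."""
    shared = [Task(story_id, stage) for stage in Stage if stage < FAN_OUT_FROM]
    per_language = [
        Task(story_id, stage, language)
        for language in languages
        for stage in Stage
        if stage >= FAN_OUT_FROM
    ]
    return shared + per_language
```

With two target languages, `plan_tasks("story-1", ["en", "de"])` yields three shared tasks plus five tasks per language. Keeping each entry a separate task is what allows a single failed stage to be retried without re-running the rest.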

Distributed Multi-Machine Mesh

ContentForge runs across three physical machines connected over a Tailscale mesh VPN: a Raspberry Pi handles scraping, scheduling, and publishing; a MacBook handles text-to-speech and caption generation; and a GPU desktop handles language-model inference and video rendering. Work is routed to the machine best suited for it, with no hard-coded addresses: all endpoints resolve from a shared configuration synchronised across machines with Syncthing. Redis Streams carry the work signals between stages with per-consumer acknowledgement, so a message that was delivered but not yet acknowledged when a machine crashed stays pending and can be re-delivered rather than lost. Because re-delivery can repeat a message, avoiding duplicated work also depends on each stage being safe to re-run.
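The acknowledge-after-success pattern behind that recovery behaviour might look like the sketch below. It assumes a client that behaves like redis-py's `redis.Redis(decode_responses=True)` on the default protocol and uses its `xreadgroup` and `xack` calls. The function name `consume_once`, the stream and group names, and the handler are hypothetical.

```python
from typing import Any, Callable, Dict


def consume_once(
    client: Any,
    stream: str,
    group: str,
    consumer: str,
    handler: Callable[[Dict[str, str]], None],
    count: int = 10,
    block_ms: int = 5000,
) -> int:
    """Read one batch for this consumer and acknowledge only what succeeded.

    Returns the number of entries acknowledged.
    """
    # ">" requests entries not yet delivered to any consumer in the group.
    batches = client.xreadgroup(
        group, consumer, {stream: ">"}, count=count, block=block_ms
    )
    acked = 0
    for _stream_name, entries in batches or []:
        for entry_id, fields in entries:
            try:
                handler(fields)
            except Exception:
                # No XACK here: the entry stays in the group's pending list,
                # so it can be re-read or claimed by another consumer later.
                continue
            client.xack(stream, group, entry_id)
            acked += 1
    return acked
```

A worker would call this in a loop, for example `consume_once(client, "stage:rewrite", "rewriters", "gpu-desktop", handle_rewrite)`, where all four names are placeholders. Since an entry is acknowledged only after the handler returns, a crash mid-task leaves it pending; the handler should therefore tolerate being run twice on the same entry.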

LLM Orchestration and Quality Gate

All language-model work runs on self-hosted Ollama models rather than paid APIs. A larger model performs the rewrite, a smaller model assists with the quality gate, and a dedicated translation model handles localisation. The quality gate is dual-mechanism: a model-assigned score combined with a similarity check against the source, so output that scores too low or stays too close to the original is automatically rejected before it can consume downstream TTS and rendering resources. Embeddings from sentence-transformers support similarity and deduplication.
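A minimal sketch of the dual-mechanism gate follows. The thresholds (`min_score`, `max_similarity`) and the function names are assumptions for illustration; the source does not state the actual values or score scale. The vectors stand in for embeddings that would come from a sentence-transformers model, and cosine similarity is assumed as the similarity measure.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def passes_quality_gate(
    score: float,
    source_vec: Sequence[float],
    rewrite_vec: Sequence[float],
    min_score: float = 7.0,        # hypothetical threshold
    max_similarity: float = 0.85,  # hypothetical threshold
) -> bool:
    """Accept a rewrite only if it scores well AND differs enough from the source."""
    if score < min_score:
        return False  # rejected: the model-assigned score is too low
    # Rejected when the rewrite stays too close to the original text.
    return cosine_similarity(source_vec, rewrite_vec) <= max_similarity
```

Running the cheap score check first means the similarity computation is skipped for drafts that would be rejected anyway, and both checks run before any TTS or rendering work is queued.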

Text-to-Speech and Timestamp-Accurate Captions

Narration is generated with self-hosted TTS engines (Piper and Kokoro), selectable per language. Captions are deliberately derived from the TTS word-level timestamps rather than by transcribing the generated audio afterwards. Because the timing comes straight from the synthesiser that produced the audio, subtitles stay aligned to the spoken words without an error-prone transcription step.
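Turning word-level timestamps into subtitle cues can be sketched as below. The input shape, a list of `(text, start_seconds, end_seconds)` tuples, is an assumption; the source does not describe how the TTS engines expose their timings. The output format chosen here is SubRip (SRT), and `words_to_srt` and `max_words` are hypothetical names.

```python
from typing import List, Tuple

Word = Tuple[str, float, float]  # (text, start_seconds, end_seconds), assumed shape


def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: List[Word], max_words: int = 4) -> str:
    """Group consecutive words into cues of at most `max_words` words each."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        # A cue spans from its first word's start to its last word's end.
        start, end = chunk[0][1], chunk[-1][2]
        text = " ".join(word[0] for word in chunk)
        cues.append(
            f"{len(cues) + 1}\n"
            f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{text}\n"
        )
    return "\n".join(cues)
```

Short cues of a few words suit short-form video, where captions are typically shown as brief on-screen bursts. The grouping rule here is deliberately simple; a real implementation might also break cues at punctuation or long pauses.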

Frontend Dashboard

A Nuxt 4 dashboard provides visibility and control over the pipeline, talking to the Strawberry GraphQL API through URQL with real-time updates. It surfaces the state of each draft as it moves through the eight stages, across every language variant, so the otherwise headless automation remains observable and controllable from a single screen.