Spoke Plus AI Provider Integration Plan

1. Current State

Spoke Plus already includes AI integration hooks in its backend services and content creation flow.

Implemented placeholder hooks:

  • generateImage(content)
  • generateAudio(content)
  • autoTranslate(content)

Current behavior:

  • Hooks return mock provider payloads.
  • Content creation path can call these hooks when ai_generated is enabled.
  • Generated outputs are persisted into the linguistic content bank (vocabulary/word_assets/related layers).

This provides a stable contract for replacing mock providers with production AI vendors.
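The placeholder behavior above can be pictured as each hook returning a mock payload with a stable shape. A minimal sketch (field names are illustrative, not the actual Spoke Plus schema):

```javascript
// Placeholder hooks: stable shape, mock provider, no real generation.
async function generateImage(content) {
  return { provider: 'mock', asset_type: 'image', url: null, source: content };
}

async function generateAudio(content) {
  return { provider: 'mock', asset_type: 'audio', url: null, source: content };
}

async function autoTranslate(content) {
  return { provider: 'mock', asset_type: 'translation', text: content };
}
```

Because callers only depend on this shape, a production vendor can be swapped in behind each hook without touching the content creation path.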


2. Future Integration Strategy

2.1 Provider Categories

  • LLM/translation provider: OpenAI or equivalent for translation, tagging support, and text transformations.
  • Image generation provider: provider selected per style and quality/cost profile.
  • TTS provider: neural TTS provider for per-language speech generation.

2.2 Provider Abstraction

AI hooks should remain provider-agnostic contracts, with implementation modules swapped underneath via environment configuration.

Target characteristics:

  • Stable response schema to callers.
  • Centralized timeout/retry policy.
  • Structured provider metadata for observability.
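The three target characteristics can be sketched together: env-driven provider selection, a shared retry policy, and a stable response schema enriched with metadata. Provider names, the `IMAGE_PROVIDER` variable, and all field names here are assumptions for illustration, not the actual Spoke Plus modules:

```javascript
// Registry of interchangeable providers; the mock is the only real entry here.
const providers = {
  mock: {
    async generateImage(content) {
      return { provider: 'mock', url: `mock://image/${encodeURIComponent(content)}` };
    },
  },
  // Real vendor modules would register here under their own keys.
};

// Env-based selection (IMAGE_PROVIDER is a hypothetical variable name).
function selectImageProvider(name = process.env.IMAGE_PROVIDER || 'mock') {
  const provider = providers[name];
  if (!provider) throw new Error(`Unknown image provider: ${name}`);
  return provider;
}

// Centralized retry policy shared by every hook.
async function withRetry(fn, { attempts = 3, baseDelayMs = 200 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * attempt));
    }
  }
  throw lastError;
}

// The hook exposes one stable schema to callers regardless of vendor.
async function generateImage(content) {
  const provider = selectImageProvider();
  const result = await withRetry(() => provider.generateImage(content));
  return { ...result, requestedAt: new Date().toISOString() };
}
```

Keeping retries and timeouts in one wrapper means a vendor swap never changes error behavior seen by the content creation path.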

2.3 Voice and Image Presets

Planned preset governance:

  • Voice style presets: e.g., neutral, child-friendly, classroom, slow articulation.
  • Image style presets: e.g., flat illustration, realistic, iconographic, classroom-safe visual style.

Presets map to provider-specific options while preserving a product-level style taxonomy.
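One way to sketch that mapping: a product-level preset table whose values are the provider-specific options. Preset keys follow the examples above; the option fields (`speakingRate`, `pitch`) are assumptions, not a confirmed vendor API:

```javascript
// Product-level voice presets mapped to hypothetical provider options.
const voicePresets = {
  neutral:             { speakingRate: 1.0, pitch: 0 },
  'child-friendly':    { speakingRate: 0.95, pitch: 2 },
  classroom:           { speakingRate: 0.9, pitch: 0 },
  'slow-articulation': { speakingRate: 0.75, pitch: 0 },
};

function resolveVoicePreset(name) {
  const options = voicePresets[name];
  if (!options) throw new Error(`Unknown voice preset: ${name}`);
  return options;
}
```

Swapping TTS vendors then only requires rewriting the option values, while the preset names stay stable in the product taxonomy.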


3. AI Content Workflow

Planned authoring workflow for assisted content production:

  1. Author enters source word/sentence.
  2. Auto-translate to target language.
  3. Auto-generate tags and metadata suggestions.
  4. Generate image asset.
  5. Generate audio asset.
  6. Persist final linguistic record into vocabulary + linked enrichment tables (word_*, sentence_*).

This flow aligns with the existing content schema and current API write path.
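The six steps above can be sketched as one pipeline function. The hook names come from the plan; their bodies here are mocks, and the tag step is stubbed out (a real LLM call would fill it):

```javascript
// Mock implementations of the planned hooks.
async function autoTranslate(text, targetLang) {
  return `[${targetLang}] ${text}`; // mock translation
}
async function suggestTags(text) {
  return []; // mock: an LLM would return tag/metadata suggestions
}
async function generateImage(text) {
  return { url: `mock://image/${encodeURIComponent(text)}` };
}
async function generateAudio(text) {
  return { url: `mock://audio/${encodeURIComponent(text)}` };
}

// Steps 1-6 of the assisted authoring workflow.
async function createAiAssistedEntry({ sourceText, sourceLang, targetLang }) {
  const translation = await autoTranslate(sourceText, targetLang); // step 2
  const tags = await suggestTags(sourceText);                      // step 3
  const image = await generateImage(sourceText);                   // step 4
  const audio = await generateAudio(translation);                  // step 5
  // Step 6: persistence into vocabulary + word_*/sentence_* tables
  // happens in the existing API write path, not sketched here.
  return { sourceText, sourceLang, targetLang, translation, tags, image, audio };
}
```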


4. Multi-Language Expansion Strategy

Design principle:

  • Keep the linguistic model language-explicit (source_lang, target_lang where applicable) and reusable.
  • Reuse content structures across courses while allowing language-specific asset generation.
  • Support controlled localization via translation hooks and curated tag taxonomies.
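A language-explicit record makes this reuse concrete. Only source_lang and target_lang come from the plan; the other fields below are placeholders, not the actual schema:

```javascript
// Illustrative language-explicit record; structure is reusable across courses.
const entry = {
  source_lang: 'en',
  target_lang: 'es',
  source_text: 'apple',
  target_text: 'manzana',
  tags: ['fruit'],
};

// Because the language pair is explicit on the record, asset generation
// can stay language-specific without changing the content structure.
function assetRequestFor(record, modality) {
  return { modality, lang: record.target_lang, text: record.target_text };
}
```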

5. Cost Control Strategy

Planned controls:

  • Generation toggles per operation (translate/image/audio).
  • Per-provider quotas and budget thresholds.
  • Async/batch generation modes for non-blocking back-office pipelines.
  • Manual approval checkpoints before expensive generation at scale.
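Per-operation quotas could be guarded with a check-and-consume gate like the sketch below. The limits are placeholder numbers and the state is in-memory; a real implementation would read budgets from config and track usage in a database:

```javascript
// Placeholder budgets per operation (units of generation calls).
const budgets = { translate: 1000, image: 200, audio: 500 };
const used = { translate: 0, image: 0, audio: 0 };

// Returns false when the budget would be exceeded; the caller can then
// skip generation or route the request to a manual approval queue.
function checkAndConsume(op, units = 1) {
  if (used[op] + units > budgets[op]) return false;
  used[op] += units;
  return true;
}
```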

6. Caching Strategy

Planned caching layers:

  • Deterministic keying by (source_text, source_lang, target_lang, style preset).
  • Reuse existing generated assets when matching cache keys exist.
  • Persist provider response metadata for traceability and cache diagnostics.

7. Failover Strategy

Planned resilience model:

  • Provider priority chain (primary → secondary fallback).
  • Graceful degradation per asset type (text-only save if media providers fail).
  • Error isolation so one failed modality does not block full content persistence.
  • Explicit provider failure logging surfaced through system monitoring endpoints.
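The priority chain and graceful degradation can be sketched together. The provider objects here are illustrative; the key behavior is that a total failure returns null so a text-only save can still proceed:

```javascript
// Try each provider in priority order; log failures; return null only
// if the whole chain fails, so one failed modality never blocks persistence.
async function generateWithFailover(providerChain, content) {
  for (const provider of providerChain) {
    try {
      return await provider.generate(content);
    } catch (err) {
      // In production this would be surfaced via monitoring endpoints.
      console.error(`provider ${provider.name} failed: ${err.message}`);
    }
  }
  return null; // all providers failed; caller persists a text-only record
}
```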

Update: TTS layer is now real (provider-agnostic)

  • Backend now uses a provider abstraction in services/tts/providers/*.
  • TTS_PROVIDER selects the provider at runtime; the fallback remains the mock provider in dev/test.
  • Deterministic cache key: sha256(text + lang + voicePreset + provider).
  • Generation is async-only through BullMQ (queue: tts) + PM2 worker (workers/ttsWorker.js).
  • Standard storage path: tts/<lang>/<voicePreset>/<text_hash>.mp3 in the Supabase Storage bucket tts.
  • No frontend SERVICE_ROLE usage; all storage writes remain backend-only.