Spoke Plus AI Provider Integration Plan
1. Current State
Spoke Plus already includes AI integration hooks in the backend services and the content creation flow.
Implemented placeholder hooks:
- `generateImage(content)`
- `generateAudio(content)`
- `autoTranslate(content)`
Current behavior:
- Hooks return mock provider payloads.
- The content creation path can call these hooks when `ai_generated` is enabled.
- Generated outputs are persisted into the linguistic content bank (`vocabulary`, `word_assets`, and related layers).
This provides a stable contract for replacing mock providers with production AI vendors.
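The placeholder hooks above can be sketched as a stable contract. The `AiHookResult` shape and the mock URLs below are illustrative assumptions, not the actual service code:

```typescript
// Assumed result shape for all three hooks; real field names may differ.
type AiHookResult = {
  ok: boolean;
  assetUrl?: string;              // persisted asset location, if any
  provider: string;               // "mock" while placeholders are active
  meta?: Record<string, unknown>; // structured provider metadata
};

// Mock implementations mirroring the current placeholder behavior.
async function generateImage(content: string): Promise<AiHookResult> {
  return { ok: true, provider: "mock", assetUrl: `mock://image/${encodeURIComponent(content)}` };
}

async function generateAudio(content: string): Promise<AiHookResult> {
  return { ok: true, provider: "mock", assetUrl: `mock://audio/${encodeURIComponent(content)}` };
}

async function autoTranslate(content: string): Promise<AiHookResult> {
  return { ok: true, provider: "mock", meta: { translation: content } };
}
```

Because callers depend only on `AiHookResult`, a production vendor can replace each mock body without touching the content creation path.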
2. Future Integration Strategy
2.1 Provider Categories
- LLM/translation provider: OpenAI or equivalent for translation, tagging support, and text transformations.
- Image generation provider: selected per style and quality/cost profile.
- TTS provider: neural TTS provider for per-language speech generation.
2.2 Provider Abstraction
AI hooks should remain provider-agnostic contracts while implementation modules are swapped underneath by environment configuration.
Target characteristics:
- Stable response schema to callers.
- Centralized timeout/retry policy.
- Structured provider metadata for observability.
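A minimal sketch of environment-driven provider swapping, assuming a hypothetical `TRANSLATE_PROVIDER` environment variable and `TranslateProvider` interface (the real module layout may differ):

```typescript
// Provider-agnostic contract exposed to callers; the response schema stays stable.
interface TranslateProvider {
  translate(text: string, sourceLang: string, targetLang: string): Promise<{ text: string; provider: string }>;
}

// Mock implementation, as used in dev/test.
const mockProvider: TranslateProvider = {
  async translate(text, _sourceLang, targetLang) {
    return { text: `[${targetLang}] ${text}`, provider: "mock" };
  },
};

// Registry of implementation modules; production vendors register here.
const registry: Record<string, TranslateProvider> = { mock: mockProvider };

// Selection happens once, from environment configuration (variable name is an assumption).
function resolveProvider(): TranslateProvider {
  const name = process.env.TRANSLATE_PROVIDER ?? "mock";
  const provider = registry[name];
  if (!provider) throw new Error(`Unknown translate provider: ${name}`);
  return provider;
}
```

Centralized timeout/retry policy and observability metadata would wrap `resolveProvider()` rather than live in each vendor module.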
2.3 Voice and Image Presets
Planned preset governance:
- Voice style presets: e.g., neutral, child-friendly, classroom, slow articulation.
- Image style presets: e.g., flat illustration, realistic, iconographic, classroom-safe visual style.
Presets map to provider-specific options while preserving a product-level style taxonomy.
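One way the preset-to-provider mapping could look; the `speakingRate`/`pitch` option names and their values are hypothetical stand-ins for provider-specific settings:

```typescript
// Product-level style taxonomy for voice, per the planned presets.
type VoicePreset = "neutral" | "child-friendly" | "classroom" | "slow-articulation";

// Illustrative provider option shapes; real TTS vendors expose different knobs.
const voicePresetOptions: Record<VoicePreset, { speakingRate: number; pitch: number }> = {
  neutral:             { speakingRate: 1.0,  pitch: 0 },
  "child-friendly":    { speakingRate: 0.9,  pitch: 2 },
  classroom:           { speakingRate: 0.95, pitch: 0 },
  "slow-articulation": { speakingRate: 0.75, pitch: 0 },
};
```

Keeping the taxonomy on the left side of this map lets the product swap TTS vendors without renaming presets in authored content.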
3. AI Content Workflow
Planned authoring workflow for assisted content production:
- Author enters source word/sentence.
- Auto-translate to target language.
- Auto-generate tags and metadata suggestions.
- Generate image asset.
- Generate audio asset.
- Persist the final linguistic record into `vocabulary` + linked enrichment tables (`word_*`, `sentence_*`).
This flow aligns with the existing content schema and current API write path.
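The steps above can be sketched as a single pipeline; every stage here is a stub standing in for the real hooks and persistence layer:

```typescript
// Hypothetical end-to-end authoring pipeline; stage bodies are mocks, not the API write path.
async function produceRecord(sourceText: string, targetLang: string) {
  const translation = `[${targetLang}] ${sourceText}`; // stand-in for autoTranslate
  const tags = ["auto:suggested"];                     // stand-in for tag/metadata suggestion
  const imageUrl = `mock://image/${sourceText}`;       // stand-in for generateImage
  const audioUrl = `mock://audio/${sourceText}`;       // stand-in for generateAudio
  // Final step would persist into vocabulary + linked word_* / sentence_* tables.
  return { sourceText, translation, tags, imageUrl, audioUrl };
}
```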
4. Multi-Language Expansion Strategy
Design principle:
- Keep the linguistic model language-explicit (`source_lang`, `target_lang` where applicable) and reusable.
- Reuse content structures across courses while allowing language-specific asset generation.
- Support controlled localization via translation hooks and curated tag taxonomies.
5. Cost Control Strategy
Planned controls:
- Generation toggles per operation (translate/image/audio).
- Per-provider quotas and budget thresholds.
- Async/batch generation modes for non-blocking back-office pipelines.
- Manual approval checkpoints before expensive generation at scale.
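A budget gate along these lines could back the quota and threshold controls; the operation names and cent-based limits are assumptions for illustration:

```typescript
// Per-operation budget tracking (limits and granularity are illustrative).
type Budget = { limitCents: number; spentCents: number };

const budgets: Record<string, Budget> = {
  image: { limitCents: 10_000, spentCents: 0 },
  audio: { limitCents: 5_000, spentCents: 0 },
};

// Returns false when the operation is unknown or would exceed its budget,
// allowing the caller to skip generation or queue it for manual approval.
function tryCharge(op: string, costCents: number): boolean {
  const budget = budgets[op];
  if (!budget || budget.spentCents + costCents > budget.limitCents) return false;
  budget.spentCents += costCents;
  return true;
}
```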
6. Caching Strategy
Planned caching layers:
- Deterministic keying by `(source_text, source_lang, target_lang, style preset)`.
- Reuse existing generated assets when matching cache keys exist.
- Persist provider response metadata for traceability and cache diagnostics.
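The deterministic keying could be implemented as a SHA-256 digest over the tuple; the NUL delimiter below is an assumption to avoid ambiguous concatenations (e.g. `("ab","c")` vs `("a","bc")`):

```typescript
import { createHash } from "node:crypto";

// Deterministic cache key over (source_text, source_lang, target_lang, style preset).
// Delimiter choice is an assumption; the doc specifies only the tuple.
function cacheKey(sourceText: string, sourceLang: string, targetLang: string, stylePreset: string): string {
  return createHash("sha256")
    .update([sourceText, sourceLang, targetLang, stylePreset].join("\u0000"))
    .digest("hex");
}
```

Equal inputs always produce the same key, so an existing asset can be looked up before calling any provider.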
7. Failover Strategy
Planned resilience model:
- Provider priority chain (primary → secondary fallback).
- Graceful degradation per asset type (text-only save if media providers fail).
- Error isolation so one failed modality does not block full content persistence.
- Explicit provider failure logging surfaced through system monitoring endpoints.
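The priority chain with error isolation might look like this sketch (the `Generator` signature is illustrative):

```typescript
// One generator per provider in priority order (primary first).
type Generator = (input: string) => Promise<string>;

// Walks the chain; a failed provider is logged and skipped rather than
// aborting the whole save. Returning null lets the caller degrade
// gracefully, e.g. persist text-only content when media providers fail.
async function withFallback(input: string, chain: Generator[]): Promise<string | null> {
  for (const generate of chain) {
    try {
      return await generate(input);
    } catch (err) {
      console.error("provider failed:", (err as Error).message);
    }
  }
  return null; // all providers failed for this modality
}
```

Running one chain per asset type keeps a failed image modality from blocking audio or text persistence.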
Update: TTS Layer Now Real (Provider-Agnostic)
- The backend now uses a provider abstraction in `services/tts/providers/*`. `TTS_PROVIDER` selects the provider at runtime; the fallback stays `mock` in dev/test.
- Deterministic cache key: `sha256(text + lang + voicePreset + provider)`.
- Generation is async-only through BullMQ (queue: `tts`) plus a PM2 worker (`workers/ttsWorker.js`).
- Standard storage persistence path: `tts/<lang>/<voicePreset>/<text_hash>.mp3` in Supabase Storage bucket `tts`.
- No frontend SERVICE_ROLE usage; all storage writes remain backend-only.
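Given the conventions above, the cache key and storage path can be derived as follows. This is a sketch mirroring the documented `sha256(text + lang + voicePreset + provider)` key and path standard, not the actual `services/tts` code; in particular, treating `<text_hash>` as the full cache key is an assumption:

```typescript
import { createHash } from "node:crypto";

// Deterministic cache key, following the documented concatenation order.
function ttsCacheKey(text: string, lang: string, voicePreset: string, provider: string): string {
  return createHash("sha256").update(text + lang + voicePreset + provider).digest("hex");
}

// Storage path in the Supabase "tts" bucket; assumes <text_hash> is the cache key.
function ttsStoragePath(text: string, lang: string, voicePreset: string, provider: string): string {
  return `tts/${lang}/${voicePreset}/${ttsCacheKey(text, lang, voicePreset, provider)}.mp3`;
}
```

The BullMQ worker would compute this path before generating, so an already-cached MP3 short-circuits the provider call.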