Spoke Plus AI Provider Integration Plan

1. Current State

Spoke Plus already includes AI integration hooks in its backend services and content creation flow.

Implemented placeholder hooks:

  • generateImage(content)
  • generateAudio(content)
  • autoTranslate(content)

Current behavior:

  • Hooks return mock provider payloads.
  • Content creation path can call these hooks when ai_generated is enabled.
  • Generated outputs are persisted into the linguistic content bank (vocabulary/word_assets/related layers).

This provides a stable contract for replacing mock providers with production AI vendors.
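The placeholder behavior above can be pictured as each hook returning a mock payload with a stable shape. A minimal sketch (field names are illustrative, not the actual Spoke Plus schema):

```javascript
// Placeholder hooks: stable shape, mock provider, no real generation.
async function generateImage(content) {
  return { provider: 'mock', asset_type: 'image', url: null, source: content };
}

async function generateAudio(content) {
  return { provider: 'mock', asset_type: 'audio', url: null, source: content };
}

async function autoTranslate(content) {
  return { provider: 'mock', asset_type: 'translation', text: content };
}
```

Because callers only depend on this shape, a production vendor can be swapped in behind each hook without touching the content creation path.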


2. Future Integration Strategy

2.1 Provider Categories

  • LLM/translation provider: OpenAI or equivalent for translation, tagging support, and text transformations.
  • Image generation provider: provider selected per style and quality/cost profile.
  • TTS provider: neural TTS provider for per-language speech generation.

2.2 Provider Abstraction

AI hooks should remain provider-agnostic contracts, with implementation modules swapped underneath via environment configuration.

Target characteristics:

  • Stable response schema to callers.
  • Centralized timeout/retry policy.
  • Structured provider metadata for observability.
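The three target characteristics can be sketched together: env-driven provider selection, a shared retry policy, and a stable response schema enriched with metadata. Provider names, the `IMAGE_PROVIDER` variable, and all field names here are assumptions for illustration, not the actual Spoke Plus modules:

```javascript
// Registry of interchangeable providers; the mock is the only real entry here.
const providers = {
  mock: {
    async generateImage(content) {
      return { provider: 'mock', url: `mock://image/${encodeURIComponent(content)}` };
    },
  },
  // Real vendor modules would register here under their own keys.
};

// Env-based selection (IMAGE_PROVIDER is a hypothetical variable name).
function selectImageProvider(name = process.env.IMAGE_PROVIDER || 'mock') {
  const provider = providers[name];
  if (!provider) throw new Error(`Unknown image provider: ${name}`);
  return provider;
}

// Centralized retry policy shared by every hook.
async function withRetry(fn, { attempts = 3, baseDelayMs = 200 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * attempt));
    }
  }
  throw lastError;
}

// The hook exposes one stable schema to callers regardless of vendor.
async function generateImage(content) {
  const provider = selectImageProvider();
  const result = await withRetry(() => provider.generateImage(content));
  return { ...result, requestedAt: new Date().toISOString() };
}
```

Keeping retries and timeouts in one wrapper means a vendor swap never changes error behavior seen by the content creation path.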

2.3 Voice and Image Presets

Planned preset governance:

  • Voice style presets: e.g., neutral, child-friendly, classroom, slow articulation.
  • Image style presets: e.g., flat illustration, realistic, iconographic, classroom-safe visual style.

Presets map to provider-specific options while preserving a product-level style taxonomy.
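One way to sketch that mapping: a product-level preset table whose values are the provider-specific options. Preset keys follow the examples above; the option fields (`speakingRate`, `pitch`) are assumptions, not a confirmed vendor API:

```javascript
// Product-level voice presets mapped to hypothetical provider options.
const voicePresets = {
  neutral:             { speakingRate: 1.0, pitch: 0 },
  'child-friendly':    { speakingRate: 0.95, pitch: 2 },
  classroom:           { speakingRate: 0.9, pitch: 0 },
  'slow-articulation': { speakingRate: 0.75, pitch: 0 },
};

function resolveVoicePreset(name) {
  const options = voicePresets[name];
  if (!options) throw new Error(`Unknown voice preset: ${name}`);
  return options;
}
```

Swapping TTS vendors then only requires rewriting the option values, while the preset names stay stable in the product taxonomy.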


3. AI Content Workflow

Planned authoring workflow for assisted content production:

  1. Author enters source word/sentence.
  2. Auto-translate to target language.
  3. Auto-generate tags and metadata suggestions.
  4. Generate image asset.
  5. Generate audio asset.
  6. Persist final linguistic record into vocabulary + linked enrichment tables (word_*, sentence_*).

This flow aligns with the existing content schema and current API write path.
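The six steps above can be sketched as one pipeline function. The hook names come from the plan; their bodies here are mocks, and the tag step is stubbed out (a real LLM call would fill it):

```javascript
// Mock implementations of the planned hooks.
async function autoTranslate(text, targetLang) {
  return `[${targetLang}] ${text}`; // mock translation
}
async function suggestTags(text) {
  return []; // mock: an LLM would return tag/metadata suggestions
}
async function generateImage(text) {
  return { url: `mock://image/${encodeURIComponent(text)}` };
}
async function generateAudio(text) {
  return { url: `mock://audio/${encodeURIComponent(text)}` };
}

// Steps 1-6 of the assisted authoring workflow.
async function createAiAssistedEntry({ sourceText, sourceLang, targetLang }) {
  const translation = await autoTranslate(sourceText, targetLang); // step 2
  const tags = await suggestTags(sourceText);                      // step 3
  const image = await generateImage(sourceText);                   // step 4
  const audio = await generateAudio(translation);                  // step 5
  // Step 6: persistence into vocabulary + word_*/sentence_* tables
  // happens in the existing API write path, not sketched here.
  return { sourceText, sourceLang, targetLang, translation, tags, image, audio };
}
```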


4. Multi-Language Expansion Strategy

Design principle:

  • Keep the linguistic model language-explicit (source_lang, target_lang where applicable) and reusable.
  • Reuse content structures across courses while allowing language-specific asset generation.
  • Support controlled localization via translation hooks and curated tag taxonomies.
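A language-explicit record makes this reuse concrete. Only source_lang and target_lang come from the plan; the other fields below are placeholders, not the actual schema:

```javascript
// Illustrative language-explicit record; structure is reusable across courses.
const entry = {
  source_lang: 'en',
  target_lang: 'es',
  source_text: 'apple',
  target_text: 'manzana',
  tags: ['fruit'],
};

// Because the language pair is explicit on the record, asset generation
// can stay language-specific without changing the content structure.
function assetRequestFor(record, modality) {
  return { modality, lang: record.target_lang, text: record.target_text };
}
```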

5. Cost Control Strategy

Planned controls:

  • Generation toggles per operation (translate/image/audio).
  • Per-provider quotas and budget thresholds.
  • Async/batch generation modes for non-blocking back-office pipelines.
  • Manual approval checkpoints before expensive generation at scale.
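Per-operation quotas could be guarded with a check-and-consume gate like the sketch below. The limits are placeholder numbers and the state is in-memory; a real implementation would read budgets from config and track usage in a database:

```javascript
// Placeholder budgets per operation (units of generation calls).
const budgets = { translate: 1000, image: 200, audio: 500 };
const used = { translate: 0, image: 0, audio: 0 };

// Returns false when the budget would be exceeded; the caller can then
// skip generation or route the request to a manual approval queue.
function checkAndConsume(op, units = 1) {
  if (used[op] + units > budgets[op]) return false;
  used[op] += units;
  return true;
}
```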

6. Caching Strategy

Planned caching layers:

  • Deterministic keying by (source_text, source_lang, target_lang, style preset).
  • Reuse existing generated assets when matching cache keys exist.
  • Persist provider response metadata for traceability and cache diagnostics.

7. Failover Strategy

Planned resilience model:

  • Provider priority chain (primary → secondary fallback).
  • Graceful degradation per asset type (text-only save if media providers fail).
  • Error isolation so one failed modality does not block full content persistence.
  • Explicit provider failure logging surfaced through system monitoring endpoints.
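The priority chain and graceful degradation can be sketched together. The provider objects here are illustrative; the key behavior is that a total failure returns null so a text-only save can still proceed:

```javascript
// Try each provider in priority order; log failures; return null only
// if the whole chain fails, so one failed modality never blocks persistence.
async function generateWithFailover(providerChain, content) {
  for (const provider of providerChain) {
    try {
      return await provider.generate(content);
    } catch (err) {
      // In production this would be surfaced via monitoring endpoints.
      console.error(`provider ${provider.name} failed: ${err.message}`);
    }
  }
  return null; // all providers failed; caller persists a text-only record
}
```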

Update: TTS layer is now real (provider-agnostic)

  • Backend now uses a provider abstraction in services/tts/providers/*.
  • TTS_PROVIDER selects the provider at runtime; the fallback remains the mock provider in dev/test.
  • Deterministic cache key: sha256(text + lang + voicePreset + provider).
  • Generation is async-only through BullMQ (queue: tts) + PM2 worker (workers/ttsWorker.js).
  • Standard storage path: tts/<lang>/<voicePreset>/<text_hash>.mp3 in the Supabase Storage bucket tts.
  • No frontend SERVICE_ROLE usage; all storage writes remain backend-only.