
Schema Snapshot (Documentation Canonical View)

  • Snapshot date: 2026-03-08
  • Scope: Content Bank + Language Learning Engine
  • Purpose: authoritative documentation aligned with implemented, taxonomy-driven architecture

Canonical Content Bank tables

  • vocabulary
  • lemma_forms
  • senses
  • sense_translations
  • sentences
  • sentence_tokens

Canonical taxonomy tables

  • taxonomy_categories
  • taxonomy_values
  • content_item_taxonomies

Taxonomy relationship:

vocabulary -> content_item_taxonomies -> taxonomy_values -> taxonomy_categories

Common category systems include:

  • parts_of_speech
  • cefr_levels
  • frequency_bands
  • semantic_domains
  • registers
  • lemma_types
  • grammar_topics
  • languages
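A minimal sketch of walking that relationship chain, using an in-memory SQLite database for illustration. The snapshot names the tables but not their columns, so the columns used here (content_item_id, taxonomy_value_id, category_id, slug) and the helper taxonomy_for are hypothetical.

```python
import sqlite3

# Hypothetical minimal columns; the snapshot documents table names only.
SCHEMA = """
CREATE TABLE vocabulary (id INTEGER PRIMARY KEY, lemma TEXT NOT NULL);
CREATE TABLE taxonomy_categories (id INTEGER PRIMARY KEY, slug TEXT UNIQUE NOT NULL);
CREATE TABLE taxonomy_values (
    id INTEGER PRIMARY KEY,
    category_id INTEGER NOT NULL REFERENCES taxonomy_categories(id),
    slug TEXT NOT NULL
);
CREATE TABLE content_item_taxonomies (
    content_item_id INTEGER NOT NULL REFERENCES vocabulary(id),
    taxonomy_value_id INTEGER NOT NULL REFERENCES taxonomy_values(id),
    PRIMARY KEY (content_item_id, taxonomy_value_id)
);
"""

def taxonomy_for(conn: sqlite3.Connection, vocabulary_id: int) -> dict[str, list[str]]:
    """Follow vocabulary -> content_item_taxonomies -> taxonomy_values -> taxonomy_categories."""
    rows = conn.execute(
        """
        SELECT c.slug, v.slug
        FROM content_item_taxonomies AS cit
        JOIN taxonomy_values AS v ON v.id = cit.taxonomy_value_id
        JOIN taxonomy_categories AS c ON c.id = v.category_id
        WHERE cit.content_item_id = ?
        ORDER BY c.slug, v.slug
        """,
        (vocabulary_id,),
    ).fetchall()
    # Group value slugs under their category slug.
    result: dict[str, list[str]] = {}
    for category, value in rows:
        result.setdefault(category, []).append(value)
    return result

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO vocabulary VALUES (1, 'run')")
conn.executemany("INSERT INTO taxonomy_categories VALUES (?, ?)",
                 [(1, "parts_of_speech"), (2, "cefr_levels")])
conn.executemany("INSERT INTO taxonomy_values VALUES (?, ?, ?)",
                 [(10, 1, "verb"), (20, 2, "A1")])
conn.executemany("INSERT INTO content_item_taxonomies VALUES (?, ?)", [(1, 10), (1, 20)])
print(taxonomy_for(conn, 1))  # {'cefr_levels': ['A1'], 'parts_of_speech': ['verb']}
```

Because a vocabulary row can carry several values per category, the sketch returns lists rather than single values.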

Chunk and composition tables

  • vocabulary_components
  • chunk_components

Chunks are stored in vocabulary with type='chunk' and components must reference valid lemmas.
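One way to enforce the chunk rule is a validation pass over component links. This sketch assumes that a "valid lemma" means an existing vocabulary row whose type is not 'chunk', and models links as hypothetical (chunk_id, component_id) pairs; the 'word' type value and the helper invalid_chunk_components are illustrative only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VocabRow:
    id: int
    lemma: str
    type: str  # only 'chunk' is documented; 'word' below is a hypothetical value

def invalid_chunk_components(
    vocabulary: dict[int, VocabRow],
    components: list[tuple[int, int]],  # hypothetical (chunk_id, component_id) pairs
) -> list[tuple[int, int]]:
    """Return links whose parent is not a chunk or whose component is not a valid lemma.

    Assumption: a valid lemma is an existing vocabulary row whose type is not 'chunk'.
    """
    bad = []
    for chunk_id, component_id in components:
        chunk = vocabulary.get(chunk_id)
        component = vocabulary.get(component_id)
        if (chunk is None or chunk.type != "chunk"
                or component is None or component.type == "chunk"):
            bad.append((chunk_id, component_id))
    return bad

vocab = {
    1: VocabRow(1, "make", "word"),
    2: VocabRow(2, "decision", "word"),
    3: VocabRow(3, "make a decision", "chunk"),
}
links = [(3, 1), (3, 2), (3, 99)]
print(invalid_chunk_components(vocab, links))  # [(3, 99)]
```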

Grammar and media extension tables

  • lemma_assets
  • lemma_grammar
  • verb_conjugations

verb_conjugations includes the columns tense_key, person_key, form, and is_irregular.
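A sketch of how application code might index these rows for lookup. The lemma_id field, the key values ("present", "1sg", "3sg"), and the helper conjugation_table are hypothetical; only tense_key, person_key, form, and is_irregular come from the documented columns.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VerbConjugation:
    lemma_id: int      # hypothetical link to vocabulary.id
    tense_key: str
    person_key: str
    form: str
    is_irregular: bool

def conjugation_table(
    rows: list[VerbConjugation],
) -> dict[tuple[int, str, str], VerbConjugation]:
    """Index rows by (lemma_id, tense_key, person_key) for constant-time lookup."""
    return {(r.lemma_id, r.tense_key, r.person_key): r for r in rows}

rows = [
    VerbConjugation(1, "present", "1sg", "am", True),
    VerbConjugation(1, "present", "3sg", "is", True),
    VerbConjugation(2, "present", "3sg", "walks", False),
]
table = conjugation_table(rows)
print(table[(1, "present", "3sg")].form)                # is
print(sorted(r.form for r in rows if r.is_irregular))  # ['am', 'is']
```

Filtering on is_irregular, as in the last line, is one way to surface forms that a drill generator should not derive by rule.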

Selected vocabulary fields (current documented schema)

  • id
  • lemma
  • lemma_normalized
  • type
  • language_id
  • base_lang
  • language_code
  • pos
  • pos_id
  • lemma_type_id
  • cefr_level
  • cefr_level_id
  • frequency_rank
  • introduced_chapter
  • introduced_unit_id
  • introduced_step
  • difficulty_score
  • editorial_status
  • created_at
  • updated_at

Progression + difficulty notes:

  • introduced_unit_id and introduced_step feed Lexical Unlock Graph progression logic.
  • difficulty_score is normalized to [0.0, 1.0] (0.0 easiest, 1.0 most difficult) and can come from taxonomy heuristics or AI suggestion.
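The two notes above can be sketched as small helpers. Min-max scaling is one assumed way to map a raw heuristic score into [0.0, 1.0]; the raw bounds are hypothetical inputs. The unlock check assumes each unit can be mapped to an ordinal position, which the snapshot does not specify (it stores introduced_unit_id, not an order).

```python
def normalize_difficulty(raw: float, lo: float, hi: float) -> float:
    """Min-max scale a raw score into [0.0, 1.0] (0.0 easiest, 1.0 most difficult)."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    scaled = (raw - lo) / (hi - lo)
    return min(1.0, max(0.0, scaled))  # clamp inputs that fall outside [lo, hi]

def is_unlocked(introduced: tuple[int, int], position: tuple[int, int]) -> bool:
    """True once the learner's (unit_order, step) reaches the lemma's introduction point.

    Assumption: unit_order is a hypothetical ordinal derived from introduced_unit_id.
    Tuples compare lexicographically: unit first, then step.
    """
    return position >= introduced

print(normalize_difficulty(3.0, 1.0, 5.0))  # 0.5
print(is_unlocked((2, 3), (2, 5)))          # True
print(is_unlocked((2, 3), (1, 9)))          # False
```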

Grammar-topic language scope

grammar_topics values are language-scoped at taxonomy level.

Examples:

  • English: present_simple, past_simple, modal_verbs
  • Portuguese: preterito_perfeito, preterito_imperfeito
  • Turkish: aorist, evidential_past
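A minimal sketch of enforcing that scope when tagging content. The language codes ("en", "pt", "tr") and the helper check_grammar_topic are assumptions; the topic keys are the examples listed above, held in memory rather than read from taxonomy_values.

```python
# Hypothetical in-memory view of language-scoped grammar_topics values.
GRAMMAR_TOPICS: dict[str, frozenset[str]] = {
    "en": frozenset({"present_simple", "past_simple", "modal_verbs"}),
    "pt": frozenset({"preterito_perfeito", "preterito_imperfeito"}),
    "tr": frozenset({"aorist", "evidential_past"}),
}

def check_grammar_topic(language_code: str, topic: str) -> None:
    """Raise if a grammar topic is not scoped to the item's language."""
    allowed = GRAMMAR_TOPICS.get(language_code, frozenset())
    if topic not in allowed:
        raise ValueError(f"grammar topic {topic!r} is not scoped to language {language_code!r}")

check_grammar_topic("tr", "aorist")  # passes silently
try:
    check_grammar_topic("en", "aorist")
except ValueError as exc:
    print(exc)  # grammar topic 'aorist' is not scoped to language 'en'
```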

Compatibility note

No documented feature is removed. Legacy entities and mirror fields remain supported for backward compatibility while taxonomy mappings remain canonical.
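A sketch of how a reader of vocabulary.cefr_level (a mirror field) might respect the canonical taxonomy mapping. The fallback policy (use the mirror only when no taxonomy mapping exists) and the drift flag are assumptions for illustration, as is the helper resolve_cefr.

```python
from typing import Optional

def resolve_cefr(
    taxonomy_value: Optional[str], mirror_value: Optional[str]
) -> tuple[Optional[str], bool]:
    """Return (canonical CEFR level, drift flag).

    The taxonomy mapping wins when present. The legacy mirror field is used only
    as a fallback (assumed policy). drift is True when both exist and disagree,
    signalling that the mirror should be resynced from the taxonomy.
    """
    if taxonomy_value is not None:
        drift = mirror_value is not None and mirror_value != taxonomy_value
        return taxonomy_value, drift
    return mirror_value, False

print(resolve_cefr("B1", "B1"))  # ('B1', False)
print(resolve_cefr("B2", "B1"))  # ('B2', True)
print(resolve_cefr(None, "A2"))  # ('A2', False)
```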

Language-engine expansion tables (implemented additive layers)

  • lemma_roles
  • lemma_semantic_classes
  • verb_object_constraints
  • modifier_constraints
  • student_lemma_progress
  • pattern_difficulty
  • lemma_frequency
  • collocation_strength
  • sentence_patterns
  • conversation_turns
  • conversation_lemmas

Audit-required table status:

  • lemma_roles: implemented and used.
  • student_lemma_progress: implemented and used.
  • lemma_frequency: implemented and used.
  • sentence_patterns: implemented and used.
  • conversation_turns: implemented and used.
  • conversation_lemmas: implemented and used.
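A sketch of an automated presence check for the audit-required tables. The snapshot does not name the database engine, so SQLite's sqlite_master catalog is used purely for illustration (a PostgreSQL deployment would query information_schema.tables instead); missing_audit_tables is a hypothetical helper, and it checks only that tables exist, not that they are used.

```python
import sqlite3

AUDIT_REQUIRED = (
    "lemma_roles", "student_lemma_progress", "lemma_frequency",
    "sentence_patterns", "conversation_turns", "conversation_lemmas",
)

def missing_audit_tables(conn: sqlite3.Connection) -> list[str]:
    """List audit-required tables absent from the connected SQLite database."""
    present = {name for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")}
    return [t for t in AUDIT_REQUIRED if t not in present]

conn = sqlite3.connect(":memory:")
for table in AUDIT_REQUIRED[:-1]:  # deliberately omit the last table
    conn.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY)")
print(missing_audit_tables(conn))  # ['conversation_lemmas']
```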