Data Science / AI Flywheel Hypothesis (Optional / Future)

Hypothesis

If LangListen helps users learn from their own content and their own errors, then (with explicit user opt-in) we can create a compounding loop:

Uploaded/selected content → generated practice → user engagement + corrections → better personalization → higher retention.

Longer term, tutor-informed data could be used to train a learner-aware speech foundation model that is better at error detection than generic AI models.

Why we believe it (evidence so far)

  • Tutors already track recurring errors and desire better tooling to do so.
  • Speech transcription models are poor at detecting errors in speech because they assume the speaker is fluent; correcting errors today relies on tutors “labeling” them.
  • Some platforms already provide basic summaries; there’s room to do it better and extend it into personalized practice.

Supporting quotes (interviews)

  • Error tracking is a real workflow:
    • “Eu tenho como um checklist… cada classe eu vou entender quantos erros sobre esse tema ele está fazendo…” — “I have it like a checklist… each class I’ll see how many mistakes on that topic they’re making…”
    • “Eu tenho muito boa memória… mas também anoto; deixo muito registrado no chat os erros recorrentes…” — “I have a very good memory… but I also write things down; I keep recurring mistakes well recorded in the chat…”
  • Product precedent (automated summaries exist):
    • “Os alunos que têm a assinatura do italki Plus têm o resumo de todas as aulas e de todos os erros…” — “Students who have the italki Plus subscription get the summary of all the lessons and all the mistakes…”
  • Constraint (current AI is generic; humans still needed):
    • “A IA ainda é muito genérica… não consegue pegar essas nuances de pronúncia; por isso que os professores são insubstituíveis, por enquanto.” — “AI is still very generic… it can’t pick up those pronunciation nuances; that’s why teachers are irreplaceable, for now.”

Tensions / counterevidence

  • This can easily become “ML theater” if it’s not tied to measurable product outcomes.
  • Privacy expectations around recordings are sensitive; this must be opt-in and value-led.

What must be true in the first 5 minutes

  • The loop starts with a simple, non-sci-fi value:
    • “This is my content, at my level, and it clearly adapts to what I struggle with.”

Metrics

  • Loop input: % users who upload/select content; % who correct/confirm errors or preferences.
  • Loop output: improvements in activation/retention for users with more feedback signals vs fewer.
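As a minimal sketch of how these loop metrics could be computed, assuming a hypothetical list of user records (the field names `uploaded_content`, `corrections`, and `retained_d30`, and the cutoff of 3 corrections, are illustrative, not an existing schema):

```python
def loop_metrics(users):
    """Compute loop-input rates and a simple loop-output comparison.

    Loop input:  share of users who upload/select content, and share who
                 correct/confirm at least one error.
    Loop output: D30 retention for users with more vs fewer feedback signals.
    """
    n = len(users)
    upload_rate = sum(u["uploaded_content"] for u in users) / n
    correction_rate = sum(u["corrections"] > 0 for u in users) / n

    # Hypothetical split: "more feedback" means 3+ corrections.
    high = [u for u in users if u["corrections"] >= 3]
    low = [u for u in users if u["corrections"] < 3]

    def retention(group):
        return sum(u["retained_d30"] for u in group) / len(group) if group else 0.0

    return {
        "upload_rate": upload_rate,
        "correction_rate": correction_rate,
        "retention_high_feedback": retention(high),
        "retention_low_feedback": retention(low),
    }
```

The point of the split is the comparison itself: if `retention_high_feedback` does not exceed `retention_low_feedback`, the loop output is not materializing.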

Fastest tests (2-week sprint)

  • Ask directly in interviews:
    • “Would you opt in to saving your corrections to improve personalization for you?”
    • “What would make this feel safe and worth it?”
  • Start with “personal flywheel” (improves your experience) before claiming a global model advantage.

Decision rule (double down / deprioritize)

  • Double down if users willingly provide lightweight feedback signals and those signals correlate with higher engagement/retention.
  • Deprioritize if users don’t want to provide feedback, or if product outcomes don’t improve.