Hourly ·
Textbook Compliance Was 10%. An AI Tutor Hit 90% — and Test Scores Rose With It.
A Dartmouth statistics course replaced optional readings with an AI-graded quiz platform called Phosphor. An estimated 10–15% of students had been doing the readings; about 90% voluntarily used the platform, and the heaviest users scored up to 1.30 standard deviations higher on the final exam.
When Dartmouth College deployed an optional AI tutoring platform called Phosphor in its Introductory Statistics course, the results upended expectations. Textbook reading compliance had been estimated at 10–15%; by contrast, 90.2% of the 151 enrolled students voluntarily adopted the platform.
The platform, which mixed AI-graded written-response quizzes with multiple-choice questions and was evaluated at this week's Intelligent Textbooks 2026 workshop, came with measurable gains. Students who fully engaged with all 24 lessons and three cumulative Module Reviews scored 14.7 points higher on the final exam, an effect size of 1.30 standard deviations. After controlling for prior midterm performance, the advantage shrank to 0.71 SD, or roughly 8 points on a 100-point scale.
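The point and standard-deviation figures above can be cross-checked with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes the same exam standard deviation underlies both the unadjusted and the midterm-adjusted effect sizes, which the article does not state.

```python
def implied_sd(raw_gain_points: float, effect_size_sd: float) -> float:
    """Exam standard deviation (in points) implied by a raw gain and its effect size."""
    return raw_gain_points / effect_size_sd


def effect_to_points(effect_size_sd: float, sd_points: float) -> float:
    """Convert an effect size in SD units back into exam points."""
    return effect_size_sd * sd_points


# Figures reported in the article.
raw_gain = 14.7           # points on the final exam
unadjusted_effect = 1.30  # standard deviations
adjusted_effect = 0.71    # standard deviations, after controlling for the midterm

# Assumption: one shared SD for both effect sizes.
exam_sd = implied_sd(raw_gain, unadjusted_effect)
adjusted_points = effect_to_points(adjusted_effect, exam_sd)

print(f"Implied exam SD: {exam_sd:.1f} points")
print(f"Adjusted advantage: {adjusted_points:.1f} points")
```

Under that assumption the implied exam standard deviation is about 11.3 points, and 0.71 SD works out to about 8 points, matching the article's "roughly 8 points" figure.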
But the learning mechanism was specific. When the course temporarily switched to multiple-choice-only quizzes for Module 2, the dose-response relationship vanished: completing more lessons predicted no additional gain. Constructed-response questions, graded by Claude Sonnet 4.6 against instructor rubrics, were the active ingredient. Students who passed all three cumulative Module Reviews — requiring written answers — saw the single largest effect: 7.1 points on the final exam (d = 0.66).
A built-in RAG chat assistant was almost entirely ignored — 72 total queries across the entire semester, with only 14 students using it more than once. Students told researchers that general-purpose LLMs were faster and more capable for their questions, and that the course content itself was sufficient without a separate chat interface.
The study is non-randomized, and author Jonah Bard flags self-selection as the central threat to causal interpretation. But the temporary switch from constructed-response questions (CRQ) to multiple-choice questions (MCQ) within the same course works as a natural experiment and provides a cleaner signal: the format of assessment, not just the act of engaging, drove outcomes.
Bard positions the findings against prior research showing that unrestricted GPT-4 access lowered student test performance by 17% once the tool was removed. Phosphor suggests a different path: embed AI inside structured, rubric-graded formative assessment, and students will not only show up but also learn.
Sources: Phosphor: Balancing Efficacy and Engagement in Interactive Texts (Bard, 2026)
Published: 2026-07-05 20:00 UTC
Image: RightBrainPhotography/ EV1A014_(1).jpg
Content on Anagnorisis is summarized, paraphrased, and editorialized from publicly available sources for length and clarity. Original sources are linked where available. All trademarks belong to their respective owners.

