@Gorden_Sun: https://x.com/Gorden_Sun/status/2066919099016630286

X AI KOLs Following 06/16/26, 04:21 PM News

education student-performance ai-impact homework-outsourcing deepseek doubao chatglm

Summary

A long-term study of about 26,000 Chinese middle and high school students found that after students began using AI on their own, homework scores improved by 18%, but closed-book exam scores dropped by 20% within six months. Zhongkao and Gaokao scores dropped by 24% and 18% respectively, and among students who had used AI for more than five months, 81% showed a pattern of outsourcing their homework to AI.

https://t.co/zq6QuvTN5J

Original Article

Cached at: 06/17/26, 01:42 AM

Data from 26,000 Chinese students: How did their grades change after using AI?

A joint study by Stockholm University and the University of Hong Kong tracked 26,811 middle and high school students in a central Chinese county over 30 months. It aimed to answer one question: When students choose to use AI on their own, does their learning actually improve or decline?

This question is tricky because previous research almost exclusively relied on short-term experiments — one group gets AI, another doesn’t, and differences are measured after a few weeks. But in reality, students pick their own tools, decide how to use them, and effects accumulate slowly — conditions hard to simulate in a lab. The value of this paper lies precisely here: it examines real school settings over a real timescale.

Which AI tools did students use?

For context, the AI tools students used are the ones we’re familiar with: 47% used Doubao, 36% used DeepSeek, 14% used ChatGLM, and a few used ERNIE Bot and Qwen. All are general-purpose AIs that can directly provide answers — not education-specific tutoring products.

These tools are more than capable of handling homework assignments.

From September 2022 to June 2025, AI adoption in this county rose from near zero to about 80%. Math and English were the subjects where AI was used most. The launch of DeepSeek V2.5 in September 2024 and DeepSeek R1 in January 2025 each triggered a noticeable usage spike.

Where does the data come from?

Researchers obtained three types of data from the local education bureau.

The first type is monthly closed-book exam scores, covering 9 subjects.
The second type is weekly homework scores and completion times. The completion time was automatically recorded by the digital platform — the interval from when a student opened the assignment to when they submitted the answer, averaging 58 minutes.
The third type is scores on the high school entrance exam (Zhongkao) and the college entrance exam (Gaokao).

Then, through a survey in June 2025, researchers determined which month each student first used AI.

Because students started using AI at different times, researchers could use a difference-in-differences (DiD) approach for causal inference: take the change in a student's scores from before to after they started using AI, then subtract the change over the same period among students who didn't use AI. Before adoption, the grade trends of the two groups were nearly identical, and their demographic characteristics were also very similar. This supports the parallel-trends assumption that DiD depends on and makes the method fairly credible.
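The DiD logic described above can be sketched in a few lines. This is a minimal two-group, two-period version, not the staggered-adoption estimator the paper would need; the function name `did_estimate` and all scores below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from statistics import mean


def did_estimate(treated_before, treated_after, control_before, control_after):
    """Simplest 2x2 difference-in-differences estimate.

    Change among AI users minus change among non-users over the same period.
    """
    treated_change = mean(treated_after) - mean(treated_before)
    control_change = mean(control_after) - mean(control_before)
    return treated_change - control_change


# Hypothetical exam scores, for illustration only (not data from the study).
ai_before = [70, 72, 68]      # AI users, before adoption
ai_after = [60, 61, 59]       # AI users, after adoption
non_ai_before = [69, 71, 70]  # non-users, same "before" period
non_ai_after = [70, 72, 71]   # non-users, same "after" period

effect = did_estimate(ai_before, ai_after, non_ai_before, non_ai_after)
print(effect)  # AI users fell 10 points, non-users rose 1, so the estimate is -11
```

Subtracting the non-users' change removes anything that hit everyone in the same period, such as a harder exam, so what remains is attributed to AI use as long as the two groups would otherwise have moved in parallel.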

Homework and exams moved in opposite directions

After using AI, homework performance improved noticeably: scores rose by 18%, and completion time dropped from 64 minutes to 45 minutes. But closed-book exam scores fell by 20% over six months, equivalent to a 1.4 standard deviation drop.

Homework got better and better, exams got worse and worse — and by a significant margin.

The long-term effects were even more severe. High school entrance exam (Zhongkao) scores dropped by 24%, and college entrance exam (Gaokao) scores dropped by 18%. But this damage took nearly two years to fully materialize. The reason is simple: these exams test knowledge accumulated over years. A student who only started using AI in January 2025 still had what they learned in previous years intact, so the overall decline was gradual. This also means that the short-term experiments that dominate current research likely underestimate the long-term harm AI can cause to learning.

81% of longer-term AI users were having AI “ghostwrite” their homework

The most compelling part of the study is the analysis of homework completion times.

Students who didn’t use AI typically spent at least 50 minutes on homework. The more time they spent, the higher their homework and exam scores. This is normal: more input, more output. But many AI users spent less than 50 minutes on homework, faster than even the fastest non-AI student. Their homework scores were high, but their exam scores were low.

Researchers call this pattern “homework outsourcing”: students throw the problem to AI, copy the answer, and submit it. It’s essentially cheating on homework — just shifted from copying a classmate’s answers to copying AI’s answers. Among students who had used AI for more than 5 months, 81% exhibited this pattern.
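The time-based pattern above can be illustrated with a small sketch. It assumes a simplified rule (an AI user whose completion time falls below the 50-minute floor seen among non-users counts as outsourcing); the paper's actual classification may differ, and the record layout, function names, and sample values here are all hypothetical.

```python
def flag_outsourcing(records, threshold_minutes=50):
    """Return ids of AI users who finished faster than the non-AI floor."""
    return [
        r["id"]
        for r in records
        if r["uses_ai"] and r["minutes"] < threshold_minutes
    ]


def outsourcing_share(records, threshold_minutes=50):
    """Share of AI users flagged by the time threshold (0.0 if no AI users)."""
    users = [r for r in records if r["uses_ai"]]
    if not users:
        return 0.0
    return len(flag_outsourcing(records, threshold_minutes)) / len(users)


# Hypothetical homework records, for illustration only.
records = [
    {"id": "s1", "uses_ai": True, "minutes": 20},
    {"id": "s2", "uses_ai": True, "minutes": 35},
    {"id": "s3", "uses_ai": True, "minutes": 48},
    {"id": "s4", "uses_ai": True, "minutes": 62},
    {"id": "s5", "uses_ai": False, "minutes": 55},
    {"id": "s6", "uses_ai": False, "minutes": 70},
]

print(flag_outsourcing(records))   # ids of AI users under 50 minutes
print(outsourcing_share(records))  # fraction of AI users flagged
```

Using the non-users' fastest times as the cutoff is what makes the signal informative: a completion time that no unaided student reaches is hard to explain by diligence alone.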

Which students and subjects were hit hardest?

By subject: Exam scores in Politics and Geography fell by an average of 27%. STEM subjects fell by about 22%. Language subjects were the least affected — English dropped 17%, Chinese dropped 9%.

By student type: Middle school students were more affected than high school students, possibly because high schoolers face stricter academic oversight and tighter schedules. Boys were slightly more affected than girls (21.6% vs 18.4%), largely because boys spent more weekly time on AI. Most unexpectedly, the best students were hit the hardest — top students’ exam scores dropped by 24%, while bottom students dropped only 16%. AI is compressing the grade distribution, pulling the high end down.

Why didn’t anyone intervene?

Given the damage, why didn’t teachers, parents, or school administrators step in earlier?

Several reasons. Teachers typically only see their own subject. A drop of this magnitude in a single subject occasionally happens even among non-AI users (about 4% probability), so looking at one subject alone wouldn’t raise flags. Principals look at the county-wide average score. But because AI adoption was low in the first two years, its drag on the average was only 3.4%. It wasn’t until 2025, when adoption soared, that the drag expanded to nearly 10%.
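A rough way to see why a county-wide average hides the problem while adoption is low: if only a fraction of students use AI, the average falls by roughly that fraction times the per-user drop. The sketch below is a deliberate simplification with hypothetical adoption shares. It assumes every user suffers the same drop, which the study does not claim (the effect grows with time since adoption), so it is not meant to reproduce the 3.4% and 10% figures.

```python
def average_drag(adoption_share, per_user_drop):
    """Approximate fall in the overall average when only some students are affected.

    Assumes non-users are unaffected and all users share the same drop.
    """
    return adoption_share * per_user_drop


# Hypothetical adoption shares with a uniform 20% per-user drop.
for share in (0.1, 0.5):
    drag = average_drag(share, 0.20)
    print(f"adoption {share:.0%}: average falls by about {drag:.1%}")
```

With few adopters, even a large individual decline barely moves the average, which is the figure principals were watching.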

Students’ own judgments are unreliable too. There’s a classic finding in educational psychology: people tend to equate “easy to learn” with “learning well”, and “hard to learn” with “learning poorly”. Solving homework instantly with AI reinforces exactly this illusion. Students who used AI performed worse on exams, but they didn’t feel they had learned less. In fact, they felt quite good.

Is there any sign of improvement?

A little. The effect measured after five months of AI use was -25% in early 2023 but narrowed to -16% by June 2025. The same trend holds when the analysis is restricted to a fixed cohort of early adopters. So students and teachers are slowly adapting, but the pace of adaptation lags far behind the speed of AI diffusion.

Overall, the picture is not optimistic. The idea that AI could make education more equitable hasn’t materialized; instead, it has made students lazier. There is still no good solution for how education should respond to the impact of AI. The evidence is now clear — change must happen.

Original paper: https://cepr.org/publications/dp21577
