Over six weeks in 2024, about 800 senior secondary students in Edo State sat in computer labs twice a week and learned English with the help of Microsoft Copilot.
By the end, their test scores had improved by close to two years of learning compared with classmates who were not in the program.
Girls, who started behind the boys, caught up.
The World Bank ran the experiment as a randomized controlled trial and treated it as one of the first real tests of generative AI as a tutor in a low-resource setting. A few years ago, that result would have sounded impossible. It is now one of the most cited education studies to come out of Africa.
Most of the coverage has fixed on the numbers. The numbers earn attention. But they are not the lesson. The lesson is in how the thing was done.
Copilot was not built to teach. It was built to draft emails and, in plenty of classrooms, to let students hand in essays they did not write. The same tool produced two years of learning in Edo State for one reason. The people running the pilot did not point it at answers. They pointed it at thinking. Teachers opened each session with the topic and a starting prompt, stayed in the room to mentor and add prompts as students worked, and closed with a short reflection. The World Bank called the teachers “orchestra conductors.” The AI was one section of the orchestra. It was not the conductor, and it was not the score.
That distinction is the whole argument, and we keep getting it backwards. Nowhere more so than in how students are taught to write code.
The wrong scoreboard
Walk into a university computer lab today, in Lagos or anywhere else, and you will find students who are not so much writing assignments as negotiating with a chatbot until working code appears.
Ask it to explain recursion, fix a broken loop, or write a whole solution, and it will. The technology is cheap and capable, and it is not going anywhere.
The question for computer science education was never whether students would use it. The question is how we judge what happens when they do.
Right now, we judge it by the answer. Does the code compile? Does it pass the tests? Is it clean enough to submit? Those are fair questions for an engineer choosing a tool. They are the wrong questions for a teacher measuring a student. A student who submits AI-generated code they do not understand has finished an assignment. They have not finished a lesson.
The gap is not new. Education has always struggled to tell performance apart from understanding. A student can memorize a proof for an exam and never grasp the theorem behind it. What AI does is widen that gap quickly, and our usual tests cannot see across it. The distance between what a student can produce with AI and what they actually understand has never been wider.
Grade the learning
If the answer is the wrong measure, what is the right one? Learning. Whether a student can carry an idea into a problem they have not seen before, and whether they still hold it a month later. These are harder to measure than a passing test suite, and they are what a degree is supposed to certify.
Picture two students on the same assignment. The first pastes the problem into a chatbot, gets a correct solution, submits it, and remembers nothing by Friday. The second tries first, gets stuck, and asks the chatbot why their approach throws an error. They get an explanation and a question back, not a finished function. Their code carries a small inefficiency. They score lower. They learn more.
Researchers studying AI in introductory programming courses keep finding the same pattern: students who treat the model as something to argue with build stronger mental models than students who treat it as a vending machine. Edo State is the large-scale version of that finding. Guided use taught. Unguided use would not have.
Build for the classroom, not the inbox
The answer is not to ban the tools. Banning AI from programming is like banning the calculator from engineering. It protects a certain idea of effort while preparing students for a workplace that has already moved on. Professional developers use these tools every day, and using them well is now part of the job. Treating every use as cheating ignores that.
The answer is to change how we assess and what we build. Assessment should reward the journey, not only the destination: oral exams where students explain their own code, and assignments where the process is graded alongside the result. Work done with AI can still be marked on the depth of engagement rather than the polish of the product. And the tools students learn on should be built for learning.
A coding assistant made for productivity tries to finish your task. A coding assistant made for teaching tries to make you think. Some already exist. The University of Auckland built a tool called CodeHelp that meets a stuck student with guided prompts instead of finished code. That is the right instinct.
Nigeria is well placed to lead here, and well placed to drift. The National Universities Commission has issued guidance on AI in higher education, and the country published a national AI strategy in 2024.
But most campuses still have no enforceable rule on how students may use these tools, so students and lecturers are left to improvise. The risk is not that Nigerian students use AI. They already do. The risk is that we measure that use with the same broken scoreboard as everyone else.
Edo State already showed what AI looks like when it is aimed at learning instead of at answers. The tools are here. The students are using them. Our standards for judging them need to catch up.
- Daniel Emakporuena is an AI researcher and data scientist.