I created an AI marker for the RiPPLE queries students submitted as assessments. The research paper goes into more detail on the specifics of this project: why it was made, the difficulties faced, and future work. It also covers additional information related to the course's plan to push students to utilize AI in the classroom.
What I did
I was given CSV exports of the students' assignment submissions (Post Creation & Moderation). I fed each student's submission into a GenAI model that had access to the assignment rubric, along with further information on the course and how it should grade students.
For moderation queries - assessments where a student grades a fellow peer's work - I also gave the model the peer's original submission, to ensure the feedback the student gave was relevant to the work they were assessing.
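The pipeline above can be sketched roughly as follows. This is a minimal illustration, not the actual project code: the CSV column names, the rubric text, and the sample rows are all made up for the example, and the call to the model itself is left out.

```python
import csv
import io
from typing import Optional

# Hypothetical rubric text; in the real project this came from the course materials.
RUBRIC = "Criterion 1: relevance (0-2.5). Criterion 2: depth of reflection (0-2.5)."

def build_marking_prompt(submission: str, original_post: Optional[str] = None) -> str:
    """Assemble the grading prompt for the GenAI model.

    For moderation queries, the peer's original post is included so the
    model can check the student's feedback is relevant to it.
    """
    parts = [
        "You are marking a student submission against the rubric below.",
        f"Rubric:\n{RUBRIC}",
    ]
    if original_post:
        parts.append(f"Original peer post being moderated:\n{original_post}")
    parts.append(f"Student submission:\n{submission}")
    parts.append("Return a grade out of 5 and a short justification.")
    return "\n\n".join(parts)

def load_submissions(csv_text: str) -> list:
    """Parse the exported CSV of submissions (column names are assumptions)."""
    return list(csv.DictReader(io.StringIO(csv_text)))

# Example rows: one post-creation submission and one moderation submission.
sample_csv = (
    "student_id,type,submission,original_post\n"
    "s1,post,My reflection on the guest talk,\n"
    "s2,moderation,This feedback is clear and specific,The original peer post\n"
)

for row in load_submissions(sample_csv):
    prompt = build_marking_prompt(row["submission"], row["original_post"] or None)
    # `prompt` would then be sent to the model (e.g. via a messages-style API)
    # together with the system-level course information mentioned above.
```

The key design point is that moderation prompts carry both texts, so the model judges the feedback against the post it was written about rather than in isolation.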
The results of doing this can be seen below. More capable models (Sonnet) were considerably better graders than simpler ones (Haiku).
Claude Haiku (Dumb)

Haiku would essentially give full marks to every student irrespective of the quality of their submissions.
Claude Sonnet (Smart)

Sonnet follows a more predictable bell-shaped distribution, with most students spread across the 3-3.5 grade range.
Compared to the teacher's mean mark, Sonnet actually doesn't perform much better than Haiku if you only look at the difference between the means.
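To illustrate why the difference between means can be misleading here: the snippet below uses made-up marks (not the actual course data) where Haiku gives everyone full marks, Sonnet clusters around 3-3.5, and the teacher marks higher on average, so both models end up the same distance from the teacher's mean despite wildly different distributions.

```python
from statistics import mean

# Hypothetical marks out of 5 for five students; not real course data.
teacher = [4.0, 4.5, 4.0, 4.0, 4.25]  # teacher mean = 4.15
haiku   = [5.0, 5.0, 5.0, 5.0, 5.0]   # full marks for everyone
sonnet  = [3.0, 3.5, 3.5, 3.0, 3.5]   # clustered in the 3-3.5 range

haiku_gap = abs(mean(haiku) - mean(teacher))    # 0.85
sonnet_gap = abs(mean(sonnet) - mean(teacher))  # 0.85

# The gaps in means are identical, yet Haiku's marking carries no signal
# at all - which is why the full distribution, not just the mean,
# matters when judging a marker.
```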
Despite this, it should be obvious that giving full marks 100% of the time - while I'm sure the students would enjoy it - isn't the kind of AI marker we want to be encouraging, as it doesn't push students to learn, refine and improve their work.
On reflection, the assessment pieces given to students in this course were designed to be completed in a couple of hours, with a focus not on quality, but on listening to the talk, researching and reflecting.
Under these conditions, it is understandable that the AI grades students more leniently - this shows the importance of teachers being able to preview how the AI grades student work, and to tune it to fit their standards before it is given to students.
Conclusion
The limited timeframe, as well as systemic bugs in the original Excel data, meant progress on this project was slow for the short period I was working on it.
As a result, there is plenty of work that could refine this system further - such as evaluating the AI's outputs more rigorously, improving the performance of the AI, and giving further utility to students in courses that use similar systems.
The research paper goes into further specifics of what this would entail. Hopefully I'll post an update on this project as it develops down the line.