
Fine-Tuning Gemini Pro for Essay Feedback

3/25/2024

As the name “GPT” indicates, generative models have been “pre-trained” on large data sets, allowing them to learn general patterns in human language and pick up knowledge and skills in the process. Accessible only to experts a year ago, post-training is now easy to implement through plug-and-play platforms, promising to specialize LLMs in complex and specific tasks such as essay feedback.

💬 Models like ChatGPT have also been “fine-tuned” on additional inputs and outputs to develop further capacities, such as engaging in question-and-answer interactions with users.

🔧 Most GenAI tools (such as the ones provided by Magic School and the like) are, at present, “wrappers” running a meta-prompt on these models and instructing them to behave a certain way.
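To make the “wrapper” idea concrete, here is a minimal sketch in Python: the tool contributes no model of its own, it simply wraps the student’s work in a fixed meta-prompt before sending it to a general-purpose LLM. All names and the meta-prompt wording below are illustrative, not any particular tool’s actual API.

```python
# A feedback "wrapper" in miniature: a fixed meta-prompt plus the user's input.
# The prompt text and function names are made up for illustration.

META_PROMPT = (
    "You are a supportive writing teacher. Give feedback on the essay below, "
    "commenting on thesis clarity, evidence, and structure.\n\nEssay:\n{essay}"
)

def build_feedback_prompt(essay: str) -> str:
    """Wrap the student's essay in the tool's meta-prompt before the LLM call."""
    return META_PROMPT.format(essay=essay)

# The resulting string is what actually gets sent to the underlying model.
prompt = build_feedback_prompt("The Industrial Revolution changed society by...")
print(prompt)
```

The point is that the underlying model is unchanged; only the instructions around the input differ from one “tool” to the next.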

📝 In the case of student feedback tools, users often have the ability to upload their own rubric and use “retrieval augmented generation” to enhance (and ground) the output.
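Retrieval augmented generation can be sketched just as simply: split the rubric into criteria, retrieve the criterion most relevant to the request, and inject it into the prompt so the output is grounded in the teacher’s own language. The word-overlap scoring below is a toy stand-in for a real embedding search, and the rubric text is invented for the example.

```python
import re

# Toy RAG sketch: ground feedback in the teacher's rubric. A production system
# would use embeddings; simple word overlap stands in for retrieval here.

RUBRIC = {
    "thesis": "Thesis: states a clear, arguable claim in the introduction.",
    "evidence": "Evidence: supports each point with cited sources.",
    "structure": "Structure: paragraphs follow a logical progression.",
}

def tokens(text: str) -> set:
    """Lowercase words only, so punctuation does not block matches."""
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve_criterion(query: str) -> str:
    """Return the rubric entry sharing the most words with the query."""
    q = tokens(query)
    return max(RUBRIC.values(), key=lambda text: len(q & tokens(text)))

def grounded_prompt(query: str, essay: str) -> str:
    """Prepend the retrieved criterion so the model's feedback stays grounded."""
    return f"Rubric criterion: {retrieve_criterion(query)}\nTask: {query}\nEssay: {essay}"

print(retrieve_criterion("Does the essay state a clear thesis claim?"))
```

Because the retrieved criterion travels with every request, the model’s comments can cite the rubric rather than improvise standards of its own.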

🧠 Another option, however, is to actually train (or fine-tune) the underlying model for this specific task. In less than a year, fine-tuning has gone from very complex and accessible only to experts to somewhat more approachable, and it can now be easily implemented through plug-and-play interfaces such as Google AI Studio.

🎥 The video below shows how any school can:
-Train Gemini 1.0 Pro on past student work and teacher feedback (and grading)
-Use the model to assess new pieces of work, and include each response (after review) as an example for subsequent comments and markings.
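The training data for such a tuned model boils down to input/output pairs: a piece of past student work in, the teacher’s feedback out. The sketch below assembles a few such pairs into a CSV, a spreadsheet-style format that tuning interfaces like AI Studio can import; the column names and sample rows are illustrative, not a guaranteed schema.

```python
import csv
import io

# Sketch of a tuning dataset as input/output pairs (the essays and comments
# below are invented examples, and the column names are assumptions).
examples = [
    ("Essay: The Treaty of Versailles caused WWII because...",
     "Clear claim, but support it with at least two specific provisions."),
    ("Essay: Photosynthesis is important for plants...",
     "Vague thesis: specify why it matters and preview your argument."),
]

# Write the pairs to an in-memory CSV, one row per training example.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["text_input", "output"])  # past student work -> teacher feedback
writer.writerows(examples)

rows = list(csv.reader(io.StringIO(buffer.getvalue())))
print(f"{len(rows) - 1} training examples prepared")
```

Each new reviewed response can be appended as another row, which is exactly the feedback loop described above.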

Note that such a fine-tuned model can also be prompted with specific instructions and equipped with a knowledge base such as a rubric (and, in the near future, an entire textbook).

To simplify:
-A student feedback “wrapper” is like a GP alerted to focus on mental health concerns and using a screening questionnaire
-A student feedback fine-tuned model is like a trained and experienced psychiatrist

👍 For this reason, I would not be surprised if this technology helped teachers not only reduce their workload (and reallocate their efforts) but also achieve greater accuracy and precision in comments and grades. Keep in mind, I graded both the French Bac Philosophy essay and the IB Extended Essay in my day, both of which are known for low inter-rater reliability.

🔄 This could be possible by:
-Collaborating to create and share high-quality datasets
-Fine-tuning models for specific types of assignments
-Testing, monitoring, and improving as needed (including for issues such as biases)
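The monitoring step can start very simply: hold out some graded work, have the tuned model grade it too, and measure how closely the two sets of grades agree. The grades below are made-up sample data; exact agreement and mean absolute error are just two easy first metrics before moving to something like inter-rater statistics.

```python
# Illustrative monitoring check: compare the fine-tuned model's grades with
# teacher grades on a held-out set (both lists are invented sample data).

teacher = [5, 6, 4, 7, 5, 6]
model   = [5, 5, 4, 7, 6, 6]

# Fraction of essays where the model matched the teacher exactly.
exact = sum(t == m for t, m in zip(teacher, model)) / len(teacher)

# Average distance between model and teacher grades.
mae = sum(abs(t - m) for t, m in zip(teacher, model)) / len(teacher)

print(f"exact agreement: {exact:.2f}, mean absolute error: {mae:.2f}")
```

Breaking these numbers down by student group would be one concrete way to watch for the biases mentioned above.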

🔍 Such benefits will obviously have to be established through experiments - this video, with its minuscule 20-item dataset, being a mere illustration of the process. But imagine what schools working together could do.

❗ The “human in the loop” principle might prevent us from using AI to *assign* grades, but even in the context of summatives, fine-tuned models could assist teachers, reduce their cognitive load, and allow them to focus on reviewing an initial output. In the case of formatives, this can be an opportunity for students to reflect critically on both their own work and its AI evaluation.