Using AI as an exam tutor

Artificial intelligence is set to help human examinees study more effectively

09-May-2024

At the WestAI service centre, research institutions from North Rhine-Westphalia (NRW) support companies in developing innovative AI applications. A cooperation between Forschungszentrum Jülich and the publishing house u-form Verlag now illustrates what such a collaboration can look like.

Last year, artificial intelligence (AI) successfully passed various exams, including the US Medical Licensing Examination. Now AI is set to help human examinees study more effectively. With the help of experts from Forschungszentrum Jülich, u-form Verlag aims to develop an AI tutor for exam preparation. As part of a joint research project with the AI service centre WestAI, various language models will be tested and integrated into the publisher’s training programmes.

“Part of our collaboration will involve comparing the suitability of large commercial models such as GPT-4 and Claude 2 with open source models, the use of which is attractive for companies for reasons of data protection, for example,” explains Fritz Niesel, AI researcher and consultant at the Jülich Supercomputing Centre.

A first use case is the Chamber of Industry and Commerce’s (IHK) final examination for commercial trainees. The publisher, which specializes in exam preparation, offers corresponding training programmes. The AI is intended to evaluate free-text answers automatically and point out errors to learners.

One test question asks, for example, how a “supply monopoly” and a “polypoly” differ from each other. Until now, trainees could only check their answers against a sample solution on their own. In future, the AI will also give them direct feedback on the accuracy and completeness of their answers.

“The evaluation with Claude 2 by Anthropic works quite well in principle,” says Fritz Niesel. In some respects, however, the AI faces the same problems as a human examiner, since freely formulated answers are sometimes ambiguous or imprecise. They cannot always be clearly classified according to an evaluation scheme. For example, the German term “Nachfrager” in the example answer shown in the image could mean either several buyers or just a single buyer. The points awarded are therefore to a certain extent a matter of interpretation.

“Many open source models are currently much weaker in comparison. They still have major deficits, especially in German, and most models were primarily trained in English,” says Fritz Niesel. However, the fact that users can set up their own applications and run them on their own hardware speaks in favour of open source. This way, valuable corporate knowledge can be protected from unauthorized distribution.

Reply from Mixtral-8x7B: "The answer is partly correct, but not detailed enough. An example is given for supply monopoly and polypoly, but the number of suppliers and buyers is not clearly stated. However, a correct description and a correct example are used for supply monopoly, which is why 2 points are awarded. Although a correct definition is used for polypoly, an example and the number of buyers are missing, which is why only 1 point is awarded. Another point is awarded for the correct use of a term."

In the above test case, Mixtral-8x7B comes to the same result as Claude 2. The Mixtral-8x7B language model is considered by many to be the best open source model currently available. However, the explanation given by Mixtral-8x7B is inferior to that of Claude 2, with parts of it being simply wrong or unclear. For instance, no example of a polypoly was given in the answer – in contrast to what was stated in the explanation. Mixtral-8x7B also awards one point “for the correct use of a term”. But what term does this refer to?
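A point awarded “for the correct use of a term” without naming the term is the kind of slip that could, in principle, be caught automatically if the model's grading were returned in structured form. The following Python sketch illustrates the idea under stated assumptions: the `Award` record, the `RUBRIC` entries and the `check_awards` helper are hypothetical and are not taken from the project or from the publisher's actual evaluation scheme.

```python
from dataclasses import dataclass


@dataclass
class Award:
    criterion: str  # rubric item the points are awarded for
    points: int
    reason: str


# Hypothetical rubric: criterion -> maximum points (illustrative only).
RUBRIC = {
    "supply monopoly: definition": 1,
    "supply monopoly: example": 1,
    "polypoly: definition": 1,
    "polypoly: example": 1,
}


def check_awards(awards, rubric):
    """Return a list of problems found in a model's point awards."""
    problems = []
    seen = set()
    for a in awards:
        if a.criterion not in rubric:
            problems.append(f"unknown criterion: {a.criterion!r}")
            continue
        if a.criterion in seen:
            problems.append(f"criterion scored twice: {a.criterion!r}")
        seen.add(a.criterion)
        if not 0 <= a.points <= rubric[a.criterion]:
            problems.append(f"points out of range for {a.criterion!r}")
    return problems


# Awards modelled loosely on the Mixtral-8x7B reply quoted above.
awards = [
    Award("supply monopoly: definition", 1, "correct description"),
    Award("supply monopoly: example", 1, "correct example"),
    Award("polypoly: definition", 1, "correct definition"),
    Award("correct use of a term", 1, "term not named"),
]
problems = check_awards(awards, RUBRIC)
total = sum(a.points for a in awards)
```

In this sketch the fourth award is flagged because it does not map to any rubric item, even though the total of four points looks plausible on its own.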

“This is a typical weakness that can be improved by means of prompt engineering – i.e. specially customized input for the AI – and some fine-tuning,” says Fritz Niesel. In the project, the partners now plan to investigate further whether open source models can serve as exam tutors as effectively as the more powerful commercial AIs.
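What such prompt engineering might look like can be sketched in a few lines of Python. The example below is an illustration under assumptions, not the project's actual prompt: the template wording, the rubric, the sample answer and the helper names `build_prompt` and `unsupported_awards` are all hypothetical. The prompt restricts the model to a fixed list of criteria and asks it to quote the passage that justifies each point, so that the reply can be checked against the trainee's text.

```python
import json

# Hypothetical prompt template; the wording is illustrative only.
PROMPT_TEMPLATE = """You are grading a trainee's free-text exam answer.

Question:
{question}

Scoring criteria (award points only for these):
{criteria}

Trainee answer:
{answer}

Reply with JSON only: a list of objects with the keys "criterion",
"points" and "quote". "quote" must be the exact passage of the trainee
answer that justifies the points.
"""


def build_prompt(question, rubric, answer):
    """Fill the template with the question, rubric items and answer."""
    criteria = "\n".join(
        f"- {name}: up to {pts} point(s)" for name, pts in rubric.items()
    )
    return PROMPT_TEMPLATE.format(
        question=question, criteria=criteria, answer=answer
    )


def unsupported_awards(reply_text, answer):
    """Return awarded criteria whose quoted evidence is not in the answer."""
    awards = json.loads(reply_text)
    return [
        a["criterion"]
        for a in awards
        if a["points"] > 0 and (not a["quote"] or a["quote"] not in answer)
    ]


QUESTION = "How do a supply monopoly and a polypoly differ?"
RUBRIC = {"polypoly: definition": 1, "polypoly: example": 1}
ANSWER = "In a polypoly there are many suppliers and many buyers."
prompt = build_prompt(QUESTION, RUBRIC, ANSWER)

# A made-up model reply that credits an example the answer never gave.
reply = json.dumps([
    {"criterion": "polypoly: definition", "points": 1,
     "quote": "many suppliers and many buyers"},
    {"criterion": "polypoly: example", "points": 1,
     "quote": "e.g. a weekly market"},
])
flagged = unsupported_awards(reply, ANSWER)
```

Here `flagged` contains the polypoly example, since the quoted passage does not occur in the trainee's answer. A check of this kind only catches missing evidence; it cannot resolve genuinely ambiguous wording such as the “Nachfrager” case.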

WestAI for innovative use of AI

The cooperation was made possible by the AI service centre WestAI – one of four AI service centres funded by the Federal Ministry of Education and Research to drive forward AI research and its transfer into practical applications in Germany. The main focus is on collaborating with start-ups and small and medium-sized enterprises.

WestAI provides stakeholders from industry and science with access to AI models and high-performance AI computing infrastructures. The WestAI partners contribute their respective expertise in a targeted manner to support companies in using state-of-the-art AI technologies, putting innovative ideas into practice, and opening up new fields of application. The focus is on launching novel multimodal AI models and generative AI models.

https://www.bionity.com/en/news/1183412/using-ai-as-an-exam-tutor.html