
Innovation Training (6)

Posted: 2024-05-30 19:21:55
Tags: code, students, programming, system, judgement, innovation training, error

Under the guidance of our advisor, Lu Xudong, we have summarized our existing work into a short paper, which has been submitted to CEISEE 2024.

An Online Judgement System Based on a Code-Generating Large Model

Abstract: For computer science majors in higher education institutions, programming courses are among the most important professional foundation courses. Proficiency in independent programming is of great help both to the study of subsequent courses and to students' personal development. In the teaching of programming courses, online judgement systems are often used to improve students' programming level. However, traditional online judgement systems offer little guidance, and inexperienced students often find it difficult to locate and correct errors in their code by themselves. In this article, we propose an online judgement system that integrates a code error-correction large model to help students find errors and improve their programming skills.
Keywords: online judgement system; code-generating large model; AI assistant.

Programming courses (C++ programming, Python programming, Java programming, etc.) are among the most important professional foundation courses for computer science majors in colleges and universities. Meanwhile, as computer technology penetrates various fields, programming has become a skill that many college students must master, and many non-computer majors also offer programming courses. The main teaching goal of such a course is to enable students to master a high-level programming language and to use it independently to analyse and solve problems.
However, because classes are large, teaching time is short, and students differ widely in how quickly they accept and master programming, it is often difficult for teachers to know each student's programming ability accurately. Students who were exposed to programming before the course may feel that the class content is too simple and lose interest; students encountering programming for the first time may struggle to absorb the language and to understand the knowledge presented in class. Programming courses therefore require a great deal of practice: hands-on work deepens students' understanding of the course material, improves their practical ability, and stimulates their interest and motivation. Programming exercises also differ from exercises in other courses in that there is no single standard answer, and correctness is hard to judge from static code alone. An online judgement system is therefore needed to run students' code on actual test data and deliver a judgement on its correctness, efficiency and quality.
Moreover, unlike in other subjects, the errors that occur in programming are not only varied but sometimes very subtle, which raises the barrier to entry. With many students and wide variation in their code, teachers do not have the time or energy to help every student correct errors. Students new to programming often lack both basic debugging experience and effective debugging tools, so they spend a great deal of time and energy checking and debugging code, which hinders their learning and their consolidation of knowledge. The error-checking process also lacks an effective reference, which hinders the development of correct and good coding habits.
To solve this problem, this paper proposes an online judgement system with an integrated error-correcting large model. On top of the functions of a traditional online judgement system, such as the problemset, judgement and examinations, it adds large-model technology for code error correction. The system can help students correct their code, save the time they spend on error checking and debugging, help them develop correct and good coding habits, and thereby achieve better teaching results.
1 Status of the online judgement system
The programming course is one of the most important professional foundation courses for computer science majors in many colleges and universities, and as computer technology gradually penetrates various fields, many engineering majors also offer programming courses. Programming has become one of the necessary skills for college students in the new era, and the online judgement system is an important teaching tool and platform for programming courses.
Online judgement systems, abbreviated as OJ, were first used in programming competitions. With the development of programming competitions and the opening of programming courses, universities gradually developed their own online judgement systems. For example, POJ of Peking University and ZOJ of Zhejiang University are two of the earliest online judgement systems in China; their rich, high-quality practice problems attracted a large number of programming contestants. Nowadays, the main online judgement system used by colleges and universities is PTA, whose problemset contains a large number of basic problems, so that teachers can add problems from the problemset to their own question lists and check feedback on students' performance. These features help achieve better teaching results.

In the Internet era, abundant online teaching resources are available for students to learn from and refer to, which supports the teaching of programming courses and highlights the importance of the online judgement system in that teaching. How to deliver a good programming course, and how to combine the advantages of online and offline teaching to improve the quality of online courses[1], is the focus of many scholars. Academics have conducted extensive research on the goal orientation, teaching mode, teaching evaluation, and course resource construction of programming courses in online and offline teaching[2]. Many believe that an online judgement system is an important tool for combining online courses with offline teaching and can substantially improve the teaching quality of programming courses[3]. Scholars have also investigated a blended teaching model based on the "OJ+SPOC" platform, showing that a closed-loop teaching process before, during, and after class can significantly improve students' programming and problem-solving abilities[4].
The online judgement system is an important tool in programming courses. With its help, a dry classroom can become an engaging practice ground in which students appreciate the wonders of programming through independent exploration, and a wealth of online teaching resources broadens their horizons. Teaching shifts from "paper programming", which measures only how well knowledge has been memorized, to online programming, which focuses on practice.
An online judgement system is a tool that can fully release students' exploratory abilities. However, during this exploration, students with little programming exposure and experience make many errors, and they often lack the ability to check and debug errors on their own, leaving them helpless in front of erroneous code. If the errors are relatively obvious or common, students can solve them by discussing with each other. But in a practice-oriented subject like programming, errors can be quite subtle and difficult for novices to detect. Consider, for example, the following error:

#include <cmath>
#include <iostream>
using namespace std;

double x1, x2, y1, y2;

int main() {
    cin >> x1 >> y1;
    cin >> x2 >> y2;
    cout << abs(x1 - x2) + abs(y1 - y2);
    return 0;
}

The function of this code is to read the coordinates of two points (x1, y1) and (x2, y2) and compute the Manhattan distance between them. The code is not complicated and is a good exercise for novices practising reading and handling variables in C++. But the code above does not compile. The error message is:
c:\mingw\include\math.h:273:24: note: previous declaration 'double y1(double)'
An experienced programmer can tell from this message that the global variable y1 conflicts with a declaration of y1 in the cmath library (math.h declares the Bessel functions y0, y1 and yn). For someone with plenty of debugging experience it is easy to locate this error: analyse the message and find the conflicting declaration in the library header. For a novice, however, the variable names follow the letters and subscripts used in the problem statement and conform to the C++ naming rules taught in class. Unable to understand the error message, the novice needs a lot of time to locate and solve what looks like a very simple error on their own, and many students simply give up on the problem after a few attempts.
Spending too much time checking and debugging a wrong piece of code not only wastes students' learning time but also dents their interest and confidence in programming. Because classes are large, students' coding habits and errors vary greatly, and programming problems have no standard answer, teachers find it difficult to track the progress of every student and cannot help all of them check and debug their code in a timely manner.
At the end of 2022, a new generation of AI technology, large language models represented by ChatGPT, emerged. ChatGPT can understand natural language, interact with humans, and is trained iteratively through reinforcement learning from human feedback (RLHF). Code generation is also an important area for large models: large language models pioneered by Codex have achieved impressive results in code generation, giving rise to commercial products such as GitHub Copilot and open-source code models with billions of parameters such as StarCoder and Code LLaMA[5]. Many of these models have demonstrated excellent capabilities in tasks such as code generation, completion, explanation, error correction, and unit testing. If such large models can be used in online judgement systems for programming teaching, an AI assistant based on them can help students correct their code when they make mistakes and improve the efficiency of error checking and debugging, largely solving the problems described above.
2 System Functions and Architecture
2.1 System Functions
To achieve better teaching results in programming courses, and with the scalability of the system in mind, this online judgement system provides the following functions:
User rights and basic information management: The system has three permission levels: administrator, teacher, and student. Users can register accounts, log in, and manage personal information and settings. Teachers can import student information by course and class, view students' code, issue assignments, and view data feedback on how students are doing.
Problemset Management: The system contains a problemset with a variety of programming practice problems. The problemset management function allows teachers or administrators to add, edit and delete problems, including problem descriptions, input/output samples, test results, etc. The system also supports multiple-choice, true/false and other question types to help students consolidate basic knowledge while practicing.
Judgement and Feedback: The system automatically judges user-submitted code: it runs the code and compares its output with the expected results to check correctness. It also evaluates performance indicators such as running time and memory used during execution. After the evaluation, the system gives instant feedback, including test-case passes, error messages and performance analysis. To prevent attacks and malicious submissions, the system must also take certain security measures, such as running user programs in an isolated sandbox during evaluation, performing syntax-tree analysis on the code to detect plagiarism, and limiting repeated submissions within a short period of time.
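The comparison step above can be sketched as a minimal output checker. This is a hypothetical sketch, not the system's actual implementation: it compares the program's output with the expected output line by line, ignoring trailing whitespace on each line and any trailing blank lines, as OJ checkers commonly do.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Strip trailing spaces, tabs and carriage returns from one line.
static std::string rtrim(std::string s) {
    while (!s.empty() && (s.back() == ' ' || s.back() == '\t' || s.back() == '\r'))
        s.pop_back();
    return s;
}

// Compare contestant output with expected output line by line, ignoring
// trailing whitespace on each line and any trailing blank lines.
bool outputs_match(std::istream& expected, std::istream& actual) {
    std::string e, a;
    while (true) {
        bool he = static_cast<bool>(std::getline(expected, e));
        bool ha = static_cast<bool>(std::getline(actual, a));
        if (!he && !ha) return true;                // both exhausted: match (AC)
        if (he && ha) {
            if (rtrim(e) != rtrim(a)) return false; // differing line: WA
            continue;
        }
        // One stream ended first: the remainder of the other must be blank.
        std::istream& rest = he ? expected : actual;
        std::string line = he ? e : a;
        do {
            if (!rtrim(line).empty()) return false;
        } while (std::getline(rest, line));
        return true;
    }
}
```

A real judge would combine such a checker with the sandboxed execution and resource limits described above; problems with multiple valid answers would instead use a custom checker program.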
Competition and Examination: Teachers can select specific problems from the problemset for examinations in order to check students' practical programming ability. To cope with large-scale submissions during examinations, the system supports adding multiple judgement machines for parallel judgement, preventing the surge of student submissions from overloading the judge and degrading system performance. The system is also geared to the needs of competitive programmers and provides a competition function supporting OI/IOI/ACM and other modes, with a clarification area in each contest. Each contest result updates the user's rating, and a points ranking is provided. Each problem has a Hack mechanism: contestants can read other people's code and try to find loopholes in it, which makes contests more rigorous and more fun.
Communication and discussion: When students have doubts about a problem or good ideas to share, they can express their opinions through the blog function provided by the system, and other users can comment under each blog post, enabling communication and discussion.
Meanwhile, to save students' time in code checking and debugging, if a submission fails to pass all the test data, an AI assistant based on the large model gives advice on how to modify the code. This feature is explained in more detail in the next section.
2.2 System Architecture
Architecture: The system adopts a distributed architecture comprising front-end, back-end and error-correction large-model components.
Front-end: The system has a fully functional and user-friendly web interface, which includes submitting code, viewing error-correction results, and providing feedback. The front-end sends the user's code to the back-end, waits for processing, receives the error-correction and feedback information sent back, and displays it to the user in an appropriate way.
Back-end: A high-performance server-side application receives the code submitted by the user and evaluates it. If the evaluation passes, the back-end returns an Accepted (AC) message to the front-end. If the code fails on any test data, it is sent to the error-correction model for processing; the back-end then receives the error-correction and feedback information and forwards it to the front-end.
Error-Correction Model: The system integrates a powerful code error-correction model that automatically analyses and corrects errors in user-submitted code. On top of this base model, fine-tuning is performed on data drawn from programming test questions.
3 Code-Generating Large Model
3.1 Introduction to Code-Generating Large Model
Code-generating large models are trained with deep learning techniques to generate computer program code automatically. These models can understand natural language descriptions or high-level abstract concepts and translate them into executable code. They are usually based on powerful language models such as GPT (Generative Pre-trained Transformer), which learn the syntax, structure and semantics of code through large-scale pre-training and fine-tuning.
The training process for code-generating large models usually consists of two phases. First, pre-training is performed using a large library of publicly available code in order for the model to learn common code syntax and structure. Then, fine-tuning is performed on specific domains or tasks, such as natural language description-to-code conversion, code auto-completion, or code defect repair. These models have a wide range of applications and can be used to improve developer productivity, assist in code writing, automate software development processes, etc. They can also be used for educational purposes to help beginners understand the basic concepts and paradigms of code writing.
Integrated into the online judgement system proposed in this paper is CodeGeeX, a multilingual code generation model jointly created by Tsinghua University and Zhipu AI, which provides code translation, code completion and generation, and the basic question-answering functions available in large language models. Another important reason for choosing CodeGeeX is that it supports many mainstream high-level programming languages, such as C++, Java and Python, with good results in all of them, making it applicable to a variety of programming course scenarios.
3.2 Fine-tuning and Training of CodeGeeX

Figure 1 Large model technology roadmap
The CodeGeeX model was fine-tuned while being integrated into the online judgement system so that it generates better code for programming problems; the goal of fine-tuning was to strengthen its ability to solve programming problems without weakening its other capabilities. The multi-task fine-tuning (MFT) capability of CodeFuse-MFTCoder is used to fine-tune CodeGeeX2-6B on multiple code task datasets. The collected code files undergo formatting/splitting, syntax analysis, feature extraction, and causal judgement, and the processed dataset is then fed to the large model for fine-tuning, which initialises the weights from the pre-trained model obtained on big data to improve accuracy. Because CodeGeeX2-6B has relatively few parameters, training uses MFTCoder's multi-task LoRA fine-tuning mode rather than QLoRA; and because code tasks are relatively complex, more modules, including the attention and MLP layers, are fine-tuned.
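As a reminder of the technique (this follows the original LoRA formulation, not MFTCoder's specific internals): LoRA freezes each adapted pre-trained weight matrix and learns only a low-rank update, which is why fine-tuning a 6B-parameter model remains affordable.

```latex
% Forward pass of a LoRA-adapted linear layer: W_0 is frozen, only the
% low-rank factors A and B are trained; r is the (small) rank and
% \alpha a scaling hyperparameter.
h = W_0 x + \Delta W\, x
  = W_0 x + \frac{\alpha}{r}\, B A x,
\qquad B \in \mathbb{R}^{d \times r},\;
       A \in \mathbb{R}^{r \times k},\;
       r \ll \min(d, k)
```

Since only A and B receive gradients, the number of trainable parameters per layer drops from d·k to r·(d + k).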
For the training data, classic public datasets and high-quality code from GitHub were used. These datasets consist mainly of software-development code; although helpful to some extent, they are not very effective for generating code for programming exercises. To further improve the model's ability to generate code for practice problems, a large amount of public code from Codeforces, PTA, AtCoder, Topcoder and other teaching or algorithmic competition platforms was additionally collected with a crawler and added to the training data, further improving the model's targeting ability.
To evaluate the generative capability of the model, pass@k is chosen as the evaluation metric. Since the goal of the model is to correct errors and generate code, traditional string-similarity metrics such as BLEU are not suitable; pass@k instead measures the accuracy of the generated code using test cases.
pass@k is defined as pass@k = E[1 − C(n−c, k) / C(n, k)], where n is the number of code samples generated per problem, c is the number of samples that pass all test cases, and k is the chosen cutoff (usually k = 1, 10, 100). C(n, k) denotes the binomial coefficient, C(n, k) = n! / (k!(n−k)!). The higher this metric, the better the generated code passes the tests. In practice the metric is computed by evaluating 1 − C(n−c, k)/C(n, k) for each problem and then averaging over all problems.
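The per-problem term can be computed in a numerically stable way (a sketch; the function name is ours): expanding C(n−c, k)/C(n, k) into a telescoping product avoids large factorials.

```cpp
#include <cassert>
#include <cmath>

// Unbiased per-problem pass@k estimate: n generated samples, c of which
// pass all test cases.  Uses the telescoping form
//   1 - C(n-c, k) / C(n, k) = 1 - prod_{i = n-c+1}^{n} (1 - k / i)
// so that large factorials never need to be computed directly.
double pass_at_k(int n, int c, int k) {
    if (n - c < k) return 1.0;  // fewer than k failing samples: any k-draw passes
    double ratio = 1.0;
    for (int i = n - c + 1; i <= n; ++i)
        ratio *= 1.0 - static_cast<double>(k) / i;
    return 1.0 - ratio;
}
```

The overall metric is the mean of this value over all problems; for example, pass_at_k(2, 1, 1) = 1 − C(1,1)/C(2,1) = 0.5.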
3.3 Processes after Integrating a Large Model
This section demonstrates the workflow of the system after integrating the code error-correction large model.
As an example, the code from Section 1, which contains a compilation error, is submitted.

Figure 2 Compilation error in code
The user can then click the "Ask AI Assistant" button to invoke the large model component.
The large model finds the error in the original code, replaces the conflicting variable name "y1" with "y1_val", and returns a more standardised version of the code with added comments, making it clearer and easier to understand.
Figure 3 The large model gives corrected code
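Since Figure 3 is only an image, the corrected code it shows can be sketched as follows. This is a reconstruction based on the description above, with the distance computation factored into a function for clarity; the renamed identifiers follow the "y1_val" convention the model used.

```cpp
#include <cmath>
using namespace std;

// The globals x1/y1/x2/y2 from the buggy version are renamed: <cmath> pulls
// in math.h, which in some toolchains (e.g. MinGW) declares the Bessel
// functions y0(), y1() and yn() at global scope, so a global variable named
// y1 collided with the library declaration.
double manhattan(double x1_val, double y1_val, double x2_val, double y2_val) {
    // Manhattan distance: |x1 - x2| + |y1 - y2|
    return abs(x1_val - x2_val) + abs(y1_val - y2_val);
}
```

In the full submission, main simply reads x1_val, y1_val, x2_val, y2_val with cin and prints manhattan(x1_val, y1_val, x2_val, y2_val); note that moving the original variable declarations inside main would also have avoided the clash with the global library names.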

4 Conclusion
This online judgement system with an integrated large model is currently in the testing stage and will be put into use in programming courses in the near future. Besides the functions of a basic online judgement system, it integrates a code error-correction model that provides timely help when students encounter problems, alleviates the difficulty of getting questions answered in programming courses, and improves teaching quality.

References
[1] Xiang Zhou, Yanping Zhang. Practice of "Online+Offline" Blended Teaching Mode for Basic Programming Courses[J]. Computer Education, 2021(8): 138-141.
[2] Ning Liu, Xia Mengyan, Ru Liu, et al. Research on online-offline integrated teaching mode of Python public course[J]. Science and Technology Wind, 2021(9): 62-63.
[3] Yong Liu, Kai Tina, Xiaolin Zhou, et al. Practical Teaching of Programming with OJ System and Subject Competition[J]. Journal of Higher Education, 2021(6): 28-31.
[4] Cuixiao Zhang, Guobing Zhang. Blended teaching practice of programming course based on "OJ+SPOC"[J]. China Management Informatisation, 2021, 24(19): 230-232.
[5] Zhang Z, Chen C, Liu B, et al. A survey on language models for code[J]. arXiv preprint arXiv:2311.07989, 2023.

From: https://www.cnblogs.com/asuldb/p/18223082

    基础会计学习指导、习题与实训第五版)主 编: 王炜ISBN: 9787040564648出版社: 高等教育出版社上传者: Dzq!大家好,我是一名会计专业的大学生,最近在学习《基础会计学习指导、习题与实训第五版》这本教材。我发现这本书内容丰富,讲解透彻,非常适合初学者。但是,在学习过程中,我......