Supporting Collaborative Writing Tasks in Large-Scale Distance Education

In distance education courses with a large number of students and groups, the organization and facilitation of collaborative writing tasks are challenging. Teachers need support for planning, specification, execution, monitoring, and evaluation of collaborative writing tasks in their course. This requires a collaborative learning platform for coordinating all of the different phases in the writing process. In order to enable the design of such a platform, we created a process model of collaborative writing tasks that is based on the identification of participants, activities, phases, and orchestration from the literature. This model may serve as a basis for teachers to specify the instances for such tasks and can be used to determine the functional requirements needed for supporting model-compliant tasks on a collaborative learning platform. We present a general architecture for a platform of this kind that is independent of a concrete learning management system (LMS) or shared editor and demonstrate its implementation using Moodle, Etherpad Lite, and Docker. The platform makes it easier for teachers to create groups and automatically assign members to collaborative workspaces. It enables asynchronous as well as synchronous text editing and communication. It also respects the European information security and data protection requirements and helps teachers monitor both the writing and reviewing activities. The platform was evaluated over a period of three semesters in distance learning courses with more than 4500 students. It proved to be a scalable and robust environment for coordinating the collaborative writing process of teachers and students and enables analysis of collaborative writing behavior by teachers and researchers.

In distance education courses with a large number of students, the organization and facilitation of collaborative writing tasks are challenging. Distance learning environments require the provision of dedicated shared workspaces for each group, including support for collaborative text editing, communication, coordination, and learning support through instruction, assessment, and feedback. In practice, such distance learning environments are provided by an e-learning platform, which offers "access to learning content and tests, communication, and collaboration tools for students, as well as course management and assessment facilities for teachers." [7, p. 21] In our experience, planning, specification, execution, monitoring, and evaluation of such collaborative writing tasks currently cost teachers many hours of manual labor. In particular, the efforts involved in providing dedicated workspaces to a large number of student groups, monitoring these workspaces, and dealing with technical and group-specific problems currently limit the use of collaborative writing tasks to a smaller number of students (e.g., [8]). The use of such tasks in large distance learning settings with thousands of students poses significant problems, particularly in terms of the workload for teachers, the technology-based support required, integration into the learning environment of the university, and the evaluation of collaborative writing processes. This article, therefore, explores how technology can support teachers and students in the aforementioned activities, from planning to evaluation, of collaborative writing tasks in large distance learning courses with thousands of students.
Since distance learning requires computer-mediated interaction within and between student groups and teachers, teachers currently rely on a combination of different tools (e.g., a learning management system, shared editors, and conferencing systems) for providing instructions, facilitating collaborative text editing, and for communication and coordination. In this context, the emphasis in education is on the use of Web 2.0 applications, such as wikis, weblogs, or collaborative real-time text editors [9], [10]. However, these tools either lack support for shared synchronous text editing [11], do not comply with the strict data protection policies of European universities, or lack adequate scalability and robustness [12]. A great deal of effort is required to set up, orchestrate, and monitor such a collaborative learning environment, which consists of a combination of different systems. Additional research is required to create better technology-based support for collaborative writing tasks (e.g., [8], [13]).
In order to reduce the burden on teachers, we propose a collaborative learning platform (CLP) that facilitates the aforementioned activities from planning to evaluation of collaborative writing tasks in a course. To design such a platform, a model of the respective collaborative writing tasks is needed. This model must allow teachers to specify the concrete collaborative writing task for their course, and it should serve as the basis for a platform design that provides teachers with support for the aforementioned activities from planning to evaluation of model-compliant tasks in their course.
Therefore, we focus, in this article, on the following two research questions.
RQ1) What model of collaborative writing tasks is suitable for specifying such tasks in large-scale distance learning courses?
RQ2) How can a CLP support the planning, specification, execution, monitoring, and evaluation of tasks according to the aforementioned model?
In this article, we will answer RQ1 by constructing a model for collaborative writing tasks, which is based on the identification of participants, activities, phases, and orchestration from the literature (see Section II). We will then evaluate the resulting model in a large-scale distance learning course (see Section VII).
To answer RQ2, we will specify the requirements of a CLP (see Section III), examine related work on software solutions for collaborative writing tasks (see Section IV), and present a platform architecture (see Section V), its implementation (see Section VI), and its evaluation through a case study (see Section VII).
The results discussed in this article include a model for collaborative writing tasks, the requirements, architecture, and implementation of a CLP that supports this type of task, and an evaluation of the results of its use over a period of three semesters in distance learning courses with more than 4500 students. These outcomes may provide benefits to tool developers, teachers, and researchers. For example, tool developers may profit from the requirements, platform architecture, and components of the model to support similar collaborative learning activities, while teachers may benefit by being able to create collaborative writing tasks that can be executed and evaluated with less work on the CLP presented. Teachers and researchers may equally profit by receiving access to a wealth of data from the collaborative writing processes collected by the platform.

II. MODEL OF COLLABORATIVE WRITING TASKS
In this section, we construct a process model for collaborative writing tasks based on the identification of participants, activities, phases, and orchestration from the literature (RQ1). The resulting model is evaluated by applying it in a large-scale distance learning course (see Section VII).
Within a typical course, we distinguish two types of participants: 1) The teachers who define the collaborative writing tasks and integrate them into the curriculum and 2) their students, who work on the collaborative writing tasks (e.g., [14]).
Lowry et al. [15] present a model of the collaborative writing (CW) process, which distinguishes three phases: 1) Pre-CW task, 2) CW task, and 3) Post-CW task. This model neglects the integration of a collaborative writing task into a course. Therefore, we extend the aforementioned model by introducing four phases (see Fig. 1).
The preparation phase is performed by teachers in order to set up the collaborative writing task and extends the Pre-CW task with respect to its integration into the course activities. In this phase, teachers first need to define the didactical design of the writing task. This includes specifying the task description and needed materials, as well as providing instructions for the students. Furthermore, they must specify the type of review process and the criteria for successful completion of the task. Finally, teachers must specify how the groups will be formed, i.e., by specifying a group size and formation process. This provides the basis for setting up the collaborative learning environment for each group. After that, the collaborative writing phase can begin. In contrast to Lowry et al. [15], who see group formation as part of the CW task, we assume that students get to know each other earlier in the course, as the collaborative writing task is an embedded part of the syllabus.
The collaborative writing phase extends the CW task by allowing not only team formation, team planning, document production, and wind-up by students but also the monitoring of group work by teachers.
After students finish the collaborative writing phase, we extend the Post-CW task by distinguishing the student assessment phase and the learning design evaluation phase. In the student assessment phase, depending on the task, teachers either grade individual performance of the students or their group performance. Teachers provide summative feedback (also called summative evaluation) as a retrospective assessment of learning [16]. For this purpose, the documents produced, including group and individual performance, must be considered. The goal is for teachers to provide objective assessment, expert feedback, and a comparison of the group's performance with that of the other groups.
In the learning design evaluation phase, teachers may evaluate the learning experience based on monitoring data and the results of the student assessment phase. This may lead to improvements in future learning design. In addition, researchers may use these data to gain insights into the learning processes, the use of the collaborative learning environment by students and teachers, and its effectiveness.
The collaborative writing phase includes three activities. Two of them relate to the collaborative writing activities of the students: the group composes the current version of the document in the text production activity. In the peer review activity, each member of the group, another assigned group, or a teacher gives feedback on the text. In conjunction with the two collaborative writing activities, teachers can perform a monitoring activity in which they monitor group behavior and intervene when needed [17]. They can also respond to requests for help from students.
The text production activity contains the actual collaborative document production. Due to the diversity in distance education (i.e., the professional and personal contexts of the students), this can take place in either asynchronous or synchronous form. According to Lowry et al. [15], there are different strategies for collaborative writing, such as sequential or parallel writing. Parallel writing can be subdivided into horizontal-divisional writing, stratified-divisional writing, and reactive writing. The strategy for the respective task is always predefined in the instructions given by the teachers. Due to the time constraints of the course in which the collaborative writing activity will take place, teachers must specify in the preparation phase when and how often a text must be reviewed and revised in a cyclic sequence. If no review is requested after the most recent text production activity, the collaborative writing phase ends and the student assessment phase follows. Otherwise, a peer review activity follows, in which the current version of the text is reviewed by peers from the same or another group, or by teachers in order to provide formative feedback as guided by the instructions.
Formative feedback is important for improving writing skills [18], [19], [20]. Current research suggests that in academic writing, both giving and receiving formative feedback can improve writing performance [19]. It also improves students' revision skills [21]. Peer feedback in the context of collaborative writing is received more reflectively and constructively by students than corrections from a teacher [20], and significantly higher improvements in writing are observed. A questionnaire defined by teachers in the preparation phase serves as a guide for the students in giving formative feedback [22]. Students can also provide direct comments and annotations on parts of the text. As with the text production activity, the instructions specify whether each student reviews a different part of the text or the entire text individually.
In our process model, the four phases are performed in sequential order. Only in the collaborative writing phase is the monitoring activity performed in parallel to the text production and peer review activities. Thus, teachers must specify the sequence of the text production and peer review activities and the assignment of students (of the same or of peer groups) to these activities. When defining the sequence, teachers must specify the number of text production-peer review cycles. If no peer review is required, the collaborative writing phase consists of a single text production activity and the monitoring activity. In general, teachers must also specify the schedule (start and end dates and times) of all activities in the collaborative writing phase. The transitions in Fig. 1 define when documents are passed on to the next phase. In each case, teachers specify who has access to which part of the document. Thus, a separate submission phase in which the document must be submitted or handed over is not needed.
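To make the orchestration described above more concrete, the following minimal Python sketch encodes the four sequential phases and derives the order of student activities in the collaborative writing phase from a teacher-defined number of review cycles. The names `PHASES`, `WritingTaskSpec`, and `activity_sequence` are hypothetical and not part of the platform presented in this article. The sketch assumes that every peer review activity is followed by a revision, so that the phase always ends with a text production activity.

```python
from dataclasses import dataclass

# The four sequential phases of the process model (Section II).
PHASES = (
    "preparation",
    "collaborative writing",
    "student assessment",
    "learning design evaluation",
)


@dataclass(frozen=True)
class WritingTaskSpec:
    """Hypothetical teacher-defined specification from the preparation phase."""
    group_size: int
    review_cycles: int  # number of peer review activities; 0 means no review


def activity_sequence(spec: WritingTaskSpec) -> list[str]:
    """Ordered student activities of the collaborative writing phase.

    Assumption: every peer review is followed by a revision, so the phase
    always ends with a text production activity. The teachers' monitoring
    activity runs in parallel and is therefore not part of the sequence.
    """
    if spec.review_cycles < 0:
        raise ValueError("review_cycles must not be negative")
    activities = ["text production"]
    for _ in range(spec.review_cycles):
        activities += ["peer review", "text production"]
    return activities
```

With `review_cycles=0`, the sequence consists of a single text production activity, which corresponds to the case in which no peer review is required.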

III. REQUIREMENTS SPECIFICATION
In this section, we specify the requirements for the technology that supports teachers and students in the planning, specification, execution, monitoring, and evaluation of collaborative writing tasks conforming to the aforementioned model of collaborative writing tasks (see Section II) in large distance learning courses with thousands of students. For this purpose, we propose a CLP. Requirements for such a CLP arise from the need to support the activities from planning to evaluation of such tasks, and from the context of software to be used in educational settings and with thousands of users. The requirements are listed below and labeled R1 through R33.
First, we analyze the functional requirements for such a CLP. Second, we derive nonfunctional requirements for such a platform from the context of software that is to be used in educational settings and with thousands of users.

A. Functional Requirements
In this section, we begin by specifying the requirements concerning the planning and specification of collaborative writing tasks during the preparation phase. We then analyze the requirements for execution and monitoring during the collaborative writing phase. Finally, we examine the requirements for student assessment and learning design evaluation in their respective phases.
Teachers must be able to create collaborative writing tasks in online courses as defined in the preparation phase (see Section II). To this end, the CLP must allow teachers to specify the digital didactical design of the collaborative writing task (R1). This specification generally includes the learning objectives, the task, the required materials, the instructions, the type of feedback and learning evaluation, and the design of the interaction between students and teachers [23]. In our model of collaborative writing tasks, the frequency of peer review activities and, if necessary, the questions for guided formative feedback used during the review (see Section II), peer group assignment for reviewing, and criteria for successful completion must be additionally specified. In addition, the CLP must also allow teachers to determine how groups will be formed (e.g., group size and criteria for formation) and to conduct the group formation process (R2).
The collaborative writing task requires each group of students to work on a shared text document. To this end, the CLP provides each student group with its own collaborative virtual workspace, so that its members can work together on a shared document (R3). A collaborative virtual workspace supports computer-mediated interaction of users with documents and with each other through the use of synchronous and asynchronous tools [24].
The CLP must integrate each collaborative writing task and the associated workspaces into a learning management system (LMS), as LMSs are commonly used to centralize, manage, and organize learning activities and materials (R4). For ease of use, the CLP should allow students to use their LMS account to complete the entire collaborative writing task (R5). This also avoids the need to set up accounts for external tools not controlled by the university.
In order to limit the workload for teachers facing courses with large numbers of students and groups, the CLP must support the actual instantiation (setup) of a collaborative virtual workspace for each group and assign the group members to it (R6). To do this, students must be divided into groups of equal or different sizes, depending on the teachers' specifications (cf. R2). Due to the large number of students and groups, the CLP must enable teachers to perform group formation automatically and, in special cases, to form groups manually (R7). In cases where students should be able to choose their group (e.g., due to friends or lack of time), the CLP must also be able to support group formation by allowing self-enrollment (R8).
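Automatic group formation (R7) can be illustrated with a short sketch. The function `form_groups` is hypothetical and only shows one possible policy: students are shuffled randomly, split into groups of the requested size, and the members of an incomplete last group are distributed over the other groups. This remainder policy is an assumption made for illustration; the formation criteria a teacher specifies under R2 may differ.

```python
import random


def form_groups(student_ids, group_size, seed=None):
    """Randomly partition students into groups of roughly `group_size`.

    Assumption: members of an incomplete last group are distributed over
    the other groups, so some groups may exceed `group_size`.
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    students = list(student_ids)
    random.Random(seed).shuffle(students)  # seed allows reproducible runs
    groups = [
        students[i:i + group_size]
        for i in range(0, len(students), group_size)
    ]
    if len(groups) > 1 and len(groups[-1]) < group_size:
        leftovers = groups.pop()
        for i, student in enumerate(leftovers):
            groups[i % len(groups)].append(student)
    return groups
```

Manual formation and self-enrollment (R8) would bypass this function and write group memberships directly.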
In the collaborative writing phase (see Section II), each group works on its collaborative writing task in its own collaborative virtual workspace. The CLP must provide group-based access control for each workspace and its tools in order to protect the work of a particular group from unwanted external influences [25] and to allow teachers to evaluate and assess the performance of each group and its members (R9). To guide students through the intended activities of the collaborative writing task (see Section II) and support structured work, the CLP must be set up to provide each group with its own workspace and to convey the necessary instructions for controlling the overall process and the individual phases (R10). In addition, the CLP must inform students about the time constraints of the activities (R11). To avoid confusion and to enable coordinated work, the CLP should provide each group with a workspace that automatically provides all of the functions and information required for the current phase, including information that facilitates group awareness [26] (R12).
In addition, to support the text production activity (see Section II) of the collaborative writing task, the CLP must also provide each group with a collaborative virtual workspace that offers a shared editor for joint text production (R13). This shared editor must fulfill a number of requirements.
Due to the diversity of students in distance learning environments (e.g., different individual availability), it must be possible for students to collaboratively edit shared text documents both synchronously and asynchronously [27] (R14). To enable fluent collaboration, the shared text editor must make all changes visible to all group members as quickly as possible (R15). Lowry et al. [15] describe three different collaborative writing strategies that meet the need to support parallel synchronous or asynchronous work (cf. R14), including horizontal divisional writing (where writing is done individually in different parts of the text), stratified-division writing (in which participants take on a particular role, such as an editor, author, or reviewer), and reactive writing (where students work together on the entire text document, taking text changes or feedback from other group members into account). For horizontal divisional and stratified-division writing, the time delay is not as critical due to the loosely coupled work. In the case of reactive writing, groups usually employ a synchronous audiovisual channel that enables better coordination of joint text editing. Therefore, a delay in the visibility of one or two words seems acceptable.
To support collaboration and exchange of ideas, the CLP must provide each group with a collaborative virtual workspace that includes tools to support communication and coordination between group members [28] (R16).
Depending on the teacher's specification, the collaborative writing task may require reviews. For this reason, the collaborative virtual workspace must support the peer review activity (see Section II). According to the collaborative writing tasks model, the CLP must, therefore, provide each group with a collaborative virtual workspace that enables group members, members of a peer group, and teachers to give formative feedback on the text of a group (R17). The CLP must also provide each group with a workspace that allows peer group members to complete a questionnaire defined by the teacher in order to give and receive guided formative feedback (R18). The CLP must also provide each group with a shared editor that allows group members, members of a peer group, and teachers to annotate selected text passages with formative feedback [29] (R19). The CLP must ensure that the number of peer review (peer review activity) and rewriting (text production activity) activities meets the specifications of the teacher before the assessment phase begins (see Section II) (R20).
Section II defines the monitoring activity, in which the teachers monitor group behavior and intervene when they see signs of critical behavior, e.g., deficits in communication, collaboration, student dropouts, or when they receive explicit requests for help. To support this, the CLP must allow teachers to monitor group behavior, evaluate group interactions, and identify groups that need teacher attention (R21). This means that the CLP must offer students the ability to contact their teachers, for example, when problems or questions arise about instructions (R22). In addition, the CLP must also support teachers in communicating with groups and their members at any time, for example, for the purpose of feedback, problem-solving, or for giving and receiving answers to questions (R23). To support quality assessment, feedback, and problem-solving, the CLP must grant teachers unrestricted access to each collaborative virtual workspace and its tools (R24).
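The access rules implied by R9, R17, and R24 can be summarized in a small sketch. The `Workspace` class and the `can_access` function are hypothetical names introduced only for illustration, and the `review_open` flag is an assumption about how a platform might limit peer access to the peer review activity.

```python
from dataclasses import dataclass, field


@dataclass
class Workspace:
    """Hypothetical representation of one group's virtual workspace."""
    group_id: str
    members: set[str]
    peer_reviewers: set[str] = field(default_factory=set)
    review_open: bool = False  # assumption: set during the peer review activity


def can_access(user_id: str, is_teacher: bool, workspace: Workspace) -> bool:
    """Group-based access check for a workspace and its tools."""
    if is_teacher:
        return True  # R24: teachers have unrestricted access
    if user_id in workspace.members:
        return True  # R9: group members may always use their workspace
    # R17: peer group members may enter only while a review is running
    return workspace.review_open and user_id in workspace.peer_reviewers
```

All other users, including students of unrelated groups, are rejected, which protects the work of a group from unwanted external influences.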
For the student assessment phase (see Section II), the CLP must allow teachers in horizontal or stratified-division writing to grade each student's performance or in reactive writing to grade the group performance and provide summative feedback to student groups and their members (R25).
For the learning design evaluation phase (see Section II), the CLP must support teachers in evaluating the student learning experiences by providing access to monitoring data and the results of the student assessment phase (R26).

B. Nonfunctional Requirements
We also determined which nonfunctional requirements are necessary on such a platform in order for the software to be used in educational settings and with thousands of users. The CLP and the required tools must comply with applicable laws and regulations, such as the General Data Protection Regulation (GDPR) in European countries [30] (R27). For example, as a public German university, the FernUniversity in Hagen is subject to legal requirements for the proper use of all procedures and IT applications. In addition, if students are required to explicitly transfer documents between tools, they may use the wrong documents or forget to transfer the documents. To avoid this risk and ease transitions between different tools used in the collaborative learning environment, the CLP should not require students to explicitly transfer documents between these tools (R28).
Finally, in order to limit the costs for licenses induced by a large number of students, the CLP should avoid the use of software that needs to be commercially licensed [31] (R29).
For a comprehensive and fully functional system, the CLP must meet the following general requirements that facilitate effective and efficient use of the system: Scalability and responsiveness are important quality-of-service attributes for Web applications [32]. This also applies to learning environments. At our university, up to 3000 students are enrolled in a single distance learning course, while at the Open University, U.K., for example, more than 12 000 students have taken part in some courses [33]. Therefore, the CLP and required tools must provide sufficient scalability to accommodate these fluctuations (R30). This is especially demanding when supporting synchronous interaction for collaboration in large-scale distance learning courses.
The CLP and required tools also need to be highly responsive to enable fluent and effective collaboration in large-scale distance learning courses (R31). The CLP and required tools must be robust and reliable [34] to avoid demotivation due to system failures, breakdowns, or data loss (R32). To ensure robustness, the CLP must be able to detect errors and take corrective actions, such as rerouting requests to another platform component or restarting a faulty module [35]. At the very least, the CLP must inform the system administration immediately in the event of serious technical problems that require rapid human intervention (R33).
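The corrective behavior described for R32 and R33 can be sketched as a single supervision pass over the platform components. The `Component` class, the `supervise` function, and the injected callables are hypothetical. A real deployment would plug in its own health checks, restart mechanism, and notification channel.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Component:
    """Hypothetical platform component with injected operations."""
    name: str
    is_healthy: Callable[[], bool]
    restart: Callable[[], None]


def supervise(components, notify_admin, max_restarts=1):
    """Run one supervision pass and return the names of failed components."""
    failed = []
    for component in components:
        attempts = 0
        # Corrective action: restart a faulty module (R32).
        while not component.is_healthy() and attempts < max_restarts:
            component.restart()
            attempts += 1
        # Escalation: inform the administrators if recovery failed (R33).
        if not component.is_healthy():
            notify_admin(
                f"{component.name} is down after {attempts} restart attempt(s)"
            )
            failed.append(component.name)
    return failed
```

Such a pass would be repeated periodically. Rerouting requests to another component, the other corrective action named above, is not covered by this sketch.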

IV. RELATED WORK
In education, a wide variety of different software solutions are used to support collaborative writing tasks. These tools help students communicate with each other, collaborate on common text documents, and coordinate their activities. In the following, we discuss how current solutions in seven categories address the requirements identified in Section III.

A. Wikis, Blogs, and Forums
Frequently employed solutions are wikis (e.g., [36], [37], [38], [39], [40]). They allow multiple users to write or edit documents, primarily asynchronously, and can also be used for coordination and communication [41]. Once changes have been saved, a new version of the document is automatically published and can be accessed by all group members. Due to their asynchronous usage, wikis are especially suitable for being used in a course with numerous students and provide adequate responsiveness. By default, wikis do not support multiple authors with synchronous (near-real-time) text editing capability [11], [42], as this can lead to conflicts in the version history. These kinds of conflicts must be resolved manually by users [43]; thus, asynchronous and synchronous text edits and fluent collaboration are not fully supported. The same limitation also applies to other asynchronous collaborative text creation tools, such as weblogs (used, e.g., in [44], [45], and [46]), or forums (used, e.g., in [47] and [48]).

B. Repository-Based Solutions
Himmelstein et al. [49] describe the Manubot software, which enables the large-scale collaborative creation of manuscripts. Documents are written using the lightweight Markdown markup language and stored in a GitHub repository. Revisions of text changes can follow a specific workflow in which feedback can be exchanged and discussed. This can lead to further text changes. The software can convert manuscripts to common formats (such as HTML, PDF, and DOCX), continuously publish changes as HTML on GitHub, automatically manage the bibliography, and enable citation through persistent identifiers (such as DOI, ISBN, and URLs). Although writing can be done synchronously, students' individual changes are only visible after they have transferred the changed document to the GitHub repository. Thus, a fluent synchronous collaboration, where all changes are visible to all members as fast as possible, is not fully supported.

C. Collaborative Real-Time Text Editors
Collaborative real-time text editors (such as Google Docs, Overleaf, Microsoft Office 365, Collabora, OnlyOffice, Zoho Writer, CodiMD, Dropbox Paper, Authorea, and Etherpad Lite) allow web-based, near-real-time synchronous, or asynchronous editing of text documents. The main difference from the previously mentioned solutions is that they display the text changes to all group members in near-real-time (WYSIWIS, "What You See Is What I See"). Most editors offer a chat feature and the possibility of leaving comments on marked text passages. In the field of education, the proprietary editor Google Docs (e.g., [50], [51], [52]) and the open-source text editor Etherpad Lite (e.g., [53], [54], [55]) are frequently used. Google Docs does not meet the strict requirements of European information security and privacy regulations, such as the GDPR [56]. Universities are not allowed to require students to use third-party tools that may force them to make private data available to governments (cf. US Patriot Act). Furthermore, there are ethical concerns regarding privacy and data tracking when using Google Docs (e.g., [57], [58]).
Etherpad Lite can be operated on servers within the educational institution to ensure that third parties do not have data access. Using a single, or in the case of Etherpad Lite, a single-threaded, server with hundreds of groups and thousands of users, can lead to delays, limit responsiveness, and pose a risk to high availability/robustness. While running separate parallel Etherpad Lite servers allows smooth operation without delays [59], this requires the setting up of servers and the distribution of groups across these servers.
Kumar et al. [60] describe an open-source collaborative text editor named LiteDoc. The architecture is based on a distributed system. Each document is divided into different sections that are stored in an emulated atomic single-writer multireader (SWMR) register. By using the distributed architecture and the SWMR register, the authors expect better (almost linear) scalability as the aforementioned bottleneck of a single-server solution is avoided. In addition, the editor only allows one user to write to a particular section at a time. Computationally intensive algorithms to ensure consistency are not needed, which is likely to result in increased performance and enhanced data consistency.
As the text editor is still under development, it is not yet usable. The restriction of one author per section limits the possibility of synchronous, fluent, collaborative, and reactive writing on the same section at any one time. Commenting or communication features are not mentioned by Kumar et al. [60].
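The one-writer-per-section restriction discussed above can be illustrated with a simplified, single-process sketch. It is not the LiteDoc implementation and does not emulate a distributed atomic register; the class `SectionedDocument` and its methods are hypothetical. It only shows why two students cannot write reactively in the same section at the same time.

```python
class SectionedDocument:
    """Document whose sections each admit one writer and many readers."""

    def __init__(self, section_ids):
        self._text = {section: "" for section in section_ids}
        self._writer = {}  # section id -> user currently allowed to write

    def acquire(self, section, user):
        """Try to become the single writer of a section."""
        if section not in self._text:
            raise KeyError(section)
        holder = self._writer.setdefault(section, user)
        return holder == user

    def release(self, section, user):
        """Give up write access so that another user can take over."""
        if self._writer.get(section) == user:
            del self._writer[section]

    def write(self, section, user, text):
        """Replace the section text; only the current writer may do so."""
        if self._writer.get(section) != user:
            raise PermissionError(f"{user} is not the writer of {section}")
        self._text[section] = text

    def read(self, section):
        """Any number of users may read a section at any time."""
        return self._text[section]
```

A second student who calls `acquire` on an occupied section receives `False` and has to wait or work on a different section, which matches horizontal division of work better than reactive writing.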

D. Collaborative Peer-to-Peer Text Editors
In order to make synchronous web-based writing more delay-tolerant, collaborative peer-to-peer (p2p) text editing systems have been developed (e.g., [61], [62], [63]). In contrast to the centralized approach, each peer has a local copy of the text document and sends its text modifications directly to all others, making the system more scalable and fault tolerant [61]. To our knowledge, there is currently no mature solution for a p2p text editor that can be considered suitable for large-scale distance education settings. The reasons for this are as follows: First, many existing architectures are only usable under the specific conditions assumed during their development or perform poorly under realistic workloads [64]. Second, time-independent asynchronous editing of text documents required by our model of collaborative writing tasks is not supported by current mature approaches. New approaches, such as the p2p collaboration software PushPin currently under development, try to overcome this limitation by using a so-called storage peer, a Unix daemon that stores the data of offline peers on a central server [65].

E. Microservice-Based Solutions
Tapia et al. [59] realized a scalable approach through a microservice architecture using the Docker software.To increase scalability and robustness, their architecture allows the deployment of multiple Etherpad Lite instances, each with its own dedicated database instance.An Etherpad Lite instance is provided to each registered user, allowing him or her to create and share text documents with other users via a special link.This can be seen as the basis for a scalable CLP, which additionally needs to organize the learning process, divide students into groups, and set up and make Etherpad Lite instances available to each group.

F. Frameworks
All aforementioned solutions aim to provide a basic (partially responsive, scalable, and fault tolerant) solution for collaborative text editing, communication, and coordination support.Levrai and Bolster [47] describe a framework, which uses the LMS Moodle [66] in combination with the Google Docs collaborative text editor and the Stormboard collaborative workflow platform, to offer students the possibility to write essays collaboratively.
One teacher per 24 students (eight groups of three) creates and monitors 1 Moodle forum, 1 Stormboard Whiteboard, and 1 Google Docs document per group, and provides feedback and assessments for each group.Thus, teacher effort is limited to providing and maintaining eight collaborative essay writing environments.In the case of universities with few teachers and a course with thousands of students, this approach is not feasible, as the workload for setting up and maintaining hundreds of shared group workspaces, assigning the group members to them, and providing easy access to the group documents would be very demanding.In addition, students need to be able to extract their text documents from Google Docs and upload them to the Moodle forum, which runs the risk of incorrect documents being uploaded or failing to upload at all.Therefore, further research is needed to enable a few teachers to run courses with hundreds of groups and thousands of students and to avoid students having to explicitly transfer data or documents between tools.Furthermore, contrary to most distance learning formats, online collaboration was accompanied by some face-to-face meetings between students and teachers.

G. Shared Workspaces
Synchronous audio-video conferencing systems, such as Zoom [67], plugNmeet [68], or Microsoft Teams [69] can be used in combination with collaborative real-time, or p2p editors, to extend direct communication abilities to the shared workspace environment (e.g., [70]).These tools themselves do not facilitate the writing or learning process; e.g., by providing feedback or assessment functionalities.
The iWrite environment supports collaborative learning activities for large numbers of students using Google Docs [8]. iWrite allows teachers to group students manually. It automatically creates and assigns students to Google Docs group documents and includes a peer review feature that provides PDF snapshots of submitted draft documents to predefined peers, tutors, or lecturers, and offers them the opportunity to write feedback. The environment supports three different phases: a Draft Writing phase, a Review phase, and a Revision phase, in which feedback can be used for text improvements. At the end of the last phase, a PDF copy of the document is stored for teacher use, and the Google Docs document is automatically set to read-only.
VTIE [71] is a prototype of a collaborative writing environment that allows students to write scientific reports in the context of school education.In the preparation phase, it allows teachers to define tasks for collaborative report writing, to manually divide students into learning groups, and to assign an individual section of the report to be developed to each student.A so-called ScrapBook allows students to share their research results (collected texts, images, links, and notes) with the other group members.A simple WYSIWYG editor allows each student to create and edit his or her specific section.The creation of the section goes through several cycles until the text is finalized.Each of these cycles consists of two phases in which the respective student first creates, edits, and submits his/her text before the other team members (or the teacher) give feedback on it.However, the solution does not support the automatic creation and assignment of a collaborative virtual workspace or the interactive writing of the whole text.
TC3 [72] facilitates collaborative writing between pairs of students.It provides information sources, a private notepad for each user, a chat that optionally displays the history, and a simple shared (turn-taking) word processor that documents text changes in near real time.However, it does not facilitate large-scale synchronous joint writing and learning, e.g., through the use of feedback or assessment functionality.The turn-taking shared word processor hinders synchronous fluent collaboration.
The CURE platform [13] provides rooms in the form of shared workspaces for collaborative learning.Rooms contain educational material as well as tools for synchronous and asynchronous communication, coordination, and collaboration.The learning process can be structured by setting up different rooms and links between them.Access to rooms can be restricted by role-specific access permissions.The platform supports manual group formation and group maintenance.However, CURE did not address synchronous collaborative writing, scalability to large numbers of concurrent groups, or the automatic creation and assignment of rooms to groups.
To the best of our knowledge, there is currently no comprehensive solution for supporting collaborative writing tasks in large-scale distance education courses. Teachers need support for the planning, specifying, executing, monitoring, and evaluating of collaborative writing tasks in their course. A highly scalable and robust platform is needed, which makes it easy for teachers to plan and specify collaborative writing tasks, create groups, automatically assign members to group workspaces, and enable asynchronous as well as synchronous text editing and group communication while respecting the European information security and data protection requirements. In addition, teachers must be able to monitor group behavior to provide feedback, solve problems or answer questions, assess student and group performance, and evaluate the learning experience.

V. PLATFORM FOR COLLABORATIVE WRITING TASKS IN LARGE-SCALE DISTANCE EDUCATION COURSES
In this section, we describe a CLP that addresses all the requirements mentioned in Section III.According to Piotrowski [7, p. 21], "The functionality of e-learning platforms typically includes access to learning content and tests, communication, and collaboration tools for students, as well as course management and assessment facilities for teachers."In this sense, we define our solution as a platform, since it enables collaborative writing for large-scale distance courses by providing learning material, communication, collaboration tools, and feedback functions for students, as well as planning, specification, execution and management, monitoring, feedback, assessment, and evaluation functionalities for teachers.The following section explains the architecture of the CLP (see Fig. 2), which is described in the sequence of phases that we defined in the collaborative writing task process model (see Section II, Fig. 1).Section VI describes the implementation.

A. Preparation Phase
In the preparation phase (see Section II), teachers can create collaborative writing tasks using interactive collaborative writing activities in the online courses of the LMS by defining the didactical design for each task in its collaborative writing activity configuration (cf.R1).
The collaborative writing activity configuration supports the specification of manual group formation (self-enrollment or assignment by the teacher) (cf. R2). Groups are formed, as specified, either using the functionalities of the LMS or via a group formation plug-in that performs a random or criteria-based group composition and returns the set of groups with their group members. Using the core application programming interfaces (APIs) of the LMS (cf. R7), the collaborative writing activity plug-in stores the specification of the groups formed in its configuration.
In the configuration, teachers can specify the task and the instructions, as well as the links to the materials needed to perform the task (e.g., PDF files, web links, HTML texts, and pictures).They can also define the exact sequence of the activities (of the types: text production activity and peer review activity) that students must perform.For each activity, teachers may assign a unique name, its type, instructions, and start and end dates.The collaborative writing activity configuration ensures that the first activity is always a text production activity.In the case of peer review activities, teachers are required to additionally specify the review assignment (i.e., which group reviews which other group), and they can optionally define a questionnaire for guided formative feedback.The questionnaire consists of a list of questions that can be answered either on a rating scale or in the form of a free text response of limited length depending on the definition.
The configuration defines the entire collaborative writing phase including its activities (cf.R10) and the timing of the automatic phase transitions (cf.R11), the number of iterations (text production and peer review activity cycles) required before the assessment phase (cf.R20), and the teacher-defined questionnaires for peer review activities (cf.R18).Finally, teachers choose the type of grading (none, scale, point, percent, or free text) that they want to use for this activity in the assessment phase (cf.R25).Once the collaborative writing activity configuration is complete, the collaborative writing activity plug-in can automatically create a collaborative virtual workspace for each group and assign the members to it (cf.R6).This ends the preparation phase, and the collaborative writing phase can begin on the start date defined by the first text production activity.
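The constraints that the configuration places on the activity sequence can be illustrated with a minimal sketch. It assumes a simplified in-memory representation; the names `Activity`, `validate_configuration`, and the dictionary layout of `review_assignment` are hypothetical and are not taken from the plug-in itself.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional

TEXT_PRODUCTION = "text_production"
PEER_REVIEW = "peer_review"


@dataclass
class Activity:
    name: str
    kind: str                      # TEXT_PRODUCTION or PEER_REVIEW
    start: date
    end: date
    # Peer review only: reviewing group -> reviewed group (hypothetical layout).
    review_assignment: Optional[Dict[str, str]] = None


def validate_configuration(activities: List[Activity]) -> List[str]:
    """Return a list of problems; an empty list means the sequence is valid."""
    if not activities:
        return ["at least one activity is required"]
    problems = []
    if activities[0].kind != TEXT_PRODUCTION:
        problems.append("the first activity must be a text production activity")
    names = [a.name for a in activities]
    if len(set(names)) != len(names):
        problems.append("activity names must be unique")
    for a in activities:
        if a.end < a.start:
            problems.append(f"{a.name}: end date precedes start date")
        if a.kind == PEER_REVIEW and not a.review_assignment:
            problems.append(f"{a.name}: peer review needs a review assignment")
    return problems
```

Returning all problems at once, rather than stopping at the first one, would allow a configuration form to report every issue to the teacher in a single pass.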

B. Collaborative Writing Phase
When students log into the LMS and enter an online course, all course-related activities are displayed, including collaborative writing activities.When they select an activity, the collaborative writing activity plug-in presents the collaborative virtual workspace of the respective group (cf.R9).The workspace contains all instructions needed for the collaborative writing phase and its activities (cf.R10, R11).
The collaborative writing activity enables asynchronous and synchronous collaboration on a shared text document (cf. R14). As the LMS is primarily designed to support the asynchronous learning of individual students in a scalable manner (cf. R30), it provides neither the support for scalable synchronous collaboration within student groups (cf. R30) nor the responsiveness (cf. R31) that is needed for this type of interaction. Therefore, we added an extension to the LMS, the collaboration environment, which is specifically designed for scalable synchronous collaboration within large numbers of student groups (see Fig. 2). The LMS provides and manages the collaborative writing activity and the collaborative virtual workspaces, and facilitates assessment and phase transitions. The collaboration environment provides the tools for synchronous and asynchronous collaborative writing, communication, and coordination, which are embedded and presented in the workspace in the LMS (cf. R13, R16, and R14). In order to require only a single login to the LMS (cf. R5), the collaboration environment also performs the user authentication for logging into the tools.
The collaboration environment provides each group with a group instance, which presents all the tools required to perform the collaborative writing task within the group's workspace, as specified in the collaborative writing activity configuration.
A collaborative text editor supports both the text production activity and the peer review activity (cf. R19). In order to provide the groups with their own protected workspace and to grant full access only to the relevant members and teachers, the collaborative text editor employs group-based access control (cf. R9).
In addition, the editor stores all user interactions in the group instance database for the monitoring of group behavior and to evaluate group interactions (cf.R21).It also provides at least one dedicated communication channel (e.g., a text chat) so that the group members can communicate and coordinate with each other (cf.R16).The editor is designed to facilitate synchronous as well as asynchronous collaboration between group members (cf.R14).
To ensure fluent collaboration by providing fast responses for each group, the architecture of the collaboration environment (see Fig. 2) supports vertical and horizontal scalability [73, p. 6]. Depending on the system performance of the collaboration environment and the resource needs of the group instances, vertical scaling on one server can be achieved by exploiting the multicore architecture of modern server processors. Horizontal scaling can be achieved by distributing the group instances among a pool of server nodes. Thereby, server or node overload can be avoided. Isolating the execution of tools within a group instance also contributes to robustness (cf. R32), since errors in one group instance do not affect other ones. With sufficient computing resources (e.g., processor power or number of server nodes), the architecture allows scalability (cf. R31) to a large number of groups and users (cf. R30). In this way, sufficient responsiveness can be achieved for each group instance (cf. R15).
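How group instances might be distributed among a pool of server nodes can be sketched as follows. The placement policy shown here (each instance goes to the node that currently hosts the fewest instances) is an assumption made for illustration; the architecture does not prescribe a particular policy, and the function name `assign_to_nodes` is hypothetical.

```python
from typing import Dict, List


def assign_to_nodes(group_ids: List[str], nodes: List[str]) -> Dict[str, str]:
    """Place each group instance on the node that currently hosts the fewest."""
    if not nodes:
        raise ValueError("at least one server node is required")
    load = {node: 0 for node in nodes}
    placement: Dict[str, str] = {}
    for group_id in group_ids:
        # min() returns the first node with the lowest count, so ties are
        # resolved by the order in which the nodes were given.
        target = min(nodes, key=lambda n: load[n])
        placement[group_id] = target
        load[target] += 1
    return placement
```

A production deployment would more plausibly weigh actual resource usage instead of instance counts, but the sketch shows why adding nodes lowers the load on each of them.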
Each group instance contains a group instance database that persists and manages all data generated by its collaborative text editor. Since one group produces only a relatively small amount of data, a low-resource and distributed database management system is sufficient, reducing the resource needs of each group instance. As a result, only one collaborative text editor is affected by the breakdown of a group instance database. This increases robustness (cf. R32).
In the case of a crash of the collaborative text editor or the group instance database, the state of the database may be corrupted. Therefore, a previous consistent state is kept in the fusion database as a backup. With its help, this state can be restored in the group instance database. This increases the robustness of the system with respect to data losses (cf. R32). The frequency of the backup process can range from continuous to every few minutes as soon as the collaborative text editor is no longer in use. A lower frequency rate saves system resources, which contributes to scalability (cf. R30). If the backup process between a group instance database and the fusion database crashes, the administration of the CLP is immediately informed by the group instance management (see below). If the fusion database gets corrupted, it can be rebuilt from the data of the group instance database of each group instance.
When a student enters the learning management system and opens the collaborative writing activity, the collaborative writing activity plug-in requests the group instance management to allow access to the collaborative text editor of the group-specific group instance and embeds the editor into the collaborative virtual workspace of the student, with the activity-related document already opened.
The group instance management creates and manages the group instances. To minimize the use of hardware resources, it automatically starts these instances when they are required and stops them when they are no longer in use. This increases the scalability (cf. R31) and the responsiveness (cf. R30) of the collaboration environment because server resources are only occupied by group instances that are currently in use. The shutdown of unused group instances also increases the robustness of the collaboration environment by avoiding errors that can be caused by group instances not in use (cf. R32). The group instance management also employs a group-based access control to authorize users of the LMS for the use of the collaborative text editor (cf. R5), only granting access to group members, teachers, or peer group members (cf. R9). To enhance robustness (cf. R32), it acts as a watchdog [73, p. 260 ff.], restarting any group instance in the event of an error. The group instance management component immediately informs the administration of the collaborative writing platform (cf. R33) when it cannot resolve a problem relating to itself or to a group instance. Data necessary for the management of the group instances are stored in its own database.
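The life cycle described above (start on demand, stop when idle, restart on error) can be illustrated with a minimal in-memory sketch. It only models state transitions and does not start real containers; the class names, the `idle_timeout` parameter, and the `tick` method are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class GroupInstance:
    running: bool = False
    healthy: bool = True
    last_used: float = 0.0


class GroupInstanceManager:
    def __init__(self, idle_timeout: float):
        self.idle_timeout = idle_timeout          # seconds; hypothetical parameter
        self.instances: Dict[str, GroupInstance] = {}

    def request_access(self, group_id: str, now: float) -> GroupInstance:
        """Start the group's instance on demand and record the access time."""
        inst = self.instances.setdefault(group_id, GroupInstance())
        inst.running = True
        inst.last_used = now
        return inst

    def tick(self, now: float) -> None:
        """Periodic pass: restart failed instances, stop idle ones."""
        for inst in self.instances.values():
            if not inst.running:
                continue
            if not inst.healthy:
                # Watchdog behavior: a restart is modeled as clearing the fault.
                inst.healthy = True
            elif now - inst.last_used >= self.idle_timeout:
                inst.running = False
```

Passing the current time explicitly instead of reading a clock inside the manager keeps the sketch deterministic and easy to test.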
Communication between the collaborative writing activity plug-in and the collaborative text editor embedded in the workspace uses the reverse proxy of the collaboration environment.As a central mediator, the proxy facilitates user access to the corresponding collaborative text editor and to the group instance management by making them accessible under certain web paths.As a load-balancing mechanism, it is essential for the scalability of the system (cf.R30).It also increases security because it supervises access to collaborative text editors and the group instance management (cf.R27).
So far, we have described how the collaboration environment supports text production activities. When a peer review activity begins, each member of the peer review group is given access to the collaborative text editor of the group instance of the group being reviewed. Access is limited to reading and annotating the text, and responding to comments on annotations (cf. R19). In the peer review activity, the group reviewed cannot change the text it created in the previous text production activity. To guide members of the peer review group in creating separate formative feedback (cf. R18), the collaborative writing activity plug-in presents the teacher-defined questionnaire, specified in the collaborative writing activity configuration. It also ensures that all text production and peer review activities are executed in the sequence specified in the earlier activity configuration (cf. R20).
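The access rules for the two activity types can be summarized in a small sketch. The role and action names are hypothetical. Two points are assumptions rather than statements from the text: that members of the reviewed group retain read access during a peer review activity, and that group members may annotate their own text during text production.

```python
from enum import Enum
from typing import Set


class Role(Enum):
    TEACHER = "teacher"
    GROUP_MEMBER = "group_member"      # member of the group that owns the text
    PEER_REVIEWER = "peer_reviewer"    # member of the assigned reviewing group
    OTHER = "other"


def allowed_actions(role: Role, activity_kind: str) -> Set[str]:
    """Return the editor actions a role may perform in the given activity."""
    if role is Role.TEACHER:
        # Teachers can access every workspace in every phase.
        return {"read", "edit", "annotate"}
    if activity_kind == "text_production":
        if role is Role.GROUP_MEMBER:
            return {"read", "edit", "annotate"}
        return set()
    if activity_kind == "peer_review":
        if role is Role.PEER_REVIEWER:
            return {"read", "annotate"}
        if role is Role.GROUP_MEMBER:
            return {"read"}            # assumption: the text is frozen but visible
    return set()
```

For example, `allowed_actions(Role.PEER_REVIEWER, "peer_review")` yields read and annotate rights only, which reflects the restriction that reviewers cannot alter the reviewed text.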
In order to support the monitoring activity of the teachers, the collaborative writing activity plug-in provides the group instance management with the data of each collaborative writing activity, which includes group formation, specification of each group, teacher-defined questionnaires, answers to the questionnaires, and assignment of peer groups.The group instance management component stores this data in the fusion database.Together with the data from regular backups of the group instance database, the fusion database contains a complete record of all activities of all groups that can be used to monitor group behavior, evaluate group interactions, and identify potential group problems (cf.R21).With the help of the group instance management, the collaborative writing activity plug-in ensures that teachers can directly access the collaborative virtual workspace of each group in each phase without restrictions (cf.R24).Teachers have the ability to directly communicate with students in the collaborative virtual workspace (cf.R23), whereas students directly communicate with teachers using the functionalities of the LMS (cf.R22).

C. Student Assessment Phase
Teachers employ the gradebook provided by the LMS for the grading of all students as specified in the collaborative writing activity configuration.For this purpose, the collaborative writing activity plug-in modifies the gradebook so that teachers can enter each student's grade in the form specified in the earlier collaborative writing activity configuration (cf.R25).Additionally, teachers can provide summative feedback using the gradebook (cf.R17).
Teachers can also evaluate the text produced in each collaborative writing activity at any time by utilizing full access to each group's collaborative virtual workspace and the embedded collaborative text editor (cf.R24, R28).For the purpose of gaining insight into group behavior and group interactions (cf.R21), teachers can analyze the data of the fusion database.

D. Learning Design Evaluation Phase
In order to evaluate the learning design of the collaborative writing tasks and the students' learning experiences, teachers and researchers can access all monitoring data, texts produced in the collaborative writing activities, and assessment results (cf.R26).Insights obtained from this information may help to improve the learning design.
When processing personal data, the CLP ensures compliance with the GDPR (cf.R27).Since the CLP operates and stores data on servers located within the organization (e.g., the university), unauthorized access by third parties can be prevented, and strict access control can be used.

VI. IMPLEMENTATION
The conceptual architecture of a CLP shown in Fig. 2 consists of two parts: 1) the LMS and 2) a collaboration environment. Our implementation is shown in Fig. 3. We use Moodle [66] to implement the LMS part of the conceptual architecture while the collaboration environment part is implemented using the Docker infrastructure [74]. In Fig. 3, the areas in gray denote the Docker containers. Arrows between boxes represent control and data flow, and arrows to and from databases represent the data flow. Dotted rectangles show the group instances that are implemented as container pairs of the collaborative text editor and the group instance database. Dashed rectangles indicate instances of learning activity modules that are instantiated at runtime.

A. Learning Management System
The LMS Moodle [66] is widely used at the FernUniversität in Hagen as a computer-supported learning environment for distance learning.Therefore, we use it to implement the CLP functionalities for setting up, organizing, and running the proposed collaborative writing tasks (cf.R4).
The Moodle database is used to store data about users, courses, course contents, learning activities, plug-in data, communication, and the configuration of the LMS.Moodle provides a set of core APIs that allow Moodle plug-ins to access its functionalities and data structures.We use the following APIs for our implementation of the collaborative writing task: data manipulation API, group API, access API, web service API, gradebook API, and form API. The data manipulation API provides consistent and secure reading and writing to the Moodle database.
In Moodle, a learning activity is defined as a feature through which students learn by interacting with resources, other students, and/or teachers (cf.[75]).Examples of learning activities are quizzes, forums, wikis, and feedback.We define the collaborative writing activity as a new type of learning activity that supports the phases defined in the collaborative writing task process model discussed earlier (see Section II).For the implementation of a learning activity, Moodle requires the user to specify an activity module plug-in.

B. Collaborative Writing Activity Plug-In
To implement the collaborative writing activity, we developed a new Moodle activity module plug-in: the collaborative writing activity plug-in. It enriches Moodle so that teachers can create and execute interactive collaborative writing activities that implement collaborative writing tasks within online courses. This plug-in allows teachers to create and edit the configuration for each respective collaborative writing activity in the planning phase. Teachers can define the didactic design for each collaborative writing activity by specifying the respective collaborative writing activity configuration (cf. R1) using Moodle's standard functionalities. Moodle activity plug-ins have their own settings. To ensure secure and authorized access to the collaboration environment, the URL of the group instance management component of the collaboration environment and a token to authorize access to the management component are stored in the collaborative writing activity plug-in settings.

1) Configuration:
The collaborative writing activity plug-in uses the Moodle core form API, which enables the creation and management of consistent and secure web forms, to create and manage a collaborative writing activity configuration form.This form is used to configure the setting for a collaborative writing activity.In the form, teachers can choose between manual group formation (self-enrollment or teacher assignment) or automatic group formation (cf.R7).The teachers can use the Moodle functions to create a set of groups or let students self-enroll in the groups themselves via a Moodle assign activity.Alternatively, a set of groups can be created automatically by using functions of current Moodle versions (automatic and random group formation) (cf.R7, R8) or by using external plug-ins, such as [76], to generate a criteria-based group composition (cf.R7).The current implementation of the collaborative writing activity plug-in uses the Moodle core groups API to access a previously created set of groups and then stores a reference to it in the collaborative writing activity configuration.
In the configuration form, teachers also specify the task and the instructions, and set links to the materials needed (cf.R1).
Since the collaborative writing activity is composed of text production activities and peer review activities, teachers can specify the order of the activities to be performed in the configuration form.For each activity, teachers can specify a unique name, the activity type (text production or peer review activity), the instructions, and the start and end dates.
In the case of peer review activities, teachers use the configuration form to specify the assignment of reviewing groups to reviewed groups and set up a questionnaire for guiding formative feedback (cf.R18).In the form, teachers can define a list of questions with the respective assessment type (rating scale or free-text answer with limited length).
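The two assessment types of a questionnaire can be illustrated by a sketch that checks a single answer against its question. The class `Question`, the function `is_valid_answer`, and the default values for the scale and the length limit are hypothetical and serve only as an example.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Question:
    text: str
    kind: str               # "rating" or "free_text"
    scale_max: int = 5      # hypothetical default for rating questions
    max_length: int = 500   # hypothetical default for free-text answers


def is_valid_answer(question: Question, answer: Union[int, str]) -> bool:
    """Check an answer against the assessment type of its question."""
    if question.kind == "rating":
        # bool is a subclass of int in Python, so it is excluded explicitly.
        return (
            isinstance(answer, int)
            and not isinstance(answer, bool)
            and 1 <= answer <= question.scale_max
        )
    if question.kind == "free_text":
        return isinstance(answer, str) and len(answer) <= question.max_length
    return False
```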
Finally, teachers choose in the configuration form the type of grading (none, scale, point, percent, or free text) to be used in the assessment phase (cf. R25). The chosen grading type is stored in the gradebook of the respective collaborative writing activity using Moodle's core gradebook API.
Except for the grading specification, all data of the collaborative writing activity configuration is stored in the Moodle database via the data manipulation API.
2) Execution: If a user opens a collaborative writing activity, the collaborative writing activity plug-in reads the respective configuration from the Moodle database.To ensure that each group is provided with its own collaborative virtual workspace (cf.R6), the plug-in displays the current state of the activity only if the user is a member of the respective group, as stated in the configuration.
The collaborative virtual workspace consists of three pages: one page for displaying the task, instructions, and the links to the required materials; a second page for the text production activity; and a third page for the peer review activity.In this way, the required functionalities and information are made available to the user depending on the current phase (cf.R10).
Fig. 4 shows an example of the collaborative virtual workspace of a student who is currently in the collaborative writing phase and performing a text production activity (text blurred for privacy reasons).The upper left area shows a timetable (1) that informs the student about the start time and the end time of all phases, as specified in the configuration (cf.R11).A navigation menu (2) allows users to switch between the pages required in the current phase.
To ensure scalability, a separate instance of the editor Etherpad Lite is used for each group to implement the collaborative text editor (cf. R13). The toolbar (3) offers basic text format settings, such as font selection, font size, italic, bold, and the selection of unordered lists or ordered lists. Another toolbar (4) allows the user to export or import the document, save an important version, view the editing history, or see who is currently online. The actual text document (5) is displayed below the two toolbars. A color is assigned to each author in the document. Different colors in the text, therefore, indicate different authors and facilitate group awareness (cf. R16). The anchor points of an annotation are highlighted in yellow in the text. The content of the annotation (6) is displayed on the right-hand side. The integrated text chat (7) is shown at the bottom right of the screen. It can be used for asynchronous and synchronous communication and coordination (cf. R16).
The collaborative writing activity plug-in provides six major methods as follows.
1) Retrieval of the task, its instructions, and links to the materials needed to perform the task from the Moodle database.
2) Presentation of the information and tools needed in the current phase. In the case of the collaborative writing phase, this includes the time sequence of the various activities, the unique names, the activity type (text production or peer review activity), and the instructions for the activities from the activity configuration.
3) Provision of a URL and a user session token for access to the collaborative text editor of the respective group via the group instance management component.
4) Provision of a URL and a user session token for access to the collaborative text editor of each group that is to be reviewed via the group instance management.
5) Provision of the questionnaire defined by a teacher for a peer review activity.
6) Access to a user's answers to a teacher-defined questionnaire within a peer review activity.
The collaborative writing activity plug-in caches the data returned by these methods as long as the user is using the plug-in. The plug-in uses further methods to store the user's responses to a teacher-defined questionnaire of a peer review activity in both the Moodle database, via the data manipulation API, and in the fusion database by sending them to the group instance management component.
All of the aforementioned methods ensure that the current phase and the user permissions allow for execution.If this is not the case, the execution is aborted, and an error message is displayed to the user.

C. Collaboration Environment
The Docker [74] infrastructure is used for deploying containers, virtual networks, and volumes: Containers [77] are used to isolate the components of the collaboration environment against each other to support vertical and horizontal scalability and robustness (cf.R30, R32).Virtual networks provided by Docker [78] are used to support and secure communication between different containers.Volumes are used to maintain persistent data for the database components (i.e., the group instance management database, the fusion database, and the group instance database).Furthermore, volumes enable the recovery of data from containers that have become inaccessible, e.g., due to runtime errors, deadlocks, or infinite loops.An inaccessible container can be recreated immediately, and the data can be restored by mounting the corresponding volume of the container (cf.R32).
1) Group Instance: A group instance is implemented as a pair of containers: one container for the collaborative text editor and one for the group instance database.To increase robustness in the event of a crash of the editor, the container of the editor can be restarted or recreated using the group instance database (cf.R32).A virtual network is set up for communication between these two containers.Security is increased by isolating the editor from the database container and by restricting access to the database.
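The structure of a group instance can be sketched as a function that builds a description of the container pair, its virtual network, and the database volume. The dictionary mirrors the top-level layout of a Docker Compose file (`services`, `networks`, `volumes`), but the naming scheme, the mount path `/data`, and the image parameters are hypothetical; the sketch does not reproduce the actual deployment scripts.

```python
from typing import Any, Dict


def group_instance_spec(group_id: str, editor_image: str, db_image: str) -> Dict[str, Any]:
    """Describe one group instance: editor and database on a shared network."""
    network = f"net_{group_id}"
    volume = f"dbdata_{group_id}"
    editor = f"editor_{group_id}"
    database = f"db_{group_id}"
    return {
        "services": {
            editor: {
                "image": editor_image,
                "networks": [network],
                "depends_on": [database],
            },
            database: {
                "image": db_image,
                "networks": [network],
                # Only the database keeps persistent data, so the editor
                # container can be recreated without data loss.
                "volumes": [f"{volume}:/data"],
            },
        },
        "networks": {network: {}},
        "volumes": {volume: {}},
    }
```

Because every group receives its own network and volume names, the specifications of different groups do not share any resources.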
The collaborative text editor was implemented using the free, open-source real-time editor Etherpad Lite [79], which is available under the Apache License 2.0 (cf. R29). Etherpad Lite is simple to use, easily deployed, and highly adaptable. The lack of scalability, responsiveness, and robustness for a large number of students and groups, as mentioned in Section IV, is addressed by providing each group with its own Etherpad Lite server and a separate group instance database (cf. R30, R31, and R32). The group's Etherpad Lite server provides fluent collaborative near-real-time editing (cf. R15 and R13) of text documents for small-to-medium-sized groups. It supports synchronous or asynchronous editing (cf. R14) and offers an integrated document-based, near-real-time chat (cf. R16). Etherpad Lite can be extended via third-party or self-developed plug-ins. A list of online users and the color-coding of the text to identify the different authors facilitate group awareness (cf. R12).
Group-based access control is implemented using the Etherpad Lite plug-in ep_auth_sessions [80]. When a user accesses the Etherpad Lite server, this plug-in verifies the validity of the user's session token, provided as a URL parameter, and authorizes the user's access to the group's text documents.
The monitoring functionality (cf. R21) was implemented using a newly developed Etherpad Lite Tracking Plug-in that logs the following data for each user: 1) connection times of the user to the editor; 2) browser times indicating when the browser tab containing the editor is in use or not in use by the person editing the text; 3) events when the user opens or closes the editor chat; 4) scroll position changes in the text document; and 5) scroll position changes in the chat. By combining the Etherpad Lite log data (e.g., chat messages, change sets for text, or format modifications) with data from the tracking plug-in, teachers can monitor the reading and writing behavior of each group and its members and also analyze group interactions (cf. R21).
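To make the five kinds of logged data more concrete, the following sketch models them as a typed event stream and derives one simple indicator from it, namely how long a user kept the editor tab in use. The event names, field names, and the function `activeTabMillis` are assumptions for illustration and do not reflect the actual log format of the tracking plug-in.

```typescript
// Hypothetical event model; timestamps are milliseconds since some epoch.
type TrackingEvent =
  | { kind: "connect" | "disconnect"; userId: string; at: number }
  | { kind: "tabActive" | "tabInactive"; userId: string; at: number }
  | { kind: "chatOpen" | "chatClose"; userId: string; at: number }
  | { kind: "textScroll" | "chatScroll"; userId: string; at: number; position: number };

// Sums the time between each "tabActive" event and the next "tabInactive"
// event of one user. An interval that is still open at the end is ignored.
function activeTabMillis(events: TrackingEvent[], userId: string): number {
  const own = events
    .filter((e) => e.userId === userId)
    .sort((a, b) => a.at - b.at);
  let total = 0;
  let activeSince: number | null = null;
  for (const e of own) {
    if (e.kind === "tabActive" && activeSince === null) {
      activeSince = e.at;
    } else if (e.kind === "tabInactive" && activeSince !== null) {
      total += e.at - activeSince;
      activeSince = null;
    }
  }
  return total;
}
```

Indicators of this kind could then be combined with the Etherpad Lite change sets to relate reading time to writing activity.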
The required annotation functionality (cf. R19) was implemented by the Annotation Plug-in. This plug-in is based on the Etherpad Lite plug-in ep_comments_page [81], which allows color highlighting of text passages selected by the user, as well as shared discussions through comments on the selected text passages. It also allows users to suggest changes that may alter the text. During a peer review activity of the collaborative writing task, peers are encouraged to provide feedback through text comments but are not allowed to change the text that they are reviewing. Therefore, the "suggested change" functionality of ep_comments_page was removed from the annotation plug-in.
The group instance database was implemented using the document-oriented database management system CouchDB [82] because it is well suited for horizontal scalability (cf. R30). As a distributed database, CouchDB is robust against network or hardware failures (cf. R32). It maintains and manages all data generated by the respective Etherpad Lite server, including the tracking plug-in and the annotation plug-in.
2) Reverse Proxy and Group Instance Management: The reverse proxy and the group instance management components are combined into a single container for two reasons: First, the management component must be able to modify the configuration of the proxy to reflect the hostname, port number, HTTP protocol, and web path of currently running collaborative text editor containers. Second, the management component must add all editor containers and the container of the reverse proxy and group instance management component to a virtual network via Docker. The reverse proxy is realized with the software Nginx [83].
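How the proxy configuration might reflect hostname, port number, protocol, and web path can be sketched as a function that renders one Nginx `location` block per running editor. This is an assumption about one possible approach, not the configuration used on the platform; the interface `EditorRoute` and all example values are hypothetical. The directives `proxy_pass`, `proxy_http_version`, and `proxy_set_header` are standard Nginx directives, and the upgrade headers are included because Etherpad Lite communicates over WebSockets.

```typescript
// Hypothetical description of one running editor container.
interface EditorRoute {
  webPath: string;              // path segment under which the editor is exposed
  host: string;                 // container hostname inside the virtual network
  port: number;
  protocol: "http" | "https";
}

// Renders an Nginx location block that forwards requests for one editor.
function renderLocation(route: EditorRoute): string {
  return [
    `location /${route.webPath}/ {`,
    `    proxy_pass ${route.protocol}://${route.host}:${route.port}/;`,
    `    proxy_http_version 1.1;`,
    `    proxy_set_header Upgrade $http_upgrade;`,
    `    proxy_set_header Connection "upgrade";`,
    `}`,
  ].join("\n");
}

// Renders the blocks for all currently running editors.
function renderLocations(routes: EditorRoute[]): string {
  return routes.map(renderLocation).join("\n\n");
}
```

After writing the rendered blocks to an included configuration file, the proxy would need to reload its configuration for the new web path to take effect.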
Group instance management is implemented using the cross-platform JavaScript runtime environment Node.js [84] together with the programming language TypeScript [85]. Because Node.js works asynchronously and is event-driven, it is highly scalable and can process many concurrent requests [84]. It is only active when requests are pending. Thus, it saves system resources and increases scalability due to its nonblocking behavior (cf. R30). To permit the group instance management access to Docker via the Docker API, the Node.js package Dockerode [86] is used. The management component also provides an API that allows the methods of the collaborative writing activity plug-in to send requests to the management component. The API is implemented with the Node.js web framework Express [87]. Express allows us to handle HTTP and HTTPS requests. It is considered to be fast, minimal, and robust (cf. R32). Requests and responses use the JSON format.
The group instance management creates and manages group instances. When a group member requests access to a specific text editor for the first time, the management component, at the request of the collaborative writing activity plug-in, creates the configurations of the collaborative text editor container, the group instance database container, the group instance database volume, and the virtual network that connects the editor container to the database container. Then, it requests Docker to create the corresponding containers, volumes, and virtual networks. Afterward, it stores the aforementioned configurations with the resulting Docker IDs of containers, volume, and virtual network in the group instance management database. Next, the management component instructs Docker to start the editor container and the group instance database container and initializes the editor, taking into account the group, the requesting group member, the text document, and the token of the user session. In the case of an error, the administration of the CLP is informed; otherwise, the management component updates the reverse proxy configuration by adding a so-called webpath for the newly created editor. Finally, a URL containing the host address of the reverse proxy, the webpath to the respective editor, and the token of the user session is returned to the collaborative writing activity plug-in.
When the collaborative writing activity plug-in of the group member requests further access to a specific editor later on, the group instance management checks whether the editor container and group instance database container are running. If not, the management component instructs Docker to start the editor container and group instance database container and updates the reverse proxy configuration by adding a webpath to the started editor. In both cases, the management component updates the editor with the requesting group member and the user session token. Finally, a URL containing the host address of the reverse proxy, the webpath to the respective editor, and the user session token is returned to the collaborative writing activity plug-in.
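The two access paths described above (first access and later access) can be condensed into the following sketch. It replaces Docker with a small injected interface so that it stays self-contained; the real component uses Dockerode and a PostgreSQL database instead. All names (`ContainerRuntime`, `createInstanceManager`, the `token` query parameter, the `pad-` prefix) are hypothetical, and error handling, volumes, and virtual networks are omitted.

```typescript
// Hypothetical stand-in for the container runtime (Docker in the paper).
interface ContainerRuntime {
  create(name: string): string;   // returns the container ID
  isRunning(id: string): boolean;
  start(id: string): void;
}

interface GroupInstance {
  editorId: string;
  dbId: string;
  webPath: string;
}

function createInstanceManager(runtime: ContainerRuntime, proxyHost: string) {
  // Stand-ins for the management database and the reverse proxy configuration.
  const instances: Record<string, GroupInstance> = {};
  const proxyRoutes: string[] = [];

  function openEditor(groupId: string, sessionToken: string): string {
    let inst: GroupInstance | undefined = instances[groupId];
    if (inst === undefined) {
      // First access: create both containers and remember their IDs.
      inst = {
        editorId: runtime.create(`editor-${groupId}`),
        dbId: runtime.create(`db-${groupId}`),
        webPath: `pad-${groupId}`,
      };
      instances[groupId] = inst;
    }
    // Start the database before the editor so the editor finds its storage.
    const ids = [inst.dbId, inst.editorId];
    for (const id of ids) {
      if (!runtime.isRunning(id)) runtime.start(id);
    }
    if (proxyRoutes.indexOf(inst.webPath) === -1) proxyRoutes.push(inst.webPath);
    return `https://${proxyHost}/${inst.webPath}/?token=${encodeURIComponent(sessionToken)}`;
  }

  return { openEditor, proxyRoutes };
}
```

Because creation and restart share the same start step, a later request for a stopped instance follows exactly the same path as the first request, minus the creation.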
To minimize the usage of hardware resources, the group instance management component implements a watchdog pattern that automatically stops group instances when they are not used. For this purpose, it creates one watchdog per group instance. The watchdog uses the HTTP client library Axios [88] for Node.js to check if the respective Etherpad Lite is in use by examining the Etherpad Lite statistics page. If it is not in use, the management component stops this editor container and the corresponding group instance database container using Docker. To increase robustness (cf. R32), the watchdog checks periodically (currently every 30 s) whether the respective editor container and group instance database container are running. If a container is not running due to an error, the watchdog uses Docker to start the respective containers. The responsiveness (cf. R31) of the collaboration environment may be degraded if too many group instances are running on one host. Stopping unused instances helps to avoid this because resources are occupied only by instances that are currently in use. The shutdown of unused instances also increases robustness because errors related to the editor or database of stopped group instances are avoided by preventing execution beyond their use (cf. R32). The group instance management component, managing the editor container and the group instance database container, behaves as a container manager [89]. To further increase robustness, the management component ensures that when an editor container is stopped, the CouchDB replication feature is used to synchronize the respective group instance database with the fusion database. Afterward, the management component stops the group instance database. The interaction between the management component, group instance databases, and the fusion database is facilitated by the Node.js package Nano [90].
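The decision logic of such a watchdog can be sketched as a pure function that is evaluated on every periodic check. The sketch assumes that the number of connected users has already been read from the statistics page and that the container states have been queried from Docker; the type `InstanceStatus` and the function `decide` are hypothetical names.

```typescript
// Hypothetical snapshot of one group instance at check time.
interface InstanceStatus {
  shouldBeRunning: boolean;  // instance was started and not deliberately stopped
  editorRunning: boolean;
  dbRunning: boolean;
  connectedUsers: number;    // assumed to be derived from the editor statistics
}

type WatchdogAction = "none" | "restart" | "stop";

// Decides what the watchdog should do for one instance on one check.
function decide(status: InstanceStatus): WatchdogAction {
  if (!status.shouldBeRunning) {
    return "none"; // deliberately stopped instances are left alone
  }
  if (!status.editorRunning || !status.dbRunning) {
    return "restart"; // a container died unexpectedly
  }
  if (status.connectedUsers === 0) {
    return "stop"; // idle instance: replicate, then stop editor and database
  }
  return "none";
}
```

A timer firing every 30 s would call `decide` per group instance and translate the result into the corresponding Docker and replication calls.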
When a problem arises that the group instance management component cannot solve (e.g., Docker-related errors, database conflicts, or errors during replication), it immediately informs the CLP administration by using the Telegram Messenger (see below) (cf. R33).
3) Group Instance Management Database: For safety reasons, the group instance management database is implemented in a separate container. Secure communication between the group instance management and group instance management database takes place in a separate virtual network. The database management system PostgreSQL [91] is used to store management data regarding all group instances, including groupID, configurations of the editor container, the group instance database container, the volume of the group instance database, the virtual network between the editor container and the group instance database container for each group, as well as the Docker IDs of both containers, volume, and virtual network of the group.

4) Fusion Database: For safety reasons, the fusion database is implemented in a separate container. Communication between group instance management and fusion database, as well as between fusion database and group instance database, is secured by separate virtual networks. The fusion database is implemented by using the database management system CouchDB because of its replication feature. The management component uses this feature to synchronize the respective group instance database with the fusion database when it stops the collaborative text editor and before its group instance database is stopped.
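As an illustration of the synchronization step, the following sketch builds a one-off replication request for CouchDB's `POST /_replicate` endpoint, which accepts a JSON body with `source` and `target`. The platform itself uses the Nano package for this purpose; the sketch uses a plain request description instead so that it stays self-contained. The hostnames and database names in the usage are hypothetical.

```typescript
// Plain description of an HTTP request; sending it is left to the caller.
interface ReplicationRequest {
  method: "POST";
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Builds a one-off replication from a group instance database to the
// fusion database. `couchBaseUrl` is the CouchDB server that executes it.
function buildReplication(
  couchBaseUrl: string,
  sourceDbUrl: string,
  targetDbUrl: string
): ReplicationRequest {
  return {
    method: "POST",
    url: `${couchBaseUrl}/_replicate`,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ source: sourceDbUrl, target: targetDbUrl }),
  };
}
```

Only after this request has completed successfully would the group instance database container be stopped, so that no change sets are lost.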

5) Telegram Messenger: The group instance management component uses the Telegram Messenger to inform the system administration when a problem arises that it cannot solve (cf. R33). For this purpose, the administration of the collaboration environment defines a Telegram channel and specifies two environment variables of the management component, a chat ID and a security token, used to access this channel. The management component uses the Node.js package Telegraf [92] to send text messages to the Telegram channel [93].
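A notification of this kind can be sketched without Telegraf by describing the underlying Telegram Bot API call, `sendMessage`, which takes a `chat_id` and a `text`. The function name `buildAlert`, the message format, and the way the chat ID and token are passed in are assumptions; on the platform, both values come from environment variables whose names are not given here.

```typescript
// Plain description of an HTTP request; sending it is left to the caller.
interface AlertRequest {
  method: "POST";
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Builds a Bot API sendMessage request that reports a problem with one
// group instance to the administration channel.
function buildAlert(
  botToken: string,
  chatId: string,
  groupId: string,
  problem: string
): AlertRequest {
  const text = `[CLP] group instance ${groupId}: ${problem}`;
  return {
    method: "POST",
    url: `https://api.telegram.org/bot${botToken}/sendMessage`,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ chat_id: chatId, text: text }),
  };
}
```

Keeping the token out of the message body and in the URL path follows the Bot API convention; the token itself should never be logged.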

VII. EXPERIENCES
The CLP described in this article was used in the winter semester 2021/2022, the summer semester 2022, and the winter semester 2022/2023 in a course on introduction to scientific work in the B.Sc. program in psychology at the FernUniversität in Hagen (the German Distance Learning University). The number of students enrolled in the course varies considerably between winter (∼2000 to 2400 students) and summer semesters (∼350 students) due to high school graduation cycles (see Participation in Table I). Most students begin their studies in the fall after completing high school in the summer. In the winter semester 2022/2023, 694 of the 1931 students enrolled in the course participated in a demographic survey. Of these, 52.7% were under the age of 30 and 46.8% were 30 years and older. Furthermore, 69.8% indicated a female and 28.7% a male gender. About 25.5% of students indicated having a migration background and 14.0% reported having a non-German mother tongue. In total, 35.3% reported being employed full time. In this course, the teachers formed groups of eight students who were asked to collaboratively write a summary in German of a selected research article written in English.
In the preparation phase of the course, one of the teachers created the respective collaborative writing activities and manually assigned students to the Moodle course groups. Before using the platform, students and teachers gave their informed consent for their data to be used both for carrying out the collaborative writing task and for research purposes.
The teachers successfully used the process model for collaborative writing tasks to specify their summary writing task. They divided the task of summarizing the research article into the following four parts (see Fig. 5): 1) the theory part, 2) the methods part, 3) the results part, and 4) the discussion part.
For each part, the teachers created a dedicated collaborative writing activity. The students in the groups were instructed to collaboratively write each part in the form of reactive writing. They had six weeks to complete the task. In the first three weeks, the theory part and the methods part had to be written, followed by the results part and the discussion part in the last three weeks. Finally, all parts had to undergo peer review.
In the winter semester 2021/2022, a simplified task without peer review activity was used. In the summer semester 2022 and the winter semester 2022/2023, the complete task including peer review of the individual parts by another student group was assigned. Here, each student in the review group could annotate passages of the draft text that they were asked to review and answer the peer review questionnaire. The questionnaire, prepared by the teacher, contained simple questions (e.g., "Are the results briefly summarized?," "Is the submitted text structured and do its parts build on each other logically?," and "Does the linguistic style follow the guidelines presented?") that were to be answered on a rating scale. Therefore, exactly one text production activity in each collaborative writing activity was defined, followed by one peer review activity. The two collaborative writing activities in each three-week period used the same start and end dates for each activity, allowing concurrent work. After the six-week period ended, the students could not make any more changes, and the peer review began, followed by the respective student assessment phase. Because a written exam was scheduled at the end of the course, the teachers did not use the opportunity to give summative feedback. Only active participation was seen as a necessary requirement for successfully completing the task of writing a collaborative summary.
Teachers randomly assigned the peer review groups to the writing groups using the collaborative writing activity plug-in. The peer reviews provided the students with formative feedback. Due to time constraints in the course, no final text production activity was added after the peer review activity. Thus, the students were not able to further improve their texts.

A. Fulfillment of Requirements
The fulfillment of the functional requirements identified in Section III has already been discussed in Sections V and VI. Teachers were able to create Moodle course groups for a large number of enrolled students (see Table I). The collaborative writing activity plug-in automatically created a collaborative virtual workspace for each group and assigned the group members to it. To provide each group with its own group instance, the group instance management component provided a dedicated pair of Etherpad Lite and CouchDB (group instance database) instances for each participating group. In order to minimize the use of hardware resources, the group instance management component automatically started and stopped the components of the group instances when they were used or not used, respectively. The Nginx reverse proxy was able to distribute the various user accesses to the different Etherpads with sufficient performance. This approach worked well even with up to 2343 students (see Table I). As shown in Fig. 6, user activities resulted in a maximum of 800 processed messages per minute (dashed line) without performance issues when all containers of the collaboration environment were running on single-server hardware. The architecture described allows for the distribution of containers across several servers (nodes). This achieved the necessary scalability (cf. R30).
Table I shows the semester-based statistical data of collaborative summary writing activities. The number of students enrolled in the project course varies greatly between the winter (∼2000 to 2400 students) and summer semesters (∼350 students). Since students are not required to successfully complete the course and are free to choose the semester in which they take the written exam, it is also typical that not all enrolled students actively participate in the respective course. A student is considered participating if the student performed at least one activity in the collaborative text editor (e.g., login, editing, or commenting). Depending on the semester, the percentage of participation varies between just over 50% in the summer semesters and up to 86% in the winter semesters. In total, 11 teachers supervised 2343 students in the winter semester 2021/2022, resulting in a ratio of 213 students per teacher and demonstrating the need for platform support.
The platform proved to be reliable and fully operational during its use in the three semesters. It was consistently accessible, and no major problems or insufficient responsiveness were detected or reported by the students.
A small number of students initially had problems accessing the activity-related group documents in the collaborative text editor (Etherpad Lite) due to an ad blocker or strict browser policies that blocked cross-site authentication between the collaborative writing activity plug-in and the editor. These problems were solved in cooperation with the teachers by recommending that students use a different browser or disable the ad blocker.
In the case of errors, the group instance management component was able to restart the respective group instances within a few seconds (see Table I). The current state of each group instance database was successfully backed up each time an Etherpad Lite instance was stopped. Tests indicated that the data could also have been recovered from the fusion database to the respective group instance database. Therefore, the intended robustness (cf. R32) was achieved. Further tests showed that the group instance management component successfully informed the system administration via Telegram Messenger when problems arose that the management component could not handle by itself (fulfilling R33).

B. Usage Experiences
The high percentage of participating groups (97.8%; cf. Table I) indicated that students accepted the CLP. The percentage of participating students is within the usual range for the respective course. The high number of sessions and logged text change sets implies actual usage of the system for writing. Since 98% of the text change sets consisted of fewer than seven added characters, no extensive copy-paste behavior could be observed.
The log of the respective group's Etherpad Lite was used to identify the sessions of the group. All consecutive activities of a certain user in the log that were less than 30 min apart are considered a single user session (e.g., [94], [95]). Group sessions are identified through the temporal overlap of individual sessions of users in the same group. They denote time intervals when multiple group members simultaneously worked on the text document in the same Etherpad Lite instance. Similarly, all nonoverlapping student sessions of a group denote situations in which students worked on a text document individually. The low percentage of group sessions shows that students mainly used an asynchronous writing strategy with few synchronous group sessions. Overall, students used the CLP to produce the required summary following the collaborative writing process outlined in the instructions and specified by the sequence of collaborative writing activities. Feedback from teachers shows that they find the CLP useful. No serious problems were experienced. Continued use of the platform is planned.
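The session identification described above can be sketched as follows. The sketch groups the log entries of each user into sessions using the 30-min threshold and then marks a session as a group session if it overlaps in time with a session of another user of the same group. The types `LogEntry` and `Session` and both function names are hypothetical, and the log is assumed to contain only the entries of one group.

```typescript
interface LogEntry {
  userId: string;
  at: number; // timestamp in milliseconds
}

interface Session {
  userId: string;
  start: number;
  end: number;
}

const SESSION_GAP_MS = 30 * 60 * 1000;

// Merges consecutive activities of the same user that are less than
// 30 min apart into one session.
function userSessions(log: LogEntry[]): Session[] {
  const sorted = log.slice().sort((a, b) => {
    if (a.userId === b.userId) return a.at - b.at;
    return a.userId < b.userId ? -1 : 1;
  });
  const sessions: Session[] = [];
  for (const entry of sorted) {
    const last = sessions[sessions.length - 1];
    if (last && last.userId === entry.userId && entry.at - last.end < SESSION_GAP_MS) {
      last.end = entry.at;
    } else {
      sessions.push({ userId: entry.userId, start: entry.at, end: entry.at });
    }
  }
  return sessions;
}

// A session counts as a group session if it overlaps with a session of
// a different user in the same group.
function isGroupSession(session: Session, all: Session[]): boolean {
  return all.some(
    (other) =>
      other.userId !== session.userId &&
      other.start <= session.end &&
      session.start <= other.end
  );
}
```

Sessions for which `isGroupSession` returns `false` correspond to the individual, asynchronous work described in the text.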
Teachers could ask the platform administrator at any time to make the text documents, annotations, and chats of the groups, as well as the data of the fusion database, available for the evaluation of group interactions. This was done primarily for research purposes at the end of each semester.
From a researcher's perspective, the platform provides facilities to conduct studies and collect data. To date, several research groups have used the platform to work on research questions related to social psychology, learning analytics, and collaborative writing patterns. The data collected by the CLP enabled teachers and researchers to analyze collaborative writing behavior. For example, data on sessions permitted analysis of collaborative behavior, such as the timing and sequencing of asynchronous and synchronous work phases. Fig. 7 shows the number of student sessions during the six-week period of the collaborative summary writing task. Not surprisingly, students are more active prior to the submission deadlines, at three and six weeks into the course. There were always some students who were working, which is to be expected of distance learning students, who have to balance work, family, and studies. Data on text changes permitted analysis of text production strategies (e.g., the absence of massive copy-paste behavior discussed earlier). The platform allows for data analysis of text editing behavior by using natural language processing approaches (e.g., by counting nouns, spelling errors, and topic coverage), descriptive statistics (e.g., analysis of added characters or comments per user), or process mining (e.g., identification of user sessions or writing strategies).

VIII. CONCLUSION
In distance education courses with large numbers of students, the organization and facilitation of collaborative writing tasks are challenging. When dealing with numerous students and groups, teachers need technology support for planning, specifying, executing, monitoring, and evaluating the collaborative writing tasks in their courses. The following two research questions need to be answered for the design of such technology support.
RQ1) What model of collaborative writing tasks is suitable for specifying such tasks in large-scale distance learning courses?
RQ2) How can a CLP support the planning, specification, execution, monitoring, and evaluation of tasks according to the aforementioned model?
The process model we presented in Section II answers RQ1. It generalizes features described in the literature on collaborative writing tasks. Using the model, teachers can construct concrete collaborative writing tasks to be used in their courses. The collaborative summary writing task described in Section VII demonstrates its applicability.
The platform design and implementation presented in Sections V and VI answer RQ2. We used the process model to derive the functional requirements needed to support model-compliant tasks on the platform. In addition, we specified nonfunctional requirements for the platform to be used in educational settings and with many thousands of users. While the conceptual platform design (see Section V) provides a general architecture for such a platform independent of a specific LMS or shared editor, Section VI shows how such a platform can be implemented based on Moodle, Etherpad Lite, and Docker. The platform makes it easier for teachers to create groups, automatically assigns members to group workspaces, and enables scalable asynchronous and synchronous text editing and group communication. It also respects the European information security and data protection regulations, helps to monitor the learning and writing process, and offers learning support in different learning and writing phases.
The support provided by the platform goes beyond current approaches in education that support collaborative writing tasks in a distance learning context. Usually, teachers are expected to compose and provide collaborative learning environments/tools from separate technologies (e.g., a learning management system, shared editors, and conferencing systems) and orchestrate the collaborative writing process. While this may work for a limited number of students and groups, this is not possible for a few teachers dealing with thousands of students and groups. Here, the platform supporting the entire life cycle of a collaborative writing task provides a unique benefit.
We applied the platform over a period of three semesters in distance learning courses with more than 4500 students in total. Our experiences show that the platform provided a scalable and robust environment for collaborative writing tasks and was accepted by teachers and students.
These results are beneficial for tool developers, teachers, and researchers. Tool developers can benefit from the requirements, platform architecture, and components when working on support for similar collaborative learning activities. Teachers are able to create collaborative writing tasks that can be executed and evaluated with less work on the presented CLP. Due to its robustness, scalability, and the wealth of interaction data stored, the CLP enables analysis of collaborative writing behavior by teachers and researchers.
By mid-2024, we plan to release the implementation of our platform as open source (see footnotes 1 and 2). In the future, we plan to analyze the impact of horizontal scalability by distributing group instances across different compute nodes. We are currently working on analyzing the writing and collaboration behavior of students. Furthermore, we plan to extend the CLP with a dashboard for both teachers and students, which will support both monitoring and self-regulated learning. Finally, we are exploring adaptive support for students in the collaborative writing process.

Fig. 1. Process model of collaborative writing tasks. The transitions in this figure define when documents are passed to the next phase.

Fig. 5. Overview of course progression. Each row represents a collaborative writing activity. Gray rectangles represent the phases and activities.

Fig. 6. Etherpad Lite messages processed per minute: Dashed line denotes maximum of all days; dash-dotted line denotes maximum average of all days; solid line denotes average of the day; dotted line denotes three-week interval.

Fig. 7. Number of student sessions per minute: Dashed line denotes maximum of all days; dash-dotted line denotes maximum average of all days; solid line denotes average of the day; dotted line denotes three-week interval.
The number of teacher sessions, in which teachers used the platform for monitoring, rose from 31 in the winter semester 2021/2022 to 333 in the winter semester 2022/2023, despite a slight decrease in the number of teachers (11 in the winter semester 2021/2022 versus 9 in the summer semester 2022 and the winter semester 2022/2023). This increase can be interpreted as an indicator of the growing acceptance of the platform by teachers.