Community

Building a library
Building a community

Aim
The aim of this community page is to bring together researchers and to build a library containing short descriptions of keystroke logging data collections and/or scripts (e.g., in R or Python). We briefly describe these materials and refer to the repositories researchers used to share them (e.g., Zenodo, OSF, or GitHub).

In the next stage, discussion fora and other knowledge bases will be added to further strengthen the keystroke logging community in writing and translation studies.

Don’t hesitate to contact us!

Tian, Y., Crossley, S. A., & Van Waes, L. (2025). The KLiCKe Corpus: Keystroke Logging in Compositions for Knowledge Evaluation.

Data: ⌧ Inputlog □ Scriptlog x Other:

Code: □ R-script □ Python □ Other:

Authors

Yu Tian, Scott Crossley, & Luuk Van Waes (2025)

Keywords

Corpus, Keystroke logging, Writing quality

Materials

The Keystroke Logging in Compositions for Knowledge Evaluation (KLiCKe) corpus is a large-scale dataset featuring detailed keystroke logs from ~5,000 argumentative essays written by adult English writers in the United States. The KLiCKe corpus was collected via crowdsourcing from January through November 2022. The corpus records each keystroke and mouse operation, along with corresponding time stamps and cursor position information, using a web-based keystroke logging program. Data are provided in .csv format (for custom analysis) and .idfx format (compatible with Inputlog). Holistic writing quality scores for all essays are included, derived from double-blind ratings by trained human raters using a standardized grading scale. Additionally, the corpus offers demographic details on the writers, including age, gender, native language, ethnicity/race, and education level, as well as typing skills and vocabulary knowledge. As a publicly available resource, KLiCKe bridges gaps in process-oriented writing research and offers new possibilities for advancing writing assessment and instruction.

The following materials are made available:

Approximately 5,000 persuasive texts
Keystroke logs of the writing process for the ~5,000 essays, presented in both CSV and IDFX formats
The holistic writing quality scores for the ~5,000 essays
Demographic information for the essay writers
The results from the Inputlog copy tasks (Van Waes et al., 2019), presented as keystroke logs in both CSV and IDFX formats
The results from the Lexical Test for Advanced Learners of English (LexTALE; Lemhöfer & Broersma, 2012), presented in CSV format
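
As a rough illustration of the kind of custom analysis the CSV logs allow, the sketch below derives inter-keystroke intervals from time-stamped key presses. The column names (event_id, down_time_ms, key) and the sample rows are placeholders invented for this example; they are not the documented KLiCKe schema, so check the repository for the actual field names.

```python
import csv
import io

# Hypothetical log excerpt: the column names are placeholders,
# not the documented KLiCKe schema.
SAMPLE_LOG = """event_id,down_time_ms,key
1,0,T
2,180,h
3,350,e
4,2600,Space
5,2790,c
"""


def inter_key_intervals(csv_text, time_col="down_time_ms"):
    """Return the time differences (ms) between consecutive key presses."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    times = [int(row[time_col]) for row in rows]
    return [later - earlier for earlier, later in zip(times, times[1:])]


print(inter_key_intervals(SAMPLE_LOG))  # [180, 170, 2250, 190]
```

With a real file, the same function could be fed the file contents read from disk, after adjusting time_col to the actual column name.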

Use of files

These files can be downloaded and used freely. However, if you wish to use them in your research project, please reference this repository as an original source: https://github.com/terryyutian/KLiCKe-Corpus.

Publications

Tian, Y., Crossley, S. A., & Van Waes, L. (2025 – accepted for publication). The KLiCKe Corpus: Keystroke Logging in Compositions for Knowledge Evaluation. Journal of Writing Research. https://www.jowr.org/jowr/article/view/1556

Access

https://github.com/terryyutian/KLiCKe-Corpus

Contact

Yu Tian: ytian126@asu.edu

Tian, Y., Kim, M., & Crossley, S. (2024). Making sense of L2 written argumentation with keystroke logging

Data: ⌧ Inputlog □ Scriptlog □ Other:

Code: ⌧ R-script □ Python □ Other:

Authors

Yu Tian, Minkyung Kim & Scott Crossley (2024)

Keywords

L2 written argumentation, Keystroke logging, Cognitive activities

Materials

In this study, the authors examined associations between L2 writing behaviors manifested by keystroke analytics and the formulation of argument elements in the writing process.

The study used a dataset that contains 99 persuasive texts produced by L2 undergraduate students at a U.S. university. It also includes the writing process information for these texts, which was recorded using Inputlog 7 (Leijten & Van Waes, 2015). The persuasive texts were annotated using the rubric for classifying argumentative discourse elements found in Crossley et al. (2022). The rubric comprises five categories as the building blocks of the argumentation framework: final claim, primary claim, counterclaim, rebuttal, and data.

The following materials are made available:
• 99 persuasive texts
• keystroke logs of the writing process
• argument element annotation results
• keystroke measures for each argument element in the texts
• demographic information for the L2 writers
• R script used to calculate inter-annotator agreement (IAA) for the dataset
• R script used to build the MCMCglmm model for the statistical analysis

Use of files

These files can be downloaded and used freely. However, if you wish to use them in your research project, please reference this repository as an original source: https://github.com/terryyutian/Argumentation-Keystroke

Publications

Tian, Y., Kim, M., & Crossley, S. (2024). Making sense of L2 written argumentation with keystroke logging. Journal of Writing Research, 15(3), 435-461. https://doi.org/10.17239/jowr-2024.15.03.01

Access

https://doi.org/10.17239/jowr-2024.15.03.01
https://github.com/terryyutian/Argumentation-Keystroke

Contact

Yu Tian: tian.yu.research@gmail.com

Evgeny Chukharev-Hudilainen et al. (2023). CyWrite: A web-based word processor with built-in keystroke logging and eye-tracking

Data: □ Inputlog □ Scriptlog ⌧ Other: CyWrite

Code: ⌧ R-script □ Python ⌧ Other: JavaScript, HTML, CSS

Authors

Evgeny Chukharev-Hudilainen, Hui-Hsien Feng, Mark Torrance, Aysel Saricaoglu, Emily Dux Speltz, & Jens Roeser (2013–2024)

Keywords

keystroke logging, eye tracking, writing process intervention, design-based research

Materials

In this project, we developed a web-based word processor that has built-in keystroke logging and eye-tracking capabilities. We have used this system, called “CyWrite”, for various applications ranging from automated writing evaluation to real-time writing-process feedback provision. The system includes built-in capabilities for visually replaying log files, and for exporting them in a machine-readable format.

In the GitHub repository, the authors have uploaded the JavaScript, HTML, and CSS code for operating the CyWrite word processor with embedded keystroke logging and eye-tracking.

In the OSF repository, the authors have uploaded R scripts for calculating behavioral writing-process measures from keystroke and eye movement logs generated by the CyWrite system.

Use of files

These files can be downloaded and used freely. However, if you wish to use them in your research project (whether you use them in their current state or if you edit them), please reference the OSF repository and/or the GitHub repository as an original source: doi:10.17605/OSF.IO/X9B42 and/or https://github.com/chukharev/cywrite. Please cite relevant publications from the list below.

Publications

Dux Speltz, E., & Chukharev-Hudilainen, E. (2021). The effect of automated fluency-focused feedback on text production. Journal of Writing Research, 13(2), 231–255. https://doi.org/10.17239/jowr-2021.13.02.02 (open access)

Chukharev-Hudilainen, E., Saricaoglu, A., Torrance, M., & Feng, H.-H. (2019). Combined deployable keystroke logging and eyetracking for investigating L2 writing fluency. Studies in Second Language Acquisition, 41(3), 583-604. https://doi.org/10.1017/S027226311900007X (open access)

Chukharev-Hudilainen, E. (2019). Empowering automated writing evaluation with keystroke logging. In Lindgren, E., & Sullivan, K. P. H. (Eds.) Observing writing: insights from keystroke logging and handwriting. Studies in Writing (pp. 125-142). Leiden, Netherlands: Brill Publishing.

Access

doi:10.17605/OSF.IO/X9B42

https://github.com/chukharev/cywrite

Contact

Evgeny Chukharev-Hudilainen: evgeny@iastate.edu

Hall, Baaijen & Galbraith (2022). Constructing Theoretically Informed Measures of Pause Duration in Experimentally Manipulated Writing

Data: x Inputlog o Scriptlog o Other:
Code: x R-script o Python x Other: VBA (Excel macro)

Authors
Sophie Hall, Veerle Baaijen & David Galbraith (2022)

Keywords
pause analysis, pause location, mixture modeling

Materials
This study is designed to demonstrate how to (i) isolate relevant transitions within a text and calculate their durations, and (ii) use mixture modelling to identify structure within the distributions of pauses at different locations.

In the OSF-repository, the authors have uploaded:

Excel VBA keystroke macros: 17 VBA macro script files that add extra pause coding to Inputlog keystroke data. These scripts use Inputlog general analysis files (XML files) converted into xlsx format (Microsoft Office Excel 2016). There are also step-by-step instructions on how to prepare and run the macros on the xml/xlsx files.
R-script: an example R Markdown document that demonstrates how mixture models were fitted to Inputlog keystroke data that had been processed through the VBA macros (and subsequently saved in CSV format). More specifically, the example fits a 3-component mixture model to the linear between-word pause times.
Calculation framework: The framework that the authors used to conceptualise and identify several types of pauses based on their associated keystrokes: linear within-word, linear between-word, linear between-subsentence, linear between-sentence and linear between-paragraph.
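
To give a feel for what location-based pause coding involves, here is a minimal sketch that labels the pause before each typed character with one of the five location categories named above. It is a deliberately simplified reading of those categories and not the authors' VBA macros: the punctuation rules and the function names are assumptions made for illustration, and only linear typing without deletions or cursor movements is handled.

```python
def classify_transition(before, next_char):
    """Label the pause preceding next_char, given the text typed so far."""
    if not before or not next_char.isalnum():
        return "other"
    if before[-1].isalnum():
        return "within-word"
    stripped = before.rstrip()
    if not stripped:
        return "other"
    trailing = before[len(stripped):]  # whitespace typed after the last visible character
    if "\n" in trailing:
        return "between-paragraph"
    last_visible = stripped[-1]
    if last_visible in ".!?":
        return "between-sentence"
    if last_visible in ",;:":
        return "between-subsentence"
    return "between-word" if trailing else "other"


def label_pauses(keystrokes):
    """keystrokes: list of (char, pause_before_ms); returns (label, pause) pairs."""
    typed = ""
    labelled = []
    for char, pause in keystrokes:
        labelled.append((classify_transition(typed, char), pause))
        typed += char
    return labelled


sample = [("H", 0), ("i", 120), (".", 200), (" ", 150), ("W", 900), ("e", 110)]
print(label_pauses(sample))
```

The labelled pause durations could then be grouped by category before any distributional modelling is applied to them.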

Use of files
These files can be downloaded and used freely. However, if you wish to use them in your research project (whether you use them in their current state or if you edit them), please reference the OSF repository as an original source: doi:10.17605/OSF.IO/R53H2 and the associated paper (see below).

Publication
Hall, S., Baaijen, V. M., & Galbraith, D. (2022). Constructing theoretically informed measures of pause duration in experimentally manipulated writing. Reading and Writing, 1-29.
https://doi.org/10.1007/s11145-022-10284-4

Access
doi:10.17605/OSF.IO/R53H2

Contact
Sophie Hall: s.m.hall@soton.ac.uk

Van Waes, Vandersmissen, Rossetti & Leijten (2021). Inputlog Copy Task Corpus: Exploring and defining typing skills

Data: x Inputlog o Scriptlog o Other:
Code: x R-script o Python o Other:

Authors
Luuk Van Waes, Benjamin Vandersmissen, Alessandra Rossetti & Mariëlle Leijten (2021)

Keywords
copy task, typing skill, multilingual, dashboard

Materials
The Inputlog copy task allows researchers to investigate different levels of lexicality in more detail. At the moment the copy task has been developed in twelve different languages. The software is open-access and allows researchers to adapt the tasks to their specific needs.

In the Zenodo-repository, the authors have uploaded a 5k corpus of copy task analyses in different languages. A dynamic dashboard application (R-Shiny) enables researchers to explore and filter the corpus. Self-collected data can also be uploaded and included in the exploration.

Use of files
These files can be downloaded and used freely. However, if you wish to use them in your research project please reference the Zenodo repository as an original source: DOI: 10.5281/zenodo.5803400 and/or https://inputlog-analysis.uantwerpen.be/expert

Publications

Van Waes, L., Leijten, M., Pauwaert, T., & Van Horenbeeck, E. (2019). A multilingual copy task: Measuring typing and motor skills in writing with Inputlog. Journal of Open Research Software, 7(1:30), 1-8. https://doi.org/10.5334/jors.234 (open access)
Van Waes, L., Leijten, M., Roeser, J., Olive, T., & Grabowski, J. (2021). Designing a Copy Task to Measure and Assess Typing Skills in Writing Research. Journal of Writing Research, 13(1), 107-153
https://doi.org/10.17239/jowr-2021.13.01.04 (open access)
Van Waes, L., Leijten, M., Mariën, P., & Engelborghs, S. (2017). Typing competencies in Alzheimer’s disease: An exploration of copy tasks. Computers in Human Behavior, 73, 311–319.
https://doi.org/10.1016/j.chb.2017.03.050 (open access)

Access
https://zenodo.org/record/5803401#.Y7PwAXbMKUk
DOI: 10.5281/zenodo.5803400

Contact
Luuk Van Waes: luuk.vanwaes@uantwerpen.be

Cislaru & Olive (2018-2024). ANR Pro-TEXT: Processes of Textualization: Linguistic, Psycholinguistic, and Machine Learning Modeling

Data: x Inputlog x Scriptlog o Other:
Code: o R-script x Python o Other:

Authors
Georgeta Cislaru & Thierry Olive (2019-2024)

Keywords
writing bursts, linguistic analysis, pause analysis

Materials
This interdisciplinary research develops a comprehensive analysis of the textualization process, i.e. the real-time progressive construction of a text. We study bursts of writing, which are textual segments produced between two pauses, in order to provide insight into the relation between regularities of language performance and the cognitive and contextual constraints. The aim is to understand some of the layout mechanisms that allow language to give rise to novelty out of known and prefabricated data. The Pro-TEXT project develops linguistic and psycholinguistic methods and machine-learning tools to model these regularities and provide evidence about patterns of text processing. Incremental machine-learning approaches fill a gap in the analysis and representation of real-time language performance, while revealing regularities that remain unnoticed under the methodologies used previously.
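
The notion of a burst, a textual segment produced between two pauses, can be sketched in a few lines. The example below is only illustrative: the 2000 ms pause threshold, the input layout of (character, timestamp) pairs, and the function name are assumptions, not the Pro-TEXT project's own settings or tools.

```python
def split_into_bursts(keystrokes, pause_threshold_ms=2000):
    """Split typed characters into bursts separated by long pauses.

    keystrokes: list of (char, timestamp_ms) in typing order.
    The threshold is an illustrative assumption.
    """
    bursts = []
    current = ""
    previous_time = None
    for char, timestamp in keystrokes:
        long_pause = (
            previous_time is not None
            and timestamp - previous_time >= pause_threshold_ms
        )
        if long_pause and current:
            bursts.append(current)
            current = ""
        current += char
        previous_time = timestamp
    if current:
        bursts.append(current)
    return bursts


sample = [("I", 0), (" ", 150), ("a", 300), ("m", 450),
          (" ", 2600), ("h", 2750), ("e", 2900), ("r", 3050), ("e", 3200)]
print(split_into_bursts(sample))  # ['I am', ' here']
```

Lowering or raising the threshold changes how finely the text is segmented, which is why the value should be reported alongside any burst-based measure.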

Corpus
Linguistically annotated data will be available at: https://pro-text.huma-num.fr/ressources/

Corpus	Words	Texts	Writers	Genres	Writing expertise
Academic	70464	26	MA students	Mini-theses in Linguistics	Semi-experts
Professional	34504	10	Social workers	Social reports on child protection	Experts
Experimental	63533	165	BA students	Essays on different subjects	Experts
Children	20306	183	Pupils (3rd-6th grades)	Narrative texts & essays	Beginners
Translation	13682	38	BA students	EN-FR translation of medical texts & original texts produced in FR	Semi-experts
Total	202489	422	–	6 types of texts, 3 experimental situations	–

Academic Subcorpus
This subcorpus contains mini theses written by MA students as part of a course in discourse analysis. The texts were written over several writing sessions on students’ computers. Since this type of writing task was novel to the participants, they were evaluated as semi-experts. The students involved in data collection are native or near-native speakers of French. There are 26 different authors in the subcorpus.

Professional Reports on child protection Subcorpus
The reports were written by social workers as part of their regular tasks over several sessions. Each text has at least two authors. Since the participants wrote these types of texts routinely, they were evaluated as experts. There are 9 different authors, and they are all native speakers of French.

Experimental Subcorpus
These texts were produced as part of three psycholinguistic experiments on the writing process. In each experiment, processing difficulty of one of the main writing components (planning, formulating or revising) was manipulated. The texts produced in these experiments were written by BA students and were essays on different social topics, such as smoking at the university and public transportation. Each text was written in a single session. Since this type of writing task is common in the French educational system, the authors were evaluated as experts. The information about the experimental setting and experimental vs control setting is available for each text. There are 83 authors in this subcorpus, and they are all native or near-native speakers of French.

Children Subcorpus
The texts in this part of the corpus were written by schoolchildren from three age groups: 3rd year of primary school (ca. 8 years old), 5th year of primary school (ca. 10 years old), and 1st year of secondary school (ca. 11 years old). Each participant wrote a narrative text and an essay on a given subject. The texts were recorded at school in one writing session. The information about the age group, the type of text, and the order of the production of the two texts is available for each text. There are 92 authors in total, and they are considered to hold a language proficiency level corresponding to their grade.

Partial download and overview available: http://syled.univ-paris3.fr/protext/PLAY-TEXTE/CORPUS-ENFANTS-Poitiers/index-inputlog.html

The code of each text includes metadata (for example, in P22C6N1, P22 = identification number, C6 = 6th grade, N = narrative). When users of the interface click on [Voir] in the left column, then on [Text final] in the central screen, the final text and event segmentations are displayed.
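
For batch processing, such text codes could be unpacked programmatically. The sketch below is a guess at the pattern based solely on the single example P22C6N1: the regular expression, the function name, and the dictionary keys are assumptions, only the letter N is mapped because no other text-type letter is documented here, and the trailing digit is kept as an opaque suffix because its meaning is not explained.

```python
import re

# Pattern inferred from the one documented example (P22C6N1); an assumption.
CODE_PATTERN = re.compile(r"^(P\d+)C(\d+)([A-Za-z])(\d*)$")


def parse_text_code(code):
    """Split a text code into identification number, grade, and text type."""
    match = CODE_PATTERN.match(code)
    if match is None:
        raise ValueError(f"Unrecognised text code: {code!r}")
    writer_id, grade, type_letter, suffix = match.groups()
    return {
        "id": writer_id,
        "grade": int(grade),
        # Only "N" (narrative) is documented; other letters are passed through.
        "text_type": "narrative" if type_letter.upper() == "N" else type_letter,
        "suffix": suffix,  # meaning not documented; kept as-is
    }


print(parse_text_code("P22C6N1"))
# {'id': 'P22', 'grade': 6, 'text_type': 'narrative', 'suffix': '1'}
```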

Translation Subcorpus
This subcorpus was written by BA students of translation studies. Each participant produced two types of text: an original text in French describing an image, and a translation of a medical text from English to French. Information about the author and the type of text is available for each text in the subcorpus. Given the type of the task and the fact that the text had to be produced in a highly specialized discourse genre, the students were evaluated as semi-experts. There are 19 authors in total in this subcorpus, and they have a native or near-native proficiency level in French.

Partial download and overview available: http://syled.univ-paris3.fr/protext/PLAY-TEXTE/corpus-Traductions/index-inputlog.html

The code of each text includes metadata containing the identification number. When users of the interface click on [Voir] in the left column, then on [Text final] in the central screen, the final text and event segmentations are displayed.

Use of files
The scripts and data will soon be available for download and free use.

Selected publications

Cislaru, G., & Olive, T. (2018). Le processus de textualisation: Analyse des unités linguistiques de performance écrite. Louvain-la-Neuve: De Boeck. https://www.cairn.info/le-processus-de-textualisation--9782807314832.htm
Cislaru, G., & Olive, T. (2019). Dynamiques d’amorçage au cours du processus de textualisation dans l’écriture enregistrée. In M.-J. Béguelin, G. Corminboeuf, & F. Lefeuvre (Eds.), Types d’unités et procédures de segmentation (pp. 149-162). Limoges: Lambert Lucas. ISBN/EAN 978-2-35935-287-0
Cislaru, G., & Olive, T. (2021). Que peut nous apprendre l’écriture enregistrée en temps réel au sujet des figures de construction ? L’Information grammaticale, 169, 21-29. https://hal.science/hal-03351391
Feltgen, Q., Cislaru, G., & Benzitoun, C. (2022). Etude linguistique et statistique des unités de performance écrite : le cas de et. In SHS Web of Conferences, 138, 1-17. https://www.shs-conferences.org/articles/shsconf/pdf/2022/08/shsconf_cmlf2022_10001.pdf
Miletic Haddad, A., Benzitoun, C., Cislaru, G., & Herrera-Yanez, S. (2022). Pro-TEXT: An annotated corpus of key-stroke logs. In Proceedings of the Thirteenth Language Resources and Evaluation Conference (pp. 1732-1739). https://aclanthology.org/2022.lrec-1.184.pdf

Access
https://pro-text.huma-num.fr/

Contact
Georgeta Cislaru: georgeta.cislaru@sorbonne-nouvelle.fr

Roeser, De Maeyer, Leijten & Van Waes (2022). Fitting a mixture model on copy-task data

Data: x Inputlog o Scriptlog o Other:
Code: x R-script o Python o Other:

Authors
Jens Roeser, Sven De Maeyer, Mariëlle Leijten & Luuk Van Waes (2022)

Keywords
copy task, keystroke modelling, autoregression, mixture models, Bayesian statistical models, typing skills

Materials
The following materials are provided:

R-Walk-through: In the RPubs repository (RStudio), the authors have uploaded an entirely self-contained R walk-through. This step-by-step guide describes how to fit a finite mixture model of two log-normal distributions using the statistical program R in combination with Stan (the Stan component must be downloaded manually from the OSF/GitHub repository). The guide shows how to fit a mixture model to copy-task data and then calculate the differences between two Inputlog copy-task components. Data and code can be loaded from the repository.
R-code for visuals: In order to visualize the analyses, the ‘tidyverse’ component is used. Code for data wrangling is provided as well as code that illustrates how to work with Bayesian posterior samples for statistical inference.
R-script: A mixture model analysis of typing disfluencies (Stan and R code) demonstrated on Inputlog copy-task data.

https://github.com/jensroes/Typing-disfluency
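
For readers who want an intuition for a two-component log-normal mixture before opening the walk-through, the following sketch fits one to a handful of inter-keystroke intervals. It swaps in plain maximum-likelihood estimation via the EM algorithm in place of the authors' Bayesian model in Stan, so it is not their method; the sample intervals, the quartile-based starting values, and the lower bound on the standard deviations are all assumptions chosen for illustration.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def fit_two_lognormal_mixture(intervals_ms, iterations=200):
    """Fit a two-component normal mixture to log intervals with EM."""
    logs = sorted(math.log(v) for v in intervals_ms)
    n = len(logs)
    # Starting values (assumption): quartiles as means, overall spread as sd.
    means = [logs[n // 4], logs[(3 * n) // 4]]
    overall_mean = sum(logs) / n
    overall_sd = math.sqrt(sum((x - overall_mean) ** 2 for x in logs) / n)
    sds = [overall_sd, overall_sd]
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: probability that each observation belongs to each component.
        resp = []
        for x in logs:
            dens = [weights[k] * normal_pdf(x, means[k], sds[k]) for k in range(2)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: update weights, means, and standard deviations.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, logs)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, logs)) / nk
            sds[k] = max(math.sqrt(var), 0.05)  # floor avoids a collapsing component
    return {"weights": weights, "log_means": means, "log_sds": sds}


# Invented intervals (ms): ten fluent transitions and four long hesitations.
SAMPLE_INTERVALS = [120, 135, 150, 160, 175, 140, 155, 165, 130, 145,
                    800, 1200, 950, 1500]
fit = fit_two_lognormal_mixture(SAMPLE_INTERVALS)
print([round(w, 2) for w in fit["weights"]])
print([round(math.exp(m)) for m in fit["log_means"]])
```

The first component captures the fluent transitions and the second the hesitations; the mixing weight of the second component can be read as an estimated proportion of disfluent keystroke transitions.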

Use of files
The scripts can be downloaded and used freely. However, if you wish to use them in your research project please reference the RPubs or GitHub repository as an original source: https://rpubs.com/jensroes/765467 and/or https://doi.org/10.1007/s11145-021-10203-z

Publications

Roeser, J., De Maeyer, S., Leijten, M., & Van Waes, L. (2021). Modelling typing disfluencies as finite mixture process. Reading and Writing. https://doi.org/10.1007/s11145-021-10203-z
Van Waes, L., Leijten, M., Pauwaert, T., & Van Horenbeeck, E. (2019). A multilingual copy task: Measuring typing and motor skills in writing with Inputlog. Journal of Open Research Software, 7(1:30), 1-8. https://doi.org/10.5334/jors.234 (open access)
Van Waes, L., Leijten, M., Roeser, J., Olive, T., & Grabowski, J. (2021). Designing a Copy Task to Measure and Assess Typing Skills in Writing Research. Journal of Writing Research, 13(1), 107-153
https://doi.org/10.17239/jowr-2021.13.01.04 (open access)

Access
https://rpubs.com/jensroes/765467
https://osf.io/y3p4d/

Contact
Jens Roeser: jens.roeser@ntu.ac.uk

Rossetti & Van Waes (2022). Text simplification in second language: process and product data

Data: x Inputlog o Scriptlog o Other:
Code: o R-script o Python o Other:

Authors
Alessandra Rossetti & Luuk Van Waes (2022)

Keywords
corporate social responsibility, text revision, text simplification, second-language writing, plain language training, cognitive effort, text analysis, keystroke logging

Materials
The data made available are taken from an experimental study with second-language university students. We adopted a pre-test and post-test design, and randomly divided participants into an experimental and a control group. In the pre-test, participants were given an extract of a corporate report dealing with sustainability and were asked to revise it to make it easier to read for a lay customer. Subsequently, they took part in training. The experimental group received training on both plain language and sustainability, while the control group received training exclusively on the topic of sustainability. In the post-test session, all participants were assigned a second extract of a corporate report dealing with sustainability, and were asked again to make it easier to read for a lay customer by applying what they had learned from their respective training.

Two main types of data were made available:

keystroke logging data via Inputlog, available as IDFX files (process data). We analyzed the keystroke logging data using Inputlog analyses: https://www.inputlog.net/
the texts simplified by the students, available as Microsoft Word documents (product data). We analyzed these texts in terms of readability using Coh-Metrix: http://cohmetrix.com/

Use of files
The corpora can be downloaded and used freely. However, if you wish to use them in your research project please reference the Zenodo-repository as an original source: https://zenodo.org/record/6720290#.YrWL3XZByUl

Publications

Rossetti, A., & Van Waes, L. (2022). Revision of business content on corporate social responsibility: Measuring the impact of training on the cognitive effort of second-language university students. Hermes – Journal of Language and Communication in Business, 62, 27-54. PDF | https://doi.org/10.7146/hjlcb.vi62.132262
Rossetti, A., & Van Waes, L. (2022). It’s not just a phase: Investigating text simplification in a second language from a process and product perspective, Frontiers in Artificial Intelligence, 5:983008 | https://doi.org/10.3389/frai.2022.983008
Rossetti, A., & Van Waes, L. (2022). Accessible communication of CSR: Development and preliminary evaluation of an online module. Business and Professional Communication Quarterly, 85(1), 52-79 | https://doi.org/10.1177/23294906221074324

Access
https://zenodo.org/record/6720290#.YrWL3XZByUl

Contact
Alessandra Rossetti: Alessandra.Rossetti@vub.be

Meulemans, Leijten, Van Waes, Engelborghs & De Maeyer (2022). CSV files and R script : Writing process data of typed picture description by 15 cognitively impaired patients and 15 healthy controls

Data: x Inputlog o Scriptlog o Other:
Code: x R-script o Python o Other:

Authors
Catherine Meulemans, Mariëlle Leijten, Luuk Van Waes, Sebastiaan Engelborghs & Sven De Maeyer (2022)

Keywords
writing processes, word categories, keystroke logging, Alzheimer’s disease, dementia, mild cognitive impairment

Materials
Writing process data of 15 cognitively impaired patients and 15 age- and gender-matched healthy controls were obtained. Each of them completed two typed picture description tasks that were logged with Inputlog, a keystroke logging tool. Variables included time on task; number of characters, pauses, and Pause-bursts per minute; proportion of pause time; duration of Pause-bursts; and pause time between words. The effect of pauses preceding specific word categories was also analyzed for pause time between words.
The data were used to explore if the observation of writing behavior can assist in the screening and follow-up of mild cognitive impairment (MCI) and mild dementia due to Alzheimer’s disease (AD).

Two main types of data were collected:
1. CSV files that were used for the analyses, and
2. R-scripts.
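
Some of the variables listed above can be derived directly from a sequence of inter-keystroke intervals. The sketch below computes time on task, pauses per minute, and proportion of pause time; it is not the authors' R code, and the 2000 ms pause threshold, the function name, and the invented sample intervals are assumptions made for illustration.

```python
def pause_measures(intervals_ms, pause_threshold_ms=2000):
    """Summarise pausing behaviour from inter-keystroke intervals (ms).

    The pause threshold is an illustrative assumption.
    """
    total_ms = sum(intervals_ms)
    pauses = [i for i in intervals_ms if i >= pause_threshold_ms]
    minutes = total_ms / 60000
    return {
        "time_on_task_ms": total_ms,
        "pauses_per_minute": len(pauses) / minutes if minutes else 0.0,
        "proportion_pause_time": sum(pauses) / total_ms if total_ms else 0.0,
    }


# Invented data: 40 fluent transitions of 500 ms and 8 pauses of 5000 ms.
sample_intervals = [500] * 40 + [5000] * 8
print(pause_measures(sample_intervals))
```

In this invented one-minute session, eight intervals exceed the threshold and together account for two thirds of the time on task.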

Use of files
The scripts and data can be downloaded and used freely. However, if you wish to use them in your research project please reference the Zenodo-repository as an original source: https://doi.org/10.5281/zenodo.5942516

Publications
Meulemans, C., Leijten, M., Van Waes, L., Engelborghs, S., & De Maeyer, S. (2022). Cognitive writing process characteristics in Alzheimer’s Disease. Frontiers in Psychology, 13 (872280) | https://doi.org/10.3389/fpsyg.2022.872280

Access
https://doi.org/10.5281/zenodo.5942516

Contact
Mariëlle Leijten: marielle.leijten@uantwerpen.be

Van Waes, Leijten, Pauwaert & Van Horenbeeck (2019). A Multilingual Copy Task: Measuring Typing and Motor Skills in Writing with Inputlog

Data: x Inputlog o Scriptlog o Other:
Code: o R-script o Python x Other: JavaScript

Authors
Luuk Van Waes, Mariëlle Leijten, Tom Pauwaert & Eric Van Horenbeeck (2019)

Keywords
copy tasks, typing skills, motor coordination, keyboarding, writing studies, interkey intervals, bigram frequency, tapping task, typing fluency

Materials
A strictly controlled copy task was developed guiding participants through seven modules in which different prompts are presented, each dealing with complementary levels of lexicality. Fine-grained keystroke logging allows for a range of analyses (www.inputlog.net).

The copy task can be used in – and together with – all types of writing process studies. At the moment the copy task has been developed in ten different languages. The software is open-access and allows researchers to adapt the tasks to their specific needs.

Two main types of materials were collected:

Copy task builder: This tool allows the researcher to adapt and/or create a copy task and adapt the task flow, the modules, and the instruction.
Copy task JavaScript: The modular concept of the JavaScript code provided allows the development of new components creating other copy task functions such as audio-based or graphically prompted copying. The task instructions could also be enhanced with video guidelines.

Use of files
The program code is open access and can be downloaded and used freely. However, if you wish to use it in your research project please reference the GitHub-repository as an original source: https://github.com/lvanwaes/Inputlog-Copy-Task and/or DOI: https://doi.org/10.5281/zenodo.2908966

Publications
Van Waes, L., Leijten, M., Pauwaert, T., & Van Horenbeeck, E. (2019). A multilingual copy task: Measuring typing and motor skills in writing with Inputlog. Journal of Open Research Software, 7(1:30), 1-8. https://doi.org/10.5334/jors.234

Access
https://doi.org/10.5281/zenodo.2908966
https://github.com/lvanwaes/Inputlog-Copy-Task

Contact
Luuk Van Waes: luuk.vanwaes@uantwerpen.be

Vandermeulen, Leijten & Van Waes (2020). Reporting Writing Process Feedback in the Classroom: Using Keystroke Logging Data to Reflect on Writing Processes

Data: x Inputlog o Scriptlog o Other:
Code: o R-script o Python o Other: JavaScript

Authors
Nina Vandermeulen, Mariëlle Leijten & Luuk Van Waes (2020)

Keywords
keystroke logging, process feedback, self-assessment, writing from sources, writing processes

Materials
Inputlog helps writing tutors provide process feedback to their students. Based on an XML logfile, the so-called ‘report’ function automatically generates a PDF file addressing different perspectives of the writing process: pausing, revision, source use, and fluency. These perspectives are reported either quantitatively or visually. Brief introductory texts explain the information presented. Inputlog provides a default feedback report, but users can also customize the report.

An intervention study demonstrates the effect of these process reports in a classroom setting.

The following types of materials are made available:

Inputlog based process profiles: Descriptions of 5 benchmark process profiles ….
Inputlog data: …

Use of files
The benchmark corpus (Dutch) is open access and can be downloaded and used freely. However, if you wish to use it in your research project please reference this repository as an original source:
https://liftwritingresearch.wordpress.com/benchmark-processen/

Publications

Vandermeulen, N., Leijten, M., & Van Waes, L. (2020). Reporting Writing Process Feedback in the Classroom: Using Keystroke Logging Data to Reflect on Writing Processes. Journal of Writing Research, 12(1), 109-140. DOI: 10.17239/jowr-2020.12.01.05
Vandermeulen, N., Van Steendam, E., De Maeyer, S., & Rijlaarsdam, G. (2023). Writing process feedback based on keystroke logging and comparison with exemplars: Effects on the quality and process of synthesis texts. Written Communication, 40(1), 90-144. https://doi.org/10.1177/07410883221127998

Access
https://liftwritingresearch.wordpress.com/benchmark-processen/

Contact
Nina Vandermeulen: nina.vandermeulen@uantwerpen.be

Mahlow, Ulasik & Tuggener (2022). Text History Extraction Tool (THEtool): A tool for Linguistic Modeling of Written Text Production

Data: x Inputlog o Scriptlog o Other:
Code: o R-script x Python o Other:

Authors
Cerstin Mahlow, Malgorzata Anna Ulasik & Don Tuggener (2022)

Keywords
Writing process, keystroke-logging, transforming sequence, text history, sentence history, written text production, linguistic modeling

Materials
The study presents an approach for the analysis of writing processes with a focus on linguistic structures applying natural language processing (NLP) tools. It is based on the novel concepts of transforming sequences, text history, and sentence history. The transforming sequence is used to store differences between text versions on the surface and record the editing operations involved. The text and sentence histories allow for reproducing and visualizing the genesis and history of a text and its individual sentences. The main focus of the approach is the constant linking of the process and the product.

THEtool uses two main modes to capture text versions from idfx-files (Inputlog logfiles):

the Pause Capturing Mode (PCM), which relies on a preset pause duration in the text production to yield versions,
and the Edit Capturing Mode (ECM), which uses a change in production mode to determine versions. A change in production mode is defined as a switch between the modes (a) writing at the edge of the text, (b) deleting something, and (c) inserting something.
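
The idea behind the Edit Capturing Mode can be illustrated with a small sketch that stores a text version whenever the production mode switches. This is a simplified reading of the description above and not THEtool's implementation: the event layout (kind, position, character), the mode names, and the function names are assumptions made for this example.

```python
def production_mode(kind, position, text_length):
    """Classify an event as deletion, writing at the edge, or insertion."""
    if kind == "delete":
        return "deletion"
    return "append" if position == text_length else "insertion"


def capture_versions(events):
    """Store a text version whenever the production mode changes.

    events: list of (kind, position, char), kind being "insert" or "delete".
    A delete removes the single character at the given position.
    """
    text = ""
    versions = []
    current_mode = None
    for kind, position, char in events:
        mode = production_mode(kind, position, len(text))
        if current_mode is not None and mode != current_mode:
            versions.append(text)  # snapshot taken before the new mode starts
        current_mode = mode
        if kind == "delete":
            text = text[:position] + text[position + 1:]
        else:
            text = text[:position] + char + text[position:]
    versions.append(text)  # final text is always the last version
    return versions


events = [("insert", 0, "c"), ("insert", 1, "a"), ("insert", 2, "t"),
          ("delete", 2, ""), ("insert", 2, "r"), ("insert", 0, "s")]
print(capture_versions(events))  # ['cat', 'ca', 'car', 'scar']
```

A pause-based variant would take the snapshot when the time between two events exceeds a preset duration instead of when the mode changes.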

The following materials are made available:

Inputlog data: The input file processed by the tool is an idfx file in XML format
Python script: Open-source application for parsing raw keystroke logging data from a writing session, processing it to retrieve all relevant text versions produced during this session, and eventually generating text and sentence histories based on the collected information.

Use of files
The program code and the corpus can be downloaded and used freely. However, if you use them in your research project, please cite the GitHub repository as the original source: https://github.com/mulasik/wta

Publications
Mahlow, C., Ulasik, M. A., & Tuggener, D. (2022). Extraction of transforming sequences and sentence histories from writing process data: a first step towards linguistic modeling of writing. Reading and Writing. https://doi.org/10.1007/s11145-021-10234-6

Access
https://github.com/mulasik/wta

Contact
Cerstin Mahlow: cerstin@mahlow.ch

Buschenhenke, Conijn & Van Waes (2022). Measuring non-linearity of long-term writing processes

Data: x Inputlog o Scriptlog o Other:
Code: x R-script o Python o Other:

Authors
Floor Buschenhenke, Rianne Conijn & Luuk van Waes (2022)

Keywords
non-linearity, writing process, keystroke logging, multi-session writing, point of utterance, writing dynamics

Materials
Linearity metrics are commonly calculated based on the leading edge and are mostly used for short texts and single writing sessions. However, especially in longer, multi-session writing processes, text is often produced at various locations in the document, not necessarily at the leading edge. Accordingly, the leading edge alone is not enough to distinguish between linear production and non-linear text alterations.

Therefore, the current study proposes a novel, more flexible, automated non-linearity analysis that does not rely solely on the leading edge. In this approach, all backward and forward cursor and mouse operations from the point of utterance are extracted from the keystroke data and characterized by both duration and distance. This results in a detailed list of characteristics per writing episode, allowing us to compare and group episodes of writing at various scales.

The non-linearity analysis can be used to find shifts in non-linearity over time. Moreover, the analysis allows researchers to chart interactions with the text-produced-so-far, for instance, revealing session management strategies in multi-session writing.
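
The extraction step can be illustrated with a simplified Python sketch. It is not the published R script: it assumes the log has already been reduced to (time in ms, cursor position) samples, treats any position change of at least min_distance characters as a movement away from the point of utterance, and uses hypothetical names (Jump, extract_jumps, min_distance) throughout.

```python
from dataclasses import dataclass


@dataclass
class Jump:
    direction: str    # "backward" or "forward"
    distance: int     # number of character positions moved
    duration_ms: int  # time elapsed between the two samples


def extract_jumps(samples, min_distance=2):
    """Return every cursor movement of at least min_distance characters.

    samples: list of (time_ms, cursor_position) tuples in chronological order.
    Ordinary typing and single backspaces move the cursor by one position
    and are therefore skipped (a simplifying assumption).
    """
    jumps = []
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        delta = p1 - p0
        if abs(delta) >= min_distance:
            direction = "backward" if delta < 0 else "forward"
            jumps.append(Jump(direction, abs(delta), t1 - t0))
    return jumps


# Illustrative episode: typing, a jump back into the text, typing, a jump forward.
SAMPLES = [(0, 10), (200, 11), (400, 12), (3400, 3),
           (3600, 4), (3800, 5), (5800, 12), (6000, 13)]

for jump in extract_jumps(SAMPLES):
    print(jump.direction, jump.distance, jump.duration_ms)
# backward 9 3000
# forward 7 2000
```

A list of this kind, collected per writing episode, is the sort of input from which duration and distance characteristics could then be compared across sessions.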

Two main types of materials were collected:

R-script I: Non-linearity analysis in R (https://github.com/FloorBuschenhenke/NonLinearityMethod)
R-script II: Dynamic non-linearity visualization using R-Shiny (https://trackchanges.shinyapps.io/Shinyprocessgraph/)

Use of files
The R-scripts can be downloaded and used freely. However, if you wish to use them in your research project please reference the original source:
https://github.com/FloorBuschenhenke/NonLinearityMethod
https://trackchanges.shinyapps.io/Shinyprocessgraph/

Publication
Buschenhenke, F., Conijn, R., & Van Waes, L. (under review). Measuring non-linearity of long-term writing processes.

Access
https://github.com/FloorBuschenhenke/NonLinearityMethod

Contact
Floor Buschenhenke: floor.buschenhenke@huygens.knaw.nl or woordheks@gmail.com

Muñoz Martín & Apfelthaler (2022). The Task Segment Framework: A tool to study source-based tasks at the keyboard

Data: x Inputlog o Scriptlog x Other: FACIL
Code: o R-script o Python o Other:

Authors

Ricardo Muñoz Martín & Matthias Apfelthaler (2022)

Keywords

Source-based tasks, multilectal communication, keystroke-logging, behavioral fluency, efficiency, efficacy.

Materials

The study presents an analytical framework to study writing processes in source-based tasks, which has so far been tested on translation and writing. Basic concepts in the framework are behavioral fluency and the Minimax principle (maximum effect with minimal effort). Inter-keystroke intervals (IKIs) are classed as either willful or involuntary, the latter further divided into potentially relevant to the task and mechanical. Willful IKIs, or pauses, are used to mark behavioral units, or task segments, with or without text (with text, they are bursts). According to their contents, task segments are classed as ADD, CHANGE, SEARCH and MIXED (largely self-explanatory, and hypothesized to have or mix mutually exclusive behavioral repertoires), plus FILLERS (apparently purposeless, isolated behaviors) and HCI (behaviors at the keyboard apparently unrelated or weakly related to the task). The main focus of the Task Segment Framework (TSF) is metacognitive control and cognitive resource management.

The TSF is a work in progress. As of 2023, it uses a baseline of 200 ms (shorter IKIs, or lags, are ignored) and two subject- and session-bound thresholds to separate the remaining IKIs into three kinds. A lower threshold separates apparently mechanical IKIs, or delays, from those potentially relevant to the task, or respites. An upper threshold separates respites from the longest IKIs, or (willful) pauses. The baseline and the thresholds are calculated with the pause analysis of idfx files: the lower threshold is 2 × the median within-word IKI and the upper threshold is 3 × the median between-word IKI.
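
The threshold rules described above can be sketched in a few lines of Python. This is an illustration only, not the FACIL application: the function names are hypothetical, the IKI values are invented, and the treatment of values that fall exactly on a threshold is an assumption.

```python
from statistics import median

BASELINE_MS = 200  # IKIs below the baseline are lags and are ignored


def tsf_thresholds(within_word_ikis, between_word_ikis):
    """Return (lower, upper) thresholds for one subject and session."""
    lower = 2 * median(within_word_ikis)   # 2 x median within-word IKI
    upper = 3 * median(between_word_ikis)  # 3 x median between-word IKI
    return lower, upper


def classify_iki(iki_ms, lower, upper):
    """Label an IKI as lag, delay, respite or pause.

    Boundary handling (< rather than <=) is an assumption of this sketch.
    """
    if iki_ms < BASELINE_MS:
        return "lag"
    if iki_ms < lower:
        return "delay"
    if iki_ms < upper:
        return "respite"
    return "pause"


# Invented IKI samples (ms) for a single session.
lower, upper = tsf_thresholds([120, 150, 180], [300, 400, 500])
print(lower, upper)  # 300 1200
print([classify_iki(x, lower, upper) for x in (100, 250, 800, 1500)])
# ['lag', 'delay', 'respite', 'pause']
```

Because both thresholds are derived from medians of the session's own IKIs, the same raw interval can be a respite for one writer and a pause for another.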

In order to ease the study of progress through the tasks, the TSF includes an S-notation system that uses symbols, color codes, and a vertical representation per task segment to facilitate the study of logs. Together with Eros Zanchetti, from Unibo, we have developed a small application to convert the output of Inputlog’s general analysis.

The following materials are made available:

Inputlog data: The input files are idfx files in XML format
FACIL: An application for parsing raw keystroke logging data from a task session, processing it to chunk the record into task segments and recode the data.

Use of files

The program FACIL and a corpus of several small research projects, all of them on translating or containing translation as one of the tasks, can be downloaded and used freely.

Publications

Muñoz Martín, Ricardo & Matthias Apfelthaler. A Task Segment Framework to study keylogged translation processes. Translation & Interpreting 14: 8–31.
Muñoz Martín, Ricardo & Matthias Apfelthaler. Spillover effects in task-segment switching. A study of translation subtasks as behavioral categories within the Task Segment Framework. In Muñoz Martín, Ricardo, Sun Sanjun & Li Defeng, eds. Advances in Cognitive Translation Studies, 19–45. Singapore: Springer.
Muñoz Martín, Ricardo & Celia Martín de Leon. Fascinatin’ rhythm – and pauses in translators’ cognitive processes. Hermes 57: 29–47.
Muñoz Martín, Ricardo & José María Cardona Guerra. Translating in fits and starts: pause thresholds and roles in the research of translation processes. Perspectives 27: 525–551.
Puerini, Sara. Typing your mind away. Comparing keylogged tasks with the Task Segment Framework. M.A. thesis, Università di Bologna. https://amslaurea.unibo.it/22899/

Access

*

Contact

Ricardo Muñoz: ricardo.munoz@unibo.it

Reference
If you publish or present a paper in which Inputlog has been used, please refer to the following article:

Leijten, M., & Van Waes, L. (2013). Keystroke Logging in Writing Research: Using Inputlog to Analyze Writing Processes. Written Communication 30(3), 358-392
DOI: 10.1177/0741088313491692

Creative Commons
Inputlog is published under the following Creative Commons licence:
Attribution-NonCommercial-NoDerivatives | 4.0 International (CC BY-NC-ND 4.0)


Tian, Y., Crossley, S. A., & Van Waes, L. (2025). The KLiCKe Corpus: Keystroke Logging in Compositions for Knowledge Evaluation.

Tian, Y., Kim, M., & Crossley, S. (2024). Making sense of L2 written argumentation with keystroke logging

Evgeny Chukharev-Hudilainen et al. (2023). CyWrite: A web-based word processor with built-in keystroke logging and eye-tracking

Hall, Baaijen & Galbraith (2022). Constructing Theoretically Informed Measures of Pause Duration in Experimentally Manipulated Writing

Van Waes, Vandersmissen, Rossetti & Leijten (2021). Inputlog Copy Task Corpus: Exploring and defining typing skills

Cislaru & Olive (2018-2024). ANR Pro-TEXT: Processes of Textualization: Linguistic, Psycholinguistic, and Machine Learning Modeling

Roeser, De Maeyer, Leijten & Van Waes (2022). Fitting a mixture model on copy-task data

Rossetti & Van Waes (2022). Text simplification in second language: process and product data

Meulemans, Leijten, Van Waes, Engelborghs & De Maeyer (2022). CSV files and R script: Writing process data of typed picture description by 15 cognitively impaired patients and 15 healthy controls

Van Waes, Leijten, Pauwaert & Van Horenbeeck (2019). A Multilingual Copy Task: Measuring Typing and Motor Skills in Writing with Inputlog

Vandermeulen, Leijten & Van Waes (2020). Reporting Writing Process Feedback in the Classroom: Using Keystroke Logging Data to Reflect on Writing Processes

Mahlow, Ulasik & Tuggener (2022). Text History Extraction Tool (THEtool): A tool for Linguistic Modeling of Written Text Production

Buschenhenke, Conijn & Van Waes (2022). Measuring non-linearity of long-term writing processes

Muñoz Martín & Apfelthaler (2022). The Task Segment Framework: A tool to study source-based tasks at the keyboard
