Building a library
Building a community
Aim
The aim of this community page is to bring together researchers and to build a library containing short descriptions of keystroke logging data collections and/or scripts (e.g., R or Python). We briefly describe these materials and refer to the repositories researchers used to share them (e.g., Zenodo, OSF, GitHub).
In the next stage, discussion fora and other knowledge bases will be added to further strengthen the keystroke logging community in writing and translation studies.
Don’t hesitate to contact us!
Evgeny Chukharev-Hudilainen et al. (2023). CyWrite: A web-based word processor with built-in keystroke logging and eye-tracking
Data: □ Inputlog □ Scriptlog ⌧ Other: CyWrite
Code: ⌧ R-script □ Python ⌧ Other: JavaScript, HTML, CSS
Authors
Evgeny Chukharev-Hudilainen, Hui-Hsien Feng, Mark Torrance, Aysel Saricaoglu, Emily Dux Speltz, & Jens Roeser (2013–2024)
Keywords
keystroke logging, eye tracking, writing process intervention, design-based research
Materials
In this project, we developed a web-based word processor that has built-in keystroke logging and eye-tracking capabilities. We have used this system, called “CyWrite”, for various applications ranging from automated writing evaluation to real-time writing-process feedback provision. The system includes built-in capabilities for visually replaying log files, and for exporting them in a machine-readable format.
In the GitHub repository, the authors have uploaded the JavaScript, HTML, and CSS code for operating the CyWrite word processor with embedded keystroke logging and eye-tracking.
In the OSF repository, the authors have uploaded R scripts for calculating behavioral writing-process measures from keystroke and eye movement logs generated by the CyWrite system.
Use of files
These files can be downloaded and used freely. However, if you wish to use them in your research project (whether you use them in their current state or if you edit them), please reference the OSF repository and/or the GitHub repository as an original source: doi:10.17605/OSF.IO/R53H2 and/or https://github.com/chukharev/cywrite. Please cite relevant publications from the list below.
Publications
Dux Speltz, E., & Chukharev-Hudilainen, E. (2021). The effect of automated fluency-focused feedback on text production. Journal of Writing Research, 13(2), 231–255. https://doi.org/10.17239/jowr-2021.13.02.02 (open access)
Chukharev-Hudilainen, E., Saricaoglu, A., Torrance, M., & Feng, H.-H. (2019). Combined deployable keystroke logging and eyetracking for investigating L2 writing fluency. Studies in Second Language Acquisition, 41(3), 583-604. https://doi.org/10.1017/S027226311900007X (open access)
Chukharev-Hudilainen, E. (2019). Empowering automated writing evaluation with keystroke logging. In Lindgren, E., & Sullivan, K. P. H. (Eds.) Observing writing: insights from keystroke logging and handwriting. Studies in Writing (pp. 125-142). Leiden, Netherlands: Brill Publishing.
Access
https://github.com/chukharev/cywrite
Contact
Evgeny Chukharev-Hudilainen: evgeny@iastate.edu
Hall, Baaijen & Galbraith (2022). Constructing Theoretically Informed Measures of Pause Duration in Experimentally Manipulated Writing
Data: x Inputlog o Scriptlog o Other:
Code: x R-script o Python x Other: VBA (Excel macro)
Authors
Sophie Hall, Veerle Baaijen & David Galbraith (2022)
Keywords
pause analysis, pause location, mixture modeling
Materials
This study demonstrates how to (i) isolate relevant transitions within a text and calculate their durations, and (ii) use mixture modelling to identify structure within the distributions of pauses at different locations.
In the OSF-repository, the authors have uploaded:
- Excel VBA keystroke macros: 17 VBA macro script files that add extra pause coding to Inputlog keystroke data. These scripts use Inputlog general analysis files (XML files) converted into xlsx format (Microsoft Office Excel 2016). Step-by-step instructions explain how to prepare and run the macros on the xml/xlsx files.
- R-script: an example of an R Markdown document, which demonstrates how mixture models have been conducted on Inputlog keystroke data that has been processed through the VBA macros (and subsequently saved in CSV file format). More specifically, the example in this document looks at a 3-component mixture model for the linear between-word pause times.
- Calculation framework: The framework that the authors used to conceptualise and identify several types of pauses based on their associated keystrokes: linear within-word, linear between-word, linear between-subsentence, linear between-sentence and linear between-paragraph.
Use of files
These files can be downloaded and used freely. However, if you wish to use them in your research project (whether you use them in their current state or if you edit them), please reference the OSF repository as an original source: doi:10.17605/OSF.IO/R53H2 and the associated paper (see below).
Publication
Hall, S., Baaijen, V. M., & Galbraith, D. (2022). Constructing theoretically informed measures of pause duration in experimentally manipulated writing. Reading and Writing, 1-29.
https://doi.org/10.1007/s11145-022-10284-4
Access
doi:10.17605/OSF.IO/R53H2
Contact
Sophie Hall: s.m.hall@soton.ac.uk
Van Waes, Vandersmissen, Rossetti & Leijten (2021). Inputlog Copy Task Corpus: Exploring and defining typing skills
Data: x Inputlog o Scriptlog o Other:
Code: x R-script o Python o Other:
Authors
Luuk Van Waes, Benjamin Vandersmissen, Alessandra Rossetti & Mariëlle Leijten (2021)
Keywords
copy task, typing skill, multilingual, dashboard
Materials
The Inputlog copy task allows researchers to investigate different levels of lexicality in more detail. At the moment the copy task has been developed in twelve different languages. The software is open-access and allows researchers to adapt the tasks to their specific needs.
In the Zenodo repository, the authors have uploaded a 5k corpus of copy task analyses in different languages. A dynamic dashboard application (R-Shiny) enables researchers to explore and filter the corpus. Self-collected data can also be uploaded and included in the exploration.
Use of files
These files can be downloaded and used freely. However, if you wish to use them in your research project please reference the Zenodo repository as an original source: DOI: 10.5281/zenodo.5803400 and/or https://inputlog-analysis.uantwerpen.be/expert
Publications
- Van Waes, L., Leijten, M., Pauwaert, T., & Van Horenbeeck, E. (2019). A multilingual copy task: Measuring typing and motor skills in writing with Inputlog. Journal of Open Research Software, 7(1:30), 1-8. https://doi.org/10.5334/jors.234 (open access)
- Van Waes, L., Leijten, M., Roeser, J., Olive, T., & Grabowski, J. (2021). Designing a Copy Task to Measure and Assess Typing Skills in Writing Research. Journal of Writing Research, 13(1), 107-153. https://doi.org/10.17239/jowr-2021.13.01.04 (open access)
- Van Waes, L., Leijten, M., Mariën, P., & Engelborghs, S. (2017). Typing competencies in Alzheimer’s disease: An exploration of copy tasks. Computers in Human Behavior, 73, 311–319. https://doi.org/10.1016/j.chb.2017.03.050 (open access)
Access
https://zenodo.org/record/5803401#.Y7PwAXbMKUk
DOI: 10.5281/zenodo.5803400
Contact
Luuk van Waes: luuk.vanwaes@uantwerpen.be
Cislaru & Olive (2018-2024). ANR Pro-TEXT: Processes of Textualization: Linguistic, Psycholinguistic, and Machine Learning Modeling
Data: x Inputlog x Scriptlog o Other:
Code: o R-script x Python o Other:
Authors
Georgeta Cislaru & Thierry Olive (2019-2024)
Keywords
writing bursts, linguistic analysis, pause analysis
Materials
This interdisciplinary research develops a comprehensive analysis of the textualization process, i.e. the real-time progressive construction of a text. We study bursts of writing, which are textual segments produced between two pauses, in order to provide insight into the relation between regularities of language performance and the cognitive and contextual constraints. The aim is to understand some of the layout mechanisms that allow language to give rise to novelty out of known and prefabricated data. The Pro-TEXT project develops linguistic and psycholinguistic methods and machine-learning tools to model these regularities and provide evidence about patterns of text processing. Incremental machine-learning approaches fill a gap in the analysis and representation of real-time language performance, while revealing regularities that remained unnoticed under the methodologies used previously.
Corpus
Linguistically annotated data will be available at: https://pro-text.huma-num.fr/ressources/
Corpus | Words | Texts | Writers | Genres | Writing expertise |
Academic | 70464 | 26 | MA students | Mini-theses in Linguistics | Semi-experts |
Professional | 34504 | 10 | Social workers | Social reports on child protection | Experts |
Experimental | 63533 | 165 | BA students | Essays on different subjects | Experts |
Children | 20306 | 183 | Pupils (3rd-6th grades) | Narrative texts & essays | Beginners |
Translation | 13682 | 38 | BA students | EN-FR translation of medical texts & original texts produced in FR | Semi-experts |
Total | 202489 | 422 | – | 6 types of texts, 3 experimental situations | – |
Academic Subcorpus
This subcorpus contains mini theses written by MA students as part of a course in discourse analysis. The texts were written over several writing sessions on students’ computers. Since this type of writing task was novel to the participants, they were evaluated as semi-experts. The students involved in data collection are native or near-native speakers of French. There are 26 different authors in the subcorpus.
Professional Reports on child protection Subcorpus
The reports were written by social workers as part of their regular tasks over several sessions. Each text has at least two authors. Since the participants wrote these types of texts routinely, they were evaluated as experts. There are 9 different authors, and they are all native speakers of French.
Experimental Subcorpus
These texts were produced as part of three psycholinguistic experiments on the writing process. In each experiment, processing difficulty of one of the main writing components (planning, formulating or revising) was manipulated. The texts produced in these experiments were written by BA students and were essays on different social topics, such as smoking at the university and public transportation. Each text was written in a single session. Since this type of writing task is common in the French educational system, the authors were evaluated as experts. The information about the experimental setting and experimental vs control setting is available for each text. There are 83 authors in this subcorpus, and they are all native or near-native speakers of French.
Children Subcorpus
The texts in this part of the corpus were written by schoolchildren from three age groups: 3rd year of primary school (ca. 8 years old), 5th year of primary school (ca. 10 years old), and 1st year of secondary school (ca. 11 years old). Each participant wrote a narrative text and an essay on a given subject. The texts were recorded at school in one writing session. The information about the age group, the type of text, and the order of the production of the two texts is available for each text. There are 92 authors in total, and they are considered to hold a language proficiency level corresponding to their grade.
- Partial download and overview available: http://syled.univ-paris3.fr/protext/PLAY-TEXTE/CORPUS-ENFANTS-Poitiers/index-inputlog.html
The code of each text includes metadata (For example, for P22C6N1, P22=identification number, C6=6th grade, N=narrative). When users of the interface click on [Voir] in the left column, then on [Text final] in the central screen, the final text and event segmentations are displayed.
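As a minimal sketch of how such a text code could be unpacked for analysis, the Python snippet below splits a code into its parts. The pattern is inferred only from the single example P22C6N1; the function name `parse_text_code` is hypothetical, the letter used for essays is not stated above, and the meaning of the trailing digit is not documented here, so it is kept verbatim.

```python
import re

# Pattern inferred from the one documented example, "P22C6N1".
# Assumption: other codes in the corpus follow the same layout.
CODE_PATTERN = re.compile(
    r"^(?P<participant_id>P\d+)C(?P<grade>\d+)(?P<text_type>[A-Z])(?P<suffix>\d*)$"
)


def parse_text_code(code):
    """Split a text code into participant id, grade, text type and suffix."""
    match = CODE_PATTERN.match(code)
    if match is None:
        raise ValueError(f"Unrecognised text code: {code!r}")
    return {
        "participant_id": match.group("participant_id"),  # e.g. "P22"
        "grade": int(match.group("grade")),               # e.g. 6 for 6th grade
        "text_type": match.group("text_type"),            # "N" = narrative
        "suffix": match.group("suffix"),                  # meaning not documented above
    }


example = parse_text_code("P22C6N1")
print(example)
```

Codes that do not match the inferred layout raise a `ValueError` rather than being silently misread.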
Translation Subcorpus
This subcorpus was written by BA students of translation studies. Each participant produced two types of text: an original text in French describing an image, and a translation of a medical text from English to French. Information about the author and the type of text is available for each text in the subcorpus. Given the type of the task and the fact that the text had to be produced in a highly specialized discourse genre, the students were evaluated as semi-experts. There are 19 authors in total in this subcorpus, and they have native or near-native proficiency in French.
- Partial download and overview available: http://syled.univ-paris3.fr/protext/PLAY-TEXTE/corpus-Traductions/index-inputlog.html
The code of each text includes metadata containing the identification number. When users of the interface click on [Voir] in the left column, then on [Text final] in the central screen, the final text and event segmentations are displayed.
Use of files
The scripts and data will soon be available for download and free use.
Selected publications
- Cislaru, G., & Olive, T. (2018). Le processus de textualisation: Analyse des unités linguistiques de performance écrite. Louvain-la-Neuve: De Boeck. https://www.cairn.info/le-processus-de-textualisation–9782807314832.htm
- Cislaru, G., & Olive, Th. (2019). Dynamiques d’amorçage au cours du processus de textualisation dans l’écriture enregistrée. In M.-J. Béguelin, G. Corminboeuf, & F. Lefeuvre (Eds.), Types d’unités et procédures de segmentation (pp. 149-162). Limoges: Lambert Lucas. ISBN/EAN 978-2-35935-287-0
- Cislaru, G., Olive, Th. (2021). Que peut nous apprendre l’écriture enregistrée en temps réel au sujet des figures de construction ? L’Information grammaticale 169, 21-29. https://hal.science/hal-03351391
- Feltgen, Q., Cislaru, G., Benzitoun, C. (2022). Etude linguistique et statistique des unités de performance écrite : le cas de et. In SHS Web of Conferences, 138, 1-17. https://www.shs-conferences.org/articles/shsconf/pdf/2022/08/shsconf_cmlf2022_10001.pdf
- Miletic Haddad, A., Benzitoun, C., Cislaru, G., Herrera-Yanez, S. (2022). Pro-TEXT: An annotated corpus of key-stroke logs. In Proceedings of the Thirteenth Language Resources and Evaluation Conference (pp. 1732-1739). https://aclanthology.org/2022.lrec-1.184.pdf
Access
https://pro-text.huma-num.fr/
Contact
Georgeta Cislaru: georgeta.cislaru@sorbonne-nouvelle.fr
Roeser, De Maeyer, Leijten & Van Waes (2022). Fitting a mixture model on copy-task data
Data: x Inputlog o Scriptlog o Other:
Code: x R-script o Python o Other:
Authors
Jens Roeser, Sven De Maeyer, Mariëlle Leijten & Luuk Van Waes (2022)
Keywords
copy task, keystroke modelling, autoregression, mixture models, Bayesian statistical models, typing skills
Materials
The following materials are provided:
- R-Walk-through: In the RPubs repository (RStudio), the authors have uploaded an entirely self-contained R walk-through. This step-by-step guide describes how to fit a finite mixture model of two log-Normal distributions using the statistical program R in combination with the Stan component (to be downloaded manually from the OSF / GitHub repository). The guide shows how to fit a mixture model to copy-task data and then calculate the differences between two Inputlog copy-task components. Data and code can be loaded from the repository.
- R-code for visuals: In order to visualize the analyses, the ‘tidyverse’ component is used. Code for data wrangling is provided as well as code that illustrates how to work with Bayesian posterior samples for statistical inference.
- R-script: A mixture model analysis of typing disfluencies (Stan and R code) demonstrated on Inputlog copy-task data.
https://github.com/jensroes/Typing-disfluency
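To give a rough feel for what a two-component log-Normal mixture does with interkey intervals, here is a minimal Python sketch. It is not the authors' method: their materials fit a Bayesian model with Stan in R, whereas this sketch swaps in a plain expectation-maximisation (EM) fit on log-transformed intervals. The function names, the simulated data, and the starting values are all assumptions made for illustration.

```python
import math
import random


def normal_pdf(x, mu, sd):
    """Density of a normal distribution at x."""
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_lognormal_mixture(ikis_ms, n_iter=200):
    """EM fit of a two-component normal mixture on log interkey intervals.

    Returns the log-scale mean, sd and mixing weight of a 'fast' (fluent)
    and a 'slow' (disfluent) component. EM is a stand-in here for the
    Bayesian estimation used in the original materials.
    """
    y = sorted(math.log(v) for v in ikis_ms)
    n = len(y)
    mean_y = sum(y) / n
    sd_y = max(math.sqrt(sum((v - mean_y) ** 2 for v in y) / n), 1e-3)
    # Assumed starting values: 10th and 90th percentile of the log intervals.
    mu = [y[n // 10], y[(9 * n) // 10]]
    sd = [sd_y, sd_y]
    weight = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: probability that each interval belongs to component 1.
        resp = []
        for v in y:
            p0 = weight[0] * normal_pdf(v, mu[0], sd[0])
            p1 = weight[1] * normal_pdf(v, mu[1], sd[1])
            total = p0 + p1
            resp.append(p1 / total if total > 0 else 0.5)
        # M-step: update each component from its weighted observations.
        for k in (0, 1):
            r = resp if k == 1 else [1.0 - p for p in resp]
            n_k = sum(r)
            if n_k < 1e-9:
                continue
            mu[k] = sum(ri * v for ri, v in zip(r, y)) / n_k
            var = sum(ri * (v - mu[k]) ** 2 for ri, v in zip(r, y)) / n_k
            sd[k] = max(math.sqrt(var), 1e-3)
            weight[k] = n_k / n
    # Order the components so that 'fast' has the smaller log-scale mean.
    fast, slow = sorted(zip(mu, sd, weight))
    keys = ("mu", "sd", "weight")
    return {"fast": dict(zip(keys, fast)), "slow": dict(zip(keys, slow))}


# Simulated intervals (in ms), not real copy-task data.
random.seed(1)
simulated = [random.lognormvariate(math.log(150), 0.25) for _ in range(800)]
simulated += [random.lognormvariate(math.log(900), 0.5) for _ in range(200)]
fit = fit_two_lognormal_mixture(simulated)
print("fluent median (ms):", round(math.exp(fit["fast"]["mu"])))
print("disfluent median (ms):", round(math.exp(fit["slow"]["mu"])))
print("disfluency weight:", round(fit["slow"]["weight"], 2))
```

The mixing weight of the slow component can be read as an estimated proportion of disfluent transitions. Unlike the Bayesian fit, this sketch returns point estimates only and no posterior uncertainty.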
Use of files
The scripts can be downloaded and used freely. However, if you wish to use them in your research project please reference the RPubs or GitHub repository as an original source: https://rpubs.com/jensroes/765467 and/or https://doi.org/10.1007/s11145-021-10203-z
Publications
- Roeser, J., De Maeyer, S., Leijten, M., & Van Waes, L. (2021). Modelling typing disfluencies as finite mixture process. Reading and Writing. https://doi.org/10.1007/s11145-021-10203-z
- Van Waes, L., Leijten, M., Pauwaert, T., & Van Horenbeeck, E. (2019). A multilingual copy task: Measuring typing and motor skills in writing with Inputlog. Journal of Open Research Software, 7(1:30), 1-8. https://doi.org/10.5334/jors.234 (open access)
- Van Waes, L., Leijten, M., Roeser, J., Olive, T., & Grabowski, J. (2021). Designing a Copy Task to Measure and Assess Typing Skills in Writing Research. Journal of Writing Research, 13(1), 107-153
https://doi.org/10.17239/jowr-2021.13.01.04 (open access)
Access
https://rpubs.com/jensroes/765467
https://osf.io/y3p4d/
Contact
Jens Roeser: jens.roeser@ntu.ac.uk
Rossetti & Van Waes (2022). Text simplification in second language: process and product data
Data: x Inputlog o Scriptlog o Other:
Code: o R-script o Python o Other:
Authors
Alessandra Rossetti & Luuk Van Waes (2022)
Keywords
corporate social responsibility, text revision, text simplification, second-language writing, plain language training, cognitive effort, text analysis, keystroke logging
Materials
The data come from an experimental study with second-language university students. We adopted a pre-test and post-test design, and randomly divided participants into an experimental and a control group. In the pre-test, participants were given an extract of a corporate report dealing with sustainability and were asked to revise it to make it easier to read for a lay customer. Subsequently, they took part in training. The experimental group received training on both plain language and sustainability, while the control group received training exclusively on the topic of sustainability. In the post-test session, all participants were assigned a second extract of a corporate report dealing with sustainability, and were asked again to make it easier to read for a lay customer by applying what they had learned from their respective training.
Two main types of data were made available:
- keystroke logging data via Inputlog, available as IDFX files (process data). We analyzed the keystroke logging data using Inputlog analyses: https://www.inputlog.net/
- the texts simplified by the students, available as Microsoft Word documents (product data). We analyzed these texts in terms of readability using Coh-Metrix: http://cohmetrix.com/
Use of files
The corpora can be downloaded and used freely. However, if you wish to use them in your research project please reference the Zenodo-repository as an original source: https://zenodo.org/record/6720290#.YrWL3XZByUl
Publications
- Rossetti, A., & Van Waes, L. (2022). Revision of business content on corporate social responsibility: Measuring the impact of training on the cognitive effort of second-language university students. Hermes – Journal of Language and Communication in Business, 62, 27-54 | https://doi.org/10.7146/hjlcb.vi62.132262
- Rossetti, A., & Van Waes, L. (2022). It’s not just a phase: Investigating text simplification in a second language from a process and product perspective, Frontiers in Artificial Intelligence, 5:983008 | https://doi.org/10.3389/frai.2022.983008
- Rossetti, A., & Van Waes, L. (2022). Accessible communication of CSR: Development and preliminary evaluation of an online module. Business and Professional Communication Quarterly, 85(1), 52-79 | https://doi.org/10.1177/23294906221074324
Access
https://zenodo.org/record/6720290#.YrWL3XZByUl
Contact
Alessandra Rossetti: Alessandra.Rossetti@vub.be
Meulemans, Leijten, Van Waes, Engelborghs & De Maeyer (2022). CSV files and R script: Writing process data of typed picture description by 15 cognitively impaired patients and 15 healthy controls
Data: x Inputlog o Scriptlog o Other:
Code: x R-script o Python o Other:
Authors
Catherine Meulemans, Mariëlle Leijten, Luuk Van Waes, Sebastiaan Engelborghs & Sven De Maeyer (2022)
Keywords
writing processes, word categories, keystroke logging, Alzheimer’s disease, dementia, mild cognitive impairment
Materials
Writing process data of 15 cognitively impaired patients and 15 age- and gender-matched healthy controls were obtained. Each of them completed two typed picture description tasks that were logged with Inputlog, a keystroke logging tool. Variables included time on task; number of characters, pauses, and Pause-bursts per minute; proportion of pause time; duration of Pause-bursts; and pause time between words. The effect of pauses preceding specific word categories was also analyzed for pause time between words.
The data were used to explore if the observation of writing behavior can assist in the screening and follow-up of mild cognitive impairment (MCI) and mild dementia due to Alzheimer’s disease (AD).
Two main types of materials are made available:
1. CSV files that were used for the analyses, and
2. R-scripts.
Use of files
The scripts and data can be downloaded and used freely. However, if you wish to use them in your research project please reference the Zenodo-repository as an original source: https://zenodo.org/record/6720290#.YrWL3XZByUl or https://doi.org/10.5281/zenodo.5942516
Publications
- Meulemans, C., Leijten, M., Van Waes, L., Engelborghs, S., & De Maeyer, S. (2022). Cognitive writing process characteristics in Alzheimer’s Disease. Frontiers in Psychology, 13, 872280 | https://doi.org/10.3389/fpsyg.2022.872280
Access
https://zenodo.org/record/6720290#.YrWL3XZByUl
Contact
Mariëlle Leijten: marielle.leijten@uantwerpen.be
Van Waes, Leijten, Pauwaert & Van Horenbeeck (2019). A Multilingual Copy Task: Measuring Typing and Motor Skills in Writing with Inputlog
Data: x Inputlog o Scriptlog o Other:
Code: o R-script o Python x Other: JavaScript
Authors
Luuk Van Waes, Mariëlle Leijten, Tom Pauwaert & Eric Van Horenbeeck (2019)
Keywords
copy tasks, typing skills, motor coordination, keyboarding, writing studies, interkey intervals, bigram frequency, tapping task, typing fluency
Materials
A strictly controlled copy task was developed guiding participants through seven modules in which different prompts are presented, each dealing with complementary levels of lexicality. Fine-grained keystroke logging allows for a range of analyses (www.inputlog.net).
The copy task can be used in – and together with – all types of writing process studies. At the moment the copy task has been developed in ten different languages. The software is open-access and allows researchers to adapt the tasks to their specific needs.
Two main types of materials are made available:
- Copy task builder: This tool allows the researcher to adapt and/or create a copy task and adapt the task flow, the modules, and the instruction.
- Copy task JavaScript: The modular design of the JavaScript code allows the development of new components that create other copy task functions, such as audio-based or graphically prompted copying. The task instructions could also be enhanced with video guidelines.
Use of files
The program code is open access and can be downloaded and used freely. However, if you wish to use it in your research project please reference the GitHub repository as an original source: https://github.com/lvanwaes/Inputlog-Copy-Task and/or DOI: https://doi.org/10.5281/zenodo.2908966
Publications
Van Waes, L., Leijten, M., Pauwaert, T., & Van Horenbeeck, E. (2019). A multilingual copy task: Measuring typing and motor skills in writing with Inputlog. Journal of Open Research Software, 7(1:30), 1-8. https://doi.org/10.5334/jors.234
Access
https://doi.org/10.5281/zenodo.2908966
https://github.com/lvanwaes/Inputlog-Copy-Task
Contact
Luuk van Waes: luuk.vanwaes@uantwerpen.be
Vandermeulen, Leijten & Van Waes (2020). Reporting Writing Process Feedback in the Classroom: Using Keystroke Logging Data to Reflect on Writing Processes
Data: x Inputlog o Scriptlog o Other:
Code: o R-script o Python o Other:Javascript
Authors
Nina Vandermeulen, Mariëlle Leijten & Luuk Van Waes (2020)
Keywords
keystroke logging, process feedback, self-assessment, writing from sources, writing processes
Materials
Inputlog helps writing tutors provide process feedback to their students. Based on an XML logfile, the so-called ‘report’ function automatically generates a PDF file addressing different perspectives of the writing process: pausing, revision, source use, and fluency. These perspectives are reported either quantitatively or visually. Brief introductory texts explain the information presented. Inputlog provides a default feedback report, but users can also customize the report.
An intervention study demonstrates the effect of these process reports in a classroom setting.
The following types of materials are made available:
- Inputlog based process profiles: Descriptions of 5 benchmark process profiles ….
- Inputlog data: …
Use of files
The benchmark corpus (Dutch) is open access and can be downloaded and used freely. However, if you wish to use it in your research project please reference this repository as an original source:
https://liftwritingresearch.wordpress.com/benchmark-processen/
Publications
- Vandermeulen, N., Leijten, M., & Van Waes, L. (2020). Reporting Writing Process Feedback in the Classroom: Using Keystroke Logging Data to Reflect on Writing Processes. Journal of Writing Research, 12 (1), 109-140. DOI: 10.17239/jowr-2020.12.01.05
- Vandermeulen, N., Van Steendam, E., De Maeyer, S., & Rijlaarsdam, G. (2023). Writing process feedback based on keystroke logging and comparison with exemplars: Effects on the quality and process of synthesis texts. Written Communication, 40(1), 90-144. https://doi.org/10.1177/07410883221127998
Access
https://liftwritingresearch.wordpress.com/benchmark-processen/
Contact
Nina Vandermeulen: nina.vandermeulen@uantwerpen.be
Mahlow, Ulasik & Tuggener (2022). Text History Extraction Tool (THEtool): A tool for Linguistic Modeling of Written Text Production
Data: x Inputlog o Scriptlog o Other:
Code: o R-script x Python o Other:
Authors
Cerstin Mahlow, Malgorzata Anna Ulasik & Don Tuggener (2022)
Keywords
Writing process, keystroke-logging, transforming sequence, text history, sentence history, written text production, linguistic modeling
Materials
The study presents an approach for the analysis of writing processes with a focus on linguistic structures applying natural language processing (NLP) tools. It is based on the novel concepts of transforming sequences, text history, and sentence history. The transforming sequence is used to store differences between text versions on the surface and record the editing operations involved. The text and sentence histories allow for reproducing and visualizing the genesis and history of a text and its individual sentences. The main focus of the approach is the constant linking of the process and the product.
THEtool uses two main modes to capture text versions from idfx files (Inputlog logfiles):
- the Pause Capturing Mode (PCM), which relies on a preset pause duration in the text production to yield versions,
- and the Edit Capturing Mode (ECM), which uses a change in production mode to determine versions. A change in production mode is defined as switching between the modes (a) writing at the edge of the text, (b) deleting something, and (c) inserting something.
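The difference between the two capturing modes can be illustrated with a small Python sketch. This is not THEtool's code: the `Event` structure, the function names, and the 2000 ms pause setting are hypothetical, and real idfx logs carry far more information than the three fields assumed here.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Simplified, hypothetical keystroke event (not the idfx schema)."""
    time_ms: int
    op: str        # "type" or "delete"
    at_edge: bool  # True if the cursor is at the end of the text so far


def production_mode(event):
    """Map an event to one of the three production modes named above."""
    if event.op == "delete":
        return "deleting"
    return "writing_at_edge" if event.at_edge else "inserting"


def pcm_boundaries(events, pause_ms):
    """Pause capturing: a new version starts at every event that follows
    a pause of at least pause_ms."""
    return [
        i for i in range(1, len(events))
        if events[i].time_ms - events[i - 1].time_ms >= pause_ms
    ]


def ecm_boundaries(events):
    """Edit capturing: a new version starts at every event whose
    production mode differs from that of the previous event."""
    return [
        i for i in range(1, len(events))
        if production_mode(events[i]) != production_mode(events[i - 1])
    ]


events = [
    Event(0, "type", True),
    Event(150, "type", True),
    Event(300, "type", True),
    Event(2500, "delete", True),
    Event(2650, "delete", True),
    Event(2800, "type", True),
    Event(6000, "type", False),
    Event(6150, "type", False),
]
print("PCM boundaries:", pcm_boundaries(events, pause_ms=2000))
print("ECM boundaries:", ecm_boundaries(events))
```

In this toy log the edit-based mode yields one more version than the pause-based mode, because the switch from deleting back to writing at the edge happens without a long pause.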
The following materials are made available:
- Inputlog data: The input file processed by the tool is an idfx file in XML format
- Python script: Open-source application for parsing raw keystroke logging data from a writing session, processing it to retrieve all relevant text versions produced during this session, and eventually generating text and sentence histories based on the collected information.
Use of files
The program code and the corpus can be downloaded and used freely. However, if you wish to use them in your research project please reference the GitHub-repository as an original source: https://github.com/mulasik/wta
Publications
Mahlow, C., Ulasik, M.A. & Tuggener, D. (2022). Extraction of transforming sequences and sentence histories from writing process data: a first step towards linguistic modeling of writing. Reading and Writing (2022). https://doi.org/10.1007/s11145-021-10234-6
Access
https://github.com/mulasik/wta
Contact
Cerstin Mahlow: cerstin@mahlow.ch
Buschenhenke, Conijn & Van Waes (2022). Measuring non-linearity of long-term writing processes
Data: x Inputlog o Scriptlog o Other:
Code: x R-script o Python o Other:
Authors
Floor Buschenhenke, Rianne Conijn & Luuk van Waes (2022)
Keywords
non-linearity, writing process, keystroke logging, multi-session writing, point of utterance, writing dynamics
Materials
Linearity metrics are commonly calculated based on the leading edge and are mostly used for short texts and single writing sessions. However, especially for longer, multi-session writing processes, text can often be created at various places, not necessarily including the leading edge. Accordingly, the leading edge is not enough to distinguish between linear production and non-linear text alterations.
Therefore, the current study proposes a novel, more flexible, automatized non-linearity analysis, which does not solely rely on the leading edge. In this approach, all backwards and forwards cursor and mouse operations from the point of utterance are extracted from keystroke data, and characterized both based on duration and distance. This results in a detailed list of characteristics per writing episode, allowing us to compare and group episodes of writing at various scales.
The non-linearity analysis can be used to find shifts in non-linearity over time. Moreover, the analysis allows researchers to chart interactions with the text-produced-so-far, for instance, revealing session management strategies in multi-session writing.
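The idea of characterising each move away from the point of utterance by direction, distance, and duration can be sketched in a few lines of Python. This is a simplified illustration, not the authors' R implementation: it assumes a hypothetical log of (time in ms, character position) pairs for text-producing keystrokes and treats the position right after the previous keystroke as the point of utterance.

```python
def find_jumps(keystrokes):
    """Detect non-linear moves in a list of (time_ms, position) pairs.

    Assumption: each pair is one inserted character and `position` is the
    character offset at which it was inserted. Linear production means the
    next character lands directly after the previous one.
    """
    jumps = []
    for (t0, p0), (t1, p1) in zip(keystrokes, keystrokes[1:]):
        expected = p0 + 1  # assumed point of utterance after the last keystroke
        if p1 != expected:
            jumps.append({
                "direction": "backward" if p1 < expected else "forward",
                "distance_chars": abs(p1 - expected),
                "duration_ms": t1 - t0,
            })
    return jumps


log = [(0, 0), (120, 1), (250, 2), (3000, 0), (3100, 1), (5200, 5)]
for jump in find_jumps(log):
    print(jump)
```

A list of such records per writing episode could then be grouped or compared across sessions, which is the kind of summary the analysis described above works with.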
Two main types of materials were collected:
- R-script I: non-linearity analysis in R (https://github.com/FloorBuschenhenke/NonLinearityMethod)
- R-Script II: Dynamic non-linearity visualization using R-Shiny (https://trackchanges.shinyapps.io/Shinyprocessgraph/)
Use of files
The R-scripts can be downloaded and used freely. However, if you wish to use them in your research project please reference the original source:
https://github.com/FloorBuschenhenke/NonLinearityMethod
https://trackchanges.shinyapps.io/Shinyprocessgraph/
Publication
Buschenhenke, F., Conijn, R., & Van Waes, L. (under review). Measuring non-linearity of long-term writing processes.
Access
https://github.com/FloorBuschenhenke/NonLinearityMethod
Contact
Floor Buschenhenke: floor.buschenhenke@huygens.knaw.nl or woordheks@gmail.com
Muñoz Martín & Apfelthaler (2022). The Task Segment Framework: A tool to study source-based tasks at the keyboard
Data: x Inputlog o Scriptlog x Other: FACIL
Code: o R-script o Python o Other:
Authors
Ricardo Muñoz Martín & Matthias Apfelthaler (2022)
Keywords
Source-based tasks, multilectal communication, keystroke-logging, behavioral fluency, efficiency, efficacy.
Materials
The study presents an analytical framework to study writing processes in source-based tasks, which has so far been tested on translation and writing. Basic concepts in the framework are behavioral fluency and the Minimax principle (maximum effect with minimal effort). Interkey intervals (IKIs) are classed as either willful or involuntary, the latter further divided into potentially relevant to the task and mechanical. Willful IKIs, or pauses, are used to mark behavioral units or task segments, with or without text (with text, they are bursts). According to their contents, task segments are classed as ADD, CHANGE, SEARCH and MIXED (largely self-explanatory, and hypothesized to have or mix mutually exclusive behavioral repertoires) plus FILLERS (apparently purposeless, isolated behaviors) and HCI (behaviors at the keyboard apparently unrelated or weakly related to the task). The main focus of the Task Segment Framework (TSF) is metacognitive control and cognitive resource management.
The TSF is a work in progress. As of 2023, it uses a baseline of 200 ms (shorter IKIs, or lags, are ignored) and two subject- and session-bound thresholds to separate the remaining IKIs into three kinds: a lower threshold separates apparently mechanical IKIs, or delays, from those potentially relevant to the task, or respites; an upper threshold separates respites from the longest IKIs, or (willful) pauses. The baseline and the thresholds are calculated with the pause analysis of idfx files: the lower threshold is 2 × the median within-word IKI and the upper threshold is 3 × the median between-word IKI.
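The threshold arithmetic above can be written out as a short Python sketch. The function names and sample values are hypothetical, and two details are assumptions because the text does not settle them: the medians are taken over all supplied IKIs, and an IKI exactly equal to a threshold is assigned to the higher category.

```python
from statistics import median

BASELINE_MS = 200  # IKIs shorter than this are lags and are ignored


def tsf_thresholds(within_word_ikis, between_word_ikis):
    """Return (lower, upper) thresholds for one subject and session."""
    lower = 2 * median(within_word_ikis)   # delays vs. respites
    upper = 3 * median(between_word_ikis)  # respites vs. pauses
    return lower, upper


def classify_iki(iki_ms, lower, upper):
    """Label one IKI as lag, delay, respite or pause."""
    if iki_ms < BASELINE_MS:
        return "lag"
    if iki_ms < lower:
        return "delay"
    if iki_ms < upper:
        return "respite"
    return "pause"


# Made-up intervals in ms for one hypothetical session.
within_word = [110, 130, 150, 170, 190]
between_word = [300, 400, 500, 600, 700]
lower, upper = tsf_thresholds(within_word, between_word)
print("lower:", lower, "upper:", upper)
for iki in (120, 250, 800, 2000):
    print(iki, "->", classify_iki(iki, lower, upper))
```

For a very fast typist the lower threshold can fall below the 200 ms baseline, in which case no IKI is labelled a delay under this sketch.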
To ease the study of progress through the tasks, the TSF includes an S-notation system that uses symbols and color codes, plus a vertical representation per task segment, to facilitate the study of logs. Together with Eros Zanchetti, from Unibo, we have developed a small application to convert the output of Inputlog’s general analysis.
The following materials are made available:
- Inputlog data: The input files are idfx files in XML format
- FACIL: An application for parsing raw keystroke logging data from a task session, processing it to chunk the record into task segments and recode the data.
Use of files
The program FACIL and a corpus of several small research projects, all of them on translating or containing translation as one of the tasks, can be downloaded and used freely.
Publications
- Muñoz Martín, Ricardo & Matthias Apfelthaler. A Task Segment Framework to study keylogged translation processes. Translation & Interpreting 14: 8–31.
- Muñoz Martín, Ricardo & Matthias Apfelthaler. Spillover effects in task-segment switching. A study of translation subtasks as behavioral categories within the Task Segment Framework. In Muñoz Martín, Ricardo, Sun Sanjun & Li Defeng, eds. Advances in Cognitive Translation Studies, 19–45. Singapore: Springer.
- Muñoz Martín, Ricardo & Celia Martín de Leon. Fascinatin’ rhythm – and pauses in translators’ cognitive processes. Hermes 57: 29–47.
- Muñoz Martín, Ricardo & José María Cardona Guerra. Translating in fits and starts: pause thresholds and roles in the research of translation processes. Perspectives 27: 525–551.
- Puerini, Sara. Typing your mind away. Comparing keylogged tasks with the Task Segment Framework. M.A. thesis, Università di Bologna. https://amslaurea.unibo.it/22899/
Access
*
Contact
Ricardo Muñoz: ricardo.munoz@unibo.it
Reference
If you publish or present a paper in which Inputlog has been used, please refer to the following article:
Leijten, M., & Van Waes, L. (2013). Keystroke Logging in Writing Research: Using Inputlog to Analyze Writing Processes. Written Communication 30(3), 358-392
DOI: 10.1177/0741088313491692