English Linguistic Traces in Thai Translation: A Preliminary Stylometric Study of Textual Traceability in Human and LLM Translation

Sanooch Nathalang; Natthaphon Tripornchaisak

doi:10.64731/jsel.v21i1.287759

Published: Apr 30, 2026

DOI: https://doi.org/10.64731/jsel.v21i1.287759

Keywords:

translator invisibility, linguistic traces, stylometry, LLM translation, human translation

Sanooch Nathalang

Thammasat University, Thailand

https://orcid.org/0009-0008-7392-8404

Natthaphon Tripornchaisak

Thammasat University, Thailand

https://orcid.org/0000-0001-9540-462X

Abstract

This preliminary study examines stylometric patterns in a small corpus of English-to-Thai literary translations produced by six human translators and three LLM systems across six source passages. Human translations function both as evidence of stylometric patterns in literary translation and as reference points against which the three LLM systems are evaluated. The inclusion of multiple LLMs allows the study to examine how stylometric traces emerge in LLM translations, both in comparison with human translations and across the systems themselves. Using lexical diversity measures, sentence-based syntactic measures, Burrows’ Delta, and BERTScore F1 as a semantic fidelity control, the study explores textual differences between the human translations and the LLM outputs within this corpus. The clearest contrasts appear in Maas values, sentence-length variance, and overall stylistic distance, while other measures overlap across systems. Overall, the findings show that both human and LLM translations leave traceable stylometric patterns, though these are distributed differently across lexical richness, sentence-length patterning, pronominal reference, and overall stylistic distance. For English language studies, the findings offer a concrete illustration of how marked features of English literary prose are preserved, flattened, or shifted when rendered into a typologically distant language, which may be useful for translation pedagogy, post-editing, and the critical reading of AI-assisted translation.
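As a rough illustration of three of the measures named in the abstract, the sketch below computes a Maas index, sentence-length variance, and pairwise Burrows' Delta in plain Python. It is not the authors' pipeline: the function names, the toy token lists, the `n_mfw` parameter, and the use of the sample standard deviation for z-scoring are all assumptions made for illustration. It also assumes the text has already been split into word tokens and sentences, which for Thai requires a separate word-segmentation step.

```python
import math
from collections import Counter
from itertools import combinations
from statistics import mean, pvariance, stdev


def maas_index(tokens):
    """Maas a^2 = (log N - log V) / (log N)^2; lower values mean a more diverse vocabulary."""
    n, v = len(tokens), len(set(tokens))
    if n < 2:
        raise ValueError("need at least two tokens")
    return (math.log(n) - math.log(v)) / math.log(n) ** 2


def sentence_length_variance(sentences):
    """Population variance of sentence lengths, counted in tokens."""
    return pvariance([len(s) for s in sentences])


def burrows_delta(texts, n_mfw=100):
    """Pairwise Burrows' Delta over the n_mfw most frequent words of the whole corpus.

    texts maps a label to a token list (at least two texts are needed);
    returns {(label_a, label_b): delta}.
    """
    counts = {name: Counter(toks) for name, toks in texts.items()}
    corpus = Counter()
    for c in counts.values():
        corpus.update(c)
    mfw = [w for w, _ in corpus.most_common(n_mfw)]

    # z-score each word's relative frequency across the texts in the corpus.
    z = {name: [] for name in texts}
    for w in mfw:
        rel = {name: counts[name][w] / len(texts[name]) for name in texts}
        sd = stdev(rel.values())  # assumption: sample standard deviation
        if sd == 0:  # word used at the same rate everywhere: carries no signal
            continue
        mu = mean(rel.values())
        for name in texts:
            z[name].append((rel[name] - mu) / sd)

    # Delta is the mean absolute difference between two texts' z-score profiles.
    # (Raises statistics.StatisticsError if no word varies across the texts.)
    return {
        (a, b): mean(abs(x - y) for x, y in zip(z[a], z[b]))
        for a, b in combinations(texts, 2)
    }


if __name__ == "__main__":
    # Hypothetical, already-tokenized toy texts; real input would be segmented Thai.
    toy = {
        "human_1": ["a", "a", "b", "c"],
        "human_2": ["a", "b", "b", "c"],
        "llm_1": ["a", "a", "a", "b"],
    }
    print({name: maas_index(toks) for name, toks in toy.items()})
    print(burrows_delta(toy, n_mfw=10))
```

On passages as short as those described here, both measures are sensitive to text length and to the number of frequent words used, so values from a sketch like this are best read comparatively within one corpus rather than as absolute scores.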

How to Cite

Nathalang, S., & Tripornchaisak, N. (2026). English Linguistic Traces in Thai Translation: A Preliminary Stylometric Study of Textual Traceability in Human and LLM Translation. Journal of Studies in the English Language, 21(1), 137–160. https://doi.org/10.64731/jsel.v21i1.287759

Issue

Vol. 21 No. 1 (2026): January-April 2026

Section

Research Articles

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Author Biographies

Sanooch Nathalang, Thammasat University, Thailand

Sanooch S. Nathalang, PhD, is a lecturer at Thammasat University. As a former researcher at NECTEC, she merges technology and education across her research interests in technology-enhanced learning, AI, translation and interpreting applications, and their pedagogical implications. She is a key contributor to a national policy-level AI in education initiative.

Natthaphon Tripornchaisak, Thammasat University, Thailand

Natthaphon Tripornchaisak, PhD, is a lecturer at Thammasat University, specialising in interpreting pedagogy and intercultural communication. An evaluator for the Thai Parliament and interpreter for the IMF and UN, he has completed AIIC Training of Trainers seminars. A former visiting scholar, he bridges academic theory with high-level conference interpreting standards.

References

Baker, M. (1993). Corpus linguistics and translation studies: Implications and applications. In M. Baker, G. Francis, & E. Tognini-Bonelli (Eds.), Text and technology: In honour of John Sinclair (pp. 233–250). John Benjamins.

Baker, M. (2000). Towards a methodology for investigating the style of a literary translator. Target, 12(2), 241–266. https://doi.org/10.1075/target.12.2.04bak

Baroni, M., & Bernardini, S. (2006). A new approach to the study of translationese: Machine-learning the difference between original and translated text. Literary and Linguistic Computing, 21(3), 259–274. https://doi.org/10.1093/llc/fqi039

Burrows, J. (2002). ‘Delta’: A measure of stylistic difference and a guide to likely authorship. Literary and Linguistic Computing, 17(3), 267–287. https://doi.org/10.1093/llc/17.3.267

Eder, M. (2015). Does size matter? Authorship attribution, small samples, big problem. Digital Scholarship in the Humanities, 30(2), 167–182. https://doi.org/10.1093/llc/fqt066

Eder, M., Rybicki, J., & Kestemont, M. (2016). Stylometry with R: A package for computational text analysis. The R Journal, 8(1), 107–121. https://doi.org/10.32614/RJ-2016-007

Evert, S., Proisl, T., Jannidis, F., Reger, I., Pielström, S., Schöch, C., & Vitt, T. (2017). Understanding and explaining Delta measures for authorship attribution. Digital Scholarship in the Humanities, 32(suppl_2), ii4–ii16. https://doi.org/10.1093/llc/fqx023

Jiao, W., Wang, W., Huang, J., Wang, X., Shi, S., & Tu, Z. (2023). Is ChatGPT a good translator? A preliminary study. arXiv preprint arXiv:2301.08745. https://arxiv.org/abs/2301.08745

Karpinska, M., & Iyyer, M. (2023). Large language models effectively leverage document-level context for literary translation. arXiv preprint arXiv:2304.03245. https://doi.org/10.48550/arXiv.2304.03245

Koizumi, R., & In’nami, Y. (2012). Effects of text length on lexical diversity measures: Using short texts with less than 200 tokens. System, 40(4), 554–564. https://doi.org/10.1016/j.system.2012.10.012

Kolirin, L. (2026, January 23). Like digging ‘your own professional grave’: The translators grappling with losing work to AI. CNN Business. https://edition.cnn.com/2026/01/23/tech/translation-language-jobs-ai-automation-intl

Laviosa, S. (1998). Core patterns of lexical use in a comparable corpus of English narrative prose. Meta, 43(4), 557–570. https://doi.org/10.7202/003425

Lembersky, G., Ordan, N., & Wintner, S. (2012). Language models for machine translation: Original vs. translated texts. Computational Linguistics, 38(4), 799–825. https://doi.org/10.1162/COLI_a_00111

Nam, H. D. (2023). Copyright Law Issues Regarding AI-based Literature Translation: Déjà-vu of the Tower of Babel? Jeo’jag’gweon, 36(4), 33–88. https://doi.org/10.30582/kdps.2023.36.4.33

Nivre, J., de Marneffe, M.-C., Ginter, F., Hajič, J., Manning, C., Pyysalo, S., Schuster, S., Tyers, F., & Zeman, D. (2020). Universal dependencies v2: An evergrowing multilingual treebank collection. In Proceedings of the 12th Language Resources and Evaluation Conference (LREC 2020) (pp. 4034–4043). European Language Resources Association. https://aclanthology.org/2020.lrec-1.497/

Phatthiyaphaibun, W., Chaovavanich, K., Polpanumas, C., Suriyawongkul, A., Lowphansirikul, L., Chormai, P., Limkonchotiwat, P., Suntorntip, T., & Udomcharoenchaikit, C. (2023). PyThaiNLP: Thai natural language processing in Python. In L. Tan, D. Milajevs, G. Chauhan, J. Gwinnup, & E. Rippeth (Eds.), Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023) (pp. 25–36). Association for Computational Linguistics. https://aclanthology.org/2023.nlposs-1.4/

Rybicki, J. (2012). The great mystery of the (almost) invisible translator: Stylometry in translation. In M. Oakes & M. Ji (Eds.), Quantitative methods in corpus-based translation studies: A practical guide to descriptive translation research (pp. 231–248). John Benjamins.

Rybicki, J., & Eder, M. (2011). Deeper Delta across genres and languages: Do we really need the most frequent words? Literary and Linguistic Computing, 26(3), 315–321. https://doi.org/10.1093/llc/fqr031

Rybicki, J., & Heydel, M. (2013). The stylistics and stylometry of collaborative translation: Woolf’s Night and Day in Polish. Literary and Linguistic Computing, 28(4), 708–717. https://doi.org/10.1093/llc/fqt027

Saldanha, G. (2011). Translator style: Methodological considerations. The Translator, 17(1), 25–50. https://doi.org/10.1080/13556509.2011.10799478

Venuti, L. (1995). The translator’s invisibility: A history of translation. Routledge.

Venuti, L. (2008). The translator’s invisibility: A history of translation (2nd ed.). Routledge. (Original work published 1995). https://doi.org/10.4324/9780203360064

Zhang, T., Kishore, V., Wu, F., Weinberger, K. Q., & Artzi, Y. (2020). BERTScore: Evaluating text generation with BERT. In Proceedings of the International Conference on Learning Representations (ICLR 2020). https://openreview.net/forum?id=SkeHuCVFDr
