English Linguistic Traces in Thai Translation: A Preliminary Stylometric Study of Textual Traceability in Human and LLM Translation

Sanooch Nathalang
Natthaphon Tripornchaisak

Abstract

This preliminary study examines stylometric patterns in a small corpus of English-to-Thai literary translations produced by six human translators and three LLM systems across six source passages. Human translations function both as evidence of stylometric patterns in literary translation and as reference points against which the three LLM systems are evaluated. The inclusion of multiple LLMs allows the study to examine how stylometric traces emerge in LLM translations, both in comparison with human translations and across the systems themselves. Using lexical diversity measures, sentence-based syntactic measures, Burrows’ Delta, and BERTScore F1 as a semantic fidelity control, the study explores textual differences between the human translations and the LLM outputs within this corpus. The clearest contrasts appear in Maas values, sentence-length variance, and overall stylistic distance, while some measures show overlap across systems. Overall, the findings show that both human and LLM translations leave traceable stylometric patterns, though these are distributed differently across lexical richness, sentence-length patterning, pronominal reference, and overall stylistic distance. For English language studies, the findings offer a concrete illustration of how marked features of English literary prose are preserved, flattened, or shifted when rendered into a typologically distant language, which may be useful for translation pedagogy, post-editing, and the critical reading of AI-assisted translation.
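For readers unfamiliar with the measures named in the abstract, the following is a minimal illustrative sketch of three of them: the Maas index, sentence-length variance, and Burrows’ Delta. It is not the authors’ implementation. It assumes texts are already tokenized (the study’s Thai word-segmentation step is not reproduced here), uses the population standard deviation for z-scores, and selects the most frequent words from pooled counts; all function names are hypothetical. BERTScore is not covered.

```python
import math
from collections import Counter
from statistics import mean, pstdev, pvariance


def maas_a2(tokens):
    """Maas index a^2 = (log N - log V) / (log N)^2; lower means more diverse.

    Requires more than one token, since log(1) = 0 would divide by zero.
    """
    n = len(tokens)       # N: number of tokens
    v = len(set(tokens))  # V: number of distinct types
    return (math.log(n) - math.log(v)) / math.log(n) ** 2


def sentence_length_variance(sentences):
    """Population variance of sentence lengths, counted in tokens."""
    return pvariance([len(s) for s in sentences])


def burrows_delta(texts, top_n=50):
    """Pairwise Burrows' Delta for a dict mapping text name -> token list.

    Simplifying assumptions: the most frequent words come from pooled raw
    counts, and z-scores use the population standard deviation.
    """
    # Relative frequency of every word within each text.
    rel = {
        name: {w: c / len(toks) for w, c in Counter(toks).items()}
        for name, toks in texts.items()
    }
    pooled = Counter()
    for toks in texts.values():
        pooled.update(toks)
    vocab = [w for w, _ in pooled.most_common(top_n)]

    names = list(texts)
    z = {name: [] for name in names}
    for w in vocab:
        freqs = [rel[name].get(w, 0.0) for name in names]
        sd = pstdev(freqs)
        if sd == 0:
            continue  # a word used identically everywhere carries no signal
        mu = mean(freqs)
        for name, f in zip(names, freqs):
            z[name].append((f - mu) / sd)

    # Delta is the mean absolute difference of z-scores over the word list.
    return {
        (a, b): mean(abs(x - y) for x, y in zip(z[a], z[b]))
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Toy English tokens stand in for segmented Thai text.
toy = {
    "A": "the cat sat on the mat".split(),
    "B": "the cat sat on the mat".split(),
    "C": "a dog ran to a park".split(),
}
distances = burrows_delta(toy)
```

In this toy setup, identical texts lie at distance zero and the third text lies farther away. With passages as short as those in a small corpus, such values are best read as descriptive rather than conclusive.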

Article Details

How to Cite
Nathalang, S., & Tripornchaisak, N. (2026). English Linguistic Traces in Thai Translation: A Preliminary Stylometric Study of Textual Traceability in Human and LLM Translation. Journal of Studies in the English Language, 21(1), 137–160. https://doi.org/10.64731/jsel.v21i1.287759
Section
Research Articles
Author Biographies

Sanooch Nathalang, Thammasat University, Thailand

Sanooch S. Nathalang, PhD, is a lecturer at Thammasat University. As a former researcher at NECTEC, she merges technology and education across her research interests in technology-enhanced learning, AI, translation and interpreting applications, and their pedagogical implications. She is a key contributor to a national policy-level AI in education initiative.

Natthaphon Tripornchaisak, Thammasat University, Thailand

Natthaphon Tripornchaisak, PhD, is a lecturer at Thammasat University, specialising in interpreting pedagogy and intercultural communication. An evaluator for the Thai Parliament and interpreter for the IMF and UN, he has completed AIIC Training of Trainers seminars. A former visiting scholar, he bridges academic theory with high-level conference interpreting standards.

References

Baker, M. (1993). Corpus linguistics and translation studies: Implications and applications. In M. Baker, G. Francis, & E. Tognini-Bonelli (Eds.), Text and technology: In honour of John Sinclair (pp. 233–250). John Benjamins.

Baker, M. (2000). Towards a methodology for investigating the style of a literary translator. Target, 12(2), 241–266. https://doi.org/10.1075/target.12.2.04bak

Baroni, M., & Bernardini, S. (2006). A new approach to the study of translationese: Machine-learning the difference between original and translated text. Literary and Linguistic Computing, 21(3), 259–274. https://doi.org/10.1093/llc/fqi039

Burrows, J. (2002). ‘Delta’: A measure of stylistic difference and a guide to likely authorship. Literary and Linguistic Computing, 17(3), 267–287. https://doi.org/10.1093/llc/17.3.267

Eder, M. (2015). Does size matter? Authorship attribution, small samples, big problem. Digital Scholarship in the Humanities, 30(2), 167–182. https://doi.org/10.1093/llc/fqt066

Eder, M., Rybicki, J., & Kestemont, M. (2016). Stylometry with R: A package for computational text analysis. The R Journal, 8(1), 107–121. https://doi.org/10.32614/RJ-2016-007

Evert, S., Proisl, T., Jannidis, F., Reger, I., Pielström, S., Schöch, C., & Vitt, T. (2017). Understanding and explaining Delta measures for authorship attribution. Digital Scholarship in the Humanities, 32(suppl_2), ii4–ii16. https://doi.org/10.1093/llc/fqx023

Jiao, W., Wang, W., Huang, J., Wang, X., Shi, S., & Tu, Z. (2023). Is ChatGPT a good translator? A preliminary study. arXiv preprint arXiv:2301.08745. https://arxiv.org/abs/2301.08745

Karpinska, M., & Iyyer, M. (2023). Large language models effectively leverage document-level context for literary translation. arXiv preprint arXiv:2304.03245. https://doi.org/10.48550/arXiv.2304.03245

Koizumi, R., & In’nami, Y. (2012). Effects of text length on lexical diversity measures: Using short texts with less than 200 tokens. System, 40(4), 554–564. https://doi.org/10.1016/j.system.2012.10.012

Kolirin, L. (2026, January 23). Like digging ‘your own professional grave’: The translators grappling with losing work to AI. CNN Business. https://edition.cnn.com/2026/01/23/tech/translation-language-jobs-ai-automation-intl

Laviosa, S. (1998). Core patterns of lexical use in a comparable corpus of English narrative prose. Meta, 43(4), 557–570. https://doi.org/10.7202/003425

Lembersky, G., Ordan, N., & Wintner, S. (2012). Language models for machine translation: Original vs. translated texts. Computational Linguistics, 38(4), 799–825. https://doi.org/10.1162/COLI_a_00111

Nam, H. D. (2023). Copyright law issues regarding AI-based literature translation: Déjà-vu of the Tower of Babel? Jeo’jag’gweon, 36(4), 33–88. https://doi.org/10.30582/kdps.2023.36.4.33

Nivre, J., de Marneffe, M.-C., Ginter, F., Hajič, J., Manning, C., Pyysalo, S., Schuster, S., Tyers, F., & Zeman, D. (2020). Universal Dependencies v2: An evergrowing multilingual treebank collection. In Proceedings of the 12th Language Resources and Evaluation Conference (LREC 2020) (pp. 4034–4043). European Language Resources Association. https://aclanthology.org/2020.lrec-1.497/

Phatthiyaphaibun, W., Chaovavanich, K., Polpanumas, C., Suriyawongkul, A., Lowphansirikul, L., Chormai, P., Limkonchotiwat, P., Suntorntip, T., & Udomcharoenchaikit, C. (2023). PyThaiNLP: Thai natural language processing in Python. In L. Tan, D. Milajevs, G. Chauhan, J. Gwinnup, & E. Rippeth (Eds.), Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023) (pp. 25–36). Association for Computational Linguistics. https://aclanthology.org/2023.nlposs-1.4/

Rybicki, J. (2012). The great mystery of the (almost) invisible translator: Stylometry in translation. In M. Oakes & M. Ji (Eds.), Quantitative methods in corpus-based translation studies: A practical guide to descriptive translation research (pp. 231–248). John Benjamins.

Rybicki, J., & Eder, M. (2011). Deeper Delta across genres and languages: Do we really need the most frequent words? Literary and Linguistic Computing, 26(3), 315–321. https://doi.org/10.1093/llc/fqr031

Rybicki, J., & Heydel, M. (2013). The stylistics and stylometry of collaborative translation: Woolf’s Night and Day in Polish. Literary and Linguistic Computing, 28(4), 708–717. https://doi.org/10.1093/llc/fqt027

Saldanha, G. (2011). Translator style: Methodological considerations. The Translator, 17(1), 25–50. https://doi.org/10.1080/13556509.2011.10799478

Venuti, L. (1995). The translator’s invisibility: A history of translation. Routledge.

Venuti, L. (2008). The translator’s invisibility: A history of translation (2nd ed.). Routledge. (Original work published 1995). https://doi.org/10.4324/9780203360064

Zhang, T., Kishore, V., Wu, F., Weinberger, K. Q., & Artzi, Y. (2020). BERTScore: Evaluating text generation with BERT. In Proceedings of the International Conference on Learning Representations (ICLR 2020). https://openreview.net/forum?id=SkeHuCVFDr