English Linguistic Traces in Thai Translation: A Preliminary Stylometric Study of Textual Traceability in Human and LLM Translation
Abstract
This preliminary study examines stylometric patterns in a small corpus of English-to-Thai literary translations produced by six human translators and three LLM systems across six source passages. The human translations serve both as evidence of stylometric patterns in literary translation and as reference points against which the three LLM systems are evaluated. Including multiple LLMs allows the study to examine how stylometric traces emerge in LLM translations, both in comparison with human translations and across the systems themselves. Using lexical diversity measures, sentence-based syntactic measures, and Burrows’ Delta, with BERTScore F1 as a semantic fidelity control, the study explores textual differences between the human translations and the LLM outputs within this corpus. The clearest contrasts appear in Maas values, sentence-length variance, and overall stylistic distance, while some measures overlap across systems. Overall, the findings show that both human and LLM translations leave traceable stylometric patterns, though these are distributed differently across lexical richness, sentence-length patterning, pronominal reference, and overall stylistic distance. For English language studies, the findings offer a concrete illustration of how marked features of English literary prose are preserved, flattened, or shifted when rendered into a typologically distant language, which may be useful for translation pedagogy, post-editing, and the critical reading of AI-assisted translation.
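For readers unfamiliar with the measures named in the abstract, the following is a minimal Python sketch of three of them: the Maas index, sentence-length variance, and Burrows’ Delta. It is an illustration under stated assumptions, not the study's own pipeline. The function names, the natural-log form of the Maas index, the use of population standard deviation, and the toy token lists are all assumptions introduced here. Thai word segmentation is assumed to have been done beforehand, so the functions take already-tokenized input.

```python
import math
from collections import Counter
from statistics import mean, pstdev, pvariance


def maas_a2(tokens):
    """Maas index a^2 = (ln N - ln V) / (ln N)^2; lower values mean richer vocabulary.

    Natural logarithms are an assumption; other bases rescale the value.
    """
    n, v = len(tokens), len(set(tokens))
    if n < 2:
        raise ValueError("need at least two tokens")
    return (math.log(n) - math.log(v)) / math.log(n) ** 2


def sentence_length_variance(sentences):
    """Population variance of sentence lengths, measured in tokens."""
    return pvariance([len(s) for s in sentences])


def burrows_delta(docs, top_n=50):
    """Pairwise Burrows' Delta over the top_n most frequent words.

    docs maps a (hypothetical) text label to its token list. Relative
    frequencies are z-scored across all texts; Delta is the mean absolute
    difference of z-scores between two texts.
    """
    names = list(docs)
    rel = {
        name: {w: c / len(toks) for w, c in Counter(toks).items()}
        for name, toks in docs.items()
    }
    totals = Counter()
    for toks in docs.values():
        totals.update(toks)

    z = {name: [] for name in names}
    for word, _ in totals.most_common(top_n):
        freqs = [rel[name].get(word, 0.0) for name in names]
        mu, sigma = mean(freqs), pstdev(freqs)
        if sigma == 0:  # word is equally frequent everywhere: no signal
            continue
        for name, f in zip(names, freqs):
            z[name].append((f - mu) / sigma)

    deltas = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            diffs = [abs(x - y) for x, y in zip(z[a], z[b])]
            deltas[(a, b)] = mean(diffs) if diffs else 0.0
    return deltas


if __name__ == "__main__":
    # Toy, invented token lists standing in for segmented translations.
    toy = {
        "human_1": ["x", "y", "x", "z"],
        "llm_1": ["x", "y", "y", "y"],
    }
    print(maas_a2(toy["human_1"]))
    print(sentence_length_variance([["x"], ["x", "y", "z"]]))
    print(burrows_delta(toy))
```

With passages as short as those in a small corpus, all three values are sensitive to text length and to the choice of `top_n`, so they are best read comparatively within one corpus and not as absolute scores.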
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
References
Baker, M. (1993). Corpus linguistics and translation studies: Implications and applications. In M. Baker, G. Francis, & E. Tognini-Bonelli (Eds.), Text and technology: In honour of John Sinclair (pp. 233–250). John Benjamins.
Baker, M. (2000). Towards a methodology for investigating the style of a literary translator. Target, 12(2), 241–266. https://doi.org/10.1075/target.12.2.04bak
Baroni, M., & Bernardini, S. (2006). A new approach to the study of translationese: Machine-learning the difference between original and translated text. Literary and Linguistic Computing, 21(3), 259–274. https://doi.org/10.1093/llc/fqi039
Burrows, J. (2002). ‘Delta’: A measure of stylistic difference and a guide to likely authorship. Literary and Linguistic Computing, 17(3), 267–287. https://doi.org/10.1093/llc/17.3.267
Eder, M. (2015). Does size matter? Authorship attribution, small samples, big problem. Digital Scholarship in the Humanities, 30(2), 167–182. https://doi.org/10.1093/llc/fqt066
Eder, M., Rybicki, J., & Kestemont, M. (2016). Stylometry with R: A package for computational text analysis. The R Journal, 8(1), 107–121. https://doi.org/10.32614/RJ-2016-007
Evert, S., Proisl, T., Jannidis, F., Reger, I., Pielström, S., Schöch, C., & Vitt, T. (2017). Understanding and explaining Delta measures for authorship attribution. Digital Scholarship in the Humanities, 32(suppl_2), ii4–ii16. https://doi.org/10.1093/llc/fqx023
Jiao, W., Wang, W., Huang, J., Wang, X., Shi, S., & Tu, Z. (2023). Is ChatGPT a good translator? A preliminary study. arXiv preprint arXiv:2301.08745. https://arxiv.org/abs/2301.08745
Karpinska, M., & Iyyer, M. (2023). Large language models effectively leverage document-level context for literary translation, but critical errors persist. arXiv preprint arXiv:2304.03245. https://doi.org/10.48550/arXiv.2304.03245
Koizumi, R., & In’nami, Y. (2012). Effects of text length on lexical diversity measures: Using short texts with less than 200 tokens. System, 40(4), 554–564. https://doi.org/10.1016/j.system.2012.10.012
Kolirin, L. (2026, January 23). Like digging ‘your own professional grave’: The translators grappling with losing work to AI. CNN Business. https://edition.cnn.com/2026/01/23/tech/translation-language-jobs-ai-automation-intl
Laviosa, S. (1998). Core patterns of lexical use in a comparable corpus of English narrative prose. Meta, 43(4), 557–570. https://doi.org/10.7202/003425ar
Lembersky, G., Ordan, N., & Wintner, S. (2012). Language models for machine translation: Original vs. translated texts. Computational Linguistics, 38(4), 799–825. https://doi.org/10.1162/COLI_a_00111
Nam, H. D. (2023). Copyright law issues regarding AI-based literature translation: Déjà-vu of the Tower of Babel? Jeo’jag’gweon, 36(4), 33–88. https://doi.org/10.30582/kdps.2023.36.4.33
Nivre, J., de Marneffe, M.-C., Ginter, F., Hajič, J., Manning, C., Pyysalo, S., Schuster, S., Tyers, F., & Zeman, D. (2020). Universal Dependencies v2: An evergrowing multilingual treebank collection. In Proceedings of the 12th Language Resources and Evaluation Conference (LREC 2020) (pp. 4034–4043). European Language Resources Association. https://aclanthology.org/2020.lrec-1.497/
Phatthiyaphaibun, W., Chaovavanich, K., Polpanumas, C., Suriyawongkul, A., Lowphansirikul, L., Chormai, P., Limkonchotiwat, P., Suntorntip, T., & Udomcharoenchaikit, C. (2023). PyThaiNLP: Thai natural language processing in Python. In L. Tan, D. Milajevs, G. Chauhan, J. Gwinnup, & E. Rippeth (Eds.), Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023) (pp. 25–36). Association for Computational Linguistics. https://aclanthology.org/2023.nlposs-1.4/
Rybicki, J. (2012). The great mystery of the (almost) invisible translator: Stylometry in translation. In M. Oakes & M. Ji (Eds.), Quantitative methods in corpus-based translation studies: A practical guide to descriptive translation research (pp. 231–248). John Benjamins.
Rybicki, J., & Eder, M. (2011). Deeper Delta across genres and languages: Do we really need the most frequent words? Literary and Linguistic Computing, 26(3), 315–321. https://doi.org/10.1093/llc/fqr031
Rybicki, J., & Heydel, M. (2013). The stylistics and stylometry of collaborative translation: Woolf’s Night and Day in Polish. Literary and Linguistic Computing, 28(4), 708–717. https://doi.org/10.1093/llc/fqt027
Saldanha, G. (2011). Translator style: Methodological considerations. The Translator, 17(1), 25–50. https://doi.org/10.1080/13556509.2011.10799478
Venuti, L. (1995). The translator’s invisibility: A history of translation. Routledge.
Venuti, L. (2008). The translator’s invisibility: A history of translation (2nd ed.). Routledge. (Original work published 1995). https://doi.org/10.4324/9780203360064
Zhang, T., Kishore, V., Wu, F., Weinberger, K. Q., & Artzi, Y. (2020). BERTScore: Evaluating text generation with BERT. In Proceedings of the International Conference on Learning Representations (ICLR 2020). https://openreview.net/forum?id=SkeHuCVFDr