The Effects of Lexical-Features Changes in ChatGPT-Generated Content Using N-Gram Approach

Chatpong Hacherngsoonghom; Mukda Suktarachan; Thanasak Sirikanerat; Bhimbasistha Tejarajanya; Lena Maluleem


Published: Feb 24, 2025

Keywords:

Lexical-Features Changes; ChatGPT; N-gram

Chatpong Hacherngsoonghom

Department of Linguistics, Faculty of Humanities, Kasetsart University

Mukda Suktarachan

Department of Linguistics, Faculty of Humanities, Kasetsart University

Thanasak Sirikanerat

Department of Linguistics, Faculty of Humanities, Kasetsart University

Bhimbasistha Tejarajanya

Department of Linguistics, Faculty of Humanities, Kasetsart University

Lena Maluleem

Department of Linguistics, Faculty of Humanities, Kasetsart University

Abstract

This article aims 1) to analyze changes in the word types appearing in ChatGPT-generated content through n-gram analysis and 2) to examine the lexical features of content revised by ChatGPT at the third-gram position. The research adopts a quantitative approach, using n-gram analysis focused on the third-gram position as the framework for evaluating changes in word types and the lexical features that result from word modifications. The study is document-based, with a sample of eight children’s stories selected through purposive sampling. Two research tools were used: 1) the ChatGPT program and 2) Python via Google Colaboratory. The data were analyzed using basic statistical methods. The findings were as follows. 1) Regarding the first objective, modifications at the third-gram position influenced the realism and coherence of the content. Nouns were the most frequently affected word type, followed in order by verbs, determiners, pronouns, adjectives, conjunctions, adverbs, numerals, and interjections. 2) Regarding the second objective, eight types of lexical relationships were identified: antonym, co-hyponym, co-hypernym, hyponym, hypernym, synonym, meronym, and holonym. A new pattern, word embedding similarity, was also observed and was the most frequently occurring relationship, reflecting the AI’s capability to maintain contextual relevance and content appropriateness. The knowledge derived from this research strengthens the integration of linguistics and artificial intelligence (AI). Users can apply this model to develop improved methods for evaluating AI-generated content, to refine prompts for better performance, and to design suitable teaching materials. Furthermore, this study advocates responsible, transparent, and ethical AI use, contributing to advancements in academia, industry, and society.
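The abstract does not spell out how the third-gram comparison was implemented, so the following is only an illustrative sketch in Python of one way such a comparison could be set up. It assumes that a "third-gram position" change means a trigram whose first two words are unchanged while the third word differs between the original and the revised text. The function names and the two example sentences are hypothetical and are not taken from the study's eight stories.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a simplification)."""
    return re.findall(r"[a-z']+", text.lower())


def trigrams(tokens):
    """Return every window of three consecutive tokens."""
    return [tuple(tokens[i:i + 3]) for i in range(len(tokens) - 2)]


def third_position_changes(original, revised):
    """Count trigrams that keep their first two words but change the third.

    Assumption: this is one plausible reading of a 'third-gram position'
    change; the study's actual procedure may differ.
    """
    # Map each two-word prefix in the original text to its third words.
    orig_by_prefix = {}
    for w1, w2, w3 in trigrams(tokenize(original)):
        orig_by_prefix.setdefault((w1, w2), set()).add(w3)

    changes = Counter()
    for w1, w2, w3 in trigrams(tokenize(revised)):
        olds = orig_by_prefix.get((w1, w2))
        if olds and w3 not in olds:
            for old in sorted(olds):
                changes[(w1, w2, old, w3)] += 1
    return changes


# Hypothetical example sentences, not drawn from the study's data.
original = "The little fox ran into the forest."
revised = "The little rabbit ran into the woods."

for (w1, w2, old, new), n in third_position_changes(original, revised).items():
    print(f"{w1} {w2} [{old} -> {new}] x{n}")
```

Each reported pair (for instance "fox" and "rabbit") could then be passed to a part-of-speech tagger to tally the affected word types, or to a lexical database to label the relationship between the old and new word; those steps are left out here because the abstract does not say which resources were used.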

How to Cite

Hacherngsoonghom, C., Suktarachan, M., Sirikanerat, T., Tejarajanya, B., & Maluleem, L. (2025). The Effects of Lexical-Features Changes in ChatGPT-Generated Content Using N-Gram Approach. Journal of Multidisciplinary in Humanities and Social Sciences, 8(1-2), 79–98. Retrieved from https://so04.tci-thaijo.org/index.php/jmhs1_s/article/view/276657

Issue

Vol. 8 No. 1-2 (2025): January - April 2025

Section

Research Articles

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Views and opinions appearing in the Journal are the responsibility of the author of the article and do not constitute the views or responsibility of the editorial team.

