A Corpus-Based Analysis of Lexical Characteristics Across English News Categories for L2 Pedagogical Use
Abstract
News articles are widely regarded as valuable resources for vocabulary acquisition. However, they encompass diverse categories, each catering to specific learner needs. This study analysed the vocabulary of 3,000 news articles across 12 categories, focusing on lexical profiling, lexical level, lexical variation, lexical density, and CEFR level to support L2 learners. The results showed that the Health category had the highest General Service List word coverage (81.01%), while Technology featured the most Academic Word List terms (8.23%). Fashion contained the largest proportion of specialised vocabulary (18.31%) and exhibited the highest lexical variation (51.29%). High-frequency words dominated all categories (91–94.79%), while Fashion included the most mid-frequency (5.84%) and low-frequency (2.36%) words. Lexical density was highest in the Environment category (57.85%) and lowest in Sports (53.2%). The CEFR analysis indicated that A1 and A2 words made up the majority (76.66% and 10.44%, respectively), while Fashion and Nutrition included the highest proportions of C1–C2 words (6.32% and 6.53%, respectively). These findings suggest that categories such as Health and Sports are suitable for beginner learners, while Fashion and Nutrition offer more complex vocabulary for advanced learners. This study highlights the unique lexical characteristics of news categories, providing educators and learners with guidance on selecting authentic materials to enhance vocabulary learning.
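The three percentage measures reported in the abstract (word-list coverage, lexical variation, and lexical density) can be illustrated with a minimal sketch. It assumes the common definitions: coverage as the share of tokens found in a word list, lexical variation as the type-token ratio, and lexical density as the share of content words. The abstract does not state the exact formulas or tools used, so the function names, the tokenizer, and the two tiny word sets below are hypothetical stand-ins for illustration only, not the General Service List or any list from the study.

```python
import re

# Toy stand-ins for illustration only; NOT the word lists used in the study.
TOY_WORD_LIST = {"the", "new", "phone", "is", "and", "a"}
TOY_FUNCTION_WORDS = {"the", "is", "and", "a", "of", "to", "in"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def coverage(tokens, word_list):
    """Percentage of tokens that appear in the given word list."""
    if not tokens:
        raise ValueError("no tokens to analyse")
    return 100 * sum(t in word_list for t in tokens) / len(tokens)


def lexical_variation(tokens):
    """Type-token ratio as a percentage (assumed definition)."""
    if not tokens:
        raise ValueError("no tokens to analyse")
    return 100 * len(set(tokens)) / len(tokens)


def lexical_density(tokens, function_words):
    """Percentage of tokens that are content words (assumed definition:
    every token not in the function-word set counts as a content word)."""
    if not tokens:
        raise ValueError("no tokens to analyse")
    content = [t for t in tokens if t not in function_words]
    return 100 * len(content) / len(tokens)


sample = "The new phone is fast and the new camera is sharp"
tokens = tokenize(sample)

print(f"coverage:  {coverage(tokens, TOY_WORD_LIST):.2f}%")
print(f"variation: {lexical_variation(tokens):.2f}%")
print(f"density:   {lexical_density(tokens, TOY_FUNCTION_WORDS):.2f}%")
```

A real analysis would replace the toy sets with full word lists and would likely count word families or lemmas rather than raw word forms, which changes the resulting percentages.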
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.