A Study of Word Sense Discrimination of หัว /hua4/ in Thai Using Latent Semantic Analysis

Nutcha Tirasaroj
วิโรจน์ อรุณมานะกุล

Abstract

Many words in a language are polysemous. For humans, polysemy is rarely a problem in communication, because the sender and the receiver arrive at the same meaning of a word that has several senses. Teaching a computer to know the senses of a word and to choose the appropriate one in each context, however, remains an open problem. This paper studies word sense discrimination of /hua4/ using Latent Semantic Analysis, with the surrounding contexts serving as the clues for discriminating the senses. The results show that contexts in a small window tend to discriminate senses better than contexts in a large window. Furthermore, systems using left contexts perform better than those using right contexts, because many clue words on the left occur with only one meaning, whereas some clue words on the right occur with several meanings. Moreover, when word pairs of the form word-space or space-word are far more numerous in right contexts than in left contexts, system performance decreases. The accuracy of word sense discrimination in this study is not high (41.63%). This could result from the use of word forms only and from the small quantity of training data.
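The idea of comparing left and right context windows in a reduced semantic space can be illustrated with a minimal sketch. It is not the authors' implementation: the toy token list (English glosses around a romanized `hua`), the function names `extract_contexts`, `lsa_vectors`, and `cosine`, and the choice of raw counts with a plain truncated SVD are all assumptions made for illustration, and no clustering step is shown.

```python
import numpy as np

def extract_contexts(tokens, target, size, side="left"):
    """Return one list of context words per occurrence of `target`."""
    contexts = []
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        if side == "left":
            window = tokens[max(0, i - size):i]
        elif side == "right":
            window = tokens[i + 1:i + 1 + size]
        else:
            raise ValueError("side must be 'left' or 'right'")
        contexts.append(window)
    return contexts

def lsa_vectors(contexts, k):
    """Build a word-by-context count matrix and project each context to k dimensions."""
    vocab = sorted({w for ctx in contexts for w in ctx})
    index = {w: row for row, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(contexts)))
    for col, ctx in enumerate(contexts):
        for w in ctx:
            counts[index[w], col] += 1.0
    # Truncated SVD: keep only the k largest singular values.
    _, s, vt = np.linalg.svd(counts, full_matrices=False)
    k = min(k, len(s))
    return (s[:k, None] * vt[:k]).T  # one row per context

def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

# Invented toy data (assumption): three occurrences of the target word.
tokens = ["ache", "hua", "badly", "chop", "hua", "onion", "ache", "hua", "today"]

left = extract_contexts(tokens, "hua", size=1, side="left")
right = extract_contexts(tokens, "hua", size=1, side="right")

left_vecs = lsa_vectors(left, k=2)
same_clue = cosine(left_vecs[0], left_vecs[2])   # both preceded by "ache"
diff_clue = cosine(left_vecs[0], left_vecs[1])   # "ache" versus "chop"
print(left, right, round(same_clue, 3), round(diff_clue, 3))
```

In this toy case the two occurrences that share a left clue word end up with the same direction in the reduced space, while the occurrence with a different clue word does not. A full system would go on to group such vectors into sense clusters and compare the groups against labeled senses.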

Section
Research Articles and Theses
