Application of Naïve Bayes Classification to Analyze Performance Using Stopwords
Keywords:
social media, stopwords, naïve bayes
Abstract
Based on current data, the number of social media users has increased, which shows that more and more people use social media as a place to express themselves and their emotions. This activity generates thousands of tweets each day. The tweet data is processed so that it is useful to stakeholders who need it to support their decisions. Because sentence structures on social media are often irregular, pre-processing is necessary to normalize tweet sentences. Stemming and stopword removal are pre-processing techniques that are widely used in sentiment analysis. Previous studies indicated that these techniques did not have a significant effect on accuracy. In this study, the authors compare four models, covering the combinations of with and without stemming and with and without stopword removal. The model that uses stemming achieves the best result, with an F1-score of 65%. These results indicate that stemming and stopword removal improve performance when using multi-class Naïve Bayes.
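The four-model comparison described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy stopword list, the suffix-stripping `stem` function, the sample tweets, the emotion labels, and the choice of a multinomial Naïve Bayes with Laplace smoothing scored by macro-averaged F1 are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict
from itertools import product

# Hypothetical toy resources; a real study would use a full stopword
# list and a proper stemmer for the language of the tweets.
STOPWORDS = {"the", "a", "is", "i", "this", "so", "am", "at", "me", "and"}
SUFFIXES = ("ing", "ed", "s")

def stem(token):
    """Toy suffix stripper (an assumption, not a real stemming algorithm)."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text, use_stemming, use_stopwords):
    """Lowercase, split on whitespace, then apply the optional steps."""
    tokens = text.lower().split()
    if use_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    if use_stemming:
        tokens = [stem(t) for t in tokens]
    return tokens

def train_nb(docs, labels):
    """Collect per-class document counts and per-class word counts."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict_nb(model, tokens):
    """Return the class with the highest log-posterior (Laplace smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        total_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_docs)
        for t in tokens:
            if t in vocab:  # words never seen in training are ignored
                score += math.log(
                    (word_counts[label][t] + 1) / (total_words + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for label in sorted(set(y_true)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Made-up example tweets, for illustration only.
TRAIN = [
    ("i am so happy today", "joy"),
    ("loving this sunny day", "joy"),
    ("this is so sad", "sadness"),
    ("i cried all night", "sadness"),
    ("i am angry at the delays", "anger"),
    ("the delay is making me furious", "anger"),
]
TEST = [("happy days", "joy"), ("she cries", "sadness"), ("delayed and angry", "anger")]

def evaluate(use_stemming, use_stopwords):
    """Train and score one of the four preprocessing configurations."""
    docs = [preprocess(text, use_stemming, use_stopwords) for text, _ in TRAIN]
    model = train_nb(docs, [label for _, label in TRAIN])
    preds = [
        predict_nb(model, preprocess(text, use_stemming, use_stopwords))
        for text, _ in TEST
    ]
    return macro_f1([label for _, label in TEST], preds)

# One entry per (stemming, stopword removal) combination: four models.
results = {(s, w): evaluate(s, w) for s, w in product([False, True], repeat=2)}
for (s, w), f1 in results.items():
    print(f"stemming={s!s:5} stopwords={w!s:5} macro-F1={f1:.2f}")
```

With a dataset this small the scores say nothing about the real effect of either step; the sketch only shows how the four configurations share one training and evaluation pipeline so that their F1-scores are comparable.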
References
Akella, J. O., & Akella, L. N. Y. (2018). Sentiment Analysis Using Naïve Bayes Algorithm: With Case Study. Proceedings of the 3rd International Conference on Inventive Computation Technologies, ICICT 2018. https://doi.org/10.1109/ICICT43934.2018.9034394
Alam, S., & Yao, N. (2019). The impact of preprocessing steps on the accuracy of machine learning algorithms in sentiment analysis. Computational and Mathematical Organization Theory, 25(3), 319–335. https://doi.org/10.1007/s10588-018-9266-8
Dey, L., Chakraborty, S., Biswas, A., Bose, B., & Tiwari, S. (2016). Sentiment Analysis of Review Datasets Using Naïve Bayes' and K-NN Classifier. International Journal of Information Engineering and Electronic Business, 8(4), 54–62. https://doi.org/10.5815/ijieeb.2016.04.07
Fitri, V. A., Andreswari, R., & Hasibuan, M. A. (2019). Sentiment analysis of social media Twitter with case of Anti-LGBT campaign in Indonesia using Naïve Bayes, decision tree, and random forest algorithm. Procedia Computer Science, 161, 765–772. https://doi.org/10.1016/j.procs.2019.11.181
Hidayatullah, A. F. (2015). The Influence of Stemming on Indonesian Tweet Sentiment Analysis. In Computer Science and Informatics. http://www.website.com
Pradana, A. W., & Hayaty, M. (2019). The Effect of Stemming and Removal of Stopwords on the Accuracy of Sentiment Analysis on Indonesian-language Texts. Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control, 4(3), 375–380. https://doi.org/10.22219/kinetik.v4i4.912
Saputri, M. S., Mahendra, R., & Adriani, M. (2018). Emotion Classification on Indonesian Twitter Dataset. IEEE.
Kemp, S. (2022, February 15). Digital 2022: Indonesia. DataReportal. https://datareportal.com/reports/digital-2022-indonesia
Sudarsa, D., Kumar.P, S., & Jagajeevan Rao, L. (2018). Sentiment Analysis for Social Networks Using Machine Learning Techniques. International Journal of Engineering & Technology, 7(2.32), 473. https://doi.org/10.14419/ijet.v7i2.32.16271
Copyright (c) 2023 JISTE (Journal of Information System, Technology and Engineering)

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.



