Application of Naïve Bayes Classification to Analyze Performance Using Stopwords

https://doi.org/10.61487/jiste.v1i2.15

Authors

  • Jefriyanto Jefriyanto, Universitas Negeri Padang
  • Nur Ainun, Universitas Serambi Mekkah
  • Muchamad Arif Al Ardha, Universitas Negeri Surabaya

Keywords:

social media, stopwords, naïve bayes

Abstract

Current data show a growing number of social media users, which indicates that more and more people use social media to express themselves and their emotions. This activity generates thousands of tweets each day. Tweet data can be processed so that it helps stakeholders make decisions. Because sentence structures on social media are often irregular, pre-processing is needed to normalize tweet text. Stemming and stopword removal are pre-processing techniques widely used in sentiment analysis. Previous studies indicated that their use did not have a significant effect on accuracy. In this study, the authors compare four models that differ in whether stemming and stopword removal are applied. The model that uses stemming achieves the best result, with an F1-score of 65%. These results indicate that stemming and stopword removal improve the performance of multi-class Naïve Bayes classification.
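The four-model comparison described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy English tweets and labels, the tiny stopword set, the suffix-stripping `toy_stem` function (a stand-in for a real Indonesian stemmer), and the hand-written multinomial Naïve Bayes with Laplace smoothing. It only shows how the stemming and stopword options might be toggled and scored with a macro-averaged F1; it does not reproduce the reported 65%.

```python
import math
from collections import Counter, defaultdict
from itertools import product

# Hypothetical toy resources: a real study would use a full stopword list
# and a proper stemmer for the target language.
STOPWORDS = {"the", "a", "is", "i", "this", "so", "am"}
SUFFIXES = ("ing", "ed", "s")

def toy_stem(token):
    """Strip one common suffix; a crude stand-in for a real stemmer."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def preprocess(text, use_stemming, use_stopwords):
    """Lowercase, split on whitespace, then apply the optional steps."""
    tokens = text.lower().split()
    if use_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    if use_stemming:
        tokens = [toy_stem(t) for t in tokens]
    return tokens

class MultinomialNB:
    """Multi-class Naive Bayes over token counts with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_counts):
            # log prior + sum of smoothed log likelihoods
            score = math.log(self.class_counts[label] / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for t in tokens:
                if t in self.vocab:  # tokens unseen in training are skipped
                    score += math.log((self.word_counts[label][t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

def macro_f1(true, pred):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for label in sorted(set(true)):
        tp = sum(t == label and p == label for t, p in zip(true, pred))
        fp = sum(t != label and p == label for t, p in zip(true, pred))
        fn = sum(t == label and p != label for t, p in zip(true, pred))
        scores.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(scores) / len(scores)

# Made-up example data, not the dataset used in the study.
TRAIN = [
    ("i am so happy today", "joy"),
    ("loving this sunny weather", "joy"),
    ("this is the saddest news", "sadness"),
    ("i cried all night", "sadness"),
    ("this delay is making me angry", "anger"),
    ("i hated the rude service", "anger"),
]
TEST = [
    ("happy and loved", "joy"),
    ("crying is sad", "sadness"),
    ("angry about the delays", "anger"),
]

def evaluate(use_stemming, use_stopwords):
    """Train and score one of the four preprocessing configurations."""
    train_docs = [preprocess(t, use_stemming, use_stopwords) for t, _ in TRAIN]
    model = MultinomialNB().fit(train_docs, [y for _, y in TRAIN])
    preds = [model.predict(preprocess(t, use_stemming, use_stopwords))
             for t, _ in TEST]
    return macro_f1([y for _, y in TEST], preds)

# One entry per (stemming, stopword removal) combination.
results = {(stem, stop): evaluate(stem, stop)
           for stem, stop in product([False, True], repeat=2)}
for (stem, stop), f1 in results.items():
    print(f"stemming={stem!s:5} stopwords={stop!s:5} macro-F1={f1:.2f}")
```

With data this small the scores say nothing about the real effect of either step; a faithful replication would need the study's tweet dataset, a language-appropriate stemmer and stopword list, and a proper train/test split.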

References

Akella, J. O., & Akella, L. N. Y. (2018). Sentiment Analysis Using Naïve Bayes Algorithm: With Case Study. Proceedings of the 3rd International Conference on Inventive Computation Technologies, ICICT 2018. https://doi.org/10.1109/ICICT43934.2018.9034394

Alam, S., & Yao, N. (2019). The impact of preprocessing steps on the accuracy of machine learning algorithms in sentiment analysis. Computational and Mathematical Organization Theory, 25(3), 319–335. https://doi.org/10.1007/s10588-018-9266-8

Dey, L., Chakraborty, S., Biswas, A., Bose, B., & Tiwari, S. (2016). Sentiment Analysis of Review Datasets Using Naïve Bayes' and K-NN Classifier. International Journal of Information Engineering and Electronic Business, 8(4), 54–62. https://doi.org/10.5815/ijieeb.2016.04.07

Fitri, V. A., Andreswari, R., & Hasibuan, M. A. (2019). Sentiment analysis of social media Twitter with case of Anti-LGBT campaign in Indonesia using Naïve Bayes, decision tree, and random forest algorithm. Procedia Computer Science, 161, 765–772. https://doi.org/10.1016/j.procs.2019.11.181

Hidayatullah, A. F. (2015). The Influence of Stemming on Indonesian Tweet Sentiment Analysis. In Computer Science and Informatics. http://www.website.com

Pradana, A. W., & Hayaty, M. (2019). The Effect of Stemming and Removal of Stopwords on the Accuracy of Sentiment Analysis on Indonesian-language Texts. Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control, 4(3), 375–380. https://doi.org/10.22219/kinetik.v4i4.912

Saputri, M. S., Mahendra, R., & Adriani, M. (2018). Emotion Classification on Indonesian Twitter Dataset. IEEE.

Kemp, S. (2022, February 15). Digital 2022: Indonesia. DataReportal. https://datareportal.com/reports/digital-2022-indonesia

Sudarsa, D., Kumar.P, S., & Jagajeevan Rao, L. (2018). Sentiment Analysis for Social Networks Using Machine Learning Techniques. International Journal of Engineering & Technology, 7(2.32), 473. https://doi.org/10.14419/ijet.v7i2.32.16271

Published

2023-06-15

How to Cite

Jefriyanto, J., Ainun, N., & Ardha, M. A. A. (2023). Application of Naïve Bayes Classification to Analyze Performance Using Stopwords. Journal of Information System, Technology and Engineering, 1(2), 49–53. https://doi.org/10.61487/jiste.v1i2.15

Section

Articles