Comparison of Traditional and Ensemble Machine Learning in Classifying Hate Speech Sentences


  • Ridwan Duan
  • Riyan Latifahul Hasanah Universitas Nusa Mandiri, Jakarta
  • Eni Heni Hermaliani Universitas Nusa Mandiri, Jakarta



hate speech, Logistic Regression, ensemble, Adaptive Boosting


Hate speech is a criminal act directed at individuals or groups in the form of insults or slander related to race, religion, culture, and similar attributes. It is often conveyed through social media such as Twitter. To help curb the spread of hate speech, this study analyzes the classification of hate speech sentences using machine learning. The text is first pre-processed by removing punctuation, lowercasing, tokenizing, filtering, and stemming. Because the dataset has an imbalanced class distribution, SMOTE (Synthetic Minority Over-sampling Technique) is applied, followed by feature engineering with TF-IDF (Term Frequency-Inverse Document Frequency). Three traditional algorithms, Logistic Regression, Decision Tree, and Naïve Bayes, are then compared with two ensemble methods, Adaptive Boosting (AdaBoost) and Random Forest. Logistic Regression achieves the best accuracy, 91.40, outperforming the other algorithms.
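
To make the pipeline in the abstract more concrete, the following is a minimal sketch of the pre-processing, TF-IDF weighting, and SMOTE-style oversampling steps, written with only the Python standard library. It is not the authors' implementation: the example tweets, the stop-word list, the function names (preprocess, tfidf, smote), and the particular TF-IDF weighting variant are all assumptions made for illustration, and stemming is omitted. The sketch also vectorizes the text before oversampling, on the assumption that SMOTE needs numeric vectors to interpolate between.

```python
import math
import random
import string
from collections import Counter

# Hypothetical, tiny stop-word list used for the "filtering" step.
STOPWORDS = {"the", "a", "is", "are", "you", "all"}

def preprocess(text):
    """Remove punctuation, lowercase, tokenize, and filter stop words."""
    text = text.translate(str.maketrans("", "", string.punctuation)).lower()
    return [tok for tok in text.split() if tok not in STOPWORDS]

def tfidf(docs):
    """Return (vocabulary, vectors) weighted as tf * (ln(N / df) + 1)."""
    vocab = sorted({tok for doc in docs for tok in doc})
    df = Counter(tok for doc in docs for tok in set(doc))
    n = len(docs)
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([
            (counts[t] / len(doc)) * (math.log(n / df[t]) + 1.0) if doc else 0.0
            for t in vocab
        ])
    return vocab, vectors

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic vectors, each interpolated between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [v for j, v in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda v: math.dist(base, v))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (x - b) for b, x in zip(base, neighbour)])
    return synthetic

# Hypothetical toy data: label 1 = hate speech (minority), 0 = not hate speech.
tweets = [
    ("Have a great day, everyone!", 0),
    ("The weather is lovely today.", 0),
    ("Great match last night", 0),
    ("Lovely people at the market today", 0),
    ("I hate you all, go away!", 1),
    ("Go away, I hate that group", 1),
]

docs = [preprocess(text) for text, _ in tweets]
labels = [label for _, label in tweets]
vocab, X = tfidf(docs)

# Oversample the minority class until both classes have the same size.
minority = [x for x, y in zip(X, labels) if y == 1]
n_needed = labels.count(0) - len(minority)
new_points = smote(minority, n_needed, k=1)

X_balanced = X + new_points
y_balanced = labels + [1] * len(new_points)
```

The balanced vectors in X_balanced and labels in y_balanced could then be passed to any of the classifiers named in the abstract, such as Logistic Regression. In practice, oversampling is usually applied to the training split only, so that synthetic samples do not leak into the evaluation data.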


