Unsupervised text feature selection approach based on improved Prairie dog algorithm for the text clustering
Mohammad Alshinwan, Abdul Ghafoor Memon, Mohamed Chahine Ghanem, Mohammed Almaayah
Published: 2025/03/25
Abstract
Text clustering divides large collections of text documents into distinct groups. Document size affects clustering performance and can reduce its effectiveness: text documents often contain sparse and uninformative features that degrade the efficiency of the clustering technique and increase the computational time required. Feature selection is a crucial strategy in unsupervised learning that chooses a subset of informative text features to improve clustering quality and decrease computing time. This work presents a novel approach based on an improved Prairie dog optimization algorithm to solve the text feature selection problem. K-means clustering is employed to assess the quality of the obtained feature subsets. The proposed algorithm is compared with other algorithms published in the literature, and the feature selection strategy ultimately helps the clustering algorithm produce more precise clusters.
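To make the wrapper-style evaluation loop described above concrete, the sketch below pairs a binary feature mask with K-means and a silhouette-based fitness. It is a minimal illustration under stated assumptions, not the paper's method: the random bit-flip search merely stands in for the improved Prairie dog optimizer's update rules, and the TF-IDF preprocessing, max_features cap, and silhouette fitness are illustrative choices rather than the authors' exact setup.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import silhouette_score


def subset_fitness(X, mask, k):
    # Cluster only the selected features with K-means and score the
    # resulting partition; a higher silhouette means better-separated clusters.
    if mask.sum() < 2:
        return -1.0  # too few features to cluster meaningfully
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X[:, mask])
    return silhouette_score(X[:, mask], labels)


def select_features(docs, k, n_iters=50, seed=0):
    # Wrapper-style unsupervised feature selection: a simple random bit-flip
    # search is used here as a placeholder for the improved Prairie dog
    # optimizer's position-update and greedy-selection steps.
    rng = np.random.default_rng(seed)
    X = TfidfVectorizer(max_features=500).fit_transform(docs).toarray()
    n_feats = X.shape[1]
    best_mask = rng.random(n_feats) < 0.5          # random initial feature subset
    best_fit = subset_fitness(X, best_mask, k)
    for _ in range(n_iters):
        cand = best_mask.copy()
        flips = rng.integers(0, n_feats, size=max(1, n_feats // 20))
        cand[flips] = ~cand[flips]                 # perturb the current subset
        fit = subset_fitness(X, cand, k)
        if fit > best_fit:                         # keep improving subsets only
            best_mask, best_fit = cand, fit
    return best_mask, best_fit


if __name__ == "__main__":
    docs = ["text clustering groups documents",
            "feature selection removes noisy terms",
            "prairie dog optimizer searches feature subsets",
            "k-means assigns cluster labels to documents"]
    mask, score = select_features(docs, k=2)
    print(mask.sum(), "features kept, silhouette =", round(score, 3))

In the full method, the improved Prairie dog algorithm's update rules would replace the random perturbation step, while the K-means-based fitness keeps the same role of judging how informative each candidate feature subset is.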
License
This article is licensed under CC BY 4.0.