A Conceptual Framework for a Semantic Hadith Retrieval System Using Modern NLP Techniques
DOI:
https://doi.org/10.46568/ihya.v25i1.204
Keywords:
Hadith Retrieval, Semantic Search, Natural Language Processing, Ontology-Based Categorization, Arabic NLP, Semantic Web, Islamic Scholarship
Abstract
In the contemporary digital era, Islamic scholarship is undergoing a technological transformation, particularly in the realm of textual access and interpretation. One of the most critical corpora in Islamic studies is the Hadith literature, comprising the sayings, actions, and approvals of the Prophet Muhammad (Peace Be Upon Him). Traditional keyword-based search systems have proven inadequate for retrieving Hadith content effectively because of the texts' linguistic complexity, semantic variation, and contextual depth. This paper proposes a conceptual framework for a Semantic Hadith Retrieval System using modern Natural Language Processing (NLP) techniques. By integrating ontology-based thematic categorization, deep language models, and semantic similarity algorithms, the proposed framework aims to overcome the limitations of surface-level keyword matching. The paper begins by exploring the unique challenges posed by Hadith texts, followed by a comprehensive literature review of prior digital Hadith retrieval efforts and semantic search models. It then presents the theoretical foundations of semantic processing, outlines the ontological modeling required for religious themes, and examines the technical components of NLP suitable for Arabic and Islamic contexts. Furthermore, a practical architectural framework is proposed, detailing the technology stack, implementation flow, and sample annotation practices. Ethical concerns, challenges of authenticity, and evaluation metrics are also discussed. This research does not involve the development of a functioning system; it lays out a blueprint for future implementation. The framework is designed to guide both developers and Islamic scholars toward building more intelligent, spiritually aware, and user-friendly Hadith retrieval platforms. The integration of semantic web technologies with Islamic knowledge promises not only to modernize access to religious texts but also to preserve their interpretive richness and contextual integrity.
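As a rough illustration of the gap between surface-level keyword matching and ontology-assisted retrieval described in the abstract, the following minimal Python sketch may help. It is not the framework proposed in the paper: the theme ontology, the placeholder documents (short topic summaries, not Hadith text), and the function names are all hypothetical, and a simple bag-of-words cosine similarity stands in for the deep language models the paper envisions.

```python
import math
from collections import Counter

# Hypothetical theme ontology: each theme maps to related surface terms.
ONTOLOGY = {
    "anger_management": {"anger", "temper", "rage", "patience"},
    "trade_ethics": {"trade", "sale", "buyer", "seller"},
}

# Placeholder topic summaries for illustration only (not actual Hadith text).
DOCS = {
    "doc_a": "controlling anger and showing patience",
    "doc_b": "honesty between buyer and seller in a sale",
    "doc_c": "kindness toward neighbours",
}


def tokenize(text):
    """Lowercase whitespace tokenization; real Arabic text needs proper segmentation."""
    return text.lower().split()


def expand(tokens):
    """Add every term of any theme that shares at least one term with the query."""
    expanded = list(tokens)
    for terms in ONTOLOGY.values():
        if any(t in terms for t in tokens):
            expanded.extend(sorted(terms - set(tokens)))
    return expanded


def cosine(a, b):
    """Cosine similarity between two token lists using term-count vectors."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(
        sum(v * v for v in vb.values())
    )
    return dot / norm if norm else 0.0


def search(query, use_ontology):
    """Return (score, doc_id) pairs with non-zero similarity, best first."""
    q = tokenize(query)
    if use_ontology:
        q = expand(q)
    scored = ((cosine(q, tokenize(text)), doc_id) for doc_id, text in DOCS.items())
    return sorted((pair for pair in scored if pair[0] > 0), reverse=True)


if __name__ == "__main__":
    print(search("temper", use_ontology=False))  # plain keyword overlap
    print(search("temper", use_ontology=True))   # ontology-expanded query
```

In this toy setting the query "temper" shares no word with any document, so plain matching returns nothing, whereas expanding the query through the hypothetical "anger_management" theme lets it reach the thematically related document. A real system would presumably replace the count vectors with embeddings from an Arabic language model and the hand-written dictionary with a curated ontology.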