A Brief Overview of Arabic NLP
- Prof. Sakhar B. Alkhereyf
B5 L5 R5209
Abstract
As one of the six official languages of the United Nations and the sixth most commonly spoken language in the world, Arabic holds a significant place in the linguistic landscape. The diglossic nature of Arabic, in which Classical Arabic (CA), Modern Standard Arabic (MSA), and various dialects coexist, combined with its complex morphological structure, presents a unique set of challenges and opportunities for Natural Language Processing (NLP). This talk presents the landscape of Arabic NLP, outlining its core challenges and surveying previous work in the field.
The talk begins by exploring the fundamental morphological and dialectal challenges that Arabic poses to NLP, presenting the varieties of Arabic dialects and how they differ at each linguistic level.
A substantial part of the discussion will be dedicated to examining language resources for Arabic NLP. The discussion then turns to the challenges that Arabic presents to Large Language Models (LLMs), with a specific focus on evaluating how models such as ChatGPT and Bard handle Arabic language tasks.
Finally, we will present our recent research at the King Abdulaziz City for Science and Technology (KACST) and its contributions to the Arabic NLP field, emphasizing our efforts in corpus construction and evaluation.
Brief Biography
Dr. Sakhar Alkhereyf is an AI consultant and an expert in Arabic NLP. He is currently an Assistant Research Professor in the Artificial Intelligence and Robotics Institute at KACST, where he also heads the Human Language Technologies (HLT) research group. He received his Ph.D. in computer science from Columbia University in 2021. His research focuses mainly on Arabic NLP, including building language resources, text classification, conversational agents, and information extraction.