2026-03-08 23:29:17 +00:00

# Awesome Arabic NLP

> A curated list of the best **tools, libraries, models, and datasets** for
> **Arabic Natural Language Processing (Arabic NLP)**.

This repository aims to gather the most important **Arabic NLP** resources in one place. It is a **community project**, and anyone can contribute via a Pull Request.

---

## Contents

* Frameworks & Libraries
* Named Entity Recognition (NER)
* Part-of-Speech Tagging (POS)
* Datasets
* Pre-trained Models
* Research Papers
* Learning Resources
* Contributing

---

# Frameworks & Libraries

### Python

* **CAMeL Tools**
  https://github.com/CAMeL-Lab/camel_tools
  An advanced toolkit for Arabic language processing (tokenization, morphology, NER, POS)

* **PyArabic**
  https://github.com/linuxscout/pyarabic
  A library for working with Arabic text

* **AraNLP**
  https://github.com/linuxscout/aranlp
  Assorted tools for Arabic language processing

* **Tashaphyne**
  https://github.com/linuxscout/tashaphyne
  A library for stemming and morphological analysis
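
Working with Arabic text usually starts with normalization, for example removing diacritics (tashkeel) and the tatweel elongation character before matching or stemming. The block below is a minimal standard-library sketch of that idea, not the API of any library listed above. The helper name `strip_diacritics` and the chosen code-point range are assumptions made for illustration.

```python
import re

# Assumption: for a demo it is enough to cover the common marks
# U+064B (fathatan) through U+0652 (sukun) plus U+0670 (superscript alef).
_DIACRITICS = re.compile("[\u064B-\u0652\u0670]")
_TATWEEL = "\u0640"  # elongation character used for justification


def strip_diacritics(text: str) -> str:
    """Return text without short-vowel marks, shadda, sukun, or tatweel."""
    return _DIACRITICS.sub("", text).replace(_TATWEEL, "")


print(strip_diacritics("العَرَبِيَّة"))  # العربية
print(strip_diacritics("عـــربي"))      # عربي
```

The libraries above provide more complete versions of this kind of cleanup; check each project's documentation for the actual function names.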

---

# Named Entity Recognition (NER)

Recognizing named entities such as:

* Person
* Location
* Organization
* Date
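
NER output is often delivered as one tag per token. The sketch below assumes the common BIO convention (`B-PER`, `I-PER`, `O`, and so on), which this list does not itself specify, and groups tagged tokens into entity spans. The function name `bio_to_spans` and the sample sentence are hypothetical.

```python
def bio_to_spans(tokens, tags):
    """Group BIO-tagged tokens into (entity_type, text) pairs."""
    spans = []
    current_type, current_tokens = None, []

    def flush():
        # Close the entity being built, if there is one.
        if current_tokens:
            spans.append((current_type, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current_type
        )
        if starts_new:
            # A stray I- tag with a different type is treated like B-.
            flush()
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-"):
            current_tokens.append(token)
        else:  # "O" or anything unrecognized ends the current entity
            flush()
            current_type, current_tokens = None, []
    flush()
    return spans


tokens = ["زار", "محمد", "صلاح", "مدينة", "القاهرة"]
tags = ["O", "B-PER", "I-PER", "O", "B-LOC"]
print(bio_to_spans(tokens, tags))
```

For the sample sentence this yields one `PER` span covering two tokens and one single-token `LOC` span.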

### Tools

* **CAMeL Tools NER**
  https://github.com/CAMeL-Lab/camel_tools

* **Stanford Arabic NER**
  https://stanfordnlp.github.io/CoreNLP/

### Datasets

* **WikiANN Arabic**
  https://huggingface.co/datasets/wikiann

* **ARB-NER Dataset**
  https://alt.qcri.org/resources/arb-ner/

---

# Part-of-Speech Tagging (POS)

Identifying the grammatical category of each word in a sentence.

Example tags:

* NOUN
* VERB
* ADJ
* ADV
* PRON

### Tools

* **CAMeL Tools POS Tagger**
  https://github.com/CAMeL-Lab/camel_tools

* **Farasa POS Tagger**
  https://farasa.qcri.org/

* **MADAMIRA**
  https://github.com/ColumbiaNLP/madamira

* **Stanford POS Tagger**
  https://stanfordnlp.github.io/CoreNLP/

### Datasets

* **UD Arabic Treebank**
  https://universaldependencies.org/
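
Universal Dependencies treebanks are distributed in the CoNLL-U format: one token per line, ten tab-separated columns, with the universal POS tag in the fourth column. The sketch below counts those tags using only the standard library. The function name `upos_counts` is hypothetical, and the sample sentence is made up for illustration rather than taken from any treebank.

```python
from collections import Counter


def upos_counts(conllu_text):
    """Count UPOS tags (4th column) over the word lines of a CoNLL-U string."""
    counts = Counter()
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # blank lines separate sentences, '#' lines are comments
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed token line
        token_id, upos = cols[0], cols[3]
        if "-" in token_id or "." in token_id:
            continue  # skip multiword-token ranges (3-4) and empty nodes (3.1)
        counts[upos] += 1
    return counts


def row(token_id, form, upos):
    # Fill the columns this demo does not need with the '_' placeholder.
    return "\t".join([token_id, form, "_", upos, "_", "_", "_", "_", "_", "_"])


sample = "\n".join([
    "# text = قرأ الولد وكتب",
    row("1", "قرأ", "VERB"),
    row("2", "الولد", "NOUN"),
    row("3-4", "وكتب", "_"),  # surface token that splits into two words
    row("3", "و", "CCONJ"),
    row("4", "كتب", "VERB"),
])
print(upos_counts(sample))
```

Skipping the range line matters for Arabic, where clitics attached to a word are annotated as separate syntactic words.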

---

# Datasets

* **SANAD Dataset**
  https://data.mendeley.com/datasets/57zpx667y9

* **Arabic Poetry Dataset**
  https://github.com/linuxscout/arabicpoetry

* **ArSAS Sentiment Dataset**
  https://homepages.inf.ed.ac.uk/wmagdy/ArSAS.htm

* **Arabic SQuAD**
  https://github.com/ppaudel/arabic-squad

---

# Pre-trained Models

* **AraBERT**
  https://huggingface.co/aubmindlab/bert-base-arabert

* **AraGPT2**
  https://huggingface.co/aubmindlab/aragpt2-base

* **AraELECTRA**
  https://huggingface.co/aubmindlab/araelectra-base

* **CAMeL BERT**
  https://huggingface.co/CAMeL-Lab

---

# Research Papers

* AraBERT: Transformer-based Model for Arabic Language Understanding
  https://arxiv.org/abs/2003.00104

* CAMeL Tools: An Open Source Python Toolkit for Arabic Natural Language Processing
  https://aclanthology.org/2020.lrec-1.868

* Farasa: A New Fast and Accurate Arabic Word Segmenter
  https://aclanthology.org/L16-1170

---

# Learning Resources

* Natural Language Processing for Arabic (Book)

* Arabic Computational Linguistics

* NLP with Python

---

# Contributing

The project is open to the community, and we welcome additions of new tools or resources.

### Steps

1. Fork the repository
2. Add your resource
3. Follow the existing format
4. Submit a Pull Request

Example:

```
* **Tool Name**
  https://github.com/example/project
  Short description
```
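
Before opening a Pull Request, a contributor could check a new entry against the format shown above. The following is a minimal sketch of such a check, not a script that ships with this repository. The function name `check_entry` and the exact rules (bold name on the first line, a URL indented by two spaces on the second, indented description lines after that) are assumptions inferred from the example entry.

```python
import re

# First line: a list bullet followed by a bold name, e.g. "* **Tool Name**".
ENTRY_HEAD = re.compile(r"^[*-] \*\*[^*]+\*\*$")
# Second line: a URL indented by exactly two spaces.
URL_LINE = re.compile(r"^  https?://\S+$")


def check_entry(entry):
    """Return a list of format problems; an empty list means the entry looks fine."""
    lines = entry.rstrip("\n").split("\n")
    problems = []
    if not ENTRY_HEAD.match(lines[0]):
        problems.append("first line should look like '* **Tool Name**'")
    if len(lines) < 2 or not URL_LINE.match(lines[1]):
        problems.append("second line should be a URL indented by two spaces")
    for extra in lines[2:]:
        if not extra.startswith("  ") or not extra.strip():
            problems.append("description lines should be indented by two spaces")
            break
    return problems


good = "* **Tool Name**\n  https://github.com/example/project\n  Short description"
bad = "Tool Name\nhttps://github.com/example/project"
print(check_entry(good))
print(check_entry(bad))
```

The well-formed entry produces no problems, while the second one is flagged for both its first line and its unindented URL.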

---
