2026-03-08 23:29:17 +00:00

# Awesome Arabic NLP
> A curated list of the best **tools, libraries, models, and datasets** for
> **Arabic Natural Language Processing (Arabic NLP)**.

This repository aims to gather the most important **Arabic NLP** resources in one place. It is a **community project**: anyone can contribute via a Pull Request.
---
## Contents
* Frameworks & Libraries
* Named Entity Recognition (NER)
* Part-of-Speech Tagging (POS)
* Datasets
* Pre-trained Models
* Research Papers
* Learning Resources
* Contributing
---
# Frameworks & Libraries
### Python
* **CAMeL Tools**
https://github.com/CAMeL-Lab/camel_tools
A toolkit for Arabic NLP covering tokenization, morphological analysis, NER, and POS tagging
* **PyArabic**
https://github.com/linuxscout/pyarabic
A library for working with Arabic text
* **AraNLP**
https://github.com/linuxscout/aranlp
Assorted tools for Arabic language processing
* **Tashaphyne**
https://github.com/linuxscout/tashaphyne
A library for stemming and morphological analysis
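
A common first step when working with Arabic text is stripping diacritics (tashkeel) and the tatweel elongation character before further processing. The following is a minimal standard-library sketch of that idea; the function names are hypothetical and are not the API of any library listed here, which offer far more complete normalization.

```python
import re

# Assumption: we only target the core tashkeel marks U+064B..U+0652
# (tanwin, fatha, damma, kasra, shadda, sukun) and superscript alef U+0670.
_DIACRITICS = re.compile("[\u064B-\u0652\u0670]")

# Tatweel (kashida) is a purely typographic elongation character.
_TATWEEL = "\u0640"


def strip_diacritics(text: str) -> str:
    """Remove Arabic short-vowel and related marks from text."""
    return _DIACRITICS.sub("", text)


def strip_tatweel(text: str) -> str:
    """Remove the tatweel elongation character from text."""
    return text.replace(_TATWEEL, "")


def normalize(text: str) -> str:
    """Apply both steps; non-Arabic characters pass through unchanged."""
    return strip_tatweel(strip_diacritics(text))


if __name__ == "__main__":
    # A fully vocalized word followed by a word stretched with tatweel.
    print(normalize("\u0643\u064E\u062A\u064E\u0628\u064E \u0643\u0640\u0640\u062A\u0628"))
```

Whether to remove diacritics at all depends on the task: they carry information that matters for diacritization or text-to-speech, so treat this as one possible preprocessing choice.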
---
# Named Entity Recognition (NER)
Recognition of named entities such as:
* Person
* Location
* Organization
* Date
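
NER systems often emit one tag per token in a BIO scheme (`B-` begins an entity, `I-` continues it, `O` is outside any entity). The sketch below shows one way to turn such tags into entity spans. The BIO scheme, the short labels `PER`/`ORG`/`LOC`, and the function name are assumptions for illustration; the tools listed in this section may use different label sets and output formats.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_text, entity_type) pairs."""
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None

    def flush() -> None:
        # Close the entity being built, if there is one.
        nonlocal current_tokens, current_type
        if current_tokens and current_type is not None:
            spans.append((" ".join(current_tokens), current_type))
        current_tokens, current_type = [], None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)
        else:
            # "O", or an "I-" tag that does not continue the open entity:
            # this sketch closes the open entity and drops the token.
            flush()
    flush()
    return spans


if __name__ == "__main__":
    tokens = ["Ahmed", "works", "at", "Al", "Jazeera", "in", "Doha"]
    tags = ["B-PER", "O", "O", "B-ORG", "I-ORG", "O", "B-LOC"]
    print(bio_to_spans(tokens, tags))
```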
### Tools
* **CAMeL Tools NER**
https://github.com/CAMeL-Lab/camel_tools
* **Stanford Arabic NER**
https://stanfordnlp.github.io/CoreNLP/
### Datasets
* **WikiANN Arabic**
https://huggingface.co/datasets/wikiann
* **ARB-NER Dataset**
https://alt.qcri.org/resources/arb-ner/
---
# Part-of-Speech Tagging (POS)
Assigning each word in a sentence its grammatical category (part of speech).
Example tags:
* NOUN
* VERB
* ADJ
* ADV
* PRON
### Tools
* **CAMeL Tools POS Tagger**
https://github.com/CAMeL-Lab/camel_tools
* **Farasa POS Tagger**
https://farasa.qcri.org/
* **MADAMIRA**
https://github.com/ColumbiaNLP/madamira
* **Stanford POS Tagger**
https://stanfordnlp.github.io/CoreNLP/
### Datasets
* **UD Arabic Treebank**
https://universaldependencies.org/
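
Universal Dependencies treebanks are distributed in the CoNLL-U format: each token line has ten tab-separated columns, with the word form in the second column and the universal POS tag (UPOS) in the fourth. The following minimal sketch reads (form, UPOS) pairs from CoNLL-U text using only the standard library. The function name and the two-token sample are made up for illustration and are not taken from any treebank.

```python
from typing import List, Tuple


def read_upos(conllu_text: str) -> List[List[Tuple[str, str]]]:
    """Return one list of (form, upos) pairs per sentence."""
    sentences: List[List[Tuple[str, str]]] = []
    current: List[Tuple[str, str]] = []
    for line in conllu_text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment or metadata
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # skip malformed lines in this sketch
        token_id = cols[0]
        if "-" in token_id or "." in token_id:
            continue  # multiword-token ranges (e.g. 1-2) and empty nodes (e.g. 1.1)
        current.append((cols[1], cols[3]))
    if current:
        sentences.append(current)
    return sentences


if __name__ == "__main__":
    # Hypothetical sample sentence built row by row to keep the tabs explicit.
    rows = [
        ["1", "ذهب", "ذهب", "VERB", "_", "_", "0", "root", "_", "_"],
        ["2", "الولد", "ولد", "NOUN", "_", "_", "1", "nsubj", "_", "_"],
    ]
    sample = "# sent_id = demo-1\n" + "\n".join("\t".join(r) for r in rows) + "\n"
    for sentence in read_upos(sample):
        print(sentence)
```

Skipping multiword-token range lines matters for Arabic in particular, where clitics are commonly split into separate syntactic words.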
---
# Datasets
* **SANAD Dataset**
https://data.mendeley.com/datasets/57zpx667y9
* **Arabic Poetry Dataset**
https://github.com/linuxscout/arabicpoetry
* **ArSAS Sentiment Dataset**
https://homepages.inf.ed.ac.uk/wmagdy/ArSAS.htm
* **Arabic SQuAD**
https://github.com/ppaudel/arabic-squad
---
# Pre-trained Models
* **AraBERT**
https://huggingface.co/aubmindlab/bert-base-arabert
* **AraGPT2**
https://huggingface.co/aubmindlab/aragpt2-base
* **AraELECTRA**
https://huggingface.co/aubmindlab/araelectra-base
* **CAMeL BERT**
https://huggingface.co/CAMeL-Lab
---
# Research Papers
* AraBERT: Transformer-based Model for Arabic Language Understanding
https://arxiv.org/abs/2003.00104
* CAMeL Tools: An Open Source Python Toolkit for Arabic Natural Language Processing
https://aclanthology.org/2020.lrec-1.868
* Farasa: A New Fast and Accurate Arabic Word Segmenter
https://aclanthology.org/L16-1170
---
# Learning Resources
* Natural Language Processing for Arabic (Book)
* Arabic Computational Linguistics
* NLP with Python
---
# Contributing
The project is open to the community, and we welcome additions of new tools or resources.
### Steps
1. Fork the repository
2. Add your resource
3. Follow the existing format
4. Submit a Pull Request
Example:
```
* **Tool Name**
https://github.com/example/project
Short description
```
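
The entry format shown above can be checked mechanically before opening a Pull Request. The script below is a hypothetical helper and not part of this repository; it assumes an entry is a bolded name on a bullet line, then an http(s) URL, then an optional one-line description.

```python
import re
from typing import List

# Assumption: a bullet ("*" or "-") followed by a bolded tool name.
NAME_RE = re.compile(r"^[*-] \*\*(.+)\*\*$")
# Assumption: a bare http(s) URL with no surrounding text.
URL_RE = re.compile(r"^https?://\S+$")


def check_entry(entry: str) -> List[str]:
    """Return a list of problems found in one entry (empty if it looks fine)."""
    lines = [line.strip() for line in entry.strip().splitlines()]
    problems: List[str] = []
    if not lines or not NAME_RE.match(lines[0]):
        problems.append("first line should look like '* **Tool Name**'")
    if len(lines) < 2 or not URL_RE.match(lines[1]):
        problems.append("second line should be an http(s) URL")
    if len(lines) > 3:
        problems.append("description should fit on a single line")
    return problems


if __name__ == "__main__":
    entry = "* **Tool Name**\nhttps://github.com/example/project\nShort description"
    print(check_entry(entry) or "entry looks fine")
```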
---