# Awesome Arabic NLP

> A curated list of the best **tools, libraries, models, and datasets** for
> **Arabic Natural Language Processing (Arabic NLP)**.

This repository aims to gather the most important **Arabic NLP** resources in one place. It is a **community project**, and anyone can contribute via a Pull Request.

---

## Contents

* Frameworks & Libraries
* Named Entity Recognition (NER)
* Part-of-Speech Tagging (POS)
* Datasets
* Pre-trained Models
* Research Papers
* Learning Resources
* Contributing

---

# Frameworks & Libraries

### Python

* **CAMeL Tools**
  https://github.com/CAMeL-Lab/camel_tools
  An advanced toolkit for Arabic language processing (tokenization, morphology, NER, POS)

* **PyArabic**
  https://github.com/linuxscout/pyarabic
  A library for working with Arabic text

* **AraNLP**
  https://github.com/linuxscout/aranlp
  A collection of tools for Arabic language processing

* **Tashaphyne**
  https://github.com/linuxscout/tashaphyne
  A library for stemming and morphological analysis

---

# Named Entity Recognition (NER)

Recognizing named entities such as:

* Person
* Location
* Organization
* Date

### Tools

* **CAMeL Tools NER**
  https://github.com/CAMeL-Lab/camel_tools

* **Stanford Arabic NER**
  https://stanfordnlp.github.io/CoreNLP/

### Datasets

* **WikiANN Arabic**
  https://huggingface.co/datasets/wikiann

* **ARB-NER Dataset**
  https://alt.qcri.org/resources/arb-ner/

---

# Part-of-Speech Tagging (POS)

Identifying the part of speech of each word in a sentence.

Examples:

* NOUN
* VERB
* ADJ
* ADV
* PRON

### Tools

* **CAMeL Tools POS Tagger**
  https://github.com/CAMeL-Lab/camel_tools

* **Farasa POS Tagger**
  https://farasa.qcri.org/

* **MADAMIRA**
  https://github.com/ColumbiaNLP/madamira

* **Stanford POS Tagger**
  https://stanfordnlp.github.io/CoreNLP/

### Datasets

* **UD Arabic Treebank**
  https://universaldependencies.org/

---

# Datasets

* **SANAD Dataset**
  https://data.mendeley.com/datasets/57zpx667y9

* **Arabic Poetry Dataset**
  https://github.com/linuxscout/arabicpoetry

* **ArSAS Sentiment Dataset**
  https://homepages.inf.ed.ac.uk/wmagdy/ArSAS.htm

* **Arabic SQuAD**
  https://github.com/ppaudel/arabic-squad

---

# Pre-trained Models

* **AraBERT**
  https://huggingface.co/aubmindlab/bert-base-arabert

* **AraGPT2**
  https://huggingface.co/aubmindlab/aragpt2-base

* **AraELECTRA**
  https://huggingface.co/aubmindlab/araelectra-base

* **CAMeL BERT**
  https://huggingface.co/CAMeL-Lab

---

# Research Papers

* AraBERT: Transformer-based Model for Arabic Language Understanding
  https://arxiv.org/abs/2003.00104

* CAMeL Tools: An Open Source Python Toolkit for Arabic Natural Language Processing
  https://aclanthology.org/2020.lrec-1.868

* Farasa: A Fast and Accurate Arabic NLP Toolkit
  https://aclanthology.org/L16-1170

---

# Learning Resources

* Natural Language Processing for Arabic (Book)
* Arabic Computational Linguistics
* NLP with Python

---

# Contributing

The project is open to the community, and contributions of new tools or resources are welcome.

### Steps

1. Fork the repository
2. Add your resource
3. Follow the existing format
4. Submit a Pull Request

---