Episode

Fake News Classification in Urdu: A Domain Adaptation Approach for a Low-Resource Language

Muhammad Zain Ali, Bernhard Pfahringer, Tony Smith

Dec 28, 2025 • 7:58

Computation and Language

Abstract

Misinformation on social media is a widely acknowledged issue, and researchers worldwide are actively engaged in its detection. However, low-resource languages such as Urdu have received limited attention in this domain. An obvious approach is to utilize a multilingual pretrained language model and fine-tune it for a downstream classification task, such as misinformation detection. However, these models struggle with domain-specific terms, leading to suboptimal performance. To address this, we investigate the effectiveness of domain adaptation before fine-tuning for fake news classification in Urdu, employing a staged training approach to optimize model generalization. We evaluate two widely used multilingual models, XLM-RoBERTa and mBERT, and apply domain-adaptive pretraining using a publicly available Urdu news corpus. Experiments on four publicly available Urdu fake news datasets show that domain-adapted XLM-R consistently outperforms its vanilla counterpart, while domain-adapted mBERT exhibits mixed results.
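The staged approach above has two steps. First, the multilingual encoder continues masked-language-model (MLM) pretraining on unlabeled Urdu news (domain adaptation). Second, it is fine-tuned on labeled data for fake news classification. The abstract does not state the masking recipe, so the minimal Python sketch below assumes the standard BERT/RoBERTa scheme: select 15% of non-special tokens, replace 80% of those with the mask token, replace 10% with a random token, and leave 10% unchanged. This is also the default behaviour of Hugging Face's `DataCollatorForLanguageModeling`. The function name `mask_tokens`, the token ids, and the mask id are illustrative assumptions, not the authors' code.

```python
import random
from typing import List, Optional, Set, Tuple

IGNORE = -100  # label value the loss skips (PyTorch CrossEntropyLoss default ignore_index)


def mask_tokens(
    token_ids: List[int],
    mask_id: int,
    vocab_size: int,
    special_ids: Set[int],
    mlm_probability: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[int]]:
    """Build (inputs, labels) for one MLM example, BERT-style 80/10/10 masking.

    Assumption: this mirrors a common MLM recipe; the paper's exact settings
    are not given in the abstract.
    """
    rng = rng if rng is not None else random.Random()
    inputs = list(token_ids)
    labels = [IGNORE] * len(token_ids)
    for i, tok in enumerate(token_ids):
        # Special tokens (e.g. <s>, </s>) are never selected for prediction.
        if tok in special_ids or rng.random() >= mlm_probability:
            continue
        labels[i] = tok  # the model must recover the original token here
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = mask_id  # 80%: replace with the mask token
        elif roll < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: random vocabulary token
        # remaining 10%: keep the original token unchanged
    return inputs, labels


if __name__ == "__main__":
    # Hypothetical ids; 0 and 2 stand in for <s> and </s>, and the mask id and
    # vocabulary size are illustrative values in the style of XLM-R's tokenizer.
    ids = [0, 523, 8812, 94, 3071, 77, 2]
    inp, lab = mask_tokens(ids, mask_id=250001, vocab_size=250002,
                           special_ids={0, 2}, rng=random.Random(0))
    print(inp)
    print(lab)
```

In the second stage, the adapted encoder would be fine-tuned with a classification head on the labeled fake news datasets. That stage uses ordinary supervised cross-entropy on the class labels and needs no masking.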

Cite This Paper

arXiv:2512.22778

Year: 2025

Category: cs.CL

APA

Ali, M. Z., Pfahringer, B., & Smith, T. (2025). Fake News Classification in Urdu: A Domain Adaptation Approach for a Low-Resource Language. arXiv preprint arXiv:2512.22778.

MLA

Ali, Muhammad Zain, et al. "Fake News Classification in Urdu: A Domain Adaptation Approach for a Low-Resource Language." arXiv preprint arXiv:2512.22778 (2025).