Cx PhD student Yiqing Hua has been working on using natural language processing techniques to detect propaganda in news. As a participant in the 2nd Workshop on NLP for Internet Freedom (NLP4IF), Yiqing contributed to the workshop's shared task on news propaganda detection. The task gave participants guidelines for defining propaganda, along with a dataset of news articles that had been manually labeled as having propaganda-related features.
To automatically detect these features, Yiqing used a pre-trained BERT (Bidirectional Encoder Representations from Transformers) model, fine-tuned on the training dataset. She used 10-fold cross-validation to create an ensemble of models, an approach that limits overfitting and improves performance. The resulting model achieved an F1 score of 0.62 on the test dataset and placed third among the 25 participating groups at the workshop.
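As a rough illustration of the fold-based ensembling and F1 scoring described above, here is a minimal Python sketch. It is not the team's code. The per-word scoring function `train_stand_in` is a hypothetical stand-in for fine-tuning BERT, and averaging the fold models' probabilities is an assumption about how the ensemble was combined. All function names are invented for this example.

```python
import random
from collections import Counter


def kfold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def train_stand_in(sentences, labels):
    """Hypothetical stand-in for fine-tuning BERT.

    Learns, for each word, the smoothed share of training sentences
    containing it that were labeled propaganda (label 1).
    """
    pos, tot = Counter(), Counter()
    for sentence, label in zip(sentences, labels):
        for word in set(sentence.lower().split()):
            tot[word] += 1
            pos[word] += label
    prior = (sum(labels) + 1) / (len(labels) + 2)

    def predict_proba(sentence):
        known = [w for w in set(sentence.lower().split()) if w in tot]
        if not known:
            return prior  # no known words: fall back to the class prior
        return sum((pos[w] + 1) / (tot[w] + 2) for w in known) / len(known)

    return predict_proba


def train_ensemble(sentences, labels, k=10):
    """Train one model per fold, each on the other k-1 folds.

    In a real setup the held-out fold would typically be used for
    validation, for example early stopping or model selection.
    """
    models = []
    for fold in kfold_indices(len(sentences), k):
        held_out = set(fold)
        train_x = [s for i, s in enumerate(sentences) if i not in held_out]
        train_y = [y for i, y in enumerate(labels) if i not in held_out]
        models.append(train_stand_in(train_x, train_y))
    return models


def ensemble_predict(models, sentence, threshold=0.5):
    """Average the fold models' probabilities (assumed combination rule)."""
    avg = sum(model(sentence) for model in models) / len(models)
    return int(avg >= threshold)


def f1_score(y_true, y_pred):
    """F1 for the positive class: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Because each fold model sees a slightly different slice of the training data, averaging their outputs tends to smooth out the quirks any single model picks up, which is the intuition behind using the ensemble to limit overfitting.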
When investigating which features of language tended to indicate a 'propaganda' label, Yiqing and her team found signs of uncivil language and strong emotion. Words such as "devastating" or "cruel", along with phrases like "totally insane" and "utterly unacceptable", were likely to cause a sentence to be classified as propaganda. However, such language also appears in opinion pieces published by credible news sources, meaning that automated systems such as a BERT model may have difficulty distinguishing propaganda from legitimate opinion writing. The system also cannot distinguish between actual propaganda and quotations from propaganda, which leaves a challenge for future researchers.
The NLP4IF workshop was held at the Conference on Empirical Methods in Natural Language Processing and the International Joint Conference on Natural Language Processing in Hong Kong on November 3, 2019. You can find the paper for this project at this link.