Fake news and methods to identify it via machine learning
Introduction
Fake news is defined as "false stories that appear to be news, spread on the internet or using other media, usually created to influence political views or as a joke" (Cambridge University Press, 2022). Originating as far back as the Roman era, it has existed for over two millennia (BBC, 2022). However, powered by the growth of the internet, it has gained significant prominence.
Its influence on the public is a cause for concern as it "promotes toxic narratives, spreads doubt and confusion, and increases social polarisation, affecting democratic decision-making" (Civica, 2022). It can also be difficult to identify because of cognitive biases, the mental shortcuts people rely on when processing information. According to the Centre for Information Technology & Society (2022), four aspects of cognitive bias are in effect in how we use information:
First, we tend to focus on headlines and tags without reading the article they’re associated with. Second, social media’s popularity signals affect our attention to and acceptance of information. Third, fake news takes advantage of partisanship, a very strong reflex. And fourth, persistence--there’s a weird tendency for false information to stick around, even after it’s corrected.
The issue of fake news made headlines during the 2016 US election (Blake, 2018). However, due to its power to manipulate audiences and its subsequent profitability, as well as the ethics surrounding free speech, fake news remains prominent. For instance, Donald Trump's use of fake news to benefit himself and strengthen his position is widely recorded (Rattner, 2021). In such cases, the prospect of securing government-led condemnation of the issue may be limited.
This is not to say that efforts have not been made. For instance, organisations such as FactChecker and Birdwatch have strived to debunk such news (Lorenz et al., 2022). They work by identifying and highlighting misinformation.
This approach is one of four highlighted in a report by Lazer et al. (2017) as methods to combat fake news:
(1) offering feedback to users that particular news may be fake (which seems to depress overall sharing from those individuals); (2) providing ideologically compatible sources that confirm that particular news is fake; (3) detecting information that is being promoted by bots and “cyborg” accounts and tuning algorithms to not respond to those manipulations; and (4) because a few sources may be the origin of most fake news, identifying those sources and reducing promotion (by the platforms) of information from those sources.
Combatting Fake News
Both the methods highlighted above and the approaches used by groups such as FactChecker require that fake news, or suspected fake news, is first identified. A key tool for such identification can be machine learning/artificial intelligence.
There are two main methods for this. Unsupervised learning may be able to identify discussion groups where an echo chamber is forming. Likewise, a supervised learning model akin to those used for spam filtering could be used to identify suspected fake news based on a pre-existing dataset. The latter can be explored using a fake news dataset on Kaggle.
Supervised Learning for detecting fake news
The Kaggle dataset is split into two parts: fake news and true news. There are approx. 9.6% more fake news articles than true news articles, with 23481 fake news entries (52.3% of the dataset) and 21417 true news articles (47.7%).
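The class balance can be checked with a few lines of arithmetic on the two counts quoted above. This is only an illustrative sketch; the variable names are not taken from the original project.

```python
# Article counts as quoted in the text for the Kaggle dataset.
fake_count = 23481
true_count = 21417
total = fake_count + true_count

# How many more fake articles there are, relative to the true articles.
relative_excess = (fake_count - true_count) / true_count

# Share of the whole dataset taken up by each class.
fake_share = fake_count / total
true_share = true_count / total

print(f"total articles: {total}")
print(f"fake vs true:   {relative_excess:.1%} more fake articles")
print(f"fake share:     {fake_share:.1%}")
print(f"true share:     {true_share:.1%}")
```

The two classes are close enough in size that accuracy is not badly distorted by imbalance, although reporting an F1-score alongside it remains good practice.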
Using a portion of the dataset, a model can be created using a matrix of TF-IDF features and linear support vector classification. This model performs extremely well, with an F1-score of 0.99 and an accuracy of 0.99.
In other words, this model can be used to assess whether new articles are likely to be fake. However, there is one major issue: the model relies on the data it has been trained on. If fake news focused on a new topic emerges, its performance is likely to decrease.
It should be noted that this project utilises spaCy and scikit-learn (sklearn). However, alternative models such as BERT, which uses deep learning, are also available (Paialunga, 2021).
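A minimal sketch of such a model is shown below, assuming scikit-learn's TfidfVectorizer and LinearSVC chained in a pipeline. The toy headlines, the labels "fake" and "true", and the helper names build_model and train_and_score are hypothetical stand-ins rather than the project's actual code, and any spaCy preprocessing is omitted.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def build_model():
    """TF-IDF features feeding a linear support vector classifier."""
    return make_pipeline(TfidfVectorizer(stop_words="english"), LinearSVC())


def train_and_score(texts, labels, test_size=0.25, seed=0):
    """Fit on one portion of the data and score on the held-out rest."""
    x_train, x_test, y_train, y_test = train_test_split(
        texts, labels, test_size=test_size, stratify=labels, random_state=seed
    )
    model = build_model().fit(x_train, y_train)
    predicted = model.predict(x_test)
    scores = {
        "f1": f1_score(y_test, predicted, pos_label="fake"),
        "accuracy": accuracy_score(y_test, predicted),
    }
    return model, scores


# Hypothetical toy corpus standing in for the Kaggle articles.
toy_texts = [
    "shocking secret cure revealed by anonymous insider",
    "anonymous insider reveals shocking secret plot",
    "secret cure hidden from the public, insider claims",
    "shocking plot hidden from the public revealed",
    "council approves annual budget after committee review",
    "committee publishes annual report on transport spending",
    "budget report shows transport spending increased",
    "council review confirms spending figures in annual report",
]
toy_labels = ["fake"] * 4 + ["true"] * 4

# Fit on everything and label the same texts, just to show the API.
full_model = build_model().fit(toy_texts, toy_labels)
print(list(full_model.predict(toy_texts)))

# Hold-out scoring; on eight toy sentences these numbers mean very little.
_, toy_scores = train_and_score(toy_texts, toy_labels)
print(toy_scores)
```

With the real dataset, the same train_and_score call would be given the article texts and their labels. Scores from the toy corpus are not comparable to the figures reported for the full dataset.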
Using AI to identify fake news
Fake news covers a broad array of topics, from vaccinations to politics and gender equality. These topics can be identified through unsupervised learning methods. In this case, assuming that six different topics exist, key phrases can be singled out from similar articles.
This process indicates that the major topics include:
Trump supporters
US politics
Media
US election
International politics
Education
This process can indicate where active investigations should take place to curb the spread of fake news. However, it will not be able to assist in identifying new areas of focus unless an up-to-date catalogue of articles is routinely provided.
Further Concerns
One major solution that individuals revert to is the notion that, as the issue of fake news grows, individuals will become more adept at identifying it and acting accordingly. However, this entails training individuals to go against their default cognitive biases. In addition, awareness alone may not be enough: whilst individuals may be aware that Photoshop is frequently used on images, the impact of such images on mental health remains profound (Harvard, 2020). Moreover, educating a large population who are no longer in education can also be challenging.
Likewise, the issues discussed here concern only information that is in the public domain. They do not cover the growing concerns surrounding echo chambers and private groups where fake news is able to circulate freely (DW Documentary, 2022).
Conclusion
Fake news is a major issue affecting society. This is unlikely to change in the near future, especially as technologies such as deepfakes continue to develop (Schwartz, 2018). Consequently, there is a strong need for robust methods to combat such information. To apply them, possible fake news must first be identified so that it can be investigated and appropriate action taken. For this, tools such as the machine learning methods highlighted here may serve a vital role.
“A lie can travel half way around the world while the truth is putting on its shoes.” ― Mark Twain