Enhancing human review with machine learning
The deluge of fake news has been so overwhelming that the fact-checking site Snopes told its readers in March 2020 that it could not keep up due to resource constraints. The failure to control fake news, despite increased efforts, shows the need for better methods to detect it.
One common approach adopted by social media companies relies on humans (subscribers, contractors) to flag potentially false content. A major limitation of human review is that flagging is subject to personal bias; it is also impractical given the sheer volume of information circulating in today's digital media. Inevitably, artificial intelligence (AI) has come into focus in the fight against fake news.
A common AI-based hybrid approach builds on two classification models: content and social context. The content model analyzes the topic distribution within a news article. It is rarely used on its own, however, because relying on content alone makes it difficult to distinguish intentional deception from mere bias.
The content model is therefore often complemented by the social context model, which focuses on key aspects of the social network (e.g., followers, user characteristics, interaction and engagement history). The drawback of this hybrid approach is that the analysis remains confined to the news item and its immediate social context. This limited scope can make it difficult to understand the broader context of the news and, potentially, the type of fake news involved.
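To make the hybrid idea concrete, the sketch below combines a crude content score with a social-context score into one risk signal. Everything here is illustrative: the sensationalist word list, the account features, the weights, and the 0.5 threshold are assumptions for demonstration, not parameters from any published model.

```python
# Illustrative sketch of a hybrid fake-news classifier that combines a
# content score with a social-context score. All terms, weights, and the
# decision threshold are hypothetical.

SENSATIONAL_TERMS = {"shocking", "miracle", "exposed", "secret"}

def content_score(text: str) -> float:
    """Fraction of words that are sensationalist cues (a crude content proxy)."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(w.strip(".,!?") in SENSATIONAL_TERMS for w in words) / len(words)

def context_score(account: dict) -> float:
    """Risk signal derived from the sharing account's social context."""
    score = 0.0
    if account.get("followers", 0) < 100:
        score += 0.5  # low-reach account
    if account.get("account_age_days", 0) < 30:
        score += 0.5  # newly created account
    return score

def hybrid_score(text: str, account: dict,
                 w_content: float = 0.6, w_context: float = 0.4) -> float:
    """Weighted combination of the two models (weights are assumptions)."""
    return w_content * content_score(text) + w_context * context_score(account)

# A sensationalist headline shared by a new, low-follower account scores high.
flagged = hybrid_score("SHOCKING miracle cure EXPOSED by secret insider!",
                       {"followers": 12, "account_age_days": 3}) > 0.5
```

In practice each component would be a trained classifier rather than a hand-tuned heuristic, but the design point is the same: neither score alone separates deception from bias as well as their combination.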
Another approach, championed by MIT, focuses on news sources. Researchers from MIT's Computer Science and Artificial Intelligence Laboratory and the Qatar Computing Research Institute developed machine-learning models to assess the authenticity or neutrality of news sources. The drawback of this approach is that while certain facts in an article may be fabricated or embellished, the overall point of the article may still be authentic.
A new way of thinking
In examining the fake news problem, it may be worthwhile to learn from government counterespionage efforts. Open-source intelligence (OSINT) is a key element of government counterintelligence strategies. OSINT refers to any information that can be legally gathered from free, public sources. The information can be about an individual or an organization.
Given its reliance on freely available data, OSINT can be compromised if an individual or organization falsifies information in the public domain. However, this can be addressed by cross-checking relevant pieces of information about an individual to look for inconsistencies.
For example, one can start with an author's social media information to establish their work history, then use that work history to search the employer's website and court records to verify the author's online profile. Given the vast scope of data covered in OSINT, extensive cross-checking can be carried out to minimize the risk of false information.
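The cross-checking step described above can be sketched as a simple consistency comparison: an author's claimed profile is checked against independently sourced public records, and any contradiction is reported. The record fields and source names below are hypothetical examples, not a real OSINT schema.

```python
# Minimal sketch of OSINT-style cross-checking: compare an author's claimed
# profile against independent public records and report inconsistencies.
# Field and source names are illustrative assumptions.

def cross_check(claimed: dict, records: list) -> list:
    """Return descriptions of fields where any independent record contradicts the claim."""
    inconsistencies = []
    for field, claimed_value in claimed.items():
        for record in records:
            if field in record and record[field] != claimed_value:
                inconsistencies.append(
                    f"{field}: claimed {claimed_value!r}, "
                    f"{record['source']} shows {record[field]!r}")
    return inconsistencies

claimed_profile = {"employer": "Acme Media", "location": "Boston"}
public_records = [
    {"source": "employer_website", "employer": "Acme Media"},  # consistent
    {"source": "court_records", "location": "Miami"},          # contradicts claim
]
issues = cross_check(claimed_profile, public_records)
```

A real pipeline would gather the records automatically (scrapers, public APIs) and apply fuzzy matching rather than exact equality, but the core logic is this pairwise consistency check across sources.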
By bringing in additional data from OSINT, particularly data outside the news content itself, the scope of analysis can be greatly expanded, aided by machine learning and natural language processing technologies. The outcome is a more comprehensive set of insights for determining whether a piece of news is authentic and, if not, what type of fake news it represents.
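The widened scope can be pictured as a feature set that merges in-article signals with out-of-article OSINT signals before any downstream classifier is applied. All feature names here are illustrative assumptions, not a published schema.

```python
# Hypothetical sketch of widening a fake-news feature set with OSINT-derived
# signals alongside traditional content features. Feature names are
# illustrative, not from any real system.

def build_features(article: dict, osint: dict) -> dict:
    """Merge in-article signals with out-of-article OSINT signals."""
    text = article["text"]
    words = text.split()
    return {
        # content-only features (the traditional scope of analysis)
        "word_count": len(words),
        "exclamation_ratio": text.count("!") / max(len(words), 1),
        # OSINT features that extend the scope beyond the article itself
        "author_profile_verified": osint["profile_consistent"],
        "source_domain_age_days": osint["domain_age_days"],
        "cross_check_failures": osint["failed_checks"],
    }

features = build_features(
    {"text": "Miracle cure found!!!"},
    {"profile_consistent": False, "domain_age_days": 14, "failed_checks": 3},
)
```

A classifier trained on the merged features can weigh, for instance, a young source domain and failed cross-checks against the article text itself, which is precisely the broader context that content-only models lack.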
Using OSINT, an EY team has run a series of tests on known misinformation. The preliminary results point to encouraging opportunities for commercializing this approach in organizations' fight against fake news.