How Data Science and Machine Learning can Combat Conspiracy Theories

It's no revelation that conspiracy theories spread like wildfire on the internet, creating so much misinformation that it can be challenging for people to sift what's real from what isn't.

In this month's episode of AI Right?, our hosts Kris McFadyen, Megan Stamper and Andy McMahon tackle internet conspiracy theories, looking at new research that could isolate content on Twitter's platform and flag it as misinformation.

We've seen a similar approach recently on the Facebook-owned platform Instagram, where mentions of Covid-19 are flagged with a banner that directs users to the NHS website.

The research was conducted by the Los Alamos National Laboratory in the USA, which Machine Learning Engineer Andy McMahon describes as an "extremely reputable laboratory that's very famous across all sorts of branches of science." But, he says, the more he dived into it, the more it appeared to be a hype train going off the rails (the study, which he says ticks the buzzword bingo box, made it into the mainstream media).

"The study looks at 1.8 million tweets, which to me doesn't seem a lot, really, and then does some regular expressions and basic searching to look for pattern matching e.g. looking for words like Covid-19 and Bill Gates, that's the conspiracy theories they were looking at." Andy says. "So, there were four conspiracy theories and one was about Bill Gates and the other three were related things to do with Covid, vaccines and where and how Covid originated.

"They then get that 1.8 million tweets down to a small subset aligned to the conspiracy theories and then train a random forest classifier on it, saying given these tweets which class of conspiracy theory is this in?"

How Much is Your Data Worth? 

Gener8, a startup that offers people rewards for their data, recently caused a stir online and on the UK's Dragons' Den with the proposition that people can earn rewards in exchange for their personal data being collected by companies through cookies and other means.

On last month's episode, AI Right? talked about the value of data and how it's easy for companies to short-change users, because users don't necessarily understand the value of their data and what they get in exchange for it.

"Data has been described recently as the up and coming world's most valuable resource," says host Kris McFadyen, "so, what I want to understand is how can companies measure the value of the consumers data and, as consumers, how can we get rewarded for our data?" 

Co-host and data scientist Megan Stamper says, "At the BBC, most of the framing around the financial aspect of data and data science is how much it costs to provide the service. I think about it from the angle of, 'what is the cost to the consumer to provide the data?' and 'what is the value to us?'

"It's not something I've explored in-depth and I haven't tried to put a number on the user's data in the way I work with it. It would be interested, Andy, if you've ever done this."

"Yeah, I've come up against this a few times," Andy says, "it's always around budgets and justifying fancy Machine Learning models. It's a recurring question in every company I've worked for. How much value is added here and what's the ROI? Because the data is a reusable asset...there are interesting economic points that make data different, for instance there's zero cost for replication."

Stream The Full Episode

AI Right? is streaming on all major podcast platforms, including Spotify, Google and Apple. You can also stream or download the episode below.