
Tools

The increasing sophistication of AI and lower technical barriers to entry are making it possible to scale up disinformation campaigns. It’s now more important than ever to monitor new technologies with the potential to spread lies and half-truths. Disinfo Radar provides users with a range of analytical tools to discover the disinformation technologies of tomorrow at an early stage. 


Disinfo Factor Tornado

The tornado chart below shows how each technology-related topic ranks in terms of Cluster Size and Disinfo Score.

The values are sorted by Disinfo Score, and the graduated benchmark on the right helps you understand the level of risk associated with each specific cluster.
Please note that the scores are relative to the average, so a “lower” score does not mean that a technology poses no risk.


Disinfo Factor Dot Plot

The dot plot below shows, for each technology-related topic, its score on three factors:

Accessibility;
Content Generation;
Automation.


Disinfo Factor Scatterplot

These scatter plots map tech-related topics based on pairs of factors, enabling researchers to see which topics are outliers along specific dimensions (e.g., Automation vs. Content Generation).


Our methodology

Disinfo Radar aims to identify new disinformation technologies at an early stage by way of automated text-analysis tools. By auto-collecting and auto-analysing electronic preprint repositories (e.g., arXiv), industry papers (e.g., syncedreview.com), and policy publications (e.g., IEEE), Disinfo Radar scans the environment for indications of emerging technologies. Through a daily updated pipeline, it collects and processes texts, then subjects them to state-of-the-art machine learning models in order to identify technical innovations that could be abused for disinformation purposes.
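
To make the architecture concrete, here is a minimal, purely illustrative sketch of such a daily collect-and-analyse pipeline; the source list and function bodies are placeholders, not DRI’s actual code.

```python
# Purely illustrative skeleton of a daily collect-and-analyse run.
# SOURCES and both functions are placeholders, not DRI's actual code.

SOURCES = ["https://arxiv.org", "https://syncedreview.com"]

def collect(sources: list[str]) -> list[str]:
    # A real implementation would scrape each source for new publications.
    return ["Diffusion models make photorealistic image generation widely accessible."]

def analyse(texts: list[str]) -> None:
    # Downstream steps (span categorization, factor scoring, clustering)
    # are sketched individually in the paragraphs below.
    for text in texts:
        print("queued for analysis:", text)

if __name__ == "__main__":
    analyse(collect(SOURCES))
```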

Once texts have been collected via automated web-scraping, they are assessed using a pipeline of self-trained classifiers. First, each text is broken down into sentences. Each sentence is then scanned for mentions of technologies or technology-related terms, using an in-house Span Categorization model (similar to Named Entity Recognition models) trained on data from similar sources.
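
As an illustration of this step, the sketch below uses spaCy’s span-categorization API. The model name is an invented placeholder (the actual model is in-house) and is assumed to include a sentence splitter alongside the trained “spancat” component.

```python
import spacy

# "disinfo-radar-spancat" is an invented model name; the real pipeline is
# in-house. It is assumed to bundle a sentence splitter and a trained
# span-categorization ("spancat") component.
nlp = spacy.load("disinfo-radar-spancat")

doc = nlp(
    "We fine-tune GPT-3 for automated text generation. "
    "Diffusion models now produce photorealistic faces."
)

# Predicted spans live in doc.spans; "sc" is spaCy's default spancat key.
for sent in doc.sents:
    for span in doc.spans["sc"]:
        if sent.start <= span.start < sent.end:
            print(f"{sent.text!r} -> {span.text!r} ({span.label_})")
```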

Every mention of a tech-related term is then assigned a score for each of three disinformation-potential factors: Automation, Content Generation, and Accessibility. These scores are assigned by Relationship Extraction (i.e., sentence-level text classification) models trained on curated and synthetic data, with a separate model for each factor.
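
A toy stand-in for this step is sketched below, using simple scikit-learn text classifiers in place of the in-house Relationship Extraction models; the training sentences and labels are invented purely for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented training sentences: a toy stand-in for the curated and
# synthetic data behind the real per-factor models.
TRAIN = {
    "Automation": (
        ["runs fully unattended end to end", "requires a human in the loop"],
        [1, 0],
    ),
    "Content Generation": (
        ["generates realistic synthetic text", "only classifies existing posts"],
        [1, 0],
    ),
    "Accessibility": (
        ["free open-source release with a web demo", "needs a private GPU cluster"],
        [1, 0],
    ),
}

# One separate model per factor, as described above.
models = {
    factor: make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(X, y)
    for factor, (X, y) in TRAIN.items()
}

mention = "an open-source model that generates synthetic text unattended"
scores = {f: m.predict_proba([mention])[0, 1] for f, m in models.items()}
print(scores)  # one disinformation-potential score per factor
```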

Low-confidence mentions are filtered out, while the rest are aggregated across the entire dataset and normalized to single topics via Affinity Propagation clustering. For example, mentions of “neural networks”, “NN”, and “artificial neural nets” should all be normalized to a single tech-related topic, such as “Neural Networks”. Note that since this is unsupervised learning without manual adjustment, the clustering and topic labels can occasionally be sub-optimal.
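
The sketch below illustrates the idea with scikit-learn’s AffinityPropagation on a handful of invented surface forms. The character n-gram representation is an assumption, not necessarily what Disinfo Radar uses, and on such toy data the clusters may themselves be sub-optimal, echoing the caveat above.

```python
from sklearn.cluster import AffinityPropagation
from sklearn.feature_extraction.text import TfidfVectorizer

mentions = ["neural networks", "NN", "artificial neural nets",
            "deepfake video", "deep fakes"]

# Character n-gram TF-IDF is an assumed representation of each surface form.
X = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)).fit_transform(mentions)

ap = AffinityPropagation(random_state=0).fit(X.toarray())
for label in sorted(set(ap.labels_)):
    members = [m for m, l in zip(mentions, ap.labels_) if l == label]
    # The cluster exemplar can serve as the topic label; as noted above,
    # unsupervised clustering can produce sub-optimal groupings.
    exemplar = mentions[ap.cluster_centers_indices_[label]]
    print(exemplar, "<-", members)
```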

For every tech-related topic, average scores for each disinformation-potential factor are calculated based on all of its mentions across the dataset. These factors are combined using a weighted average to create a final Disinfo Score for that tech-related topic.
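
A minimal sketch of this aggregation step follows; the weights are invented for illustration, as the actual weighting is not specified.

```python
# Invented weights: the actual weighting of the factors is not published.
WEIGHTS = {"Automation": 0.3, "Content Generation": 0.4, "Accessibility": 0.3}

def disinfo_score(factor_averages: dict[str, float]) -> float:
    """Weighted average of a topic's per-factor average scores."""
    total = sum(WEIGHTS[f] * s for f, s in factor_averages.items())
    return total / sum(WEIGHTS.values())

print(disinfo_score({"Automation": 1.2, "Content Generation": 2.0, "Accessibility": 0.5}))
```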

The scores for each of the factors, as well as the final Disinfo Score, are z-scores, meaning that they range from around -3 to around +3. Each unit represents one standard deviation from the mean (0). For example, if “GPT-3” has an Accessibility score of 2.5, that means it is much more accessible than average (2.5 standard deviations above average, to be precise). Based on these scores, qualitative grades ranging from “Very High” to “Very Low” have also been defined for each factor and for the final Disinfo Score. Topics that rate high in disinformation potential are more likely to be noteworthy.
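
The z-score standardization and grade mapping could look roughly like the sketch below; the grade cut-offs are assumptions, since the actual thresholds behind the qualitative grades are not published.

```python
import statistics

def z_scores(values: list[float]) -> list[float]:
    """Standardize scores so that 0 is the mean and each unit is one std. dev."""
    mu, sigma = statistics.mean(values), statistics.stdev(values)
    return [(v - mu) / sigma for v in values]

def grade(z: float) -> str:
    # Assumed cut-offs; the real thresholds are not specified in the text.
    if z >= 1.5:
        return "Very High"
    if z >= 0.5:
        return "High"
    if z > -0.5:
        return "Medium"
    if z > -1.5:
        return "Low"
    return "Very Low"

raw = [0.2, 0.9, 0.5, 0.4, 0.7]  # invented raw Accessibility averages
for z in z_scores(raw):
    print(round(z, 2), grade(z))
```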

Identifying outliers in the previous steps assists DRI’s disinformation experts in their qualitative analysis. Using the registry results, they evaluate the identified technologies and determine each one’s threat potential by conducting additional desk research. When a technology is deemed a potential threat, meaning that it could be used to produce or amplify disinformation, DRI utilises the data obtained from the registry to inform relevant stakeholders.