Penn State Researchers Help Create Technology to Promote World Peace

Understanding the nature of international conflicts is a key component in stabilizing tensions between nation states. Researchers at Penn State and the University of Texas at Dallas are creating technology that analyzes large volumes of newswire articles to help scholars address issues of global importance.

“Our goal is to curate a database of news articles that detail militarized interstate disputes,” said David Reitter, an assistant professor at Penn State’s College of Information Sciences and Technology (IST).

Reitter; Glenn Palmer, a professor of political science at Penn State; and Vito D’Orazio, an assistant professor in the School of Economic, Political and Policy Sciences at the University of Texas at Dallas, were recently awarded a National Science Foundation (NSF) grant of about $1 million to support their project, “Updating the Militarized Dispute Data Through Crowdsourcing.” The goal of the project is to create a digital catalog of militarized incidents between nation states across the world, covering a period of several years.

“This will help others understand where and why conflicts arise, what the trends are and, ultimately, how we can counteract militarized tensions,” Reitter said.

How The Database Was Created

To build their database, the researchers are using machine learning: software algorithms that learn about the world from data. The Correlates of War Project’s Militarized Interstate Dispute (MID) data set is the most prominent and heavily used collection in the study of international conflict and is curated at Penn State by Glenn Palmer. The most recent version (MID4) was released in 2014 and covers the years 1816-2010.

Over the course of the MID project, experts coded the news documents by hand, a costly and painstaking process. To address the problem, the researchers recently completed a pilot project to determine whether crowdsourcing techniques could be used to code the news stories. In the pilot, non-expert workers were paid small sums to read documents and answer sets of questions. The answers to these questions were used to identify features of possible militarized incidents. The Penn State software combined and corrected the crowd workers’ answers. A systematic comparison of the crowdsourced responses with those of the MID4 Project’s experts revealed that the crowdsourced codings were completely accurate for many of the news reports.

“As a result, we are now able to automatically document militarized incidents in near real-time and cost-effectively,” Reitter said.
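For readers curious how crowd answers might be combined and then checked against expert coding, here is a minimal Python sketch. It is an illustration only: the article does not describe how the Penn State software combines or corrects answers, so the simple majority vote used here is an assumption, and the function names, document IDs and answers are all hypothetical.

```python
from collections import Counter


def aggregate_answers(answers):
    """Return the most common answer and the share of workers who gave it.

    Assumption: a plain majority vote. Ties go to the answer seen first.
    """
    counts = Counter(answers)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(answers)


def agreement_with_experts(crowd_labels, expert_labels):
    """Fraction of documents where the crowd label matches the expert label."""
    shared = crowd_labels.keys() & expert_labels.keys()
    if not shared:
        return 0.0
    matches = sum(crowd_labels[doc] == expert_labels[doc] for doc in shared)
    return matches / len(shared)


# Hypothetical worker answers to one question per news document,
# e.g. "Does this story describe a militarized incident?"
crowd_answers = {
    "doc-1": ["yes", "yes", "no"],
    "doc-2": ["no", "no", "no"],
    "doc-3": ["yes", "no", "yes", "yes"],
}

# Hypothetical expert codings for the same documents.
expert_labels = {"doc-1": "yes", "doc-2": "no", "doc-3": "no"}

crowd_labels = {
    doc: aggregate_answers(answers)[0] for doc, answers in crowd_answers.items()
}
print(crowd_labels)
print(agreement_with_experts(crowd_labels, expert_labels))
```

In this toy data the crowd and the experts agree on two of the three documents. A real system would likely weight workers by reliability rather than count every vote equally.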

Political science is a new area of focus for Reitter’s research group at IST, which normally studies the cognitive processes involved in communication and decision making. However, the study of the human brain has inspired much recent progress in machine learning. The group has recently explored techniques that learn from long-term streams of natural language data, such as decades of news stories. These techniques are fundamental to successful crowdsourcing. The project now applies natural language processing and machine learning to produce the new database.

The project will continuously provide updated data, which will eventually cover the time period 1816-2017. Because the MID data are so widely used by the scholarly community, the updates will benefit researchers who are addressing a wide range of research questions, including the effects of regime type on conflict, the role of natural resource competition in militarized disputes and the effects of power cycles, arms races and alliances on the initiation and escalation of conflict. Extending these data into more recent years will also allow scholars to address timely questions such as whether and how recent climate change has influenced international conflict.

“The expansion of the MID data through 2017 and the continual development of our efficient data collection system ensure that researchers have the data they need to reach empirical conclusions in these important areas of social science research,” Reitter said.