The project builds on existing work by Professor Matthew Williams, Director of the Social Data Science Lab at the Data Innovation Research Institute and Professor of Criminology at Cardiff University, and Dr Peter Burnap, senior lecturer in Computer Science & Informatics. That work looks for 'signatures' of crime and disorder in open source communications, rather than for crimes themselves.
“In our statistical models based on London Met Police crime records and Twitter data, we found social media mentions of the breakdown of the local social and environmental order – mentions of littering, anti-social behaviour, drinking disorderly, tipping – were highly correlated with certain types of crime, such as burglary,” says Williams. “These correlations were stronger than those found between crime records and conventional data, such as census data on educational attainment, unemployment etc. This preliminary evidence indicates social media data might be useful in providing near-real-time insights into crime patterns.” The next step, then, is to apply this experimental approach to hate crimes.
Over a 12-month period, the new project will collect Twitter posts containing terms that human annotators have labelled as hate speech. The team’s original hate speech detection algorithm was developed in the UK following the murder of Lee Rigby, and the new project will use similar machine learning techniques to build new hate speech algorithms based on US data. The team will also gain access to 12 months of hate crime data recorded by the LAPD. These two measures will then be entered into statistical models to test whether they are correlated, that is, whether an increase in hate speech in a given area is statistically linked to an increase in recorded hate crimes on the streets of the same area.
“If the model shows such a relationship, then this social media data may be used in conjunction with conventional data sources to improve predictions of hate crimes offline. These new forms of data are also attractive as they can provide new information on changing risks in near real time, unlike conventional data that is often weeks or months out of date,” says Williams, adding that since the project is experimental, the absence of a significant correlation would not be a surprise.
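To make the idea of a statistical link concrete, here is a minimal sketch in Python of the simplest such test, a Pearson correlation between two series of counts for one area. It is an illustration only, not the team's model: the function name, the variable names and the monthly counts are all hypothetical, and a real analysis would need to account for many other factors.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and of squared deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical monthly counts for a single area (invented for illustration).
hate_speech_posts = [12, 15, 9, 20, 25, 18]
recorded_hate_crimes = [3, 4, 2, 5, 7, 5]

r = pearson(hate_speech_posts, recorded_hate_crimes)
print(f"Pearson r = {r:.3f}")
```

A coefficient near 1 would mean the two counts rise and fall together, while a value near 0 would indicate no linear relationship. A strong coefficient alone would not establish that one predicts the other.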
Significantly, the award to Cardiff University was specifically to fight hate crime, and this, says Peter Wang, CTO and Co-Founder of Continuum Analytics, is an important distinction. Wang played a central role in the DARPA-funded Memex project to fight human trafficking. Using Anaconda and Continuum Analytics open source software projects, DARPA is able to scale the Memex solution to keep pace with the ever-growing web, indexing and cross-referencing interactive and social media, text, images, and video. These deep searches, combined with rich visualisations, identify patterns and connect the dots about typically elusive movements across locations. Reports by the NYDA’s office credit Memex with more than 20 active sex trafficking investigations and nine open indictments.