Case Study: Releasing Crime Incident Data
"When criminal justice data is of good quality and made available, good things can come of it."
-- The benefits of criminal justice data: Beyond policing, Sunlight Foundation
In December of 2012, the City of Philadelphia initiated the largest release of crime incident data in its history. The release put the city at the forefront, nationally, of a practice that is now recognized as necessary for effective 21st-century policing.
Publishing open data is one of the quickest, cheapest and easiest ways for police agencies to enhance transparency, build and engage a community of data users, and start reaping the benefits of data-driven decision making. But police agencies may hesitate to release crime incident and other kinds of policing data for a number of reasons. Most importantly, the data may contain personally identifying information (PII) or other sensitive information that needs to be removed before it can be made public.
What does the process look like for reviewing and releasing crime data? What kinds of things should police agencies target in reviewing crime incident data before making that data public?
Here are a few lessons from the City of Philadelphia's crime data release that other governments can look to as they get started down the road to open data.
What to include, and not include
Any successful release of crime data will include enough data points for outside parties to conduct their own analysis or build visualizations. At least three data elements are critical to a useful crime incident data release: location (where the crime occurred), time (when the crime occurred or was reported) and type (the crime incident type).
Here is a list of the field names and types used by the City of Philadelphia in its crime incident data releases.
Uniform Crime Reporting (UCR) codes are usually well understood by both police agencies and crime data consumers, so the crime types included in an open data release should map to an existing UCR code. For incident time, the City of Philadelphia uses the dispatch time of the officer in its data releases. Whatever specific data point is used to denote the time an incident occurred, it should be clearly documented in the metadata for each data release.
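As a rough illustration of checking that every record carries the three critical elements before release, here is a minimal Python sketch. The field names (dispatch_time, ucr_code, location_block) are hypothetical placeholders chosen for this example. They are not the City of Philadelphia's published field names.

```python
from typing import Dict, List

# Hypothetical field names for the three critical elements (time, type, location).
# These are illustrative assumptions, not the city's actual schema.
REQUIRED_FIELDS = ("dispatch_time", "ucr_code", "location_block")


def missing_elements(row: Dict[str, object]) -> List[str]:
    """Return the required fields that are absent or empty in a raw record."""
    return [name for name in REQUIRED_FIELDS if row.get(name) in (None, "")]
```

A record that returns an empty list has all three elements. Anything else could be held back for review rather than published incomplete.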
Location information for crime incidents should be scrutinized carefully prior to release. Depending on how it is structured, such information has the potential to violate the privacy of crime victims, whether it is represented as a physical address or as a set of coordinates.
For example, the City of Philadelphia releases only street-block-level location information as part of its crime data releases. The city also provides a set of coordinates to represent the location of a crime incident. These coordinates run along the street centerline, as opposed to using a specific parcel centroid. Other cities take a similar approach. The City of Chicago, for example, also releases only block-level address information in its crime incident data.
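One way an agency might generalize a street address to the block level is sketched below in Python. This is an assumption-laden illustration, not the method Philadelphia or Chicago actually uses: the function name to_block_level and the "NNN BLOCK STREET" output format are hypothetical, and the sketch handles only addresses that begin with a house number.

```python
import re


def to_block_level(address: str) -> str:
    """Replace a leading house number with the start of its hundred block.

    Hypothetical helper: the output format is an assumption for illustration.
    """
    match = re.match(r"^\s*(\d+)\s+(.*)$", address)
    if match is None:
        # No leading house number (for example, an intersection).
        # Returned unchanged here; such records likely need manual review.
        return address.strip()
    house_number = int(match.group(1))
    block_start = (house_number // 100) * 100  # 1234 becomes 1200
    street = match.group(2).strip().upper()
    return f"{block_start} BLOCK {street}"
```

With this sketch, "1234 Market St" becomes "1200 BLOCK MARKET ST". Real address data is messier (unit numbers, ranges, misspellings), so a production process would need more careful parsing and a review step for anything the pattern does not match.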
Err on the side of caution
The City of Philadelphia chose to use generalized information on the type of crime in its data release: UCR codes are rounded to the hundred level. This was done to prevent the unintentional release of PII through UCR codes so specific that they could be matched with other sources of information to identify a specific crime victim.
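The generalization of UCR codes to the hundred level described above can be sketched in a few lines of Python. The sketch assumes that "rounding" means dropping the last two digits, so that a code always stays within its own hundred-level category. The source does not spell out the exact rule, and the function name generalize_ucr is hypothetical.

```python
def generalize_ucr(code: int) -> int:
    """Collapse a detailed UCR code to its hundred-level category.

    Assumption: the code is truncated (615 becomes 600), not rounded
    arithmetically, so 650 stays in the 600 category instead of moving to 700.
    """
    if code < 0:
        raise ValueError("UCR codes are expected to be non-negative")
    return (code // 100) * 100
```

Truncation is used here because arithmetic rounding would push codes in the upper half of a range into the next category, which would change the reported crime type.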
Consumers of this data have requested more detailed information on UCR codes, and adding more detail could enhance the value of the data. However, police agencies and cities should err on the side of caution and always try to strike the most appropriate balance between the desires of data consumers and the need to protect the privacy of crime victims.
Iterate and engage
As noted elsewhere in this guide, releasing data is not the end of the process but a beginning. Working with data users to understand how they are using data and the opportunities for enhancing data through successive releases is a key part of the open data process.
A successful data release makes use of the channels through which data consumers can give input and feedback to the government publishing the data. The City of Philadelphia uses a Google group to interact with members of the data community and often receives feedback on issues relating to crime data. Using this input to enhance future iterations of crime data releases is an important component of the city's successful open data efforts.
One technique that Philadelphia has employed in the past is to provide a "beta" release of data to a specific subset of users, and ask these data users to provide an initial review. This approach can be very effective in spotting unforeseen issues early, and correcting them before a full public release of crime incident data occurs.
Since a number of jurisdictions are already publishing crime incident data, one of the most effective ways of getting started is to look for other police agencies releasing this type of data and contact them to discuss the process they have used to review and release it.