Identifying data sets for public release
"Cities produce a great deal of data, all of which is probably interesting to somebody out there. Given our limited municipal data resources, how do we prioritize the datasets that we publish?"
-- Jim Craner, 2012 Code for America Fellow (City of Santa Cruz)
The first step in releasing open data is identifying which data sets or data resources to release. There are a number of easy ways to identify data sets that are candidates for public release:
Review existing requests for data and information
Take a look at requests for information received by your department or office. These may be formal FOIA or open records requests, or less formal requests sent via e-mail or some other channel. Several recent studies looking at FOIA requests indicate that a large number of these requests may be focused on a relatively narrow collection of data sets. Responding to these frequent requests can be a challenge, particularly for smaller cities. Using FOIA requests as a way to identify high-value data sets can have a number of other operational benefits as well.
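As a rough illustration of this kind of review, the sketch below tallies how often request subject lines mention a given topic. It is a minimal example under stated assumptions: the keyword list, the topic names, and the sample requests are all hypothetical, and a real request log would need its own keywords and some manual review.

```python
from collections import Counter

# Hypothetical mapping of candidate data sets to keywords; adjust to your own records.
TOPIC_KEYWORDS = {
    "crime incidents": ["crime", "police report", "incident"],
    "property records": ["property", "parcel", "deed"],
    "budget and spending": ["budget", "contract", "expenditure"],
}


def tally_request_topics(request_subjects):
    """Count how many request subject lines mention each topic."""
    counts = Counter()
    for subject in request_subjects:
        text = subject.lower()
        for topic, keywords in TOPIC_KEYWORDS.items():
            # Count each request at most once per topic.
            if any(keyword in text for keyword in keywords):
                counts[topic] += 1
    return counts


# Made-up subject lines standing in for a department's request log.
sample_requests = [
    "Request for crime incident reports, 2011",
    "Parcel ownership records for Ward 5",
    "Copies of police reports near Main St",
    "City contract expenditures for FY2012",
    "Property tax assessment data",
]

topic_counts = tally_request_topics(sample_requests)
for topic, count in topic_counts.most_common():
    print(f"{topic}: {count}")
```

Topics that come up again and again in a tally like this are reasonable first candidates for release.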
Check for scrapers
Web scrapers are programs written by people who want to extract information - usually in bulk - from a website. There are many tools that can be used to conduct this type of data extraction, but one of the most popular is a website called ScraperWiki. Checking to see if there are any ScraperWiki entries for your websites is a great way to determine if some of your data has value to outside users.
Web scraping tools are often used to quickly extract information from a government website. This means that they can sometimes send large amounts of traffic to a website and potentially disrupt normal operations. There are ways to identify whether web scraping activity is occurring on a website. Though there are some options for mitigating this activity, the best approach to addressing web scraping of public information from a government website is to release it as open data.
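One simple way to look for scraping activity is to count requests per client in a web server access log. The sketch below is a minimal example, assuming log lines in the Common Log Format (client address first). The function name, the threshold, and the sample log lines are all hypothetical, and a high request count alone does not prove that a client is a scraper.

```python
import re
from collections import Counter

# Matches the leading client address of a Common Log Format line.
CLIENT_PATTERN = re.compile(r"^(\S+) ")


def find_heavy_clients(log_lines, threshold):
    """Return clients whose request count meets or exceeds the threshold."""
    counts = Counter()
    for line in log_lines:
        match = CLIENT_PATTERN.match(line)
        if match:
            counts[match.group(1)] += 1
    return {client: n for client, n in counts.items() if n >= threshold}


# Made-up log lines: one client pages through a search result, another visits once.
sample_log = [
    f'192.0.2.10 - - [10/Oct/2012:13:55:{page:02d} -0400] '
    f'"GET /permits?page={page} HTTP/1.1" 200 2326'
    for page in range(1, 6)
]
sample_log.append(
    '198.51.100.7 - - [10/Oct/2012:14:02:11 -0400] "GET /index.html HTTP/1.1" 200 512'
)

heavy_clients = find_heavy_clients(sample_log, threshold=5)
print(heavy_clients)
```

The pages that such clients request most often point to the data that outside users want in bulk.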
Look at what other cities are doing
A number of other cities in the U.S. and around the world are releasing open data to external users. What are people in other cities looking for? What kinds of data are other governments making available? Chances are, if there is high demand for certain kinds of data in other cities, there may be high demand for it in the City of Philadelphia as well. This is true for a variety of different kinds of data, but is especially relevant for crime, property and financial data.
Ask the public for ideas
If you want to find out what data people are looking for from your department or agency, one of the best ways is simply to ask. Does your department have a Facebook page or Twitter account? Check to see if there are comments or tweets about data that you might be able to make available.
In Philadelphia, there is a public forum set up to allow city departments and others to interact with data users. Think about whether a posting to a forum like this would be a useful way to solicit feedback on data to release.
Examine existing websites
Look at the existing website for your agency or department. Does it contain documents with data in them or that are based on data, like annual reports, CompStat or CitiStat presentations, financial statements, etc.? Does your website have lists of service locations, or a search page that lets users find a location or service close to them? Are there any web-based applications that have a database behind them - like a search feature - that are worth examining more closely for a data set that might be appropriate to release?
If you're already publishing data on your government website that is formatted for consumption by human eyeballs, consider releasing it in a format that is appropriate for consumption by software and analytic tools.
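As an example of moving from a human-readable format to a machine-readable one, the sketch below converts a simple HTML table into CSV using only the Python standard library. It is a minimal illustration, not a general-purpose converter: the class and function names are hypothetical, the sample table is made up, and it assumes a single flat table with no nested tables or merged cells.

```python
import csv
import io
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_csv(html):
    """Return the rows of a simple HTML table as CSV text."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(parser.rows)
    return buffer.getvalue()


# Made-up list of service locations, as it might appear on a department web page.
sample_html = """
<table>
  <tr><th>Location</th><th>Address</th></tr>
  <tr><td>Central Library</td><td>100 Main St</td></tr>
  <tr><td>North Branch</td><td>25 Oak Ave, Suite 2</td></tr>
</table>
"""

csv_text = html_table_to_csv(sample_html)
print(csv_text)
```

Where the table is generated from a database, exporting directly from that database is usually a cleaner source than the web page.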
The “Open Data Handbook” also has some practical tips that can help identify data sets that might be suitable for release by your department.