Writing: EveryBlock Statement in Support of Open Data Standards in New York City

Today, on behalf of EveryBlock, I wrote and submitted the following statement to the New York City Council:

EveryBlock strongly supports the introduction of a Local Law to create open data standards in New York City.

EveryBlock is a neighborhood news site serving 15 cities, including New York. You can see our work at http://nyc.everyblock.com/, where we combine public records from New York City’s government with news articles, business reviews, images, and other items collected from across the Web.

In a typical month, we add thousands of crime reports, building permits, restaurant inspections, street closings, business licenses, news articles, and other items for every block in the city. Wherever possible, items are published at the block level, so that people can see what’s going on near them. We also offer the public the ability to subscribe to daily updates through e-mail and RSS feeds. We continually look for new ways to present information more usefully.

Over the past two years, we’ve worked with city leaders (department heads, council members, technology developers, policy makers, and others) in each of the cities we cover. Below, we share some of what we’ve learned about data sharing in New York and other places. We look forward to working with DOITT, the New York City Council Technology Committee, and other stakeholders to fashion an effective local law that benefits all New Yorkers. We hope that this law, and the data published as a result of it, will serve as an example for other municipalities.

Some thoughts on the draft language of the Local Law

We reviewed the draft amendment to title 23 of the administrative code of the city of New York as seen here: http://webdocs.nyccouncil.info/textfiles/Int%200991-2009.htm?CFID=251. Setting aside any word-parsing, we find this to be a strong move forward in open data law for municipalities. There are three areas we see as especially promising:

Raw formats
The provision that “all public records shall also be made available in their raw or unprocessed form” is especially welcome. Very often, municipalities seek overly expensive, complicated technology projects to present data. Leveraging the efforts of citizen developers working with powerful tools is the way to go. As part of the EveryBlock project, for instance, we recently open-sourced the site’s backend code: http://www.everyblock.com/code/.

Structured formats
“All public records shall be presented and structured in a format that permits automated processing.” This requirement is much less complicated than it seems. See below for many examples of structured formats that existing, widely available tools and technologies already support in New York and other municipalities.
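
To make "automated processing" concrete, here is a minimal Python sketch of what a structured format allows. The sample data, the column names (record_type, address, date), and the function name count_by_type are all hypothetical, invented for illustration; they do not describe any actual city dataset.

```python
import csv
import io

# Hypothetical sample of a structured export. The column names and rows
# are invented for illustration and do not come from any city dataset.
SAMPLE_CSV = """record_type,address,date
building_permit,100 Example St,2009-06-01
restaurant_inspection,200 Sample Ave,2009-06-02
building_permit,300 Placeholder Blvd,2009-06-03
"""


def count_by_type(csv_text):
    """Count how many records of each record_type appear in a CSV string."""
    counts = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        record_type = row["record_type"]
        counts[record_type] = counts.get(record_type, 0) + 1
    return counts


if __name__ == "__main__":
    print(count_by_type(SAMPLE_CSV))
```

Because every row has the same named fields, a few lines of standard-library code can read the whole file; no screen scraping or manual cleanup is needed.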

Frequent updates
The draft local law states that, “All public records shall be updated as often as necessary to preserve the integrity and usefulness of the records.” Often, we see municipalities publish information once and fail to update it on a regular basis.

Examples of data published by the City of New York and displayed on EveryBlock
The City of New York already publishes a significant amount of data to its Web site. Here’s a quick review of a number of these data types.

Building permit actions
http://nyc.everyblock.com/building-permits/
This data is published in Excel spreadsheet format as Job Weekly Statistical Reports by the Department of Buildings. The department updates its data weekly, and we at EveryBlock publish it shortly thereafter.

Sign permit actions
http://nyc.everyblock.com/sign-permits/
Similar to the Building permit actions, this data comes from the Sign Monthly Statistical Reports published by the New York City Department of Buildings in Excel spreadsheet format. The data is updated regularly and we at EveryBlock publish it shortly thereafter.

Property sales
http://nyc.everyblock.com/property-sales/
This data is published in Excel spreadsheet format on the Rolling Sales Update section of the city Web site. The New York City Department of Finance maintains the data, and updates it once a month. In general, the Finance Department deserves a lot of credit for the amount of data it publishes in this format. The Department also provides RSS feeds (http://www.nyc.gov/html/dof/html/jump/notifications.shtml) to alert users when new data is published.

Examples of data we’d like to see or existing data that fails to meet standards set forth in this local law

Crime data
http://nyc.everyblock.com/crime/
This data comes from the precinct reports published by the police department. These reports are incomplete (they only include seven crime types), imprecise (they are only collected to the precinct level), and infrequent (published weekly). On each of these criteria, this data lags far behind that of many other cities.

311
On EveryBlock, we publish information that we collect from the NYCscout page, run by the Mayor’s Office of Operations. We obtain this data by scraping the maps in the NYCscout application. This is a tiny sliver of the service requests completed by the city. It would be much better to have a formal feed of all 311 data, along with details on the final disposition of service requests.

Graffiti cleaned and Graffiti cleanup requests
http://nyc.everyblock.com/graffiti-cleaned/ and http://nyc.everyblock.com/graffiti-pending-cleanup/
The data comes from this database of completed graffiti cleanup locations and this database of pending graffiti cleanup locations, maintained by the Mayor’s Community Affairs Unit. It would be better if this data were published in formal feeds with structured formats. That would allow for more sustainable methods than scraping Web databases.

Restaurant inspections
http://nyc.everyblock.com/restaurant-inspections/
This data comes from the online restaurant inspection database published by the Department of Health and Mental Hygiene. It would be better if this data were published in a formal feed with a structured format. That would allow for more sustainable methods than scraping Web databases.

Landmark building permits
http://nyc.everyblock.com/landmark-building-permits/
This data is no longer updated on EveryBlock because the source database, which used to be available through the Center for New York City Law’s CityAdmin search tool, is no longer available. This is a great example of why it’s important to have reliable, centralized, well-structured datasets available to the public.

Other technology methods from other cities
Here are some examples of extremely lightweight data sharing at the civic level — next to zero effort with huge utility to developers like EveryBlock.

XML: San Francisco Police calls
http://sf.everyblock.com/police-calls/
We worked with the San Francisco Police Department to help them create an XML file that they update for the public daily: http://www.sfgov.org/site/uploadedfiles/police/ftpfiles/CADdataZIP.tar.gz. The police do this as an export straight out of their CAD system — no development costs, no maintenance.
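
As a rough illustration of why a plain XML export is so useful to developers, the sketch below parses a small XML document with Python's standard library. The element names (calls, call, type, block) and the sample records are hypothetical; they are not the actual schema or contents of the San Francisco file.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML export. The element names and values are invented for
# illustration and are not the real San Francisco police calls schema.
SAMPLE_XML = """<calls>
  <call><type>Noise complaint</type><block>100 block of Example St</block></call>
  <call><type>Traffic stop</type><block>200 block of Sample Ave</block></call>
</calls>"""


def parse_calls(xml_text):
    """Return one dict per <call> element, with its type and block."""
    root = ET.fromstring(xml_text)
    return [
        {"type": call.findtext("type"), "block": call.findtext("block")}
        for call in root.findall("call")
    ]


if __name__ == "__main__":
    for call in parse_calls(SAMPLE_XML):
        print(call["type"], "at", call["block"])
```

The publisher only has to export the file; everything else can be done by whoever consumes it.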

Database dumps: San Francisco restaurant inspections
http://sf.everyblock.com/restaurant-inspections/
The City & County of San Francisco dumps its database in its native .mdb format and publishes the file to an FTP server, with access granted to EveryBlock and other approved entities.

Text files: San Jose Building permits
http://sanjose.everyblock.com/building-permit-actions/
The City of San Jose publishes raw text files like this one (https://www.sjpermits.com/sanjoseftp/permitdataWeeks/PDIssue_latestWeek.txt) directly to its Web site on a weekly basis: https://www.sjpermits.org/permits/permits/general/reportdata.asp

Conclusion
We see this ordinance as a welcome next step in New York City government. It affirms the importance of data sharing, provides clear instruction on what should be published, and begins to set technology standards for format and structure.

