Today I participated in an interesting panel at the MIT-Knight Civic Media Conference. Here’s my bit:
Here are relevant snips from the conference live blog:
Up first are strategies for finding data. Dan emphasizes using search. Search is your friend, he says. “I prefer it to asking.” He was responsible for data acquisition for Everyblock. Dan reemphasizes the importance of searching for data on your own. “Asking people for data never got me anywhere,” he says. He gives the example of the Dallas crime reports: the city has several years of data on its website, but you need to look for it. If you look hard enough, it has the best crime data. And it has narrative. After all, cops record everything that happens. Data has more structure than you would imagine, he says.
Dan created data from text to see why sources were granted anonymity. This is an idea he has been working on since 2005, after the Jayson Blair episode at The New York Times. Dan shows a slide with a list of reasons for allowing anonymity. He also points us to the Data Journalism Handbook, which a lot of people here helped create. On a lighter note, he says it is highly focused on asking for data.
Dan makes the point that context is important in data. Publishing data without context is not super-useful. Why is most data boring? Dan says it’s because data is made by people, and most people are boring most of the time. Hence the need for a new model of presenting data: “I use data to tell stories.” He tells us the story of his incredibly detailed post on a Walgreens in Milwaukee, including information from 10 different data sources.
He gives details of how he developed the story and how he got the data. You have to get as much data as you can, he says. For instance, in this case, he even got building permits from the city of Chicago. Building permits are boring data, but there can be exciting details embedded in them. He found a building that was closed for 20 years and open for three days.
He talked about other resources. Sanborn maps are amazing resources on land use and building use. Original photography is data. He looked in the New York Times archive and found materials there. He looked for the word “jimmied” in advanced search and found useful data. Police had used the word several times. This could be used for further searches.
Dan argues that we need to start embedding data in stories. We also need to take personal responsibility for our own data. In the crime records, people call the police and lie. Crime data is full of amazing lies. An abstraction of that data is not useful.
He gave an example of the relationship between human beings and data. If you are looking for how many planes are struck by birds, the data available is terrible. Reporters wrote about this, and then data on these details was released. It turned out that San Francisco International Airport looked terrible because it had good data.
- A preparatory post I wrote for the panel: Toward a Generic Context Engine for Civic Data and the slides I used in my presentation
- Liveblog notes from the panel
- My complete presentation: Turning Data Into Narrative: Strategies for finding and sharing stories embedded within sets of data