Data-driven journalism: approaches to interactive storytelling (at #ISOJ)

The session that impressed me most at last week’s International Symposium on Online Journalism was this one, led by Aron Pilhofer of the New York Times. Major innovations are possible in data journalism; the key is getting or creating the requisite databases. The journalism is in extracting meaning.

http://storify.com/jonl/data-driven-journalism-approaches-to-interactive-s


Live blogging #ISOJ

I was actively live tweeting the International Symposium on Online Journalism when I hit Twitter’s limit of 1000 posts within 24 hours. (I’m finding it a little hard to believe I posted that much, but that’s what Twitter’s algorithm is saying, I suppose.)

I’ll do some live blogging here instead, offering this brief post by way of explanation – if you find this interesting, see the @jonl posts for #isoj and #isoj12. I’ll look for a chance to storify some of those tweets later.

Iraq 2006: a bag of words

How to make sense of Wikileaks data? One way is visual analysis, as we see here, via Jonathan Stray of Associated Press:


Stray and Julian Burgess created a visualization using data from December 2006 Iraq Significant Action (SIGACT) reports from Wikileaks. That was the bloodiest month of the war, and the central (blue) point on the visualization represents homicides, i.e., clusters of reports that are “criminal events” and include the word “corpse.” These merge into green “enemy action” reports, and at the interface we have “civ, killed, shot,” civilians killed in battle. Stray tells how this was done, with some interesting notes, e.g.

…by turning each document into a list of numbers, the order of the words is lost. Once we crunch the text in this way, “the insurgents fired on the civilians” and “the civilians fired on the insurgents” are indistinguishable. Both will appear in the same cluster. This is why a vector of TF-IDF numbers is called a “bag of words” model; it’s as if we cut out all the individual words and put them in a bag, losing their relationships before further processing.

As a result, he warns that “any visualization based on a bag-of-words model cannot show distinctions that depend on word order.” (Much more explanation and detail in Stray’s original post; if you’re interested in data visualization and its relevance to the future of journalism, be sure to read it.)
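
To make the point concrete, here is a minimal sketch of a bag-of-words TF-IDF calculation. It assumes a plain whitespace tokenizer and a smoothed IDF weighting chosen for illustration; it is not the pipeline Stray and Burgess used, and the third sentence is invented purely for contrast. The function names are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase and split on whitespace; real pipelines do more.
    return text.lower().split()


def tf_idf_vectors(docs):
    """Return (vocabulary, one TF-IDF vector per document)."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    n_docs = len(docs)
    # Document frequency: in how many documents does each word appear?
    df = Counter(w for toks in tokenized for w in set(toks))
    # Smoothed IDF (an illustrative choice) so shared words keep some weight.
    idf = {w: math.log((1 + n_docs) / (1 + df[w])) + 1 for w in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        # Only counts are used here; the position of each word is discarded.
        vectors.append([counts[w] / len(toks) * idf[w] for w in vocab])
    return vocab, vectors


docs = [
    "the insurgents fired on the civilians",
    "the civilians fired on the insurgents",
    "a corpse was found near the checkpoint",  # invented contrast sentence
]
vocab, vectors = tf_idf_vectors(docs)

same_vector = vectors[0] == vectors[1]
differs_from_third = vectors[0] != vectors[2]
print("first two sentences share a vector:", same_vector)
print("third sentence differs:", differs_from_third)
```

Because each vector is built from word counts alone, the first two sentences come out identical even though they describe opposite events, which is exactly the limitation Stray describes.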

Thanks to Charles Knickerbocker for pointing out the Stray post.

Taking a Wikileak

In my obligatory post about Wikileaks as the story du jour, I point to the great set of questions Dan Gillmor has posted in his column at Salon. These are especially lucid. I especially like Dan’s point about the character of the communications that were leaked: many of the messages are gossip. Journalists are dutifully reporting “facts” gleaned from the leaked material without necessarily digging deeper, verifying and analyzing. Of course, they don’t have time – the information environment moves too quickly; he who hesitates is lost, accuracy be damned.

Then again, journalism is so often about facts, not truth. Facts are always suspect, personal interpretations are often incorrect, and memories are often wildly inaccurate. History is, no doubt, filled with wrong facts and bad interpretations that, regardless, are accepted as somehow “true.”

The high-minded interpretation of this and other leaks, that people need to know what is being said and done by their representatives in government, especially in a “democratic society,” is worth examining. We’re not really a democracy; government by rule or consensus of a majority of the people doesn’t scale, and it would be difficult for the average citizen to commit the time required to be conversant in depth with all the issues that a complex government must consider.

Do we benefit by sharing more facts with more people? (Dan notes that 3 million or so in government have the clearance to read most of the documents leaked – this seems like a lot of people to be keeping secrets… is the “secret” designation really all that meaningful, in this case?) But to my question – I think there’s a benefit in knowing more about government operations, but I’m less clear that this sort of leak increases knowledge vs. noise.

I’m certain about one thing: we shouldn’t assume that the leaked documents alone reveal secrets that are accurate and true. They’re just more pieces of a very complex puzzle.

Events this week – NPOCamp and Austin News Hackathon

Cross-posted from http://effaustin.org.

Two great events coming up this weekend in Austin, sponsored by EFF-Austin.

Friday, join us at NPO Camp – a Barcamp for Nonprofits and Techs. We had one of these several months ago, and it was a real blast! The idea here is to bring the nonprofit and technology communities together for a day and talk about the technical challenges the NPOs face, while educating the techs about that world. At the last event, we had 200+ attendees forming into sessions and pods; all were lively. Greg Foster, our newest EFF-Austin board member, has done most of the legwork in organizing the event, with major production assistance from Maggie Duval, also a board member and producer of the annual Plutopia event during SXSW. Sign up here.

Saturday, coders and journalists come together to build innovative news applications at the Austin News Hackathon, cosponsored by EFF-Austin and the local Hacks Hackers chapter led by Cindy Royal.  The day will begin with a presentation by Matt Stiles and Niran Babalola of the Texas Tribune, talking about some of the news apps they’ve been developing. Then teams will form to match ideas from journalists with technical expertise from the coders who are attending. These kinds of events are the future of journalism!  This event also benefited from Maggie Duval’s production assistance. Sign up here.

Both events will be catered by Pick Up Stix of South Austin.

Jay Rosen on the state and future of journalism

Jay Rosen has a terrific post about the state of media, beginning with this clip from the film “Network”:

Pretty timely, eh?

Jay analyzes the scene:

… the filmmakers are showing us what the mass audience was: a particular way of arranging and connecting people in space. Viewers are connected “up” to the big spectacle, but they are disconnected from one another. Or to use the term I have favored, they are “atomized.” But Howard Beale does what no television person ever does: he uses television to tell its viewers to stop watching television.

When they disconnect from TV and go to their windows, they are turning away from Big Media and turning toward one another. And as their shouts echo across an empty public square they discover just how many other people had been “out there,” watching television in atomized simultaneity, instead of doing something about the inarticulate rage that Beale put into words. (“I don’t know what to do about the depression and the inflation and the Russians and the crime in the streets. All I know is that first you’ve got to get mad!”)

He goes on to ask what would happen today in response to a “Howard Beale” event…

Immediately people who happened to be watching would alert their followers on Twitter. Someone would post a clip the same day on YouTube. The social networks would light up before the incident was over. Bloggers would be commenting on it well before professional critics had their chance. The media world today is a shifted space. People are connected horizontally to one another as effectively as they are connected up to Big Media; and they have the powers of production in their hands.

Jay follows with an expansion of his comments, and concludes with a set of recommendations for today’s journalists. (The post is a must-read for journalists and news bloggers.)

There’s been too much hand-wringing over the supposed collapse of journalism as we know it, but journalism has never been more exciting and has never had the kind of tools and channels of information available today. We’re seeing, not collapse, but evolution. I want to spend more and more time with journalists, and think more and more about the relationship of professional journalism to blogging and other more or less informal information channels.

$9 billion citizen journalism hit

At CNN’s iReport.com, a “citizen journalist” calling himself “Johntw” posted a report that Steve Jobs had been rushed to the ER following a heart attack. Word spread to and beyond Digg, across Twitter. Apple stock dropped quickly, a $9 billion loss based on the rumor. Though iReport posts aren’t vetted, the CNN association probably lent credibility to the report. [Link]

The Jobs incident was the second time in a week that mainstream media organizations have been embarrassed by their online citizen journalism arms – sparking debate about the accuracy of reports from these Web sites and showing how it takes only a few minutes for a scurrilous rumor, placed on a site without sufficient editorial checks, to inflict damage.

So what’s the cure? A dozen years ago Bob Anderson and I were talking about the emerging new media ecology and the question of information authority in that context. We figured media literacy should be taught alongside reading, writing, and ‘rithmetic. Support critical thinking, not censorship or authoritarian structures for distributing information.

Education isn’t always enough; sometimes you really do need moderators, hopefully with a light touch. The SFGate story linked above describes how sexually explicit photos were posted at CBS’s mobile phone application site, after which CBS promised “to redouble its efforts to police content.” A moderator had quickly removed the photos. Some might argue that photos should be screened before they’re posted, and some sites would do it that way, but that’s a daunting task, especially where you may have thousands of posts, and it’s not in the spirit of the many-to-many mediasphere. CNN does have moderators for iReport, but they’re not checking facts… “mostly, it is the job of iReport users themselves to weed out erroneous or inappropriate material.” That’s the social media way – the “vetting” is crowdsourced, and the reader must read critically, never assuming that the “news source” is correct. I would argue that’s always been the case, even with the best journalists. I’ve never been close to a news story that wasn’t wrong in some of the particulars, at least from my perspective. And that’s part of the problem – perspectives and interpretations differ. That’s why I left journalism behind – when I was in journalism school, it seemed pretty clear that it would be hard to tell the truth. Only a few gonzo journalists, a la Hunter Thompson, realized they, and their biases, had to be transparent within the reporting…