27. Journalists and the Media

Open Data Stakeholders - Journalists and the Media

Key Points

  • Data journalists have a key role to play as public interest watchdogs. The label “data journalism” can cover a wide range of practices from using data science to find stories to storytelling with data and creating interactive content and visualisations for articles.
  • The costs and complexity of effective data journalism, combined with the time pressures common in reporting, make for a difficult business case. Finding sustainable models of data journalism is even more urgent at a time when traditional media outlets face competition from online journalists.
  • The promise of “automated journalism” based on open data is largely unfulfilled; however, if more media houses focus on making open data an essential source, we may see more examples of automation tools in the future.
  • This is a critical moment for public trust, and there is no clear template for how data journalism can contribute to society’s response.

How to cite this chapter

Howard, A. & Constantaras, E. (2019) Open Data Stakeholders - Journalists and the Media. In T. Davies, S. Walker, M. Rubinstein, & F. Perini (Eds.), The State of Open Data: Histories and Horizons. Cape Town and Ottawa: African Minds and International Development Research Centre.

Print version DOI: 10.5281/zenodo.2677781

Timeline

  • Mar 2009

    Guardian Data Blog launched

    The Guardian Data blog has been described as the first systematic effort to incorporate publicly available data sources into news reporting.

  • Jan 2010

    OCCRP launch Investigative Dashboard

    The Investigative Dashboard, created by the Organized Crime and Corruption Reporting Project, provides tools for data-driven journalism. The platform has been through a number of iterations and provides advanced searching of hundreds of different public datasets.

  • Mar 2011

    La Nación establishes data journalism unit

    La Nación Data in Argentina is recognised as one of the first data journalism teams to be established in Latin America. Following data-driven reporting in 2010, the La Nación Data unit was established inside the daily newspaper in early 2011.

  • Feb 2014

    ProPublica launch data store

    The data-driven newsroom ProPublica creates its data store to provide access to open data and to develop a funding model by charging for access to premium datasets collected during its reporting.

  • Apr 2016

    Panama Papers released

    The cache of 11.5 million leaked financial and client documents from a corporate service provider supporting offshore companies was analysed using a range of data-driven tools.

  • Apr 2017

    BureauLocal releases first report

    The Bureau Local, developed by the Bureau of Investigative Journalism, was established to support local data-driven journalism.

  • Jun 2017

    Daily Nation Newsplex data-driven election reporting

    The Nation Newsplex, the data team of the Daily Nation, the highest circulation daily newspaper in Kenya, uses data to focus its election coverage on policy not politics during the 2017 presidential elections, running a fact-checking operation.

Journalists and the Media


The social impact of open data depends upon its use and reuse by different actors across society. For decades, journalists and media outlets have served and informed the public as a key infomediary for government statistics, academic and scientific research, business information, and their own analysis.

In the 21st century, data journalism is an emerging field of practice around the world, but there is still no universally shared definition. Some practitioners consider data journalism to be thoughtful storytelling with data as the centrepiece. Others define it as the inclusion of statistical data in stories, visualisations, and interactive elements, or as the application of data science to journalism.

Journalists who acquire, clean, analyse, report on, and, in some cases, create open data now play a key role as public interest watchdogs. Reporters can transform raw data into new insights, facilitate public engagement in the democratic process, inform consumers, and hold powerful institutions accountable. Journalists have made opaque financial transactions more visible, catalysing regulatory reform, as with the International Consortium of Investigative Journalists’ (ICIJ)[1] collaborative reporting on massive leaks of data. Reporters at the Washington Post and The Guardian have created a database of undocumented killings by law enforcement,[2] raising awareness of racial injustice and shaming government agencies into keeping better statistics. In “Life in the camps”,[3] Reuters journalists mapped the abhorrent living conditions of refugees, revealing the failures of global diplomacy.


The computer-assisted reporting of the last century, when the tools and methodologies of social science were first applied to journalism using mainframe and then desktop computers, has been transformed by the rapid public adoption of broadband internet connections, smartphones, open source software and frameworks, and the unprecedented scale of data generation and publication. While the data-driven journalism of 2019 is distinguished from what has come before by computers and the internet, journalists have been reporting on data as a source, the most basic definition for “data journalism”, for centuries, printing statistics and tables in pamphlets, tabloids, and newspapers long before “open data” was an idea.

Around the world, a growing number of media organisations are downloading, analysing, and reporting on open data from multiple sources in ways that inform, engage, and empower the public. Over the past decade, pioneers in the global transparency movement have adopted and adapted principles and practices from the open source software world, where “showing your work” and collaborating around shared code are important signals for both trust and transparency.

At their best, journalists are as appropriately sceptical of the open data published by corporations and governments online as they would be of the accounts from human sources.

Expanding horizons

The geographic reach of data journalism has grown over the past decade, driven by foundational investments, industry adoption, and grassroots organisations training reporters on the ground. The data journalism community, traditionally centred around the annual National Institute for Computer-Assisted Reporting[4] (NICAR) conference run by Investigative Reporters and Editors, now hosts a much more cross-border conversation.

The Global Editors Network[5] and their annual Data Journalism Awards, the European Investigative Journalism & Dataharvest Conference,[6] the Global Investigative Journalism Conference,[7] and a plethora of smaller country-based data journalism events organised under the auspices of Hacks/Hackers or Schools of Data, are convening communities of data journalism around the world.

The growing diversity in the practice has opened new “scrappier” models for data journalism with a more explicit public interest mission in countries where small teams of data journalists expose blatant inequality and corruption.

Enduring challenges to the field

As with journalism that relies upon human sources, there is no global consensus that the purpose of data journalism is to inform the public about governance issues, reveal corruption, or hold powerful institutions accountable. Despite the extraordinary work produced by publications around the world, some recent research about the state of data journalism paints a grim picture of the field.

Google News Lab’s Data journalism in 2017[8] found that data skills remain specialised and time-consuming to develop. Time pressures prevent aspiring journalists from producing the data-driven stories they believe need reporting. While some interactive features that engage readers have proven wildly popular, the business case for media houses to invest in data teams is still a hard sell in boardrooms.

The editorial decisions of organisations also determine where the lens of data-driven reporting is directed. When researchers at the University of Hamburg evaluated the projects that were nominated for the Global Editors Network’s annual Data Journalism Awards, they found that most of the coverage is on political rather than social issues, and that most of the data used was disclosed by official sources rather than collected by reporting teams.

The failure of data journalism to become just journalism

The data journalism community has exploded in the last decade, with more and more media houses now including at least one data journalist on their staff. Yet six years after Simon Rogers, founder of The Guardian DataBlog,[9] declared “Anyone can do it. Data journalism is the new punk”,[10] not everyone is doing it. While every reporter now uses a computer and a smartphone in their reporting, the “nerds” have not taken over the newsroom.

Despite the potential of data journalism to harness open data to provide critical context, insight, and actionable information in times of information crisis, some media outlets are turning away because it is difficult, slow, and expensive.[11] Even after massive projects like the Panama Papers[12] prompted political upheaval and an examination of global financial oversight, some publishers remain reluctant to take the risk of investing in their own data team.

The unfulfilled promise of automation

A universal complaint among data journalists is the lack of quality, reliable, timely, and disaggregated open data, which, in turn, inhibits opportunities for its use in local journalism. Stakeholders emphasise that government failures to facilitate access to structured, high-quality open data make it onerous for journalists to keep datasets updated or to build relevant apps to monitor spending, performance, and government services. They also highlight governments’ use of loopholes and grey areas in access to information laws to hide data, a strategy also employed by businesses. Still, in many countries, the data journalism community has not made use of the limited statistics that are made public, and only a few journalists have requested data through official Freedom of Information Act processes.

There have, however, been some encouraging developments. The Organized Crime and Corruption Reporting Project, for instance, provides an Investigative Dashboard[13] that centralises open data resources and helps reporters avoid repetitive scraping and cleaning. Reuters’ Lynx Insights[14] guides reporters to relevant data sources by “surfacing trends, facts, and anomalies in data, that reporters can then use to accelerate the production of their existing stories or to spot new ones”. If more media houses focus on making open data an essential source, we may see more automation tools in the future.

Making the business case for data journalism

The sustainability of data journalism is all the more urgent as more legacy media outlets struggle for survival in the digital age. Around the world, only a few newsrooms have made an effort to ensure reporters understand how the business model for their newsroom works. Some outlets need to produce in-depth, value-added investigations that are not available elsewhere in order to attract and retain subscribers. Media companies that rely upon advertising need to create compelling interactives, infographics, and visualisations that attract and retain large audiences.

Economic challenges within the industry have also led to competing priorities within newsrooms with editors emphasising urgency of delivery and data journalists seeking more time to produce complex investigations with statistical rigour. In that context, there is also an enduring debate about whether public interest journalism should move to a non-profit model entirely. If more publishers regarded data, and staff with the skill to report on it, as more of an asset and not as a resource drain, data journalism could fulfil an essential role in the transparency and accountability cycle.

While some outlets, such as ProPublica,[15] have established data stores to try to monetise the data that they have obtained and cleaned, this is still a nascent strategy. Other outlets are experimenting with offering personalised news apps with mainstream appeal to subsidise their work on longer-term investigations.

Many of the current challenges faced by stakeholders across the journalism world, including the financial pressures that have resulted from the massive disruption of the business model for traditional publishers and the dominance of tech companies in the online advertising markets, stem from a crucial failure to make data an essential, strategic resource for all the journalism produced by outlets.

Inclusion and accessibility for data-driven democracy

With the rise of nationalist, populist political movements in many countries, an increasing number of members of the global journalism community are exploring how to produce more accessible journalism by, about, for, and with marginalised communities.

OpenNews[16] has taken the lead in this work, proposing that “a diverse community of peers working, learning, and solving problems together can create the stronger, more representative ecosystem that journalism needs to thrive”. These efforts include creating opportunities for journalists of colour and women within an industry that traditionally has been primarily white and male. They also mean a greater focus on supporting local media in rural areas to produce data-driven content relevant to their communities. ProPublica’s Local Reporting Network[17] and the Bureau Local[18] are both relevant efforts to accelerate such content production.

In some developing countries, both legacy print and startup news organisations are experimenting with harnessing data to make injustice, inequality, and discrimination measurable, visible, and solvable. Data teams at La Nación Data[19] in Argentina, IndiaSpend[20] in India, and Nation Newsplex[21] in Kenya are devoting themselves almost exclusively to analysing open data to explain policy issues that affect ordinary citizens. Even much bemoaned technical obstacles like weak content management systems, low internet speeds, mobile display limitations, and low-quality data sources have not stopped these journalists from pushing for better socioeconomic policies and evidence-based decision-making.

Nation Newsplex: Inclusivity and an alternative lens

The Nation Newsplex, the data team of the Daily Nation, the highest circulation daily newspaper in Kenya, used data to focus its election coverage on policy not politics during the 2017 presidential elections. Nation Newsplex has a rare combination: a public interest data-driven mandate and an audience with a diverse background. Whether fact-checking economic issues related to small business growth,[22] foreign direct investment,[23] and unemployment;[24] examining public services related to health[25] and food security;[26] or analysing compensation for internally displaced people,[27] the focus of reporting and analysis was on the welfare of citizens, not on the politicians involved or the populist rhetoric that dominated mainstream coverage.

Another common frustration expressed by data journalists in both Western and non-Western newsrooms is the dearth of professional growth opportunities and clear advancement paths. This may play a role in dissuading professionals from more socioeconomically diverse backgrounds from pursuing careers in the media, which, in turn, has negative effects on the industry and leads to errors or blind spots in the journalism itself.

While reporters with data skills continue to be in high demand in many newsrooms, peer-to-peer learning and online courses remain insufficient to meet the growing demand for more advanced data journalism skills. The soaring popularity of the annual NICAR conference in recent years, which features many hands-on workshops, speaks to the demand for skills development. Unfortunately, entry-level data journalists complain of being pigeonholed into roles such as simply creating visualisations for more senior reporters or managing websites instead of producing their own data journalism. Mid-career professionals bemoan the lack of a clear career ladder as many senior-level editors are typically selected from the traditional newsroom.

Data-driven beat reporting

In one of the poorest areas of Pakistan, where self-censorship is rampant, an online publication, Balochistan Voices, used data to take on a corrupt, inefficient government.[28] In “Balochistan government builds roads while people sink further into poverty”,[29] Adnan Amir explored Balochistan’s corrupt procurement process. He showed how siphoning off public funds had left Balochistan’s citizens sicker, poorer, and less educated than their neighbours.

In a follow-up story, “In Balochistan, five times more people die in highway accidents than suicide blasts”,[30] published in late February, he offered the government an easy way to save face and improve access to healthcare. Sure enough, the National Highway Authority signed a Memorandum of Understanding[31] on 18 March to establish five trauma centres along Balochistan’s highways.

Reusability and archiving

An underreported challenge to measuring the evolution of data journalism over the last ten years is the lack of consistent archiving. Interactive journalism, created using a plethora of proprietary tools, technologies, and data formats across the industry, has been inconsistently stored and maintained by publishers, unlike the print and microfiche archives of traditional journalism.

Many data-driven interactives no longer work or have even disappeared entirely. Archiving news apps and visualisations presents a particular challenge, because they depend on tools and systems that may be managed by external entities. Without a consistent conservation strategy, this first draft of history will be lost.[32]

In many newsrooms, data units are simply too short-staffed to address pressing issues like archiving or data reusability. The lack of developers or data scientists in the newsroom to clean and maintain key datasets, preserve projects, and ensure data reusability prevents many media houses from investing in the very systems that would make their journalism faster, more efficient, and more impactful.

Journalists have sought to harness open data to inform the public about institutionalised discrimination, inequality, and oppression in an attempt to uncover issues that were invisible before. Those issues may become invisible again if journalists stop counting the causes, the victims, and the wasted funds, or if the projects disappear from the internet entirely. Consistent coverage driven by independent editors combined with effective archiving has the potential to foster systemic change.

Data reusability

In the United States, ProPublica was founded to “expose abuses of power and betrayals of the public trust by government, business, and other institutions, using the moral force of investigative journalism to spur reform through the sustained spotlighting of wrongdoing”.[33]

The independent non-profit’s digital-first journalism not only uses data as a source in its narrative reporting, but publishes the data with the stories and the code that drives the news applications that make that data more understandable.

ProPublica takes government information disclosed directly to the public online, sometimes as the result of a successful public records lawsuit and court-ordered disclosure under the federal Freedom of Information Act, cleans and structures it, and then publishes the open data, news apps,[34] code, and stories online.

Notably, ProPublica partners with other news organisations and the public online in collaborative reporting projects that then produce open data, like collecting political ads on social media platforms.

A decade after ProPublica’s founding, the social impact of the release of public records as open government data by federal agencies can be measured by the social impact of ProPublica’s data-driven reporting.

ProPublica’s Local Reporting Network,[35] launched in early 2018, seeks to address the dearth of local investigative reporting by using its expertise in open data and investigative reporting to strengthen and amplify local reporting. Investigations lowered the barrier for accessing large national and state-level datasets on issues such as conflicts of interest, housing, mental healthcare, criminal justice, and workplace safety.

The new project has a special focus on state government accountability reporting. Recent research by Poynter finds that the great majority of Americans trust their local news sources.[36] Making open data sources relevant to local communities is an emerging strategy for reinvigorating democratic participation at the local level through journalism.


In 2019, the journalism industry around the world is grappling with multiple challenges after quicksilver societal changes in how information is being created, shared, and discovered by humans everywhere.

Public trust in mass media is sinking to historic lows in the United States,[37] fuelled by the spectre of an American president decrying “fake news” and describing the press holding his administration accountable as the “enemy of the people”. To close that deficit, editors and producers will need to “optimize for public trust”, as New York University professor Jay Rosen has proposed,[38] prioritising verification, openness, listening, engaging, co-production, and diversity. These principles echo many of the conversations and frustrations faced by data journalists who are navigating and shaping the many ways in which communities are consuming and engaging with the news.

Technology companies like Facebook, Google, and Amazon now dominate the markets for online advertising and information discovery, sharing, and distribution by operating vast digital platforms for billions of users around the world. Media companies face a dramatically different landscape for their work and its publication. News organisations need to find readers, viewers, and subscribers, and get their attention to inform and engage them. National, state, and city governments are publishing open data, usually driven by only a nominal desire to inform the public about public business.

It’s a difficult historic moment for journalists and their publishers who face these challenges at a time when the industry itself has been so diminished. Tens of thousands of journalists have lost their jobs, leaving local governments without partners for their online disclosures. At the same time, far too many cities and states have engaged in “transparency theatre”, openwashing with low-quality data disclosures. Officials cloak themselves in the rhetoric of “open data” while denying freedom of information requests from the press.

Worse still, less democratic states have shown how open data programmes and policies can exist without open government, where disclosures are focused on driving economic impact or even weaponised against marginalised populations. The levels of press freedom, strength of access to information laws, and the capacity of the government institutions that exist in a given country heavily influence whether journalism based upon open data will lead to positive societal changes.

The transparency and accountability that governments publishing open data cite as a key goal will only occur when journalists do not face hazards from seeking information under the law, publishing reports that reveal corruption, or bearing witness to human rights violations or abuses of power. In that context, the local journalists who report on that data should be viewed as the community’s guardians, ensuring that the rights of the governed are upheld by informing the public about the public’s business.

Alexander B. Howard

Alexander B. Howard is an independent writer, digital governance analyst, and open government advocate based in Washington, DC. He is the former deputy director of the Sunlight Foundation and the author of ‘The Art and Science of Data-driven Journalism’ from the Tow Center for Digital Journalism at Columbia University, among other publications.

Eva Constantaras

Eva Constantaras is an investigative data journalist specialising in establishing data units in mainstream media in developing countries. Her teams operate in Kenya, Afghanistan, Pakistan and Central America covering public interest topics ranging from extractive industries and access to justice to reproductive rights and food security.

Further Reading