Since the early 1960s, we have seen an increasingly vocal response to unmitigated anthropogenic impacts on the environment.1 Although there were earlier activists and movements, the 1960s marked the period when disparate voices started to coalesce. Environmental activists started conceptualising environmental problems as political matters, and, in doing so, using scientific knowledge as part of their arsenal. This led to a significant change in policy-making with regard to the use of scientific outputs and knowledge as supporting evidence. Data and information have become forms of power that are used to drive or change political discourse on issues affecting the environment. Knowledge derived from science, coupled with activism, played a major role in getting governments to endorse the Declaration of the United Nations Conference on the Human Environment in Stockholm in June 1972.2 It was at this conference that governments accepted that anthropogenic impacts on the environment were a reality and that more research was needed to understand the causes, impacts, and mitigation measures. Since that time, we have had subsequent international environmental engagements that rely on scientific knowledge to guide activism, decision-making, and policy development.
The 1990s brought the digital revolution. Data generation and exchange became easier, and, by 1996, the internet had become mainstream, allowing for easy digitisation and the dissemination of data. Environmental data became easier to acquire and to share. Although access to environmental data, information, and knowledge is not a recent phenomenon, over time the emphasis for open access has shifted from information and knowledge as products to include the underlying elements: the data that comprises these products.
Environmental concerns are all-encompassing, ranging from microbial research through to large planetary weather systems research. Open data provides an opportunity to promote review, transparency, accountability, participation, and the identification of knowledge gaps. The growth in environmental open data portals to support research, advocacy, decision-making, and communication indicates the importance of sharing data on a range of environmental issues.
Earth, air, and water
The following sections present an overview of the progress on open data in relation to four key environmental domains: climate change, air quality, biodiversity, and water resources.
Open data and climate change
Known research into climate change can be traced back to 1824, when Joseph Fourier3 noted the warming of the Earth. In the 1890s, Swedish scientist Svante Arrhenius4 made the connection between carbon dioxide and rising temperatures, the “greenhouse effect”. It took another century of research, publications, and advocacy before the issue secured global attention.
The Intergovernmental Panel on Climate Change (IPCC) has achieved great success in putting climate change on the international political agenda and ensuring that almost every national government is paying attention to the issue. The data underpinning IPCC research comes from various open sources, and there are robust processes in place to ensure data integrity. The transformation of statistical climate data into easily digestible visuals through data visualisation, such as maps, also helped convey the importance of the issue to the general public (see Figure 1). The IPCC Fourth Assessment Report provided credible evidence to gain the necessary political traction;5 however, the identification of “major errors” in the main report had some sceptics questioning its veracity. The greatest error related to the incorrect referencing of 2035 as the date by which the Himalayan glaciers will have melted; however, a correction was made after a review of the source data, and the date estimate was changed to 2350.6 Other perceived “errors” were not actual errors, but rather questions regarding the validity of including content that had not been peer reviewed.
The 4th IPCC Assessment Report
The main criticism of the 4th IPCC Assessment Report has been that errors can be attributed to the referencing of non-peer reviewed literature, such as a World Wide Fund for Nature report, as well as various grey literature. The outcome of the criticism has had two positive effects: 1) the correction of the errors and 2) refinement in the process and structures to review data to support any claims the IPCC makes. In an open data environment, robust and well-documented data management processes are essential for credibility.
Due to the political, economic, and social visibility of climate change research, as well as its importance, a number of open data platforms have been created. These are detailed in Table 1, which also demonstrates the various levels of open data licensing in use.
| Name | Year launched | Core focus | Data licence |
|---|---|---|---|
| IPCC Data Distribution Centre http://www.ipcc-data.org | 1998 | To facilitate the timely distribution of a set of consistent up-to-date scenarios of changes in climate and related environmental and socio-economic factors for use in climate impact and adaptation assessment. | OECD Principle of “openness” |
| World Bank Climate Change Knowledge Portal http://sdwebx.worldbank.org/climateportal/ | 2010 | Hub for climate information | Various CC licences |
| Southern African Science Service Centre for Climate Change and Adaptive Land Management http://www.sasscal.org/ | 2012 | To host, safeguard, and make data and information resources available openly, yet ensure the integrity and ownership of the contributing parties. | Open access to data (incl. climate change and weather data) for southern Africa. |
| European Union Copernicus Climate Data Store https://climate.copernicus.eu | 2018 | The Copernicus Climate Change Service (C3S) will combine observations of the climate system with the latest science to develop authoritative, quality-assured information about the past, current, and future states of the climate in Europe and worldwide. | Free of charge, worldwide, non-exclusive, royalty free, and perpetual. |
Table 1: Open data platforms to access climate-related data
Climate change open data portals present one of the best case studies of how open access to data, and the resulting scientific and advocacy collaborations, has led to a major shift in public understanding of science-backed policy and to large financial investments in further research and mitigation. Although data on the monetary investment and outcomes of mitigation measures is more limited, highlighting a gap still to be filled, a number of projects are now tracking climate-related financing. The Nationally Determined Contributions Explorer aims to publish national climate change mitigation plans and data on progress as the means to hold governments accountable.7 Transparency International (TI) also publishes data on the use of global funds to tackle climate change impacts,8 noting that the amount pledged by national governments will be running at USD 100 billion per year by 2020, and is set to increase over time. TI has also been exploring the adoption of the Open Contracting Data Standard to ensure transparency and accountability in the contracting chain for climate-related finances.9
Open data and air quality
Air pollution has been an historical concern since the industrial revolution. However, it was only in the 1970s that scientists made the link between air pollution and its impact on human health. It was also during this decade that the United States and the United Kingdom started to implement regulations to curb air pollution. Today, policy-makers rely heavily on air quality data to inform policy review and development.
Air quality monitoring requires the implementation and management of monitoring stations, which may take the form of real-time digital instrumentation or manually monitored diffusion tubes. While governments often collate and publish this data, the 2016/2017 Global Open Data Index ranks the openness of air quality data by national governments as very low with only 8% of governments sharing air quality data as accessible open data.10 However, several initiatives are now working to aggregate and analyse air quality monitoring from around the world.
The World Air Quality Index (WAQI), created in 2007 by a team in Beijing, provides access to open air quality information from more than 10 000 stations in 800 cities from 70 countries.11 Only data on particulate matter of PM2.5/PM10 and greater from official government or professionally maintained measuring stations is published.12 This data is validated through neighbourhood and historical comparisons. The data from this platform conforms to the data requirements for reporting on the Sustainable Development Goal (SDG) health-related indicators,13 and is, therefore, able to inform government policy and support SDG reporting obligations.
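The idea of validating a reading through neighbourhood and historical comparisons can be illustrated with a minimal sketch. This is not WAQI's actual procedure, which the text does not describe; the function names, the use of medians, and the tolerance factor of 3 are all assumptions chosen for illustration.

```python
from statistics import median


def within_factor(value, reference, factor):
    """True if value lies between reference / factor and reference * factor."""
    return reference / factor <= value <= reference * factor


def is_plausible(reading, neighbour_readings, history, factor=3.0):
    """Compare a station reading with nearby stations and with its own past.

    Hypothetical rule: the reading must stay within `factor` of the median
    of the neighbouring stations and of the median of the station's own
    history. If a reference list is empty, that check is skipped.
    """
    checks = []
    if neighbour_readings:
        checks.append(within_factor(reading, median(neighbour_readings), factor))
    if history:
        checks.append(within_factor(reading, median(history), factor))
    return all(checks)
```

A reading that fails either comparison would be held back for review rather than published. Real validation schemes are likely to account for distance between stations, weather, and time of day, none of which is modelled here.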
The OpenAQ initiative also aggregates data from government monitoring stations and is exploring the inclusion of data from citizen-run low-cost sensors. With a strong open source and open data ethos, and an emphasis on permanently archiving data, the project is a key example of data being used to influence people’s behaviour and government action.14
Both OpenAQ and WAQI offer maps of the sensor networks they draw upon. A cursory glance at these reveals a dearth of measuring stations in Africa. This is supported by research conducted by Wetsman,15 which notes that South Africa is the only country in Africa with an air-quality monitoring programme. The map (Figure 2) below illustrates the global distribution. The lack of data collection and open data in certain regions will, therefore, negatively impact research and mitigation-related actions. Future work in this sector will have to focus on extending measures to collect data from more locations in developing countries.
Open data and biodiversity
Biodiversity is about the variety of life on earth. Typically, biodiversity data covers genetics through to landscapes and all the floral and faunal species in between. Many open data sources exist, ranging from the Biodiversity Heritage Library (BHL) and the Encyclopaedia of Life (EoL) to the Global Biodiversity Information Facility (GBIF). As an example, GBIF collates and shares over 1 billion biodiversity records from more than 1 400 institutions, covering the globe.16 Figure 3 illustrates an extract from the GBIF portal of the available open biodiversity data for Niger where the 83 449 recorded occurrences contribute toward this resource. The general conclusion is that data collections on biodiversity held at the local, regional, and international level are vast and very often made available under open access licences.
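As an illustration of how such open records can be reached programmatically, the sketch below builds a request against GBIF's public occurrence search API for a single country and reads the total from the response. The helper names are hypothetical; the sketch assumes that the `country` parameter takes a two-letter ISO country code (NE for Niger) and that the JSON response carries the total in a `count` field.

```python
from urllib.parse import urlencode

GBIF_OCCURRENCE_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def occurrence_search_url(country_code, limit=0):
    """Build an occurrence search URL for one two-letter country code.

    limit=0 asks for no individual records, only the summary fields.
    """
    query = urlencode({"country": country_code.upper(), "limit": limit})
    return f"{GBIF_OCCURRENCE_SEARCH}?{query}"


def count_from_response(payload):
    """Read the total number of matching records from a decoded JSON response."""
    return int(payload["count"])
```

Fetching `occurrence_search_url("NE")` with any HTTP client and passing the decoded JSON to `count_from_response` should give the current total for Niger. That number is likely to differ from the figure quoted above, since records are added continually.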
While these datasets may be valuable at a local level or thematic scale, it is in the connectedness of this data that the true value is found. The ultimate goal of this data is to answer overarching questions on ecological interactions and interdependencies within the biotic and abiotic environment at different scales. This can create major challenges for data-sharing infrastructures, requiring systems, standards, and collaborative mechanisms to enable the discovery of data and to manage information on provenance. Many initiatives, such as the Biodiversity and Protected Areas Management Programme (BIOPAMA),17 are now actively integrating the collation and collection of data into their project designs to encourage open data sharing. Funders are also playing an important role in creating funding conditions to share data. For example, the JRS Biodiversity Foundation18 and many other grant-making agencies are including conditional clauses to enforce the free sharing of data collected as the result of grant funding.
Generally, the biodiversity community has self-organised to limit the overlap in data collection and management. Accordingly, organisations such as the International Union for Conservation of Nature, BirdLife, and the World Conservation Monitoring Centre have adopted specific focus areas for the type of biodiversity data collected as part of their project work, assessments, and other related activities. These organisations also play a very important role in supporting national reporting obligations toward the Aichi Biodiversity Targets19 and the SDGs.20 It is important to note that not all biodiversity data is considered to be open data. BirdLife International, for example, has protocols that restrict access to certain bird data that it deems sensitive, such as nesting sites. The aim is to protect species from local or even global extinction as a result of poaching, illegal hunting, collection, or intrusive behaviour.
Open data and water
Water is a basic human need, and access to clean water is becoming a major global concern. Climate change has had a significant impact on rainfall patterns, most notably in Sub-Saharan Africa. Changing rainfall patterns, coupled with poor management of existing water supplies, pose major livelihood challenges to millions of people. Those most affected by the lack of clean water are women and children in developing countries.21
The water sector has a fair number of dedicated data portals. The United Nations Educational, Scientific and Cultural Organization (UNESCO) has recently launched22 the Water Data Quality Portal to provide access to related global datasets.23 The Global Environment Monitoring System for freshwater (GEMS/Water) provides data on fresh water quality intended to support scientific assessments and decision-making related to water management.24 Sharing Water-related Information to Tackle Changes in the Hydrosphere - for Operational Needs (SWITCH-ON), a European Union (EU) initiative, provides access to water-related information to assist in managing water in a sustainable manner.25 The International Water Management Institute’s Water Data Portal provides access to global water-related information.26 The European Commission, using Google Earth Engine, has developed the Global Surface Water Explorer, which maps the location and temporal distribution of surface water for the period 1984–2015.27 Given the many available data portals, it is interesting to note that the Global Open Data Index28 still ranks the openness of water quality data from national governments as very low, with just 1% of index surveys able to access open data on water quality directly from governments.
Access to clean water is an immediate and critical concern. This is especially true in rural areas, where water contamination can affect human lives, livestock, and crops. The data currently collected at the global level is analysed using remote sensing tools coupled with water quality information obtained from available sensors. The challenge ahead will be to expand the collection of water quality information, using the power of technology to immediately communicate changes in water provision or quality. Therefore, the future of open data within the water sector relies on developing technology that can be used in the most remote locations in developing countries. Through the application of technology, the data collection activities will need to improve to near real-time with higher levels of accuracy to assist emergency response activities and policy development.
Cape Town drought
Since 2015, Cape Town has experienced an unprecedented drought, leading to serious water shortages. Although many causes have been postulated, and blame apportioned, defensible evidence was sought to understand whether the crisis was caused by less rainfall, increased evaporation, increased agricultural and urban use, or poor management. A study by the Climate Systems Analysis Group at the University of Cape Town (UCT), using open data, found the main cause of the water crisis to be low rainfall between 2015 and 2017.29, 30
Open datasets were used to create two separate maps to analyse the temporal levels of the Theewaterskloof Dam, the largest water source in Cape Town. Figure 4 shows that the dam levels were fairly constant for the period 1984–2015. Figure 5 illustrates the rapid decline of water volumes between 2016 and 2018. These two different datasets, using different visualisation techniques, complement the UCT study that found exceptionally low levels of rainfall since 2015 had resulted in the water crisis.
Opportunities and challenges
Stakeholders and sustainability
Governments, civil society, business, and academia are the four major groups driving the environmental open data agenda. Governments have been changing policies and legislation to support open data,31 mostly as the result of pressure from civil society and academia. Traditionally, business is an active user of open data, but is not widely known for the release of open data.
Keeping open data portals open requires resources. Wealthier countries typically fund their own environmental open data initiatives; however, for developing countries, continuous access to open data is very much dependent on available funding to generate, curate, and publish datasets. Typical major funding sources include the World Bank, the United Nations, the Global Environment Facility (GEF), bilateral foreign aid, and many private donors. This presents a particular challenge for emerging economies, where data management is linked to project-based funding and the data becomes “lost” or “orphaned” after a project has been completed. Therefore, the true value of the new data is not realised and the investment is not able to generate ongoing value. New projects then re-invest in data collection, often collecting the same or similar data, and the cycle repeats itself.
The pathway to sustainable data management practices must be multi-pronged and not rely on any single approach. To be successful in the long-term, the management of open datasets will require investment from host agencies in the form of money or in-kind resourcing, such as staff, infrastructure, or content. It is also important that donor funding be moulded to support the needs of the specific country or agency and to ensure that data collection and management is not responding solely to short-term donor agendas. The funding model used must be structured to build internal data management capacity within recipient organisations that will have a legacy impact after the temporary needs of a project have been met. In this manner, internal capacity and resources can be developed over time as the result of donor support. Importantly, a fresh take on the role of the private sector is also needed in order to evaluate how it can enhance the shared value of public datasets used by business as a means to contribute to the public good. One way is for private sector data users to return enhanced datasets to governments for publication; another approach is for the private sector to provide expertise and infrastructure to support the management and publication of data.
Collaboration, cooperation, and benefit sharing
The environmental sector has a history of collaborating toward common goals. An example of this is the initiative to combat illegal wildlife trafficking, where environmental actors collaborate with non-environmental agencies, such as Interpol, by exchanging critical data. International conservation organisations, such as the World Wildlife Fund and the International Union for Conservation of Nature, share their data to drive cooperation, transparency, and accountability, and to encourage community review of quality. The collections of natural history museums and herbaria are being digitised and placed in the public domain with the aim of the data being used to aid conservation and management.
Collaborations like these can also be extended to the management of open data. The Atlas of Living Australia32 is an international leader in publishing collated open biodiversity data with more than 76 million records made freely available from 311 different data providers. Citizen science is becoming very popular and it is also adding volumes of data to established scientific collections. Through collaboration, environmental organisations are able to secure a range of benefits, including shared skills, experts, and infrastructure.
The award-winning Cybertracker33 app was created to provide the indigenous Kalahari San with technology to capture complex field data. The technology has been developed to be intuitive and to allow non-literate people to record data and knowledge for scientific conservation and management applications.
Indigenous knowledge, knowledge passed on from one generation to the next, can advance scientific research and improve the public image of science. However, this type of knowledge is often viewed as “unscientific” although it is the basis upon which we built our existing scientific knowledge. Ironically, we have seen the appropriation and exploitation of Indigenous knowledge on the use of plant-based natural resources by multinational corporations: a phenomenon known as biopiracy.34 The World Intellectual Property Organization is currently working on international legal instruments to protect Indigenous knowledge and ensure appropriate benefit sharing.35
Many new companies have been established using public open data. As noted earlier, the private sector is an active user of public data, and the potential exists to create valuable public–private partnerships to further advance the private sector as a contributor of open data. Recognising the value of sharing data as the means to stimulate innovation and build positive public relations, the private sector is becoming more transparent. While the overall open data market value is projected to be in the region of € 286 billion by 2020,36 the exact potential value of open environmental data is not known. However, it is reasonable to assume that the value of this open data is significant. In 2013, the Climate Corporation, a private company built on open climate data to support farming decisions, was sold for USD 1.1 billion to Monsanto, a multinational agricultural company.
Further evidence on the use of environmental data in the private sector comes from the Open Data 500 project,37 which provides information on private companies using government open data through studies in six countries. The project seeks to map the economic and social impact of government open data by looking at the businesses using it. Figure 6 illustrates the number of businesses per country in the environment and weather sector. Canada tops the list with 45 businesses, followed by Italy (24) and Korea (16).
Standards are necessary to define acceptable quality metrics for data, ensure consistent use, and to facilitate data sharing. The lack of common standards negatively impacts the credibility, use, and exchange of data across the environmental sector.
While environmental data collection has become easier, the development and maintenance of metadata has become increasingly laborious; however, without metadata, the value of the data erodes and data interoperability becomes extremely difficult. Making environmental data interoperable creates the capacity to share data and important indicators across systems regardless of geographic boundary, vendor, or organisation, but this requires consistent adherence to standardised metadata, ontologies, and vocabularies for the description and organisation of the data. The Committee on Data (CODATA) of the International Council for Science, established in 1966, is actively working toward coordinating data standards among scientific unions at the international level and has made major steps in embedding open data principles in its work.38
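A minimal sketch of what consistent adherence to standardised metadata can mean in practice is a completeness check run before a dataset is published. The field list below is hypothetical and is not drawn from CODATA or any other named standard; a real portal would substitute the fields of whichever schema it has adopted.

```python
# Hypothetical minimum set of descriptive fields for a published dataset.
REQUIRED_FIELDS = (
    "title",
    "publisher",
    "licence",
    "spatial_coverage",
    "temporal_coverage",
    "units",
)


def missing_metadata(record, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or blank in a metadata record."""
    missing = []
    for field in required:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing
```

A dataset whose record returns an empty list has at least the descriptive fields needed for another system to discover and interpret it. Checking that the values follow a shared vocabulary would be a further step that this sketch does not attempt.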
The lack of skills, expertise, and equipment within governments needed to meaningfully exploit the vast quantities of available environmental open data is also a major constraint in addressing environmental challenges, especially in developing countries. It is widely noted that developing countries will be the most impacted by climate change with one (proprietary) index of climate change vulnerability identifying the Central African Republic, the Democratic Republic of the Congo, Haiti, Liberia, and South Sudan as facing the greatest risks.39 Many developing countries are also home to vast natural resources that are under the pressure of exploitation or destruction. These very countries are under social and political pressure to protect their natural resources while simultaneously under economic pressure to grow their economy.
Providing capacity building for developing countries has been on the developmental agenda for many years and has taken the form of institutional, individual, and infrastructural interventions. Very often, capacity development has been focused on the needs of donor-funded projects, limited to the funding period or conditions and not structured around government-led interventions that can sustain impact. Linked to this technical capacity constraint are the political challenges that face institutions intending to make environmental data openly accessible. For example, the Government of Tanzania has recently withdrawn from the Open Government Partnership.40 The systemic impact of this decision is to further limit disclosure of data into the public domain, restricting capacity development in publishing data, hindering innovation in using open data, and limiting potential private sector expansion using open data.
Generally, although substantial expertise exists within the research community, the broader environmental sector, including government and civil society actors, is lagging behind in terms of applied data management expertise. This has a profound effect on the quality, quantity, access, and frequency of data that can be released as open data, and further frustrates attempts to use data to mitigate environmental damage and the negative impacts of climate change.
Open data plays a crucial role in advancing our collective efforts to ensure sustainable management of all our natural resources. It has fostered collaboration that would not have been possible 30 years ago. It has allowed scientists to verify one another's work and to be held accountable for their conclusions, just as it allows politicians to be held accountable for their decisions. Furthermore, it has also supported instances of greater civil participation in the public and private sector spheres with the potential to give poor and marginalised people greater power through knowledge. Open data has also helped to drive the development of innovative products and services, not only in developed countries, but also in developing countries, addressing issues of environmental conservation, skills development, and economic growth. Overall, open data has shown revolutionary potential, although the measurement of impact remains difficult.
However, there is still much effort needed to ensure that environmental data becomes fully accessible to address environmental challenges. The advancement of the environmental open data agenda must happen at both the macro and micro levels. At the macro level, changes are necessary on an institutional scale to challenge closed governments to open their data. The collaboration between thematic sectors must be encouraged to avoid data duplication and gaps, as well as to maximise the value of open data. A coherent and collaborative approach must be adopted to address data gaps, specifically in developing countries. These gaps can be filled through adopting vendor and ‘donor agnostic’ data management systems, integrating data sharing agreements for funded programmes, and establishing formal data sharing programmes with the private sector without compromising personal information or trade secrets. The development of case studies is a powerful mechanism to encourage sharing as it can illustrate effective processes and the value of open data.
At the micro level, institutions should develop formal or structured data management strategies that can proactively lead to open data. Data management strategies must always be focused on organisational needs and address standards, quality, applications, and capacity building.
Environmental open data has helped shape national and international policies and decisions. Notwithstanding the challenges of getting governments and private sector entities to share data, the volume of open data is increasing. Our task is to ensure that the release of environmental open data is needs-based, user friendly, and of sufficient quality to address the local, regional, and global challenges in developing a sustainable future.