Municipal data: open or closed?

Good decisions are based on accurate information. Without meticulously collecting data and analysing it, we are reduced to making guesses, biased by our own personal experiences. When decisions are small, mistakes are less expensive, but as the impact of a decision grows, so does the cost of incorrect planning due to poor access to data.

We know this and yet, despite the masses of data collected by government, that data is often not accessible, either to the public or even across departments. The open data movement is attempting to address this issue by advocating for a change in mindset by organisations and especially government. Rather than treating data as a commodity, government should consider it an essential service, alongside all the other services it provides.

In the early 1980s the international software freedom activist Richard Stallman recognised the value of developing free software, work that ultimately gave rise to the open source movement. The liberation of software, coupled with the growth of the Internet, dramatically accelerated the rate of innovation in the software industry. Previously, software companies developed all software components internally, resulting in much duplication of effort and high development costs.

As open source software entered the common lexicon, best-of-breed open source components came to make up a large share of the software stack of most applications. Through collaborative development in public, open source projects draw on the skills of top programmers, many of whom volunteer their time. The result is software that is often of a higher quality than internally built systems.

In this context, the open data movement was born. Proponents believe that much data is lying dormant in spreadsheets and databases owned by government, civil society, academia and the private sector. Democratising this data allows innovative uses to emerge that would otherwise not exist. The Open Definition puts it simply: “Open data and content can be freely used, modified, and shared by anyone for any purpose”. The more detailed definition at http://opendefinition.org/od/ specifies that data is open only if it can be used by anyone for any purpose, including commercial purposes.

How is this relevant for local government? By opening their data, local governments are saying to their constituencies, “Let’s talk”. Open data promotes an engaged citizenry in which residents are able to voice their concerns and work with government to highlight problem areas and ensure effective governance.

Open data matters within government too. Data is often held in silos where even other branches of government cannot access it. Obtaining data is often a slow process because requests need to be approved by decision-makers, and where it is not clear whether data should be released, it is easier to refuse. As a result, decision-making cannot be fully informed.

Finally, when data is made publicly available, entrepreneurs are able to make better use of the data where government does not have the capacity to do so. For instance, consider Go Metro (http://gometro.co.za), a company that provides information on public transport to commuters.

The Go Metro example

Whereas transport and routing information is often only available for a single mode of transport, such as Metrorail or the local bus service, Go Metro has been able to collate information for buses, trains and other public transport options across municipalities. It also provides information on real-time delays and helps passengers optimise their journeys. Companies such as Go Metro have the potential to reduce the load on government call centres by meeting many of travellers’ information needs. By identifying this niche, Go Metro has created a new business that provides employment and ultimately generates tax revenue.

This approach contrasts with the traditional model of contracting a single service provider to provide the service. When data is made freely available, barriers to entry fall, encouraging even small enterprises to compete with larger businesses. An environment in which businesses compete in this space produces higher-quality services and tools that ultimately benefit the consumer. Procurement and project oversight are also simplified, since these services are provided by external parties. All these benefits accrue with minimal impact on municipal budgets.

The public often cynically believes that government refuses to release data in order to obscure corruption. In most cases, this is not so. In reality, governments do not consider the provision of data an integral part of service delivery. Departments do not have adequate internal processes to manage the collection, processing and publication of data. In addition, an attitude of data ownership exists within many government structures, where bureaucrats talk about not wanting to release “our data” when often there is no basis for refusal.

A common retort is that requestors should submit a Promotion of Access to Information Act (PAIA) request in order to obtain data. Rather than being seen as a formal procedure for releasing data, insisting on a PAIA request should be recognised as a breakdown in communication, and ultimately in trust, between government and the public. PAIA is a cumbersome legal process and should not be used in lieu of a well-designed process for making data available. This is especially true when the data in question rightfully belongs in the public domain in the first place.

Countries and cities the world over have recognised these benefits. Starting with the American open data portal (http://data.gov) and the British equivalent (http://data.gov.uk), dozens of countries (http://index.okfn.org/) and even more cities have developed, and actively maintain, their own open data portals. These portals often take the form of a repository that collates all datasets made available by government, with new data added on a regular basis. More recently, in January 2015, the City of Cape Town became the first city in Africa to develop an open data policy. It has also recently launched its own data portal (http://web1.capetown.gov.za/web1/OpenDataPortal/).

So what’s next?

Unfortunately, it is not as easy as simply collecting data and posting it on a website. In order to make data available, government needs to consider the political and legal implications of making data ‘open’. A policy is required to describe what data should be made available and, more importantly, what data should not. Ideally, data should be ‘open by default’: all data is automatically available, and justification must be given for keeping a dataset closed. The City of Cape Town has approached this topic more cautiously and crafted its policy on the principle of ‘closed by default’.

Processes around the release of data should be carefully considered. Data often grows and changes as new information becomes available. Almost always, data is dirty and needs to be cleaned. While data may be hosted on the open data portal, the original custodians of the data (departments that originally created the datasets) should remain responsible for them. More importantly, they should be using them internally. This ensures that data is accurate and up-to-date.

Another question that arises is: how much data should one publish, and what should the selection process be? Once an organisation has been bitten by the open data bug, there is a temptation to release as much data as possible. While this would be the goal in an ideal world, in reality maintaining data is costly. Collection, cleaning, updating and overall management take time, and as the collection of published data grows, so does the cost of maintaining it. If released data is not ultimately used, enthusiasm for the open data project will quickly wane. For this reason, the initial focus should be on a small number of high-quality datasets. Some criteria to consider are:

  • The data is readily available.
  • The dataset has an identified custodian who is able and willing to actively maintain the dataset.
  • The data has specifically been requested by end users.
  • The data is clearly not sensitive and is unlikely to place the organisation at risk.

Clean data

When making data available, some effort must be made to ensure that the data is clean: coding is consistent, spelling mistakes are corrected and outliers are examined. Making a dataset 100% perfect is often expensive, so perfection should not be the goal. Instead, data should be released with a disclaimer that it may contain mistakes, and the public should be encouraged to submit any corrections they find. The data custodian should have the capacity to receive these corrections and update the dataset. Even this simple process encourages collaboration between the public and government.
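As an illustration of the “good enough” cleaning described above, here is a minimal sketch in Python using only the standard library. Everything in it is hypothetical: the suburb names, the variant-spelling table and the choice of Tukey’s interquartile-range fences for spotting outliers are assumptions for illustration, not a method prescribed by any particular city. Note that suspicious values are flagged for a custodian to examine rather than deleted.

```python
import statistics

# Hypothetical lookup table mapping known variant spellings/codes
# (already lower-cased, single-spaced) to one canonical spelling.
CANONICAL_SUBURBS = {
    "observatory": "Observatory",
    "obs": "Observatory",
    "woodstock": "Woodstock",
    "wood stock": "Woodstock",
}


def normalise_suburb(raw: str) -> str:
    """Collapse whitespace and case, then map known variants to one spelling.

    Values not in the lookup table are returned trimmed but otherwise
    unchanged, so unknown names are never silently rewritten.
    """
    key = " ".join(raw.strip().lower().split())
    return CANONICAL_SUBURBS.get(key, raw.strip())


def flag_outliers(values: list[float], k: float = 1.5) -> list[bool]:
    """Flag values outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR].

    Returns one boolean per input value, in the original order, so that
    flagged records can be reviewed by the data custodian, not dropped.
    Requires at least two values.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]
```

For example, `normalise_suburb("  OBS ")` returns `"Observatory"`, and `flag_outliers([10, 12, 11, 13, 12, 11, 500])` flags only the final reading. Flagging instead of deleting fits the approach above: the data can be released with a disclaimer while the custodian, or a member of the public, checks the suspect values.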

As the dataset selection criteria above suggest, a demand-driven approach to releasing data is more cost-effective than investing effort in publishing data that no-one will use. There should be a mechanism through which the public can request specific datasets.

What about the practicalities of publishing data?

There are currently three typical options.

  • The first option is to use CKAN (http://ckan.org), an open source data portal developed by the Open Knowledge Foundation. This software is used by both the American and British open data portals (although their implementations are heavily customised). Although it is possible for a municipality to maintain a CKAN platform internally, it is likely to be better to contract an external provider such as OpenGovGear (http://www.opengovgear.com/). This Canadian organisation provides commercial support for CKAN, which is likely to be a much cheaper solution than hosting the platform internally.
  • An alternative is Socrata (http://www.socrata.com), a US-based company that provides a premium open data portal offering. Socrata is used by dozens of cities and other governments and organisations. New York State, for instance, uses Socrata to host over 2 400 datasets (https://data.ny.gov/). Apart from a data portal, Socrata also provides visualisation tools that allow users to explore datasets without needing specialised software, including creating graphs and even plotting data on a map.
  • The City of Cape Town has chosen to build its own portal internally. This system is an extension of its existing IT infrastructure and provides a seamless experience from the main City website to the open data portal. The cost of this approach is likely to be high, since in-house (or even externally tendered) development requires expensive software developers. The final product is also unlikely to be as good as the two options above, as additional features such as data visualisations are usually prohibitively expensive to develop internally.

Concerns are often raised about data literacy and how ordinary citizens can benefit from making data available. After all, very few people have access to the Internet and laptop computers, never mind the skills required to analyse data. In reality, only a small percentage of users access government data on a regular basis. These users, however, are generally what are termed “info-mediaries”: civil society, journalists, researchers, data analysts and private enterprise. They are the primary users of data and are able to package it in formats that are usable by people who are not data literate.

Consider journalists who use data to write stories, private enterprises that develop products, academics who produce research and civil society organisations that use data to advocate for better service delivery. These are the users who will translate numbers in spreadsheets into information that residents can use to make informed decisions.

The case for open data has been made globally. In South Africa, bodies in all spheres of government are already considering how they can become more transparent and how to release their datasets to the public. For a government to be truly open, it needs to share its data with its constituency and invite them to participate as active citizens.

About the author

Adi Eyal runs Code for South Africa, a non-profit open data advocacy organisation which provides advice to governments and other organisations on how to make their data available. They also develop tools and applications that promote informed decision-making by leveraging open data. You can find out more at http://www.code4sa.org.



