OGGO Committee Report


CHAPTER TWO: OPEN DATA BY DEFAULT

The first open data principle – Open Data by Default – encourages the government to foster expectations that government data be published openly while continuing to safeguard privacy. As noted in the G8 Open Data Charter, in certain cases, “there are legitimate reasons why some data cannot be released.” These reasons are listed in G8 Open Data Charter – Canada's Action Plan and are mainly related to privacy, security and confidentiality. Throughout the Committee’s study, witnesses discussed the implementation of the open data by default principle, the creation of an inventory of datasets, the prioritization of certain datasets and the protection of personal information.

A. Implementation

Many witnesses emphasized the importance of the open data by default principle. Some reasoned that government data is a public asset and should therefore be published openly. Ray Sharma, founder of XMG Studio Inc., who also contributed to the Government of Ontario report entitled Open by default: A new way forward for Ontario, stated that even though open data is an intangible asset, it can be just as valuable to the public as a physical asset.

In the past, certain federal datasets were only available to the public on a cost-recovery basis. Throughout the study, the Committee heard from many witnesses who acknowledged the federal government’s initiative to provide free access to certain federal datasets. For example, Ted Mallett, Vice-President and Chief Economist for the Canadian Federation of Independent Business, supported the decision to make data from Statistics Canada’s CANSIM database available free of charge. This data is also available on the federal government’s open data portal. Mr. Mallett noted that, as a result, researchers can now analyze this data to a greater extent. Some witnesses also agreed that datasets should be made available free of charge and released more quickly.

Open data by default requires a cultural change in government. A representative from the Government of British Columbia explained that when a government anticipates that its data will be shared, it designs that data from the outset to be usable by all, which essentially reduces red tape.

As of June 2014, there were over 200,000 datasets available through the federal government’s open data portal, the vast majority of which were geospatial data from Natural Resources Canada. The other data providers were primarily federal departments and agencies, with Statistics Canada being the second largest in terms of datasets. As well, some Crown corporations have provided data through the open data portal. However, Crown corporations will not be included under the directive on open government, as confirmed by a TBS official.

B. Inventory of datasets

The CIO of the Government of Canada indicated that the upcoming directive on open government would require departments to compile and publish an inventory of all datasets that they possess, provide it to the TBS and gradually publish those datasets on the federal government’s open data portal. This directive will be issued by December 2014. Officials from the governments of Ontario and Newfoundland and Labrador explained that they were already building an inventory of their datasets. The representative from the Government of Ontario added that it would be useful to have such an inventory at the federal level.

Several witnesses agreed that there is a need for transparency in relation to the principle of open data by default. According to Barbara-Chiara Ubaldi, E-Government Project Manager for the OECD, “there is a need for transparency in the actions taken, in the case of Canada, by the cabinet office in relation to which data sets to open.” Several witnesses mentioned the U.K. as one of the leaders in the development of its governmental open data initiative. One of the U.K.’s practices consists of requiring each department to report its progress on open data through the Cabinet office, which then publishes reports on departmental progress.

C. Prioritizing the release of datasets

One challenge to implementing the principle of open data by default is the resource constraints that governments face. A representative from the Government of British Columbia explained to the Committee that because they had limited resources to verify data quality, they had to pick specific datasets to publish. An official from the Government of New Brunswick also emphasized that there is a need to consult with the public and the industry to know what data constitutes a priority.

When governments release open data, they often start with the datasets that are already publicly available and can be easily added to their open data portal. Denis Deslauriers, Director of the Information Technology and Telecommunications Service for the City of Québec, said that his city chose to prioritize the release of data that would be most useful to citizens. Along the same lines, an official from the Government of Newfoundland and Labrador mentioned that his government prioritizes the release of data on its portal based on the number of requests by users. In the federal context, a Health Canada official said that the Department has two criteria for the prioritization of data: relevance to the Health Canada mandate and strategic outcomes; and responsiveness to what users want.

D. Privacy and confidentiality

Government data often include personal information, such as data related to an individual’s income, education and occupation. However, when data are published, they have to be aggregated in such a way that no individual or organization can be identified. Some witnesses raised concerns about confidentiality issues. For example, Mr. Sharma warned that, no matter the aggregation method used, nothing is absolute. In some cases, there is still a small risk that personal information could be identifiable through the release of a dataset.

An official from Citizenship and Immigration Canada observed that many techniques can be used to protect privacy, such as aggregating data by categories (e.g., income ranges), rounding data or masking certain values. All federal departments and agencies that appeared before the Committee reiterated that they use similar techniques and view this aspect as a fundamental part of their work. An official from Statistics Canada specified that the agency does not release public use microdata files on the federal government’s open data portal because additional licensing restrictions apply. These restrictions are in place to ensure that these microdata files are not linked with other files, which could put confidentiality at risk.
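To illustrate the three techniques mentioned above (aggregation by category, rounding and masking), the following is a minimal sketch only. It does not represent the method of any department that appeared before the Committee. The income bands, the minimum cell size of 5, the rounding base of 5 and all function names are assumptions chosen for the example.

```python
from collections import Counter

# Hypothetical income bands: lower bound inclusive, upper bound exclusive.
# A band with no upper bound is open-ended.
INCOME_BANDS = [(0, 25_000), (25_000, 50_000), (50_000, 100_000), (100_000, None)]


def band_name(low, high):
    """Build a readable label for one income band."""
    return f"{low}+" if high is None else f"{low}-{high - 1}"


def band_label(income):
    """Aggregation: replace an exact income with the band it falls in."""
    for low, high in INCOME_BANDS:
        if income >= low and (high is None or income < high):
            return band_name(low, high)
    raise ValueError(f"income outside all bands: {income}")


def round_to_base(count, base=5):
    """Rounding: round a count to the nearest multiple of `base`."""
    return base * ((count + base // 2) // base)


def protected_counts(incomes, min_cell=5, base=5):
    """Return one published value per band.

    Counts below `min_cell` are masked (None); other counts are rounded.
    Both thresholds are illustrative assumptions, not official values.
    """
    counts = Counter(band_label(income) for income in incomes)
    table = {}
    for low, high in INCOME_BANDS:
        label = band_name(low, high)
        count = counts.get(label, 0)
        table[label] = None if count < min_cell else round_to_base(count, base)
    return table


if __name__ == "__main__":
    sample = [10_000] * 12 + [30_000] * 3 + [60_000] * 8 + [150_000]
    for label, value in protected_counts(sample).items():
        print(label, "suppressed" if value is None else value)
```

In this sketch, a band containing only a few individuals is masked entirely rather than rounded, since a very small count could still point to an identifiable person. As Mr. Sharma cautioned, no such method removes the risk completely.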

A TBS official pointed out that their colleagues in the U.S. and the U.K. are developing new technologies to anonymize data. An official from Health Canada said that the data published by the department were only aggregated data, not individual data, and therefore did not need to be anonymized. Mr. Stirling remarked that in the U.K., there is an organization called the UK Anonymisation Network, an independent group that helps to ensure that all the necessary steps have been taken before any large dataset is released.

According to Ms. Ubaldi, “in order to protect privacy it is extremely important to have clear guidelines for the public servants.” She explained that public servants are key actors in the open data ecosystem, and that it is therefore essential to train them and raise their awareness of the breaches of privacy that may result from actions they take in relation to open data.

With respect to open data by default, the Committee recommends that:

RECOMMENDATION 1

The Government of Canada continue to implement its open data action plan and report back to the Committee on its progress by 31 March 2015. In addition, the Government of Canada should report back to the Committee on the implementation of its commitments in relation to the G8 Open Data Charter.

RECOMMENDATION 2

The Government of Canada should make its datasets available by default to the public free of charge through its open data portal.

RECOMMENDATION 3

The Government of Canada should examine the possibility of including Crown corporations in the list of organizations covered by its directive on open government.

RECOMMENDATION 4

The Government of Canada, in its directive on open government, should require departments to publish an annual progress report with respect to their release of datasets on the open data portal.

RECOMMENDATION 5

The Government of Canada, in its directive on open government, should require departments to document the reason for which a particular dataset will not be released on the open data portal and publish this justification as part of its inventory of datasets.

RECOMMENDATION 6

The Government of Canada continue to take all precautionary measures to ensure the confidentiality of data, using the most current techniques to ensure that information published on its open data portal cannot be linked to a particular individual or organization. In addition, the Government of Canada should consider engaging an independent organization to verify whether all the necessary steps are taken to ensure confidentiality of data before its release on the federal government’s open data portal.

RECOMMENDATION 7

The Government of Canada should develop guidelines for public servants so they are able to ensure that confidential information is not revealed through the release of datasets on the federal government’s open data portal.