Partnerships founded on trust: Introducing Contracts for Data Collaboration (C4DC)
This brief was written by Tom Orrell, DataReady and Hayden Dahmm, SDSN TReNDS. The authors would like to acknowledge and thank the following people and organizations for their insights that have helped to shape this brief: Jessica Espey and Jay Neuner, SDSN TReNDS; Scott David, University of Washington; Bill Hoffman, World Economic Forum; Stefaan Verhulst and Andrew Young, the GovLab at New York University; Charlene Migwe, Beverley Hatcher-Mbu, Taryn Davis and Paige Kirby, Development Gateway; Tracey Li, Flowminder; and, Fredy Rodriguez, Cepei.
Contracts for Data Collaboration (C4DC) is a partnership between the Sustainable Development Solutions Network’s Thematic Research Network on Data and Statistics (SDSN TReNDS), the University of Washington’s (UW) Center for Information Assurance and Cybersecurity, the World Economic Forum (WEF), and the Governance Lab at New York University (“the GovLab”).
The project aims to shed light on the opportunities and challenges inherent to data collaboratives. GovLab has worked to document the experiences of data collaboratives, specifically practices that involve the exchange of data or other data-related actions taking place between public and private entities (GovLab 2015). These exchanges and other data actions take place in various ways, from the informal sharing of insights and data-handling practices that don’t involve the sharing of raw data, through to more formal and legally-binding data sharing agreements (DSAs).
The project is directed towards a number of audiences. These span government data and information management professionals, statisticians, and policymakers, through to development and humanitarian organizations that rely on data to carry out their programs of work. Businesses, data producers, and data stewards considering sharing their data for public good, as well as researchers and academics exploring the use and potential of data sharing agreements, are also intended audiences. In this way, the C4DC partnership spans the data supply-and-demand landscape.
The project aims to provide these groups and others with a range of tools to facilitate understanding of the opportunities and challenges related to formalized data sharing that is underpinned by written agreements. Project outputs will include an online repository of DSAs, together with an analytical framework that will help to demystify these types of agreements for non-legal professionals, raising their understanding of the issues involved in data sharing and related actions. In the medium term, the project also hopes to produce and publish a series of case studies explaining when and how DSAs are, and should be, most effectively applied for maximum impact and with minimum risk in the public interest.
The Importance of Data Collaboration
Data collaborations often involve the transfer of specific data from one set of hands to another, or the granting of access to datasets through a variety of means, whether through physical access to a server, remote access via digital methods, or access for algorithmic querying. Data collaborations raise a whole host of questions around data rights, ownership, use, control, and risk that can seem overwhelming when first encountered.
The complexity of these interdependent issues can create risks that, if left unmanaged, can undermine the entire effort, and it requires partners to balance costs and opportunities. In turn, the sum total of collaborators’ rights and duties that results from this balancing by each party informs the collaboration documents (sometimes called a “trust framework”) that guide the data-related activities of the partner organizations and individuals seeking to improve the use of data in the public interest.
Fortunately, some groups have started to identify and catalog successful data-related practices in an effort to make them available for reference by other initiatives. For example, to capture practices and collaborations, the GovLab has produced an online repository of data collaboratives (The GovLab 2019) and enabled an active community of data stewards in order to both document and incentivize more, and better, data sharing and governance practices. Similarly, in the sustainable development and humanitarian sectors, the need for focused research and work on data sharing practices in particular is widely recognized, and TReNDS’ flagship report, Counting on the World (SDSN TReNDS 2017), has articulated the need for multi-stakeholder partnerships, principles, and standards to drive better practices. The UW Information Risk Research Initiative has produced the Atlas of Risk Maps, which presents a checklist of over 175 varieties of risks to organizations from networked information systems. [Contact Scott David at email@example.com for copies of the Atlas.]
Data collaboration has the potential to unlock information that can strengthen public decision-making. By making data available for application in multiple contexts and interactions, it is able to inform a greater range of decision-making, as each subsequent receiving party brings their own meaning and context to the same data. When data is applied in meaningful contexts it can inform decisions, thereby converting from “data to knowledge to action.” Governments, private companies, researchers, and development practitioners all have to navigate similar challenges as they strive to reach agreements on how to collaborate with respect to data actions to aid their respective decision-making processes. These challenges, which include the difficulty of anticipating all of the variables that can affect a group’s performance and the nuances of navigating contractual negotiations, are complex and shaped by a combination of technological, institutional, legal, policy, economic and, of course, human factors. Unpacking them is what this project seeks to do.
Our analytical framework: Demystifying data collaboration
DSAs generally share common elements that can be conceptually organized into six categories: why, what, who, how, when, and where. In the C4DC project, we have developed an analytical framework to parse the many terms in DSAs into logical categories consistent with the framework below, developed by Stefaan Verhulst and Andrew Young at the GovLab as part of their Data Collaboratives program (Verhulst 2019).
WHY is data being shared? What is the context and purpose?
WHAT kinds of data are being shared? What are the sources, formats, and other technical requirements?
WHO is party to the agreement? Who will be providing and using data resources? Are there any other ‘third parties’ that are also involved? Who has certain rights and duties?
HOW are data being shared? How is the relationship managed? How will issues such as security, privacy, and risk be handled?
WHEN will data actions take place? At what point does the agreement start and end?
WHERE are data being shared to and from? Are there jurisdictional issues to consider? Are there any international laws that apply?
These conceptual categories help data collaborations to engage with and handle the complexity of DSAs. The framework starts with general categories that naturally emerge in data stakeholder discussions, and then offers increasingly detailed sub-analysis in a logical sequence from those general categories, in an effort to demystify structured data actions and facilitate user understanding of real-life DSAs. The analytical framework and the data collaboration concept are intended to address the lack of clarity around these data sharing and data action issues, which often results in high transaction costs and significant delays during the process of negotiating the terms of a DSA.
In Colombia, for instance, the Centro de Pensamiento Estratégico Internacional (Cepei), with support from SDSN TReNDS, has piloted an innovative project with the Bogotá Chamber of Commerce to reconcile local data sources. The result has been a supply of relevant data on economic growth, infrastructure, and industrialization now available to the National Administrative Department of Statistics (DANE). Although the collaboration has been a success, securing the necessary arrangements proved more difficult than initially anticipated. Cepei was able to analyze the Chamber of Commerce data in less than two months, but the process of negotiating a one-and-a-half-page agreement to enable them to do so took over six months (Rodriguez 2019).
Similarly, Development Gateway, which promotes data-driven development solutions, regularly finds that it takes three to four months to negotiate data sharing agreements with its partners: “developing the document, getting buy-in from the different stakeholders, the different country officials, that all takes time” (Hatcher-Mbu 2019). Most recently, Flowminder took a year to negotiate a three-way data sharing agreement in Ghana between themselves, Vodafone, and Ghana Statistical Services (Li 2019).
What the three examples share is an understanding that any data collaboration activity or collaboration agreement that involves data actions of some form must be founded in trust, which typically takes time to develop through discussions and negotiations, even where it is supported by the behavior-normalizing backdrop of broader regulatory and legal frameworks.
Next Steps and What We Need
The C4DC partnership is at the beginning of an iterative process. While significant progress has been made to date in the development of the analytical framework and understanding of user needs, the next steps will be crucial to the project’s success. Immediate priorities include:
Collecting and growing a pool of agreements from around the world and across different domains;
Analyzing these agreements for common terms and drafting solutions to shared challenges, highlighting how specific challenges around the who, what, where, when, why, and how of data collaboration are addressed;
Designing and building the online repository and resource center to house the agreements and any supplementary materials and collaboration tools; and
Engaging with prospective user groups to glean feedback and develop case studies to inform future development.
Public and private organizations are invited to contribute sample agreements associated with data collaboration from their own work. Please email documents or any questions you may have to TReNDS Analyst Hayden Dahmm (firstname.lastname@example.org). Do note that all contributed materials will be made publicly available online, so please remove any party names or identifying information as necessary from the agreements before sending, and make certain that you are not sharing any private or confidential materials.
Over time, the project will shed light on the opportunities and challenges inherent to data collaboration. In particular, through the sharing of good practice examples and guidance, the benefits of responsible data stewardship and public-private data collaboratives should become clearer, and common risks and costs should be reduced. Ultimately, it is hoped that improved practices around data sharing will not only speed up the process of negotiating agreements and reduce transaction costs, but also strengthen the foundations of public-private data collaboratives by fostering the trust that sits at their heart.
References
Hatcher-Mbu, Beverley (Development Gateway). 2019. Interview by Tom Orrell and Hayden Dahmm.
Li, Tracey (Flowminder). 2019. Interview by Tom Orrell and Hayden Dahmm.
Rodriguez, Fredy (Cepei). 2019. Interview by Tom Orrell and Hayden Dahmm.
The GovLab. 2015. “Data Collaboratives.” https://datacollaboratives.org/introduction.html.
The GovLab. 2019. “Data Collaboratives Explorer.” http://datacollaboratives.org/explorer.html.
SDSN TReNDS. 2017. Counting on the World. New York: Sustainable Development Solutions Network. https://www.sdsntrends.org/research/2017/9/17/counting-on-the-world-2017.
Verhulst, Stefaan G. 2019. “Data Collaboratives: The Emergence of Public Private Partnerships around Data for Social Good.” Presentation to the AI for Social Good Conference, Doha, February 17, 2019. https://qcai.qcri.org/wp-content/uploads/2019/03/Data_Collaboratives__Stefaan_Verhulst__GovLab.pdf.