Data as a Utility

‘Utility services’ is a chic term for things like electricity, piped drinking water, and other basic services, generally provided by the municipality or other government institutions. When I was young, postal services and some banking services (postal savings) were also utilities, but that is a separate topic.

Utilities are services that are crucial for people, and many believe their access and quality should be assured for all. These services have huge overhead costs, so it generally doesn’t make sense for more than one organization to set up the same service in parallel or to compete in providing it. For water, it would be cumbersome for each house to be connected to 3 or 4 parallel networks of pipes, and there would be little benefit in changing providers and switching to a different pipe system. It is neither practical nor efficient for each town to run parallel water networks.

In this blog post, I will argue that behavioral data is like drinking water: it is an essential service that most organizations need, it has huge overhead costs, and it is not efficient to run many parallel infrastructures. Yet for data, we set up a new pipe network every time there is a new data need.

In our work, we first decide what data we want, then we typically design a representative group (or ‘sample’), after which we hire a team of interviewers and train them. The interviewers then go out and work hard to identify people to form the sample. They put significant effort into convincing people to participate and explaining to respondents how the data will be collected, stored, and used. Finally, each potential participant may or may not sign a consent form, which is then carefully stored to protect their private data.

Only after that can the survey implementation start.

The survey implementation phase brings its own list of genuine challenges. Respondents don’t know what to expect from the interview, hear many of the questions for the first time, and may, for many different reasons, provide false or insufficient information. The reasons for erroneous data include a lack of trust and understanding in this early phase of the relationship. For instance, respondents worry that the researcher may report them to the tax office; they believe there are right and wrong answers, rather than each person answering according to their own situation; they think that some answers are preferred; or, in some cases, they simply don’t know the answer because they have never paid attention to what the interviewer is asking about. Respondents do not want to answer “I don’t know”, especially to questions like “How many hours do you spend on housework?”, because they have never tracked this or don’t yet understand which activities count as “housework”.

On top of that, the survey needs to ask a whole list of basic demographic questions covering age, education, profession, household composition, etc. Only after these can the survey move on to the information the researchers are actually after.

In most cases, the survey data gets analyzed by a maximum of three people (academics, market researchers, data experts, NGO staff, or similar people).

After a period of time, the entire data set has to be destroyed to protect the privacy and confidentiality of each participant. Linking the findings for specific individuals from one study to another is out of the question, again because of confidentiality.

For each new data need, the whole process of assembling a team and designing and recruiting a sample is repeated. People from previous samples cannot be deliberately sought out again unless they were asked, and consented to this, from the start.

What also happens, more often than not, is that participants aren’t informed about the findings of the study or, as we often hear in the field, don’t get access to their own reported data, which they feel is their property. As a result, people are reluctant to participate in studies. Just a few days ago, a field researcher in Uganda reported:

“My experience today was not any good at all. I approached around 50 enterprises but managed to complete only 20. The women were so rude, so tired of Research cause they claimed they give information but [don’t] gain much as individuals ….”

So, back to the metaphor of research as drinking water provision: most research currently builds its own piping system from scratch (research infrastructure such as a sample), delivers a little bit of water (the data needed, for maybe a few months), and then the taps are closed (the research ends), all the piping is removed (the interview team’s contracts end), and even the design drawings of the piping system are thrown away (the personal identifiers of the respondents in the sample are destroyed).

L-IFT, however, is in the business of setting up long-term data collection. It constructs these ‘pipes’ to deliver water for at least a year and sometimes for multiple years. The advantage is that you can keep going back to the same people to verify whether you have understood what they reported, or whether things have changed. These long-term piping systems serve other purposes too. New questions may arise during the research, because ‘the more I learn, the less I know.’ Our research infrastructure makes it possible to keep asking for clarifications, from individual respondents up to the entire group.

We ourselves are also guilty of not using our data enough, though, and we are making it a priority for our research infrastructure to be used more intensively. We actively offer data that we own to universities and other organizations. This way, more people use the data, more of it gets proper analysis, and it serves several different data needs. The University of Agder and the University of Leuven are exploring the diaries data from a group of refugees to see what role savings groups play in their financial lives. The World Bank is now about to use the same refugee data set to understand energy expenses and the adoption of solar power. The University of Groningen will also be making use of our Corner Shop Project data.

Our future plans for ‘Data as a Utility’ are even more collaborative. We envision multiple organizations (including competing ones, for example, microfinance organizations in one country) joining forces to pay for a study together and all using the same ‘water pipes’ to get the data flowing to them. Each will have its own focus and use, but the large overhead costs are shared and therefore affordable.

We also think that government agencies and large donors can play an important role in supporting trials of this “data utility system” and can encourage actors to think beyond competition and embrace collaboration.