Data integrity is of the utmost importance to a research organization. It is how the organization makes sure that the data it gathers is accurate, complete, and consistent. That is why I want to share my experience working for L-IFT as a Data Integrity Intern.
My name is Beemnet Jember. I am a 22-year-old Marketing Management graduate. I started working at L-IFT as a Data Integrity Intern in January 2022. To be honest, when I first started, I didn’t have a strong grasp of what data integrity was, and it was a bit difficult to familiarize myself with all the terms. But with the help of the L-IFT team, I came to understand the work and started to enjoy it and the experience I gained along the way.
My first assignment concerned research being conducted in my home country: “Digitizing the agricultural value chain payments in Ethiopia”. The research collected different types of data, including the farmers’ incomes, the loans they took, their expenses, and data on their employees. I started by learning how to create queries on the data recorded by the field researchers. As I stated before, it was not as easy as I thought it would be, because I had to carefully check specific details like income patterns, the employees’ hours worked, and asset recording. For example, I would create a query if the expense or income accounts were not recorded, if the currency was wrong, or if the hours worked were stated as zero while there was income. I sent out these queries in an Excel file. Early in my work, a single week of data covering 140 respondents produced nearly a thousand queries.
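As a rough illustration, the example checks above could be sketched in Python like this. It is only a sketch: the field names (`income`, `expenses`, `currency`, `hours_worked`), the function name, and the default currency code `ETB` are assumptions made for the example, not the actual data layout used by L-IFT.

```python
# Hypothetical weekly record for one respondent; all field names are
# assumptions made for this illustration.
def create_queries(record, expected_currency="ETB"):
    """Return a list of query messages for one weekly record."""
    queries = []

    # Income or expense accounts that were not recorded
    if record.get("income") is None:
        queries.append("Income is not recorded")
    if record.get("expenses") is None:
        queries.append("Expenses are not recorded")

    # Currency different from the one expected for the study
    if record.get("currency") != expected_currency:
        queries.append("Currency is wrong")

    # Hours worked stated as zero while income is reported
    income = record.get("income") or 0
    if record.get("hours_worked") == 0 and income > 0:
        queries.append("Hours worked are zero although income is reported")

    return queries


# Example: one record that triggers three queries
example = {"income": 500, "expenses": None, "currency": "USD", "hours_worked": 0}
for message in create_queries(example):
    print(message)
```

Keeping each rule as a separate check means every problem in a record becomes its own query, which matches the idea of sending the field team one line per issue.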
My next job after creating those queries was to check whether they had been resolved. This was not easy either: with roughly 30 weeks’ worth of data to check, it took a significant amount of time to determine whether every query had been resolved. This second assignment started when the field researchers returned the queries I had sent, indicating which data really needed correction and how it had been corrected. I received these in an Excel file that included a written response to each query and, where the original data was accurate, an explanation of why. To check whether the data had really been corrected, I downloaded the updated files, looked up each solved query by its firm ID, and checked whether the correction appeared there. If it did, I marked the query as “cleaned data”; if not, I marked it as “NO” and sent it back for the field team’s follow-up. Of course, a query is sometimes labeled as solved while the corrected data is delayed in syncing to the system where we can see it. So it is necessary to check the new data set again after a while for updates. This way, data integrity is assured.
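The verification step can be pictured with a small Python sketch. Everything here is hypothetical: the query fields (`firm_id`, `field`, `response`, `corrected_value`) and the shape of the updated data are assumptions chosen for the example, and only the two labels, “cleaned data” and “NO”, come from the process described above.

```python
def verify_queries(queries, updated_data):
    """Label each returned query 'cleaned data' or 'NO'.

    queries: list of dicts describing the field team's responses (assumed layout).
    updated_data: dict mapping firm ID to that firm's latest synced values.
    """
    results = []
    for query in queries:
        firm = updated_data.get(query["firm_id"], {})
        if query["response"] == "original accurate":
            # The field team explained why the original value was right
            status = "cleaned data"
        elif firm.get(query["field"]) == query["corrected_value"]:
            # The correction is visible in the updated file
            status = "cleaned data"
        else:
            # Not visible yet: send back, or re-check after the next sync
            status = "NO"
        results.append({**query, "status": status})
    return results


queries = [
    {"firm_id": "F1", "field": "currency", "response": "corrected", "corrected_value": "ETB"},
    {"firm_id": "F2", "field": "hours_worked", "response": "corrected", "corrected_value": 8},
]
updated = {"F1": {"currency": "ETB"}, "F2": {"hours_worked": 0}}
for row in verify_queries(queries, updated):
    print(row["firm_id"], row["status"])
```

Because a “NO” may only mean the data has not synced yet, running the same check again on a later download is a simple way to catch late updates.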
I am currently working on checking employee reports, which includes checking whether every firm has active employees and checking the employees’ positions. For example, we have to check whether the firm has a single owner or multiple co-owners. If the firm has co-owners, we identify which employees are co-owners. In addition, we check how the owners pay themselves because, in Africa, most business owners do not pay themselves a fixed salary; they usually take money from the business as “as-needed withdrawals” or sometimes as “in-kind pay”. Lastly, we try to record the correct number of employees by checking for casual workers who were registered and no longer work in the firm but whose employment status was never changed to “terminated”, and we create queries for them.
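A minimal Python sketch of these employee checks might look like the following. The record layout (`name`, `position`, `type`, `status`, `last_reported_week`) and the four-week threshold for treating a casual worker as possibly inactive are assumptions for illustration only; the real reports may define inactivity differently.

```python
def check_employee_report(firm, current_week, inactive_after_weeks=4):
    """Summarize ownership and list queries for one firm's employee report."""
    queries = []
    employees = firm["employees"]

    # Every firm should have at least one active employee
    active = [e for e in employees if e["status"] == "active"]
    if not active:
        queries.append("Firm has no active employees")

    # Single owner or multiple co-owners
    owners = [e["name"] for e in employees if e["position"] in ("owner", "co-owner")]
    if len(owners) > 1:
        ownership = "co-owned"
    elif len(owners) == 1:
        ownership = "single owner"
    else:
        ownership = "unknown"
        queries.append("No owner is recorded for this firm")

    # Casual workers still marked active but with no recent activity
    for e in active:
        weeks_idle = current_week - e["last_reported_week"]
        if e["type"] == "casual" and weeks_idle > inactive_after_weeks:
            queries.append(
                f"{e['name']}: casual worker with no recent activity; "
                "confirm whether status should be 'terminated'"
            )

    return {"ownership": ownership, "owners": owners, "queries": queries}
```

The function only raises queries and never edits the employment status itself, in keeping with the practice of having the field team make the corrections.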
The queries we create have to be solved in FINBIT by the author of the data, meaning the field team that collects it. It is normally not possible to change the data on the back end to correct errors. Of course, this causes extra work; it would be faster if the data staff could simply change what appears to be incorrect data. However, L-IFT believes that the person reporting is the “owner of the data” and that the whole process primarily serves the respondents’ data empowerment.
It has been almost a year since I started working at L-IFT and I couldn’t be more grateful for the experience I gained throughout the whole process. With the support of the L-IFT team, I learned how to manage data and the importance of data integrity, lessons I will not soon forget.