I recently joined L-IFT as a data analyst, and the Myanmar project was my first assignment. I knew nothing about the country beyond the fact that it is in Asia; further research showed me that it borders China, India, and Thailand, among other countries. Since then I have read up on its population size, economic activities, and so on, and I have found the country and its people very interesting.
The L-IFT data itself consists of bi-weekly (every two weeks) financial diaries from Myanmar. In school I had worked with similar data, known as panel data, but I had never worked directly with diary data or with self-reported data, so it was challenging at first. The data is made up of 20 datasets (spanning the beginning of February 2018 to the end of December 2018) with information on the spending and business habits of women entrepreneurs in Myanmar. It has two sections: one on the personal and demographic information of the survey participants, and the other on their answers to questions about their financial habits.
Each question is recorded in the following parts:
1. Question (e.g., what income you received from casual labor).
2. Form (what form the income took: bank, cash…).
3. Digital transactions.
4. Value of the transactions.
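One way to picture this structure is as one record per answer. The sketch below is a hypothetical Python representation. It assumes a long format with one row per respondent, round, and question, and it treats "digital transactions" as a yes/no flag. The field names (`respondent_id`, `round_no`, and so on) are illustrative only and are not the actual L-IFT variable names.

```python
from dataclasses import dataclass

@dataclass
class DiaryAnswer:
    # Hypothetical field names; the real datasets use their own variable codes.
    respondent_id: str  # who answered
    round_no: int       # which bi-weekly round (1 to 20)
    question: str       # e.g. "income from casual labor"
    form: str           # form the money took, e.g. "cash" or "bank"
    digital: bool       # assumed flag: whether the transaction was digital
    value: float        # value of the transaction

def total_by_form(answers):
    """Sum transaction values for each form (cash, bank, ...)."""
    totals = {}
    for answer in answers:
        totals[answer.form] = totals.get(answer.form, 0.0) + answer.value
    return totals

answers = [
    DiaryAnswer("R001", 1, "income from casual labor", "cash", False, 100.0),
    DiaryAnswer("R001", 1, "income from gifts", "bank", True, 50.0),
    DiaryAnswer("R002", 1, "income from casual labor", "cash", False, 25.0),
]
print(total_by_form(answers))  # {'cash': 125.0, 'bank': 50.0}
```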
The questions cover the participants' sources of income, their costs, and their investment spending. Income sources include:
1. Agriculture (crops and livestock).
2. Formal employment.
3. Gifts and remittances.
4. Spouses, among others.
The data also captures business costs in their various forms, as well as investment spending on the business. It further records the participants' financial behavior, such as loans taken, and even covers social media and messaging habits, among other detailed information.
My Impression and Insight:
I found the data collection very thorough, with many questions for the survey participants to answer. The questions are detailed enough to capture most of the participants' financial behavior and more. A typical bi-weekly dataset has over 6,000 variables and over 900 respondents, so it takes considerable time to understand the data structure and get a general feel for it.
Beyond the quantitative questions, the data also includes a number of qualitative questions about the respondents' businesses, which give users contextual information about the businesses and the demographics.
The data has two dimensions. The main one is time: the data was collected every two weeks over roughly a year and therefore captures how the participants' situations evolved over that period. This means that time series methods can be applied to it. Such methods could capture seasonality, e.g., how various annual events affected respondents' behavior, and would also trace how the variables move over time, capturing the trend.
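As a minimal sketch of the trend idea, assuming the 20 rounds have been stacked into hypothetical (respondent, round, value) tuples, the code below averages a variable per round and smooths it with a trailing moving average. The function names and sample values are made up. With only one year of data, a smoothed series like this shows the trend but cannot cleanly separate it from annual seasonality; that would need several years of observations.

```python
from statistics import mean

def round_means(records):
    """Average value per survey round from (respondent_id, round_no, value) tuples."""
    by_round = {}
    for _respondent, round_no, value in records:
        by_round.setdefault(round_no, []).append(value)
    return [mean(by_round[r]) for r in sorted(by_round)]

def moving_average(series, window=3):
    """Trailing moving average; positions before a full window are None."""
    return [
        None if i + 1 < window else sum(series[i + 1 - window:i + 1]) / window
        for i in range(len(series))
    ]

# Made-up income values for two respondents over four rounds.
records = [
    ("A", 1, 100), ("B", 1, 200),
    ("A", 2, 150), ("B", 2, 250),
    ("A", 3, 300), ("B", 3, 100),
    ("A", 4, 250), ("B", 4, 350),
]
means = round_means(records)
print(means)  # [150, 200, 200, 300]
print(moving_average(means, 3))
```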
The second dimension is cross-sectional: each dataset is a snapshot of what micro-business women in the research areas were doing around the time it was collected, like a photo of the respondents' activities at a given moment. Because of these two dimensions, the data is rich in insights and supports many different kinds of analysis.
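The two dimensions correspond to two ways of slicing the same panel. A minimal sketch, again assuming hypothetical (respondent, round, value) tuples and made-up values: fixing the round gives the cross-section, and fixing the respondent gives that person's time series.

```python
def cross_section(records, round_no):
    """Snapshot: every respondent's value in one round."""
    return {rid: value for rid, r, value in records if r == round_no}

def respondent_history(records, respondent_id):
    """Time series: one respondent's values, ordered by round."""
    return [value for rid, r, value in sorted(records, key=lambda rec: rec[1])
            if rid == respondent_id]

records = [("A", 1, 100), ("B", 1, 200), ("A", 2, 150), ("B", 2, 250), ("A", 3, 300)]
print(cross_section(records, 2))         # {'A': 150, 'B': 250}
print(respondent_history(records, "A"))  # [100, 150, 300]
```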
Useful Parts of the Datasets:
The data has demographic information about the participants as well as their answers to the survey questions. Demographic fields such as name and location are important for placing each participant in context.
The responses to the survey questions are equally important and reveal a lot about business in the area, such as income, expenditure, and the like.
I found the longitude and latitude data especially useful for showing where the respondents live and understanding their geographical spread. Additionally, all questions on income, expenses, costs, and investment are essential for understanding the participants' behavior and for making comparisons, for instance, whether incomes are higher than expenses on average.
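To make the income-versus-expense comparison concrete, here is a minimal sketch. It assumes each respondent's income and expenses for a period have already been totaled into dictionaries keyed by a hypothetical respondent ID, and the values are made up. Taking the difference per respondent before averaging keeps the comparison paired, so each person's income is set against her own expenses. The share of respondents with a surplus is reported alongside the mean.

```python
from statistics import mean

def net_income_summary(incomes, expenses):
    """Compare income with expenses for respondents present in both mappings."""
    common = incomes.keys() & expenses.keys()
    nets = [incomes[rid] - expenses[rid] for rid in sorted(common)]
    return {
        "mean_net": mean(nets),
        "share_in_surplus": sum(1 for n in nets if n > 0) / len(nets),
    }

incomes = {"A": 500, "B": 300, "C": 200}
expenses = {"A": 400, "B": 350, "C": 100}
print(net_income_summary(incomes, expenses))
```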
Questions on the participants' welfare, such as happiness and stress levels, are also valuable: they help show how people in the area feel and how this relates to other variables, such as their income levels.
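A first look at how welfare relates to income is a correlation. The sketch below computes the Pearson coefficient between two equal-length lists, for example a self-reported stress score and income for the same respondents. The values are made up. A correlation shows association only and cannot tell whether stress affects income or the reverse. For ordinal scales such as happiness ratings, a rank-based measure like Spearman's may be more appropriate.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

stress = [1, 2, 3, 4, 5]            # hypothetical 1-5 stress scores
income = [900, 850, 700, 650, 500]  # hypothetical income for the same respondents
print(round(pearson(stress, income), 3))
```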
I would like to perform several analyses that will reveal more insight into the data.
Summary statistics for variables such as income and expenses. These give an overview of each variable's distribution and central tendency and help make sense of the large dataset. Questions such as which income activity is most common among the respondents can be answered through this type of analysis.
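A minimal sketch of both ideas using only the Python standard library follows. `summarize` reports central tendency and spread for one variable. `source_shares` interprets "which income activity is most common" as the share of respondents reporting positive income from each source. Both function names, the input layout, and the values are hypothetical. On the real data, a library such as pandas (`DataFrame.describe`) would typically produce similar summaries in one call.

```python
from collections import Counter
from statistics import mean, median, stdev

def summarize(values):
    """Summary statistics for one numeric variable, skipping missing (None) entries."""
    xs = [v for v in values if v is not None]
    return {
        "n": len(xs),
        "mean": mean(xs),
        "median": median(xs),
        "stdev": stdev(xs) if len(xs) > 1 else 0.0,
        "min": min(xs),
        "max": max(xs),
    }

def source_shares(reports):
    """Share of respondents reporting positive income from each source, most common first.

    `reports` maps a respondent ID to a {source: value} dictionary.
    """
    counts = Counter()
    for sources in reports.values():
        for source, value in sources.items():
            if value and value > 0:
                counts[source] += 1
    return {source: count / len(reports) for source, count in counts.most_common()}

print(summarize([100, 200, None, 300]))
reports = {
    "A": {"agriculture": 500, "gifts": 0},
    "B": {"agriculture": 300, "employment": 1000},
    "C": {"employment": 200},
}
print(source_shares(reports))
```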
Group comparisons. Hypothesis tests will reveal whether behavior in one group is statistically different from behavior in another, for example, whether respondents in some areas earn more income or incur higher expenses than those in other areas.
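A minimal sketch of a group comparison follows, assuming two lists of made-up incomes from two hypothetical areas. It uses a permutation test instead of a textbook t-test. This technique was chosen here because it needs only the standard library and makes no normality assumption. A t-test or, for more than two areas, ANOVA or Kruskal-Wallis would be common alternatives.

```python
import random
from statistics import mean

def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns the observed absolute difference and an estimated p-value.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign respondents to the two groups
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # The +1 terms count the observed split itself, so the p-value is never zero.
    return observed, (at_least_as_extreme + 1) / (n_permutations + 1)

area_1 = [100, 110, 120, 130, 140, 150]  # made-up incomes
area_2 = [300, 310, 320, 330, 340, 350]
diff, p = permutation_test(area_1, area_2)
print(diff, p)
```

A small p-value suggests that a difference this large would rarely arise if area membership were unrelated to income.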
Visualizations of the key variables, as another way of summarizing the data.
These and other analyses will help us understand the data and present it more clearly, turning the large dataset into insights that can be used for decision-making.