Once you’ve gathered the data, how do you analyse it to render relevant FinNeeds insights?
The first step when engaging with a new dataset is to clean the data. Data cleaning primarily involves checking that the data matches the questionnaire and removing outliers. It is also important to check weighted and unweighted tabulations of the data to ensure that they are in line with what economic theory would predict.
Once the data is cleaned, the analytical framework for demand-side data analysis follows the different elements of the FinNeeds framework, namely needs and use, devices, usage, drivers and outcomes:
The first step is to take a granular look at the use cases within each need category. This is done by calculating the proportion of the population who experienced the respective use cases making up the four financial needs and then highlighting key use cases. This positions the landscape of financial use cases, grouped according to the four needs categories, that will form the basis for the rest of the analysis.
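As a minimal sketch of this step, the snippet below computes the weighted share of the population reporting each use case. The record layout, the use case names and the weights are hypothetical and only illustrate the calculation; a real survey would supply its own codes and design weights.

```python
from collections import defaultdict

# Hypothetical respondent records: a survey weight plus the use cases
# each respondent reported experiencing in the reference period.
respondents = [
    {"weight": 1.5, "use_cases": {"buy_food", "medical_expense"}},
    {"weight": 0.5, "use_cases": {"buy_food"}},
    {"weight": 2.0, "use_cases": {"school_fees"}},
    {"weight": 1.0, "use_cases": set()},
]


def weighted_incidence(records):
    """Return the weighted share of the population reporting each use case."""
    total_weight = sum(r["weight"] for r in records)
    weight_by_use_case = defaultdict(float)
    for r in records:
        for use_case in r["use_cases"]:
            weight_by_use_case[use_case] += r["weight"]
    return {uc: w / total_weight for uc, w in weight_by_use_case.items()}


incidence = weighted_incidence(respondents)
# List use cases from most to least common to help spot the key ones.
for use_case, share in sorted(incidence.items(), key=lambda kv: -kv[1]):
    print(f"{use_case}: {share:.0%}")
```

Running the same function with all weights set to 1 gives the unweighted tabulation, which is a quick way to compare the two during cleaning.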
The next step in the analytical framework is to consider how use cases are met. In answering the questionnaire, the respondent specifies the financial action taken when addressing each use case. During the process of analysis, the use case should be recoded into the underlying need and the financial device response into the product market (credit, savings, payments or insurance) or provider type (formal, informal, social or personal). For example, if one borrows from a family member or friend to buy food, it implies that credit from family and friends is used to address a liquidity need. A comprehensive view of the contribution of the different product markets or provider types in addressing a specific need provides an understanding of how well the financial system is serving the population. At this stage, the use of specific devices in meeting financial needs can already provide early insights on the functioning of the financial system or financial outcomes. For example, if most people use informal credit or savings rather than insurance to cope with large medical expenses, it may point to a gap in the insurance market to serve front-of-mind resilience needs.
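The recoding described above can be sketched as follows. The two lookup tables and the response codes are hypothetical, and the shares are unweighted for brevity; the provider types (formal, informal, social, personal) follow the text.

```python
from collections import Counter

# Hypothetical recoding tables; a real questionnaire defines its own codes.
USE_CASE_TO_NEED = {
    "buy_food": "liquidity",
    "medical_expense": "resilience",
}
DEVICE_TO_PROVIDER = {
    "bank_loan": "formal",
    "moneylender_loan": "informal",
    "borrow_family_friends": "social",
    "savings_at_home": "personal",
}

# Each response pairs a use case with the device used to address it.
responses = [
    ("buy_food", "borrow_family_friends"),
    ("buy_food", "savings_at_home"),
    ("medical_expense", "moneylender_loan"),
    ("medical_expense", "borrow_family_friends"),
    ("medical_expense", "bank_loan"),
    ("buy_food", "borrow_family_friends"),
]


def provider_mix_by_need(pairs):
    """Recode each response and return provider-type shares within each need."""
    counts = {}
    for use_case, device in pairs:
        need = USE_CASE_TO_NEED[use_case]
        provider = DEVICE_TO_PROVIDER[device]
        counts.setdefault(need, Counter())[provider] += 1
    return {
        need: {p: n / sum(c.values()) for p, n in c.items()}
        for need, c in counts.items()
    }


mix = provider_mix_by_need(responses)
for need, shares in mix.items():
    print(need, {p: round(s, 2) for p, s in shares.items()})
```

A second lookup table from device to product market (credit, savings, payments or insurance) would give the product-market view in the same way.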
It may also be relevant from a policy perspective to consider the proportion of people who responded that they “did nothing” in response to a use case, or who employed welfare-reducing strategies such as selling an asset or reducing expenditure. For instance, the proportion of people who sold something to address a resilience need may point to gaps in risk-coping mechanisms. Identifying these gaps is an important step in coming up with appropriate solutions to improve the role of the financial sector in building resilience.
Often, conducting the analysis at a consumer segment level will render more meaningful insights than analysing the population as a whole. For instance, if 20% of the sample population use informal services to address a prominent resilience use case, further analysis may reveal significant gender effects. If women make up most of the 20% using informal services, targeted interventions may be required to close the gender gap in the use of formal financial services.
Applying the FinNeeds framework requires judgment in identifying sensible segmenting variables that give a granular view of financial inclusion. Besides gender, other variables useful in segmentation are locality (rural vs. urban), age, and income or socio-economic status. Education or relationship status may also be relevant. While it may not be necessary to report results for all segments in all instances, it is useful to run the analysis at a segment level to pick up on skews and to report them where relevant.
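A segment-level breakdown of this kind can be sketched as below. The records, the segmenting variables and the provider codes are hypothetical, and survey weights are omitted to keep the example short.

```python
from collections import defaultdict

# Hypothetical records for one resilience use case: the respondent's
# segment variables and the provider type of the device they used.
records = [
    {"gender": "female", "locality": "rural", "provider": "informal"},
    {"gender": "female", "locality": "rural", "provider": "informal"},
    {"gender": "female", "locality": "urban", "provider": "informal"},
    {"gender": "female", "locality": "urban", "provider": "formal"},
    {"gender": "male", "locality": "rural", "provider": "informal"},
    {"gender": "male", "locality": "rural", "provider": "formal"},
    {"gender": "male", "locality": "urban", "provider": "formal"},
    {"gender": "male", "locality": "urban", "provider": "social"},
]


def share_by_segment(rows, segment_var, provider_type):
    """Share of each segment that used the given provider type."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for row in rows:
        segment = row[segment_var]
        totals[segment] += 1
        if row["provider"] == provider_type:
            hits[segment] += 1
    return {segment: hits[segment] / totals[segment] for segment in totals}


by_gender = share_by_segment(records, "gender", "informal")
by_locality = share_by_segment(records, "locality", "informal")
print(by_gender)
print(by_locality)
```

Looping over a list of candidate segmenting variables and inspecting only the largest gaps is one way to decide which skews are worth reporting.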
A dedicated FinNeeds questionnaire contains a module requiring respondents to list their reasons for choosing particular devices. The first step in analysing drivers is to tabulate the responses, which will elicit, for example, the percentage who indicated that they prefer a device for functional reasons such as convenience or cost-effectiveness, or because of relational reasons such as trust or a sense of belonging. Analysing the output should reveal variation in preferences across types of devices or population segments. Not all outputs will be contained in the eventual analysis: it is important to gauge which comparisons are most relevant and interesting given the overall research objective.
Beyond the functional and relational drivers, the demographic information in the survey can be used to get a sense of contextual drivers. This can be done by cross-tabulating personal characteristics with devices used to see if interesting trends emerge. It is useful to define hypotheses upfront, which can then be tested through exploratory data analysis. For example, a hypothesis might be that formal users are more likely to be from a higher socio-economic class, more educated and more urban.
Drivers of use can also be tested statistically through regression analysis. See the discussion under transactional data analysis below for more details.
Gauging outcomes of use is the most complex part of the analytical framework. The FinNeeds theory holds that one should be able to analytically classify survey respondents into different outcome segments (those who are “resilient” or “not resilient”; those who maintain liquidity vs. those who have infrequent experiences of illiquidity vs. those who are chronically illiquid, etc.) based on the questionnaire responses. For example, where the questionnaire asks a person how recently they were not able to balance income and expenses and how often it happens that they cannot do so, a combination of these two responses can be used to classify respondents into three categories: those who never experience any problems as “liquid”, those who experienced liquidity distress in only one month in the past 12 months as “some liquidity distress” and the rest as “severe liquidity distress”. The exact parameters to be used will depend on the nature of the questionnaire and on the country context. It will require the analyst to run some scenarios and make a judgment on the most meaningful distinctions in context.
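The three-way liquidity classification in the example above can be sketched as a small function. It assumes a hypothetical derived variable, months_in_distress, that the analyst has already built from the two questionnaire responses; the cut-off is a parameter so that different scenarios can be run.

```python
def classify_liquidity(months_in_distress, some_max_months=1):
    """Classify a respondent by months of liquidity distress in the past 12.

    months_in_distress is a hypothetical derived variable; some_max_months
    is the highest count still treated as "some" rather than "severe".
    """
    if not 0 <= months_in_distress <= 12:
        raise ValueError("months_in_distress must be between 0 and 12")
    if months_in_distress == 0:
        return "liquid"
    if months_in_distress <= some_max_months:
        return "some liquidity distress"
    return "severe liquidity distress"


# Compare how the segments shift under two candidate cut-offs.
sample = [0, 1, 2, 5, 0, 3]
for cutoff in (1, 2):
    labels = [classify_liquidity(m, some_max_months=cutoff) for m in sample]
    print(cutoff, labels)
```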
The next step is to compare the profiles of the different outcome segments: do those who regularly maintain liquidity have a different device portfolio than those who experience liquidity distress? Are lower-income people more likely to experience liquidity distress than the middle class? Once again, it is useful to set hypotheses upfront that can then be tested, for example that people with insurance are more likely to be resilient than those without. If the findings reveal that this is not the case, it raises compelling questions from a policy perspective.
One could also test the determinants of outcomes using various statistical procedures (subject to a large enough sample size). The appropriate method will follow from the nature of the outcome variable and assumptions made in the model. For example, a binary outcome variable classifying respondents into “resilient” and “not resilient” could be modelled using a binary probit or logit model, or a similar statistical technique, as appropriate.
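One simple way to test the insurance hypothesis is a two-proportion z-test, sketched below. The counts are invented for illustration, and the test assumes a simple random sample, so it ignores survey weights and any complex sample design.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided z-test for a difference between two independent proportions."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled proportion under the null hypothesis of no difference.
    pooled = (success_a + success_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_a - p_b, z, p_value


# Hypothetical counts: resilient respondents among insured and uninsured.
diff, z, p_value = two_proportion_z_test(60, 100, 45, 100)
print(f"difference={diff:.2f}, z={z:.2f}, p={p_value:.3f}")
```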
Outcomes analysis is exploratory, and it may require consultation between the analyst and policymakers and financial service providers to explore the most meaningful outcome parameters to set, which hypotheses to test and angles or segments to explore.
Transactional data can come from a range of financial products and from different types of financial service providers. Thus, the variables of interest differ depending on the nature of the dataset. In general, however, transactional data analysis focuses on two elements of the FinNeeds framework: usage and drivers:
No two transactional datasets are alike. Thus, when working with transactional data, the analyst’s first step is to map the structure and contents of the dataset to gauge the data gaps and understand the scope of analysis that will be possible. The analytical approach should then be adjusted accordingly.
Cleaning the data is very important when working with an existing transactional dataset. Cleaning the data usually involves removing or, where relevant, filling in blanks in the data. It is also necessary to check for database accuracy and remove outliers accordingly. For example, are there people older than 120 years in the database?
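A basic accuracy check of this kind can be sketched as below. The field names and rows are hypothetical; the function sets aside records with a missing or implausible age for review rather than deleting them outright.

```python
def clean_accounts(rows, max_age=120):
    """Split rows into plausible records and records flagged for review."""
    kept, flagged = [], []
    for row in rows:
        age = row.get("age")
        # Flag blanks and ages outside the plausible range.
        if age is None or not 0 <= age <= max_age:
            flagged.append(row)
        else:
            kept.append(row)
    return kept, flagged


# Hypothetical account-holder records.
accounts = [
    {"account_id": "A1", "age": 34},
    {"account_id": "A2", "age": 131},
    {"account_id": "A3", "age": None},
    {"account_id": "A4", "age": 58},
]

kept, flagged = clean_accounts(accounts)
print(len(kept), "kept;", [r["account_id"] for r in flagged], "flagged")
```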
To analyse usage intensity, it is useful to construct a composite usage indicator. For example, in one of our pilot studies, we constructed a composite usage score consisting of four parameters:
The composite usage score can be used to model the determinants of usage. In the example above, usage was modelled based on ordered logit regression techniques to determine what the most statistically significant demographic determinants of usage are.
The statistical model can be built using only transactional data or, where available, integrating demand-side variables from the merged dataset into the model and testing their predictive power.
As with the demand-side data analysis, it is useful at the outset to set some hypotheses to test. For example, are high users more likely to be male or female, urban or rural? Does socio-economic class matter more than gender in determining usage? To what extent does the device portfolio matter in explaining usage? Is receiving income into an account the strongest predictor of high usage or engagement? Do people with more than a certain number of devices outside of their formal account have lower usage profiles than those whose portfolio is largely formal? The exact hypotheses to be set will depend on the context and the particular financial inclusion policy questions and gaps to be considered.
Another important way to analyse usage is to classify users with similar usage patterns into usage clusters and then to compare the usage and demographic profiles of different clusters. Understanding the different characteristics of different usage clusters can help financial service providers and policymakers improve the use of formal financial products.
Clustering is achieved by using algorithms that identify clusters of users with similar usage profiles. For example:
In one of our pilot studies, we used k-means clustering methods to generate statistically significant clusters of users. This was done by asking the algorithm to run various iterations whereby users are compared according to six variables: average number of transactions, average amount transacted, gender, age, income and education. The exercise stops when the algorithm has settled on clusters of users that are sufficiently different to warrant separate classification. In this instance, the exercise rendered six distinct clusters of users.
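To illustrate the mechanics, here is a minimal k-means sketch in plain Python. It is not the pilot study's implementation: it uses only two of the six variables (average number of transactions and average amount transacted), invented data, two clusters fixed in advance and hand-picked starting centroids so that the result is reproducible.

```python
from statistics import mean, pstdev


def standardise(points):
    """Rescale each variable to mean 0 and standard deviation 1."""
    cols = list(zip(*points))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]
    return [tuple((v - m) / s for v, m, s in zip(p, mus, sds)) for p in points]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, initial_centroids, max_iter=100):
    """Alternate assignment and centroid updates until labels stop changing."""
    centroids = list(initial_centroids)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(mean(c) for c in zip(*members))
    return labels, centroids


# Hypothetical users: (average number of transactions, average amount).
users = [(2, 10.0), (3, 12.0), (2, 11.0), (20, 150.0), (22, 160.0), (21, 155.0)]
scaled = standardise(users)
labels, centroids = k_means(scaled, initial_centroids=[scaled[0], scaled[3]])
print(labels)
```

Standardising first stops the variable with the largest numeric range from dominating the distance calculation. In practice the number of clusters is usually chosen by comparing runs with different values.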
This is a form of unsupervised clustering: the algorithm derives the groups from the data itself. An alternative would be a supervised, rule-based segmentation whereby users are divided into quintiles according to their usage intensity scores as discussed above.
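The quintile alternative can be sketched as a rank-based split. The scores below are invented, and the way ties and uneven group sizes are handled is an assumption of this sketch.

```python
def assign_quintiles(scores):
    """Rank-based quintiles: 1 = lowest usage scores, 5 = highest."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    quintiles = [0] * len(scores)
    for rank, i in enumerate(order):
        quintiles[i] = rank * 5 // len(scores) + 1
    return quintiles


# Hypothetical composite usage scores for ten users.
usage_scores = [5, 80, 33, 47, 12, 91, 60, 25, 70, 18]
quintiles = assign_quintiles(usage_scores)
print(quintiles)
```

The demographic and usage profiles of each quintile can then be compared in the same way as the clusters.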