How do we know if we are making a difference? The challenging nature of impact evaluations in practice
20 June 2014
By Kate Pruce.
ESID’s aim is to “create a robust, relevant and accessible body of evidence that will help local, national and international efforts in developing countries to secure states that are more effective at and committed to delivering inclusive development”. Rather than looking for quick-fix solutions, we want to generate scientifically rigorous evidence that creates useful knowledge about what works and what doesn’t in the field of development.
This approach is the basis for a workshop series on impact evaluation for international development; the first workshop focused on tools and methods that can be used to create a deeper understanding of the process of change. In his opening remarks, Prof David Hulme set the scene against a backdrop of increasing interest in impact evaluation, visible in DFID’s focus on evidence and the ESRC’s commitment to impact assessment. The programme was varied, with presentations ranging from studies in challenging conflict-affected environments to health, education and poverty alleviation. This provided the opportunity to tackle key debates based on empirical data and researchers’ experiences of the complexities of carrying out an impact evaluation in practice.
Randomista vs non-randomista
Randomised control trials (RCTs) have attracted a lot of interest recently, particularly from donors, because they are considered reliable: quantitative, robust and scientific. However, the challenges of conducting an RCT in the social sciences are significant. These include the ethical implications of withholding treatment from a community in need in order to use it as a control, and the difficulty of ensuring complete separation of treatment and control communities, which could otherwise lead to spillover – quite apart from the feasibility and expense of conducting an RCT in a development setting.
Abhijit Banerjee of the Abdul Latif Jameel Poverty Action Lab (J-PAL) suggests that RCTs force researchers to be more rigorous and to grapple with causality, while Angus Deaton recommends an ‘Angry Birds’-style trial-and-error approach. Deaton counters Banerjee’s argument that trial and error is not a realistic way to create policy by pointing out that an RCT tests one type of trial, which only works in a certain context and may well itself end in error. The discussion in our forum concluded that RCTs can be useful in certain situations, but are most valuable as one method among several, supported by qualitative approaches that can explain causal patterns.
One such alternative method is Qualitative Comparative Analysis (QCA), presented by Dr Wendy Olsen, which can also be applied to RCT data. It is based on fuzzy-set analysis and should be carried out after the statistical work. Statistics tests for best fit, while QCA tests for sufficient and necessary causes, making it possible to examine every permutation and combination of conditions. A factor is necessary if it is common to all causal pathways leading to the outcome; a factor X is sufficient if every case with X also shows the outcome Y, even though Y may occur without X. This overcomes the ‘dosage’ model of causation, which is a limitation of regression.
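The set logic behind necessary and sufficient conditions can be sketched in a few lines of code. This is a simplified crisp-set illustration with hypothetical factors X1 and X2 and outcome Y – real QCA, as presented in the workshop, works with fuzzy membership scores and truth-table minimisation, which this sketch does not attempt.

```python
# Illustrative sketch only: crisp-set necessary/sufficient checks over
# a handful of hypothetical cases. Each case records whether candidate
# causal factors (X1, X2) and the outcome (Y) are present.

def is_necessary(cases, factor, outcome="Y"):
    """A factor is necessary if the outcome never occurs without it."""
    return all(case[factor] for case in cases if case[outcome])

def is_sufficient(cases, factor, outcome="Y"):
    """A factor is sufficient if the outcome occurs whenever the factor does."""
    return all(case[outcome] for case in cases if case[factor])

# Hypothetical data, not drawn from any study in the workshop.
cases = [
    {"X1": True,  "X2": True,  "Y": True},
    {"X1": True,  "X2": False, "Y": True},
    {"X1": False, "X2": True,  "Y": False},
]

print(is_necessary(cases, "X1"))   # True: every Y case has X1
print(is_sufficient(cases, "X1"))  # True: every X1 case shows Y
print(is_sufficient(cases, "X2"))  # False: X2 occurs without Y
```

Here X1 is both necessary and sufficient for Y, while X2 is neither – the kind of distinction a best-fit regression, which looks for an average ‘dosage’ effect, cannot draw directly.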
Not judging but learning
The presentations by Prof Stephan Klasen, Prof Menno Pradhan and Prof Armando Barrientos reported zero or negative effects of interventions. This need not be a bad outcome of an evaluation, as long as lessons are learned and measures can be taken to make improvements. However, this is not straightforward for organisations and donors who are under pressure to demonstrate results and justify the use of funds, as was highlighted in a recent workshop on ‘Aiding reform: lessons on what works, what doesn’t and why’. The closing quote – “just being able to say ‘yes, it has failed’ would create a great sense of relaxation in DFID” – indicates that failure isn’t an option. If the outcome is uncertain, the evaluation may therefore not go ahead at all, as happened in the case presented by Dr Philip Verwimp about South Sudan.
This contradicts the aim of impact evaluation and can lead to missed opportunities. Describing the process as ‘monitoring and evaluation’ can make it sound threatening – checking up, judging. Prof Tilman Brück suggested that a more useful approach is to focus on a process of systematic learning. Treating an evaluation as an external exercise has the advantage of greater objectivity, but denies agency to the organisation’s staff. If instead the evaluation is integrated into the programme, this can increase ownership and improve learning opportunities, making it more beneficial for the organisation, the current intervention and possible future interventions.
The politics of evidence
A fundamental assumption in using impact evaluation to create knowledge about what works in development is that the evidence is actually used for policy-making – an assumption questioned in Barrientos’ presentation on antipoverty transfers in Latin America and sub-Saharan Africa. He suggested that impact evaluations can be used to create evidence-based policy, but that they can also serve as a tool to overcome political resistance and competition. He also found cases where evidence did not influence political processes: in the two cases where evaluations were demanded by the government – Kenya and Ethiopia – the decisions made were not actually based on the findings. This illustrates the dilemma of the role and politics of evidence in development.
In one word – ‘challenging’
Co-convenor and ESID Joint Research Director Prof Kunal Sen summarised the workshop’s presentations in one word: ‘challenging’. Challenging in the environments in which impact evaluations were being conducted; challenging in the methods; challenging in the applications, as in Dr Ralitza Dimova’s and Dr Katsushi Imai’s presentations, where the data were not experimental; and challenging in the findings, which showed that policy interventions can have zero or negative effects, and that the use of impact evaluations is not apolitical.
In closing, Kunal emphasised that it was positive to see the papers approach impact evaluation with different methods and ask difficult questions, and that we need to move away from the randomista vs non-randomista debate that has plagued impact evaluation recently, as there is much to be learnt from both sides.
The Manchester Workshop on Impact Evaluations for Development Policies, Part I – Methodologies and Applications, was held on 12 June 2014 at The University of Manchester. Part II will take place on 10 September 2014. The second workshop will engage with debates about impact evaluation, and the relationship to philosophical debates on causal inference in the social sciences.
For further information view the presentations from the workshop:
Impact evaluation and conflict – Prof Tilman Brück
Benefits trickling away: the health impact of extending access to piped water and sanitation in urban Yemen – Prof Stephan Klasen
Double for nothing? The effects of unconditional teacher salary increases on performance – Prof Menno Pradhan
Qualitative comparative analysis grows up: surprising applications of fuzzy sets – Dr Wendy Olsen
Preferential access into the Chinese market: how good is it for Africa? – Dr Ralitza Dimova
Evaluating anti-poverty transfer programmes in Latin America and sub-Saharan Africa: better policies? Better politics? – Prof Armando Barrientos
Workfare as collateral: the case of the National Rural Employment Guarantee Scheme (NREGS) in India – Dr Katsushi Imai