How Amnesty increased first year F2F retention by 12 points


Guest Bloggers | 22 January 2018

a.k.a. are all supporters the same or not? Hard to tell looking at sector ‘best practice’…

(Reader note: this post ends with a control-beating finding that has a huge impact on the otherwise stubborn retention metric, but take the ride to get there; it is (hopefully) worth it.)

The random nth, that tried-and-true approach to testing, assures (in theory) that the test group matches the control. So (in theory) the only difference is our treatment of the test group. And we can therefore (in theory) conclude that any difference in response is because of the test idea rather than unknown, unseen differences between the test and control groups.

This approach makes one massive assumption that is never talked about and, we’d argue, never understood, conceptually, mathematically, or otherwise.

That assumption: the differences between human beings are smaller than the anticipated impact of our treatment/test/idea. Said differently, everybody is the same.

How is this being assumed? What if men love the test idea but women hate it? Because your file skews female, their reaction pulls down the test average and it fails to beat the control. And yet there was a control-beating idea hidden in the losing results, missed because you assumed everyone is the same; men and women in this case.
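The men/women scenario above is just weighted-average arithmetic, and a minimal sketch makes it concrete. Every number here (the 70/30 file split and the response rates) is hypothetical and chosen only to illustrate the effect; none of it comes from a real test.

```python
def blended_rate(segments):
    """Weighted-average response rate across segments.

    segments: list of (share_of_file, response_rate) pairs whose shares sum to 1.
    """
    return sum(share * rate for share, rate in segments)


# Hypothetical numbers, for illustration only.
SHARE_MEN, SHARE_WOMEN = 0.30, 0.70  # the file skews female
CONTROL_RATE = 0.10                  # under the control, everyone responds alike

test_rate_men = 0.16    # men respond better to the test idea
test_rate_women = 0.07  # women respond worse

control_avg = blended_rate([(SHARE_MEN, CONTROL_RATE), (SHARE_WOMEN, CONTROL_RATE)])
test_avg = blended_rate([(SHARE_MEN, test_rate_men), (SHARE_WOMEN, test_rate_women)])

print(f"control average: {control_avg:.3f}")
print(f"test average:    {test_avg:.3f}")
print(f"lift among men:  {test_rate_men - CONTROL_RATE:+.3f}")
```

With these made-up inputs the test averages 9.7% against a 10% control and is declared a loser, even though it lifts response among men by six points.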

But wait a second, nobody in fundraising thinks all supporters are the same; after all, our sector segments the crap out of the file. There are cash, monthly and lapsed supporters, supporters who also do advocacy, total nonsense RFM groupings, and the even more nonsensical demographics and psychographics used to highlight all the lovely differences. And then the crème de la crème, personas: either statistically derived groups of donors based on a potpourri of random attributes or, worse, just made-up archetypes that likely describe no single donor well (your first indication to ignore them entirely) and, more to the point, fail to explain the behavior of a single donor (your second indication to ignore them entirely).

On the one hand, we segment like crazy on random but deceptively alluring crap. On the other, we do the vast majority of our testing with the random nth assignment. That is a tacit assumption that our test idea is so powerful it will supersede all these differences we think exist and be a winner, all because most people like it better than the control idea; because most people are the same…

Human beings are complex and, yes, very different from each other, but not in the ways we traditionally think. (We’ve written extensively about why demographics are garbage here).

The takeaway point, which may be the most misapplied and misunderstood concept in fundraising and its biggest missed opportunity: the main reason to segment human beings is that you have discovered groups with different reasons for supporting you, and those reasons warrant different treatments.

Instead, most charities stick to the generic average, which is exactly what most controls represent: an average idea that is average for most people. We then bestow an enormous gift on this average idea, one that gives it enormous power having nothing to do with the quality of the idea itself: exposure via volume over time.

In short, think time. Upfront think time to formulate a test that is based on a hypothesis about human behaviour. Put some desk research, a lit review or a consultation with a subject matter expert (i.e. rigour) into what is too often a slavish adherence to the idea and process of testing rather than to the ideas themselves.

Here are test results and an illustration of the upside this type of thinking and rigour can provide. 

So how the hell do you beat the (average) control with all this artificial power?

This is a year-long test with newly acquired donors. The metric that matters is retention rate.

Two test groups were created using the standard random nth approach, illustrated below with our blue, green, yellow and orange people. The treatment is sending additional ‘engagement’ (i.e. no-ask) communications, beyond the control number of comms, to newly acquired donors: the heavy group gets 12 more during the year, the light group gets 6.
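For readers less familiar with the mechanics, here is a minimal sketch of one common way to do a random nth split into a control and the two test groups described above. The function name, the donor IDs and the seed are hypothetical; only the group labels and the 6 and 12 extra communications come from the test design.

```python
import random


def random_nth_assign(donor_ids, groups=("control", "light", "heavy"), seed=None):
    """Shuffle the file, then deal every nth donor into each group in turn."""
    rng = random.Random(seed)
    shuffled = list(donor_ids)
    rng.shuffle(shuffled)
    n = len(groups)
    return {donor_id: groups[i % n] for i, donor_id in enumerate(shuffled)}


# Extra no-ask communications per group over the year (from the test design).
EXTRA_COMMS = {"control": 0, "light": 6, "heavy": 12}

# Hypothetical file of 300 newly acquired donors, identified by number.
assignment = random_nth_assign(range(1, 301), seed=42)
sizes = {g: sum(1 for v in assignment.values() if v == g) for g in EXTRA_COMMS}
print(sizes)
```

The split guarantees equal-sized groups that match on average. It does nothing to tell you which kinds of donor sit inside each group, which is the point the rest of this post turns on.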

At the end of the year, the yellow bars reflect failure. There was no difference in retention rate between the test groups and the control, but more money and time were spent on the additional comms, so the tests are big losers.

But what if, hypothetically, the blue people loved it and the orange people hated it, as the callout suggests? We had a winning idea for an important, sizeable subgroup (blue people) but because we treated everyone the same, that finding was lost. This happens all the time.

Kevin Schulman F2F blog image 1

You cannot find your blue people after the fact by breaking out the average results (i.e. the yellow bar) with all the data on your CRM. Why? It isn’t because you won’t find differences. In fact, the more data you have appended to your CRM, the more differences you will find. This approach is what is referred to as a fishing expedition: random hunting for what amounts to statistical noise. Lots and lots of noise.
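The arithmetic behind the fishing expedition is simple. The sketch below assumes each after-the-fact breakout is an independent significance test at the conventional 5% threshold; both the independence and the threshold are simplifying assumptions for illustration, not something taken from this test.

```python
def chance_of_false_positive(num_comparisons, alpha=0.05):
    """Probability that at least one comparison looks 'significant'
    by chance alone, assuming independent tests and no real effect."""
    return 1 - (1 - alpha) ** num_comparisons


def expected_false_positives(num_comparisons, alpha=0.05):
    """Average number of spurious 'findings' when there is no real effect."""
    return num_comparisons * alpha


# Each CRM attribute you break the results out by is one more comparison.
for k in (1, 10, 50, 200):
    print(k, round(chance_of_false_positive(k), 3), round(expected_false_positives(k), 1))
```

Under these assumptions, breaking the results out by 50 appended attributes makes at least one spurious 'difference' more likely than not by a wide margin, even if the treatment did nothing for anyone.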

To find your blue people you need a hypothesis upfront about what makes blue people different from orange and then design a test to try and find support for this theory. 

This is exactly what we did. Our hypothesis was not that sending more no-ask communications would increase retention. 


Our hypothesis was two-fold:

1) Higher Commitment (DonorVoice’s proprietary measure of relationship strength) donors require less content designed to build a relationship that is already built. In fact, this additional content will create irritation, suggesting the charity doesn’t know them as supporters and, by extension, is wasting their money. Sending more stuff will therefore make retention worse.

2) Low Commitment donors would benefit, to a point, from some additional content designed to build that relationship.

At the point of acquisition we measured Commitment for every single donor in the control and test groups.


The real analysis, not the ‘fake news’ analysis that assumes everybody is the same, required analyzing the High Commitment donors (blue and green people) separately from the Low Commitment donors (orange and yellow people). 
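The separate analysis described above can be sketched as follows. This is a minimal illustration, not DonorVoice’s actual method: the record layout, the function names and the toy data are all hypothetical, and the toy numbers are made up to show the mechanics rather than to reproduce the Amnesty results.

```python
from collections import defaultdict


def retention_by_segment(donors):
    """donors: iterable of (commitment, group, retained) tuples.

    Returns {(commitment, group): retention_rate}.
    """
    counts = defaultdict(lambda: [0, 0])  # (commitment, group) -> [retained, total]
    for commitment, group, retained in donors:
        cell = counts[(commitment, group)]
        cell[0] += int(retained)
        cell[1] += 1
    return {key: kept / total for key, (kept, total) in counts.items()}


def lift_vs_control(rates):
    """Retention difference between each test cell and the control cell
    within the same Commitment segment."""
    lifts = {}
    for (commitment, group), rate in rates.items():
        if group == "control":
            continue
        control_rate = rates.get((commitment, "control"))
        if control_rate is not None:
            lifts[(commitment, group)] = rate - control_rate
    return lifts


# Toy data, entirely made up: (commitment, group, retained at 12 months).
toy = (
    [("high", "control", True)] * 3 + [("high", "control", False)] * 1
    + [("high", "heavy", True)] * 2 + [("high", "heavy", False)] * 2
    + [("low", "control", True)] * 2 + [("low", "control", False)] * 2
    + [("low", "heavy", True)] * 3 + [("low", "heavy", False)] * 1
)

lifts = lift_vs_control(retention_by_segment(toy))
print(lifts)
```

In this toy data the pooled retention rate is identical for control and heavy (5 of 8 in each), yet the segment view shows the treatment hurting High Commitment donors and helping Low Commitment donors by the same amount. The key design choice is that Commitment is measured before the treatment and named in the hypothesis upfront, so this is a planned comparison rather than a fishing expedition.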

As you can see, the results are exactly as hypothesised and massive. All of this massive impact (positive and negative) was hidden in the random nth, ‘fake news’ world where we assume everyone is the same.

If any charity (or agency) has discovered another way to increase first-year retention by 12 points, let us know. This sort of outcome has never yet come from an internal brainstorm, a whiteboard session or an agency response to a brief. It only comes about with rigour and subject matter expertise about why people behave as they do, a well-designed test and the proper analysis.

But the starting point is accepting that people are indeed different, just hardly ever in the traditional ways we currently slice and dice the world.

And for a free consultation on all of this, email DonorVoice Managing Director, Charlie


Kevin Schulman F2F blog image 2


Kevin Schulman, founder of DonorVoice

Kevin Schulman is the founder and Managing Partner of DonorVoice, a retention and donor experience company serving non-profits in the US, Canada, Europe, UK and Australia. Before founding DonorVoice, Kevin served as CEO of a modeling and analytics company servicing non-profits. Visit DonorVoice website.

