2 The Sampling Challenge


Now it's time for your first real decision.

You have 2,000 women in the program, spread across 15 villages. How many do you need to sample for a reliable evaluation?

You could use a tool like the IndiKit Sampling Calculator—a free tool many of you may know. For a population of 2,000, with 95% confidence and a 5% margin of error, IndiKit calculates you need a minimum of 322 participants.

What do those numbers actually mean?

  • 95% confidence means that if you ran this survey 100 times, in about 95 of those runs the resulting interval would contain the true value.
  • 5% margin of error means that if your result shows a 40% increase, the true figure is somewhere between 35% and 45%.

It's good practice to round up to 350, not just 322. That gives you a buffer for non-response, sampling errors, and dropouts. That's real-world MEAL practice.
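IndiKit doesn't publish its internals, but the standard calculation behind a figure like 322 is Cochran's formula with a finite population correction. Here is a minimal sketch in Python; the function name and the conservative p = 0.5 default are my assumptions, not part of the tool:

```python
def cochran_sample_size(population, z=1.96, margin=0.05, p=0.5):
    """Cochran's formula with finite population correction.

    Assumes maximum variability (p = 0.5), the conservative default
    most sampling calculators use. z = 1.96 corresponds to 95% confidence.
    """
    n0 = (z**2 * p * (1 - p)) / margin**2    # infinite-population sample size (~384)
    n = n0 / (1 + (n0 - 1) / population)     # shrink for a finite population
    return round(n)

print(cochran_sample_size(2000))  # 322 — matches the IndiKit result for N = 2,000
```

Note how weakly the result depends on total population: for 20,000 women the answer only rises to about 377. That is why scaling up a program does not mean scaling up the sample proportionally.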

Why does this confidence level matter? Because donors will ask: "How confident are you in these numbers?" You need to be able to say, "We are 95% confident, plus or minus 5%." That's the difference between credible evaluation and guesswork.


Understanding the Context

Now the question is: how do you choose which 350 women will be involved in the evaluation? Before you answer, you need to understand the context.

Village types:

  • 5 market towns: Good road access, 30 minutes or less to market
  • 6 rural/semi-urban villages: Moderate access, 2–3 hours by motorcycle
  • 4 remote villages: River crossings, 4+ hours to market, seasonal road closures

Not all villages are the same. Five are near market towns where women can sell products easily. Six are rural but accessible. Four are remote—dirt roads become mud during the rains, sometimes completely cut off.

This changes everything about your evaluation design.


Choosing Your Sampling Strategy

With this context in mind, how should you select your 350 women? What's your sampling strategy?

Option A: Random sampling Every woman has an equal chance of being selected—like picking names out of a hat.

Option B: Stratified sampling Intentionally sample from different locations and subgroups.

Option C: Purposive sampling Handpick women to represent different situations.

Think about it: which approach will show you the full picture of how your project performs across different contexts?


Why Stratified Sampling?

Most likely, you chose stratified sampling. Here's why the other two options fall short:

Random sampling sounds mathematically fair in theory. But in the real world, no field team is going to trek to the farthest village just for three or four interviews. They may substitute or skip. Your data ends up weighted towards accessible villages, and remote women almost disappear. Purely random sampling creates a hidden bias.

Purposive sampling is where we handpick informative cases. But whose story is the "right" story? The risk is we choose only shining examples—those who live near market centers, those who are better organised or more literate. We call that women's economic empowerment, but that's not evaluation. That's marketing.

Stratified sampling deliberately allocates those 350 interviews in proportion to the actual population. Or, if you want to hear the voices of remote women more strongly, you may even slightly over-sample in remote areas.

Our final allocation:

  • 69 from remote villages
  • 158 from semi-urban areas
  • 123 from market centres

This means every type of reality is forced into your data. You'll see the exact places where the program is failing—and you can fix it before wasting millions scaling up the wrong model.

Stratified sampling isn't just statistically correct here. It's the only way to get the truth when geography shapes opportunity so dramatically.
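The arithmetic behind an allocation like this is plain proportional allocation. The per-stratum headcounts below are hypothetical (the lesson doesn't give them); the largest-remainder step simply guarantees the slots sum to exactly 350 after rounding:

```python
def proportional_allocation(stratum_sizes, total_sample):
    """Allocate sample slots in proportion to stratum population,
    using the largest-remainder method so slots sum to total_sample."""
    population = sum(stratum_sizes.values())
    quotas = {k: total_sample * v / population for k, v in stratum_sizes.items()}
    alloc = {k: int(q) for k, q in quotas.items()}   # floor each quota
    leftover = total_sample - sum(alloc.values())
    # hand the remaining slots to the strata with the largest fractional parts
    for k in sorted(quotas, key=lambda k: quotas[k] - alloc[k], reverse=True)[:leftover]:
        alloc[k] += 1
    return alloc

# Hypothetical headcounts per village type (a real sampling frame supplies these)
women = {"remote": 394, "semi_urban": 903, "market": 703}
print(proportional_allocation(women, 350))
# → {'remote': 69, 'semi_urban': 158, 'market': 123} with these headcounts
```

To over-sample remote voices, you would simply inflate the remote quota before allocating, then apply sampling weights at analysis time to keep overall estimates unbiased.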


Adding Layers of Stratification

Once we've locked in the 350 women using geographic stratification, we don't stop there. We add a second layer inside those 350 spots, so the sample tells us the full story—not just a success story.

Stratify by economic activity:

  • Farmers
  • Poultry keepers
  • Traders

For each group, their challenges and results could be completely different. We need all three voices.

Stratify by age:

  • 18–30
  • 31–45
  • 46+

Younger women often have different levels of access to technology, training, and capital than older women. If we only interview one age band, we miss half the picture.

Stratify by participation level:

  • Active, committed participants
  • Women who joined but dropped out early

This is critical. If we only interview women who are still active, we hear nothing but positive reviews: "The training changed my life. My income went up 50%. Everybody is happy."

But the dropouts tell you the real barriers. They'll tell you the meetings were too far away, they couldn't manage childcare, or the loan repayment schedule didn't match their harvest season.

Early dropouts are your early warning system. If 30–40% of women in remote villages drop out in the first three months, that's not a small detail. That's the reason your whole model could fail when you try to reach the next 20,000 women.

By building active and dropout into the stratification, you guarantee those uncomfortable truths show up in the data. You can't hide them. You can't average them away. And then you don't scale a fantasy.

The final sample isn't just geographically representative. It's brutally honest. That's how evaluation becomes useful rather than just performative.


Reflection Questions

  1. Think about your last evaluation sample. Was it genuinely representative of all contexts and participant types—or did it accidentally over-represent the easier-to-reach, more successful cases?
  2. Have you ever included dropouts in your evaluation sample? If not, what barriers or uncomfortable truths might you have missed?
  3. What stratification layers would be most important for a program you're currently working on? Geography? Gender? Participation level? Something else?

