If you’ve started researching ESSA evidence requirements, you’ve probably encountered terms like “quasi-experimental design” and “randomized controlled trial” within the first few paragraphs — and then closed the tab. The language used to describe educational research methodology can feel like it belongs to a different profession entirely, and for edtech founders and product leaders, that’s a real barrier. This is a plain-language breakdown of what each ESSA tier actually requires, why Tier II is the most practically achievable milestone for the majority of edtech companies, and what “quasi-experimental” actually means when you move from definition to implementation.
What Are the Four ESSA Evidence Tiers, and How Do They Differ?
ESSA’s four tiers exist on a spectrum of research rigor. The higher the tier, the more confident you can be that a product’s positive results were actually caused by the product — and not by some other factor in the environment.
| Tier | What It Requires | What It Can Claim |
|---|---|---|
| Tier IV (Rationale) | A logic model grounded in existing research. No original study required. | The product’s design is informed by evidence. Not that it produced results. |
| Tier III (Promising) | An original study measuring outcomes before and after product use, with statistical controls. | Students improved. Can’t rule out that they would have improved anyway. |
| Tier II (Moderate) | A quasi-experimental study comparing students who used the product to a matched group who didn’t. | Meaningful causal evidence that the product drove the difference. |
| Tier I (Strong) | A randomized controlled trial with randomly assigned treatment and control groups. | The strongest possible causal claim. The gold standard. |
Tier I is the randomized controlled trial — the gold standard of research in virtually every field, and also the most logistically demanding. Students or classrooms are randomly assigned to use the product or not, which eliminates the selection bias that other designs can only approximate. For most edtech companies, it’s the long-term goal, not the starting point.
The practical reality: Tier II is the evidence standard that most edtech companies with a few years of real-world implementation can credibly pursue — without a randomized trial, without asking districts to withhold your product from any students, and often using data that already exists.
What Is a Quasi-Experimental Design in Education Research?
The term sounds more intimidating than the concept. A quasi-experimental design (QED) is simply a study that compares outcomes between two groups — one that used your product and one that didn’t — without randomly assigning who went into which group.
In a true experiment, a researcher randomly decides which classrooms get the intervention. In a quasi-experiment, the groups form the way they naturally do in schools: some teachers adopted your platform, others didn’t; some districts purchased your curriculum, others continued with what they had. The researcher’s job is to find a comparison group that’s as similar as possible to the treatment group, then use statistical methods to account for any remaining differences.
What makes a QED credible is the quality of that comparison — how well-matched the groups are at the outset, whether they’re comparable on prior achievement and demographics, and whether the analysis accounts for the ways they differ. A well-executed QED can generate strong causal evidence. A poorly executed one, where the two groups were fundamentally different from the start, can be misleading even if the math is technically correct.
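To make the matching step concrete, here is a minimal, illustrative sketch in Python (pandas and NumPy) using simulated data and hypothetical column names like `prior_score` and `frl`. It performs a greedy one-to-one nearest-neighbor match within demographic strata, which is a simplification of the propensity-score methods a research team would typically apply:

```python
import numpy as np
import pandas as pd

# Simulated student-level data with hypothetical columns: 'used_product'
# flags the treatment group, 'prior_score' is a baseline assessment, and
# 'frl' is a demographic indicator (e.g., free/reduced lunch eligibility).
rng = np.random.default_rng(42)
n = 400
df = pd.DataFrame({
    "used_product": rng.integers(0, 2, n),
    "prior_score": rng.normal(500, 50, n),
    "frl": rng.integers(0, 2, n),
})

treated = df[df["used_product"] == 1]
pool = df[df["used_product"] == 0].copy()

matched_ids = []
for _, student in treated.iterrows():
    # Restrict candidates to comparison students with the same demographic
    # profile, then take the closest prior score (greedy 1:1 matching).
    candidates = pool[pool["frl"] == student["frl"]]
    if candidates.empty:
        continue
    best = (candidates["prior_score"] - student["prior_score"]).abs().idxmin()
    matched_ids.append(best)
    pool = pool.drop(best)  # match without replacement

comparison = df.loc[matched_ids]
print(f"Matched {len(comparison)} of {len(treated)} treated students")
print(f"Mean prior score, treated:    {treated['prior_score'].mean():.1f}")
print(f"Mean prior score, comparison: {comparison['prior_score'].mean():.1f}")
```

The point of the exercise is that final comparison of baseline means: if the matched groups start out looking alike on prior achievement, the end-of-year comparison becomes far more interpretable.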
What Study Designs Qualify for ESSA Tier II?
For most edtech companies pursuing Tier II, there are two designs worth understanding in depth:
**Matched comparison group design.** Researchers pull students from non-implementing schools in the same district, match them to treatment students on prior test scores and demographics, then compare end-of-year performance. Concurrent, clean, and favored by reviewers like Evidence for ESSA.

**Cohort comparison design.** This year’s students using the product are compared to last year’s students from the same schools who didn’t have access to it. Logistically easier — same schools, same assessments — but requires care if anything meaningful changed between years.
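To see what the cohort analysis can look like, here is a minimal sketch in Python (pandas and statsmodels), again with simulated data and hypothetical column names. It fits an ANCOVA-style regression in which the cohort indicator estimates the implementation-year difference after adjusting for baseline achievement; a real study would add demographic covariates and cluster-aware standard errors:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated two-cohort data from the same schools: 'cohort' is 1 for the
# implementation year and 0 for the prior, pre-implementation year.
rng = np.random.default_rng(0)
n = 600
cohort = rng.integers(0, 2, n)
prior_score = rng.normal(500, 50, n)
# Outcome built with a modest bump for the implementation cohort, so the
# model has a real effect to recover.
outcome = 0.8 * prior_score + 10 * cohort + rng.normal(0, 30, n)
df = pd.DataFrame({"cohort": cohort,
                   "prior_score": prior_score,
                   "outcome": outcome})

# ANCOVA-style model: the coefficient on 'cohort' estimates the
# implementation-year difference after adjusting for baseline achievement.
model = smf.ols("outcome ~ cohort + prior_score", data=df).fit()
print(model.summary().tables[1])
```

The coefficient on `cohort` is the headline estimate here, which is exactly why documenting what else changed between the two years matters so much.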
LXD Research has used both approaches in recent client studies. Our Just Right Reader work in 2024–25 used a matched comparison group design — comparing students in classrooms using decodable text libraries to demographically similar students in the same district. Our first-grade study used a cohort comparison design, setting the 2024–25 implementation cohort against the 2023–24 pre-implementation cohort from the same schools. Both produced statistically significant literacy outcomes, and both approaches are legitimate pathways to Tier II evidence.
We’ve put together videos walking through each of these study designs in plain language — including what makes a comparison group credible and where cohort studies fall short. Watch on the LXD Research YouTube channel →
How Do You Recruit Schools and Districts for a Tier II ESSA Study?
This is one of the most practical questions edtech companies ask, and it doesn’t get discussed nearly enough in research methodology guides. The honest answer is that recruitment for a retrospective QED usually starts with the districts you already have.
You’re not asking anyone to change what they’re doing — you’re asking for permission to analyze data from what already happened. That’s a meaningfully lower bar than asking a district to participate in a prospective randomized trial, where they’d need to agree to assign some schools or classrooms to a control condition. In practice, this means identifying districts where your product has been implemented with reasonable fidelity for at least a year, where student-level outcome data is available, and where district leadership is willing to sign a data sharing agreement.
For the comparison group, within-district comparison is the cleanest option: schools in the same district that didn’t adopt your product share the same policies, assessment instruments, and calendar, which controls for a lot of noise. If the whole district adopted your product, neighboring districts with similar demographics are workable with more careful matching. A prior-year cohort from the same schools is the third path. The main barrier to recruitment isn’t usually willingness — most districts are genuinely interested in knowing whether something worked. It’s data infrastructure.
Do All Tier II Studies Have to Meet the Same Evidence Standard?
This nuance matters a lot in practice, and it’s often glossed over. The short answer is no. When people say “Tier II,” they’re describing a category of study design — quasi-experimental with a comparison group — but there’s a meaningful range of quality within that category, and different evaluation contexts hold studies to different standards.
Sites like Evidence for ESSA, run by the Johns Hopkins Center for Research and Reform in Education, apply rigorous review criteria: baseline equivalence between groups must be established, attrition must be documented, outcome measures must be valid and independent, and minimum sample size thresholds must be met. A study that calls itself quasi-experimental but doesn’t demonstrate baseline equivalence is unlikely to pass their review.
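If you want to gut-check baseline equivalence before a formal review, the core statistic is straightforward to compute. Here is a minimal NumPy sketch of the standardized mean difference on a hypothetical prior-achievement measure; the thresholds in the closing comment reflect the What Works Clearinghouse conventions that reviews like this are generally aligned with:

```python
import numpy as np

def standardized_mean_difference(treat, comp):
    """Baseline difference in pooled-standard-deviation units (an effect
    size akin to Hedges' g, without the small-sample correction)."""
    treat = np.asarray(treat, dtype=float)
    comp = np.asarray(comp, dtype=float)
    nt, nc = len(treat), len(comp)
    pooled_sd = np.sqrt(
        ((nt - 1) * treat.std(ddof=1) ** 2 + (nc - 1) * comp.std(ddof=1) ** 2)
        / (nt + nc - 2)
    )
    return (treat.mean() - comp.mean()) / pooled_sd

# Hypothetical prior-achievement scores for the two groups.
rng = np.random.default_rng(7)
smd = standardized_mean_difference(rng.normal(502, 50, 200),
                                   rng.normal(498, 50, 200))
print(f"Baseline SMD: {smd:.3f}")
# Under WWC-style conventions: |SMD| <= 0.05 satisfies equivalence outright,
# 0.05-0.25 requires statistical adjustment, and > 0.25 fails.
```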
State rubrics for edtech procurement, on the other hand, are often more inclusive. Many states award meaningful evidence points to any study using a comparison group design with positive effects — even if it wouldn’t clear all of Evidence for ESSA’s methodological hurdles. This isn’t a loophole. It’s a realistic reflection of the market. If your goal is to inform district procurement decisions or compete in state RFPs, a well-designed QED that falls slightly below Evidence for ESSA’s bar can still be highly valuable. If your goal is specifically to appear on their list, you’ll want to design for their criteria from the outset.
What Are the Common Challenges in Conducting a Quasi-Experimental Study for ESSA?
Data access is the most common obstacle — not all districts have the infrastructure or willingness to share student-level data for research purposes, and building those relationships takes time. Comparison group availability also varies; in a small district where virtually every school adopted your product, finding a clean comparison population locally may not be possible.
Statistical matching can reduce but not eliminate selection bias. If schools that chose your product were systematically different from those that didn’t — say, they had more tech-forward administrators or stronger professional development resources — that difference could influence results even after matching. Cohort comparison designs carry their own caveat: if something meaningful changed between the comparison year and the implementation year, the comparison loses interpretive power. A research partner can help you assess whether a cohort design makes sense for your specific context, or whether a concurrent comparison group would be more defensible.
None of this means Tier II is out of reach. It means the quality of the study depends significantly on the quality of the data and the rigor of the design — which is exactly why having an experienced research partner matters as much as having the right implementation history.
Does Your Implementation History Support a Tier II Study?
LXD Research offers a free consultation to help you assess your options before committing to a research plan. We’ll look at your existing data, your comparison group options, and the evaluation context you’re designing for.
Schedule a Free Consultation →
View Our Services →