Cohen's Kappa Calculator for Reviewer Agreement

Enter how your two reviewers classified the same set of records and this free calculator returns Cohen's kappa, the observed agreement, the agreement expected by chance, and an interpretation. It is built for the title and abstract screening stage of a systematic review.

Agreement table

Enter how many records fell into each cell when two reviewers screened the same set of titles and abstracts.

                      Reviewer B: Include     Reviewer B: Exclude
Reviewer A: Include   both include            A include, B exclude
Reviewer A: Exclude   A exclude, B include    both exclude

The two green cells, where both reviewers chose Include or both chose Exclude, are the records they agreed on. The other two cells are the disagreements.

Your agreement

Enter the counts for at least one cell to calculate Cohen's kappa and the observed agreement between your two reviewers.

Why kappa, and not just percent agreement

When two reviewers screen the same records, you could simply report the percentage they classified the same way. The problem is that two reviewers who each exclude most records will agree often by chance alone, so a high raw agreement can hide poor reliability. Cohen's kappa corrects for that chance agreement, which is why journals and methodologists commonly expect it, rather than a bare percentage, as the measure of inter-rater reliability.
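To make the gap between raw agreement and kappa concrete, here is a small Python sketch using a hypothetical pilot of 100 records in which both reviewers exclude almost everything. The counts and variable names are illustrative assumptions, not data from a real review.

```python
# Hypothetical screening pilot of 100 records (illustrative numbers only).
both_include = 2   # both reviewers said Include
a_only = 5         # Reviewer A Include, Reviewer B Exclude
b_only = 5         # Reviewer A Exclude, Reviewer B Include
both_exclude = 88  # both reviewers said Exclude

n = both_include + a_only + b_only + both_exclude

# Raw (observed) agreement: share of records classified the same way.
percent_agreement = (both_include + both_exclude) / n

# Chance agreement, from each reviewer's overall Include rate.
a_include = (both_include + a_only) / n
b_include = (both_include + b_only) / n
chance_agreement = a_include * b_include + (1 - a_include) * (1 - b_include)

kappa = (percent_agreement - chance_agreement) / (1 - chance_agreement)

print(f"percent agreement: {percent_agreement:.2f}")  # 0.90
print(f"chance agreement:  {chance_agreement:.2f}")   # 0.87
print(f"kappa:             {kappa:.2f}")              # 0.23
```

The reviewers match on 90% of records, yet about 87% agreement would be expected from their exclusion rates alone, so kappa comes out near 0.23.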

How to read your result

The calculator reports kappa alongside the observed and chance agreement so you can see how much of the raw agreement was genuine. The interpretation band uses the widely cited Landis and Koch benchmarks:

  • Below 0.00: poor, worse than chance
  • 0.00 to 0.20: slight
  • 0.21 to 0.40: fair
  • 0.41 to 0.60: moderate
  • 0.61 to 0.80: substantial
  • 0.81 to 1.00: almost perfect
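
The bands above can be sketched as a small lookup in Python. The function name `interpret_kappa` is hypothetical, and the treatment of values that fall between two printed bands (such as 0.205) is an assumption of this sketch, not necessarily what the calculator does.

```python
def interpret_kappa(kappa):
    """Map a kappa value to its Landis and Koch label."""
    if not -1.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie between -1 and 1")
    if kappa < 0.0:
        return "poor"
    # Upper bounds are inclusive. Assumption: a value between two printed
    # bands (for example 0.205) is assigned to the higher band.
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"


print(interpret_kappa(0.70))  # substantial
```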

These bands are conventions, not hard rules. A low kappa is a prompt to revisit your eligibility criteria and re-pilot, since the usual cause is an ambiguous criterion that the two reviewers interpreted differently. Our guide to running a reliable two-reviewer screening process covers how to calibrate before you screen at scale.

Where agreement fits in the review

Reviewer agreement is a checkpoint inside the screening stage, the same stage whose totals you report in your PRISMA 2020 flow diagram. A calibrated, well-documented screening process is what makes the records screened and records excluded counts in that diagram defensible. The kappa from your pilot also fits naturally under the selection process item of the PRISMA reporting checklist, which asks how many reviewers screened each record and whether they worked independently.

Frequently asked questions

What is Cohen's kappa used for?

Cohen's kappa measures how well two reviewers agree when they classify the same items, correcting for the agreement you would expect by chance alone. In a systematic review it is most often used to report inter-rater reliability at the title and abstract screening stage, where two reviewers each decide whether each record should be included or excluded.

How do you calculate Cohen's kappa?

Build a table of how the two reviewers classified each record, then compute the observed agreement (the proportion of records they classified the same way) and the expected agreement (the proportion they would match by chance, from the marginal totals). Kappa is the observed agreement minus the expected agreement, divided by one minus the expected agreement. The calculator above does this from your four cell counts.
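
As a minimal sketch of those steps in Python, assuming the four counts are entered in the same layout as the agreement table: the function name `cohens_kappa`, its parameter names, and the choice to return `None` when kappa is undefined are assumptions of this example, not a description of the calculator's internals.

```python
def cohens_kappa(both_include, a_only, b_only, both_exclude):
    """Return (kappa, observed, expected) for a 2x2 agreement table.

    a_only: Reviewer A Include, Reviewer B Exclude
    b_only: Reviewer A Exclude, Reviewer B Include
    """
    counts = (both_include, a_only, b_only, both_exclude)
    if any(c < 0 for c in counts):
        raise ValueError("cell counts must be non-negative")
    n = sum(counts)
    if n == 0:
        raise ValueError("at least one record is required")

    # Observed agreement: proportion classified the same way.
    observed = (both_include + both_exclude) / n

    # Expected agreement from the marginal totals of each reviewer.
    a_include = (both_include + a_only) / n
    b_include = (both_include + b_only) / n
    expected = a_include * b_include + (1 - a_include) * (1 - b_include)

    # If both reviewers put every record in the same single category,
    # expected agreement is 1 and kappa is undefined (division by zero).
    if expected == 1:
        return None, observed, expected

    kappa = (observed - expected) / (1 - expected)
    return kappa, observed, expected


# Hypothetical counts: observed 0.85, expected 0.50, kappa about 0.70.
kappa, observed, expected = cohens_kappa(40, 5, 10, 45)
print(kappa, observed, expected)
```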

What is an acceptable kappa value?

Using the common Landis and Koch benchmarks, a kappa of 0.41 to 0.60 is moderate, 0.61 to 0.80 is substantial, and 0.81 or above is almost perfect. Many review teams aim for substantial agreement or better before screening continues, and resolve a low kappa by clarifying the eligibility criteria and re-piloting on a fresh sample rather than pressing on.

Can you calculate Cohen's kappa in Excel?

Yes, Excel can compute kappa from a two-by-two agreement table using the observed and expected agreement formulas, though it has no single built-in kappa function so you build it from cell references. The calculator on this page saves that setup: enter the four counts and it returns kappa, the observed agreement, and the chance agreement directly.

What is the difference between ICC and Cohen's kappa?

Cohen's kappa is for categorical decisions by two raters, such as include or exclude, while the intraclass correlation coefficient (ICC) is for continuous or ordinal ratings, such as a quality score out of ten. For systematic review screening, where the decision is a yes or no inclusion call, Cohen's kappa is the appropriate statistic.
