Deduplication Rate Calculator for Systematic Reviews

Enter the records each database returned and the number left after you removed duplicates. This free calculator works out your total records identified, the duplicates removed, your deduplication rate, and the exact figures to enter in your PRISMA 2020 flow diagram.

What the deduplication rate tells you

When you search several databases for a systematic review, the same study is usually indexed in more than one of them. The deduplication rate is the share of your total search results that turned out to be duplicate records of studies you had already found. It is calculated as the number of duplicate records removed divided by the total records identified, expressed as a percentage.
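
Written as a formula, the definition above looks like this. The symbol names are only shorthand introduced here for illustration: N_total is the total records identified, N_unique is the records remaining after deduplication, and N_dup is the duplicate records removed.

```latex
N_{\text{dup}} = N_{\text{total}} - N_{\text{unique}}
\qquad
\text{deduplication rate (\%)} = \frac{N_{\text{dup}}}{N_{\text{total}}} \times 100
```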

The figure matters for two practical reasons. First, the number of records to screen, meaning the count that remains once duplicates are removed, is the workload your reviewers actually face during title and abstract screening. Second, the PRISMA 2020 flow diagram asks you to report duplicates on their own line in the identification phase, so you need a clean, defensible number. For a fuller treatment of the screening counts that follow, see our guide on what every number in the PRISMA flow diagram means.

How to calculate your deduplication rate

  1. Total your database results. Add the number of records returned by each database you searched. The calculator above sums these for you as you type.
  2. Add records from other sources. Include records found through registers, citation searching, and other methods beyond database searching, since PRISMA 2020 counts these in the identification phase too.
  3. Enter the records remaining after deduplication. This is the count your reference manager or screening platform reports once duplicates are merged or deleted.
  4. Read off your figures. The duplicates removed, the deduplication rate, and the records to screen are calculated automatically, along with where each value belongs in the flow diagram.
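
If you would rather check the arithmetic yourself, the four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual code: the function name `deduplication_figures`, the `DedupFigures` container, and the one-decimal rounding are all assumptions made for this example. The input counts are the ones used in the worked example that follows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DedupFigures:
    """Hypothetical container for the four reported figures."""
    total_identified: int
    duplicates_removed: int
    dedup_rate_percent: float
    records_to_screen: int


def deduplication_figures(database_counts, other_source_count, remaining_after_dedup):
    """Steps 1 to 4: total the sources, subtract what remains, take the rate."""
    total = sum(database_counts) + other_source_count
    if total <= 0:
        raise ValueError("total records identified must be greater than zero")
    if not 0 <= remaining_after_dedup <= total:
        raise ValueError("records remaining must be between 0 and the total identified")

    duplicates = total - remaining_after_dedup
    # Rounding to one decimal place is a presentation choice assumed here.
    rate = round(100 * duplicates / total, 1)
    return DedupFigures(total, duplicates, rate, remaining_after_dedup)


# Counts taken from the worked example: three databases plus citation searching.
figures = deduplication_figures([1240, 1890, 410], 60, 2480)
print(figures)
```

The two guard clauses reject inputs that cannot be right, such as more records remaining than were identified, which usually signals a source that was left out of the total.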

A worked example

Suppose your searches returned 1,240 records from MEDLINE, 1,890 from Embase, and 410 from Cochrane CENTRAL, plus 60 records from citation searching. Your total records identified is 3,600. After importing everything into your reference manager and removing duplicates, 2,480 unique records remain.

  • Duplicate records removed: 3,600 minus 2,480 equals 1,120.
  • Deduplication rate: 1,120 divided by 3,600, which is 31.1 percent.
  • Records to screen: 2,480.

In the flow diagram you would report 3,540 records identified from databases, 60 from other sources, 1,120 duplicate records removed before screening, and 2,480 records screened. You can carry those numbers straight into the free PRISMA flow diagram generator to produce the figure.

Common mistakes when reporting duplicates

  • Counting duplicates as exclusions. Duplicates are removed before screening and never appear in the count of records excluded at screening. Mixing them up inflates your exclusion numbers and breaks the flow diagram arithmetic.
  • Deduplicating after screening has started. Removing duplicates late means the same study can be screened twice with conflicting decisions. Always deduplicate first.
  • Forgetting other sources. Records from citation searching and registers belong in the total identified. Leaving them out understates your search and your duplicate count.
  • Reporting a single combined figure for removed records. PRISMA 2020 asks you to separate duplicates from records marked as ineligible by automation tools and records removed for other reasons. Keep the duplicate figure on its own line. Our guide on removing duplicate references the right way walks through the workflow in detail.
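
One way to catch the first and last of these mistakes is to check that the identification numbers add up before you draw the diagram. The sketch below is illustrative only: the function name and parameter names are made up for this example. It uses the figures from the worked example and assumes that no records were removed by automation tools or for other reasons, which the example does not state either way.

```python
def check_identification_arithmetic(identified, duplicates_removed,
                                    removed_by_automation, removed_other_reasons,
                                    screened):
    """Return (is_consistent, expected_screened) for the identification phase.

    The three "removed before screening" figures are kept as separate
    arguments because PRISMA 2020 reports them on separate lines.
    """
    removed_before_screening = (
        duplicates_removed + removed_by_automation + removed_other_reasons
    )
    expected_screened = identified - removed_before_screening
    return expected_screened == screened, expected_screened


# Worked-example figures; zero automation and other removals is an assumption.
ok, expected = check_identification_arithmetic(3600, 1120, 0, 0, 2480)
print(ok, expected)

# Duplicates wrongly left out of the "removed before screening" box:
bad_ok, bad_expected = check_identification_arithmetic(3600, 0, 0, 0, 2480)
print(bad_ok, bad_expected)
```

If the check fails, the gap between the expected and reported screening counts tells you how many records are unaccounted for.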

Frequently asked questions

How do you calculate the deduplication rate?

Add up the records returned by every database and other source to get the total records identified. Subtract the number of records that remain after you remove duplicates. That difference is the number of duplicates removed. Divide the duplicates removed by the total records identified and multiply by 100 to express the deduplication rate as a percentage.

What is the dedup rate?

The deduplication rate is the share of your identified records that turned out to be duplicates, written as a percentage. Searches across three or more large databases such as MEDLINE, Embase, and Cochrane CENTRAL commonly land somewhere around 20 to 40 percent because the databases index many of the same journals. The rate is descriptive, not a quality threshold, so report your own figure rather than aiming for a target.

What does deduplication mean?

Deduplication is the step of merging records that describe the same article so each one is counted and screened only once. When you search several databases the same paper is returned multiple times, and every appearance counts as a separate record at identification. Deduplication resolves that overlap before title and abstract screening begins.

Does Covidence automatically remove duplicates?

Yes. Covidence runs an automatic deduplication pass as you import references and reports the number it removed, and Rayyan offers a similar detect-and-merge feature. Automated matching catches the clear repeats, but you should still review the borderline pairs the tool is unsure about, since variant titles, accents, or abbreviated journal names can split one article into two records.

How do you remove duplicates in EBSCOhost?

EBSCOhost has no built-in deduplication across separate database exports, so export your results with full bibliographic fields and deduplicate in a reference manager or screening platform instead. Import every export into one library, run the automated match on title, author, year, and identifiers, then manually adjudicate the near matches before recording the final duplicate count.
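
To make the automated match concrete, here is a rough sketch of exact matching on a normalized key. It is not how any particular reference manager or screening platform works. The record fields (`title`, `year`, `doi`), the function names, and the sample records with their DOI are all hypothetical. It matches on the DOI when one is present and on normalized title plus year otherwise, and it leaves out author matching for brevity.

```python
import re
import unicodedata


def normalize_title(title):
    """Strip accents, case, and punctuation so variant titles compare equal."""
    decomposed = unicodedata.normalize("NFKD", title)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Replace every run of non-alphanumeric characters with a single space.
    return re.sub(r"[^a-z0-9]+", " ", without_accents.lower()).strip()


def match_key(record):
    """Prefer the identifier; fall back to normalized title plus year."""
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        return ("doi", doi)
    return ("title-year", normalize_title(record["title"]), str(record.get("year", "")))


def split_duplicates(records):
    """Keep the first record seen for each key; collect the rest as duplicates."""
    seen = set()
    unique, duplicates = [], []
    for record in records:
        key = match_key(record)
        if key in seen:
            duplicates.append(record)
        else:
            seen.add(key)
            unique.append(record)
    return unique, duplicates


# Hypothetical records illustrating accent, punctuation, and case variants.
records = [
    {"title": "Étude of aspirin: a trial", "year": 2019, "doi": ""},
    {"title": "Etude of Aspirin - A Trial", "year": 2019},
    {"title": "A different study", "year": 2020, "doi": "10.1000/example"},
    {"title": "A Different Study.", "year": 2020, "doi": "10.1000/EXAMPLE"},
]
unique, duplicates = split_duplicates(records)
print(len(unique), len(duplicates))
```

Exact-key matching like this only catches the clear repeats. A record exported with a DOI and the same article exported without one will not share a key, and neither will titles that differ by a word, so those near matches still need manual adjudication before you record the final duplicate count.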
