What the deduplication rate tells you
When you search several databases for a systematic review, the same study is usually indexed in more than one of them. The deduplication rate is the share of your total search results that turned out to be duplicate records of studies you had already found. It is calculated as the number of duplicate records removed divided by the total records identified, expressed as a percentage.
The figure matters for two practical reasons. First, the number of records to screen, which is what remains once duplicates are removed, is the workload your reviewers actually face during title and abstract screening. Second, the PRISMA 2020 flow diagram asks you to report duplicate records removed as its own figure in the identification phase, so you need a clean, defensible number. For a fuller treatment of the screening counts that follow, see our guide on what every number in the PRISMA flow diagram means.
How to calculate your deduplication rate
- Total your database results. Add the number of records returned by each database you searched. The calculator above sums these for you as you type.
- Add records from other sources. Include records found through registers, citation searching, and other methods beyond database searching, since PRISMA 2020 counts these in the identification phase too.
- Enter the records remaining after deduplication. This is the count your reference manager or screening platform reports once duplicates are merged or deleted.
- Read off your figures. The duplicates removed, the deduplication rate, and the records to screen are calculated automatically, and each value is labelled with where it belongs in the flow diagram.
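If you would rather script the calculation than use the calculator, the four steps can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual code: the names `DedupSummary` and `summarize_deduplication` are hypothetical, and the counts in the usage example are made up.

```python
from dataclasses import dataclass


@dataclass
class DedupSummary:
    # Hypothetical container for the figures described in the steps above.
    total_identified: int
    duplicates_removed: int
    records_to_screen: int
    dedup_rate_percent: float


def summarize_deduplication(database_counts, other_source_count, records_after_dedup):
    """Total the sources, subtract the deduplicated count, and compute the rate."""
    # Steps 1 and 2: database results plus records from other sources.
    total = sum(database_counts.values()) + other_source_count
    if total <= 0:
        raise ValueError("total records identified must be positive")
    # Step 3: the deduplicated count cannot exceed the total.
    if not 0 <= records_after_dedup <= total:
        raise ValueError("records after deduplication must be between 0 and the total")
    # Step 4: duplicates removed and their share of the total.
    duplicates = total - records_after_dedup
    rate = 100 * duplicates / total
    return DedupSummary(total, duplicates, records_after_dedup, rate)


# Made-up counts for illustration only.
summary = summarize_deduplication({"Database A": 500, "Database B": 300}, 200, 750)
print(summary)
```

The validation step is a design choice: a deduplicated count larger than the total usually means a source was left out of the totals, so it is better to fail loudly than to report a negative duplicate count.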
A worked example
Suppose your searches returned 1,240 records from MEDLINE, 1,890 from Embase, and 410 from Cochrane CENTRAL, plus 60 records from citation searching. Your total records identified is 3,600. After importing everything into your reference manager and removing duplicates, 2,480 unique records remain.
- Duplicate records removed: 3,600 minus 2,480 equals 1,120.
- Deduplication rate: 1,120 divided by 3,600, which is about 31.1 percent.
- Records to screen: 2,480.
In the flow diagram you would report 3,540 records identified from databases, 60 from other sources, 1,120 duplicate records removed before screening, and 2,480 records screened. You can carry those numbers straight into the free PRISMA flow diagram generator to produce the figure.
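To check the arithmetic yourself, the worked example can be reproduced with a short Python script. The counts come from the example above. The variable names and the wording of the flow diagram labels are illustrative assumptions, not the exact text of the PRISMA template or the generator.

```python
# Counts taken from the worked example above.
database_records = {"MEDLINE": 1240, "Embase": 1890, "Cochrane CENTRAL": 410}
other_source_records = 60  # citation searching
records_after_dedup = 2480

from_databases = sum(database_records.values())
total_identified = from_databases + other_source_records
duplicates_removed = total_identified - records_after_dedup
dedup_rate = round(100 * duplicates_removed / total_identified, 1)

# Illustrative labels for the figures reported in the flow diagram.
flow_diagram = {
    "Records identified from databases": from_databases,
    "Records identified from other sources": other_source_records,
    "Duplicate records removed before screening": duplicates_removed,
    "Records screened": records_after_dedup,
}

for label, n in flow_diagram.items():
    print(f"{label}: n = {n:,}")
print(f"Deduplication rate: {dedup_rate}%")
```

Running it prints 3,540, 60, 1,120, and 2,480 for the four boxes and a deduplication rate of 31.1%, matching the figures above.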
Common mistakes when reporting duplicates
- Counting duplicates as exclusions. Duplicates are removed before screening, so they never belong in the count of records excluded at screening. Mixing the two inflates your exclusion numbers and breaks the flow diagram arithmetic.
- Deduplicating after screening has started. Removing duplicates late means the same study can be screened twice with conflicting decisions. Always deduplicate first.
- Forgetting other sources. Records from citation searching and registers belong in the total identified. Leaving them out understates your search and your duplicate count.
- Reporting a single combined figure for removed records. PRISMA 2020 asks you to separate duplicate records from records marked as ineligible by automation tools and records removed for other reasons. Keep the duplicate figure on its own line. Our guide on removing duplicate references the right way walks through the workflow in detail.
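One way to guard against the first and last of these mistakes is to keep each removal reason as its own value and derive the screened count from them. The sketch below assumes the three PRISMA 2020 categories for records removed before screening. The function name `removed_before_screening` and its parameters are hypothetical.

```python
def removed_before_screening(total_identified, duplicates,
                             ineligible_by_automation=0, other_reasons=0):
    """Keep each removal reason on its own line and derive records screened."""
    # Hypothetical labels mirroring the three removal categories.
    counts = {
        "Duplicate records removed": duplicates,
        "Records marked as ineligible by automation tools": ineligible_by_automation,
        "Records removed for other reasons": other_reasons,
    }
    for label, n in counts.items():
        if n < 0:
            raise ValueError(f"{label} cannot be negative")
    removed = sum(counts.values())
    if removed > total_identified:
        raise ValueError("records removed exceed records identified")
    # Whatever is left is what reviewers screen.
    return counts, total_identified - removed


# Using the worked example: only duplicates were removed before screening.
lines, screened = removed_before_screening(3600, duplicates=1120)
for label, n in lines.items():
    print(f"{label}: n = {n:,}")
print(f"Records screened: n = {screened:,}")
```

Because duplicates are subtracted before the screened count is derived, they cannot leak into the exclusions recorded later at title and abstract screening.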