The Newcastle-Ottawa Scale is a tool for judging the methodological quality of non-randomised, observational studies, specifically cohort and case-control designs, within a systematic review. It awards stars across three domains, Selection, Comparability, and Outcome (or Exposure), and the total number of stars gives a quick read on how much weight a study's findings deserve. It exists because the tools built for randomised trials do not fit observational designs, where the central threats are selection and confounding rather than randomisation failures.
Most reviews use it as a screening lens rather than a precise instrument: it flags which studies are robust and which are fragile, informing how much they should influence the conclusions.
Why observational studies need their own tool
Randomised trials and observational studies fail in different ways, so they need different quality tools. A trial's main vulnerabilities are in how randomisation and blinding were handled; a cohort or case-control study has no randomisation to scrutinise, and its main vulnerabilities are whether the groups were comparable to begin with and whether exposure and outcome were measured soundly. The Newcastle-Ottawa Scale is shaped around exactly those observational threats, which is why it sits alongside, not in competition with, trial-specific tools like the Cochrane risk of bias tool. Choosing the right instrument for the design is part of any defensible quality assessment in a systematic review.
How the star system works
The scale is scored by awarding stars, with separate versions for cohort and case-control studies; the third domain is Outcome in the cohort version and Exposure in the case-control version. A study can earn a maximum of nine stars distributed across three domains.
| Domain | What it assesses | Maximum stars |
|---|---|---|
| Selection | How representative and well-defined the groups are | 4 |
| Comparability | Whether the design or analysis controls for confounders | 2 |
| Outcome / Exposure | How soundly the outcome or exposure was ascertained | 3 |
Selection has four items and Outcome or Exposure has three, and each item earns at most one star if the study meets the stated standard. Comparability is the exception: it is a single item worth up to two stars, one for controlling for the most important confounder and one for controlling for an additional factor. Reading the manual's decision rules for each item is what keeps scoring consistent between reviewers; awarding stars by impression rather than by the rules is the main source of disagreement.
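The tally described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the scale itself: the `NosScore` class and its field names are hypothetical, and it assumes the per-item judgments have already been made from the manual, so it only records the domain subtotals, checks them against the domain caps, and sums them.

```python
from dataclasses import dataclass

# Domain caps from the table above: Selection 4, Comparability 2, Outcome/Exposure 3.
DOMAIN_MAX = {"selection": 4, "comparability": 2, "outcome": 3}


@dataclass(frozen=True)
class NosScore:
    """Hypothetical container for one study's stars, by domain."""
    selection: int
    comparability: int
    outcome: int  # Outcome (cohort version) or Exposure (case-control version)

    def __post_init__(self):
        # Reject subtotals that fall outside the range each domain allows.
        for domain, cap in DOMAIN_MAX.items():
            stars = getattr(self, domain)
            if not 0 <= stars <= cap:
                raise ValueError(f"{domain} must be between 0 and {cap}, got {stars}")

    @property
    def total(self) -> int:
        # The overall score is the plain sum of the three domain subtotals.
        return self.selection + self.comparability + self.outcome


# Illustrative values only, not taken from any real study.
example = NosScore(selection=3, comparability=2, outcome=2)
print(example.total)  # 7
```

Keeping the domain subtotals rather than only the total makes it possible to report where a study lost its stars, not just how many it earned.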
What counts as a good score
There is no universal cutoff fixed by the scale's authors, but a widely used convention treats roughly seven to nine stars as good or high quality, around five to six as moderate or fair, and below five as low quality. Because these thresholds are conventions rather than validated rules, you should state in your methods exactly which cutoffs you applied, so readers can judge your quality categories on their own terms.
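Because the cutoffs are a convention, it can help to make them explicit parameters rather than burying them in the analysis. The sketch below assumes the convention described above (seven or more stars is high, five or six is moderate, fewer than five is low); the function name and parameter names are hypothetical, and a review using different thresholds would simply pass different values.

```python
def quality_category(total_stars: int, high_min: int = 7, moderate_min: int = 5) -> str:
    """Map a total star count to a quality label using stated cutoffs.

    The defaults follow a common convention, not a rule set by the scale's authors.
    """
    if not 0 <= total_stars <= 9:
        raise ValueError(f"total_stars must be between 0 and 9, got {total_stars}")
    if total_stars >= high_min:
        return "high"
    if total_stars >= moderate_min:
        return "moderate"
    return "low"


print(quality_category(8))  # high
print(quality_category(5))  # moderate
print(quality_category(4))  # low
```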
How to interpret and use the result
The total stars are not a verdict to be reported and forgotten; they should inform the synthesis. Low-scoring studies might be down-weighted, examined in a sensitivity analysis, or discussed as a limitation, while a body of evidence dominated by low-quality studies tempers how confidently you can state conclusions. The scale also feeds into wider certainty frameworks such as GRADE, where study-level risk of bias is one input among several. Present the scores transparently, ideally as a per-study table, so readers see how each study fared rather than only a summary label.
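As a rough illustration of the reporting and sensitivity-analysis steps, the sketch below prints a per-study table and picks out the studies that would be retained if low scorers were excluded. The study names and star counts are invented for the example, the helper names are hypothetical, and the threshold of five stars is an assumption that should match whatever cutoff the review's methods state.

```python
# Hypothetical domain scores per study: (selection, comparability, outcome or exposure).
scores = {
    "Study A": (4, 2, 3),
    "Study B": (3, 1, 2),
    "Study C": (2, 0, 2),
}


def per_study_rows(scores):
    """Return one row per study: name, three domain subtotals, and the total."""
    return [(name, sel, comp, out, sel + comp + out)
            for name, (sel, comp, out) in scores.items()]


def sensitivity_subset(scores, min_total=5):
    """Return the studies whose total reaches min_total, for a sensitivity analysis."""
    return [name for name, parts in scores.items() if sum(parts) >= min_total]


# Per-study table, so readers see each domain rather than only a summary label.
print(f"{'Study':<10}{'Sel':>4}{'Comp':>5}{'Out':>4}{'Total':>6}")
for name, sel, comp, out, total in per_study_rows(scores):
    print(f"{name:<10}{sel:>4}{comp:>5}{out:>4}{total:>6}")

# Studies kept when those scoring below five stars are excluded.
retained = sensitivity_subset(scores, min_total=5)
print(retained)  # ['Study A', 'Study B']
```

Re-running the synthesis on `retained` and comparing it with the full set shows whether the conclusions depend on the weaker studies.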
Where it fits in the review
Quality assessment with the Newcastle-Ottawa Scale happens after data extraction and before, or alongside, synthesis. It is one stage in the broader systematic review process, and the studies you assess are those that survived your eligibility screening. Once the review is complete and you are assembling the figures, a PRISMA flow diagram documents how those studies were selected in the first place.
Frequently Asked Questions
What does the Newcastle-Ottawa Scale do?
The Newcastle-Ottawa Scale assesses the methodological quality and risk of bias of observational studies, namely cohort and case-control designs, in a systematic review. It awards stars across three domains, Selection, Comparability, and Outcome or Exposure, giving each study a quality rating that helps reviewers decide how much weight its findings should carry.
How do you calculate the Newcastle-Ottawa Scale?
Use the cohort or case-control version that matches the study design and award stars by following the manual's decision rules. Each Selection item is worth at most one star (up to four in total), as is each Outcome or Exposure item (up to three), while the single Comparability item is worth up to two, for a maximum of nine. Two reviewers should score independently and reconcile differences; the stars are then summed for each study.
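One way to support the reconciliation step is to list, for each study, the domains where the two reviewers' subtotals differ. The sketch below is an assumption about how such a check might look, not a prescribed procedure: the function name, the dictionary layout, and the scores are all hypothetical, and it assumes both reviewers recorded the same domains for every study.

```python
def find_disagreements(reviewer_a, reviewer_b):
    """Return {study: {domain: (a_stars, b_stars)}} for domains where reviewers differ."""
    disagreements = {}
    # Only compare studies that both reviewers have scored.
    for study in sorted(reviewer_a.keys() & reviewer_b.keys()):
        diffs = {
            domain: (stars_a, reviewer_b[study][domain])
            for domain, stars_a in reviewer_a[study].items()
            if stars_a != reviewer_b[study][domain]
        }
        if diffs:
            disagreements[study] = diffs
    return disagreements


# Illustrative scores only, not taken from any real review.
reviewer_a = {
    "Study A": {"selection": 4, "comparability": 2, "outcome": 3},
    "Study B": {"selection": 3, "comparability": 1, "outcome": 2},
}
reviewer_b = {
    "Study A": {"selection": 4, "comparability": 2, "outcome": 3},
    "Study B": {"selection": 3, "comparability": 2, "outcome": 2},
}

to_reconcile = find_disagreements(reviewer_a, reviewer_b)
print(to_reconcile)  # {'Study B': {'comparability': (1, 2)}}
```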
What is a good Newcastle-Ottawa Scale score?
The scale's authors set no official cutoff, but a common convention treats about seven to nine stars as high quality, five to six as moderate, and below five as low quality. Because these thresholds are conventions, you should state explicitly which cutoffs your review used so that readers can interpret the quality categories consistently.
How do you interpret the Newcastle-Ottawa Scale?
Interpret the star total as a guide to how much confidence a study's results deserve, not as a precise measurement. Use it to inform the synthesis: down-weight or sensitivity-test low-scoring studies, and temper your conclusions when the evidence base is dominated by low-quality studies. Reporting the scores in a per-study table makes the assessment transparent.