Most literature tools are designed to help users retrieve papers. Far fewer are designed to help users understand where evidence is limited, fragmented, or absent.
At mineris, we treat research gap detection primarily as a coverage analysis problem, rather than an inference problem. Instead of asking a model to generate gaps directly from unstructured text, we first construct a structured representation of the literature and then identify areas in which evidence coverage is limited.
From articles to structured evidence
Each article in a corpus is transformed into structured fields based on the PICO framework:
- Population
- Intervention
- Outcome
- optionally, Study design
Before analysis, these fields are cleaned, deduplicated, and normalized to consistent canonical forms. This step is important because even minor variation in terminology can fragment the evidence landscape and obscure meaningful patterns.
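As a rough illustration, the normalization step might look like the sketch below. The synonym map and helper names are hypothetical; a production pipeline would rely on curated vocabularies rather than a hand-written dictionary.

```python
# Hypothetical synonym map; real pipelines would draw on curated
# vocabularies such as MeSH rather than a hand-written table.
SYNONYMS = {
    "type 2 diabetes mellitus": "type 2 diabetes",
    "t2dm": "type 2 diabetes",
    "hba1c": "glycated hemoglobin",
}

def normalize(term: str) -> str:
    """Lowercase, collapse whitespace, and map a term to its canonical form."""
    cleaned = " ".join(term.lower().split())
    return SYNONYMS.get(cleaned, cleaned)

def normalize_fields(values: list[str]) -> list[str]:
    """Normalize a list of extracted field values and drop duplicates,
    preserving first-seen order."""
    seen: dict[str, None] = {}
    for v in values:
        seen.setdefault(normalize(v), None)
    return list(seen)
```

Even this toy version shows why the step matters: "T2DM" and "type 2 diabetes mellitus" would otherwise occupy different cells of the evidence map.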
The objective is not to impose a perfect ontology on the literature. Rather, it is to produce a representation that is sufficiently consistent to support systematic analysis of what the corpus actually contains.
Constructing a coverage map
Once articles have been structured, we identify the most frequent terms along each axis: population, intervention, and outcome. These terms define a bounded analysis space.
We then construct a PICO coverage grid, in which each cell corresponds to a specific combination of:
- Population × Intervention × Outcome
- optionally extended by Study design
Each cell stores the set of articles supporting that exact combination. In this way, a collection of publications is converted into a structured map of evidence density.
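The grid construction can be sketched as follows, assuming a simplified article schema with already-normalized population, intervention, and outcome fields. The schema and function names are illustrative, not the system's actual interface.

```python
from itertools import product

def build_coverage_grid(articles, populations, interventions, outcomes):
    """Map each (population, intervention, outcome) cell to the set of
    article ids supporting that exact combination.

    `articles` is a list of dicts with normalized 'id', 'population',
    'intervention', and 'outcome' fields (a simplified one-value-per-field
    schema for illustration).
    """
    grid = {cell: set() for cell in product(populations, interventions, outcomes)}
    for a in articles:
        cell = (a["population"], a["intervention"], a["outcome"])
        if cell in grid:  # ignore terms outside the bounded analysis space
            grid[cell].add(a["id"])
    return grid
```

Restricting the grid to the most frequent terms keeps the combinatorial space bounded while still covering the bulk of the corpus.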
Detecting low-coverage regions
Within this coverage grid, every combination inside the selected analysis space is evaluated, and each cell is characterized as one of the following:
- Well-studied, when supported by multiple articles
- Sparse, when supported by only a small number of articles
- Empty, when no supporting articles are present in the corpus
Sparse and empty cells form the basis of what we describe as candidate research gaps. However, absence of evidence within a grid cell should not automatically be interpreted as a meaningful gap.
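A minimal sketch of this classification step follows. The cutoff of three articles is an illustrative assumption, not a fixed parameter of the method.

```python
def classify_cell(article_ids, sparse_threshold=3):
    """Label a grid cell by evidence density.

    `article_ids` is the set of articles supporting the cell; the
    threshold of 3 is an illustrative choice, not a method constant.
    """
    n = len(article_ids)
    if n == 0:
        return "empty"
    if n < sparse_threshold:
        return "sparse"
    return "well-studied"
```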
Filtering likely artifacts
A central component of the method is filtering. Raw combinations can generate misleading outputs due to:
- extraction noise
- inconsistent terminology
- semantically incompatible terms
To reduce these artifacts, we apply two constraints before labeling a cell as a candidate gap.
1. Semantic plausibility
We remove combinations in which:
- an outcome resembles a population label or intervention term
- terms belong to clearly different clinical domains
- the combination is biologically or clinically implausible
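One simple way to approximate the cross-domain check is a lookup table of term domains. The table and its contents are hypothetical; in practice this signal could come from an ontology or an embedding-based similarity measure.

```python
# Hypothetical domain lookup; a real system would derive domains from an
# ontology or embedding similarity rather than a static table.
TERM_DOMAIN = {
    "type 2 diabetes": "endocrinology",
    "metformin": "endocrinology",
    "glycated hemoglobin": "endocrinology",
    "knee replacement": "orthopedics",
}

def plausible(population, intervention, outcome):
    """Reject combinations whose terms span clearly different clinical domains."""
    domains = {TERM_DOMAIN.get(t) for t in (population, intervention, outcome)}
    domains.discard(None)  # unknown terms do not count against plausibility
    return len(domains) <= 1
```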
2. Pairwise support
A candidate combination must also show at least minimal co-occurrence among its constituent parts somewhere in the corpus:
- population + intervention
- population + outcome
- intervention + outcome
If none of these pairwise relations appear, the full combination is more likely to be an artifact of the combinatorial search space than a credible low-coverage research area.
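The pairwise-support constraint can be sketched as below, using the same simplified article schema as earlier; the names are illustrative.

```python
def has_pairwise_support(cell, articles):
    """Return True if at least one article pairs two elements of the cell.

    `cell` is a (population, intervention, outcome) tuple; `articles` are
    dicts with normalized 'population', 'intervention', and 'outcome'
    fields (a simplified schema for illustration).
    """
    p, i, o = cell
    for a in articles:
        if (a["population"] == p and a["intervention"] == i) or \
           (a["population"] == p and a["outcome"] == o) or \
           (a["intervention"] == i and a["outcome"] == o):
            return True
    return False
```

A cell with no pairwise support anywhere in the corpus is exactly the kind of combinatorial artifact the filter is meant to discard.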
What the system returns
After filtering, the remaining cells are classified as:
- Empty, indicating no evidence in the corpus
- Sparse, indicating minimal evidence
These are returned as candidate research gaps together with:
- supporting articles, when present
- examples of well-studied combinations for context
The resulting output is intended to be:
- interpretable
- inspectable
- suitable for downstream synthesis or prioritization
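For illustration, a candidate-gap record might look like the following. The field names are assumptions made for this sketch, not the system's actual output schema.

```python
from dataclasses import dataclass, field

@dataclass
class CandidateGap:
    """Illustrative output record; field names are assumed, not the
    system's actual schema."""
    population: str
    intervention: str
    outcome: str
    status: str  # "empty" or "sparse"
    supporting_articles: list = field(default_factory=list)
    well_studied_examples: list = field(default_factory=list)
```

Keeping supporting articles and well-studied neighbors in the record is what makes each candidate gap inspectable rather than a bare assertion.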
Why we refer to them as candidate gaps
This approach does not claim to identify definitive research gaps automatically.
An empty or sparse cell may reflect:
- a genuinely under-studied question
- limitations of the corpus
- differences in terminology
- a clinically unimportant combination
For that reason, we treat these outputs as structured signals rather than conclusions. They are intended to support expert review, not replace it.
Why this approach matters
Many gap-detection systems depend heavily on model-based inference. Our approach follows a different sequence:
- first, map what is present in the literature
- then, identify where coverage is limited
- only afterward, interpret those low-coverage areas
This makes the process:
- more transparent
- easier to validate
- less dependent on opaque model behavior
What comes next
Coverage alone is not sufficient to define a research gap. A more complete system should also account for:
- study quality
- recency
- consistency of findings
- patient relevance
Even so, coverage remains a useful foundation.
By transforming literature into a structured evidence map and highlighting its sparse regions, we provide researchers with a more systematic starting point for identifying potentially important unanswered questions.
The aim is to show not only what has been studied, but also what has not been studied sufficiently, and whether that absence is likely to matter.