MAP² — microRNAs Analysis Portal

User guide

A walk-through of the analyses MAP² offers to the end user, with a note at the end on what has changed with respect to the legacy MAP.

Background

MAP² home — at-a-glance overview with stat tiles and feature cards

MAP², the microRNAs Analysis Portal, is a resource for building and exploring biological queries around microRNA-related literature. MAP² relies on SMAC, an automated data selection and retrieval system implemented by our group. Gene Expression Omnibus (GEO) identifiers are used to establish computational links between literature and any associated data: when data are available in the public domain, the system downloads the underlying expression matrices and feeds them to an analytical pipeline (PCA, functional enrichment, sample × sample and gene × gene correlation, breast-cancer-specific profiling).

Once you access the Explore tab you can query the full microRNA-related corpus with ease: literature features (year, journal), medical terms (MeSH), free-text search and the list of analyses available for each linked dataset are all filterable. Selecting a publication that has GEO data attached opens the Dataset page, where the pre-computed analyses can be inspected and live analyses can be run on a user-selected subset of samples.

Exploring the literature

Filter the corpus by journal, year, has-data, and analysis availability; each filter narrows the live count

The Explore tab presents an interactive table of all the microRNA-related publications curated by SMAC. Each row shows the PMID (linked to PubMed), title and authors, journal, year, impact factor and the SMAC composite score. When a paper has one or more linked GEO datasets the row carries a data · GSE… ↗ badge and a list of the pre-computed analyses available for that dataset.

Filters can be applied in any combination (they combine with AND; multiple values in the same dropdown combine with OR):

The dropdown counts next to each value reflect the filters you have already applied elsewhere: pick a journal and the MeSH / keyword / analysis dropdowns immediately update to show how many options remain in that subset.
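The combination rule and the dropdown counts described above can be sketched as follows. This is a minimal illustration, not the MAP² implementation: the record fields (journal, year, mesh) and the function names matches and facet_counts are hypothetical, and the example papers are made up.

```python
from collections import Counter


def _as_list(value):
    """Treat scalar fields (journal, year) and multi-valued fields (MeSH) alike."""
    return list(value) if isinstance(value, (list, set, tuple)) else [value]


def matches(paper, filters):
    """AND across filters, OR among the values picked inside one filter.

    `filters` maps a field name to the set of accepted values;
    an empty set means that filter is inactive.
    """
    for field, accepted in filters.items():
        if not accepted:
            continue
        if not any(v in accepted for v in _as_list(paper.get(field))):
            return False
    return True


def facet_counts(papers, filters, field):
    """Count the values of `field` among papers passing every OTHER filter."""
    others = {k: v for k, v in filters.items() if k != field}
    counts = Counter()
    for paper in papers:
        if matches(paper, others):
            counts.update(_as_list(paper.get(field)))
    return counts


# Hypothetical records, for illustration only.
papers = [
    {"pmid": "1", "journal": "Journal A", "year": 2020,
     "mesh": ["MicroRNAs", "Breast Neoplasms"]},
    {"pmid": "2", "journal": "Journal B", "year": 2021, "mesh": ["MicroRNAs"]},
    {"pmid": "3", "journal": "Journal A", "year": 2022, "mesh": ["Plants"]},
]
filters = {"journal": {"Journal A", "Journal B"}, "mesh": {"MicroRNAs"}}
selected = [p["pmid"] for p in papers if matches(p, filters)]
mesh_counts = facet_counts(papers, {"journal": {"Journal A"}}, "mesh")
```

Excluding a dropdown's own filter when counting its values is what lets the counts show how many options remain after the filters applied elsewhere.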

Access dataset details

Dataset page — paper metadata, analysis tabs (PCA / Correlation / Enrichment / Breast cancer / Live), and per-sample table

An interactive table containing the metadata of the samples that compose the dataset is presented at the top of the Dataset page. Columns can be hidden by clicking their header; rows can be individually ticked, selected in bulk (Select all), inverted or cleared. The selection drives the Live analysis tabs further down the page: any sample subset you build here is what PCA, heatmap, correlation and gene-network run against.

Principal Component Analysis

PCA scatter — samples coloured by biological group; variance explained is shown on each axis

Principal component analysis (PCA) reduces the dimensionality of data while retaining most of the variation in the dataset, making it possible to visually assess similarities and differences between samples and determine whether samples can be grouped. This exploratory analysis makes it easier to identify the key factors that could be affecting the variability within expression data.

For each dataset the Pre-computed PCA tab presents an interactive 2D / 3D scatter plot of the first principal components, with samples coloured by an automatically inferred grouping variable; a barplot underneath shows the fraction of total variance attributed to each PC (the scree plot). The same PCA can be re-run on a user-selected subset via the Live analysis · PCA sub-tab — useful for interrogating a specific contrast inside a larger study.
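For readers who want to see what the scatter coordinates and the variance bars are made of, here is a small pure-Python sketch of PCA on a samples × genes matrix. It assumes nothing about how MAP² computes PCA internally: it uses power iteration with deflation on the covariance matrix, the function name pca_power is hypothetical, and a numerical library would normally be used instead.

```python
import math


def pca_power(rows, n_components=2, n_iter=200):
    """Return (scores, explained_variance_ratio) for a samples x genes matrix.

    Sketch only: power iteration can fail if the fixed start vector happens
    to be orthogonal to a principal axis.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(p)]
           for i in range(p)]
    total_var = sum(cov[i][i] for i in range(p))

    components, ratios = [], []
    for _component in range(n_components):
        # Deterministic, non-uniform start vector.
        v = [1.0 / (j + 1) for j in range(p)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _step in range(n_iter):
            w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left to explain
                break
            v = [x / norm for x in w]
        eigval = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p))
                     for i in range(p))
        components.append(v)
        ratios.append(eigval / total_var if total_var else 0.0)
        # Deflation: remove the variance captured by this component.
        cov = [[cov[i][j] - eigval * v[i] * v[j] for j in range(p)]
               for i in range(p)]

    scores = [[sum(c[j] * comp[j] for j in range(p)) for comp in components]
              for c in centred]
    return scores, ratios


# Four samples, two perfectly collinear genes: PC1 carries all the variance.
scores, ratios = pca_power([[1, 2], [2, 4], [3, 6], [4, 8]])
```

The scores are the coordinates plotted in the scatter, and the ratios are the per-PC fractions shown in the scree plot.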

Expression profiles

Per-gene expression boxplot across sample groups; outliers visible above the upper whiskers

The Gene boxplot live-analysis tab lets you track how a single gene of interest changes across the biological conditions present in the selected samples. Expression values are shown both as quartile boxplots (one box per group) and as per-sample bars, so the differences between individual samples remain visible alongside the summary view.

The Heatmap tab presents the z-scored expression levels of the most-variable genes (top N, default 50) across all samples in the selection, with rows and columns clustered hierarchically. This is the closest equivalent to the Differentially expressed view of the legacy MAP.
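The gene selection and scaling behind such a heatmap can be sketched as below. This is an illustration under assumptions: genes are ranked by sample variance and z-scored per gene (row-wise), which the text does not state explicitly, and the hierarchical clustering step is left out. The function name and the toy values are hypothetical.

```python
from statistics import mean, stdev, variance


def top_variable_zscores(expr, top_n=50):
    """Keep the `top_n` most variable genes and z-score each one across samples.

    `expr` maps a gene name to its expression values, one per sample.
    """
    ranked = sorted(expr, key=lambda g: variance(expr[g]), reverse=True)[:top_n]
    scaled = {}
    for gene in ranked:
        mu, sd = mean(expr[gene]), stdev(expr[gene])
        # A gene with zero variance gets a flat row instead of a division by zero.
        scaled[gene] = [(x - mu) / sd if sd else 0.0 for x in expr[gene]]
    return scaled


# Toy matrix: three genes measured in three samples.
expr = {"A": [1, 2, 3], "B": [5, 5, 5], "C": [0, 10, 20]}
heat = top_variable_zscores(expr, top_n=2)
```

After z-scoring, genes A and C show the same pattern despite a tenfold difference in scale, which is why the heatmap compares profiles rather than absolute levels.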

Gene interaction network

Interaction subnetwork around the genes of interest; hub size encodes node degree

For each subset of samples defined in the Samples tab, the interactions between user-chosen seed genes and their primary neighbours are displayed in an interactive network. Nodes represent the genes and are coloured according to their mean expression z-score across the selected samples; edges represent the interactions reported in the bundled interactions.tsv database, a SIGNOR / mentha-derived collection that ships with MAP² and is refreshed at every updater run.
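Extracting the seed genes and their primary neighbours from an edge list can be sketched as follows. The layout of interactions.tsv is not documented here, so the two-column, header-less format is an assumption, as are the function names and the example edges; only edges that touch a seed gene are kept in this sketch.

```python
import csv
import io
from collections import Counter


def read_edges(tsv_text):
    """Parse 'source<TAB>target' lines (assumed layout) into edge tuples."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    return [(row[0], row[1]) for row in reader if len(row) >= 2]


def seed_subnetwork(edges, seeds):
    """Keep edges with at least one seed endpoint; return edges and node degree."""
    seeds = set(seeds)
    kept = [(a, b) for a, b in edges if a in seeds or b in seeds]
    degree = Counter(node for edge in kept for node in edge)
    return kept, degree


# Hypothetical edge list, for illustration only.
tsv = "TP53\tMDM2\nMDM2\tAKT1\nEGFR\tGRB2\nTP53\tEP300\n"
kept, degree = seed_subnetwork(read_edges(tsv), ["TP53"])
```

The degree counter is the quantity a viewer could map to node size, while node colour would come from the mean expression z-score of each gene.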

Correlation among genes

Clustered Pearson correlation heatmap across the top-variance genes (or samples)

This module performs pairwise comparisons of expression levels between user-defined genes (between 2 and 50) in the same dataset. For each comparison Pearson, Spearman or Kendall coefficients are calculated; results are presented as an interactive heatmap. The colour of each cell indicates the correlation coefficient between the genes on the x and y axes, with the colour key on the right (red = positive, blue = negative).
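As a rough illustration of what each heatmap cell contains, the sketch below computes Pearson and Spearman coefficients in pure Python and assembles a gene × gene matrix. It is not the MAP² code: the function names are hypothetical, Kendall is omitted for brevity, and genes with constant expression (zero variance) are not handled.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long value lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks, with ties receiving the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson applied to the ranks."""
    return pearson(ranks(x), ranks(y))


def correlation_matrix(expr, genes, method=pearson):
    """Square matrix of pairwise coefficients, in the order given by `genes`."""
    return [[method(expr[a], expr[b]) for b in genes] for a in genes]


# Toy expression values across four samples.
expr = {"g1": [1, 2, 3, 4], "g2": [1, 4, 9, 16], "g3": [4, 3, 2, 1]}
pearson_m = correlation_matrix(expr, ["g1", "g2", "g3"])
spearman_m = correlation_matrix(expr, ["g1", "g2", "g3"], method=spearman)
```

In this toy case g1 and g2 are monotonically but not linearly related, so their Spearman coefficient is 1 while the Pearson coefficient is slightly lower.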

A Pre-computed correlation tab is also available for every dataset that completed step 4c of the pipeline: it shows either the sample × sample or the gene × gene matrix at full resolution, with one click to switch between the two.

Functional enrichment

Top enriched pathways across KEGG, GO BP, Reactome and WikiPathways; bars show -log10(FDR)

Where the SMAC pipeline detected a meaningful set of input genes for a dataset, MAP² exposes a Pre-computed enrichment tab. For each enrichment library (GO biological process, KEGG, WikiPathways) MAP² embeds the interactive gseapy-rendered plot and offers a CSV download of the underlying table. The input gene list itself is shown alongside, with a download button.

This module replaces the dot-plot / upset / heatplot triplet that the legacy MAP attached to MirCompare runs. In MAP² the enrichment is computed once per dataset, at pipeline time, so it is available for browsing without launching a job.

Breast-cancer profile (receptor status & tumour purity)

Per-sample receptor status (ER / PR / HER2) plus tumour purity, derived from expression signatures

When the SMAC pipeline identifies at least one breast-cancer sample in a dataset, it computes two complementary readouts and surfaces them in the Pre-computed breast cancer tab on the dataset page:

Both panels are now fully interactive — earlier static PNGs are still available as a download alongside the underlying CSVs (receptor_status.csv, tumour_purity.csv).

Corpus-wide statistics

Corpus-wide aggregates — top journals, year distribution, and the cleaned keyword/MeSH bar charts

The Statistics tab presents an aggregated view of the curated literature: top MeSH terms, top keywords, top journals, publications-per-year time series and a barchart of how many datasets carry each precomputed analysis. The counts are computed live from the same DuckDB views that drive the Explore page, so a re-run of the updater is reflected on the next page load.

MirCompare

MirCompare — plant ↔ host miRNA alignment with the highlighted seed region, filtered hits, and downstream TarBase + Enrichr enrichment

Background

MirCompare compares libraries of miRNAs belonging to organisms from the plant and animal kingdoms, to find cross-kingdom functional homologies. MAP² ships with a faithful re-implementation of the original two-tier alignment scheme, improved in speed and quality of predictions while respecting the concept of functional homology introduced in our previous studies. Analyses are submitted to a background worker (default 2 concurrent jobs) and the user is notified by email when the run is complete; the per-job UUID is the access token for its results.

A renovated strategy of alignment

The alignment method uses a scoring system that accounts for gap opening and gap extension in both the global (whole-sequence) and local (seed-specific) alignments. As in the previous version, the global alignment assigns +1 to a match and 0 otherwise, normalised by alignment length. The seed-specific alignment (last 8 nt) is much more stringent: −0.5 for a mismatch, −1 for a gap opening, −1 for a gap extension. A comparison is retained only if global ≥ G AND seed ≥ S AND both p-values are < 0.05, where G and S are the user-tunable global and seed thresholds.
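The scoring and filtering rules can be sketched on a pair of already-aligned sequences (gaps written as "-"). This is an illustration only: it does not perform the alignment itself, the +1 reward for a seed match is an assumption (the text gives only the seed penalties), and the function names and example sequences are hypothetical.

```python
def global_score(aln_a, aln_b):
    """+1 per matching column, 0 otherwise, normalised by alignment length."""
    matches = sum(1 for a, b in zip(aln_a, aln_b) if a == b and a != "-")
    return matches / len(aln_a)


def seed_score(aln_a, aln_b, match=1.0, mismatch=-0.5,
               gap_open=-1.0, gap_extend=-1.0):
    """Score an aligned seed region with the stated penalties.

    `match=1.0` is an assumed value; the penalties come from the text.
    """
    score, in_gap = 0.0, False
    for a, b in zip(aln_a, aln_b):
        if a == "-" or b == "-":
            score += gap_extend if in_gap else gap_open
            in_gap = True
        else:
            score += match if a == b else mismatch
            in_gap = False
    return score


def passes_filter(global_s, seed_s, p_global, p_seed, G, S, alpha=0.05):
    """Keep a comparison only if both scores and both p-values pass."""
    return global_s >= G and seed_s >= S and p_global < alpha and p_seed < alpha


# Hypothetical aligned pairs.
g = global_score("UGACC-AU", "UGAGCAAU")   # 6 matching columns out of 8
s = seed_score("ACGU-A", "ACCUGA")         # 4 matches, 1 mismatch, 1 gap
```

The thresholds G and S are passed in explicitly here because their default values are not stated in this guide.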

Assessing the statistical significance

For every comparison the system assesses whether the alignment score is higher than would be expected by chance. Given two sequences A (plant) and B (mammalian), MirCompare determines the nucleotide composition of B and generates N scrambled sequences B' (default 50). The alignment score S(A, B') is calculated for each scrambled sequence, and a one-sample t-test compares the resulting scrambled distribution with the observed score S(A, B). The procedure is applied independently to the global and seed alignments, yielding two p-values per comparison.
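The scramble-and-test procedure can be sketched with the standard library alone. Several points are assumptions: scrambling is done by shuffling B (which preserves its exact composition), the toy identity function stands in for the real alignment score, and the function names are hypothetical. The sketch stops at the t statistic, because the t-distribution needed for the p-value is not in the Python standard library; scipy.stats.ttest_1samp would be one way to obtain it.

```python
import math
import random
from statistics import mean, stdev


def scrambled_scores(seq_a, seq_b, score_fn, n_scrambles=50, seed=0):
    """Score A against `n_scrambles` shuffled copies of B."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    scores = []
    for _ in range(n_scrambles):
        letters = list(seq_b)
        rng.shuffle(letters)
        scores.append(score_fn(seq_a, "".join(letters)))
    return scores


def one_sample_t(null_scores, observed):
    """t statistic and degrees of freedom for H0: mean(null) == observed."""
    n = len(null_scores)
    sd = stdev(null_scores)
    if sd == 0:
        raise ValueError("scrambled scores have zero variance")
    t = (mean(null_scores) - observed) / (sd / math.sqrt(n))
    return t, n - 1


def identity(a, b):
    """Toy stand-in for an alignment score: fraction of identical positions."""
    return sum(x == y for x, y in zip(a, b)) / min(len(a), len(b))


seq_a, seq_b = "UGAGGUAGUAGGUUGUAUAGUU", "UGAGGUAGUAGGUUGUGUGGUU"
null = scrambled_scores(seq_a, seq_b, identity)
t_stat, dof = one_sample_t(null, identity(seq_a, seq_b))
```

A strongly negative t statistic means the scrambled scores sit well below the observed one, which is the situation the filter is looking for.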

Submitting a job

The submission form on the MirCompare page accepts plant and host miRNA libraries either as a FASTA paste or as a file upload (.fa / .fasta / .txt). The user can tune the global and seed thresholds, the number of scrambles per pair (more scrambles = tighter p-values but longer runtime) and supply an email address to receive the completion notice with a link to the results page. The results page lists job state, file downloads (the input FASTAs, comparison.full.tsv, comparison.filtered.tsv, summary.json) and a paginated preview of the comparison tables.

COMPASS

COMPASS — exogenous miRNAs scored against human gene 3′ UTRs with a Gradient Boosting model; rank curve, top-N bars, score heatmap across all submitted miRNAs/genes, and per-unit result cache

COMPASS (the COMPASS tab in the main menu) is a sequence-based machine-learning classifier that scores exogenous miRNA → human gene pairs for likely targeting. Where MirCompare answers "does this exogenous miRNA look like a known host miRNA?", COMPASS answers the harder question "which human genes is this exogenous miRNA likely to target?". The bundled source-species reference pools (ath, osa, zma, gma) are convenient defaults for cross-kingdom work; a custom FASTA upload lets you point COMPASS at any other source.

The model was trained on experimentally-validated human miRNA-target interactions from miRTarBase using three feature families:

Two analysis directions are offered:

A typical single-sequence run in fast mode finishes in ~10 seconds; the full all-genes forward scan takes 10–30 minutes, so leave an email address on the form to be notified when results land. Per-unit caching (keyed on the canonical input) means re-submitting the same miRNA or gene — or even a FASTA that overlaps a previous one — returns instantly. Submitted analyses follow the same 30-day retention policy as MirCompare jobs.
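Per-unit caching of this kind can be sketched as below. Everything here is illustrative: the canonical form (upper case, T converted to U), the SHA-256 key and the UnitCache class are assumptions rather than the COMPASS implementation, and the scoring function is a placeholder.

```python
import hashlib


def parse_fasta(text):
    """Yield (header, sequence) pairs from a FASTA string."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def unit_key(sequence):
    """Cache key built from an assumed canonical form of the sequence."""
    canonical = sequence.upper().replace("T", "U")
    return hashlib.sha256(canonical.encode("ascii")).hexdigest()


class UnitCache:
    """Memoise one result per sequence, regardless of which FASTA carried it."""

    def __init__(self, score_fn):
        self._score_fn = score_fn
        self._store = {}
        self.computed = 0  # number of cache misses, for illustration

    def score(self, sequence):
        key = unit_key(sequence)
        if key not in self._store:
            self._store[key] = self._score_fn(sequence)
            self.computed += 1
        return self._store[key]

    def score_fasta(self, text):
        return {header: self.score(seq) for header, seq in parse_fasta(text)}


cache = UnitCache(len)  # `len` stands in for the expensive model call
first = cache.score_fasta(">a\nUGAGGUAG\n>b\nACGUACGU\n")
second = cache.score_fasta(">b_again\nacgtacgt\n>c\nGGGG\n")
```

Because the key is derived from each sequence rather than from the whole upload, the second FASTA only triggers one new computation even though it was never submitted before.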

What changes compared with the legacy MAP

MAP² is a complete rebuild of the original MAP. All the biological analyses you knew are still here, with several improvements you'll notice as you use the site:

More — and more accurate — data per paper

Cleaner search and statistics

Per-dataset analyses

Day-to-day use

Two features of the legacy MAP are intentionally not carried over: