MEETI: A Multimodal ECG Dataset from MIMIC-IV-ECG with Signals, Images, Features and Interpretations: what it means for business leaders
MEETI unifies raw ECG traces, high-resolution plots, beat-level metrics and GPT-crafted clinical narratives, letting health-tech teams prototype multimodal cardiology AI and validation pipelines without licensing hurdles or heavy preprocessing.
1. What the method is
MEETI extends MIMIC-IV-ECG with roughly 785 k studies, each offering four perfectly aligned modalities: ten-second raw signals, 300-dpi paper-style images, beat-wise quantitative features from FeatureDB, and GPT-4o–generated interpretive reports grounded in machine metrics. It is presented as the first open corpus ready for large vision-language or signal-text models in cardiology.
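Because all four modalities hang off a shared study ID, a loader can assemble them in a few lines. A minimal Python sketch, assuming a per-study folder containing WFDB-format signals (as in MIMIC-IV-ECG), a PNG plot, a JSON feature file, and a plain-text report; these file names are illustrative assumptions, not the dataset's documented layout:

```python
import json
from pathlib import Path

import numpy as np
import wfdb  # reads the WFDB-format signals used by MIMIC-IV-ECG


def load_study(root: Path, study_id: str) -> dict:
    """Gather the four MEETI modalities for one study.

    The folder layout and file names below are assumptions for
    illustration; check the dataset documentation for the actual scheme.
    """
    folder = root / study_id
    signal, fields = wfdb.rdsamp(str(folder / study_id))        # 10 s, 12-lead raw trace
    image_path = folder / f"{study_id}.png"                     # 300-dpi paper-style plot
    features = json.loads((folder / f"{study_id}_features.json").read_text())  # FeatureDB metrics
    report = (folder / f"{study_id}_report.txt").read_text()    # GPT-4o interpretation
    return {"signal": np.asarray(signal), "meta": fields,
            "image": image_path, "features": features, "report": report}
```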
2. Why the method was developed
Multimodal cardiac AI needs synchronised waveforms, images and texts, yet public datasets rarely offer more than one channel. MEETI eliminates costly data-wrangling and licensing barriers by auto-deriving images and features from open signals and generating rich narratives, creating a turnkey sandbox for research and product teams.
3. Who should care
- Digital-health CTOs building ECG-aware virtual-care platforms
- Pharma and device teams validating cardiac-safety pipelines
- Academic groups exploring multimodal foundation models
- Regulators seeking reproducible cardiology-AI benchmarks
4. How the method works
De-identified signals are rendered with ecg_plot, parsed by FeatureDB for fiducials and intervals, then paired with machine reports and metadata inside a GPT-4o prompt to draft human-style interpretations. All artefacts share a study ID and folder, enabling constant-time retrieval of any modality combination for downstream loaders.
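A condensed sketch of the rendering and narration steps. ecg_plot and the OpenAI client are real packages called with their documented functions; the prompt wording and the `features` dictionary passed in are illustrative assumptions, not the authors' exact template:

```python
import ecg_plot                 # open-source 12-lead ECG plotting package
from openai import OpenAI


def render_image(signal_mv, study_id, out_dir="plots/"):
    """Render a paper-style 12-lead plot, as MEETI does with ecg_plot."""
    ecg_plot.plot(signal_mv, sample_rate=500, title=study_id)  # signal shape: (leads, samples)
    ecg_plot.save_as_png(study_id, out_dir)                    # writes <out_dir>/<study_id>.png


def draft_interpretation(features: dict, machine_report: str) -> str:
    """Ask GPT-4o for a human-style narrative grounded in the metrics.

    The prompt below is a hypothetical stand-in for the paper's template.
    """
    client = OpenAI()
    prompt = (
        "You are a cardiologist. Using only the measurements below, write a "
        f"concise clinical interpretation.\nMachine report: {machine_report}\n"
        f"Beat-level features: {features}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```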
5. How it was evaluated
Parameter distributions matched adult physiology norms; random image-signal audits confirmed visual fidelity; and cardiologists spot-checked GPT reports for clinical coherence. Throughput tests showed full-modal ingestion at ~150 studies/s on one CPU core, supporting scalable model pre-training.
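Teams wanting to reproduce the throughput check on their own hardware can use a timing harness along these lines; `load_fn` stands in for any full-modal loader, such as the hypothetical `load_study` sketched in Section 1:

```python
import time


def measure_throughput(load_fn, root, study_ids):
    """Time full-modal ingestion over a sample of studies.

    load_fn(root, study_id) should return all four modalities for one
    study, e.g. the illustrative load_study helper sketched earlier.
    """
    start = time.perf_counter()
    for sid in study_ids:
        load_fn(root, sid)
    elapsed = time.perf_counter() - start
    return len(study_ids) / elapsed    # studies per second
```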
6. How it performed
The dataset comprises 784,680 ECGs from 160,597 patients with zero missing modalities. A multimodal transformer trained on MEETI beat single-channel baselines by up to 8 F1 points on arrhythmia detection and halved false negatives in long-QT screening, while retaining explainability via beat-level features and text rationales. (Source: arXiv 2507.15255, 2025)