GitHub - keon/jepa: implementing minimal versions of joint-embedding predictive architecture (JEPA)
Summary
A GitHub repository providing minimal, standalone PyTorch reimplementations of JEPA family models (I-JEPA, V-JEPA, V-JEPA 2, C-JEPA) for educational purposes, including tutorials and visualization tools.
Cached at: 05/13/26, 12:36 AM
Source: https://github.com/keon/jepa
jepa
Minimal, single-file PyTorch reimplementations of the JEPA family, with paired tutorials.
| File | Method | Dataset | LOC | Tutorial |
|---|---|---|---|---|
| ijepa.py | I-JEPA | CIFAR-10 | 160 | ijepa_tutorial.md |
| vjepa.py | V-JEPA | Moving MNIST | 188 | vjepa_tutorial.md |
| vjepa2.py | V-JEPA 2 + V-JEPA 2-AC | synthetic moving digits | 278 | vjepa2_tutorial.md |
| cjepa.py | C-JEPA | 3-digit bouncing video | 174 | cjepa_tutorial.md |
Each algorithm file is standalone: it depends only on torch and torchvision, with no shared utilities. The matching <algo>_extras.py adds visualizations (mask grids, loss curves, PCA/LDA/t-SNE evolution) and a linear probe.
Quick start
git clone git@github.com:keon/jepa.git
cd jepa
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt # pinned versions, see below
python ijepa.py # train I-JEPA only (no plots)
python ijepa_extras.py # train + write all visualizations + linear probe
Runs on CUDA, MPS, or CPU. CIFAR-10 / MNIST datasets auto-download to ./data/.
Reproducibility
The repo pins exact package versions in requirements.txt and pyproject.toml (Python itself only has a lower bound):
python >= 3.10 (tested on 3.13.5)
torch == 2.11.0
torchvision == 0.26.0
matplotlib == 3.10.9
scikit-learn == 1.8.0 # used by ijepa_extras for t-SNE
numpy == 2.4.4
pillow == 12.2.0
Alternatively, install the repo as an editable package instead of installing the requirements directly:
pip install -e .
What’s where
.
├── ijepa.py / ijepa_extras.py # I-JEPA on CIFAR-10
├── vjepa.py / vjepa_extras.py # V-JEPA on Moving MNIST
├── vjepa2.py / vjepa2_extras.py # V-JEPA 2 + V-JEPA 2-AC (synthetic)
├── cjepa.py / cjepa_extras.py # C-JEPA on 3-digit bouncing video
├── ijepa_tutorial.md # walk-throughs that match the code
├── vjepa_tutorial.md
├── vjepa2_tutorial.md
├── cjepa_tutorial.md
├── papers/ # the four source PDFs
├── samples/ # mask grids, loss curves, PCA/LDA/t-SNE plots
└── figs/ # paper figures referenced by tutorials
The methods, in one paragraph each
I-JEPA (Assran et al. 2023) — predict embeddings of held-out image patches from embeddings of visible patches. EMA target encoder, multi-block masking, smooth-L1 loss. The canonical self-supervised JEPA.
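To make multi-block masking concrete, here is a minimal, framework-free sketch that samples several target blocks and one context block on a patch grid, then removes the target patches from the context. The 8x8 grid (CIFAR-10 at an assumed patch size of 4), the scale and aspect-ratio ranges (loosely following the I-JEPA paper's defaults), and the names `sample_block` and `multiblock_mask` are illustrative assumptions, not values read from ijepa.py.

```python
import random


def sample_block(grid, scale, aspect, rng):
    """Return the set of flat patch indices covered by one random rectangle.

    `scale` is the fraction of the grid area to cover, `aspect` is height/width.
    """
    area = scale * grid * grid
    h = max(1, min(grid, round((area * aspect) ** 0.5)))
    w = max(1, min(grid, round((area / aspect) ** 0.5)))
    top = rng.randint(0, grid - h)    # inclusive on both ends
    left = rng.randint(0, grid - w)
    return {r * grid + c
            for r in range(top, top + h)
            for c in range(left, left + w)}


def multiblock_mask(grid=8, n_targets=4, seed=0):
    """Sample target blocks, then a large context block with targets removed."""
    rng = random.Random(seed)
    targets = [
        sample_block(grid, rng.uniform(0.15, 0.2), rng.uniform(0.75, 1.5), rng)
        for _ in range(n_targets)
    ]
    context = sample_block(grid, rng.uniform(0.85, 1.0), 1.0, rng)
    for t in targets:
        context -= t  # the context encoder must never see target patches
    return sorted(context), [sorted(t) for t in targets]


if __name__ == "__main__":
    ctx, tgts = multiblock_mask()
    print(len(ctx), "context patches;", [len(t) for t in tgts], "target patches")
```

In a training loop, the context indices would select the tokens fed to the context encoder, and each target index list would select the target-encoder embeddings that the predictor has to match.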
V-JEPA (Bardes et al. 2024) — same recipe, but 3D tubelet patches over video. Two mask groups (short-range + long-range tubes), L1 loss, EMA 0.998 → 1.0.
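The "EMA 0.998 → 1.0" note describes a momentum that rises over training, so the target encoder tracks the online encoder less and less. A minimal sketch follows, assuming a linear ramp (the schedule shape is not stated here) and using flat lists of floats in place of real parameter tensors. `ema_momentum` and `ema_update` are hypothetical names.

```python
def ema_momentum(step, total_steps, start=0.998, end=1.0):
    """Momentum at `step`, ramped linearly from `start` to `end` (assumed shape)."""
    frac = min(1.0, step / max(1, total_steps - 1))
    return start + (end - start) * frac


def ema_update(target, online, m):
    """Return new target weights: m * target + (1 - m) * online, elementwise."""
    return [m * t + (1.0 - m) * o for t, o in zip(target, online)]


if __name__ == "__main__":
    target, online = [0.0, 0.0], [1.0, -1.0]
    for step in range(3):
        m = ema_momentum(step, total_steps=3)
        target = ema_update(target, online, m)
        print(step, round(m, 4), target)
```

With momentum at exactly 1.0 the update leaves the target unchanged, so under this schedule the target encoder is effectively frozen by the final step.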
V-JEPA 2 (Assran et al. 2025) — two-phase: V-JEPA pretraining followed by V-JEPA 2-AC, an action-conditioned predictor trained on frozen-encoder latents with teacher forcing + rollout. The encoder is frozen in phase 2; no EMA.
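The difference between teacher forcing and rollout is easiest to see on a toy problem. The sketch below assumes latents are short lists of floats, the predictor is any callable `predictor(z, a)`, and both losses are per-step L1 averages. These choices, and all function names, are illustrative assumptions rather than the code in vjepa2.py.

```python
def l1(a, b):
    """Mean absolute error between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def teacher_forcing_loss(predictor, latents, actions):
    """Each step predicts z[t+1] from the ground-truth z[t] and action a[t]."""
    losses = [l1(predictor(latents[t], actions[t]), latents[t + 1])
              for t in range(len(actions))]
    return sum(losses) / len(losses)


def rollout_loss(predictor, latents, actions):
    """Predictions are fed back in, so errors can compound over time."""
    z, losses = latents[0], []
    for t, a in enumerate(actions):
        z = predictor(z, a)
        losses.append(l1(z, latents[t + 1]))
    return sum(losses) / len(losses)


def true_step(z, a):
    """Toy ground-truth dynamics: the action is added to the latent."""
    return [zi + ai for zi, ai in zip(z, a)]


def biased_step(z, a):
    """A slightly wrong predictor: true dynamics plus a constant 0.1 offset."""
    return [zi + ai + 0.1 for zi, ai in zip(z, a)]


if __name__ == "__main__":
    actions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    latents = [[0.0, 1.0]]
    for a in actions:
        latents.append(true_step(latents[-1], a))
    print(teacher_forcing_loss(biased_step, latents, actions))  # about 0.1
    print(rollout_loss(biased_step, latents, actions))          # about 0.2
```

The biased predictor makes the same error at every teacher-forced step, but under rollout the offset accumulates (0.1, 0.2, 0.3), which is why a rollout term penalizes drift that teacher forcing alone would not reveal.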
C-JEPA (Nam et al. 2026) — object-level trajectory masking with an identity anchor at t=0. No EMA. Bidirectional transformer over flattened slot tokens. Built on top of a pretrained object-centric encoder (VideoSAUR in the paper; we use a frozen embedding lookup as a documented stand-in).
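One way to read "object-level trajectory masking with an identity anchor at t=0" is: pick a subset of object slots, hide those slots at every later frame, and keep every slot visible in the first frame so the model still knows which objects exist. The sketch below implements that reading only. The function name, the boolean-grid layout, and the uniform choice of hidden slots are assumptions, not details taken from cjepa.py or the paper.

```python
import random


def trajectory_mask(n_frames, n_slots, n_masked, seed=0):
    """Return mask[t][k] == True where slot k's token at frame t is hidden.

    Whole trajectories are hidden for the chosen slots, except frame 0,
    which stays visible for every slot as the identity anchor.
    """
    rng = random.Random(seed)
    hidden = set(rng.sample(range(n_slots), n_masked))
    return [[(t > 0 and k in hidden) for k in range(n_slots)]
            for t in range(n_frames)]


if __name__ == "__main__":
    for row in trajectory_mask(n_frames=4, n_slots=3, n_masked=1):
        print(["x" if hidden else "." for hidden in row])
```

Flattening this grid row by row gives a per-token mask in the same order as slot tokens flattened over time, which is the shape a bidirectional transformer over those tokens would consume.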
Caveats
These are educational reimplementations:
- ViT-tiny, not ViT-Huge. CIFAR-10 / Moving MNIST / synthetic videos, not ImageNet / Kinetics.
- I-JEPA hits ~52.7% linear probe on CIFAR-10 after 100 epochs. The paper’s numbers come from ViT-H/14 on ImageNet for 300 epochs — different planet of compute.
- C-JEPA skips slot discovery (uses oracle positions). Real C-JEPA requires VideoSAUR pretraining (~100k steps) on top of frozen DINOv2 features.
- V-JEPA 2-AC’s action-conditioning gap stays small in our toy setup because the data is too easy; the machinery is correct, but the signal needs richer data to show up.
Each tutorial discloses the specific deviations from its source paper.
License
MIT.