IJSMT Journal

International Journal of Science, Strategic Management and Technology

An International, Peer-Reviewed, Open Access Scholarly Journal. Indexed in recognized academic databases · DOI via Crossref. The journal adheres to established scholarly publishing, peer-review, and research ethics guidelines set by the UGC.

ISSN: 3108-1762 (Online)

Plagiarism Passed
Peer reviewed
Open Access

SCALING LAWS AND ARCHITECTURAL ADVANCES OF HIERARCHICAL JEPA (H-JEPA) MODEL FOR PLANNING, CONTROL AND ROBOTICS IN PHYSICAL SYSTEMS

AUTHORS:
Mayank Lal
Mentor: Abdul Khalid
Affiliation: B.Tech (Information Technology), NIET, Greater Noida
CC BY 4.0 License:
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract

Hierarchical Joint-Embedding Predictive Architecture (H-JEPA) is increasingly viewed as a promising family of world models for embodied intelligence because it learns to predict abstract future representations rather than reconstructing raw sensory inputs. This distinction is especially important in robotics, where successful control depends less on recovering exact pixels and more on learning compact state abstractions that are stable, semantically meaningful, and useful for planning. In this paper, we present an extended student-level study of H-JEPA from three complementary angles: architectural principles, scaling behaviour, and practical deployment for robotic planning and control. We first review the conceptual line from predictive coding and world models to JEPA, I-JEPA, V-JEPA, and recent action-conditioned variants. We then formalize a two-level H-JEPA suitable for physical systems, in which low-level predictors model short-horizon action-conditioned transitions and higher-level predictors produce temporally coarse sub-goals for long-horizon planning. Next, we analyze scaling trends with respect to encoder width, predictor depth, temporal hierarchy, and dataset size, arguing that downstream planning performance follows a weak power-law regime but saturates earlier than language-model loss scaling because control success is bottlenecked by representation utility, action coverage, and model-planner mismatch. We also describe a practical pipeline that maps raw observations to latent state estimation, hierarchical rollout, cross-entropy method planning, task-conditioned evaluation, and iterative refinement. To ground the discussion, we compare a hand-tuned model-predictive controller against an H-JEPA-driven planner on simulated reaching and pushing tasks. The results suggest that hierarchy provides larger gains for long-horizon contact-rich behaviour than simply increasing parameter count, while the main engineering difficulties remain representation collapse, prompt or context sensitivity, latent oversmoothing, and the absence of a universally trustworthy proxy loss. In addition to quantitative comparisons, we include ablations, failure analysis, and workflow observations that highlight when hierarchical latent prediction genuinely helps and when human intervention remains indispensable. The goal of this work is not to claim state-of-the-art performance, but to provide a more detailed and structured foundation for future student projects on JEPA-style world models for robotics.
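As an illustration of the cross-entropy method planning step named in the abstract, the following is a minimal single-level sketch in Python with NumPy. It is not the paper's implementation: the function names `cem_plan` and `toy_predict`, the linear toy dynamics, the squared-distance cost, and all hyperparameters are assumptions chosen for the example. In a two-level setup of the kind described, the goal latent `z_goal` would presumably be a sub-goal supplied by the higher-level predictor, and `predict` would be the learned low-level action-conditioned predictor.

```python
import numpy as np

def cem_plan(z0, z_goal, predict, horizon=10, action_dim=2,
             pop_size=128, n_elite=16, n_iters=5, seed=0):
    """Cross-entropy method over action sequences, scored in latent space."""
    rng = np.random.default_rng(seed)
    mean = np.zeros((horizon, action_dim))
    std = np.ones((horizon, action_dim))
    for _ in range(n_iters):
        # Sample candidate action sequences: (pop_size, horizon, action_dim).
        noise = rng.standard_normal((pop_size, horizon, action_dim))
        actions = np.clip(mean + std * noise, -1.0, 1.0)
        # Roll every candidate forward with the (assumed) latent predictor.
        z = np.repeat(z0[None, :], pop_size, axis=0)
        for t in range(horizon):
            z = predict(z, actions[:, t])
        # Cost: squared distance between the final latent and the goal latent.
        cost = np.sum((z - z_goal) ** 2, axis=1)
        # Refit the sampling distribution to the lowest-cost candidates.
        elite = actions[np.argsort(cost)[:n_elite]]
        mean = elite.mean(axis=0)
        std = elite.std(axis=0) + 1e-6
    return mean

def toy_predict(z, a):
    # Hypothetical stand-in for a learned predictor: z' = z + 0.1 * a.
    return z + 0.1 * a

z0 = np.zeros(2)
z_goal = np.array([0.5, -0.3])
plan = cem_plan(z0, z_goal, toy_predict)
# Under the toy dynamics the final latent is z0 plus the scaled action sum.
z_final = z0 + 0.1 * plan.sum(axis=0)
```

The planner returns the mean of the final elite distribution as the action sequence. In a receding-horizon controller one would typically execute only the first action and replan from the newly encoded observation.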


 

HOW TO CITE
APA
Lal, M. (2026). Scaling Laws and Architectural Advances of Hierarchical JEPA (H-JEPA) Model for Planning, Control and Robotics in Physical Systems. International Journal of Science, Strategic Management and Technology, 02(05). https://doi.org/10.55041/ijsmt.v2i5.169

MLA
Lal, Mayank. "Scaling Laws and Architectural Advances of Hierarchical JEPA (H-JEPA) Model for Planning, Control and Robotics in Physical Systems." International Journal of Science, Strategic Management and Technology, vol. 02, no. 05, 2026. https://doi.org/10.55041/ijsmt.v2i5.169.

Chicago
Lal, Mayank. "Scaling Laws and Architectural Advances of Hierarchical JEPA (H-JEPA) Model for Planning, Control and Robotics in Physical Systems." International Journal of Science, Strategic Management and Technology 02, no. 05 (2026). https://doi.org/10.55041/ijsmt.v2i5.169.

Ethics and Compliance
✓ All ethical standards met
This article has undergone plagiarism screening and double-blind peer review. Editorial policies have been followed. Authors retain copyright under CC BY-NC 4.0 license. The research complies with ethical standards and institutional guidelines.