VLA Training on a Mobile Manipulator in Isaac Lab
Training Vision-Language-Action models on a mobile manipulator robot in NVIDIA Isaac Lab for autonomous retail shelf manipulation tasks.
Overview
This project builds a full training pipeline for Vision-Language-Action models within NVIDIA Isaac Lab. The simulated environment replicates a retail grocery aisle where a mobile manipulator must respond to natural language commands and execute the corresponding manipulation sequence.
VLA models combine vision encoders, language understanding, and action prediction into a unified architecture, enabling generalist robot policies that can follow open-ended instructions.
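The data flow described above can be sketched in a few lines. This is only a toy illustration, not the project's model: the encoders are trivial stand-ins for learned networks, and the names `encode_image`, `encode_text`, `ToyVLAPolicy`, `EMBED_DIM`, and the 9-dimensional action layout (2 base velocities, 6 arm joints, 1 gripper) are all assumptions made for the example.

```python
import hashlib
import math
import random

EMBED_DIM = 16   # hypothetical embedding size for both modalities
ACTION_DIM = 9   # assumed layout: 2 base velocities + 6 arm joints + 1 gripper


def encode_image(pixels):
    """Toy vision encoder: average grayscale pixels into EMBED_DIM contiguous buckets."""
    sums = [0.0] * EMBED_DIM
    counts = [0] * EMBED_DIM
    for i, value in enumerate(pixels):
        bucket = i * EMBED_DIM // len(pixels)
        sums[bucket] += value
        counts[bucket] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


def encode_text(instruction):
    """Toy language encoder: hashed bag-of-words over whitespace tokens."""
    vec = [0.0] * EMBED_DIM
    for token in instruction.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[digest[0] % EMBED_DIM] += 1.0
    return vec


class ToyVLAPolicy:
    """Fuses both embeddings and maps them to a bounded action vector."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        # One row of weights per action dimension; a real model would learn these.
        self.weights = [
            [rng.gauss(0.0, 0.1) for _ in range(2 * EMBED_DIM)]
            for _ in range(ACTION_DIM)
        ]

    def act(self, pixels, instruction):
        # Fusion by concatenation: vision features followed by language features.
        fused = encode_image(pixels) + encode_text(instruction)
        # tanh keeps every action component in [-1, 1].
        return [
            math.tanh(sum(w * x for w, x in zip(row, fused)))
            for row in self.weights
        ]


policy = ToyVLAPolicy(seed=0)
action = policy.act([0.5] * 64, "pick up the cereal box")
print(len(action))
```

The point of the sketch is the interface: one call takes an image and an instruction and returns a single action vector, so the same policy can be conditioned on different commands without changing its structure.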
Software Architecture
The simulation environment is built in Isaac Lab using USD-based scene composition. The robot model (mobile base + 6-DOF arm with gripper) is imported from URDF and configured with accurate physics properties and articulation drives.
Training combines imitation learning from teleoperated demonstrations with reinforcement learning for policy refinement. Domain randomization is applied to lighting, object textures and positions, and robot dynamics to improve sim-to-real transfer.
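As a rough illustration of per-episode domain randomization, the sketch below samples one set of parameters for each of the categories listed above. It does not use the Isaac Lab API; the class names, field names, texture identifiers, and numeric ranges are all hypothetical placeholders chosen for the example.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class RandomizationRanges:
    """Hypothetical sampling ranges; the values are illustrative only."""
    light_intensity: tuple[float, float] = (500.0, 3000.0)   # arbitrary units
    texture_ids: tuple[str, ...] = ("matte_red", "glossy_blue", "cardboard")
    object_xy_jitter_m: float = 0.05                         # +/- offset in metres
    joint_friction_scale: tuple[float, float] = (0.8, 1.2)
    payload_mass_scale: tuple[float, float] = (0.9, 1.1)


@dataclass(frozen=True)
class EpisodeParams:
    """One concrete draw, applied to the scene at episode reset."""
    light_intensity: float
    texture_id: str
    object_offset_m: tuple[float, float]
    joint_friction_scale: float
    payload_mass_scale: float


def sample_episode_params(ranges, rng):
    """Draw lighting, texture, object placement, and dynamics parameters."""
    jitter = ranges.object_xy_jitter_m
    return EpisodeParams(
        light_intensity=rng.uniform(*ranges.light_intensity),
        texture_id=rng.choice(ranges.texture_ids),
        object_offset_m=(rng.uniform(-jitter, jitter), rng.uniform(-jitter, jitter)),
        joint_friction_scale=rng.uniform(*ranges.joint_friction_scale),
        payload_mass_scale=rng.uniform(*ranges.payload_mass_scale),
    )


rng = random.Random(42)  # seeded so a run can be reproduced
for episode in range(3):
    print(episode, sample_episode_params(RandomizationRanges(), rng))
```

Passing an explicit seeded `random.Random` instance, rather than using the module-level generator, makes it possible to replay the exact conditions of a failed episode.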
Results & Achievements
In ongoing training, the robot has successfully interpreted shelf manipulation commands and executed pick-and-place sequences in simulation. Domain randomization significantly improved policy robustness across varied object placements. The pipeline is designed for eventual deployment on physical hardware for real-world retail automation tasks.