Machine Learning + Software Project

VLA Training on Robot in Isaac Lab

Training Vision-Language-Action models on a mobile manipulator robot in NVIDIA Isaac Lab for autonomous retail shelf manipulation tasks.

Overview

This project builds a full training pipeline for Vision-Language-Action models within NVIDIA Isaac Lab. The simulated environment replicates a retail grocery aisle where a mobile manipulator must respond to natural language commands and execute the corresponding manipulation sequence.

VLA models combine a vision encoder, language understanding, and action prediction in a single architecture that maps camera images and a text instruction directly to robot actions. This makes it possible to train generalist robot policies that can follow open-ended instructions.
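Here is how the action-prediction part can work in practice. Several published VLA models, including RT-2 and OpenVLA, emit actions as discrete tokens by splitting each normalized action dimension into 256 bins. The Python sketch below shows that encode/decode round trip using only the standard library. The bin count, the [-1, 1] normalization range, and the function names are illustrative assumptions and are not necessarily what this project's model uses.

```python
from typing import List

NUM_BINS = 256         # assumed number of tokens per action dimension
LOW, HIGH = -1.0, 1.0  # assumed normalized action range


def action_to_tokens(action: List[float], num_bins: int = NUM_BINS,
                     low: float = LOW, high: float = HIGH) -> List[int]:
    """Map each continuous action value to a bin index in [0, num_bins - 1]."""
    width = (high - low) / num_bins
    tokens = []
    for a in action:
        clipped = min(max(a, low), high)   # clamp out-of-range values
        idx = int((clipped - low) / width)
        tokens.append(min(idx, num_bins - 1))  # a == high lands in the last bin
    return tokens


def tokens_to_action(tokens: List[int], num_bins: int = NUM_BINS,
                     low: float = LOW, high: float = HIGH) -> List[float]:
    """Decode bin indices back to the center value of each bin."""
    width = (high - low) / num_bins
    return [low + (t + 0.5) * width for t in tokens]


if __name__ == "__main__":
    action = [0.0, -1.0, 1.0, 0.37]
    tokens = action_to_tokens(action)
    print(tokens)                    # bin indices the model would predict
    print(tokens_to_action(tokens))  # reconstructed continuous action
```

Decoding returns bin centers, so the per-dimension quantization error is at most half a bin width (about 0.004 with these settings); finer control would need more bins or a continuous action head.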

Software Architecture

The simulation environment is built in Isaac Lab using USD-based scene composition. The robot model (mobile base + 6-DOF arm with gripper) is imported from URDF and configured with accurate physics properties and articulation drives.
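A policy for this robot ultimately outputs one flat action vector per control step, which then has to be split across the base, the arm, and the gripper. The sketch below shows one hypothetical layout. The 6 arm entries follow from the 6-DOF arm described above; the 3-dimensional planar base command (vx, vy, yaw rate), the single gripper scalar, and the class name are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence

BASE_DIM = 3     # hypothetical: planar base command (vx, vy, yaw_rate)
ARM_DIM = 6      # one target per joint of the 6-DOF arm
GRIPPER_DIM = 1  # hypothetical: single open/close command
ACTION_DIM = BASE_DIM + ARM_DIM + GRIPPER_DIM


@dataclass
class MobileManipAction:
    """Hypothetical structured view of one flat policy action."""
    base: List[float]
    arm: List[float]
    gripper: float

    @classmethod
    def from_flat(cls, flat: Sequence[float]) -> "MobileManipAction":
        """Split a flat action vector into base, arm, and gripper parts."""
        if len(flat) != ACTION_DIM:
            raise ValueError(f"expected {ACTION_DIM} values, got {len(flat)}")
        base = list(flat[:BASE_DIM])
        arm = list(flat[BASE_DIM:BASE_DIM + ARM_DIM])
        gripper = float(flat[BASE_DIM + ARM_DIM])
        return cls(base, arm, gripper)

    def to_flat(self) -> List[float]:
        """Concatenate the parts back into the policy's flat layout."""
        return [*self.base, *self.arm, self.gripper]
```

Keeping the split in one place means the policy, the demonstration recorder, and the simulator all agree on which index controls which actuator.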

Training combines imitation learning from teleoperated demonstrations with reinforcement learning for policy refinement. Domain randomization is applied to lighting, object textures and positions, and robot dynamics to improve sim-to-real transfer.
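Per-episode domain randomization can be sketched as a function that draws a fresh set of scene and dynamics parameters at every reset, covering the same four categories listed above. All ranges, texture names, and field names below are hypothetical placeholders. In Isaac Lab itself this would normally be configured through the framework's event manager rather than hand-written like this.

```python
import random
from dataclasses import dataclass
from typing import Dict, Tuple

Vec3 = Tuple[float, float, float]

# Hypothetical ranges; the project's actual values are not stated.
LIGHT_INTENSITY_RANGE = (500.0, 3000.0)
TEXTURES = ["cardboard", "glossy_plastic", "matte_paper", "metal"]
POS_JITTER_M = 0.05              # +/- offset on each object's nominal x and y
MASS_SCALE_RANGE = (0.8, 1.2)    # multiplier on robot link masses
FRICTION_RANGE = (0.5, 1.2)      # gripper pad friction coefficient


@dataclass
class EpisodeRandomization:
    light_intensity: float
    object_textures: Dict[str, str]
    object_positions: Dict[str, Vec3]
    robot_mass_scale: float
    gripper_friction: float


def sample_episode(rng: random.Random,
                   nominal_positions: Dict[str, Vec3]) -> EpisodeRandomization:
    """Draw one set of randomized parameters for a new episode."""
    textures = {name: rng.choice(TEXTURES) for name in nominal_positions}
    positions = {}
    for name, (x, y, z) in nominal_positions.items():
        positions[name] = (
            x + rng.uniform(-POS_JITTER_M, POS_JITTER_M),
            y + rng.uniform(-POS_JITTER_M, POS_JITTER_M),
            z,  # keep the height unchanged so items stay on the shelf
        )
    return EpisodeRandomization(
        light_intensity=rng.uniform(*LIGHT_INTENSITY_RANGE),
        object_textures=textures,
        object_positions=positions,
        robot_mass_scale=rng.uniform(*MASS_SCALE_RANGE),
        gripper_friction=rng.uniform(*FRICTION_RANGE),
    )
```

Seeding the generator per environment makes each episode's randomization reproducible, which helps when replaying a specific failure case.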

Results & Achievements

Training is ongoing. So far, in simulation, the robot successfully interprets shelf manipulation commands and executes the corresponding pick-and-place sequences. Domain randomization significantly improved policy robustness across varied object placements. The pipeline is designed for eventual deployment on physical hardware for real-world retail automation tasks.