# Vision-Language-Action (VLA) models are the breakthrough enabling physical AI

**Date:** 2025-12-18  
**Tags:** Robotics, AI, VLA  
**URL:** https://kelexine.is-a.dev/til/vla-models-robotics

---

TIL: Vision-Language-Action (VLA) models are the breakthrough enabling physical AI. They combine computer vision (perceiving the surroundings), language understanding (interpreting instructions), and action generation (producing motor commands) in a single model. Systems like Google DeepMind's RT-2 and Figure's Helix can follow natural-language commands in real-world settings.


```python
# VLA model inference pattern (pseudocode; encoder and model names are illustrative)
vision_embedding = vision_encoder(camera_frame)
language_embedding = text_encoder('pick up the red cup')
action_sequence = vla_model.predict(
    vision=vision_embedding,
    language=language_embedding
)  # Returns: [move_to(x,y,z), open_gripper(), grasp(), lift()]
```
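To make the pattern above concrete, here is a minimal runnable sketch with stub encoders and a toy model standing in for the real networks. All names and shapes are hypothetical; a real VLA would use learned vision/text encoders and an action-decoding head.

```python
import numpy as np

def vision_encoder(frame):
    # Stub: a real encoder (e.g. a ViT) would produce a learned embedding.
    # Here we just take per-channel means as a toy 3-dim "embedding".
    return frame.mean(axis=(0, 1))

def text_encoder(instruction):
    # Stub: hash characters into a fixed-size normalized vector.
    vec = np.zeros(8)
    for i, ch in enumerate(instruction):
        vec[i % 8] += ord(ch)
    return vec / np.linalg.norm(vec)

class ToyVLAModel:
    def predict(self, vision, language):
        # A real VLA head would decode a sequence of low-level actions
        # conditioned on both embeddings; this fixed plan only shows
        # the interface shape.
        return ["move_to", "open_gripper", "grasp", "lift"]

frame = np.zeros((224, 224, 3))  # dummy camera frame
vision_emb = vision_encoder(frame)
language_emb = text_encoder("pick up the red cup")
actions = ToyVLAModel().predict(vision=vision_emb, language=language_emb)
print(actions)  # ['move_to', 'open_gripper', 'grasp', 'lift']
```

The point is the data flow, not the internals: two modality-specific embeddings go in, an action sequence comes out, and the action sequence is what gets handed to a low-level controller.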




---

*This content is available at [kelexine.is-a.dev/til/vla-models-robotics](https://kelexine.is-a.dev/til/vla-models-robotics)*
