Hourly · 2026-07-03 18:00 UTC

First Vision-Language Model Identifies Targets in Orbit — No Humans Needed

Loft Orbital's YAM-9 satellite, running Google DeepMind's Gemma 3 and NASA JPL's NAVI-Orbital software, became the first spacecraft to autonomously identify targets using a vision-language model.

For the first time, a satellite in orbit has found what it was looking for entirely on its own — no human analysts, no ground-station instructions, no delay.

The milestone occurred in April aboard YAM-9, a spacecraft built by space infrastructure company Loft Orbital, and was first reported in June. Onboard, Google DeepMind's Gemma 3 vision-language model, running on an Nvidia Jetson AGX Orin GPU, analyzed Earth imagery and responded to natural language queries. It identified infrastructure around railway hubs, classified terrain where nature meets human development, and flagged points of interest without any human in the loop.

The software harness, called NAVI-Orbital, was developed by NASA's Jet Propulsion Laboratory. Unlike traditional Earth observation, where satellites dump raw data to ground stations for human analysts to sift through, YAM-9 triages imagery on orbit — deciding what matters before it ever sends a byte home.
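
As a rough illustration of the on-orbit triage described above, the sketch below ranks image tiles by relevance and keeps only those that fit a downlink budget. It is not NAVI-Orbital's actual logic, which has not been described publicly in this report. The `Tile` type, the `score` field (standing in for a vision-language model's answer to a natural language query), the threshold, and the byte budget are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Tile:
    """Hypothetical image tile; `score` stands in for a VLM relevance score in [0, 1]."""
    tile_id: str
    size_bytes: int
    score: float


def triage(tiles, min_score, budget_bytes):
    """Pick the highest-scoring tiles that clear min_score and fit the downlink budget."""
    selected = []
    remaining = budget_bytes
    # Consider the most relevant tiles first (greedy, not an optimal packing).
    for tile in sorted(tiles, key=lambda t: t.score, reverse=True):
        if tile.score < min_score:
            break  # every later tile is below the threshold too
        if tile.size_bytes <= remaining:
            selected.append(tile)
            remaining -= tile.size_bytes
    return selected


# Made-up tiles for illustration only.
tiles = [
    Tile("rail-hub", 400, 0.92),
    Tile("open-ocean", 400, 0.05),
    Tile("forest-edge", 700, 0.71),
    Tile("cloud-cover", 300, 0.10),
]
downlink = triage(tiles, min_score=0.5, budget_bytes=1200)
print([t.tile_id for t in downlink])
```

With these made-up numbers, only the railway hub and forest-edge tiles would be queued for downlink; the ocean and cloud tiles never leave the spacecraft.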

"It opens the door to always-on, patrol layers in space," said Paul Lasserre, Loft's head of AI. "If you have a VLM, you can have logic — like 'monitor this border for me, and let me know when something is suspicious,' and interact back and forth with the satellites."

The achievement marks the first reported use of a vision-language model in orbit for autonomous decision-making. It follows a related April milestone from Swiss startup DPhi Space, whose Clustergate-2 platform ran a Liquid AI vision model aboard Momentus' Vigoride 7 to demonstrate on-orbit image captioning. Loft and JPL went further, showing that a satellite can not only describe what it sees but also decide independently what is worth looking at.

The implications stretch well beyond defense. Real-time disaster monitoring, illegal fishing detection, deforestation alerts, and climate tracking all stand to accelerate when satellites can make decisions at the edge rather than waiting for a ground station pass. Loft Orbital, which currently operates 12 spacecraft, estimates that 50 to 100 YAM-9-class satellites could provide real-time coverage of any point on Earth.

Other companies are already following. Planet Labs flies satellites with the same Jetson Orin processors and says VLM research is underway. Kepler Communications operates the largest group of GPUs in space but remains tight-lipped about specific AI workloads due to NDAs.

JPL's vision extends even further. Juan Delfa Victoria, who led the NAVI-Orbital project, said the idea was originally conceived as a digital assistant for astronauts on the Moon or Mars — an AI that could understand spoken commands and interpret what it sees through a helmet visor.

For now, it's doing something simpler but no less profound: teaching satellites to see and think for themselves.
