Reinforcement Learning for Space-to-Space Imaging

Autonomous scheduling for imaging Resident Space Objects (RSOs) • AMOS 2025 • CU Boulder AVS Lab • Daniel Huterer Prats

Overview & motivation

Space-based SSA is unlike Earth imaging. Targets move fast, access windows are brief, and power, pointing, and downlink are tightly constrained. The goal is a flight policy that decides which RSO to image and when to downlink, while keeping the spacecraft healthy for the next opportunity.

The main study focuses on **LEO → LEO imaging**, which is the hardest regime: large relative motion, long eclipse periods, and rapidly changing geometry, including intervals when the target itself is fully eclipsed.

Scenario & simulator

Setup clip: LEO chaser with power/thermal/storage constraints imaging catalogued RSOs.

Formulating SBSS as a POMDP

The task is posed as a POMDP: noisy observations of orbital geometry, battery, storage, visibility, and ground-station windows; actions choose which target to image and when to downlink; rewards balance delivered imagery, health margins, and constraint satisfaction.
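To make the formulation concrete, here is a minimal Gymnasium-style sketch of the observation and action spaces. This is not the paper's environment: the target count, observation layout, and reward weights are all illustrative assumptions.

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class SSASchedulingEnv(gym.Env):
    """Illustrative POMDP sketch; layout and weights are assumptions,
    not the paper's environment."""

    N_TARGETS = 10  # candidate RSOs exposed to the policy per step (assumed)

    def __init__(self):
        # Noisy partial observation: battery and storage fractions, a
        # ground-station-window flag, and per-target geometry/visibility.
        self.observation_space = spaces.Box(
            low=-1.0, high=1.0, shape=(3 + 4 * self.N_TARGETS,), dtype=np.float32
        )
        # Discrete actions: image target i, downlink, or charge/idle.
        self.action_space = spaces.Discrete(self.N_TARGETS + 2)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self._obs = self.np_random.uniform(
            -1.0, 1.0, self.observation_space.shape
        ).astype(np.float32)
        return self._obs, {}

    def step(self, action):
        # Placeholder dynamics: a real simulator would propagate orbits,
        # power, thermal, and storage here.
        reward = -0.1  # standing health/constraint penalty (assumed weight)
        if action < self.N_TARGETS:
            reward += 1.0  # useful image of a (visible) target
        elif action == self.N_TARGETS:
            reward += 0.5  # data delivered on downlink
        return self._obs, reward, False, False, {}
```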

RL agent & training

Close-up: the agent opportunistically images, banks energy in eclipse, and downlinks during windows to preserve margins.

PPO trains over many randomized orbital seeds. Training alternates between two phases: rollouts in \(n_{\text{env}}\) parallel environments for \(n_b\) episodes each, then updates over the fresh on-policy data for \(n_{\text{opt}}\) epochs. With episodes of \(N\) steps, a total budget of \(T\) steps yields \(N_{\text{upd}} = T/(n_{\text{env}}\, n_b\, N)\) policy updates. Typical runs used a few hundred million steps on a single workstation. Exact settings per experiment are in Table 4 of the paper; full source is linked below.
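As one concrete reading of those quantities, the sketch below wires them into Stable-Baselines3 PPO. The library choice and the numbers are assumptions for illustration (the actual settings are in Table 4 of the paper), and it reuses the hypothetical SSASchedulingEnv sketched above.

```python
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env

# Illustrative values only; see Table 4 of the paper for real settings.
n_env, n_b, N, n_opt = 16, 2, 512, 10
T = 200_000_000  # total environment steps ("a few hundred million")

# SSASchedulingEnv is the illustrative environment defined earlier.
vec_env = make_vec_env(SSASchedulingEnv, n_envs=n_env)

model = PPO(
    "MlpPolicy",
    vec_env,
    n_steps=n_b * N,  # rollout phase: n_b episodes of N steps per env
    n_epochs=n_opt,   # update phase: epochs over each fresh on-policy batch
)

# Each rollout/update cycle consumes n_env * n_b * N steps, so the
# budget T yields N_upd = T / (n_env * n_b * N) policy updates.
print("policy updates:", T // (n_env * n_b * N))

model.learn(total_timesteps=T)
```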

Results: behaviors & plots

Battery, storage, downlink, and reward over time (PDF).
Azimuth/elevation pointing and ground-station windows (PDF).

Generalization: mixed LEO/MEO/GEO

With a target mix of roughly 50% LEO, 30% MEO, and 20% GEO, a policy trained only on LEO targets still performs well. It reuses the same resource-management reflexes: bank energy in eclipse, prefer visible targets with advantageous short-term geometry, and schedule downlink bursts. As a result, it continues to deliver timely imagery in the mixed environment.
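For a sense of how such an evaluation set might be drawn, here is a small sampler using the regime mix from the text; the altitude bands and the sampler itself are illustrative assumptions, not the paper's procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Regime mix from the experiment: ~50% LEO, 30% MEO, 20% GEO.
regimes = ["LEO", "MEO", "GEO"]
weights = [0.5, 0.3, 0.2]

# Illustrative altitude bands in km (assumed, not from the paper).
alt_bands_km = {"LEO": (400, 2000), "MEO": (2000, 35786), "GEO": (35786, 35786)}

def sample_targets(n):
    """Draw n evaluation targets as (regime, altitude_km) pairs."""
    picks = rng.choice(regimes, size=n, p=weights)
    return [(str(r), float(rng.uniform(*alt_bands_km[str(r)]))) for r in picks]

print(sample_targets(5))
```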

Paper, talk & videos

Short visual summary from the AVS Lab YouTube channel.

Global perspective clip of the same run.

Photos

Stage photo: AMOS presentation.
With Dr. Hanspeter Schaub and Mark after the session.

Conclusion

RL-enabled scheduling learns practical habits: bank energy in eclipse, downlink in bursts when windows open, and guard margins so opportunities aren't missed. The learned policy also generalizes across orbital seeds and mixed-orbit target sets, pointing toward scalable onboard autonomy for future SSA missions.