Overview & motivation
Space-based SSA is unlike Earth imaging. Targets move fast, access windows are brief, and power, pointing, and downlink are tightly constrained. The goal is a flight policy that decides which RSO to image and when to downlink, while keeping the spacecraft healthy for the next opportunity.
The main study focuses on **LEO → LEO imaging**, which is the hardest regime: large relative motion, frequent geometry changes, and long eclipse periods, including intervals when the target itself is eclipsed.
Scenario & simulator
Setup clip: LEO chaser with power/thermal/storage constraints imaging catalogued RSOs.
Formulating SBSS as a POMDP
The task is posed as a POMDP: noisy observations of orbital geometry, battery, storage, and visibility and ground-station windows; actions choose imaging targets and downlink decisions; rewards balance useful imagery delivery, health margins, and constraint satisfaction.
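To make the POMDP ingredients concrete, here is a minimal environment sketch in Python. All names, dynamics, noise models, and reward weights below are illustrative placeholders, not the paper's simulator or reward function; they just mirror the observation/action/reward structure described above.

```python
from dataclasses import dataclass
import random


@dataclass
class Obs:
    """Noisy observation of spacecraft and tasking state (illustrative fields)."""
    battery_frac: float        # state-of-charge estimate, 0..1
    storage_frac: float        # onboard data storage used, 0..1
    target_visible: list       # per-RSO visibility flags (noisy)
    gs_window: bool            # ground-station contact available


class SBSSEnvSketch:
    """SBSS as a POMDP, reduced to a toy: actions 0..n_targets-1 image that
    RSO; action n_targets attempts a downlink. Placeholder dynamics only."""

    def __init__(self, n_targets=3, seed=0):
        self.rng = random.Random(seed)
        self.n_targets = n_targets
        self.reset()

    def reset(self):
        self.battery, self.storage, self.t = 1.0, 0.0, 0
        return self._observe()

    def _observe(self):
        # Observations are noisy estimates of the true resource state.
        def noise(x):
            return min(1.0, max(0.0, x + self.rng.gauss(0.0, 0.02)))
        return Obs(
            battery_frac=noise(self.battery),
            storage_frac=noise(self.storage),
            target_visible=[self.rng.random() < 0.5 for _ in range(self.n_targets)],
            gs_window=self.rng.random() < 0.3,
        )

    def step(self, action):
        reward = 0.0
        if action < self.n_targets:                 # imaging attempt
            self.battery -= 0.05
            if self.storage < 1.0:
                self.storage = min(1.0, self.storage + 0.1)
                reward += 1.0                       # imagery collected
        else:                                       # downlink attempt
            self.battery -= 0.02
            delivered = min(self.storage, 0.3)
            self.storage -= delivered
            reward += 2.0 * delivered               # delivery is the real payoff
        if self.battery < 0.2:
            reward -= 5.0                           # health-margin penalty
        self.battery = min(1.0, self.battery + 0.03)  # simplified solar charging
        self.t += 1
        return self._observe(), reward, self.t >= 100
```

The key structural point is that the agent never sees the true battery or storage state, only noisy estimates, which is what makes the problem a POMDP rather than an MDP.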
RL agent & training
Close-up: the agent opportunistically images, banks energy in eclipse, and downlinks during windows to preserve margins.
PPO trains over many randomized orbital seeds. Training alternates two phases: rollouts in \(n_{\text{env}}\) parallel environments for \(n_b\) episodes each (each episode \(N\) steps long), then updates over the fresh on-policy data for \(n_{\text{opt}}\) epochs. A total step budget \(T\) therefore implies \(N_{\text{upd}} = T/(n_{\text{env}} n_b N)\) policy updates. Typical runs used a few hundred million steps on a single workstation. Exact settings per experiment are in Table 4 of the paper; full source is linked below.
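The update-count bookkeeping can be sanity-checked with a few lines of Python. The numbers below are illustrative only, not the settings from Table 4:

```python
def n_updates(total_steps, n_env, n_b, episode_len):
    """N_upd = T / (n_env * n_b * N): each rollout phase collects
    n_env * n_b episodes of episode_len steps before one policy update."""
    steps_per_update = n_env * n_b * episode_len
    assert total_steps % steps_per_update == 0, \
        "T should be a multiple of the rollout size"
    return total_steps // steps_per_update


# Illustrative numbers (not the paper's): T = 256M steps, 32 envs,
# 4 episodes per env per rollout, 1000-step episodes.
print(n_updates(256_000_000, 32, 4, 1000))  # → 2000
```

This makes explicit why pushing the episode count or episode length up trades fewer, larger policy updates against more on-policy data per update.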
Results: behaviors & plots
- Energy/storage habits: bank during eclipse, spend during visibility, keep buffers healthy.
- Opportunistic imaging: favors targets with short-term geometry advantage instead of pre-planned rigid schedules.
- Timely delivery: the policy bursts downlink when windows appear, improving image freshness over a myopic heuristic.
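The habits in the bullets above can be contrasted with the kind of myopic heuristic the policy is compared against. Below is a hedged sketch of such a baseline, not the learned policy and not code from the paper; the scoring rule and thresholds are assumptions for illustration:

```python
def pick_action(target_visible, window_remaining, gs_window,
                storage_frac, battery_frac, low_batt=0.3):
    """Myopic baseline: downlink whenever a ground-station window is open and
    storage is nonempty; otherwise image the visible target whose access
    window closes soonest; idle when battery margin is low.
    (Illustrative thresholds and scoring, not the paper's heuristic.)"""
    if battery_frac < low_batt:
        return "idle"                      # guard health margins first
    if gs_window and storage_frac > 0.0:
        return "downlink"                  # burst downlink when a window opens
    candidates = [(rem, i) for i, (vis, rem) in
                  enumerate(zip(target_visible, window_remaining)) if vis]
    if candidates:
        _, best = min(candidates)          # shortest remaining window first
        return ("image", best)
    return "idle"
```

A rule like this captures the "favor short-term geometry advantage" instinct, but unlike the learned policy it cannot anticipate eclipses or future windows, which is where the RL agent's freshness advantage comes from.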
Generalization: mixed LEO/MEO/GEO
With a target mix of ~50% LEO, 30% MEO, and 20% GEO, a policy trained on LEO still performs well. It re-uses the same resource-management reflexes—bank energy in eclipse, favor visible targets with short-term geometry advantage, and schedule downlink bursts—so it continues to deliver timely imagery in the mixed environment.
Paper, talk & videos
Short visual summary from the AVS Lab YouTube channel.
Global perspective clip of the same run.
Photos
Conclusion
RL-enabled scheduling learns practical habits: banking energy in eclipse, bursting downlink when windows open, and guarding margins so opportunities aren't missed. These habits generalize across orbital seeds and mixed-orbit target sets, pointing toward scalable onboard autonomy for future SSA missions.