Reinforcement Learning (RL) is a promising approach for real-world applications of Multi-Agent Path Finding (MAPF). However, its success depends on a good reward function, which is difficult to design manually in this complex domain. Our MAPF planner, PBRSS (Potential-Based Reward Shaping with Search), automatically generates potential functions that guide RL-based MAPF agents through potential-based reward shaping. It thereby inherits the theoretical and empirical advantages of that technique: accelerated training and likely convergence to better policies. We first formulate an adapted version of the Partially Observable MAPF (PO-MAPF) problem to standardize the comparison of RL-based and search-based planners and to cross-fertilize techniques between them. We then develop Partially Observable Conflict-Based Search (PO-CBS) as a generalization of CBS to the PO-MAPF domain. Finally, we design the potential functions required for reward shaping using the PO-CBS plans and single-agent shortest-path computations. PBRSS can be applied to any RL-based MAPF planner to improve its generalizability and performance.
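To make the idea of potential-based reward shaping concrete, the following is a minimal sketch for a single agent on a grid. It is not the PBRSS potential function, which also draws on PO-CBS plans. It assumes the simplest potential one could build from a single-agent shortest-path computation: the negated distance to the goal. The names `bfs_distances` and `shaped_reward`, the 4-connected grid encoding (0 = free, 1 = obstacle), and the default discount factor are all illustrative assumptions.

```python
from collections import deque


def bfs_distances(grid, goal):
    """Shortest-path distance from every reachable free cell to `goal`.

    Assumes a 4-connected grid where 0 marks a free cell and 1 an obstacle.
    """
    rows, cols = len(grid), len(grid[0])
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            in_bounds = 0 <= nr < rows and 0 <= nc < cols
            if in_bounds and grid[nr][nc] == 0 and (nr, nc) not in dist:
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return dist


def shaped_reward(base_reward, dist, cell, next_cell, gamma=0.99):
    """Add the shaping term F(s, s') = gamma * Phi(s') - Phi(s) to the reward.

    Illustrative assumption: Phi(s) = -(shortest-path distance from s to goal).
    """
    phi = -dist[cell]
    phi_next = -dist[next_cell]
    return base_reward + gamma * phi_next - phi


# Hypothetical 3x3 map with one obstacle in the centre; the goal is the top-left cell.
grid = [
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
dist = bfs_distances(grid, goal=(0, 0))
toward = shaped_reward(-1, dist, (2, 2), (2, 1), gamma=1.0)  # step closer to the goal
away = shaped_reward(-1, dist, (2, 1), (2, 2), gamma=1.0)    # step farther from the goal
```

In this sketch, with a per-step base reward of -1 and gamma set to 1.0, a step toward the goal receives a shaped reward of 0 and a step away receives -2, so the agent gets a denser learning signal than the base reward alone provides.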
Copyright is held by the author(s).
This thesis may be printed or downloaded for non-commercial research and scholarly purposes.
Thesis advisor: Ma, Hang