Resource type
Thesis type
(Thesis) M.Sc.
Date created
2022-12-05
Authors/Contributors
Author: Chamani, Danoosh
Abstract
Reinforcement Learning (RL) is a promising approach for real-world applications of Multi-Agent Path Finding (MAPF). However, its success depends on a good reward function, which is difficult to design manually in this complex domain. PBRSS (Potential-Based Reward Shaping with Search), our MAPF planner, automatically generates potential functions to guide RL-based MAPF agents using potential-based reward shaping. It draws on the theoretical and empirical advantages of this technique: accelerated training and likely convergence to better policies. We first formulate an adapted version of the Partially Observable MAPF (PO-MAPF) problem to standardize the comparison of RL-based and search-based planners and to cross-fertilize techniques between them. We develop Partially Observable Conflict-Based Search (PO-CBS) as a generalization of CBS to the PO-MAPF domain. We then design the potential functions required for reward shaping using the PO-CBS plans and single-agent shortest path computations. PBRSS can be applied to any RL-based MAPF planner to improve its generalizability and performance.
Document
Extent
35 pages.
Identifier
etd22275
Copyright statement
Copyright is held by the author(s).
Supervisor or Senior Supervisor
Thesis advisor: Ma, Hang
Language
English
Member of collection
| Download file | Size |
|---|---|
| etd22275.pdf | 8.97 MB |