Chamani, Danoosh

Resource type

Thesis

Thesis type

(Thesis) M.Sc.

Date created

2022-12-05

Authors/Contributors

Author: Chamani, Danoosh

Abstract

Reinforcement Learning (RL) is a promising approach for real-world applications of Multi-Agent Path Finding (MAPF). However, its success depends on a well-designed reward function, which is difficult to design manually in this complex domain. PBRSS (Potential-Based Reward Shaping with Search), our MAPF planner, automatically generates potential functions that guide RL-based MAPF agents through potential-based reward shaping. It thereby inherits the theoretical and empirical advantages of that technique: accelerated training and a likely convergence to better policies. We first formulate an adapted version of the Partially Observable MAPF (PO-MAPF) problem to standardize comparisons between RL-based and search-based planners and to cross-fertilize techniques between the two. We develop Partially Observable Conflict-Based Search (PO-CBS) as a generalization of CBS to the PO-MAPF domain. We then design the potential functions required for reward shaping from the PO-CBS plans and single-agent shortest-path computations. PBRSS can be applied to any RL-based MAPF planner to improve its generalizability and performance.
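Potential-based reward shaping adds a term F(s, s') = gamma * Phi(s') - Phi(s) to the environment reward. The sketch below is a minimal single-agent illustration of that idea. It assumes a 4-connected grid and a potential Phi(s) = -d(s, g), where d is the BFS shortest-path distance to the agent's goal g. The grid, the names `bfs_distances` and `shaped_reward`, and the constants are hypothetical. The sketch covers only the single-agent shortest-path ingredient mentioned in the abstract. It does not model the PO-CBS-based potentials or the partial observability that PBRSS uses.

```python
from collections import deque

# Hypothetical 4-connected grid: '.' is a free cell, '@' is an obstacle.
GRID = [
    "....",
    ".@@.",
    "....",
]


def bfs_distances(grid, goal):
    """Shortest-path distance from every reachable free cell to `goal`.

    Moves are undirected, so a single BFS from the goal gives the
    distance from each cell to the goal.
    """
    rows, cols = len(grid), len(grid[0])
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            in_bounds = 0 <= nr < rows and 0 <= nc < cols
            if in_bounds and grid[nr][nc] != "@" and (nr, nc) not in dist:
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return dist


def shaped_reward(reward, s, s_next, dist, gamma=0.99, unreachable=1e6):
    """Return r + F(s, s'), with F = gamma * phi(s') - phi(s).

    The potential is phi(cell) = -dist(cell, goal). Unreachable cells get a
    large assumed distance. The goal itself has potential 0.
    """
    def phi(cell):
        return -dist.get(cell, unreachable)

    return reward + gamma * phi(s_next) - phi(s)


dist = bfs_distances(GRID, goal=(2, 3))
# Moving one step closer to the goal offsets a -1 step penalty when gamma = 1.
print(shaped_reward(-1.0, (0, 0), (1, 0), dist, gamma=1.0))
```

With `gamma = 1.0`, a step from (0, 0) toward the goal yields a shaped reward of 0.0. The reverse step yields -2.0. A standard property of shaping terms of this difference-of-potentials form is that they do not change which policy is optimal. They only change how quickly learning finds it.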

Extent

35 pages.

Keywords

Identifier

etd22275

Copyright statement

Copyright is held by the author(s).

Permissions

This thesis may be printed or downloaded for non-commercial research and scholarly purposes.

Supervisor or Senior Supervisor

Thesis advisor: Ma, Hang

Language

English

Member of collection

Computing Science Theses

Download file	Size
etd22275.pdf	8.97 MB

Learning with search-based guidance for Partially Observable Multi-Agent Path Finding via Potential-Based Reward Shaping
