Resource type
Thesis type
(Thesis) M.Sc.
Date created
2023-08-08
Authors/Contributors
Author: Tian, Parker Hao
Abstract
Processing-in-Memory (PIM) architectures have been extensively considered to reduce costly data transfers between processors and memory. Prior PIM work proposed using processing units in the logic layer of stacked memories, e.g., Hybrid Memory Cubes (HMC), to exploit the high bandwidth between memory and the logic layer. While such approaches can improve energy efficiency and performance, they incur significant overheads due to the need to transfer data from remote memory locations to processing units where computations are performed. In this thesis, we demonstrate that a large fraction of PIM's latency per memory request is attributed to data transfers and queuing delays from remote memory accesses. To improve PIM's data locality, we propose DL-PIM, a novel architecture that dynamically detects the overhead of data movement and proactively moves data to a reserved area in the local memory of the requesting processing unit. DL-PIM uses a distributed address-indirection hardware lookup table to redirect traffic to the data's current location. While some workloads benefit from this architecture, others suffer from the extra latency of indirection accesses. DL-PIM therefore uses an adaptive mechanism that weighs the cost and benefit of indirection and dynamically enables or disables it, avoiding slowdowns for workloads that indirection would hurt. Overall, DL-PIM reduces average memory latency per request by 54% and improves performance by 15% for workloads that have non-trivial data reuse (6% speedup for all representative workloads).
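The abstract's core mechanism, an address-indirection table that redirects memory requests to migrated data and can be adaptively disabled, can be illustrated with a minimal software sketch. All names here are hypothetical illustrations of the concept; the thesis describes a hardware design whose actual structure will differ.

```python
# Hypothetical sketch of the address-indirection idea from the abstract:
# a table maps a data block's original (home) address to its current
# location in the local reserved memory area. An adaptive flag models
# DL-PIM's ability to disable indirection when it hurts a workload.

class IndirectionTable:
    def __init__(self):
        self._table = {}      # original address -> current (migrated) address
        self.enabled = True   # adaptive mechanism may turn indirection off

    def migrate(self, orig_addr, local_addr):
        """Record that the block at orig_addr now lives at local_addr."""
        self._table[orig_addr] = local_addr

    def resolve(self, addr):
        """Redirect a memory request to the data's current location."""
        if self.enabled:
            return self._table.get(addr, addr)
        return addr           # indirection disabled: use the home address

tbl = IndirectionTable()
tbl.migrate(0x1000, 0x8000)       # block moved into local reserved area
print(hex(tbl.resolve(0x1000)))   # redirected to the local copy
print(hex(tbl.resolve(0x2000)))   # unmigrated address passes through
tbl.enabled = False
print(hex(tbl.resolve(0x1000)))   # indirection off: original address again
```

In hardware, such a lookup sits on the memory-request path, which is why the abstract notes that the extra indirection latency can hurt workloads with little data reuse.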
Document
Extent
53 pages.
Identifier
etd22666
Copyright statement
Copyright is held by the author(s).
Supervisor or Senior Supervisor
Thesis advisor: Alameldeen, Alaa
Language
English
| Download file | Size |
|---|---|
| etd22666.pdf | 3.58 MB |