Performance Characterizations and Improvements in Near Memory Processing Architectures

Name of the Speaker: Mr. Shubhang Pandey (EE19S057)
Name of the Guide: Dr. T G Venkatesh
Venue: ESB-244 (Seminar Hall)
Date/Time: September 30th (Friday), 11:30 AM to 12:30 PM

Recent advances in 3D fabrication have enabled stacking memory dies over a logic die. Such 3D memory is a viable solution to the memory wall problem: its stacked DRAM layers are connected by Through-Silicon Vias (TSVs), which provide very high bandwidth. Considerable recent effort has gone into improving the performance and power of Near-Memory Processing (NMP) architectures. Memory-Centric Networks (MCNs) are advanced memory architectures built on NMP: multiple stacks of 3D memory units equipped with requirement-based processing cores, allowing numerous threads to execute concurrently.

The first part of our work studies the performance of 3D memory that uses a packet-based protocol for communication between the CPU and off-chip memory. Our study provides insight into the internal flit traffic for different 3D memory configurations under diverse memory access patterns and workload characteristics. We examine the performance of the 3D stacked memory as the number of banks and vaults is varied. Further, the effect of varying the packet size and the number of communication links on off-chip link bandwidth and latency is studied. We also examine different off-chip link power optimization strategies. Finally, we observe the impact of varying buffer sizes on the latency at the off-chip link buffers and the vault buffers of 3D memory.
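
To make the packet-size and link-count trade-off concrete, the sketch below computes how many flits a request packet occupies and its serialization delay on an off-chip link. The framing parameters (16-byte header, 128-bit flits, 10 Gbps lanes) are illustrative assumptions in the spirit of HMC-style packetized links, not values taken from the study itself.

```python
import math

def flits_per_packet(payload_bytes, header_bytes=16, flit_bytes=16):
    """Flits needed for one packet; header/flit sizes are assumed,
    HMC-style framing, not the thesis's exact protocol."""
    return math.ceil((payload_bytes + header_bytes) / flit_bytes)

def serialization_latency_ns(num_flits, lanes, lane_gbps=10.0, flit_bits=128):
    """Time to push all flits of a packet onto a link with `lanes`
    serial lanes; more lanes drain flits proportionally faster."""
    link_bits_per_ns = lanes * lane_gbps  # Gbps == bits/ns
    return num_flits * flit_bits / link_bits_per_ns
```

For example, a 64-byte read response occupies 5 flits under these assumptions, and doubling the lane count halves the serialization component of the packet latency, which is the kind of trade-off the study measures.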

The second part of our work presents a multi-armed bandit (MAB) based approach to formulating a resource allocation strategy for MCNs that maximizes the benefits of their highly parallel execution. NMP performance depends crucially on efficient management of NMP resources: better task offloading and task-to-NMP allocation improve system performance. By modeling the application tasks, we account for inter-task communication and the power density of each NMP when formulating the MAB rewards. Most existing literature focuses on a single application domain; our solution is more generic and can be applied to diverse application domains. In our approach, we deploy the Upper Confidence Bound (UCB) policy to collect rewards and use them for regret minimization. We study the following metrics: instructions per cycle (IPC), execution time, NMP core cache misses, packet latencies, and power consumption. Our study covers various applications from the PARSEC and SPLASH-2 benchmark suites. The evaluation shows that system performance improves by 11% on average, with an average reduction in total power consumption of 12%. We also offer two case studies across benchmarks from diverse domains to examine our resource allocation strategy and visualize its performance over time.
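
A minimal sketch of the UCB side of this idea, treating each NMP stack as a bandit arm: the standard UCB1 rule picks the stack with the best mean reward plus an exploration bonus. The reward shaping below (favoring IPC, penalizing inter-task communication cost and power density) and its weights are hypothetical stand-ins for the thesis's actual formulation.

```python
import math

class UCBAllocator:
    """UCB1 policy for assigning tasks to NMP stacks (illustrative)."""
    def __init__(self, num_nmps):
        self.counts = [0] * num_nmps    # times each NMP was chosen
        self.means = [0.0] * num_nmps   # running mean reward per NMP
        self.t = 0                      # total allocation rounds

    def select(self):
        self.t += 1
        # Play every arm once before applying the confidence bound.
        for i, c in enumerate(self.counts):
            if c == 0:
                return i
        ucb = [m + math.sqrt(2 * math.log(self.t) / c)
               for m, c in zip(self.means, self.counts)]
        return max(range(len(ucb)), key=ucb.__getitem__)

    def update(self, arm, reward):
        # Incremental mean update for the chosen NMP.
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]

def reward(ipc, comm_cost, power_density, a=0.5, b=0.3):
    """Hypothetical reward: reward high IPC, penalize inter-task
    communication and hot (high power density) stacks."""
    return ipc - a * comm_cost - b * power_density
```

Used in a loop of `select()` / run task / `update()`, the exploration bonus shrinks as a stack is sampled, so allocation concentrates on the stacks that consistently yield high reward, which is the regret-minimization behavior UCB provides.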

The third part of our work proposes stochastic-optimization-based link power management (SOBLPM). Despite all the efforts to improve the performance of 3D stacked memory, high power consumption remains a concern: it is well established that significant power is consumed in maintaining the off-chip links, yet little attention has been paid to optimizing it. In this part of the work, we model link priority strategies using a 2D Markov model and queuing theory, and use the model to drive a stochastic optimization strategy. The proposed approach selects the number of links to be kept active based on the rejection probability of memory requests. The model predicts memory access latency to within about 2% of the simulation data. The proposed stochastic optimization strategy is approximately 20% more power-efficient than existing off-chip link power management schemes without sacrificing latency performance.
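
As a simplified stand-in for the thesis's 2D Markov chain, the sketch below uses a textbook M/M/c/K loss model: requests arrive at rate λ, each active link serves at rate μ, and a request arriving to a full buffer of size K is rejected. A policy can then keep the smallest number of links whose rejection probability stays under a target.

```python
import math

def blocking_prob(lam, mu, c, K):
    """Rejection probability of an M/M/c/K queue: the chance a
    memory request finds all c active links and K buffer slots
    busy (illustrative model, not the thesis's 2D Markov chain)."""
    a = lam / mu  # offered load
    probs = []
    for n in range(K + 1):
        if n <= c:
            probs.append(a**n / math.factorial(n))
        else:
            probs.append(a**n / (math.factorial(c) * c**(n - c)))
    return probs[K] / sum(probs)  # normalized tail state

def links_needed(lam, mu, max_links, K, target=0.01):
    """Smallest number of active off-chip links that keeps the
    rejection probability under `target` (hypothetical policy)."""
    for c in range(1, max_links + 1):
        if blocking_prob(lam, mu, c, K) <= target:
            return c
    return max_links
```

The design point this illustrates: since link power grows with the number of powered links, gating down to the minimum count that still meets a rejection-probability target trades idle link power for a bounded increase in queuing, which is the flavor of trade-off SOBLPM optimizes.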

Our study offers further perspective on the development of Data-Centric Computing architectures and insight into proper flit management strategies for future memory architectures.