Talk 1: Let GPUs Work Harder: Performance Analysis and Optimization for GPGPU Applications
Abstract: General-Purpose Graphics Processing Units (GPGPUs) are increasingly integral to modern computing, boosting performance in fields such as machine learning, high-performance computing, and autonomous driving. However, GPU-accelerated applications often fall short of their full potential because of common inefficiencies. Performance tools are crucial for diagnosing and fixing these inefficiencies in complex code.
This talk presents our efforts to improve the performance of GPGPU applications. It begins with DrGPU, a top-down profiler for GPU applications. Using GPU hardware performance counters, DrGPU measures stall cycles, breaks them down by cause, identifies the primary bottlenecks, and suggests concrete optimization strategies. The talk then introduces ValueExpert, a tool that monitors application execution and tracks the values produced and consumed by each load and store operation in GPU kernels. ValueExpert detects various value patterns and suggests corresponding optimizations.
Speaker Bio: Yueming Hao is a Ph.D. candidate in the Department of Computer Science at North Carolina State University, advised by Prof. Xu Liu. He received his bachelor's degree from Shandong University, where he was advised by Prof. Lei Ju. His main research interest is developing performance analysis and optimization tools for GPGPU applications. He has published at top-tier conferences including ASPLOS, SC, and CGO.
Talk 2: Path Forward Beyond Simulators: Fast and Accurate GPU Execution Time Prediction for DNN Workloads
Abstract: Today, the high computational complexity and sub-optimal device utilization of DNNs present a major roadblock to democratizing them. To reduce execution time and improve device utilization, researchers have proposed new system designs, which require performance models (especially GPU models) for concept validation before a product is built. Researchers currently rely on simulators to predict execution time; simulators offer high flexibility and acceptable accuracy, but at the cost of long simulation times. As simulators become increasingly impractical for modeling today's large-scale systems and DNNs, lightweight alternatives are needed. To address this problem, we propose a data-driven method for modeling the performance of DNN systems. We first build a dataset containing the execution times of numerous networks, layers, and kernels. After identifying how execution time relates to directly known information (e.g., network structure and the hardware's theoretical computing capabilities), we discuss how to build a simple yet accurate performance model for DNN execution time. Our observations on the dataset reveal prevalent linear relationships between GPU kernel execution times, operation counts, and the input/output parameters of DNN layers. Guided by these observations, we develop a fast, linear-regression-based DNN execution time predictor. Our evaluation on various image classification models suggests that our method can predict the performance of new DNNs with 7% error and of new GPUs with 15.2% error. Our case studies also demonstrate how the performance model can facilitate future DNN systems research.
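The core idea of a linear-regression-based predictor can be illustrated with a minimal sketch. It assumes a single feature (a kernel's operation count in GFLOPs) and fits execution time by ordinary least squares; the function names `fit_linear`, `predict`, and `percent_error` and the numbers below are hypothetical illustrations, not the authors' predictor or dataset. The model described in the abstract also uses layer input/output parameters, which would turn this into a multivariate regression.

```python
"""Hypothetical sketch: predict kernel time as slope * operation_count + intercept."""


def fit_linear(xs, ys):
    """Ordinary least squares for y ~ slope * x + intercept; returns (slope, intercept)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("operation counts must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Predicted execution time for a kernel with operation count x."""
    slope, intercept = model
    return slope * x + intercept


def percent_error(predicted, actual):
    """Relative prediction error in percent, the kind of metric the abstract reports."""
    return abs(predicted - actual) / actual * 100.0


# Synthetic, made-up measurements (GFLOPs -> milliseconds); NOT data from the talk.
gflops = [1.0, 2.0, 4.0, 8.0]
time_ms = [0.6, 1.1, 2.1, 4.1]

model = fit_linear(gflops, time_ms)
print("slope, intercept:", model)
print("predicted time for 16 GFLOPs (ms):", predict(model, 16.0))
```

Because fitting and predicting reduce to a handful of arithmetic operations, a model of this form evaluates almost instantly, which is the speed advantage over cycle-level simulation that the abstract emphasizes.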
Speaker Bio: Ying Li is a Ph.D. candidate in the Department of Computer Science at William & Mary, advised by Dr. Adwait Jog and Dr. Yifan Sun. Her research spans GPU architecture and machine learning. She received her Bachelor of Engineering in Computer Science and Technology from Shandong University, China, in 2018. She has a publication at MICRO 2023.