The remarkable performance achieved in a variety of application areas (natural language processing, computer vision, games, etc.) has led to the emergence of heterogeneous architectures to accelerate machine learning workloads. In parallel, production deployment, model complexity and diversity have pushed for higher-productivity systems, more powerful programming abstractions, software and system architectures, dedicated runtime systems and numerical libraries, and deployment and analysis tools. Deep learning models are generally memory and computationally intensive, for both training and inference. Accelerating these operations has obvious advantages, first by reducing energy consumption (e.g. in data centers), and second by making these models usable on smaller devices at the edge of the Internet. In addition, while convolutional neural networks have motivated much of this effort, numerous applications and models involve a wider variety of operations, network architectures, and data processing. These applications and models continually challenge computer architecture, the system stack, and programming abstractions. The high level of interest in these areas calls for a dedicated forum to discuss emerging acceleration techniques and computation paradigms for machine learning algorithms, as well as the applications of machine learning to the construction of such systems.
The workshop brings together researchers and practitioners working on computing systems for machine learning, and using machine learning to build better computing systems. It also reaches out to a wider community interested in this rapidly growing area, to raise awareness of the existing efforts, to foster collaboration and the free exchange of ideas.
This builds on the success of our previous events:
Topics of interest include (but are not limited to):
Novel ML systems: heterogeneous multi/many-core systems, GPUs and FPGAs;
Software ML acceleration: languages, primitives, libraries, compilers and frameworks;
Novel ML hardware accelerators and associated software;
Emerging semiconductor technologies with applications to ML hardware acceleration;
ML for the construction and tuning of systems;
Cloud and edge ML computing: hardware and software to accelerate training and inference;
Computing systems research addressing the privacy and security of ML-dominated systems;
Submission deadline: December 7, 2022 (extended from November 30)
Notification to authors: December 17, 2022 (moved from December 15)
Papers should be in double-column IEEE format and between 4 and 8 pages long, including references. Papers should be uploaded as PDF and should not be anonymized.
Submissions can be made at https://easychair.org/my/conference?conf=5thaccml.
Papers will be reviewed by the workshop's technical program committee according to criteria regarding a submission's quality, relevance to the workshop's topics, and, foremost, its potential to spark discussions about directions, insights, and solutions on the topics mentioned above. Research papers, case studies, and position papers are all welcome.
In particular, we encourage authors to keep the following options in mind when preparing submissions:
Tentative Research Ideas: Presenting your research idea early on to get feedback and enable collaborations.
Works-In-Progress: To facilitate sharing of thought-provoking ideas and high-potential though preliminary research, authors are welcome to make submissions describing early-stage, in-progress, and/or exploratory work in order to elicit feedback, discover collaboration opportunities, and generally spark discussion.
Title: Memory-Centric Computing
Computing is bottlenecked by data. Large amounts of application data overwhelm storage capability, communication capability, and computation capability of the modern machines we design today. As a result, many key applications' performance, efficiency, and scalability are bottlenecked by data movement. In this lecture, we describe three major shortcomings of modern architectures in terms of 1) dealing with data, 2) taking advantage of the vast amounts of data, and 3) exploiting different semantic properties of application data. We argue that an intelligent architecture should be designed to handle data well. We show that handling data well requires designing architectures based on three key principles: 1) data-centric, 2) data-driven, 3) data-aware. We give several examples of how to exploit each of these principles to design a much more efficient and high-performance computing system. We especially discuss recent research that aims to fundamentally reduce memory latency and energy, and practically enable computation close to data, with at least two promising novel directions: 1) processing using memory, which exploits analog operational properties of memory chips to perform massively-parallel operations in memory, with low-cost changes, 2) processing near memory, which integrates sophisticated additional processing capability in memory controllers, the logic layer of 3D-stacked memory technologies, or memory chips to enable high memory bandwidth and low memory latency to near-memory logic. We show both types of architectures can enable orders of magnitude improvements in performance and energy consumption of many important workloads, such as graph analytics, database systems, machine learning, and video processing. We discuss how to enable adoption of such fundamentally more intelligent architectures, which we believe are key to efficiency, performance, and sustainability. We conclude with some guiding principles for future computing architecture and system designs.
Onur Mutlu is a Professor of Computer Science at ETH Zurich. He is also a faculty member at Carnegie Mellon University, where he previously held the Strecker Early Career Professorship. His current broader research interests are in computer architecture, systems, hardware security, and bioinformatics. A variety of techniques he, along with his group and collaborators, has invented over the years have influenced industry and have been employed in commercial microprocessors and memory/storage systems. He obtained his PhD and MS in ECE from the University of Texas at Austin and BS degrees in Computer Engineering and Psychology from the University of Michigan, Ann Arbor. He started the Computer Architecture Group at Microsoft Research (2006-2009), and held various product and research positions at Intel Corporation, Advanced Micro Devices, VMware, and Google. He received the Google Security and Privacy Research Award, Intel Outstanding Researcher Award, IEEE High Performance Computer Architecture Test of Time Award, NVMW Persistent Impact Prize, the IEEE Computer Society Edward J. McCluskey Technical Achievement Award, ACM SIGARCH Maurice Wilkes Award, the inaugural IEEE Computer Society Young Computer Architect Award, the inaugural Intel Early Career Faculty Award, US National Science Foundation CAREER Award, Carnegie Mellon University Ladd Research Award, faculty partnership awards from various companies, and a healthy number of best paper or "Top Pick" paper recognitions at various computer systems, architecture, and security venues. He is an ACM Fellow "for contributions to computer architecture research, especially in memory systems", IEEE Fellow for "contributions to computer architecture research and practice", and an elected member of the Academy of Europe (Academia Europaea). 
His computer architecture and digital logic design course lectures and materials are freely available on YouTube (https://www.youtube.com/OnurMutluLectures), and his research group makes a wide variety of software and hardware artifacts freely available online (https://safari.ethz.ch/). For more information, please see his webpage at https://people.inf.ethz.ch/omutlu/.
Universitat Politècnica de València
Title: Convolutional Neural Networks: One Matrix Product to Rule them All!
The convolution is a key operator for the type of deep neural networks that dominate machine learning algorithms for signal processing, including computer vision tasks. For this reason, in recent years there has been an intensive effort to carefully exploit the architecture of modern processors in order to produce efficient realizations of this operator, based for example on the classical algorithm for the direct convolution or the lowering approach. In this talk we will argue that the direct convolution is nothing but a matrix multiplication (GEMM) in disguise, and that the lowering approach offers a continuum of blocking opportunities that blur the differences between it and the direct convolution. Furthermore, we will demonstrate that the lowering approach yields a type of GEMM operation that is not efficiently supported by current high performance libraries. Fortunately, this problem can be tackled by relying on a special component of GEMM, known as the micro-kernel, which can be easily specialized using a high-level programming language plus vector intrinsics, provided there is good support from a backend compiler. This solution is portable, and we will demonstrate that it delivers high performance for an ample variety of architectures, from commodity processors (NVIDIA Carmel, ARM A57, ARM A78) and high performance architectures (Intel Xeon, AMD EPYC, Fujitsu A64FX) to low-power systems (Arduino M4F, GreenWaves GAP8) and fancier accelerators (Xilinx AIE, RISC-V+EPI).
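As a rough illustration of the lowering idea mentioned in the abstract, the sketch below rearranges input patches into a matrix (often called im2col) so that a convolution becomes a single matrix product. It is not the speaker's implementation: the function names, the tiny 3x4 input and the 2x2 filter are all assumptions chosen for readability, and the code uses plain Python lists rather than an optimized GEMM library or micro-kernel. It follows the CNN convention of not flipping the kernel, with stride 1 and no padding.

```python
# Illustrative sketch only: all names, shapes and values here are hypothetical.

def direct_conv2d(image, kernel):
    """Direct 2D convolution (CNN-style, no kernel flip), stride 1, no padding."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(w - kw + 1)
        ]
        for i in range(h - kh + 1)
    ]


def im2col(image, kh, kw):
    """Lowering step: one row per output position, holding its flattened input patch."""
    h, w = len(image), len(image[0])
    return [
        [image[i + di][j + dj] for di in range(kh) for dj in range(kw)]
        for i in range(h - kh + 1)
        for j in range(w - kw + 1)
    ]


def matmul(a, b):
    """Naive GEMM: (m x k) times (k x n)."""
    k, n = len(b), len(b[0])
    return [[sum(row[p] * b[p][q] for p in range(k)) for q in range(n)] for row in a]


def lowered_conv2d(image, kernel):
    """Convolution expressed as im2col followed by one matrix product."""
    kh, kw = len(kernel), len(kernel[0])
    out_w = len(image[0]) - kw + 1
    patches = im2col(image, kh, kw)                  # (out_h * out_w) x (kh * kw)
    weights = [[v] for row in kernel for v in row]   # (kh * kw) x 1, one filter
    flat = [r[0] for r in matmul(patches, weights)]
    # Reshape the flat result back into out_h rows of out_w values.
    return [flat[i:i + out_w] for i in range(0, len(flat), out_w)]


image = [[1, 2, 3, 0],
         [4, 5, 6, 1],
         [7, 8, 9, 2]]
kernel = [[1, 0],
          [0, -1]]

assert direct_conv2d(image, kernel) == lowered_conv2d(image, kernel)
```

With a single filter the product degenerates to a matrix-vector product. With several filters, each filter would contribute one more column to `weights`, which gives the tall-and-skinny GEMM shapes that the abstract refers to.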
Enrique S. Quintana-Orti received his bachelor's and Ph.D. degrees in computer sciences from the Universitat Politècnica de València (UPV), Spain, in 1992 and 1996, respectively. After 20+ years at the Universitat Jaume I of Castellon, Spain, he came back to UPV in 2019, where he is now Professor in Computer Architecture. For his research, he received the NVIDIA 2008 Professor Partnership Award and two awards from the US National Aeronautics and Space Administration (NASA). He has published 400+ articles in journals and international conferences. Currently he participates in the EU projects APROPOS, RED-SEA, eFLOWS4HPC and Nimble AI. His current research interests include parallel programming, linear algebra, energy consumption, transprecision computing and deep learning as well as advanced architectures and hardware accelerators.
Samsung AI, Cambridge, UK
Title: Revising AI Computing at the Consumer Edge: New Challenges and Systems Considerations
In the last few years, the rapid progress of deep learning and deep neural networks (DNNs) has enabled the embedding of intelligence across consumer devices, be it voice assistants, smart cameras, or home robots. Nonetheless, recent trends strongly indicate that the next decade of consumer intelligence will require unprecedented levels of computational resources in order to cope with the demands of the new AI use-cases. In this talk, we argue for a paradigm shift towards the next generation of Consumer Edge-AI Computing. We'll start by discussing the new computational challenges of next-generation AI systems. Next, we'll introduce the notion of among-device intelligence, where multiple devices collaborate with each other through the fluid sharing of both context information and computational resources. Finally, we'll discuss how novel components, such as adaptive neural models, multi-DNN accelerators and fluid batching schemes, can be the key towards bringing performant and efficient intelligence to the consumer edge.
Stylianos I. Venieris is currently a Senior Research Scientist at Samsung AI, Cambridge, UK, where he leads the Distributed AI group. He received his PhD in Reconfigurable Computing and Deep Learning from Imperial College London in 2018 and his MEng in EEE from Imperial College London in 2014. His research interests include principled methodologies for the mapping of deep learning algorithms on distributed and mobile platforms, the design of novel end-to-end deep learning systems that robustly meet multi-objective performance requirements, and the design of next-generation hardware accelerators for the high-performance, energy-efficient deployment of deep neural networks.
|Time (CET)||18th January 2023|
|10:00 – 10:10||Welcome|
|10:10 – 11:00||Keynote talk: Memory-Centric Computing (Onur Mutlu, ETH Zürich)|
|11:00 – 11:30||HiPEAC Coffee break|
|11:30 – 12:10||Invited talk 1: Convolutional Neural Networks: One Matrix Product to Rule them All! (Enrique S. Quintana-Orti, Universitat Politècnica de València)|
|12:10 – 12:35||Paper talk: Parallel and Vectorised Winograd Convolutions for Multi-core Processors (Manuel F. Dolz, Héctor Martínez, Adrián Castelló, Pedro Alonso-Jordá and Enrique S. Quintana-Orti)|
|12:35 – 13:00||Paper talk: Evaluating Machine Learning Workloads on Memory-Centric Computing Systems (Juan Gomez-Luna, Yuxin Guo, Sylvan Brocard, Julien Legriel, Remy Cimadomo, Geraldo F. Oliveira, Gagandeep Singh and Onur Mutlu)|
|13:00 – 14:00||HiPEAC Buffet lunch|
|14:00 – 14:40||Invited talk 2: Revising AI Computing at the Consumer Edge: New Challenges and Systems Considerations (Stylianos Venieris, Samsung AI)|
|14:40 – 15:05||Paper talk: Walking Noise: Understanding Implications of Noisy Computations on Classification Tasks (Hendrik Borras, Bernhard Klein and Holger Fröning)|
|15:05 – 15:30||Paper talk: Lightweight Address Translation Support for Convolutional Accelerators (Mirco Mannino, Biagio Peccerillo, Andrea Mondelli and Sandro Bartolini)|
|15:30 – 16:00||HiPEAC Coffee break|
|16:00 – 16:25||Paper talk: HEP-BNN: A Framework for Finding Low-Latency Execution Configurations of BNNs on Heterogeneous Multiprocessor Platforms (Leonard David Bereholschi, Ching-Chi Lin, Mikail Yayla and Jian-Jia Chen)|
|16:25 – 16:50||Paper talk: Accelerating Sparse Matrix-Matrix Multiplication with the Ascend AI Core|
|16:50 – 17:15||Paper talk: Graph neural network hardware acceleration in Pytorch with streaming PYNQ overlays|
|17:15 – 17:30||Short invited talk: Transfer-Tuning: Reusing Auto-Schedules for Efficient Tensor Program Code Generation (Perry Gibson, Jose Cano)|
|17:30 – 17:35||Closing remarks|