6th AccML Workshop at HiPEAC 2024

HiPEAC 2024 workshop

17th January, 2024

Munich, Germany

The remarkable performance achieved in a variety of application areas (natural language processing, computer vision, games, etc.) has led to the emergence of heterogeneous architectures to accelerate machine learning workloads. In parallel, production deployment, model complexity and diversity pushed for higher productivity systems, more powerful programming abstractions, software and system architectures, dedicated runtime systems and numerical libraries, deployment and analysis tools. Deep learning models are generally memory and computationally intensive, for both training and inference. Accelerating these operations has obvious advantages, first by reducing the energy consumption (e.g. in data centers), and secondly, making these models usable on smaller devices at the edge of the Internet. In addition, while convolutional neural networks have motivated much of this effort, numerous applications and models involve a wider variety of operations, network architectures, and data processing. These applications and models permanently challenge computer architecture, the system stack, and programming abstractions. The high level of interest in these areas calls for a dedicated forum to discuss emerging acceleration techniques and computation paradigms for machine learning algorithms, as well as the applications of machine learning to the construction of such systems.

The workshop brings together researchers and practitioners working on computing systems for machine learning, and using machine learning to build better computing systems. It also reaches out to a wider community interested in this rapidly growing area, to raise awareness of the existing efforts, to foster collaboration and the free exchange of ideas.

This builds on the success of our previous events:

Call For Contributions

Topics

Topics of interest include (but are not limited to):

Novel ML systems: heterogeneous multi/many-core systems, GPUs and FPGAs;
Software ML acceleration: languages, primitives, libraries, compilers and frameworks;
Novel ML hardware accelerators and associated software;
Emerging semiconductor technologies with applications to ML hardware acceleration;
ML for the construction and tuning of systems;
Cloud and edge ML computing: hardware and software to accelerate training and inference;
Computing systems research addressing the privacy and security of ML-dominated systems;
ML techniques for more efficient model training and inference (e.g. sparsity, pruning, etc);
Generative AI and their impact on computational resources;

Important Dates

Submission deadline: ~~November 3~~ November 17, 2023
Notification to authors: ~~December 1~~ December 15, 2023

Paper Format

Papers should be in double column IEEE format of between 4 and 8 pages including references. Papers should be uploaded as PDF and not anonymized.

Submission Site

Submissions can be made at https://easychair.org/my/conference?conf=6thaccml.

Submission Options

Papers will be reviewed by the workshop's technical program committee according to criteria regarding a submission's quality, relevance to the workshop's topics, and, foremost, its potential to spark discussions about directions, insights, and solutions on the topics mentioned above. Research papers, case studies, and position papers are all welcome.

In particular, we encourage authors to keep the following options in mind when preparing submissions:

Tentative Research Ideas: Presenting your research idea early one to get feedback and enable collaborations.
Works-In-Progress: To facilitate sharing of thought-provoking ideas and high-potential though preliminary research, authors are welcome to make submissions describing early-stage, in-progress, and/or exploratory work in order to elicit feedback, discover collaboration opportunities, and generally spark discussion.

Invited Speakers

Giuseppe Desoli

PhD, ST Company Fellow

Title: Revolutionizing Edge AI: Enabling Ultra-low-power and High-performance Inference with In-memory Computing Embedded NPUs

Abstract
The increasing demand for Edge AI has led to the development of complex cognitive applications on edge devices, where energy efficiency and compute density are crucial. While HW Neural Processing Units (NPUs) have already shown considerable benefits, the growing need for more complex algorithms demands significant improvements. To address the limitations of traditional Von Neumann architectures, novel designs based on computational memories are being developed by industry and academia. In this talk, we present STMicroelectronics' future directions in designing NPUs that integrate digital and analog In-Memory Computing (IMC) technology with high-efficiency dataflow inference engines capable of accelerating a wide range of Deep Neural Networks (DNNs). Our approach combines SRAM computational memory and phase change resistive memories, and we discuss the architectural considerations and purpose-designed compiler mapping algorithms required for practical industrial applications and some challenges we foresee in harnessing the potential of In-memory Computing going forward.

Bio
Giuseppe Desoli holds EE engineering master and PhD degrees from the University of Genoa. From 1995 to 2002 worked for Hewlett-Packard Laboratories in CAmbridge MA USA, developing microprocessors architectures, compilers, and tools. He’s one of the original architects of the ST200 family of VLIW embedded processors later integrated into most of ST’s multimedia products. In 2002 joined STMicroelectronics as an R&D Director and lead architect continuing to work on microprocessor architectures and pioneering multiprocessor systems for embedded SoCs for set-top boxes and home gateways. Since 2012 he has been the Chief Architect for the System Research & Application central R&D group responsible for developing HW AI architectures and tools for edge applications presently integrated into multiple ST products. From 2015 he pioneered the development and deployment of HW-accelerated AI in STMicroelectronics for advanced deep learning based applications. Presently he leads the SRA AI architecture team developing advanced AI HW digital IPs and tools supporting ST’s product groups; he is one of the proponents and coordinator of the STM advanced R&D corporate project for neuromorphic computing and he’s contributing to multiple initiatives of the Innovation Office such as the STM’s technology council, he is the chairman of STM’s company fellows scientific committee reporting to the corporation, and coordinates the STM AI Affinity team. He has co-authored more than 70 scientific publications. He holds more than 40 patents in the field of microprocessor architectures, AI HW acceleration, algorithms, compilers, and tools and has been coordinating multiple funded EU research projects.

John Kim

KAIST

Title: Domain-Specific Networks for Accelerated Computing

Abstract
Domain-specific architectures are hardware computing engine that is specialized for a particular application domain. As domain-specific architectures become widely used, the interconnection network can become the bottleneck for the system as the system scales. In this talk, I will present the role of domain-specific interconnection networks to enable scalable domain-specific architectures. In particular, I will present the impact of the physical/logical topology of the interconnection network on communication such as AllReduce in domain-specific systems. I will also discuss the opportunity of domain-specific interconnection networks and how they can be leveraged to optimize overall system performance and efficiency. As a case study, I will present the unique design of the Groq software-managed scale-out system and how it adopts architectures from high-performance computing to enable a domain-specific interconnection network.

Bio
John Kim is currently a professor in the School of Electrical Engineering at KAIST (Korea Advanced Institute of Science and Technology) in Daejeon, Korea. John Kim received his Ph.D. from Stanford University and B.S/M.Eng from Cornell University. His research interests include computer architecture, interconnection networks, security, and mobile systems. He has received a Google Faculty Research Award, Microsoft-Asia New Faculty Fellowship, and is listed in the Hall of Fame for ISCA, MICRO, and HPCA. He has also worked on the design of several microprocessors at Intel and at Motorola.

Adam Paszke

Google

Title: Pallas: A Multi-Platform High-Productivity Language for Accelerator Kernels

Abstract
Compute accelerators are the workhorses of modern scientific computing and machine learning workloads. But, their ever increasing performance also comes at a cost of increasing micro-architectural complexity. Worse, it happens at a speed that makes it hard for both compilers and low-level kernel authors to keep up. At the same time, the increased complexity makes it even harder for a wider audience to author high-performance software, leaving them almost entirely reliant on high-level libraries and compilers. In this talk I plan to introduce Pallas: a domain specific language embedded in Python and built on top of JAX. Pallas is highly inspired by the recent development and success of the Triton language and compiler, and aims to present users with a high-productivity programming environment that is a minimal extension over native JAX. For example, kernels can be implemented using the familiar JAX-NumPy language, while a single line of code can be sufficient to interface the kernel with a larger JAX program. Uniquely, Pallas kernels support a subset of JAX program transformations, making it possible to derive a number of interesting operators from a single implementation. Finally, based on our experiments, Pallas can be leveraged for high-performance code generation not only for GPUs, but also for other accelerator architectures such as Google’s TPUs.

Bio
Adam Paszke is a Research Scientist at Google DeepMind, based in Berlin. His main focus is exploring programming language design and methods for applications in scientific computing (and machine learning) applications. He’s involved in design and implementation of multiple software libraries and languages, including PyTorch, JAX, Dex and most recently Pallas. Before joining Google, he’s collaborated with Facebook and graduated from the University of Warsaw.

Ayse Coskun

Boston University

Title: ML-Powered Diagnosis of Performance Anomalies in Computer Systems

Abstract
Today’s large-scale computer systems that serve high performance computing and cloud face challenges in delivering predictable performance, while maintaining efficiency, resilience, and security. Much of computer system management has traditionally relied on (manual) expert analysis and policies that rely on heuristics derived based on such analysis. This talk will discuss a new path on designing ML-powered “automated analytics” methods for large-scale computer systems and how to make strides towards a longer term vision where computing systems are able to self-manage and improve. Specifically, the talk will first cover how to systematically diagnose root causes of performance “anomalies”, which cause substantial efficiency losses and higher cost. Second, it will discuss how to identify applications running on computing systems and discuss how such discoveries can help reduce vulnerabilities and avoid unwanted applications. The talk will also highlight how to apply ML in a practical and scalable way to help understand complex systems, demonstrate methods to help standardize study of performance anomalies, discuss explainability of applied ML methods in the context of computer systems, and point out future directions in automating computer system management.

Bio
Prof. Ayse K. Coskun is a full professor at Boston University (BU) at the Electrical and Computer Engineering Department, where she leads the Performance and Energy Aware Computing Laboratory (PeacLab) to solve problems towards making computer systems more intelligent and energy-efficient. Coskun is also the Director of the Center for Information and Systems Engineering (CISE). Coskun’s research interests intersect design automation, computer systems, and architecture. Her research outcomes are culminated in several technical awards, including the NSF CAREER Award, the IEEE CEDA Ernest Kuh Early Career Award, and an IBM Faculty Award. Coskun has been an avid collaborator of industry (including with IBM TJ Watson, Oracle, AMD, Intel, and others) and received several patents during her time at Sun Microsystems (now Oracle). Her research team has released several impactful open-source software artifacts and tools to the community. Coskun has also regularly participated in outreach programs at BU and founded a new forum called “Advancing Diversity in EDA” (DivEDA). She currently serves as the Deputy Editor-in-Chief of the IEEE Transactions on Computer Aided Design. Coskun received her PhD degree in Computer Engineering from University of California San Diego.

Program

Time (CET)	17th January 2024
10:00 – 10:10	Welcome
10:10 – 11:00	Keynote talk 1: Revolutionizing Edge AI: Enabling Ultra-low-power and High-performance Inference with In-memory Computing Embedded NPUs (Giuseppe Desoli, STMicroelectronics)
11:00 – 11:30	HiPEAC Coffee break
11:30 – 12:15	Invited talk 1: Pallas: A Multi-Platform High-Productivity Language for Accelerator Kernels (Adam Paszke, Google)
12:15 – 12:30	Paper talk 1: Fully Quantized Graph Convolutional Networks for Embedded Applications Habib Taha Kose, Jose Nunez-Yanez, Robert Piechocki and James Pope
12:30 – 12:45	Paper talk 2: Compressing the Backward Pass of Large-Scale Neural Architectures by Structured Activation Pruning Daniel Barley and Holger Fröning
12:45 – 13:0	Paper talk 3: Efficient and Mathematically Robust Operations for Certified Neural Networks Inference Fabien Geyer, Johannes Freitag, Tobias Schulz and Sascha Uhrig
13:00 – 14:00	HiPEAC Buffet lunch
14:00 – 14:50	Keynote talk 2: Domain-Specific Networks for Accelerated Computing (John Kim, KAIST)
14:50 – 15:05	Paper talk 4: FPGA-based Hardware Acceleration of Artificial Neural Network Inference Stefan Rothhaupt, Wolfgang Meyerle and Benjamin Kormann
15:05 – 15:20	Paper talk 5: An Approach Towards Distributed DNN Training on FPGA Clusters Philipp Kreowsky, Justin Knapheide and Benno Stabernack
15:30 – 16:00	HiPEAC Coffee break
16:00 – 16:45	Invited talk 2: ML-Powered Diagnosis of Performance Anomalies in Computer Systems (Ayse Coskun, Boston University)
16:45 – 17:00	Paper talk 6: Hyperdimensional Computing quantization using Thermometer Codes Caio Vieira, Jeronimo Castrillon and Antonio Carlos Schneider Beck
17:00 – 17:15	Paper talk 7: Applying maximum entropy principle on quantized neural networks correlates with high accuracy Lucas Maisonnave, Cyril Moineau, Olivier Bichler and Fabrice Rastello
17:15 – 17:30	Paper talk 8: Centered Kernel Alignment for Efficient Vision Transformer Quantization José Lucas De Melo Costa, Cyril Moineau, Thibault Allenet and Inna Kucher
17:30 – 17:35	Closing remarks

Accepted papers

An Approach Towards Distributed DNN Training on FPGA Clusters (Philipp Kreowsky, Justin Knapheide and Benno Stabernack)

Applying maximum entropy principle on quantized neural networks correlates with high accuracy (Lucas Maisonnave, Cyril Moineau, Olivier Bichler and Fabrice Rastello)

Centered Kernel Alignment for Efficient Vision Transformer Quantization (José Lucas De Melo Costa, Cyril Moineau, Thibault Allenet and Inna Kucher)

Compressing the Backward Pass of Large-Scale Neural Architectures by Structured Activation Pruning (Daniel Barley and Holger Fröning)

Differential Privacy Preservation for Graph-based Federated Learning under Malware Attacks (Mohamed Amjath and Shagufta Henna) Online presentation

Efficient and Mathematically Robust Operations for Certified Neural Networks Inference (Fabien Geyer, Johannes Freitag, Tobias Schulz and Sascha Uhrig)

FPGA-based Hardware Acceleration of Artificial Neural Network Inference (Stefan Rothhaupt, Wolfgang Meyerle and Benjamin Kormann)

Fully Quantized Graph Convolutional Networks for Embedded Applications (Habib Taha Kose, Jose Nunez-Yanez, Robert Piechocki and James Pope)

Hyperdimensional Computing quantization using Thermometer Codes (Caio Vieira, Jeronimo Castrillon and Antonio Carlos Schneider Beck)

Zero-Shot RTL Code Generation with Attention Sink Augmented Large Language Models (Selim Sandal and Ismail Akturkv ) Online presentation

HiPEAC 2024 workshop

17th January, 2024

Munich, Germany

Call For Contributions

Topics

Important Dates

Paper Format

Submission Site

Submission Options

Invited Speakers

Giuseppe Desoli

John Kim

Adam Paszke

Ayse Coskun

Program

Accepted papers

Organizers

José Cano (University of Glasgow)

Valentin Radu (University of Sheffield)

José L. Abellán (University of Murcia)

Marco Cornero (Google DeepMind)

Ulysse Beaugnon (Google DeepMind)

Juliana Franco (Google DeepMind)

Program Committee

José L. Abellán (University of Murcia)

Manuel E. Acacio (University of Murcia)

Sam Ainsworth (University of Edinburgh)

Ulysse Beaugnon (Google DeepMind)

José Cano (University of Glasgow)

Marco Cornero (Google DeepMind)

Juliana Franco (Google DeepMind)

David Gregg (Trinity College Dublin)

Jan Moritz Joseph (RWTH Aachen University)

Sushant Kondguli (Meta)

Paolo Meloni (University of Cagliari)

Ozcan Ozturk (Bilkent University)

Valentin Radu (University of Sheffield)

Yifan Sun (William & Mary)

Nicolas Vasilache (Google)

Stylianos Venieris (Samsung AI)

Zheng Wang (University of Leeds)

Oleksandr Zinenko (Google)

Contact