8th Workshop on
Accelerated Machine Learning (AccML)

Co-located with the HiPEAC 2026 Conference


In this 8th AccML workshop, we aim to bring together researchers working in Machine Learning and System Architecture to discuss requirements, opportunities, challenges and next steps in developing novel approaches for machine learning systems.

HiPEAC 2026 workshop

27th January 2026

Kraków, Poland


In the last few years, the remarkable performance achieved in a variety of application areas (natural language processing, computer vision, games, etc.) has led to the emergence of heterogeneous architectures to accelerate machine learning workloads. In parallel, production deployment and growing model complexity and diversity have pushed for higher-productivity systems: more powerful programming abstractions, software and system architectures, dedicated runtime systems and numerical libraries, and deployment and analysis tools. Deep learning models are generally memory- and compute-intensive, for both training and inference. Accelerating these operations has obvious advantages: first, reducing energy consumption (e.g., in data centers), and second, making these models usable on smaller devices at the edge of the Internet. In addition, while Convolutional Neural Networks (CNNs) have motivated much of this effort, numerous applications and models (e.g., Vision Transformers, Large Language Models) involve a wider variety of operations, network architectures, and data processing. These applications and models continually challenge computer architecture, the system stack, and programming abstractions. The high level of interest in these areas calls for a dedicated forum to discuss emerging acceleration techniques and computation paradigms for machine learning algorithms, as well as the application of machine learning to the construction of such systems.

The workshop brings together researchers and practitioners working on computing systems for machine learning, and on using machine learning to build better computing systems. It also reaches out to the wider community interested in this rapidly growing area, to raise awareness of existing efforts and to foster collaboration and the free exchange of ideas.

This builds on the success of our previous events.

Call For Contributions


Topics

Topics of interest include (but are not limited to):

  • Novel ML/AI systems: heterogeneous multi/many-core systems, GPUs, ASICs and FPGAs;

  • Software ML/AI acceleration: languages, primitives, libraries, compilers and frameworks;

  • Novel ML/AI hardware accelerators and associated software;

  • Emerging semiconductor technologies with applications to ML/AI hardware acceleration;

  • ML/AI for the design and tuning of hardware, compilers, and systems;

  • Cloud and edge ML/AI computing: hardware and software to accelerate training and inference;

  • Hardware-Software co-design techniques for more efficient model training and inference (e.g., addressing sparsity, pruning, etc.);

  • Training and deployment of very large LLMs (such as GPT or Llama) or large GNNs;

  • Computing systems research addressing the privacy and security of ML/AI-dominated systems;

  • Generative AI and its impact on computational resources.

Important Dates

Submission deadline: December 5, 2025 (extended from November 21)
Notification to authors: December 17, 2025 (extended from December 5)

Paper Format

Papers should be in double-column IEEE format and between 4 and 8 pages long, including references. Papers should be uploaded as PDF and should not be anonymized.

Submission Site

Submissions can be made at https://easychair.org/my/conference?conf=8thaccml.

Submission Options

Papers will be reviewed by the workshop's technical program committee according to criteria regarding a submission's quality, relevance to the workshop's topics, and, foremost, its potential to spark discussions about directions, insights, and solutions on the topics mentioned above. Research papers, case studies, and position papers are all welcome.

In particular, we encourage authors to keep the following options in mind when preparing submissions:

  • Tentative Research Ideas: Present your research idea early on to get feedback and enable collaborations.

  • Works-In-Progress: To facilitate sharing of thought-provoking ideas and high-potential though preliminary research, authors are welcome to make submissions describing early-stage, in-progress, and/or exploratory work in order to elicit feedback, discover collaboration opportunities, and generally spark discussion.

Keynote Speaker


Cristina Silvano

Full Professor at Politecnico di Milano

Title: Energy-Efficient Accelerators for AI on the Edge

Abstract
Hardware accelerators play a central role in enabling artificial intelligence (AI) workloads on high-performance computing (HPC) systems and in data centers, providing the computational capability required to process large datasets and train increasingly complex models. At the same time, the growing demand to deploy AI directly on edge devices—such as embedded systems, mobile platforms, and Internet-of-Things (IoT) devices—has shifted attention toward highly energy-efficient hardware solutions. This talk surveys hardware accelerators across the full computing spectrum, from large-scale HPC systems to resource-constrained edge platforms, and discusses their impact on accelerating AI workloads through reduced execution time and improved energy efficiency. Particular emphasis is placed on techniques for mapping AI kernels onto energy-efficient Neural Processing Units (NPUs), as well as architectural approaches for tightly integrating Digital In-Memory Computing (DIMC) modules within NPUs to further enhance performance and energy efficiency.

Bio
Cristina Silvano is a Full Professor of Computer Architecture at Politecnico di Milano, where she holds the Chair of the Research Area on Computer Science and Engineering. In 2022, she promoted the M.Sc. degree in HPC Engineering at Politecnico di Milano, the first program of its kind in Italy, and was appointed to the National Technical Committee on Semiconductor Technologies by the Italian Ministry of University and Research. Currently, she leads the flagship project on Hardware Accelerators of the Italian National Research Center for High Performance Computing. She has been Scientific Coordinator of three European research projects (ANTAREX, 2PARMA and MULTICUBE). Her research activities are in the areas of computer architecture and EDA, with a focus on design space exploration of energy-efficient accelerators for deep neural networks and application autotuning for HPC. She has published more than 200 peer-reviewed papers and six books, and holds several patents developed in collaboration with Groupe Bull and STMicroelectronics. Since 2017, she has been an IEEE Fellow for her contributions to energy-efficient computer architectures.

Invited Speakers


Nicholas Fraser

AMD Research and Advanced Development (RAD), Dublin

Title: Achieving Highly Accurate, Heavily Quantized Neural Networks

Abstract
Quantization of neural networks (NNs) is a key aspect of achieving high-performance neural network deployments. However, given the diversity of neural network topologies and applications, it is often not obvious how best to approach quantization for a given problem. In this talk, we try to address this gap by decomposing the quantization process into 4 steps: 1) pre-quantization transformations, i.e., error mitigation before quantization; 2) the simulation of low-precision datatypes within neural network models; 3) post-quantization accuracy recovery, i.e., error recovery after quantization; and 4) model export for fast deployment. By decomposing the quantization process into these steps, we aim to equip model owners with the knowledge they need to understand what may apply to their own models. Finally, we explain how all of the above steps can be achieved through Brevitas, our PyTorch-based open-source neural network quantization library, designed for quantization research.
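To make step 2 concrete, below is a minimal sketch of simulated ("fake") quantization in plain PyTorch. It is illustrative only and does not use Brevitas's actual API; the symmetric signed range and the max-based scale are assumptions made for brevity (real libraries typically learn or calibrate the scale instead).

    import torch

    def fake_quantize(x: torch.Tensor, bit_width: int = 4) -> torch.Tensor:
        # Representable signed integer range, e.g. [-8, 7] for 4 bits.
        qmin = -(2 ** (bit_width - 1))
        qmax = 2 ** (bit_width - 1) - 1
        # Illustrative scale choice: map the largest magnitude onto qmax.
        scale = x.abs().max().clamp(min=1e-8) / qmax
        # Quantize (round and saturate), then dequantize back to float.
        q = torch.clamp(torch.round(x / scale), qmin, qmax)
        return q * scale

    w = torch.randn(64, 64)
    w_q = fake_quantize(w, bit_width=4)  # w_q takes at most 2**4 = 16 distinct values

Because the output stays in floating point, such a function can be dropped into an ordinary forward pass to estimate the accuracy impact of a low-precision datatype before committing to a deployment format.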

Bio
Nicholas J. Fraser received his PhD from The University of Sydney, Australia, in 2020. Currently, he is a Research Scientist at AMD Research and Advanced Development (RAD), Dublin, Ireland, where he has conducted neural network quantization research for roughly 10 years and is the project lead for the development of Brevitas, a PyTorch-based open-source neural network quantizer. His main research interests include neural network quantization, software/hardware co-design of neural network topologies and accelerators, and audio signal processing.

Heiko Joerg Schick

Huawei Technologies

Title: TBA

Abstract

TBA

Bio
TBA

Jacques Pienaar

Google

Title: Breaking the Silos: How MLIR and CIRCT are Democratizing the AI Hardware Stack

Abstract

The traditional EDA landscape is characterized by semantic gaps between high-level algorithmic intent and physical implementation. As the industry pivots toward specialized AI accelerators, this disconnected workflow has become a critical bottleneck. Furthermore, hardware companies today face a "matrix of constraints": creating a competitive AI accelerator requires not just silicon, but a software stack that can keep pace with rapidly evolving research. Historically, this has resulted in fragmented, brittle toolchains that fail to unlock the hardware's potential. MLIR provides the escape velocity needed to break this cycle. This talk traces the evolution of MLIR from a software compiler framework to the backbone of an open hardware ecosystem. We will discuss the role of CIRCT in standardizing hardware description, the emergence of intermediate languages like Calyx and Allo for accelerator generation, and the importance of unified infrastructure for interoperability. By aligning on shared Intermediate Representations, the industry is moving from ad-hoc optimization to a converged methodology where software compilers and hardware generators speak the same language. The open development process, shared infrastructure, and interoperability through MLIR create an environment that matches hardware and software development velocities, enabling a new era for heterogeneous computing.

Bio
Jacques Pienaar serves as a Senior Staff Software Engineer at Google, where he has dedicated over ten years to architecting infrastructure for modern machine learning. In his current capacity as a lead within the ML Compilers and Systems Research team, Dr. Pienaar focuses on the acceleration and simplification of high-performance model deployment across distributed, heterogeneous compute platforms. As a co-founder of Multi-Level Intermediate Representation (MLIR) and a foundational engineer for XLA, Dr. Pienaar has contributed to the development of industry-standard technologies for optimizing machine learning workloads. His professional background includes leading the TensorFlow graph optimization compiler initiative and extensive experience in developing optimization targets for a diverse range of environments, including custom hardware, data centers, and commodity edge devices.

Program


27th January 2026
10:00 – 10:10 Welcome
10:10 – 11:00 Keynote talk: Energy-Efficient Accelerators for AI on the Edge (Cristina Silvano, Politecnico di Milano)
11:00 – 11:30 Coffee break
11:30 – 12:10 Invited talk 1: Achieving Highly Accurate, Heavily Quantized Neural Networks (Nicholas Fraser, AMD)
12:10 – 12:25 Paper talk 1: Uncertainty-Preserving QBNNs: Multi-Level Quantization of SVI-Based Bayesian Neural Networks for Image Classification (presented by Hendrik Borras, Heidelberg University)
Hendrik Borras, Yong Wu, Bernhard Klein and Holger Fröning
12:25 – 12:40 Paper talk 2: Quantization-aware training: a tradeoff between training and fine-tuning for domain-specific language models (presented by Xavier Pillet, LS2N - Nantes Université)
Xavier Pillet, Cédric Gernigon, Anastasia Volkova, Richard Dufour, Adeline Granet and Nicolas Greffard
12:40 – 12:55 Paper talk 3: Reducing the Hardware Gap for Custom Accelerators through Quantization Aware Training (presented by Bastien Barbe, INSA Lyon)
Bastien Barbe, Romain Bouarah, Florent de Dinechin and Anastasia Volkova
13:00 – 14:00 Lunch break
14:00 – 14:40 Invited talk 2: TBA (Heiko Schick, Huawei Technologies)
14:40 – 14:55 Paper talk 4: In-Pipeline Integration of Digital In-Memory-Computing into RISC-V Vector Architecture to Accelerate Deep Learning (presented by Tommaso Spagnolo, Politecnico di Milano)
Tommaso Spagnolo, Cristina Silvano, Riccardo Massa, Filippo Grillotti, Thomas Boesch and Giuseppe Desoli
14:55 – 15:10 Paper talk 5: ML-Guided Conflict-Aware Scheduling for FPGA-Based Acceleration in the Cloud-Edge Continuum (presented by Juan Encinas, Universidad Politécnica de Madrid)
Juan Encinas, Alfonso Rodriguez and Andrés Otero
15:10 – 15:25 Paper talk 6: SlimSwin: Gradient-Based Window-Level Head Pruning for Efficient Vision Transformers (presented by Emir Mehmet Eryilmaz, Ozyegin University)
Emir Mehmet Eryilmaz and Ismail Akturk
15:30 – 16:00 Coffee break
16:00 – 16:40 Invited talk 3: Breaking the Silos: How MLIR and CIRCT are Democratizing the AI Hardware Stack (Jacques Pienaar, Google)
16:40 – 16:55 Paper talk 7: From PyTorch to Calyx: An Open-Source Compiler Toolchain for ML Accelerators (presented by Evan Williams, Cornell University)
Jiahan Xie, Evan Williams and Adrian Sampson
16:55 – 17:10 Paper talk 8: Compiler-Assisted Speculative Sampling for Accelerated LLM Inference on Heterogeneous Edge Devices (presented by Alejandro Ruiz Y Mesa, Dresden University of Technology)
Alejandro Ruiz Y Mesa, Guilherme Korol, Moritz Riesterer, João C. de Lima and Jeronimo Castrillon
17:10 – 17:25 Paper talk 9: Depth-First Fusion and Tiling for CNN Memory Footprint Reduction in TVM (presented by Arthur Viens, C.R.I. - Mines Paris - PSL and Safran Electronics and Defense)
Arthur Viens, Corinne Ancourt and Jean-Louis Dufour
17:25 – 17:30 Closing remarks

Organizers


José Cano (University of Glasgow)

José L. Abellán (University of Murcia)

Valentin Radu (University of Sheffield)

Marco Cornero (Google DeepMind)

Ulysse Beaugnon (Google DeepMind)

Juliana Franco (Google DeepMind)


Program Committee


José L. Abellán (University of Murcia)

Manuel E. Acacio (University of Murcia)

Sam Ainsworth (University of Edinburgh)

Ulysse Beaugnon (Google DeepMind)

José Cano (University of Glasgow)

Adrián Castelló (Universitat Politècnica de València)

Marco Cornero (Google DeepMind)

Sachchidanand Deo (Google)

Juliana Franco (Google DeepMind)

Jan Moritz Joseph (RWTH Aachen University)

Sushant Kondguli (Meta)

Paolo Meloni (University of Cagliari)

Ozcan Ozturk (Sabancı University)

Valentin Radu (University of Sheffield)

Yifan Sun (William & Mary)

Stylianos Venieris (Samsung AI Center, Cambridge)

Lei Xun (Imperial College London)

Contact


If you have any questions, please feel free to send an email to 8th-accml@easychair.org.