7th AccML Workshop at HiPEAC 2025

HiPEAC 2025 workshop

21st January, 2025

Barcelona, Spain

The remarkable performance of machine learning in a variety of application areas (e.g., natural language processing, computer vision, games) has led to the emergence of heterogeneous architectures to accelerate machine learning workloads. In parallel, production deployment and growing model complexity and diversity have pushed for higher-productivity systems, more powerful programming abstractions, software and system architectures, dedicated runtime systems and numerical libraries, and deployment and analysis tools. Deep learning models are generally memory- and compute-intensive, for both training and inference. Accelerating these operations has clear advantages: first, it reduces energy consumption (e.g., in data centers), and second, it makes these models usable on smaller devices at the edge of the Internet. In addition, while convolutional neural networks have motivated much of this effort, numerous applications and models involve a wider variety of operations, network architectures, and data processing. These applications and models continually challenge computer architecture, the system stack, and programming abstractions. The high level of interest in these areas calls for a dedicated forum to discuss emerging acceleration techniques and computation paradigms for machine learning algorithms, as well as the application of machine learning to the construction of such systems.

The workshop brings together researchers and practitioners working on computing systems for machine learning and on using machine learning to build better computing systems. It also reaches out to a wider community interested in this rapidly growing area, to raise awareness of existing efforts and to foster collaboration and the free exchange of ideas.

This builds on the success of our previous events:

Call For Contributions

Topics

Topics of interest include (but are not limited to):

Novel ML systems: heterogeneous multi/many-core systems, GPUs and FPGAs;
Software ML acceleration: languages, primitives, libraries, compilers and frameworks;
Novel ML hardware accelerators and associated software;
Emerging semiconductor technologies with applications to ML hardware acceleration;
ML for the construction and tuning of systems;
Cloud and edge ML computing: hardware and software to accelerate training and inference;
Computing systems research addressing the privacy and security of ML-dominated systems;
ML techniques for more efficient model training and inference (e.g., sparsity, pruning);
Generative AI and its impact on computational resources.

Important Dates

Submission deadline: ~~November 4~~ November 18, 2024
Notification to authors: ~~December 2~~ December 16, 2024

Paper Format

Papers should use the double-column IEEE format and be between 4 and 8 pages long, including references. Papers should be uploaded as PDF files and should not be anonymized.

Submission Site

Submissions can be made at https://easychair.org/my/conference?conf=7thaccml.

Submission Options

Papers will be reviewed by the workshop's technical program committee according to criteria regarding a submission's quality, relevance to the workshop's topics, and, foremost, its potential to spark discussions about directions, insights, and solutions on the topics mentioned above. Research papers, case studies, and position papers are all welcome.

In particular, we encourage authors to keep the following options in mind when preparing submissions:

Tentative Research Ideas: Presenting your research idea early on to get feedback and enable collaborations.
Works-In-Progress: To facilitate sharing of thought-provoking ideas and high-potential though preliminary research, authors are welcome to make submissions describing early-stage, in-progress, and/or exploratory work in order to elicit feedback, discover collaboration opportunities, and generally spark discussion.

Keynote Speaker

Alex Ramirez

Senior Staff Engineer at Google

Title: ML training scalability challenges

Bio
I am currently Senior Staff Engineer at Google, where I design, develop, and deliver high-performance accelerator systems like the YouTube Video (trans)Coding Unit (VCU), or the ML infrastructure powering Gemini. Before that, I was Principal Research Scientist at NVIDIA in the Architecture group, associate professor at UPC, and Research Manager at the Barcelona Supercomputing Center. I have a BSc ('95), MSc ('97) and PhD ('02, awarded the UPC extraordinary award to the best PhD in computer science) in Computer Science from the Universitat Politecnica de Catalunya, Barcelona, Spain. I was a summer student intern with Compaq's Western Research Laboratory in Palo Alto, California, for two consecutive years ('99-'00), and with Intel's Microprocessor Research Laboratory in Santa Clara ('01). I was awarded the first edition of the Agustin de Betancourt Award to a Young Researcher by the Spanish Royal Academy of Engineering in 2010. I have co-authored more than 150 papers in international refereed conferences and journals, and supervised 10 PhD students. My research interests include energy-efficient supercomputing, heterogeneous multicore architectures, hardware support for programming models, and simulation techniques.

Invited Speakers

Juan Gomez Luna

Senior Research Scientist at NVIDIA

Title: Accelerating Access to Data Storage and Services in the Age of AI

Abstract
Access to data storage and data services for GPUs has traditionally relied on the host CPU. While this approach might still be efficient for regular workloads and datasets that can be evenly partitioned, there are emerging applications (e.g., graph and data analytics, graph neural networks, recommender systems) that have a more irregular and data-dependent behavior. With the traditional approach, these workloads suffer from CPU-GPU synchronization, I/O traffic amplification, and long CPU latencies. In this talk, we will introduce GPU-initiated access to storage and services, which can efficiently support these emerging applications.

Bio
Juan Gómez Luna has been a senior research scientist at NVIDIA since 2023. He received the BS and MS degrees in Telecommunication Engineering from the University of Sevilla, Spain, in 2001, and the PhD degree in Computer Science from the University of Córdoba, Spain, in 2012. Between 2005 and 2017, he was a faculty member of the University of Córdoba. Between 2017 and 2023, he worked as a senior researcher and lecturer in Professor Onur Mutlu's SAFARI Research Group at ETH Zürich. His research interests focus on GPU and heterogeneous computing, memory and storage systems, processing-in-memory, and hardware and software acceleration of medical imaging and bioinformatics. He is the lead author of PrIM, the first publicly available benchmark suite for a real-world processing-in-memory architecture, and of Chai, a benchmark suite for heterogeneous systems with CPU/GPU/FPGA.

Rosa M. Badia

Research Manager at Barcelona Supercomputing Center

Title: Managing and optimizing the life-cycle of AI-HPC workflows

Abstract
While the TOP500 list reflects the yearly increase in the size of HPC systems, the computing paradigm is also being extended to an edge-to-cloud/HPC continuum. Furthermore, the user community is aware of the underlying performance and eager to leverage it by developing more complex application workflows. The trend in such applications is to combine traditional HPC modelling and simulation with data analytics and artificial intelligence. Our recent research has focused on easing the management and optimization of the whole life-cycle of these workflows, from development to deployment and operation. Our work is based on PyCOMPSs, a parallel task-based programming environment in Python. Using simple annotations, it can execute sequential Python programs in parallel on HPC clusters and other distributed infrastructures. PyCOMPSs has been extended to support tasks that invoke HPC applications and to combine them with artificial intelligence and data analytics frameworks. The talk will present this recent research and development, including examples of application workflows and how their performance or the accuracy of their results has been improved.

Bio
Rosa M. Badia holds a PhD in Computer Science (1994) from the Technical University of Catalonia (UPC). She is the manager of the Workflows and Distributed Computing research group at the Barcelona Supercomputing Center (BSC, Spain). Her research has contributed to parallel programming models for multicore and distributed computing. Her recent contributions have focused on the digital continuum, proposing new programming and software environments for the edge-to-cloud/HPC continuum. This research is integrated into PyCOMPSs/COMPSs, a parallel, task-based programming framework for distributed computing, and its application to developing large heterogeneous workflows that combine HPC, Big Data, and Machine Learning. The group also conducts research on dislib, a machine learning library parallelized with PyCOMPSs. Dr Badia has published nearly 200 papers on her research topics in international conferences and journals. She has been very active in projects funded by the European Commission and in contracts with industry. She has been the PI of the EuroHPC project eFlows4HPC. She is a member of the HiPEAC Network of Excellence. She received the Euro-Par Achievement Award 2019 for her contributions to parallel processing, the DonaTIC award (Academia/Researcher category) in 2019, and the HPDC Achievement Award 2021 for her innovations in parallel task-based programming models, workflow applications and systems, and leadership in the high performance computing research community. In 2023, she was invited to become a member of the Institut d'Estudis Catalans (Catalan academy).

Christos Bouganis

Professor of Intelligent Digital Systems, Imperial College London

Title: Deep Neural Networks in the Embedded Space: Opportunities and Challenges

Abstract
The talk will address the challenging task of designing FPGA-based hardware accelerators for Convolutional Neural Networks (CNNs), with a particular focus on the embedded space. It will highlight the opportunities offered by reconfigurable computing, specifically the advantages of design customization, and delve into the challenges faced when mapping large CNN models onto embedded FPGA devices.
I will share my research team’s efforts in tackling these challenges and provide an in-depth overview of our fpgaConvNet toolchain. This toolchain addresses key obstacles in the design process, enabling the generation of high-performance CNN accelerators that achieve state-of-the-art results.

Bio
Christos-Savvas Bouganis is a Professor of Intelligent Digital Systems in the Department of Electrical and Electronic Engineering, Imperial College London, U.K. He leads the iDSL group at Imperial College, which focuses on the theory and practice of reconfigurable computing and design automation, mainly targeting the domains of Machine Learning, Computer Vision, and Robotics.

Program

Time	21st January 2025
10:00 – 10:10	Welcome
10:10 – 11:00	Keynote talk: ML training scalability challenges (Álex Ramírez, Google)
11:00 – 11:30	Coffee break
11:30 – 12:10	Invited talk 1: Accelerating Access to Data Storage and Services in the Age of AI (Juan Gómez Luna, NVIDIA)
12:10 – 12:35	Paper talk: On Hardening DNNs against Noisy Computations by Quantization (presented by Xiao Wang, Heidelberg University) Xiao Wang, Hendrik Borras, Bernhard Klein and Holger Fröning
12:35 – 13:00	Paper talk: Large-Scale Evolutionary Optimization of Artificial Neural Networks Using Adaptive Mutations (presented by Rune Krauss, DFKI) Rune Krauss, Jan Zielasko and Rolf Drechsler
13:00 – 14:00	Lunch break
14:00 – 14:40	Invited talk 2: Managing and optimizing the life-cycle of AI-HPC workflows (Rosa M. Badia, Barcelona Supercomputing Center)
14:40 – 15:05	Paper talk: Exploring the Potential of Wireless-enabled Multi-Chip AI Accelerators (presented by Emmanuel Irabor, Universitat Politecnica de Catalunya) Emmanuel Irabor, Mariam Musavi, Abhijit Das and Sergi Abadal
15:05 – 15:30	Paper talk: Optimized XGBoost Architecture on FPGA for High-Performance and Seamless Deployment (presented by Rodrigo Olmos, Universidad Politecnica de Madrid) Rodrigo Olmos and Andres Otero
15:30 – 16:00	Coffee break
16:00 – 16:40	Invited talk 3: Deep Neural Networks in the Embedded Space: Opportunities and Challenges (Christos Bouganis, Imperial College London)
16:40 – 17:05	Paper talk: Multiclass Random Forest on FPGA using Conifer (presented by Kévin Druart, UBO/Lab-Sticc) Kévin Druart, David Espes, Catherine Dezan and Alain Deturche
17:05 – 17:30	Paper talk: Exploring the applicability of Graph Attention Networks in computer vision and their hardware acceleration (presented by Abdolvahab Khalili, Linköping University) Abdolvahab Khalili Sadaghiani and Jose Nunez-Yanez
17:30 – 17:35	Closing remarks

7th Workshop on Accelerated Machine Learning (AccML)

Co-located with the HiPEAC 2025 Conference

Organizers

José Cano (University of Glasgow)

Valentin Radu (University of Sheffield)

José L. Abellán (University of Murcia)

Marco Cornero (Google DeepMind)

Ulysse Beaugnon (Google DeepMind)

Juliana Franco (Google DeepMind)

Program Committee

José L. Abellán (Catholic University of Murcia)

Sam Ainsworth (University of Edinburgh)

Ulysse Beaugnon (Google DeepMind)

José Cano (University of Glasgow)

Marco Cornero (Google DeepMind)

Juliana Franco (Google DeepMind)

Sushant Kondguli (Meta)

Paolo Meloni (University of Cagliari)

Ozcan Ozturk (Sabancı University)

Valentin Radu (University of Sheffield)

Nishit Shah (Intel)

Zheng Wang (University of Leeds)

Lei Xun (University of Southampton)

Contact