In this 7th AccML workshop, we aim to bring together researchers working in Machine Learning and System Architecture to discuss requirements, opportunities, challenges and next steps in developing novel approaches for machine learning systems.
The remarkable performance achieved in a variety of application areas (natural language processing, computer vision, games, etc.) has led to the emergence of heterogeneous architectures to accelerate machine learning workloads. In parallel, production deployment, model complexity, and model diversity have pushed for higher-productivity systems, more powerful programming abstractions, software and system architectures, dedicated runtime systems and numerical libraries, and deployment and analysis tools. Deep learning models are generally memory- and compute-intensive, for both training and inference. Accelerating these operations has obvious advantages, first by reducing energy consumption (e.g. in data centers), and second by making these models usable on smaller devices at the edge of the Internet. In addition, while convolutional neural networks have motivated much of this effort, numerous applications and models involve a wider variety of operations, network architectures, and data processing. These applications and models continually challenge computer architecture, the system stack, and programming abstractions. The high level of interest in these areas calls for a dedicated forum to discuss emerging acceleration techniques and computation paradigms for machine learning algorithms, as well as the application of machine learning to the construction of such systems.
The workshop brings together researchers and practitioners working on computing systems for machine learning, and using machine learning to build better computing systems. It also reaches out to a wider community interested in this rapidly growing area, to raise awareness of the existing efforts, to foster collaboration and the free exchange of ideas.
This workshop builds on the success of our previous events.
Topics of interest include (but are not limited to):
Novel ML systems: heterogeneous multi/many-core systems, GPUs and FPGAs;
Software ML acceleration: languages, primitives, libraries, compilers and frameworks;
Novel ML hardware accelerators and associated software;
Emerging semiconductor technologies with applications to ML hardware acceleration;
ML for the construction and tuning of systems;
Cloud and edge ML computing: hardware and software to accelerate training and inference;
Computing systems research addressing the privacy and security of ML-dominated systems;
ML techniques for more efficient model training and inference (e.g. sparsity, pruning);
Generative AI and its impact on computational resources.
Submission deadline (extended): November 18, 2024
Notification to authors (extended): December 16, 2024
Papers should use the double-column IEEE format and be between 4 and 8 pages long, including references. Papers should be uploaded as PDFs and should not be anonymized.
Submissions can be made at https://easychair.org/my/conference?conf=7thaccml.
Papers will be reviewed by the workshop's technical program committee according to criteria regarding a submission's quality, relevance to the workshop's topics, and, foremost, its potential to spark discussions about directions, insights, and solutions on the topics mentioned above. Research papers, case studies, and position papers are all welcome.
In particular, we encourage authors to keep the following options in mind when preparing submissions:
Tentative Research Ideas: Presenting your research idea early on to get feedback and enable collaborations.
Works-In-Progress: To facilitate sharing of thought-provoking ideas and high-potential though preliminary research, authors are welcome to make submissions describing early-stage, in-progress, and/or exploratory work in order to elicit feedback, discover collaboration opportunities, and generally spark discussion.
Álex Ramírez, Senior Staff Engineer at Google
Title: ML training scalability challenges
Bio
I am currently a Senior Staff Engineer at Google, where I design, develop, and deliver high-performance accelerator systems like the YouTube Video (trans)Coding Unit (VCU) and the ML infrastructure powering Gemini. Before that, I was a Principal Research Scientist in NVIDIA's Architecture group, an associate professor at UPC, and a Research Manager at the Barcelona Supercomputing Center.
I hold a BSc ('95), MSc ('97), and PhD ('02, awarded the UPC extraordinary award for the best PhD in computer science) in Computer Science from the Universitat Politecnica de Catalunya, Barcelona, Spain. I was a summer student intern at Compaq's Western Research Laboratory in Palo Alto, California, for two consecutive years ('99–'00), and at Intel's Microprocessor Research Laboratory in Santa Clara ('01). In 2010, I received the first edition of the Agustín de Betancourt Award for a Young Researcher from the Spanish Royal Academy of Engineering.
I have co-authored more than 150 papers in international refereed conferences and journals and supervised 10 PhD students. My research interests include energy-efficient supercomputing, heterogeneous multicore architectures, hardware support for programming models, and simulation techniques.
Juan Gómez Luna, Senior Research Scientist at NVIDIA
Title: Accelerating Access to Data Storage and Services in the Age of AI
Abstract
Access to data storage and data services for GPUs has traditionally relied on the host CPU. While this approach might still be efficient for regular workloads and datasets that can be evenly partitioned, there are emerging applications (e.g., graph and data analytics, graph neural networks, recommender systems) that have a more irregular and data-dependent behavior. With the traditional approach, these workloads suffer from CPU-GPU synchronization, I/O traffic amplification, and long CPU latencies. In this talk, we will introduce GPU-initiated access to storage and services, which can efficiently support these emerging applications.
Bio
Juan Gómez Luna has been a senior research scientist at NVIDIA since 2023. He received the BS and MS degrees in Telecommunication Engineering from the University of Sevilla, Spain, in 2001, and the PhD degree in Computer Science from the University of Córdoba, Spain, in 2012. Between 2005 and 2017, he was a faculty member of the University of Córdoba. Between 2017 and 2023, he worked as a senior researcher and lecturer in Professor Onur Mutlu's SAFARI Research Group at ETH Zürich. His research interests focus on GPU and heterogeneous computing, memory and storage systems, processing-in-memory, and hardware and software acceleration of medical imaging and bioinformatics. He is the lead author of PrIM, the first publicly available benchmark suite for a real-world processing-in-memory architecture, and Chai, a benchmark suite for heterogeneous systems with CPU/GPU/FPGA.
Rosa M. Badia, Research Manager, Barcelona Supercomputing Center
Title: Managing and optimizing the life-cycle of AI-HPC workflows
Abstract
While the TOP500 list reflects the yearly growth in the size of HPC systems, the computing paradigm is also being extended to consider an edge-to-cloud/HPC continuum.
Furthermore, the user community is aware of the underlying performance and eager to leverage it by building more complex application workflows. The trend in such applications is to combine traditional HPC modelling and simulation with data analytics and artificial intelligence.
Our recent research has focused on simplifying the management and optimization of the whole life-cycle of these workflows, from development to deployment and operation. Our work is based on PyCOMPSs, a parallel task-based programming environment in Python. Based on simple annotations, it can execute sequential Python programs in parallel on HPC clusters and other distributed infrastructures.
PyCOMPSs has been extended to support tasks that invoke HPC applications and combine them with Artificial Intelligence and Data analytics frameworks.
The talk will present this recent research and development, including examples of application workflows and how their performance or accuracy of results has been improved.
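The annotation-based model described above can be sketched with the Python standard library alone. The following is a minimal local analogue of the pattern, not PyCOMPSs itself (which schedules tasks across cluster nodes); the names `task` and `wait_on` are illustrative stand-ins for PyCOMPSs' `@task` annotation and `compss_wait_on` synchronization call:

```python
from concurrent.futures import ThreadPoolExecutor

_pool = ThreadPoolExecutor()

def task(fn):
    """Stand-in for PyCOMPSs' @task annotation: calling the
    decorated function submits it for asynchronous execution
    and immediately returns a future instead of a value."""
    def submit(*args, **kwargs):
        return _pool.submit(fn, *args, **kwargs)
    return submit

def wait_on(future):
    """Stand-in for PyCOMPSs' compss_wait_on: block until the
    task's result is available."""
    return future.result()

@task
def square(x):
    return x * x

# The program reads sequentially: each call returns at once,
# the tasks run in parallel, and we synchronize only at the end.
futures = [square(i) for i in range(8)]
results = [wait_on(f) for f in futures]
print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

In PyCOMPSs the same structure is expressed with its own decorators and runtime, and the tasks are distributed over the nodes of a cluster rather than over local threads.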
Bio
Rosa M. Badia holds a PhD in Computer Science (1994) from the Technical University of Catalonia (UPC). She is the manager of the Workflows and Distributed Computing research group at the Barcelona Supercomputing Center (BSC, Spain). Her research has contributed to parallel programming models for multicore and distributed computing. Recent contributions have focused on the digital continuum, proposing new programming environments and software environments for edge-to-cloud/HPC. This research is integrated in PyCOMPSs/COMPSs, a parallel task-based distributed computing framework, and applied to developing large heterogeneous workflows that combine HPC, Big Data, and Machine Learning. The group is also doing research on dislib, a machine learning library parallelized with PyCOMPSs. Dr Badia has published nearly 200 papers on her research topics in international conferences and journals. She has been very active in projects funded by the European Commission and in contracts with industry. She has been the PI of the EuroHPC project eFlows4HPC.
She is a member of the HiPEAC Network of Excellence. She received the Euro-Par Achievement Award 2019 for her contributions to parallel processing, the DonaTIC award (category Academia/Researcher) in 2019, and the HPDC Achievement Award 2021 for her innovations in parallel task-based programming models, workflow applications and systems, and her leadership in the high-performance computing research community. In 2023, she was invited to become a member of the Institut d'Estudis Catalans (the Catalan academy).
Christos-Savvas Bouganis, Professor of Intelligent Digital Systems, Imperial College London
Title: Deep Neural Networks in the Embedded Space: Opportunities and Challenges
Abstract
The talk will address the challenging task of designing FPGA-based hardware accelerators for Convolutional Neural Networks (CNNs), with a particular focus on the embedded space. It will highlight the opportunities offered by reconfigurable computing, specifically the advantages of design customization, and delve into the challenges faced when mapping large CNN models onto embedded FPGA devices.
I will share my research team's efforts in tackling these challenges and provide an in-depth overview of our fpgaConvNet toolchain. This toolchain addresses key obstacles in the design process, enabling the generation of high-performance CNN accelerators that achieve state-of-the-art results.
Bio
Christos-Savvas Bouganis is a Professor of Intelligent Digital Systems in the Department of Electrical and Electronic Engineering, Imperial College London, U.K. He leads the iDSL group at Imperial College, with a focus on the theory and practice of reconfigurable computing and design automation, mainly targeting the domains of Machine Learning, Computer Vision, and Robotics.
Time | 21st January 2025 |
---|---|
10:00 – 10:10 | Welcome |
10:10 – 11:00 | Keynote talk: ML training scalability challenges (Álex Ramírez, Google) |
11:00 – 11:30 | Coffee break |
11:30 – 12:10 | Invited talk 1: Accelerating Access to Data Storage and Services in the Age of AI (Juan Gómez Luna, NVIDIA) |
12:10 – 12:35 | Paper talk: On Hardening DNNs against Noisy Computations by Quantization (presented by Xiao Wang, Heidelberg University) Xiao Wang, Hendrik Borras, Bernhard Klein and Holger Fröning |
12:35 – 13:00 | Paper talk: Large-Scale Evolutionary Optimization of Artificial Neural Networks Using Adaptive Mutations (presented by Rune Krauss, DFKI) Rune Krauss, Jan Zielasko and Rolf Drechsler |
13:00 – 14:00 | Lunch break |
14:00 – 14:40 | Invited talk 2: Managing and optimizing the life-cycle of AI-HPC workflows (Rosa M. Badia, Barcelona Supercomputing Center) |
14:40 – 15:05 | Paper talk: Exploring the Potential of Wireless-Enabled Multi-Chip AI Accelerators with Gemini (presented by Emmanuel Irabor, Universitat Politecnica de Catalunya) Emmanuel Irabor, Mariam Musavi, Abhijit Das and Sergi Abadal |
15:05 – 15:30 | Paper talk: Optimized XGBoost Architecture on FPGA for High-Performance and Seamless Deployment (presented by Rodrigo Olmos, Universidad Politecnica de Madrid) Rodrigo Olmos and Andres Otero |
15:30 – 16:00 | Coffee break |
16:00 – 16:40 | Invited talk 3: Deep Neural Networks in the Embedded Space: Opportunities and Challenges (Christos Bouganis, Imperial College London) |
16:40 – 17:05 | Paper talk: Multiclass Random Forest on FPGA using Conifer (presented by Kévin Druart, UBO/Lab-Sticc) Kévin Druart, David Espes, Catherine Dezan and Alain Deturche |
17:05 – 17:30 | Paper talk: Exploring the applicability of Graph Attention Networks in computer vision and their hardware acceleration (presented by Abdolvahab Khalili, Linköping University) Abdolvahab Khalili Sadaghiani and Jose Nunez-Yanez |
17:30 – 17:35 | Closing remarks |