New framework reduces memory usage and boosts energy efficiency for large-scale AI graph analysis

June 23, 2025

Edited by Gaby Clark, scientific editor, and Robert Egan, associate editor.

Editors' notes: This article has been reviewed according to Science X's editorial process and policies. Editors have highlighted the following attributes while ensuring the content's credibility: fact-checked, trusted source, proofread.

Figure: Real-time, large-scale graph neural network inference through BingoCGN. BingoCGN employs cross-partition message quantization to summarize inter-partition message flow, which eliminates the need for irregular off-chip memory access, and uses a fine-grained structured training algorithm based on strong lottery ticket theory to improve computational efficiency. Credit: Institute of Science Tokyo, Japan

Researchers at the Institute of Science Tokyo, Japan, have developed BingoCGN, a scalable and efficient graph neural network accelerator that enables real-time inference on large-scale graphs through graph partitioning. The framework combines an innovative cross-partition message quantization technique with a novel training algorithm to significantly reduce memory demands and increase computational and energy efficiency.

Graph neural networks (GNNs) are powerful artificial intelligence (AI) models designed for analyzing complex, unstructured graph data. In such data, entities are represented as nodes and the relationships between them as edges. GNNs have been successfully employed in many real-world applications, including social networks, drug discovery, autonomous driving, and recommendation systems. Despite their potential, achieving real-time, large-scale GNN inference, which is critical for tasks like autonomous driving, remains challenging.
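
To make the data model concrete, the minimal Python sketch below implements one generic message-passing layer, where each node averages its neighbors' features and applies a learned transform. The mean aggregation and ReLU here are common textbook choices, not details of BingoCGN:

```python
import numpy as np

def gnn_layer(node_feats, edges, weight):
    """node_feats: (N, F); edges: iterable of (src, dst); weight: (F, F_out)."""
    agg = np.zeros_like(node_feats)
    deg = np.zeros(node_feats.shape[0])
    for src, dst in edges:
        agg[dst] += node_feats[src]      # message travels along the edge
        deg[dst] += 1
    agg[deg > 0] /= deg[deg > 0, None]   # mean over incoming messages
    return np.maximum(agg @ weight, 0)   # learned transform + ReLU

feats = np.random.rand(3, 4)             # toy graph: 3 nodes, 4 features each
w = np.random.rand(4, 2)
out = gnn_layer(feats, [(0, 1), (1, 2), (2, 0)], w)
print(out.shape)                          # (3, 2)
```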

Large graphs require extensive memory, often overflowing on-chip buffers, which are memory regions integrated into a chip. This forces the system to rely on slower off-chip memory. Since graph data is stored irregularly, this leads to irregular memory access patterns, degrading computational efficiency and increasing energy consumption.

One promising solution is graph partitioning, where large graphs are divided into smaller graphs, each assigned its own on-chip buffer. This results in more localized memory access patterns and smaller buffer size requirements as the number of partitions increases.

However, partitioning alone is only partially effective. As the number of partitions grows, the number of inter-partition edges grows substantially, and the communication across these edges requires increased off-chip memory access, limiting scalability.
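
The effect is easy to reproduce. In the toy Python sketch below, a round-robin assignment stands in for a real partitioner; as the partition count k grows, the share of edges crossing partition boundaries, and with it the off-chip traffic, grows as well:

```python
# Illustrative sketch (not the paper's partitioner): split a random graph
# into k partitions and count edges that cross partition boundaries.
import random

def count_cross_edges(num_nodes, edges, k):
    part = {v: v % k for v in range(num_nodes)}  # toy round-robin partition
    return sum(1 for u, v in edges if part[u] != part[v])

random.seed(0)
nodes = 1000
edges = [(random.randrange(nodes), random.randrange(nodes)) for _ in range(5000)]
for k in (2, 8, 32, 128):
    cross = count_cross_edges(nodes, edges, k)
    print(f"k={k:4d}: {cross}/{len(edges)} edges cross partitions")
```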

To address this issue, a research team led by Associate Professor Daichi Fujiki from the Institute of Science Tokyo, Japan, developed a novel, scalable, and efficient GNN accelerator called BingoCGN. "BingoCGN employs a new technique called cross-partition message quantization (CMQ) that summarizes inter-partition message flow, eliminating irregular off-chip memory access, and a new training algorithm that significantly boosts computational efficiency," explains Fujiki. The findings will be presented at the 52nd Annual International Symposium on Computer Architecture (ISCA '25), held June 21–25, 2025.

CMQ uses a technique called vector quantization, which clusters inter-partition nodes and represents them using points called centroids. Nodes are clustered based on their distance, with each node assigned to its nearest centroid. For a given partition, these centroids replace the inter-partition nodes, effectively compressing node data. The centroids are stored in tables called codebooks, which reside directly in the on-chip buffer.
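
The Python sketch below illustrates this quantization step, with plain k-means standing in for the paper's actual codebook construction (the article does not specify it). Boundary-node features are replaced by their nearest centroid, so a partition only needs the small codebook on-chip rather than every remote node:

```python
import numpy as np

def build_codebook(boundary_feats, num_centroids, iters=10):
    """Plain k-means: returns (codebook, per-node centroid assignments)."""
    rng = np.random.default_rng(0)
    codebook = boundary_feats[rng.choice(len(boundary_feats), num_centroids,
                                         replace=False)]
    for _ in range(iters):
        # Assign each node to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(boundary_feats[:, None] - codebook[None], axis=-1)
        assign = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned nodes.
        for c in range(num_centroids):
            members = boundary_feats[assign == c]
            if len(members):
                codebook[c] = members.mean(axis=0)
    return codebook, assign

feats = np.random.rand(256, 16).astype(np.float32)  # 256 inter-partition nodes
codebook, assign = build_codebook(feats, num_centroids=8)
compressed = codebook[assign]          # centroid stands in for each remote node
print(codebook.shape, compressed.shape)  # (8, 16) codebook replaces (256, 16)
```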

CMQ therefore allows inter-partition communication without the need for irregular and costly off-chip memory access. Because it requires frequent reading and writing of nodes and centroids, CMQ organizes its codebooks in a hierarchical, tree-like structure with parent and child centroids, reducing computational demands while maintaining accuracy.
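
One plausible reading of this tree structure, sketched below, is a two-level parent/child lookup; the shapes, search order, and two-level depth are assumptions for illustration, not BingoCGN's documented design. With P parents of C children each, a query costs P + C distance comparisons instead of P * C for a flat search:

```python
import numpy as np

def hierarchical_lookup(query, parents, children):
    """parents: (P, F); children: (P, C, F). Returns the nearest child centroid."""
    p = np.linalg.norm(parents - query, axis=1).argmin()      # P comparisons
    c = np.linalg.norm(children[p] - query, axis=1).argmin()  # C comparisons
    return children[p, c]

parents = np.random.rand(4, 16)       # hypothetical 4 parent centroids
children = np.random.rand(4, 8, 16)   # 8 children per parent
nearest = hierarchical_lookup(np.random.rand(16), parents, children)
print(nearest.shape)                  # (16,): one centroid vector
```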

While CMQ solves the memory bottleneck, it shifts the burden to computation. To counter this, the researchers developed a novel training algorithm based on strong lottery ticket theory. In this method, the GNN is initialized with random weights, generated on-chip using random number generators.

Then, unnecessary weights are pruned using a mask, forming a smaller, sparser sub-network that has accuracy comparable to the full GNN but is significantly cheaper to compute. Further, the method incorporates fine-grained (FG) structured pruning, which uses multiple masks with different levels of sparsity to construct an even smaller and more efficient sub-network.
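
A minimal sketch of the strong-lottery-ticket idea follows. The seeded NumPy generator stands in for the on-chip random number generators, and a single fixed-sparsity mask is a simplification of the fine-grained multi-mask scheme described above:

```python
import numpy as np

def slt_linear(x, seed, shape, mask):
    """Regenerate random weights from `seed` on the fly, keep only the
    masked entries, and apply the resulting sparse layer. Weights are
    never stored: only the seed and the learned binary mask are kept."""
    rng = np.random.default_rng(seed)      # stands in for an on-chip RNG
    weights = rng.standard_normal(shape)   # random, untrained weights
    return x @ (weights * mask)            # mask prunes unneeded weights

x = np.random.rand(2, 16)
mask = (np.random.rand(16, 8) < 0.25).astype(float)  # ~75% of weights pruned
y = slt_linear(x, seed=42, shape=(16, 8), mask=mask)
print(y.shape)  # (2, 8): same output size, far fewer effective weights
```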

"Through these techniques, BingoCGN achieves high-performance GNN inference even on finely partitioned graph data, which was previously considered difficult," remarks Fujiki. "Our hardware implementation, tested on seven real-world datasets, achieves up to 65-fold speedup and up-to 107-fold increase in energy-efficiency compared to state-of-the-art accelerator FlowGNN."

This breakthrough opens the door to real-time processing of large-scale graph data, paving the way for diverse real-world applications of GNNs.

More information: Jiale Yan et al, BingoGCN: Towards Scalable and Efficient GNN Acceleration with Fine-Grained Partitioning and SLT, Proceedings of the 52nd Annual International Symposium on Computer Architecture (2025). DOI: 10.1145/3695053.3731115

Provided by Institute of Science Tokyo

Citation: New framework reduces memory usage and boosts energy efficiency for large-scale AI graph analysis (2025, June 23), retrieved 23 June 2025 from https://techxplore.com/news/2025-06-framework-memory-usage-boosts-energy.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without written permission. The content is provided for information purposes only.
