Retraining AI to fortify itself against rogue rewiring even after key layers are removed

September 5, 2025

Stephanie Baum, scientific editor; Robert Egan, associate editor

Editors' notes: This article has been reviewed according to Science X's editorial process and policies. Editors have highlighted the following attributes while ensuring the content's credibility: fact-checked, preprint, trusted source, proofread.
Researchers fortify AI against rogue rewiring
(A) We investigate early exits from different image encoder layers and find that VLM safety alignment varies, leading to what we term Image Encoder Early Exit (ICET) vulnerability. We propose Layer-wise Clip-PPO (L-PPO) to alleviate ICET. (B) With the same input (image and prompt), choosing different image encoder layers significantly affects the safety of the output response. (C) Safety training is applied with the model’s default settings and architecture, but limited generalization creates vulnerabilities, leaving parts of the embedding space uncovered when architectural changes occur (e.g., using a different intermediate layer embedding than during training). Credit: arXiv (2024). DOI: 10.48550/arxiv.2411.04291

As generative AI models move from massive cloud servers to phones and cars, they're stripped down to save power. But what gets trimmed can include the technology that stops them from spewing hate speech or offering roadmaps for criminal activity.

To counter this threat, researchers at the University of California, Riverside, have developed a method to preserve AI safeguards even when open-source AI models are stripped down to run on lower-power devices. Their work is published on the arXiv preprint server.

Unlike proprietary AI systems, open‑source models can be downloaded, modified, and run offline by anyone. Their accessibility promotes innovation and transparency but also creates challenges when it comes to oversight. Without the cloud infrastructure and constant monitoring available to closed systems, these models are vulnerable to misuse.

The UCR researchers focused on a key issue: carefully designed safety features erode when open-source AI models are reduced in size. Lower-power deployments often skip internal processing layers to conserve memory and computation. Dropping layers improves the models' speed and efficiency, but it can also degrade safety, yielding answers that contain pornography or detailed instructions for making weapons.
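The failure mode can be pictured with a toy model. Nothing below is the researchers' code: the "layers," the 0.9 threshold, and the `safety_head` function are invented for illustration. The point is only that a filter calibrated on full-depth features can miss the features produced by an early exit.

```python
# Toy illustration of early exit from a multi-layer encoder.
# All numbers and functions here are invented for the sketch; real
# vision-language models use transformer layers and learned heads.

def make_layers(n):
    # Each "layer" pulls the feature halfway toward 1.0, standing in
    # for how deeper layers progressively shape the embedding that
    # safety alignment was trained on.
    return [lambda x: x + (1.0 - x) * 0.5 for _ in range(n)]

def encode(x, layers, exit_at=None):
    # exit_at=None runs the full stack; an int exits early after
    # that many layers, as a slimmed-down deployment might.
    stop = len(layers) if exit_at is None else exit_at
    for layer in layers[:stop]:
        x = layer(x)
    return x

def safety_head(feature, threshold=0.9):
    # A stand-in filter calibrated only on full-depth features.
    return "refuse" if feature >= threshold else "answer"

layers = make_layers(12)
print(safety_head(encode(0.0, layers)))             # refuse
print(safety_head(encode(0.0, layers, exit_at=3)))  # answer
```

With all twelve layers, the feature crosses the threshold and the filter fires; exiting after three layers leaves it below the threshold, so the same input slips through.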

"Some of the skipped layers turn out to be essential for preventing unsafe outputs," said Amit Roy-Chowdhury, professor of electrical and computer engineering and senior author of the study. "If you leave them out, the model may start answering questions it shouldn't."

The team's solution was to retrain the model's internal structure so that its ability to detect and block dangerous prompts is preserved, even when key layers are removed. Their approach avoids external filters or software patches. Instead, it changes how the model understands risky content at a fundamental level.
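One way to picture the retraining idea is to reinforce safe behavior at a randomly sampled exit depth on each training step, so every layer, not just the last, learns to support the refusal. This is a sketch only: the paper's actual method (Layer-wise Clip-PPO) optimizes real network weights with reinforcement learning, whereas this toy "trains" per-layer gains on a scalar feature.

```python
# Hedged sketch of layer-wise safety training: sample a random exit
# depth each step and strengthen the layers actually used whenever
# the stand-in safety head would have missed. Toy model only.
import random

def encode(x, gains, exit_at):
    # Each layer pulls the feature toward 1.0 by its own gain.
    for g in gains[:exit_at]:
        x = x + (1.0 - x) * g
    return x

def train_layerwise(n_layers=12, steps=2000, threshold=0.9, lr=0.05):
    random.seed(0)
    gains = [0.1] * n_layers              # weakly aligned layers to start
    for _ in range(steps):
        k = random.randint(1, n_layers)   # random early-exit depth
        feat = encode(0.0, gains, k)
        if feat < threshold:              # safety head would miss here
            for i in range(k):            # strengthen the layers used
                gains[i] = min(1.0, gains[i] + lr)
    return gains

gains = train_layerwise()
# After training, even a one-layer exit clears the safety threshold.
print(encode(0.0, gains, 1) >= 0.9)  # True
```

Because the exit depth varies during training, the safe response is no longer carried by a few late layers that pruning might remove; every truncation of the stack still triggers it.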

"Our goal was to make sure the model doesn't forget how to behave safely when it's been slimmed down," said Saketh Bachu, UCR graduate student and co-lead author of the study.

To test their method, the researchers used LLaVA 1.5, a vision‑language model capable of processing both text and images. They found that certain combinations, such as pairing a harmless image with a malicious question, could bypass the model's safety filters. In one instance, the altered model responded with detailed instructions for building a bomb.

After retraining, however, the model reliably refused to answer dangerous queries, even when deployed with only a fraction of its original architecture.

"This isn't about adding filters or external guardrails," Bachu said. "We're changing the model's internal understanding, so it's on good behavior by default, even when it's been modified."

Bachu and co-lead author Erfan Shayegani, also a graduate student, describe the work as "benevolent hacking," a way of fortifying models before vulnerabilities can be exploited. Their ultimate goal is to develop techniques that ensure safety across every internal layer, making AI more robust in real‑world conditions.

In addition to Roy-Chowdhury, Bachu, and Shayegani, the research team included doctoral students Arindam Dutta, Rohit Lal, and Trishna Chakraborty, and UCR faculty members Chengyu Song, Yue Dong, and Nael Abu-Ghazaleh. Their work was presented this year at the International Conference on Machine Learning in Vancouver, Canada.

"There's still more work to do," Roy-Chowdhury said. "But this is a concrete step toward developing AI in a way that's both open and responsible."

More information: Saketh Bachu et al, Layer-wise Alignment: Examining Safety Alignment Across Image Encoder Layers in Vision Language Models, arXiv (2024). DOI: 10.48550/arxiv.2411.04291

Journal information: arXiv

Provided by University of California – Riverside

Citation: Retraining AI to fortify itself against rogue rewiring even after key layers are removed (2025, September 5), retrieved 5 September 2025 from https://techxplore.com/news/2025-09-retraining-ai-fortify-rogue-rewiring.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without written permission. The content is provided for information purposes only.



Disclaimer: The information found on cryptoreportclub.com reflects the views of the writers quoted. It does not represent the opinions of cryptoreportclub.com on whether to sell, buy, or hold any investments. You are advised to conduct your own research before making any investment decisions. Use the information provided at your own risk.
cryptoreportclub.com covers fintech, blockchain and Bitcoin bringing you the latest crypto news and analyses on the future of money.

© 2023-2025 Cryptoreportclub. All Rights Reserved
