CRYPTOREPORTCLUB
Tuesday, July 22, 2025

AI vision, reinvented: Vision-language models gain clearer sight through synthetic training data

July 21, 2025

Gaby Clark, scientific editor
Andrew Zinin, lead editor

Editors' notes: This article has been reviewed according to Science X's editorial process and policies. Editors have highlighted the following attributes while ensuring the content's credibility: fact-checked, trusted source, proofread.

AI vision, reinvented: The power of synthetic data
CoSyn works by leveraging the language skills of open-source AI models to create training data for other AI models to learn how to read complex, text-rich images. Credit: Yue Yang

In the race to develop AI that understands complex images like financial forecasts, medical diagrams and nutrition labels—essential for AI to operate independently in everyday settings—closed-source systems like ChatGPT and Claude currently set the pace. But no one outside their makers knows how those models were trained or what data they used, leaving open-source alternatives scrambling to catch up.

Now, researchers at Penn Engineering and the Allen Institute for AI (Ai2) have developed a new approach to train open-source models: using AI to create scientific figures, charts and tables that teach other AI systems how to interpret complex visual information.

Their tool, CoSyn (short for Code-Guided Synthesis), taps open-source AI models' coding skills to write code that renders text-rich images and to generate relevant questions and answers about them, giving other AI systems the data they need to learn how to "see" and understand scientific figures.
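The article does not show CoSyn's code, but the core idea (describe an image as code, then ask questions whose answers come from the same underlying data) can be sketched in a few lines. This is a minimal illustration under loud assumptions: in CoSyn an open-source language model writes both the rendering code and the questions, whereas here fixed templates stand in for the model, and the function `synthesize_nutrition_example`, the HTML output format and the question wording are all hypothetical.

```python
import html
from dataclasses import dataclass


@dataclass
class SyntheticExample:
    render_code: str                 # source that a renderer (e.g. a browser) would turn into an image
    qa_pairs: list[tuple[str, str]]  # question/answer pairs grounded in the same data


def synthesize_nutrition_example(nutrients: dict[str, str], serving: str) -> SyntheticExample:
    """Hypothetical stand-in for code-guided synthesis: fixed templates replace the LLM."""
    # Step 1: write code (here HTML) that would render a nutrition-label image.
    rows = "\n".join(
        f"  <tr><td>{html.escape(name)}</td><td>{html.escape(amount)}</td></tr>"
        for name, amount in nutrients.items()
    )
    render_code = (
        '<table class="nutrition-label">\n'
        f"  <caption>Serving size: {html.escape(serving)}</caption>\n"
        f"{rows}\n"
        "</table>"
    )
    # Step 2: derive questions from the same data, so every answer matches the image exactly.
    qa_pairs = [("What is the serving size?", serving)]
    qa_pairs += [
        (f"What does the label list for {name}?", amount)
        for name, amount in nutrients.items()
    ]
    return SyntheticExample(render_code, qa_pairs)


example = synthesize_nutrition_example({"Calories": "250", "Sodium": "470 mg"}, "1 cup (228 g)")
print(example.render_code)
for question, answer in example.qa_pairs:
    print(question, "->", answer)
```

In this sketch the answers are read from the data that produced the image rather than from its pixels, so they are correct by construction; that is one plausible reason code works well as an intermediate step for generating this kind of training data.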

As the researchers detail in a paper for ACL 2025, one of the world's leading AI conferences, CoSyn-trained models match or outperform their proprietary peers.

"This is like taking a student who's great at writing and asking them to teach someone how to draw, just by describing what the drawing should look like," says Yue Yang (GrEng'25), co-first author and a research scientist in Ai2's PRIOR (Perceptual Reasoning and Interaction Research) group. "We're essentially transferring the strengths of open-source AI from text to vision."

Synthetic images, real results

The resulting dataset, called CoSyn-400K, includes more than 400,000 synthetic images and 2.7 million sets of corresponding instructions, in categories as varied as scientific charts, chemical structures and user-interface screenshots. CoSyn-trained models outperformed top proprietary systems like GPT-4V and Gemini 1.5 Flash on a suite of seven benchmark tests.

In one particularly striking case, the researchers synthetically generated just 7,000 nutrition labels to train a model for a new benchmark they created, NutritionQA. That small, targeted dataset enabled their model to beat others trained on millions of real images.

"Training AI with CoSyn is incredibly data efficient," says Mark Yatskar, Assistant Professor in Computer and Information Science and Yang's doctoral co-advisor. "We're showing that synthetic data can help models generalize to real-world scenarios that could be unique to a person's needs, like reading a nutrition label for someone with low vision."

Yue Yang demonstrates CoSyn's capabilities, using a model trained on synthetic data created with CoSyn to read nutrition labels and solve math problems. Credit: Sylvia Zhang

Scaling and diversifying the dataset

Creating hundreds of thousands of useful, varied training examples posed its own challenges.

To reach the required scale, co-first author Ajay Patel, a doctoral student in Computer and Information Science (CIS), developed a software library called DataDreamer that automated the entire data-generation process. This allowed the team to prompt language models in parallel, enabling large-scale production of synthetic images and instructions.
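DataDreamer's own API is not described in the article, so the sketch below deliberately does not use it. It only illustrates the general pattern of fanning prompts out to a model concurrently with Python's standard `concurrent.futures` module; `fake_model` is a hypothetical placeholder for a real, typically network-bound model call, and `generate_in_parallel` is an illustrative name.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def fake_model(prompt: str) -> str:
    """Hypothetical placeholder for a call to a language-model server."""
    return f"response to: {prompt}"


def generate_in_parallel(
    prompts: list[str],
    model: Callable[[str], str] = fake_model,
    max_workers: int = 8,
) -> list[str]:
    # Threads suit I/O-bound model calls; pool.map returns results in input order,
    # so outputs[i] always corresponds to prompts[i].
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(model, prompts))


outputs = generate_in_parallel([f"Describe chart #{i}" for i in range(20)])
print(outputs[0])
```

Threads are a reasonable choice here only under the assumption that each call mostly waits on a remote server; CPU-bound local inference would call for batching or separate processes instead.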

To avoid repetition, the team used "personas," short character profiles such as "a sci-fi novelist" or "a chemistry teacher," which guided the AI's responses and shaped the content and tone of each example. Embedding these personas into prompts led CoSyn to produce richer, more varied training data across a wide range of domains.
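Persona conditioning can be illustrated by crossing a list of personas with a list of figure types and sampling combinations into prompts. This is a minimal sketch, not CoSyn's actual prompt: the first two personas come from the article, while the other personas, the figure-type wording, the prompt template and the name `build_persona_prompts` are hypothetical.

```python
import itertools
import random

# The first two personas are from the article; the rest are hypothetical examples.
PERSONAS = ["a sci-fi novelist", "a chemistry teacher", "a hospital dietitian", "a city transit planner"]
# Loosely follows categories the article names; the exact wording is hypothetical.
FIGURE_TYPES = ["scientific chart", "chemical structure diagram", "user-interface screenshot", "nutrition label"]


def build_persona_prompts(personas: list[str], figure_types: list[str], n: int, seed: int = 0) -> list[str]:
    """Sample up to n distinct (persona, figure type) pairs and turn each into a prompt."""
    rng = random.Random(seed)  # seeded so the same prompt set can be regenerated
    combos = list(itertools.product(personas, figure_types))
    rng.shuffle(combos)
    return [
        f"You are {persona}. Invent a realistic topic and data for a {figure} "
        f"you might create, then write code that renders it."
        for persona, figure in combos[:n]
    ]


for prompt in build_persona_prompts(PERSONAS, FIGURE_TYPES, n=3):
    print(prompt)
```

Each prompt pairs a different perspective with a different kind of figure, which in this sketch is what nudges the generating model away from producing the same topic and style over and over.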

"AI models tend to repeat themselves unless you nudge them into different perspectives," explains Patel. "Personas give us a scalable way to do that, and the results speak for themselves."

Leveling the playing field for open-source AI

By building CoSyn entirely with open-source tools, the researchers hope to democratize access to powerful vision-language training methods without the ethical and legal challenges surrounding web scraping and copyrighted content.

"This is a step towards AI helping us make new scientific discoveries," adds Chris Callison-Burch, Professor in CIS, who co-advised Yang and currently advises Patel. "It opens the door to AI systems that can reason about scientific documents, which could help a wide range of people, from college students to researchers."

From understanding to action

The team has released the full CoSyn code and dataset to the public, inviting the global research community to build upon their work.

Yang is already looking ahead to synthetic data that can help AI not only understand images, but also interact with them, serving as intelligent digital agents that can click buttons, fill out forms and assist users in daily tasks.

"In the long run, we want AI that can act in the world, not just describe it," Yang says. "This is one way to teach it how."

More information: Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation, yueyang1996.github.io/papers/cosyn.pdf

Provided by University of Pennsylvania

Citation: AI vision, reinvented: Vision-language models gain clearer sight through synthetic training data (2025, July 21) retrieved 21 July 2025 from https://techxplore.com/news/2025-07-ai-vision-reinvented-language-gain.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.

Disclaimer: Information found on cryptoreportclub.com is those of writers quoted. It does not represent the opinions of cryptoreportclub.com on whether to sell, buy or hold any investments. You are advised to conduct your own research before making any investment decisions. Use provided information at your own risk.
cryptoreportclub.com covers fintech, blockchain and Bitcoin bringing you the latest crypto news and analyses on the future of money.

© 2023-2025 Cryptoreportclub. All Rights Reserved
