CRYPTOREPORTCLUB
  • Crypto news
  • AI
  • Technologies
Friday, July 25, 2025

New dataset and models boost Portuguese language AI performance to match English

July 23, 2025
Gaby Clark

scientific editor

Andrew Zinin

lead editor

Editors' notes

This article has been reviewed according to Science X's editorial process and policies. Editors have highlighted the following attributes while ensuring the content's credibility:

fact-checked

peer-reviewed publication

trusted source

proofread

Portuguese
Credit: Unsplash/CC0 Public Domain

Large language models such as ChatGPT perform significantly worse in Portuguese than in English, even though both languages are spoken worldwide. This gap has now been closed with "GigaVerbo." The team led by Dr. Nicholas Kluge Corrêa from the Center for Science and Thought at the University of Bonn presents the project in the journal Patterns. The researchers were among the first to use the new "Marvin" supercomputer at the University of Bonn. Nicholas Kluge Corrêa and his colleague Aniket Sen are both members of the Transdisciplinary Research Area "Sustainable Futures" at the University of Bonn.

GigaVerbo is the name of the dataset developed by the researchers. The project "Tucano: Advancing Neural Text Generation for Portuguese" aims to bridge the resource gap in Portuguese natural language processing (NLP) by providing high-quality datasets and cutting-edge language models specifically designed for the Portuguese language.

By developing and releasing the GigaVerbo corpus, which comprises 200 billion deduplicated tokens, together with the Tucano family of models, the researchers aim to foster progress in neural text generation in an open and reproducible manner and to promote equitable access.

The researchers collected several Portuguese corpora from different sources to ensure high linguistic diversity and quality. These corpora were then deduplicated and filtered to form the GigaVerbo dataset. Using this dataset, they trained several decoder models on the Marvin supercomputer, followed by rigorous evaluation and optimization cycles.
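The article does not say how the deduplication and filtering were carried out. As a rough illustration only, the step of merging several corpora, dropping duplicates, and filtering out weak texts could look like the sketch below. It uses exact-match deduplication on normalized text and a toy length filter; the function names and the `min_words` threshold are hypothetical and are not taken from the Tucano project.

```python
import hashlib
import unicodedata


def normalize(text: str) -> str:
    """Normalize Unicode, lowercase, and collapse whitespace so that
    trivially different copies of a text produce the same hash."""
    text = unicodedata.normalize("NFC", text)
    return " ".join(text.lower().split())


def dedup_and_filter(corpora, min_words=5):
    """Merge several corpora, drop very short texts, and drop exact duplicates.

    `corpora` is an iterable of iterables of strings.
    `min_words` is a hypothetical quality threshold (an assumption for
    this sketch, not a value reported by the researchers).
    """
    seen = set()
    kept = []
    for corpus in corpora:
        for doc in corpus:
            norm = normalize(doc)
            if len(norm.split()) < min_words:
                continue  # toy quality filter: skip very short texts
            digest = hashlib.sha256(norm.encode("utf-8")).hexdigest()
            if digest in seen:
                continue  # same normalized text was already kept
            seen.add(digest)
            kept.append(doc)  # keep the first original form encountered
    return kept
```

Called on two small lists of Portuguese sentences, `dedup_and_filter([corpus_a, corpus_b])` would return each distinct sentence once, in first-seen order. A production pipeline at the scale of hundreds of billions of tokens would more likely rely on near-duplicate detection and learned quality filters, so this should be read as a conceptual sketch rather than a description of the actual method.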

The project addresses two major gaps. The first is the scarcity of comprehensive open-source resources for Portuguese, a language often overshadowed by resource-rich languages like English. The second is the lack of open-source LLM development, which impedes the scientific reproducibility of these models.

The researchers are currently working to scale up their developments in Portuguese by improving their dataset and training larger models. They are also developing resources for other low-resource languages, such as Bengali and Hindi, all thanks to Marvin and the University of Bonn.

More information: Nicholas Kluge Corrêa et al, Tucano: Advancing neural text generation for Portuguese, Patterns (2025). DOI: 10.1016/j.patter.2025.101325

Journal information: Patterns

Provided by University of Bonn

Citation: New dataset and models boost Portuguese language AI performance to match English (2025, July 23) retrieved 23 July 2025 from https://techxplore.com/news/2025-07-dataset-boost-portuguese-language-ai.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.

Disclaimer: Information found on cryptoreportclub.com is those of writers quoted. It does not represent the opinions of cryptoreportclub.com on whether to sell, buy or hold any investments. You are advised to conduct your own research before making any investment decisions. Use provided information at your own risk.
cryptoreportclub.com covers fintech, blockchain and Bitcoin bringing you the latest crypto news and analyses on the future of money.

© 2023-2025 Cryptoreportclub. All Rights Reserved
