Alignment efficient image-sentence retrieval considering transferable cross-modal representation learning

February 26, 2024

Editors' notes

This article has been reviewed according to Science X's editorial process and policies. Editors have highlighted the following attributes while ensuring the content's credibility:

fact-checked

proofread

Alignment efficient image-sentence retrieval considering transferable cross-modal representation learning

Image-sentence retrieval task aims to search images for given sentences and retrieve sentences from image queries. The current retrieval methods are all supervised methods that require a large number of annotations for training. However, considering the labor cost, it is difficult to re-align large amounts of multimodal data in many applications (e.g., medical retrieval), which results in unsupervised multimodal data.

A research team led by Yang Yang published their new research in Frontiers of Computer Science.

To solve the problem the team strive to take a step towards non-parallel image-sentence retrieval by designing the alignment transfer, and propose a novel Alignment Efficient Image-Sentence Retrieval method (AEIR).

In the research, AEIR use other auxiliary parallel data with multimodal consistency as the source domain and non-parallel data with missing consistency as the target domain. Unlike unimodal transfer learning, AEIR transfers semantic representations and modal consistency relations together from the source domain to the target domain.

Firstly, AEIR learns cross-modal consistency representations using cross-modal parallel data in the source domain. Then AEIR jointly optimizes adversarial learning-based semantic transfer constraints and metric learning-based structural transfer constraints to learn cross-domain cross-modal consistency representations to achieve transfer of consistency knowledge from the source domain to the target domain.

A large number of experimental experiments conducted in different transfer scenarios show that semantic transfer and structural transfer can effectively learn invariant features across modalities across domains. The proposed efficient alignment-based image-sentence retrieval network verifies that AEIR is more advantageous than current cross-modal retrieval methods, semi-supervised cross-modal retrieval methods and cross-modal transfer methods.

Future work can focus on the conduction of positive cross-modal transfer considering the domain discrepancy.

More information: Yang Yang et al, Alignment efficient image-sentence retrieval considering transferable cross-modal representation learning, Frontiers of Computer Science (2023). DOI: 10.1007/s11704-023-3186-6

Provided by Higher Education Press Citation: Alignment efficient image-sentence retrieval considering transferable cross-modal representation learning (2024, February 26) retrieved 26 February 2024 from https://techxplore.com/news/2024-02-alignment-efficient-image-sentence-modal.html This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.

Explore further

An approach for unsupervised domain adaptation based on an integrated autoencoder shares

Feedback to editors

Alignment efficient image-sentence retrieval considering transferable cross-modal representation learning

By cryptoadmin

You Missed

ByteDance has reportedly suspended the global rollout of its new AI video generator

Spotify’s new Taste Profile feature lets users fine-tune their algorithm’s recommendations

Prediction markets price rising Trump impeachment risks amid Iran conflict. What it signals for 2026 economic uncertainty?

Trump administration will reportedly get $10 billion for brokering the TikTok deal

Categories

Alignment efficient image-sentence retrieval considering transferable cross-modal representation learning

By cryptoadmin

Related Post

AI’s game-playing still has flaws: AlphaZero-style self-play tested on Nim

AI agent ‘lobster fever’ grips China despite risks

Report calls for AI toy safety standards to protect young children

You Missed

ByteDance has reportedly suspended the global rollout of its new AI video generator

Spotify’s new Taste Profile feature lets users fine-tune their algorithm’s recommendations

Prediction markets price rising Trump impeachment risks amid Iran conflict. What it signals for 2026 economic uncertainty?

Trump administration will reportedly get $10 billion for brokering the TikTok deal