AI image generators get a new safety test for hidden toxic text in memes

Generative AI models can be prompted with just a few words to insert offensive or discriminatory text messages into images. Aditya Kumar from the SPRINT-ML Lab at the CISPA Helmholtz Center for Information Security is investigating how such outputs can be reliably prevented. To address this, he developed ToxicBench, a test dataset that evaluates how well image-generating AI systems handle offensive inputs. He also created a fine-tuning strategy to adapt the models accordingly.