McKelvey School of Engineering Graduate Student Theses & Dissertations

Improving Semantic Precision in Text-to-Image Diffusion Models via Latent-Space Optimization and Semantically-Parsed Evaluation

Mohammad Rouie Miab, Washington University in St. Louis

Abstract

Text-to-image diffusion models can produce visually impressive images from natural-language prompts, but they often fail to satisfy the detailed semantic constraints expressed in compositional prompts. Typical failure modes include omitted objects, merged entities, incorrect quantities, incorrect attribute binding, and leakage of one entity's attributes onto another. This thesis studies the problem of semantic precision in text-to-image generation: how faithfully a generated image satisfies the structured meaning of its prompt. The thesis makes two linked contributions.

First, it presents a training-free inference-time refinement method for diffusion-based image generation. The method operates directly in latent space during denoising and uses noun-phrase-aware cross-attention objectives to improve object presence, attribute binding, and spatial separation between semantically distinct prompt entities. The approach augments an Attend-and-Excite-style attention-activation objective with additional losses for attribute alignment, noun-phrase separation, and centroid separation, without retraining the underlying diffusion model.

Second, the thesis introduces a structured evaluation framework for prompt-image semantic alignment. Rather than assigning a single global similarity score to the whole prompt-image pair, the framework decomposes the prompt into explicit semantic constraints, grounds relevant image regions, and verifies each constraint through targeted yes/no visual question answering. The resulting scores are aggregated into interpretable category-level and overall measures covering entity presence, attribute correctness, relation correctness, and quantity satisfaction.

Experiments on a controlled compositional prompt set show that inference-time latent refinement improves semantic alignment over plain Stable Diffusion and simpler attention-guidance baselines. The proposed evaluation pipeline also exposes failure modes that are frequently hidden by global prompt-image similarity metrics, yielding a more fine-grained and diagnostically useful picture of semantic correctness.
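
As an illustration of the aggregation step in the evaluation framework, the following minimal Python sketch turns per-constraint yes/no answers into category-level and overall scores. The abstract does not specify the aggregation rule, so everything here is an assumption: the `ConstraintResult` type, the `aggregate` function, the category names, the example questions, and the choice of a per-category pass fraction averaged with equal weight across categories are all hypothetical and are not taken from the thesis.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ConstraintResult:
    """One verified constraint (hypothetical structure, not from the thesis)."""
    category: str   # e.g. "entity", "attribute", "relation", "quantity"
    question: str   # the yes/no question posed to the VQA model
    passed: bool    # True if the VQA answer satisfies the constraint


def aggregate(results):
    """Return (per-category pass fractions, unweighted mean over categories).

    Assumed rule: each category score is the fraction of its constraints
    that passed; the overall score is the plain average of category scores.
    """
    by_category = defaultdict(list)
    for r in results:
        by_category[r.category].append(1.0 if r.passed else 0.0)

    category_scores = {c: sum(v) / len(v) for c, v in by_category.items()}
    if not category_scores:
        return {}, 0.0
    overall = sum(category_scores.values()) / len(category_scores)
    return category_scores, overall


# Made-up answers for a prompt such as "a red dog and a blue cat".
results = [
    ConstraintResult("entity", "Is there a dog?", True),
    ConstraintResult("entity", "Is there a cat?", True),
    ConstraintResult("attribute", "Is the dog red?", False),
    ConstraintResult("attribute", "Is the cat blue?", True),
    ConstraintResult("quantity", "Are there exactly two animals?", True),
]
category_scores, overall = aggregate(results)
print(category_scores, round(overall, 3))
```

In this made-up example both entities are present but one attribute is bound incorrectly, so the attribute score drops while the entity and quantity scores stay at 1.0. This is the kind of per-category breakdown that a single global similarity score would not show.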

Committee Chair

Nathan Jacobs

Committee Members

Tao Ju, Ilan Goodman

Degree

Master of Science (MS)

Author's Department

Computer Science & Engineering

Author's School

McKelvey School of Engineering

Document Type

Thesis

Date of Award

Spring 5-6-2026

Language

English (en)

DOI

https://doi.org/10.7936/56dw-fa51

Author's ORCID

https://orcid.org/0009-0005-5053-2499

Recommended Citation

Rouie Miab, Mohammad, "Improving Semantic Precision in Text-to-Image Diffusion Models via Latent-Space Optimization and Semantically-Parsed Evaluation" (2026). McKelvey School of Engineering Graduate Student Theses & Dissertations. 1351.

The definitive version is available at https://doi.org/10.7936/56dw-fa51

Included in

Artificial Intelligence and Robotics Commons, Computational Engineering Commons, Data Science Commons, Graphics and Human Computer Interfaces Commons, Other Computer Sciences Commons
