Identifying Accessibility Data Gaps in CodeGen Models
October 21, 2025

Rather than relying on anecdotal evidence or cherry-picked examples, I built a systematic approach to evaluate how well LLMs — starting with GPT-4 — generate accessible HTML. The methodology is straightforward but comprehensive: I created a Python testing framework that sent carefully crafted prompts to Azure OpenAI’s GPT 4 model, collected the generated HTML responses, and then manually analyzed these responses for accessibility compliance.
Source: Identifying Accessibility Data Gaps in CodeGen Models :: Aaron Gustafson