Abstract: This paper investigates how to build, systematically configure, and rigorously explain a transformer-based bug detection system. The central argument is that trustworthy explainability requires first establishing which model is worth explaining, and in which configuration. We evaluate three approaches on the lrhammond/buggy-apps dataset (8,778 balanced samples): a TF-IDF baseline, DistilBERT, and GraphCodeBERT. DistilBERT exhibited mode collapse across all tested learning rates, confirming that general-purpose language-model pretraining is insufficient for code defect detection; this prerequisite finding motivates the choice of GraphCodeBERT for the explainability analysis. A systematic ablation over three stride lengths (128, 256, and 384 tokens) and three aggregation strategies (max, mean, and majority vote) yields nine experimental conditions; stride 256 with mean aggregation proves optimal (70.77% accuracy, 69.56% macro F1, p = 5.26 × 10⁻³³ vs. TF-IDF by McNemar's test). Explainability analysis via attention rollout and integrated gradients across ten test samples reveals that integrated-gradient signal strength is 11.8× stronger for correctly detected bugs than for misclassified samples, providing a gradient-based, quantitative explanation of the model's failure modes. Attention rollout and integrated gradients show near-zero cross-method correlation (mean r = −0.017), empirically confirming that the two methods are non-redundant and complementary.
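The sliding-window ablation described above can be illustrated with a minimal sketch (not the authors' code): long token sequences are split into overlapping windows at a given stride, and per-window bug probabilities are combined by an aggregation strategy such as the mean. The window length of 512 tokens, the helper names, and the example probabilities are illustrative assumptions, not details from the paper.

```python
def sliding_windows(tokens, window=512, stride=256):
    """Split a token sequence into overlapping fixed-size windows.

    A sequence no longer than one window is returned as a single chunk;
    otherwise window starts advance by `stride` until the end is covered.
    """
    if len(tokens) <= window:
        return [tokens]
    starts = range(0, len(tokens) - window + stride, stride)
    return [tokens[s:s + window] for s in starts]


def aggregate_mean(window_scores):
    """Mean aggregation: average the per-window bug probabilities."""
    return sum(window_scores) / len(window_scores)


# Hypothetical per-window bug probabilities for one long source file:
scores = [0.9, 0.2, 0.4]
file_score = aggregate_mean(scores)
print(f"windows: {len(scores)}, aggregated score: {file_score:.2f}")
```

Max aggregation (`max(window_scores)`) or majority vote over thresholded windows would slot into the same structure, which is what makes the nine-condition grid (three strides × three aggregators) straightforward to enumerate.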

Keywords: Explainable AI, Bug Detection, Transformer Models, GraphCodeBERT, Mode Collapse, Attention Rollout, Integrated Gradients, Sliding Window Ablation


DOI: 10.17148/IMRJR.2026.030304

Cite:

[1] Debargha Ghosh, Eve Thullen, Ph.D., Emmanuel Udoh, Ph.D., "Quantified Explainability and Robustness Analysis of Transformer-Based Bug Detection Models," International Multidisciplinary Research Journal Reviews (IMRJR), 2026, DOI: 10.17148/IMRJR.2026.030304.