Context-Aware Image Captioning with Scene Graphs
End-to-end image captioning pipeline conditioned on structured scene graph triples. BLEU-4 +11.4% over an image-only baseline.
End-to-end image captioning pipeline conditioned on structured scene graph triples. BLEU-4 +11.4% over an image-only baseline.
Fine-tuning a 3B instruct model with QLoRA to produce strict JSON from free-form text — on a single consumer GPU. Exact-match accuracy 0.258 → 0.624.