agentchat.contrib.captainagent.tools.information_retrieval.image_qa
image_qa
@with_requirements(["transformers", "torch"],
["transformers", "torch", "PIL", "os"])
def image_qa(image, question, ckpt="Salesforce/blip-vqa-base")
Perform question answering on an image using a pre-trained VQA model.
Arguments:
image (Union[str, Image.Image]) - The image to perform question answering on. It can be either a file path to the image or a PIL Image object.
question - The question to ask about the image.
ckpt - The pre-trained VQA model checkpoint to load. Defaults to "Salesforce/blip-vqa-base".
Returns:
dict - The generated answer text.
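Example:
A minimal sketch of calling image_qa through a thin wrapper. The names ask_about_image and load_image_qa are hypothetical helpers, not part of the library. The `autogen.` prefix on the import path is an assumption based on the module path at the top of this page. The wrapper takes the QA callable as a parameter, so its input checks can be exercised without downloading a model.

```python
import os
from typing import Any, Callable


def ask_about_image(
    image: Any,
    question: str,
    qa_fn: Callable[..., Any],
    ckpt: str = "Salesforce/blip-vqa-base",
) -> str:
    """Validate inputs, then delegate to an image_qa-style callable (hypothetical helper)."""
    if not question or not question.strip():
        raise ValueError("question must be a non-empty string")
    # A string is treated as a file path, following the argument description.
    if isinstance(image, str) and not os.path.exists(image):
        raise FileNotFoundError(f"image file not found: {image}")
    answer = qa_fn(image, question, ckpt=ckpt)
    # The reference lists the return type as dict but describes it as answer
    # text, so convert to str instead of assuming either form.
    return str(answer)


def load_image_qa() -> Callable[..., Any]:
    """Import image_qa lazily; requires transformers, torch and PIL.

    Assumption: the module lives under the `autogen` package.
    """
    from autogen.agentchat.contrib.captainagent.tools.information_retrieval.image_qa import (
        image_qa,
    )

    return image_qa


# Intended usage ("photo.jpg" is a placeholder path):
#   answer = ask_about_image("photo.jpg", "What color is the car?", load_image_qa())
#   print(answer)
```

Loading the checkpoint is likely to need network access on first use, and the function may expect specific hardware; neither is stated in this reference, so check both before relying on the call in automated pipelines.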