5 docs tagged with "multimodal"

Agent Chat with Multimodal Models: DALLE and GPT-4V

Multimodal agent chat with DALL-E and GPT-4V.

Agent Chat with Multimodal Models: LLaVA

Agent chat with multimodal models such as LLaVA.

Engaging with Multimodal Models: GPT-4V in AutoGen

Use multimodal models in two ways: MultimodalConversableAgent and VisionCapability.

Generate Dalle Images With Conversable Agents

Generate images with conversable agents.

Translating Video audio using Whisper and GPT-3.5-turbo

Use tools to extract and translate the transcript of a video file.