In Brief:
- Introducing the Multimodal Conversable Agent and the LLaVA Agent to enhance LMM functionalities.
- Users can input text and images simultaneously, using the `<img img_path>` tag to specify image loading (see the sketch after this list).
- Demonstrated through the GPT-4V notebook.
- Demonstrated through the LLaVA notebook.
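For context, here is a minimal sketch of how the `<img img_path>` tag can be combined with text in a single message to the Multimodal Conversable Agent. The config file name, model filter, and image URL below are assumptions for illustration; the GPT-4V and LLaVA notebooks contain the authoritative end-to-end examples.

```python
import autogen
from autogen.agentchat.contrib.multimodal_conversable_agent import MultimodalConversableAgent

# Load an LLM config filtered to a vision-capable model (file name and model are assumptions).
config_list_4v = autogen.config_list_from_json(
    "OAI_CONFIG_LIST",
    filter_dict={"model": ["gpt-4-vision-preview"]},
)

# The multimodal agent parses <img ...> tags in incoming messages and loads the referenced images.
image_agent = MultimodalConversableAgent(
    name="image-explainer",
    llm_config={"config_list": config_list_4v, "temperature": 0.5, "max_tokens": 300},
)

# A plain user proxy that simply forwards the request.
user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=0,
    code_execution_config=False,
)

# Text and image are sent together; the <img> tag may point to a local path or a URL.
user_proxy.initiate_chat(
    image_agent,
    message="""What breed is the dog in this picture?
<img https://example.com/dog.jpg>""",
)
```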