LiteLLM with WatsonX
LiteLLM is an open-source, locally run proxy server providing an OpenAI-compatible API. It supports various LLM providers, including IBM’s WatsonX, enabling seamless integration with tools like AG2.
Running LiteLLM with WatsonX requires the following installations:
- AG2 – A framework for building and orchestrating AI agents.
- LiteLLM – An OpenAI-compatible proxy for bridging non-compliant APIs.
- IBM WatsonX – LLM service requiring specific session token authentication.
Prerequisites
Before setting up, ensure Docker is installed. Refer to the Docker installation guide. Optionally, consider using Postman to easily test API requests.
Installing WatsonX
To set up WatsonX, follow these steps:

1. Access WatsonX:
   - Sign up for WatsonX.ai.
   - Create an API_KEY and PROJECT_ID.
2. Validate WatsonX API access:
   - Verify access using the commands below.

Tip: Verify access to the WatsonX APIs before installing LiteLLM.
Get a session token:

curl -L "https://iam.cloud.ibm.com/identity/token" \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d "grant_type=urn%3Aibm%3Aparams%3Aoauth%3Agrant-type%3Aapikey" \
  -d "apikey=<API_KEY>"
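If you prefer scripting the token exchange, the same request can be made with Python's standard library. A minimal sketch; the `access_token` field name follows IBM's IAM token response, and `get_iam_token` is a hypothetical helper name:

```python
import json
import urllib.parse
import urllib.request

IAM_URL = "https://iam.cloud.ibm.com/identity/token"

def build_token_form(api_key: str) -> str:
    """URL-encode the form body for the IAM apikey grant."""
    return urllib.parse.urlencode({
        "grant_type": "urn:ibm:params:oauth:grant-type:apikey",
        "apikey": api_key,
    })

def get_iam_token(api_key: str) -> str:
    """Exchange an IBM Cloud API key for a short-lived session token."""
    req = urllib.request.Request(
        IAM_URL,
        data=build_token_form(api_key).encode(),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["access_token"]
```

The returned token is what the later `Authorization: Bearer <SESSION_TOKEN>` headers expect; note that IAM tokens expire, so long-running scripts should refresh them.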
Get the list of LLMs:

curl -L "https://us-south.ml.cloud.ibm.com/ml/v1/foundation_model_specs?version=2024-09-16&project_id=<PROJECT_ID>&filters=function_text_generation%2C%21lifecycle_withdrawn%3Aand&limit=200" \
  -H "Authorization: Bearer <SESSION_TOKEN>"
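The same listing can be done from Python. A hedged sketch that drops the optional filters for brevity; the `resources`/`model_id` response shape is an assumption about the foundation-model-specs endpoint:

```python
import json
import urllib.request

SPECS_URL = (
    "https://us-south.ml.cloud.ibm.com/ml/v1/foundation_model_specs"
    "?version=2024-09-16&limit=200"
)

def extract_model_ids(specs: dict) -> list[str]:
    """Pull model identifiers out of a specs response (shape is an assumption)."""
    return [r["model_id"] for r in specs.get("resources", [])]

def list_model_ids(session_token: str) -> list[str]:
    """Fetch the available foundation models visible to this account."""
    req = urllib.request.Request(
        SPECS_URL,
        headers={"Authorization": f"Bearer {session_token}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return extract_model_ids(json.load(resp))
```

The IDs returned here (e.g. `meta-llama/llama-3-8b-instruct`) are what you reference later in the LiteLLM configuration.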
Ask the LLM a question:

curl -L "https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2023-05-02" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -H "Authorization: Bearer <SESSION_TOKEN>" \
  -d "{
    \"model_id\": \"google/flan-t5-xxl\",
    \"input\": \"What is the capital of Arkansas?\",
    \"parameters\": {
      \"max_new_tokens\": 100,
      \"time_limit\": 1000
    },
    \"project_id\": \"<PROJECT_ID>\"
  }"
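The generation call above translates to Python as follows. A minimal sketch; `results[0].generated_text` reflects the watsonx.ai text-generation response shape, and `generate` is a hypothetical helper name:

```python
import json
import urllib.request

GEN_URL = "https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2023-05-02"

def build_generation_payload(project_id: str, prompt: str) -> dict:
    """Request body matching the curl example above."""
    return {
        "model_id": "google/flan-t5-xxl",
        "input": prompt,
        "parameters": {"max_new_tokens": 100, "time_limit": 1000},
        "project_id": project_id,
    }

def generate(session_token: str, project_id: str, prompt: str) -> str:
    """POST a prompt to the text-generation endpoint and return the completion."""
    req = urllib.request.Request(
        GEN_URL,
        data=json.dumps(build_generation_payload(project_id, prompt)).encode(),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": f"Bearer {session_token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)["results"][0]["generated_text"]
```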
With access to the WatsonX APIs validated, you can install the Python library from https://ibm.github.io/watsonx-ai-python-sdk/install.html.
Installing LiteLLM
To install LiteLLM, follow these steps: 1. Download LiteLLM Docker Image:
docker pull ghcr.io/berriai/litellm:main-latest
OR
Install LiteLLM Python Library:
pip install 'litellm[proxy]'
2. Create a LiteLLM configuration file:
   - Save it as litellm_config.yaml in a local directory.
   - Example content for WatsonX:
model_list:
  - model_name: llama-3-8b
    litellm_params:
      # all params accepted by litellm.completion()
      model: watsonx/meta-llama/llama-3-8b-instruct
      api_key: "os.environ/WATSONX_API_KEY"
      project_id: "os.environ/WX_PROJECT_ID"
  - model_name: llama_3_2_90
    litellm_params:
      model: watsonx/meta-llama/llama-3-2-90b-vision-instruct
      api_key: "os.environ/WATSONX_API_KEY" # IBM Cloud API key
      max_new_tokens: 4000
3. Start the LiteLLM container:
docker run -v <Directory>\litellm_config.yaml:/app/config.yaml -e WATSONX_API_KEY=<API_KEY> -e WATSONX_URL=https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2023-05-02 -e WX_PROJECT_ID=<PROJECT_ID> -p 4000:4000 ghcr.io/berriai/litellm:main-latest --config /app/config.yaml --detailed_debug
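Once the container is up, you can sanity-check the proxy with an OpenAI-style request before wiring up AG2. A hedged sketch using only the standard library; the `/v1/chat/completions` route and `choices[0].message.content` shape follow LiteLLM's OpenAI-compatible API, and the bearer value can be any string since the proxy holds the real WatsonX key:

```python
import json
import urllib.request

PROXY_URL = "http://localhost:4000/v1/chat/completions"  # LiteLLM's OpenAI-compatible endpoint

def build_chat_request(model: str, user_message: str) -> dict:
    """OpenAI-style chat payload; LiteLLM translates it into a WatsonX call."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }

def ask_proxy(model: str, user_message: str) -> str:
    """Send one chat turn through the local LiteLLM proxy."""
    req = urllib.request.Request(
        PROXY_URL,
        data=json.dumps(build_chat_request(model, user_message)).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer watsonx",  # placeholder; the proxy holds the real key
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

The `model` here must match a `model_name` from litellm_config.yaml (e.g. `llama-3-8b`), not the underlying WatsonX model ID.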
Installing AG2
AG2 simplifies orchestration and communication between agents. To install:
1. Open a terminal with administrator rights.
2. Run the following command:
pip install ag2
Once installed, AG2 agents can leverage WatsonX APIs via LiteLLM.
phi1 = {
    "config_list": [
        {
            "model": "llama-3-8b",
            "base_url": "http://localhost:4000",  # use http://0.0.0.0:4000 on Macs
            "api_key": "watsonx",
            "price": [0, 0],
        },
    ],
    "cache_seed": None,  # Disable caching.
}
phi2 = {
    "config_list": [
        {
            "model": "llama-3-8b",
            "base_url": "http://localhost:4000",  # use http://0.0.0.0:4000 for Macs
            "api_key": "watsonx",
            "price": [0, 0],
        },
    ],
    "cache_seed": None,  # Disable caching.
}
from autogen import ConversableAgent
jack = ConversableAgent(
"Jack (Phi-2)",
llm_config=phi2,
system_message="Your name is Jack and you are a comedian in a two-person comedy show.",
)
emma = ConversableAgent(
    "Emma (Gemma)",
    llm_config=phi1,
    system_message="Your name is Emma and you are a comedian in a two-person comedy show.",
)
# Start a short conversation between the two agents
chat_result = jack.initiate_chat(emma, message="Emma, tell me a joke.", max_turns=2)
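The returned `chat_result` carries the exchanged messages. A minimal sketch of pretty-printing them, assuming `chat_history` is a list of dicts with `role`/`content` (and sometimes `name`) keys, as AG2's `ChatResult` exposes; `format_transcript` is a hypothetical helper:

```python
def format_transcript(chat_history: list[dict]) -> str:
    """Render chat_history entries as 'speaker: message' lines."""
    return "\n".join(
        f'{m.get("name", m.get("role", "?"))}: {m["content"]}'
        for m in chat_history
    )

# e.g. print(format_transcript(chat_result.chat_history))
```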