OpenAI Proxy Server
CLI tool to create an LLM proxy server that translates OpenAI API calls to any non-OpenAI model (e.g. Huggingface, TogetherAI, Ollama, etc.). 100+ models are supported; see the Provider List.
Quick start
Call Huggingface models through your OpenAI proxy.
Start Proxy
$ pip install litellm
$ litellm --model huggingface/bigcode/starcoder
#INFO: Uvicorn running on http://0.0.0.0:8000
This will host a local proxy API at: http://0.0.0.0:8000
Test Proxy
Make a test ChatCompletion Request to your proxy
- litellm cli
- OpenAI
- curl
litellm --test http://0.0.0.0:8000
import openai
openai.api_base = "http://0.0.0.0:8000"
openai.api_key = "anything"  # placeholder; the pre-1.0 openai SDK requires a key to be set
print(openai.ChatCompletion.create(model="test", messages=[{"role": "user", "content": "Hey!"}]))
curl --location 'http://0.0.0.0:8000/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
    "messages": [
        {
            "role": "user",
            "content": "what do you know?"
        }
    ]
}'
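You can also stream the response. A minimal sketch, assuming the proxy forwards streaming chunks in the standard OpenAI delta format (this uses the pre-1.0 openai Python SDK, like the example above):

import openai

openai.api_base = "http://0.0.0.0:8000"
openai.api_key = "anything"  # placeholder; the pre-1.0 openai SDK requires a key to be set

# stream=True returns an iterator of chunks instead of a single response
response = openai.ChatCompletion.create(
    model="test",
    messages=[{"role": "user", "content": "Hey!"}],
    stream=True,
)
for chunk in response:
    # each chunk carries an OpenAI-style delta; the final chunk may have no content
    delta = chunk["choices"][0].get("delta", {})
    print(delta.get("content", ""), end="", flush=True)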
Other supported models:
- Anthropic
- Huggingface
- TogetherAI
- Replicate
- Petals
- Palm
- Azure OpenAI
- AI21
- Cohere
$ export ANTHROPIC_API_KEY=my-api-key
$ litellm --model claude-instant-1
$ export HUGGINGFACE_API_KEY=my-api-key #[OPTIONAL]
$ litellm --model huggingface/bigcode/starcoder
$ export TOGETHERAI_API_KEY=my-api-key
$ litellm --model together_ai/lmsys/vicuna-13b-v1.5-16k
$ export REPLICATE_API_KEY=my-api-key
$ litellm \
--model replicate/meta/llama-2-70b-chat:02e509c789964a7ea8736978a43525956ef40397be9033abf9fd2badfe68c9e3
$ litellm --model petals/meta-llama/Llama-2-70b-chat-hf
$ export PALM_API_KEY=my-palm-key
$ litellm --model palm/chat-bison
$ export AZURE_API_KEY=my-api-key
$ export AZURE_API_BASE=my-api-base
$ export AZURE_API_VERSION=my-api-version
$ litellm --model azure/my-deployment-id
$ export AI21_API_KEY=my-api-key
$ litellm --model j2-light
$ export COHERE_API_KEY=my-api-key
$ litellm --model command-nightly
Deploy Proxy
Deploy the proxy to https://api.litellm.ai
$ export ANTHROPIC_API_KEY=sk-ant-api03-1..
$ litellm --model claude-instant-1 --deploy
#INFO: Uvicorn running on https://api.litellm.ai/44508ad4
This will host a ChatCompletions API at: https://api.litellm.ai/44508ad4
Other supported models:
- Anthropic
- TogetherAI
- Replicate
- Petals
- Palm
- Azure OpenAI
- AI21
- Cohere
$ export ANTHROPIC_API_KEY=my-api-key
$ litellm --model claude-instant-1 --deploy
$ export TOGETHERAI_API_KEY=my-api-key
$ litellm --model together_ai/lmsys/vicuna-13b-v1.5-16k --deploy
$ export REPLICATE_API_KEY=my-api-key
$ litellm \
--model replicate/meta/llama-2-70b-chat:02e509c789964a7ea8736978a43525956ef40397be9033abf9fd2badfe68c9e3 \
--deploy
$ litellm --model petals/meta-llama/Llama-2-70b-chat-hf --deploy
$ export PALM_API_KEY=my-palm-key
$ litellm --model palm/chat-bison --deploy
$ export AZURE_API_KEY=my-api-key
$ export AZURE_API_BASE=my-api-base
$ export AZURE_API_VERSION=my-api-version
$ litellm --model azure/my-deployment-id --deploy
$ export AI21_API_KEY=my-api-key
$ litellm --model j2-light --deploy
$ export COHERE_API_KEY=my-api-key
$ litellm --model command-nightly --deploy
Test Deployed Proxy
Make a test ChatCompletion Request to your proxy
- litellm cli
- OpenAI
- curl
litellm --test https://api.litellm.ai/44508ad4
import openai
openai.api_base = "https://api.litellm.ai/44508ad4"
openai.api_key = "anything"  # placeholder; the pre-1.0 openai SDK requires a key to be set
print(openai.ChatCompletion.create(model="test", messages=[{"role": "user", "content": "Hey!"}]))
curl --location 'https://api.litellm.ai/44508ad4/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
    "messages": [
        {
            "role": "user",
            "content": "what do you know?"
        }
    ]
}'
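The deployed proxy returns responses in the standard OpenAI format, so you can read the generated text and token usage the same way you would from the OpenAI API. A minimal sketch (pre-1.0 openai SDK; /44508ad4 is the example deployment from above):

import openai

openai.api_base = "https://api.litellm.ai/44508ad4"
openai.api_key = "anything"  # placeholder; the pre-1.0 openai SDK requires a key to be set

response = openai.ChatCompletion.create(
    model="test",
    messages=[{"role": "user", "content": "what do you know?"}],
)
# standard OpenAI response fields
print(response["choices"][0]["message"]["content"])
print(response.get("usage"))  # token counts, if the backing provider reports them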
Setting the API base, temperature, and max tokens
litellm --model huggingface/bigcode/starcoder \
--api_base https://my-endpoint.huggingface.cloud \
--max_tokens 250 \
--temperature 0.5
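The same settings can also be sent per request through the standard OpenAI parameters. A short sketch, assuming the proxy forwards these fields to the underlying model (pre-1.0 openai SDK):

import openai

openai.api_base = "http://0.0.0.0:8000"
openai.api_key = "anything"  # placeholder; the pre-1.0 openai SDK requires a key to be set

response = openai.ChatCompletion.create(
    model="test",
    messages=[{"role": "user", "content": "Summarize what a proxy server does."}],
    temperature=0.5,  # standard OpenAI sampling parameter
    max_tokens=250,   # caps the length of the completion
)
print(response["choices"][0]["message"]["content"])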
Ollama example
$ litellm --model ollama/llama2 --api_base http://localhost:11434
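Because the proxy exposes a plain HTTP /chat/completions endpoint, you can also test the Ollama-backed proxy without the openai SDK. A minimal sketch using requests, assuming Ollama is already running locally on port 11434 and the proxy is on port 8000:

import requests

# the proxy started above forwards this request to the local Ollama server
resp = requests.post(
    "http://0.0.0.0:8000/chat/completions",
    json={"messages": [{"role": "user", "content": "Hey!"}]},
    timeout=120,  # local models can be slow to respond on first load
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])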
Tutorial - using HuggingFace LLMs with aider
Aider is an AI pair programming tool that runs in your terminal.
However, it only accepts OpenAI API calls.
In this tutorial we'll use Aider with WizardCoder (hosted on HF Inference Endpoints).
[NOTE]: To learn how to deploy a model on Huggingface, see the Huggingface Inference Endpoints documentation.
Step 1: Install aider and litellm
$ pip install aider-chat litellm
Step 2: Spin up local proxy
Save your Huggingface API key in your local environment (you can also do this via a .env file)
$ export HUGGINGFACE_API_KEY=my-huggingface-api-key
Point your local proxy to your model endpoint
$ litellm \
--model huggingface/WizardLM/WizardCoder-Python-34B-V1.0 \
--api_base https://my-endpoint.huggingface.com
This will host a local proxy API at: http://0.0.0.0:8000
Step 3: Replace the OpenAI API base in Aider
Aider lets you set the OpenAI API base, so let's point it to our proxy instead.
$ aider --openai-api-base http://0.0.0.0:8000
And that's it!