TGI Inference with Mistral-7b
In this tutorial we are going to use Huggingface's TGI (Text Generation Inference) to run an arbitrary LLM model and enable users to request jobs from it, both on-chain and off-chain.
Install Pre-requisites
For this tutorial you'll need to have the following installed.
Setting up a TGI LLM Service
Included with this tutorial is a containerized LLM service. We're going to deploy this service on a powerful machine with access to a GPU.
Rent a GPU machine
To run this service, you will need to have access to a machine with a powerful GPU. In the video above, we use an A100 instance on Paperspace.
Install docker
You will have to install docker.
For Ubuntu, you can run the following commands:
# install docker
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
As docker installation may vary depending on your operating system, consult the official documentation for more information.
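If apt cannot locate those packages, Docker's apt repository likely has not been added yet. The following sketch mirrors Docker's official Ubuntu instructions; the keyring path and repository URL are Docker's defaults, not anything specific to this project:
# add Docker's official GPG key
sudo apt-get update
sudo apt-get install -y ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/docker.asc -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
# register the repository for your Ubuntu release, then refresh the package index
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $VERSION_CODENAME) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update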
After installation, you can verify that docker is installed by running:
# sudo docker run hello-world
Hello from Docker!
Ensure CUDA is installed
Depending on where you rent your GPU machine, CUDA is typically pre-installed. For Ubuntu, you can follow the instructions here.
You can verify that CUDA is installed by running:
# verify Installation
python -c '
import torch
print("torch.cuda.is_available()", torch.cuda.is_available())
print("torch.cuda.device_count()", torch.cuda.device_count())
print("torch.cuda.current_device()", torch.cuda.current_device())
print("torch.cuda.get_device_name(0)", torch.cuda.get_device_name(0))
'
If CUDA is installed and available, your output will look similar to the following:
torch.cuda.is_available() True
torch.cuda.device_count() 1
torch.cuda.current_device() 0
torch.cuda.get_device_name(0) Tesla V100-SXM2-16GB
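If the snippet above fails because torch is not installed, you can still do a driver-level check with nvidia-smi and install torch afterwards. This is a minimal sketch assuming pip and the NVIDIA drivers are already present:
# driver-level check: should list your GPU(s) and driver/CUDA versions
nvidia-smi
# install a CUDA-enabled torch build so the verification snippet above can run
pip install torch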
Ensure nvidia-container-runtime is installed
For your container to be able to access the GPU, you will need to install the nvidia-container-runtime.
On Ubuntu, you can run the following commands:
# Docker GPU support
# nvidia container-runtime repos
# https://nvidia.github.io/nvidia-container-runtime/
curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey | \
  sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list
sudo apt-get update
# install nvidia-container-runtime
# https://docs.docker.com/config/containers/resource_constraints/#gpu
sudo apt-get install -y nvidia-container-runtime
As always, consult the official documentation for more information.
You can verify that nvidia-container-runtime is installed by running:
which nvidia-container-runtime-hook
# this should return a path to the nvidia-container-runtime-hook
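To confirm that containers can actually reach the GPU, you can run nvidia-smi inside a disposable CUDA container. This is a sketch: the nvidia/cuda image tag is illustrative, and restarting Docker may or may not be necessary on your machine:
# restart docker so it picks up the nvidia runtime
sudo systemctl restart docker
# run nvidia-smi from inside a throwaway CUDA container
sudo docker run --rm --gpus all nvidia/cuda:12.3.1-base-ubuntu22.04 nvidia-smi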
Now, with the pre-requisites installed, we can move on to setting up the TGI service.
Clone this repository
# Clone locally
git clone --recurse-submodules https://github.com/ritual-net/infernet-container-starter
# Navigate to the repository
cd infernet-container-starter
Run the TGI service
make run-service project=tgi-llm service=tgi
This will start the tgi service. Note that this service has to download a large model file, so it may take a few minutes to be fully ready. The downloaded model is cached, so subsequent runs will be faster.
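Once the model has finished loading, you can sanity-check the service directly against TGI's generate endpoint. The IP and port below are placeholders for wherever your tgi service is exposed:
curl http://{your_service_ip}:{your_service_port}/generate \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{"inputs": "What is deep learning?", "parameters": {"max_new_tokens": 32}}'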
Testing the tgi-llm service via the gradio UI
Included with this project is a simple gradio chat UI that allows you to interact with the tgi-llm service. This is not needed for running the Infernet node, but it is a nice way to debug and test the TGI service.
Ensure docker & foundry exist
To check for docker, run the following command in your terminal:
docker --version
# Docker version 25.0.2, build 29cf629 (example output)
You'll also need to ensure that docker-compose exists in your terminal:
which docker-compose
# /usr/local/bin/docker-compose (example output)
To check for foundry, run the following command in your terminal:
forge --version
# forge 0.2.0 (551bcb5 2024-02-28T07:40:42.782478000Z) (example output)
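If forge is missing, Foundry's official installer is the quickest way to get it. A sketch, using foundryup:
# install foundryup, then use it to install forge, cast, anvil and chisel
curl -L https://foundry.paradigm.xyz | bash
# open a new shell (or source your shell profile) so foundryup is on PATH, then:
foundryup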
Clone the starter repository
Just like our other examples, we're going to clone this repository. All of the code and instructions for this tutorial can be found in the projects/tgi-llm directory of the repository.
# Clone locally
git clone --recurse-submodules https://github.com/ritual-net/infernet-container-starter
# Navigate to the repository
cd infernet-container-starter
Configure the UI Service
You'll need to configure the UI service to point to the tgi service. To do this, you'll have to pass that info as environment variables. There is a gradio_ui.env.sample file in the projects/tgi-llm/ui directory. Simply copy this file to gradio_ui.env and set TGI_SERVICE_URL to the address of the tgi service.
cd projects/tgi-llm/ui
cp gradio_ui.env.sample gradio_ui.env
Then modify the content of gradio_ui.env to look like this:
TGI_SERVICE_URL={your_service_ip}:{your_service_port} # <- replace with your service ip & port
HF_API_TOKEN={huggingface_api_token} # <- replace with your huggingface api token
PROMPT_FILE_PATH=./prompt.txt # <- path to the prompt file
The env vars are as follows:
- TGI_SERVICE_URL is the address of the tgi service.
- HF_API_TOKEN is the Huggingface API token. You can get one by signing up at Huggingface.
- PROMPT_FILE_PATH is the path to the system prompt file. By default it is set to ./prompt.txt. A simple prompt.txt file is included in the ui directory.
Build the UI service
From the top-level directory of the repository, simply run the following command to build the UI service:
# cd back to the top-level directory
cd ../../..
# build the UI service
make build-service project=tgi-llm service=ui
Run the UI service
In the same directory, you can also run the following command to run the UI service:
make run-service project=tgi-llm service=ui
By default the service will run on http://localhost:3001. You can navigate to this address in your browser to see the UI.
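If the page does not load, a quick check from the terminal can confirm the service is listening (port 3001 is the default mentioned above):
# expect an HTTP response header from the gradio UI
curl -I http://localhost:3001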
Chat with the TGI service!
Congratulations! You can now chat with the TGI service using the gradio UI. You can enter a prompt and see the response from the TGI service.
Now that we've tested the TGI service, we can move on to setting up the Infernet Node and the tgi-llm container.
Setting up the Infernet Node along with the tgi-llm container
You can follow these steps on your local machine to set up the Infernet Node and the tgi-llm container.
The first couple of steps are identical to those of the previous section, so if you've already completed them, you can skip to building the tgi-llm container.
Ensure docker & foundry exist
To check for docker, run the following command in your terminal:
docker --version
# Docker version 25.0.2, build 29cf629 (example output)
You'll also need to ensure that docker-compose exists in your terminal:
which docker-compose
# /usr/local/bin/docker-compose (example output)
To check for foundry, run the following command in your terminal:
forge --version
# forge 0.2.0 (551bcb5 2024-02-28T07:40:42.782478000Z) (example output)
Clone the starter repository
Just like our other examples, we're going to clone this repository. All of the code and instructions for this tutorial can be found in the projects/tgi-llm directory of the repository.
# Clone locally
git clone --recurse-submodules https://github.com/ritual-net/infernet-container-starter
# Navigate to the repository
cd infernet-container-starter
Configure the tgi-llm container
Configure the URL for the TGI Service
The tgi-llm container needs to know where to find the TGI service that we started in the steps above. To do this, we need to modify the configuration file for the tgi-llm container. We have a sample config.json file. Simply navigate to the projects/tgi-llm/container directory and set up the config file:
cd projects/tgi-llm/container
cp config.sample.json config.json
In the containers
field, you will see the following:
"containers": [
{
// etc. etc.
"env": {
"TGI_SERVICE_URL": "http://{your_service_ip}:{your_service_port}" // <- replace with your service ip & port
}
}
},
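If you prefer to set this value from the command line, here is a small sketch using jq. It assumes jq is installed and that TGI_SERVICE_URL lives under the first entry of containers, as in the sample above; the IP and port are placeholders:
# from projects/tgi-llm/container: write the TGI service URL into config.json
jq '.containers[0].env.TGI_SERVICE_URL = "http://{your_service_ip}:{your_service_port}"' \
    config.json > config.tmp && mv config.tmp config.json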
Build the tgi-llm container
Simply run the following command to build the tgi-llm container:
make build-container project=tgi-llm
Deploy the tgi-llm container with Infernet
You can run a simple command to deploy the tgi-llm container along with bootstrapping the rest of the Infernet node stack in one go:
make deploy-container project=tgi-llm
Check the running containers
At this point it makes sense to check the running containers to ensure everything is running as expected.
# > docker container ps
CONTAINER ID   IMAGE                                           COMMAND                  CREATED         STATUS          PORTS                                                     NAMES
0dbc30f67e1e   ritualnetwork/example-tgi-llm-infernet:latest   "hypercorn app:creat…"   8 seconds ago   Up 7 seconds    0.0.0.0:3000->3000/tcp                                    tgi-llm
0c5140e0f41b   ritualnetwork/infernet-anvil:0.0.0              "anvil --host 0.0.0.…"   23 hours ago    Up 23 hours     0.0.0.0:8545->3000/tcp                                    anvil-node
f5682ec2ad31   ritualnetwork/infernet-node:latest              "/app/entrypoint.sh"     23 hours ago    Up 9 seconds    0.0.0.0:4000->4000/tcp                                    deploy-node-1
c1ece27ba112   fluent/fluent-bit:latest                        "/fluent-bit/bin/flu…"   23 hours ago    Up 10 seconds   2020/tcp, 0.0.0.0:24224->24224/tcp, :::24224->24224/tcp   deploy-fluentbit-1
3cccea24a303   redis:latest                                    "docker-entrypoint.s…"   23 hours ago    Up 10 seconds   0.0.0.0:6379->6379/tcp, :::6379->6379/tcp                 deploy-redis-1
You should see five different images running, including the Infernet node and the tgi-llm container.
Send a job request to the tgi-llm container
From here, we can make a Web2 job request to the container by posting a request to the api/jobs endpoint.
curl -X POST http://127.0.0.1:4000/api/jobs \
-H "Content-Type: application/json" \
-d '{"containers": ["tgi-llm"], "data": {"prompt": "Can shrimp actually fry rice fr?"}}'
# {"id":"7a375a56-0da0-40d8-91e0-6440b3282ed8"}
You will get a job id in response. You can use this id to check the status of the job.
Check the status of the job
You can make a GET request to the api/jobs endpoint to check the status of the job.
curl -X GET "http://127.0.0.1:4000/api/jobs?id=7a375a56-0da0-40d8-91e0-6440b3282ed8"
# [{"id":"7a375a56-0da0-40d8-91e0-6440b3282ed8","result":{"container":"tgi-llm","output":{"data":"\n\n## Can you fry rice in a wok?\n\nThe wok is the"}},"status":"success"}]
Congratulations! You have successfully set up the Infernet Node and the tgi-llm container. Now let's move on to calling our service from a smart contract (a Web3 request).
Calling our service from a smart contract
In the following steps, we will deploy our consumer contract and make a subscription request by calling the contract.
Setup
Ensure that you have followed the steps in the previous section up until here to set up the Infernet Node and the tgi-llm container.
Notice that in the steps above we have an Anvil node running on port 8545.
By default, the anvil-node image used deploys the Infernet SDK and other relevant contracts for you:
- Coordinator: 0x663F3ad617193148711d28f5334eE4Ed07016602
- Primary node: 0x70997970C51812dc3A010C7d01b50e0d17dc79C8
Deploy our Prompter smart contract
In this step, we will deploy our Prompter.sol contract to the Anvil node. This contract simply allows us to submit a prompt to the LLM, receive the result of the prompt, and print it to the Anvil console.
Anvil logs
During this process, it is useful to look at the logs of the Anvil node to see what's going on. To follow the logs, in a new terminal, run:
docker logs -f anvil-node
Deploying the contract
Once ready, to deploy the Prompter consumer contract, in another terminal, run:
make deploy-contracts project=tgi-llm
You should expect to see similar Anvil logs:
# > make deploy-contracts project=tgi-llm
eth_getTransactionReceipt
Transaction: 0x17a9d17cc515d39eef26b6a9427e04ed6f7ce6572d9756c07305c2df78d93ffe
Contract created: 0x13D69Cf7d6CE4218F646B759Dcf334D82c023d8e
Gas used: 731312
Block Number: 1
Block Hash: 0xd17b344af15fc32cd3359e6f2c2724a8d0a0283fc3b44febba78fc99f2f00189
Block Time: "Wed, 6 Mar 2024 18:21:01 +0000"
eth_getTransactionByHash
From our logs, we can see that the Prompter contract has been deployed to address 0x13D69Cf7d6CE4218F646B759Dcf334D82c023d8e.
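You can double-check the deployment with cast, which ships with Foundry. The address is taken from the logs above, and the RPC URL assumes the Anvil node from earlier is listening on port 8545:
# expect non-empty bytecode if the Prompter contract was deployed successfully
cast code 0x13D69Cf7d6CE4218F646B759Dcf334D82c023d8e --rpc-url http://localhost:8545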
Call the contract
Now, let's call the contract with a prompt! In the same terminal, run:
make call-contract project=tgi-llm prompt="What is 2 * 3?"
You should first expect to see an initiation transaction sent to the Prompter contract:
eth_getTransactionReceipt
Transaction: 0x988b1b251f3b6ad887929a58429291891d026f11392fb9743e9a90f78c7a0801
Gas used: 190922
Block Number: 2
Block Hash: 0x51f3abf62e763f1bd1b0d245a4eab4ced4b18f58bd13645dbbf3a878f1964044
Block Time: "Wed, 6 Mar 2024 18:21:34 +0000"
eth_getTransactionByHash
eth_getTransactionReceipt
Shortly after that, you should see another transaction submitted from the Infernet Node, which is the result of your on-chain subscription and its associated job request:
eth_sendRawTransaction
_____ _____ _______ _ _ _
| __ \|_ _|__ __| | | | /\ | |
| |__) | | | | | | | | | / \ | |
| _ / | | | | | | | |/ /\ \ | |
| | \ \ _| |_ | | | |__| / ____ \| |____
|_| \_\_____| |_| \____/_/ \_\______|
subscription Id 1
interval 1
redundancy 1
node 0x70997970C51812dc3A010C7d01b50e0d17dc79C8
output:
2 * 3 = 6
Transaction: 0xdaaf559c2baba212ab218fb268906613ce3be93ba79b37f902ff28c8fe9a1e1a
Gas used: 116153
Block Number: 3
Block Hash: 0x2f26b2b487a4195ff81865b2966eab1508d10642bf525a258200eea432522e24
Block Time: "Wed, 6 Mar 2024 18:21:35 +0000"
eth_blockNumber
We can now confirm that the address of the Infernet Node (see the logged node parameter in the Anvil logs above) matches the address of the node we set up by default for our Infernet Node.
Congratulations! 🎉 You have successfully enabled a contract to have access to a TGI LLM service.