Deploy a Hugging Face model with FastAPI

Goal

By the end of this tutorial you will have:

Wrapped a Hugging Face model in a FastAPI server
Received predictions via HTTP POST
Run the server in the background and accessed it from outside

Step 1: Prepare the environment

pip install fastapi uvicorn transformers torch accelerate
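Before writing the server, it can help to confirm that the packages actually landed in the interpreter you plan to run. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical and not part of any of the installed libraries.

```python
import importlib.util

# Import names of the packages installed in Step 1.
REQUIRED = ["fastapi", "uvicorn", "transformers", "torch", "accelerate"]


def missing_packages(names):
    """Return the names that cannot be found by the current interpreter."""
    # find_spec() locates a top-level package without importing it,
    # so this stays fast even for heavy packages such as torch.
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
print("Missing packages:", ", ".join(missing) if missing else "none")
```

If anything is reported missing, the usual cause is that `pip` and `python3` point at different environments.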

Step 2: API server code

# server.py
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline
import torch

app = FastAPI()

# Initialize the model once at startup
device = 0 if torch.cuda.is_available() else -1
classifier = pipeline(
    "text-classification",
    model="snunlp/KR-FinBert-SC",
    device=device
)

class TextRequest(BaseModel):
    text: str

class PredictionResponse(BaseModel):
    label: str
    score: float

@app.post("/predict", response_model=PredictionResponse)
def predict(req: TextRequest):
    result = classifier(req.text)[0]
    return PredictionResponse(label=result["label"], score=result["score"])

@app.get("/health")
def health():
    return {"status": "ok"}

Step 3: Run the server

# Foreground (for testing)
uvicorn server:app --host 0.0.0.0 --port 8000

# Background
nohup uvicorn server:app --host 0.0.0.0 --port 8000 >> server.log 2>&1 &
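Because the model is loaded when `server.py` is imported, the server may need some time after launch before it accepts requests. One way to wait for it is to poll the `/health` endpoint from Step 2. This is a sketch under assumptions: the function name `wait_until_ready` and the timeout values are illustrative choices, not something the tutorial prescribes.

```python
import time
import urllib.error
import urllib.request


def wait_until_ready(url, timeout_s=120.0, interval_s=2.0):
    """Poll `url` until it answers HTTP 200 or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # Server is not accepting connections yet (or answered with an
            # error status); fall through and retry after a short pause.
            pass
        time.sleep(interval_s)
    return False
```

On the server itself this would be called as `wait_until_ready("http://127.0.0.1:8000/health")`; a `False` result suggests checking `server.log` for a startup error.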

Step 4: Firewall and test

On the virtual network's detail page, open Firewall rules and add a rule that allows inbound TCP traffic on port 8000 (see Firewall for details). Changes take effect within one minute.
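When a request to port 8000 fails, a raw TCP connection attempt can hint at whether the firewall rule or the server is the problem. The sketch below is a heuristic, not a definitive diagnosis: a timeout usually means packets are being dropped (often a missing firewall rule), while a refusal usually means the host is reachable but nothing is listening. The function name `check_port` is hypothetical.

```python
import socket


def check_port(host, port, timeout_s=3.0):
    """Classify a TCP connection attempt as 'open', 'refused', or 'timeout'."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return "open"
    except ConnectionRefusedError:
        # Host answered, but no process is listening on this port.
        return "refused"
    except socket.timeout:
        # No answer at all; packets were probably dropped on the way.
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. DNS failure or unreachable network.
        return f"error: {exc}"
```

For example, `check_port("<PUBLIC_IP>", 8000)` returning `"timeout"` right after adding the rule may simply mean the change has not propagated yet.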

# Health check
curl http://<PUBLIC_IP>:8000/health

# Inference request
curl -X POST http://<PUBLIC_IP>:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"text": "The stock price jumped sharply today."}'

Python client:

import requests

response = requests.post(
    "http://<PUBLIC_IP>:8000/predict",
    json={"text": "The stock price jumped sharply today."}
)
print(response.json())
# Example output (label and score depend on the model and input):
# {'label': 'positive', 'score': 0.98}
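For anything beyond a quick test, the client should set a timeout and handle failed requests. The sketch below uses only the standard library so it runs without extra installs; the same pattern applies to `requests` through its `timeout=` argument and `raise_for_status()`. The function name `predict` and the 30-second default are illustrative assumptions.

```python
import json
import urllib.error
import urllib.request


def predict(base_url, text, timeout_s=30.0):
    """POST `text` to {base_url}/predict and return the decoded JSON body."""
    payload = json.dumps({"text": text}).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/predict",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout_s) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as exc:
        # The server answered with a 4xx/5xx status,
        # e.g. 422 when the JSON body fails validation.
        detail = exc.read().decode("utf-8", errors="replace")
        raise RuntimeError(f"Server returned HTTP {exc.code}: {detail}") from exc
    except urllib.error.URLError as exc:
        # No HTTP response at all: wrong address, closed port,
        # or a connection timeout.
        raise RuntimeError(f"Could not reach {base_url}: {exc.reason}") from exc
```

A call such as `predict("http://<PUBLIC_IP>:8000", "The stock price jumped sharply today.")` returns a dict with `label` and `score` keys, matching `PredictionResponse`.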

Auto-start (systemd)

To start the server automatically at boot and restart it whenever it exits:

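# Assumes server.py is in your home directory and uvicorn is installed for /usr/bin/python3;
# adjust WorkingDirectory and ExecStart if you use a virtual environment or another path.
USER_NAME=$(whoami)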
sudo tee /etc/systemd/system/ml-api.service >/dev/null <<EOF
[Unit]
Description=ML API Server
After=network.target

[Service]
User=${USER_NAME}
WorkingDirectory=/home/${USER_NAME}
ExecStart=/usr/bin/python3 -m uvicorn server:app --host 0.0.0.0 --port 8000
Restart=always

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl enable --now ml-api

Next steps

Hugging Face model test: verify behavior across different models
Firewall configuration: open the API port to external traffic
