Author(s): Vaibhawkhemka

Originally published on Towards AI.

GitHub Repo: https://github.com/vaibhawkhemka/ML-Umbrella/tree/main/MLops/Model_Deployment/Bert_Kubernetes_deployment

Motivation:

Model development is useless if you never deploy the model to production, and production brings its own problems of scalability and portability. In this post, I deploy a basic BERT model from the Hugging Face transformers library on Kubernetes with the help of Docker, which gives a feel for how to deploy and manage pods in production.

Model Serving and Deployment:

ML Pipeline / Workflow: build a model server (using FastAPI and uvicorn) for the BERT uncased model → containerize the model and inference scripts into a Docker image → create a Kubernetes deployment for these model servers (for scalability) → test the deployment.

Components:

Model server:

I used the BERT uncased model from Hugging Face to predict the word hidden behind the [MASK] token. Inference is served with transformers-cli, which uses FastAPI and uvicorn under the hood to expose the model endpoints. Once the server is up, uvicorn streams the request logs to the console, and the endpoints can be tested interactively through the FastAPI docs at http://localhost:8888/docs/. (A minimal server sketch, for reference, appears at the end of this post.)

Querying the masked sentence "today is a [MASK] day" returns:

{
  "output": [
    { "score": 0.21721847355365753, "token": 2204, "token_str": "good", "sequence": "today is a good day" },
    { "score": 0.16623663902282715, "token": 2047, "token_str": "new", "sequence": "today is a new day" },
    { "score": 0.07342924177646637, "token": 2307, "token_str": "great", "sequence": "today is a great day" },
    { "score": 0.0656224861741066, "token": 2502, "token_str": "big", "sequence": "today is a big day" },
    { "score": 0.03518620505928993, "token": 3376, "token_str": "beautiful", "sequence": "today is a beautiful day" }
  ]
}

Containerization:

I created a Docker image from the Hugging Face GPU base image, tested it in a container, and pushed it to Docker Hub. (Example build-and-test commands appear at the end of this post.) You can pull the image directly:

docker pull vaibhaw06/bert-kubernetes:latest

K8s deployment:

I used minikube and kubectl to create a single-pod deployment that serves the model, configured through a Deployment and a NodePort Service.

deployment.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: bert-deployment
  labels:
    app: bertapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: bertapp
  template:
    metadata:
      labels:
        app: bertapp
    spec:
      containers:
        - name: bertapp
          image: vaibhaw06/bert-kubernetes
          ports:
            - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: bert-service
spec:
  type: NodePort
  selector:
    app: bertapp
  ports:
    - protocol: TCP
      port: 8080
      targetPort: 8080
      nodePort: 30100

Setting up minikube and running the pods with kubectl and deployment.yaml:

minikube start
kubectl apply -f deployment.yaml

Final Testing:

kubectl get all

It took around 15 minutes to pull the image and create the container pods.

minikube image list
kubectl get svc
minikube service bert-service

The last command, minikube service bert-service, opens the service's NodePort endpoint in the browser, where you can verify the deployment. (A curl smoke test against this endpoint appears at the end of this post.)

Find the GitHub link: https://github.com/vaibhawkhemka/ML-Umbrella/tree/main/MLops/Model_Deployment/Bert_Kubernetes_deployment

If you have any questions, ping me on my LinkedIn: https://www.linkedin.com/in/vaibhaw-khemka-a92156176/

Follow ML Umbrella for more such detailed, actionable projects.

Future Extension: scaling with pod replicas and a load balancer, and self-healing.
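For reference, here is a minimal sketch of what the model server amounts to. The repo serves the model through transformers-cli, so the file name (server.py), the /forward route, and the request schema below are illustrative assumptions rather than the exact code from the repo:

# server.py -- minimal fill-mask server sketch (illustrative, not the repo's code)
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()

# Fill-mask pipeline with BERT uncased; downloads the weights on first run
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

class MaskRequest(BaseModel):
    text: str  # e.g. "today is a [MASK] day"

@app.post("/forward")  # assumed route name
def forward(req: MaskRequest):
    # Each prediction is a dict with score, token, token_str and sequence,
    # matching the JSON response shown earlier
    return {"output": fill_mask(req.text)}

Serve it with uvicorn, just as the article does:

uvicorn server:app --host 0.0.0.0 --port 8888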
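The containerization step follows the standard Docker workflow. Assuming a Dockerfile in the repo that wraps the server above, the build-test-push loop looks roughly like this (the image name is the one from the article; the port mapping follows the Service config, which targets 8080):

docker build -t vaibhaw06/bert-kubernetes:latest .
docker run -p 8080:8080 vaibhaw06/bert-kubernetes:latest
docker push vaibhaw06/bert-kubernetes:latest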
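And a quick smoke test against the running Service: minikube service bert-service --url prints the NodePort URL without opening a browser, and the /forward route and JSON body are the same illustrative assumptions as in the server sketch above:

URL=$(minikube service bert-service --url)
curl -X POST "$URL/forward" \
  -H "Content-Type: application/json" \
  -d '{"text": "today is a [MASK] day"}'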