Deploying Models as RESTful APIs using Kubeflow Pipelines and KFServing: A Step-by-Step Tutorial

Deploying machine learning models as RESTful APIs allows for easy integration with other applications and services. Kubeflow Pipelines provides a platform for building and deploying machine learning pipelines, while KFServing is an open-source project that simplifies the deployment of machine learning models as serverless inference services on Kubernetes. In this tutorial, we will explore how to deploy models as RESTful APIs using Kubeflow Pipelines and KFServing.


Before we begin, make sure you have the following installed and set up:

  • Kubeflow Pipelines
  • KFServing
  • Kubernetes cluster
  • Python 3.x
  • Docker

Building the Model and Pipeline

First, we need to build the machine learning model and create a pipeline to train and deploy it. For this tutorial, we will use a simple example of training and deploying a sentiment analysis model using the IMDb movie reviews dataset. We will use TensorFlow and Keras for model training.

# Import libraries
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Load the IMDb movie reviews dataset, keeping the 10,000 most frequent words
imdb = keras.datasets.imdb
(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)

# Preprocess the data: pad (or truncate) every review to 250 tokens
train_data = keras.preprocessing.sequence.pad_sequences(train_data, value=0, padding='post', maxlen=250)
test_data = keras.preprocessing.sequence.pad_sequences(test_data, value=0, padding='post', maxlen=250)

# Build the model
model = keras.Sequential([
    layers.Embedding(10000, 16),
    # Average over the sequence so the Dense layers receive one vector per review
    layers.GlobalAveragePooling1D(),
    layers.Dense(16, activation='relu'),
    layers.Dense(1, activation='sigmoid')
])

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Train the model
model.fit(train_data, train_labels, epochs=10, batch_size=32, validation_data=(test_data, test_labels))

# Save the model
model.save('model.h5')
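
Any client that later calls the deployed model has to send input in the same form the model was trained on: integer token ids padded to 250 entries. The following is a minimal sketch of that encoding in plain Python. It assumes the default conventions of imdb.load_data (start marker 1, out-of-vocabulary marker 2, word ranks shifted by 3) and a naive whitespace tokenizer; encode_review and the tiny word_index used below are hypothetical, and in practice the index would come from imdb.get_word_index().

```python
def encode_review(text, word_index, num_words=10000, maxlen=250, index_from=3):
    """Turn raw text into a padded list of token ids (illustrative helper)."""
    tokens = [1]  # start-of-sequence marker used by imdb.load_data by default
    for word in text.lower().split():  # naive tokenizer: no punctuation handling
        rank = word_index.get(word)
        token = rank + index_from if rank is not None else 2  # 2 = unknown word
        if token >= num_words:
            token = 2  # words outside the num_words vocabulary are also unknown
        tokens.append(token)
    tokens = tokens[-maxlen:]  # pad_sequences truncates from the front by default
    return tokens + [0] * (maxlen - len(tokens))  # padding='post', value=0


# Hypothetical two-word vocabulary, for illustration only
sample_index = {"great": 1, "movie": 2}
print(encode_review("Great movie", sample_index, maxlen=5))  # [1, 4, 5, 0, 0]
```

The same helper can be reused on the client side so that requests to the API match the training-time preprocessing.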

Defining the Deployment Pipeline

Next, we need to define the deployment pipeline using Kubeflow Pipelines. This pipeline will use KFServing to deploy the trained model as a RESTful API.

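from kfp import compiler, components, dsl
from kubernetes.client import V1EnvVar

@dsl.pipeline(name='Sentiment Analysis Deployment', description='Deploy the sentiment analysis model as a RESTful API')
def sentiment_analysis_pipeline(model_dir: str, api_name: str, namespace: str):
    # Load the KFServing component definition from a local file
    kfserving_op = components.load_component_from_file('kfserving_component.yaml')
    # Define the deployment task. The keyword names below are assumed to match
    # the inputs declared in kfserving_component.yaml; adjust them to your file.
    deployment_task = kfserving_op(
        action='apply',
        model_name=api_name,
        model_uri=model_dir,
        namespace=namespace,
    )
    # Expose the API name and namespace to the component's container
    deployment_task.add_env_variable(V1EnvVar(name='MODEL_NAME', value=api_name))
    deployment_task.add_env_variable(V1EnvVar(name='NAMESPACE', value=namespace))

if __name__ == '__main__':
    compiler.Compiler().compile(sentiment_analysis_pipeline, 'sentiment_analysis_pipeline.tar.gz')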

The pipeline definition includes a deployment task that uses the KFServing component to apply the model deployment. It specifies the model directory, API name, and Kubernetes namespace for the deployment.
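
Compiling the pipeline only produces a package file; it still has to be submitted to a Kubeflow Pipelines instance to run. A minimal sketch of that step with the KFP SDK client is shown here. The helper names build_run_arguments and submit_pipeline are hypothetical, as are the host URL and argument values in the usage comment; kfp.Client and create_run_from_pipeline_package are part of the SDK, but check them against the SDK version you have installed.

```python
def build_run_arguments(model_dir, api_name, namespace):
    """Map values onto the three parameters of sentiment_analysis_pipeline."""
    return {"model_dir": model_dir, "api_name": api_name, "namespace": namespace}


def submit_pipeline(host, package_path, arguments):
    """Submit a compiled pipeline package and return the run handle."""
    import kfp  # imported here so the helper above can be used without the SDK

    client = kfp.Client(host=host)  # host: URL of the Kubeflow Pipelines API
    return client.create_run_from_pipeline_package(package_path, arguments=arguments)


# Example usage (all values are placeholders, not taken from a real cluster):
# arguments = build_run_arguments("gs://example-bucket/sentiment", "sentiment-analysis", "kubeflow")
# submit_pipeline("http://localhost:8080", "sentiment_analysis_pipeline.tar.gz", arguments)
```

The argument keys must match the parameter names of the pipeline function exactly, since Kubeflow Pipelines binds them by name.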

Deploying the Model as a RESTful API

To deploy the model as a RESTful API, follow these steps:

Build a Docker image for the model:

docker build -t <registry>/<namespace>/sentiment-analysis-model:latest .

Push the Docker image to a container registry:

docker push <registry>/<namespace>/sentiment-analysis-model:latest

Create a YAML file for the KFServing configuration, e.g., kfserving.yaml:

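apiVersion: serving.kubeflow.org/v1alpha2
kind: InferenceService
metadata:
  name: sentiment-analysis
spec:
  default:
    predictor:
      tensorflow:
        # storageUri normally points at model storage (e.g. gs://, s3://, or pvc://);
        # adjust this value to where your model is actually stored.
        storageUri: <registry>/<namespace>/sentiment-analysis-model:latest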

Deploy the model as a RESTful API using KFServing:

kubectl apply -f kfserving.yaml

Check that the service is ready and look up its URL:

kubectl get inferenceservice sentiment-analysis

# Get the service URL
kubectl get inferenceservice sentiment-analysis -o jsonpath='{.status.url}'

With the model deployed as a RESTful API, you can now make predictions by sending HTTP requests to the service URL.
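
As a rough illustration of such a request, the sketch below builds a call to the TensorFlow-style predict endpoint (/v1/models/<name>:predict with an "instances" payload) using only the Python standard library. The helper names, the example URL, and the optional Host header override (useful when calling through an ingress gateway address) are assumptions for illustration; the exact URL comes from the kubectl command above.

```python
import json
import urllib.request


def build_predict_request(service_url, model_name, instances, host_header=None):
    """Build a POST request for the V1 predict endpoint of an inference service."""
    url = f"{service_url.rstrip('/')}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if host_header:
        headers["Host"] = host_header  # only needed when routing via a gateway IP
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def predict(request, timeout=10):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example usage (placeholder URL; each instance is one padded review of 250 token ids):
# request = build_predict_request("http://sentiment-analysis.example.com",
#                                 "sentiment-analysis", [[1, 4, 5] + [0] * 247])
# print(predict(request))
```

Each entry in instances must be a review already encoded and padded the same way as the training data, because the served model expects integer sequences of length 250.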

In this tutorial, we have explored how to deploy machine learning models as RESTful APIs using Kubeflow Pipelines and KFServing. We built a sentiment analysis model, defined a deployment pipeline using Kubeflow Pipelines, and used KFServing to deploy the model as a RESTful API on a Kubernetes cluster. This approach allows for easy integration of machine learning models into applications and services, enabling real-time predictions and inference.

By combining Kubeflow Pipelines and KFServing, you can streamline the process of training and deploying machine learning models as scalable and reliable RESTful APIs on Kubernetes. This enables efficient model management, deployment, and serving in production environments.