
Technical Stuff

AutoML: Automated Machine Learning in Python

AutoML (Automated Machine Learning) applies machine learning and search techniques to automate much of the machine learning workflow: data preparation, feature engineering, algorithm selection, hyperparameter tuning, and model evaluation. This lets non-experts build and deploy machine learning models with far less effort and technical knowledge.

Automated Machine Learning in Python

Python is a popular language for machine learning, and several libraries support AutoML. In this tutorial, we will use the H2O library to perform AutoML in Python.

Install Library

We will start by installing the H2O library.

pip install h2o

Import Libraries

Next, we will import the necessary libraries, including H2O for AutoML, and NumPy and Pandas for data processing.

import numpy as np
import pandas as pd
import h2o
from h2o.automl import H2OAutoML

# Start (or connect to) a local H2O cluster; required before creating H2OFrames
h2o.init()

Load Data

Next, we will load the data used to train the AutoML model.

# Load data
url = ""
data = pd.read_csv(url, header=None, names=['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class'])

# Convert data to H2O format
h2o_data = h2o.H2OFrame(data)

In this example, we load the Iris dataset from a URL and convert it to the H2O format.

Train AutoML Model

Next, we will train an AutoML model on the data.

# Train AutoML model
aml = H2OAutoML(max_models=10, seed=1)
aml.train(x=['sepal_length', 'sepal_width', 'petal_length', 'petal_width'], y='class', training_frame=h2o_data)

In this example, we train an AutoML model with a maximum of 10 models and a random seed of 1.

View Model Leaderboard

Next, we can view the leaderboard of the trained models.

# View model leaderboard
lb = aml.leaderboard
print(lb)

In this example, we print the leaderboard of the trained models.
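To make the idea of a leaderboard concrete, here is a minimal hand-rolled sketch that ranks a few candidate models by cross-validated accuracy. It uses scikit-learn rather than H2O, and the candidate list, the build_leaderboard helper, and the choice of accuracy as the metric are all assumptions made for illustration. It is not how H2O implements AutoML internally.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier


def build_leaderboard(X, y, candidates, cv=5):
    """Return (name, mean CV accuracy) pairs sorted from best to worst."""
    rows = []
    for name, estimator in candidates.items():
        # Evaluate every candidate with the same cross-validation protocol
        scores = cross_val_score(estimator, X, y, cv=cv, scoring="accuracy")
        rows.append((name, float(scores.mean())))
    return sorted(rows, key=lambda row: row[1], reverse=True)


X, y = load_iris(return_X_y=True)

# Hypothetical candidate pool; a real AutoML tool searches a much larger space
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=1),
    "k_nearest_neighbors": KNeighborsClassifier(n_neighbors=5),
}

leaderboard = build_leaderboard(X, y, candidates)
for name, score in leaderboard:
    print(f"{name}: {score:.3f}")
```

The first row of the result plays the role of the leader model: it is the candidate that scored best under the shared evaluation protocol.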

Test AutoML Model

Finally, we can use the trained AutoML model to make predictions on new data.

# Test AutoML model
test_data = pd.DataFrame(np.array([[5.1, 3.5, 1.4, 0.2], [7.7, 3.0, 6.1, 2.3]]), columns=['sepal_length', 'sepal_width', 'petal_length', 'petal_width'])
h2o_test_data = h2o.H2OFrame(test_data)
preds = aml.predict(h2o_test_data)

In this example, we use the trained AutoML model to predict the class of two new data points.

In this tutorial, we covered the basics of AutoML and how to use it in Python to automate the entire machine learning process. AutoML enables non-experts to build and deploy machine learning models with minimal effort and technical knowledge. I hope you found this tutorial useful in understanding AutoML in Python.

Defending Your Web Application: Understanding and Preventing SQL Injection Attacks

SQL injection attacks are one of the most common types of web application attacks that can compromise the security of your website or application. These attacks can be used to gain unauthorized access to sensitive data, modify data, or execute malicious code. In this tutorial, we will explain what SQL injection attacks are, how they work, and how you can prevent them.

What is SQL Injection?

SQL injection is a type of attack where an attacker exploits a vulnerability in a web application’s input validation and uses it to inject malicious SQL code into the application’s database. This malicious SQL code can be used to manipulate or extract data from the database, or even execute arbitrary code on the server.

How does SQL Injection work?

SQL injection attacks work by taking advantage of input validation vulnerabilities in web applications. In most web applications, user input is used to build SQL queries that are executed on the server-side. If this input is not properly validated, an attacker can manipulate the input to include their own SQL code.

For example, consider a login form that asks the user for their username and password. If the application uses the following SQL query to validate the user’s credentials:

An attacker could use a SQL injection attack by entering the following as the password:

This would result in the following SQL query being executed on the server:

The -- at the end of the password input is used to comment out the rest of the query, so the attacker can avoid syntax errors. In this case, the attacker has successfully bypassed the login form and gained access to the application.
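The walkthrough above can be sketched in a few lines of Python. The users table, its column names, and the exact payload are assumptions chosen for illustration; the point is only to show how concatenated input changes the text of the query that reaches the database.

```python
def build_login_query(username: str, password: str) -> str:
    """Deliberately vulnerable: user input is pasted straight into the SQL text."""
    return (
        "SELECT * FROM users "
        f"WHERE username = '{username}' AND password = '{password}'"
    )


# A normal login attempt
normal_query = build_login_query("alice", "s3cret")

# Hypothetical attack payload typed into the password field
injected_query = build_login_query("alice", "' OR 1=1 --")

print(normal_query)
print(injected_query)
```

In the second query the quote in the payload closes the password literal early, OR 1=1 makes the condition true for every row, and the trailing -- comments out the leftover quote.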

Preventing SQL Injection Attacks

There are several ways to prevent SQL injection attacks. Here are some best practices:

Use Parameterized Queries: Parameterized queries are a type of prepared statement that allows you to separate the SQL code from the user input. This means that the input is treated as a parameter, and not as part of the SQL query. This approach can help prevent SQL injection attacks by ensuring that the user input is not executed as SQL code. Here’s an example of a parameterized query in Python using the sqlite3 module:
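A minimal sketch of a parameterized query with sqlite3 might look like the following. The in-memory database, the users table, and the find_user helper are hypothetical, and the plaintext password column is used only to keep the example short.

```python
import sqlite3

# In-memory database with a hypothetical users table (illustration only)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES (?, ?)", ("alice", "s3cret"))


def find_user(conn, username, password):
    """Look up a user with ? placeholders; values are bound, never spliced into the SQL."""
    cursor = conn.execute(
        "SELECT username FROM users WHERE username = ? AND password = ?",
        (username, password),
    )
    return cursor.fetchone()


valid = find_user(conn, "alice", "s3cret")
attack = find_user(conn, "alice", "' OR 1=1 --")
print(valid, attack)
```

Because the payload is bound as a plain string value, it is compared against the password column literally and matches nothing.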

Validate User Input: User input should always be validated to ensure that it matches the expected format. Regular expressions can be used to validate input for specific formats (e.g. email addresses or phone numbers). Prefer rejecting input that fails validation over trying to strip special characters, since character filtering is easy to get wrong. Treat validation as an extra layer of defense alongside parameterized queries, not as a replacement for them.

Use Stored Procedures: Stored procedures are precompiled SQL statements that can be called from within the application. This approach can help prevent SQL injection attacks by ensuring that the user input is not executed as SQL code. However, it’s important to ensure that the stored procedures themselves are secure and cannot be manipulated by an attacker.

Use an ORM: Object-relational mapping (ORM) frameworks like SQLAlchemy can help prevent SQL injection attacks by abstracting the SQL code away from the application code. The ORM handles the construction and execution of SQL queries based on the application’s object model, which can help prevent SQL injection attacks.

SQL injection attacks can have serious consequences for web applications and their users. By following the best practices outlined in this tutorial, you can help prevent SQL injection attacks and ensure the security of your application’s database. Remember to always validate user input, use parameterized queries, and consider using an ORM or stored procedures to help prevent SQL injection attacks.

Python Code Example

Here’s a Python code example that demonstrates a simple SQL injection attack and how to prevent it using parameterized queries:
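The following is a minimal sketch of such an example, under a few assumptions: it uses an in-memory SQLite database with a hypothetical users table, and it replaces the interactive prompts with fixed values so that it runs unattended. The payload is a common variant that does not need a trailing comment.

```python
import sqlite3

# Hypothetical schema and data, created in memory for the demonstration
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('admin', 'correct-horse')")

# Stand-ins for the values a user would type into the login form
username = "admin"
malicious_password = "' OR '1'='1"

# Vulnerable: the input is concatenated into the SQL string
vulnerable_sql = (
    "SELECT username FROM users WHERE username = '" + username
    + "' AND password = '" + malicious_password + "'"
)
vulnerable_rows = conn.execute(vulnerable_sql).fetchall()

# Safe: the same input is passed as a tuple of parameters
safe_rows = conn.execute(
    "SELECT username FROM users WHERE username = ? AND password = ?",
    (username, malicious_password),
).fetchall()

print("vulnerable query returned:", vulnerable_rows)
print("parameterized query returned:", safe_rows)
```

The concatenated query returns the admin row even though the password is wrong, while the parameterized query returns nothing.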

In this example, we first prompt the user for their username and password. We then create a vulnerable SQL query that concatenates the user input into the SQL string. We also create a malicious input that will allow the attacker to bypass the login form. We execute both the vulnerable and malicious queries and print the results.

Finally, we prevent SQL injection by using a parameterized query. We pass the user input as parameters to the query using a tuple, so the database driver binds the values separately from the SQL text. The input is always treated as data and never as SQL code, which prevents the attacker from injecting malicious SQL.

By following best practices like parameterized queries and input validation, you can prevent SQL injection attacks and protect your web application’s database.

Bayesian Machine Learning: Probabilistic Models and Inference in Python

Bayesian Machine Learning is a branch of machine learning that incorporates probability theory and Bayesian inference in its models. Bayesian Machine Learning enables the estimation of model parameters and prediction uncertainty through probabilistic models and inference techniques. Bayesian Machine Learning is useful in scenarios where uncertainty is high and where the data is limited or noisy.

Probabilistic Models and Inference in Python

Python is a popular language for machine learning, and several libraries support Bayesian Machine Learning. In this tutorial, we will use the PyMC3 library to build and fit probabilistic models and perform Bayesian inference.

Import Libraries

We will start by importing the necessary libraries, including NumPy for numerical computations, Matplotlib for visualizations, and PyMC3 for probabilistic models and inference.

Generate Data

Next, we will generate some random data to fit our probabilistic model.

In this example, we generate 50 data points with a linear relationship between x and y.
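A minimal sketch of this step using only NumPy is shown below. The intercept, slope, noise level, and seed are illustrative assumptions and are not values taken from the tutorial.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed so the sketch is reproducible

n_points = 50
# Illustrative "true" parameters (assumed, not from the tutorial)
true_alpha, true_beta, true_sigma = 1.0, 2.0, 0.5

# Linear relationship between x and y plus Gaussian noise
x = np.linspace(0, 1, n_points)
y = true_alpha + true_beta * x + rng.normal(0.0, true_sigma, size=n_points)

print(x.shape, y.shape)
```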

Build Probabilistic Model

Next, we will build a probabilistic model to fit the data.

In this example, we define the priors for the model parameters (alpha, beta, and sigma) and the likelihood for the data.

Fit Probabilistic Model

Next, we will fit the probabilistic model to the data using Bayesian inference.

In this example, we use the sample function from PyMC3 to sample from the posterior distribution of the model parameters. We then plot the posterior distributions of the parameters.
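To show what sampling from the posterior means without depending on PyMC3, here is a rough sketch that uses a hand-written random-walk Metropolis sampler in NumPy. This is a deliberately simpler technique than the samplers PyMC3 uses, and the priors (Normal(0, 10) on alpha and beta, HalfNormal(1) on sigma), the step size, and the data are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 50 points, y = alpha + beta * x + noise (illustrative values)
x = np.linspace(0, 1, 50)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, size=50)


def log_posterior(theta):
    """Unnormalized log posterior over (alpha, beta, log_sigma)."""
    alpha, beta, log_sigma = theta
    sigma = np.exp(log_sigma)
    # Assumed priors: alpha, beta ~ Normal(0, 10); sigma ~ HalfNormal(1)
    log_prior = -0.5 * (alpha / 10.0) ** 2 - 0.5 * (beta / 10.0) ** 2
    # HalfNormal log density plus the Jacobian term for sampling log(sigma)
    log_prior += -0.5 * sigma ** 2 + log_sigma
    # Gaussian likelihood of the observations
    resid = y - (alpha + beta * x)
    log_lik = -len(y) * log_sigma - 0.5 * np.sum(resid ** 2) / sigma ** 2
    return log_prior + log_lik


def metropolis(n_samples=20000, step=0.15):
    """Random-walk Metropolis: propose a nearby point, accept by posterior ratio."""
    theta = np.zeros(3)
    current = log_posterior(theta)
    samples = np.empty((n_samples, 3))
    for i in range(n_samples):
        proposal = theta + rng.normal(0.0, step, size=3)
        candidate = log_posterior(proposal)
        # Accept with probability min(1, posterior ratio)
        if np.log(rng.uniform()) < candidate - current:
            theta, current = proposal, candidate
        samples[i] = theta
    return samples


samples = metropolis()[2000:]  # discard the first 2000 draws as burn-in
alpha_mean = samples[:, 0].mean()
beta_mean = samples[:, 1].mean()
sigma_mean = np.exp(samples[:, 2]).mean()
print(alpha_mean, beta_mean, sigma_mean)
```

The retained rows of samples approximate draws from the joint posterior, so histograms of each column correspond to the posterior plots described above.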

Make Predictions

Finally, we can use the fitted probabilistic model to make predictions on new data.

In this example, we use the sample_posterior_predictive function from PyMC3 to predict y values for new x values. We then plot the predictions and the associated uncertainty.

In this tutorial, we covered the basics of Bayesian Machine Learning and how to use it in Python to build and fit probabilistic models and perform Bayesian inference. Bayesian Machine Learning enables the estimation of model parameters and prediction uncertainty through probabilistic models and inference techniques. It is useful in scenarios where uncertainty is high and where the data is limited or noisy. I hope you found this tutorial useful in understanding Bayesian Machine Learning in Python.


The code examples provided in this tutorial are for illustrative purposes only and are not intended for production use. The code should be adapted to specific use cases and may require additional validation and testing.

Active Learning: Learning with Limited Labeled Data in Python (Scikit-learn, Active Learning Lib)

Active Learning is a machine learning approach that enables the selection of the most informative data points to be labeled by an oracle, thereby reducing the number of labeled data points required to train a model. Active Learning is useful in scenarios where labeled data is limited or expensive to acquire. Active Learning can help improve the accuracy of machine learning models with fewer labeled data points.

Learning with Limited Labeled Data in Python

Python is a popular language for machine learning, and several libraries support Active Learning. In this tutorial, we will use the Scikit-learn library to train a model and the modAL library to select informative data points to be labeled.

Import Libraries

We will start by importing the necessary libraries, including Scikit-learn for training the model, NumPy for numerical computations, and modAL for selecting informative data points to be labeled.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from modAL.uncertainty import uncertainty_sampling

Generate Data

Next, we will generate some random data for training and testing the model.

# Generate random data for training and testing
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, n_classes=2, random_state=1)

In this example, we generate 1000 data points with 10 features, 5 of which are informative, for training and testing.

Split Data

Next, we will split the data into a training set and a test set.

# Split data into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

In this example, we split the data into a training set and a test set, with 20% of the data in the test set.

Train Initial Model

Next, we will train an initial logistic regression model on the labeled data.

# Train initial model
model = LogisticRegression().fit(X_train[:10], y_train[:10])

In this example, we train an initial model on the first 10 labeled data points.

Active Learning

Next, we will use Active Learning to select informative data points to be labeled by an oracle.

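# Active Learning
# Keep the first 10 points as the labeled set and treat the rest as the unlabeled pool
X_labeled, y_labeled = X_train[:10], y_train[:10]
X_pool = X_train[10:]
for i in range(10):
    # Select the most informative data point to be labeled
    # (the return format of uncertainty_sampling varies across modAL versions; the indices come first)
    result = uncertainty_sampling(model, X_pool)
    query_idx = np.atleast_1d(result[0] if isinstance(result, tuple) else result)
    query_inst = X_pool[query_idx]
    # Label data point (the user acts as the oracle)
    y_new = np.array([int(input(f"Enter label for instance {j}: ")) for j in query_idx])
    # Move the labeled data point from the pool to the labeled set
    X_labeled = np.concatenate((X_labeled, query_inst))
    y_labeled = np.concatenate((y_labeled, y_new))
    X_pool = np.delete(X_pool, query_idx, axis=0)
    # Retrain model
    model.fit(X_labeled, y_labeled)

In this example, we use the uncertainty_sampling function from modAL to select the most informative data point from the unlabeled pool. We then ask the user, acting as the oracle, to label the data point, move it from the pool into the labeled set, and retrain the model on the enlarged labeled set.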
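For readers who want to see what uncertainty sampling computes, here is a minimal sketch that uses only NumPy and Scikit-learn. The least_confident_index helper is hypothetical, the held-back true labels stand in for the human oracle, and the seed set of five points per class is an assumption made so that the initial model always sees both classes.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression


def least_confident_index(model, X_pool):
    """Return the index of the pool instance whose top class probability is lowest."""
    probabilities = model.predict_proba(X_pool)
    uncertainty = 1.0 - probabilities.max(axis=1)
    return int(np.argmax(uncertainty))


X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           n_classes=2, random_state=1)

# Seed the labeled set with 5 points per class; everything else is the pool
initial = np.concatenate([np.flatnonzero(y == 0)[:5], np.flatnonzero(y == 1)[:5]])
pool_mask = np.ones(len(y), dtype=bool)
pool_mask[initial] = False
X_labeled, y_labeled = X[initial], y[initial]
X_pool, y_pool = X[pool_mask], y[pool_mask]  # y_pool plays the role of the oracle

model = LogisticRegression(max_iter=1000).fit(X_labeled, y_labeled)
for _ in range(10):
    idx = least_confident_index(model, X_pool)
    # "Ask the oracle" for the label and move the point into the labeled set
    X_labeled = np.vstack([X_labeled, X_pool[idx]])
    y_labeled = np.append(y_labeled, y_pool[idx])
    X_pool = np.delete(X_pool, idx, axis=0)
    y_pool = np.delete(y_pool, idx)
    model.fit(X_labeled, y_labeled)

accuracy = model.score(X_pool, y_pool)
print(f"Accuracy on remaining pool: {accuracy:.3f}")
```

The least-confident measure used here is only one way to score uncertainty; margin-based and entropy-based scores follow the same pattern with a different formula.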

Test Model

Finally, we will test the model on the test data.

# Test model
score = model.score(X_test, y_test)
print(f"Model accuracy: {score}")

In this example, we test the model on the test data and print the accuracy.

In this tutorial, we covered the basics of Active Learning and how to use it in Python to train machine learning models with limited labeled data. Active Learning is a useful approach in scenarios where labeled data is limited or expensive to acquire, and can help improve the accuracy of machine learning models with fewer labeled data points. I hope you found this tutorial useful in understanding Active Learning in Python.

Transfer Learning: Leveraging Pre-Trained Models for New Tasks in Python (+Keras).

Transfer Learning is a technique in Deep Learning that enables a pre-trained model to be reused on a new task that is similar to the original task. Transfer Learning can save time and computational resources by leveraging the knowledge gained from the original task. The pre-trained model can be fine-tuned or used as a feature extractor for the new task.

Using Pre-Trained Models in Keras

Keras is a popular Deep Learning library that supports several pre-trained models that can be used for Transfer Learning. These pre-trained models are trained on large datasets and can recognize patterns that are useful for many different tasks.

Import Libraries

We will start by importing the necessary libraries, including Keras for loading the pre-trained model and NumPy for numerical computations.

import numpy as np
from keras.applications import VGG16
from keras.layers import Dense, Flatten
from keras.models import Model

Load Pre-Trained Model

Next, we will load a pre-trained model, VGG16, using Keras.

# Load pre-trained model
model = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

In this example, we load the VGG16 model pre-trained on the ImageNet dataset, excluding the fully connected classification layers at the top (include_top=False) and specifying the input shape.

Freeze Layers

Next, we will freeze the layers in the pre-trained model to prevent them from being updated during training.

# Freeze layers
for layer in model.layers:
    layer.trainable = False

Add New Layers

Next, we will add new layers on top of the pre-trained model for the new task. We will add a Flatten layer to convert the output of the pre-trained model into a 1-dimensional array, a Dense layer with 256 neurons, and a final Dense layer with the number of output classes.

# Add new layers
num_classes = 10  # placeholder: set this to the number of classes in your new task
x = Flatten()(model.output)
x = Dense(256, activation='relu')(x)
predictions = Dense(num_classes, activation='softmax')(x)

In this example, the final Dense layer uses a softmax activation so that it outputs one probability per class. The value of num_classes above is a placeholder and must match your dataset.

Compile Model

Next, we will compile the new model and specify the loss function, optimizer, and evaluation metric.

# Compile model
model = Model(inputs=model.input, outputs=predictions)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

In this example, we use categorical cross-entropy loss, Adam optimizer, and accuracy as the evaluation metric.

Train Model

Next, we will train the new model on the new task.

# Train model
# X_train and y_train are your preprocessed images and one-hot encoded labels
model.fit(X_train, y_train, epochs=10, batch_size=32)

In this example, we train the model for 10 epochs with a batch size of 32.

In this tutorial, we covered the basics of Transfer Learning and how to use pre-trained models in Keras. We also showed how to freeze layers, add new layers, compile the new model, and train the new model on a new task. Transfer Learning is a powerful technique that can save time and computational resources and is useful for many different applications.

I hope you found this tutorial useful in understanding Transfer Learning in Python. Please check out my book: A.I. & Machine Learning — When you don’t know sh#t: A Beginner’s Guide to Understanding Artificial Intelligence and Machine Learning (

Unsupervised Learning: Clustering and Dimensionality Reduction in Python

Unsupervised learning is a type of machine learning where the model is not provided with labeled data. The model learns the underlying structure and patterns in the data without any specific guidance on what to look for. Clustering and Dimensionality Reduction are two important techniques in unsupervised learning.


Clustering

Clustering is a technique where the model tries to identify groups in the data based on their similarities. The objective is to group similar data points together and separate dissimilar data points. Clustering algorithms can be used for a variety of applications such as customer segmentation, anomaly detection, and image segmentation.

Dimensionality Reduction

Dimensionality reduction is a technique where the model tries to reduce the number of features in the data while retaining as much information as possible. This is useful when dealing with high-dimensional data where it’s difficult to visualize and analyze the data. Dimensionality reduction algorithms can be used for a variety of applications such as data compression, feature extraction, and visualization.

Clustering Algorithms

There are several clustering algorithms in machine learning, each with its own strengths and weaknesses. In this tutorial, we will cover two popular clustering algorithms: K-Means Clustering and Hierarchical Clustering.

K-Means Clustering

K-Means Clustering is a simple and efficient clustering algorithm. The algorithm partitions the data into K clusters based on their similarity. The number of clusters K is specified by the user. The algorithm starts by randomly selecting K data points as the initial centroids. The data points are then assigned to the nearest centroid based on their distance. The centroid is then updated based on the mean of the data points in the cluster. This process is repeated until convergence.
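The steps described above can be sketched from scratch in NumPy. This is a bare-bones illustration, not Scikit-Learn's implementation, and the kmeans function name, the seed, and the two synthetic blobs are assumptions made for the example.

```python
import numpy as np


def kmeans(X, k, n_iters=100, seed=0):
    """Plain K-Means: alternate assignment and centroid update until convergence."""
    rng = np.random.default_rng(seed)
    # Initialise the centroids as k distinct, randomly chosen data points
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: each point goes to its nearest centroid
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: each centroid becomes the mean of its assigned points
        # (an empty cluster keeps its previous centroid)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return centroids, labels


# Two well-separated synthetic blobs (illustrative data)
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),
               rng.normal(5.0, 0.5, size=(50, 2))])
centroids, labels = kmeans(X, k=2)
print(centroids)
```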

Let’s see how to implement K-Means Clustering in Python using Scikit-Learn.

from sklearn.cluster import KMeans
import numpy as np

# Generate random data
X = np.random.rand(100, 2)
# Initialize KMeans model with 2 clusters
kmeans = KMeans(n_clusters=2)
# Fit the model to the data
kmeans.fit(X)
# Predict the clusters for the data
y_pred = kmeans.predict(X)
# Print the centroids of the clusters
print(kmeans.cluster_centers_)

In this example, we generate random data with 2 features and 100 data points. We then initialize the KMeans model with 2 clusters and fit the model to the data. We then predict the clusters for the data and print the centroids of the clusters.

Hierarchical Clustering

Hierarchical Clustering is a clustering algorithm that builds a hierarchy of clusters. The algorithm starts by treating each data point as a separate cluster. The algorithm then iteratively merges the closest clusters based on their distance until all the data points belong to a single cluster.

There are two types of hierarchical clustering algorithms: Agglomerative and Divisive. Agglomerative clustering starts with each data point as a separate cluster and iteratively merges the closest clusters. Divisive clustering starts with all data points in a single cluster and iteratively splits the cluster into smaller clusters.
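The agglomerative merge loop can be sketched directly in NumPy. This naive version uses single linkage (the distance between two clusters is the distance between their closest members), which is an assumption made for simplicity and differs from Scikit-Learn's default Ward linkage. It runs in cubic time and is meant only to illustrate the idea.

```python
import numpy as np


def agglomerative(X, n_clusters):
    """Naive single-linkage agglomerative clustering, for illustration only."""
    # Start with every data point in its own cluster
    clusters = [[i] for i in range(len(X))]
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    while len(clusters) > n_clusters:
        best = (np.inf, 0, 1)
        # Find the pair of clusters whose closest members are nearest
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist[np.ix_(clusters[a], clusters[b])].min()
                if d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        # Merge the closest pair into one cluster
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = np.empty(len(X), dtype=int)
    for label, members in enumerate(clusters):
        labels[members] = label
    return labels


# Two small, well-separated synthetic blobs (illustrative data)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, size=(20, 2)),
               rng.normal(5.0, 0.3, size=(20, 2))])
labels = agglomerative(X, n_clusters=2)
print(labels)
```

Stopping the loop at n_clusters corresponds to cutting the hierarchy at a chosen level; letting it run to a single cluster would produce the full merge tree.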

Let’s see how to implement Agglomerative Hierarchical Clustering in Python using Scikit-Learn.

from sklearn.cluster import AgglomerativeClustering
import numpy as np

# Generate random data
X = np.random.rand(100, 2)
# Initialize AgglomerativeClustering model with 2 clusters
agg_clustering = AgglomerativeClustering(n_clusters=2)
# Fit the model to the data
agg_clustering.fit(X)
# Read the cluster label assigned to each data point
y_pred = agg_clustering.labels_
# Print the labels of the clusters
print(y_pred)

In this example, we generate random data with 2 features and 100 data points. We then initialize the AgglomerativeClustering model with 2 clusters and fit the model to the data. AgglomerativeClustering has no predict method, so we read the cluster assignments from the labels_ attribute and print them.

Divisive Hierarchical Clustering

Divisive Hierarchical Clustering is a clustering algorithm that starts with all data points in a single cluster and iteratively splits the cluster into smaller clusters. The algorithm starts by treating all data points as a single cluster. The algorithm then iteratively splits the cluster into smaller clusters based on their dissimilarity until each data point belongs to a separate cluster.

Divisive Hierarchical Clustering is not as popular as Agglomerative Hierarchical Clustering because it is computationally expensive and tends to produce imbalanced clusters.

Dimensionality Reduction Algorithms

There are several dimensionality reduction algorithms in machine learning, each with its own strengths and weaknesses. In this tutorial, we will cover two popular dimensionality reduction algorithms: Principal Component Analysis (PCA) and t-Distributed Stochastic Neighbor Embedding (t-SNE).

Principal Component Analysis (PCA)

Principal Component Analysis (PCA) is a linear dimensionality reduction technique that tries to find the orthogonal directions of maximum variance in the data. The objective is to find a lower-dimensional representation of the data that retains as much information as possible. PCA is useful when dealing with high-dimensional data where it’s difficult to visualize and analyze the data.
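As a rough sketch of what PCA does under the hood, the directions of maximum variance can be obtained from the singular value decomposition of the centered data. The pca function below is a hypothetical helper written for illustration and is not Scikit-Learn's implementation.

```python
import numpy as np


def pca(X, n_components):
    """Project X onto its top principal components using the SVD."""
    # Center the data so that variance is measured around the mean
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are orthonormal directions, ordered by decreasing variance
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Variance captured by each direction, and the share kept by the projection
    explained_variance = (S ** 2) / (len(X) - 1)
    ratio = explained_variance[:n_components] / explained_variance.sum()
    return X_centered @ components.T, components, ratio


rng = np.random.default_rng(0)
X = rng.random((100, 10))
X_2d, components, ratio = pca(X, n_components=2)
print(X_2d.shape, ratio)
```

The ratio values show how much of the total variance each retained component explains, which is a common way to decide how many components to keep.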

Let’s see how to implement PCA in Python using Scikit-Learn.

from sklearn.decomposition import PCA
import numpy as np

# Generate random data
X = np.random.rand(100, 10)
# Initialize PCA model with 2 components
pca = PCA(n_components=2)
# Fit the model to the data
pca.fit(X)
# Transform the data to 2 dimensions
X_transformed = pca.transform(X)
# Print the shape of the transformed data
print(X_transformed.shape)

In this example, we generate random data with 10 features and 100 data points. We then initialize the PCA model with 2 components and fit the model to the data. We then transform the data to 2 dimensions and print the shape of the transformed data.

t-Distributed Stochastic Neighbor Embedding (t-SNE)

t-Distributed Stochastic Neighbor Embedding (t-SNE) is a nonlinear dimensionality reduction technique that tries to preserve the local neighborhood structure of the data: points that are close together in the high-dimensional space should stay close together in the lower-dimensional representation. Distances between well-separated groups in a t-SNE plot are generally not meaningful. t-SNE is mainly used to visualize high-dimensional data in 2 or 3 dimensions.

Let’s see how to implement t-SNE in Python using Scikit-Learn.

from sklearn.manifold import TSNE
import numpy as np

# Generate random data
X = np.random.rand(100, 10)
# Initialize t-SNE model with 2 components
tsne = TSNE(n_components=2)
# Fit the model to the data and transform it to 2 dimensions
X_transformed = tsne.fit_transform(X)
# Print the shape of the transformed data
print(X_transformed.shape)

In this example, we generate random data with 10 features and 100 data points. We then initialize the t-SNE model with 2 components and fit the model to the data. We then transform the data to 2 dimensions and print the shape of the transformed data.

In this tutorial, we covered two important techniques in unsupervised learning: Clustering and Dimensionality Reduction. We also covered two popular algorithms for each technique: K-Means Clustering and Hierarchical Clustering for Clustering, and PCA and t-SNE for Dimensionality Reduction. We also provided code examples in Python using Scikit-Learn.

I hope you found this tutorial useful in understanding Unsupervised Learning. To learn more about Machine Learning, I hope you will consider checking out my book: Unsupervised Learning: Clustering and Dimensionality Reduction (

Deploying Stateful Applications on Kubernetes


Prerequisites

  • A Kubernetes cluster
  • A basic understanding of Kubernetes concepts
  • A stateful application that you want to deploy

Step 1: Create a Persistent Volume

apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-pv
spec:
  storageClassName: my-storage-class
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteOnce
  hostPath:
    path: /mnt/data

kubectl apply -f pv.yaml

Step 2: Create a Persistent Volume Claim

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc
spec:
  storageClassName: my-storage-class
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi

kubectl apply -f pvc.yaml

Step 3: Create a StatefulSet

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: my-app
spec:
  selector:
    matchLabels:
      app: my-app
  serviceName: my-app
  replicas: 3
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: my-app-image
        volumeMounts:
        - name: my-persistent-storage
          mountPath: /data
  volumeClaimTemplates:
  - metadata:
      name: my-persistent-storage
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: my-storage-class
      resources:
        requests:
          storage: 10Gi

kubectl apply -f statefulset.yaml

Step 4: Verify Your Deployment

kubectl get statefulsets
kubectl get pods

Kubernetes for Machine Learning: Setting up a Machine Learning Workflow on Kubernetes (TensorFlow)


Prerequisites

  • A Kubernetes cluster
  • A basic understanding of Kubernetes concepts
  • Familiarity with machine learning concepts and frameworks, such as TensorFlow or PyTorch
  • A Docker image for your machine learning application

Step 1: Create a Kubernetes Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: ml-app
  template:
    metadata:
      labels:
        app: ml-app
    spec:
      containers:
      - name: ml-app
        image: your-ml-image:latest
        ports:
        - containerPort: 5000

kubectl apply -f deployment.yaml

Step 2: Create a Kubernetes Service

apiVersion: v1
kind: Service
metadata:
  name: ml-app
spec:
  selector:
    app: ml-app
  ports:
  - name: http
    port: 80
    targetPort: 5000
  type: LoadBalancer

kubectl apply -f service.yaml

Step 3: Scale Your Deployment

kubectl scale deployment ml-app --replicas=5

Step 4: Run Machine Learning Jobs

kind: Tensorflow
  name: tf-serving
        storageUri: gs://your-bucket/your-model
            cpu: 1
            memory: 1Gi
            cpu: 0.5
            memory: 500Mi
kubectl apply -f tf-serving.yaml