AI & ML Workloads: GPU and PyTorch

Introduction

We present a basic method to deploy artificial intelligence (AI) and machine learning (ML) workloads on the TFGrid, using dedicated nodes with GPU support.

In the first part, we show how to install the Nvidia driver for a GPU card on an Ubuntu 22.04 full VM running on the TFGrid.

In the second part, we show how to use PyTorch to run AI/ML tasks.

Prerequisites

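You need to reserve a dedicated GPU node on the ThreeFold Grid and deploy an Ubuntu 22.04 full VM on it. The commands in this guide are run as root; prefix them with `sudo` if you use another account.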

Prepare the System

Update the system

```
dpkg --add-architecture i386
apt-get update
apt-get dist-upgrade
reboot
```

Check the GPU info
```
lspci | grep VGA
lshw -c video
```
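`grep VGA` only matches devices whose PCI class is "VGA compatible controller". Data-center NVIDIA cards are often listed as "3D controller" instead, so the filter above can come back empty on a working GPU node. As a hedged sketch, the Python helpers below run `lspci` and keep NVIDIA entries of either class. The function names are hypothetical, and the class strings are assumed to match standard `lspci` output.

```python
import subprocess

# PCI class descriptions under which lspci typically lists GPUs (assumption).
GPU_CLASSES = ("VGA compatible controller", "3D controller")


def nvidia_gpus(lspci_output: str) -> list[str]:
    """Return the lspci lines that describe NVIDIA GPUs (hypothetical helper)."""
    gpus = []
    for line in lspci_output.splitlines():
        is_gpu_class = any(cls in line for cls in GPU_CLASSES)
        # Case-insensitive vendor match; skips NVIDIA audio functions and other GPUs
        if is_gpu_class and "nvidia" in line.lower():
            gpus.append(line.strip())
    return gpus


def read_lspci() -> str:
    """Run lspci and return its standard output as text."""
    result = subprocess.run(["lspci"], capture_output=True, text=True, check=True)
    return result.stdout
```

On the VM, `print(nvidia_gpus(read_lspci()))` should list at least one entry; an empty list suggests the GPU is not visible to the VM.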

Install the GPU Driver

Install the recommended Nvidia driver
- Check which driver is recommended
```
apt install ubuntu-drivers-common
ubuntu-drivers devices
```
- Install the recommended driver (e.g. version 535)
```
apt install nvidia-driver-535
```
- Reboot and reconnect to the VM

Check the GPU status
```
nvidia-smi
```
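To check the GPU status from a script rather than by reading the `nvidia-smi` table, `nvidia-smi` can print selected fields as CSV with `--query-gpu=name,driver_version,memory.total --format=csv,noheader,nounits`. The sketch below wraps that call and parses one line per GPU. The helper names and the `GpuInfo` type are hypothetical; memory is read as MiB, the unit `nvidia-smi` uses for `memory.total`.

```python
import subprocess
from dataclasses import dataclass

# Fields requested from nvidia-smi, in output order.
QUERY = "name,driver_version,memory.total"


@dataclass
class GpuInfo:
    name: str
    driver_version: str
    memory_total_mib: int


def parse_gpu_query(csv_output: str) -> list[GpuInfo]:
    """Parse 'name, driver_version, memory.total' CSV lines (no header, no units)."""
    gpus = []
    for line in csv_output.strip().splitlines():
        # rsplit keeps any commas inside the GPU name in the first field
        name, driver, memory = (field.strip() for field in line.rsplit(",", 2))
        gpus.append(GpuInfo(name, driver, int(memory)))
    return gpus


def query_gpus() -> list[GpuInfo]:
    """Run nvidia-smi on the VM and return one GpuInfo per GPU."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_query(result.stdout)
```

Once the driver is loaded, `for gpu in query_gpus(): print(gpu)` on the VM should print one entry per GPU.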

Now that the GPU node is set up, let's set up PyTorch to run AI/ML workloads.

Set Up a Python Virtual Environment

Before installing Python packages with pip, you should create a virtual environment.

Install the prerequisites

```
apt update
apt install python3-pip python3-dev
pip3 install --upgrade pip
pip3 install virtualenv
```

Create a virtual environment

```
mkdir ~/python_project
cd ~/python_project
virtualenv python_project_env
source python_project_env/bin/activate
```
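Python's standard library also includes the `venv` module, which creates an equivalent environment without installing `virtualenv`. The sketch below is an alternative, assuming the Ubuntu `python3-venv` package is installed (it provides the `ensurepip` support that `with_pip=True` needs). The directory names mirror the ones above, and the helper name is hypothetical.

```python
import venv
from pathlib import Path


def create_project_env(project_dir: Path, env_name: str = "python_project_env",
                       with_pip: bool = True) -> Path:
    """Create a virtual environment inside project_dir and return its path."""
    project_dir.mkdir(parents=True, exist_ok=True)
    env_dir = project_dir / env_name
    # with_pip=True bootstraps pip into the environment via ensurepip
    venv.create(env_dir, with_pip=with_pip)
    return env_dir
```

For example, `create_project_env(Path.home() / "python_project")` creates `~/python_project/python_project_env`, which you then activate with `source python_project_env/bin/activate` as before.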

Install PyTorch and Test CUDA

Once you've created and activated a virtual environment for Python, you can install different Python packages.

Install PyTorch and upgrade Numpy

```
pip3 install torch
pip3 install numpy --upgrade
```

Before going further, you can check that PyTorch can access the GPU through CUDA.

Run the following lines in a Python interpreter (for example, `python3` started inside the activated virtual environment):

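```
import torch

print(torch.cuda.is_available())      # should print True
print(torch.cuda.device_count())      # should print 1 on a single-GPU node
print(torch.cuda.current_device())    # should print 0
print(torch.cuda.device(0))           # a torch.cuda.device object for GPU 0
print(torch.cuda.get_device_name(0))  # the GPU model name
```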

Set Up and Access Jupyter Notebook

You can run Jupyter Notebook on the remote VM and access it from a browser on your local machine.

Install Jupyter Notebook
```
pip3 install notebook
```
Run Jupyter Notebook in no-browser mode and take note of the URL and the token
```
jupyter notebook --no-browser --port=8080 --ip=0.0.0.0
```
On your local machine, paste the given URL into a browser, replacing 127.0.0.1 with the WireGuard IP of the VM (here, 10.20.4.2) and using the token printed by Jupyter.
```
http://10.20.4.2:8080/tree?token=<insert_token>
```
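Rewriting the URL by hand is easy to get wrong when the token is long. The sketch below does the substitution with `urllib.parse`, swapping only the host and keeping the scheme, port, path, and token query string. The function name is hypothetical, and 10.20.4.2 is just the example WireGuard IP used above.

```python
from urllib.parse import urlsplit, urlunsplit


def to_wireguard_url(jupyter_url: str, wireguard_ip: str) -> str:
    """Replace the host in the URL printed by Jupyter with the VM's WireGuard IP."""
    parts = urlsplit(jupyter_url)
    # netloc is "host:port"; keep the port Jupyter is listening on
    netloc = f"{wireguard_ip}:{parts.port}" if parts.port else wireguard_ip
    return urlunsplit(parts._replace(netloc=netloc))


# prints http://10.20.4.2:8080/tree?token=<insert_token>
print(to_wireguard_url("http://127.0.0.1:8080/tree?token=<insert_token>", "10.20.4.2"))
```

Paste the real URL from the Jupyter output in place of the placeholder and open the result in your local browser.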

Run AI/ML Workloads

After following the steps above, you can run Python code that uses your GPU node for AI and ML workloads.

Feel free to explore different ways to use this setup. For example, the Hugging Face course on natural language processing is a good introduction to machine learning.

ThreeFold V3 Manual