Setup OS Environment
Backend.AI and its associated components share common requirements and configurations for proper operation. This section explains how to configure the OS environment.
Note
This section assumes installation on Ubuntu 20.04 LTS.
Create a user account for operation
Create a user account named bai to install and operate Backend.AI
services. Set the UID and GID to 1100 to prevent conflicts with
other users or groups. Because sudo privileges are required, add bai to
the sudo group.
$ username="bai"
$ password="secure-password"
$ sudo adduser --disabled-password --uid 1100 --gecos "" "$username"
$ echo "$username:$password" | sudo chpasswd
$ sudo usermod -aG sudo "$username"
If you do not want to expose your password in the shell history, remove the
--disabled-password option and interactively enter your password.
Log in as the bai user and continue the installation.
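Before continuing, it can help to confirm that the account ended up with the intended UID and group membership. The following is a minimal sketch; the function name check_account is hypothetical and not part of Backend.AI.

```shell
#!/bin/sh
# Sketch: verify that a user exists, has the expected UID, and belongs to a group.
# check_account is an illustrative helper name, not a Backend.AI tool.

check_account() {
    user="$1"; want_uid="$2"; want_group="$3"

    # id -u fails when the user does not exist.
    have_uid=$(id -u "$user" 2>/dev/null) || { echo "no such user: $user"; return 1; }
    [ "$have_uid" = "$want_uid" ] || { echo "UID is $have_uid, expected $want_uid"; return 1; }

    # id -nG prints the names of all groups the user belongs to, space-separated.
    case " $(id -nG "$user") " in
        *" $want_group "*) echo "$user: UID $have_uid, member of $want_group" ;;
        *) echo "$user is not in group $want_group"; return 1 ;;
    esac
}
```

For the account created above, the call would be check_account bai 1100 sudo.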
Install Docker engine
Backend.AI requires Docker Engine to create compute sessions with the Docker
container backend. Some service components are also deployed as containers, so
installing Docker Engine is required. Ensure that docker-compose-plugin is
installed as well so that the docker compose command is available.
After the installation, add the bai user to the docker group so that you do
not need to prefix every Docker command with sudo.
$ sudo usermod -aG docker bai
Log out and log in again to apply the group membership change.
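After logging in again, a quick check can confirm that the Docker CLI, the Compose plugin, and sudo-less access to the daemon are all in place. This is a sketch only; have_cmd and check_docker_ready are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: report whether the Docker CLI, the Compose plugin, and non-sudo
# daemon access are available. Function names are illustrative.

have_cmd() {
    # True when the named command can be found on PATH.
    command -v "$1" >/dev/null 2>&1
}

check_docker_ready() {
    have_cmd docker || { echo "docker CLI not found"; return 1; }

    # "docker compose version" succeeds only when the Compose plugin is installed.
    docker compose version >/dev/null 2>&1 \
        || { echo "docker compose plugin not found"; return 1; }

    # "docker info" has to reach the daemon; without docker group membership
    # (or before logging in again) it fails with a permission error.
    docker info >/dev/null 2>&1 \
        || { echo "cannot reach the Docker daemon without sudo"; return 1; }

    echo "docker is ready"
}
```

Run check_docker_ready as the bai user; the first failing step is printed and the function returns non-zero.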
Optimize sysctl/ulimit parameters
This step is not essential, but it is recommended to optimize the performance and stability of Backend.AI. Refer to the guide in the Manager repository for details of the kernel parameters and the ulimit settings. The optimal values may vary depending on the Backend.AI services you install. Each service installation section provides the values where needed.
Note
Modern systems may have already set the optimal parameters. In that case, you can skip this step.
To cleanly separate the configurations, you may follow the steps below.
Save the resource limit parameters in /etc/security/limits.d/99-backendai.conf:
root hard nofile 512000
root soft nofile 512000
root hard nproc 65536
root soft nproc 65536
bai hard nofile 512000
bai soft nofile 512000
bai hard nproc 65536
bai soft nproc 65536
Log out and log in again to apply the resource limit changes.
Save the kernel parameters in /etc/sysctl.d/99-backendai.conf:
fs.file-max=2048000
net.core.somaxconn=1024
net.ipv4.tcp_max_syn_backlog=1024
net.ipv4.tcp_slow_start_after_idle=0
net.ipv4.tcp_fin_timeout=10
net.ipv4.tcp_window_scaling=1
net.ipv4.tcp_tw_reuse=1
net.ipv4.tcp_early_retrans=1
net.ipv4.ip_local_port_range="10000 65000"
net.core.rmem_max=16777216
net.core.wmem_max=16777216
net.ipv4.tcp_rmem=4096 12582912 16777216
net.ipv4.tcp_wmem=4096 12582912 16777216
vm.overcommit_memory=1
Apply the kernel parameters with
sudo sysctl -p /etc/sysctl.d/99-backendai.conf.
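To confirm that the kernel parameters actually took effect, the saved file can be compared against the live values under /proc/sys. The sketch below assumes a file of simple key=value lines; check_sysctl_conf and normalize are hypothetical helper names, and the optional second argument exists only so the function can be tried against a fake directory tree.

```shell
#!/bin/sh
# Sketch: compare each key=value pair in a sysctl config file with the live value.
# Helper names are illustrative; the root directory defaults to /proc/sys.

normalize() {
    # Strip double quotes and collapse runs of blanks/tabs into single spaces.
    printf '%s\n' "$1" | tr -d '"' | awk '{$1=$1; print}'
}

check_sysctl_conf() {
    conf="$1"
    root="${2:-/proc/sys}"
    status=0
    while IFS='=' read -r key value || [ -n "$key" ]; do
        key=$(normalize "$key")
        # Skip blank lines and comments.
        case "$key" in ''|'#'*|';'*) continue ;; esac

        # net.core.somaxconn lives at <root>/net/core/somaxconn.
        path="$root/$(printf '%s' "$key" | tr . /)"
        if [ ! -f "$path" ]; then
            echo "MISSING  $key"
            status=1
            continue
        fi

        want=$(normalize "$value")
        have=$(normalize "$(cat "$path")")
        if [ "$want" = "$have" ]; then
            echo "OK       $key = $have"
        else
            echo "DIFF     $key: want '$want', have '$have'"
            status=1
        fi
    done < "$conf"
    return "$status"
}
```

Calling check_sysctl_conf /etc/sysctl.d/99-backendai.conf prints one line per parameter and returns non-zero if any value differs or is missing on the running kernel. The resource limits can be checked separately with ulimit -n and ulimit -u in a fresh bash login session.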
Prepare required Python versions and virtual environments
Prepare a Python distribution whose version meets the requirements of the target package. Backend.AI 22.09, for example, requires Python 3.10. The latest information on Python version compatibility can be found here.
There are several ways to prepare a specific Python version. Here, we use a standalone, statically built Python.
Use a standalone statically built Python (Recommended)
Obtain a standalone, statically built Python distribution that matches the required Python version, the target machine architecture, and the archive flavor. Then extract the distribution to a directory of your choice.
$ curl -L "https://github.com/astral-sh/python-build-standalone/releases/download/${PYTHON_RELEASE_DATE}/cpython-${PYTHON_VERSION}+${PYTHON_RELEASE_DATE}-${TARGET_MACHINE_ARCHITECTURE}-${ARCHIVE_FLAVOR}.tar.gz" > cpython-${PYTHON_VERSION}+${PYTHON_RELEASE_DATE}-${TARGET_MACHINE_ARCHITECTURE}-${ARCHIVE_FLAVOR}.tar.gz
$ tar -xf "cpython-${PYTHON_VERSION}+${PYTHON_RELEASE_DATE}-${TARGET_MACHINE_ARCHITECTURE}-${ARCHIVE_FLAVOR}.tar.gz"
$ mkdir -p "/home/${USERNAME}/.static-python/versions"
$ mv python "/home/${USERNAME}/.static-python/versions/${PYTHON_VERSION}"
For example,
$ curl -L "https://github.com/astral-sh/python-build-standalone/releases/download/20250409/cpython-3.13.3+20250409-x86_64-unknown-linux-gnu-install_only.tar.gz" > cpython-3.13.3+20250409-x86_64-unknown-linux-gnu-install_only.tar.gz
$ tar -xf "cpython-3.13.3+20250409-x86_64-unknown-linux-gnu-install_only.tar.gz"
$ mkdir -p "/home/bai/.static-python/versions"
$ mv python "/home/bai/.static-python/versions/3.13.3"
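Because the release date appears twice in the download URL (once in the path and once in the file name), it is easy for the two to drift apart when editing the command by hand. The sketch below builds both from the same arguments. The function names are hypothetical, and the architecture mapping covers only the two glibc Linux targets assumed here.

```shell
#!/bin/sh
# Sketch: assemble the archive name and download URL from their parts so that
# the release date in the URL path and in the file name always match.
# pbs_target, pbs_archive, and pbs_url are illustrative helper names.

pbs_target() {
    # Map a `uname -m` machine name to a Linux (glibc) target triple.
    case "$1" in
        x86_64)        echo "x86_64-unknown-linux-gnu" ;;
        aarch64|arm64) echo "aarch64-unknown-linux-gnu" ;;
        *) echo "unsupported architecture: $1" >&2; return 1 ;;
    esac
}

pbs_archive() {
    # Arguments: python_version release_date target flavor
    echo "cpython-$1+$2-$3-$4.tar.gz"
}

pbs_url() {
    # Same arguments as pbs_archive; $2 is the release date used as the tag.
    echo "https://github.com/astral-sh/python-build-standalone/releases/download/$2/$(pbs_archive "$@")"
}
```

A download could then look like curl -L "$(pbs_url 3.13.3 20250409 "$(pbs_target "$(uname -m)")" install_only)" -o "$(pbs_archive 3.13.3 20250409 "$(pbs_target "$(uname -m)")" install_only)", where the version and release date are example values.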
Then, you can create multiple virtual environments per service. To create a virtual environment for Backend.AI Manager and activate it, for example, you may run:
$ mkdir "${HOME}/manager"
$ cd "${HOME}/manager"
$ ~/.static-python/versions/3.13.3/bin/python3 -m venv .venv
$ source .venv/bin/activate
$ pip install -U pip setuptools wheel
You also need to make pip available to the Python installation with the
latest wheel and setuptools packages, so that any non-binary extension
packages can be compiled and installed on your system.
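Since each Backend.AI release requires a specific minimum Python version, it may be worth checking the interpreter inside the virtual environment before installing packages. The following sketch relies on version-aware sorting with sort -V (available in GNU coreutils); the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: check that a Python interpreter meets a minimum version.
# version_ge, python_version, and check_python are illustrative helper names.

version_ge() {
    # True when version $1 >= version $2. sort -V orders version strings,
    # so the smaller of the two comes first.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

python_version() {
    # Print the full version (for example 3.13.3) of the given interpreter.
    "$1" -c 'import platform; print(platform.python_version())'
}

check_python() {
    # Arguments: path_to_python minimum_version
    have=$(python_version "$1") || { echo "cannot run $1"; return 1; }
    if version_ge "$have" "$2"; then
        echo "$1 is Python $have (>= $2)"
    else
        echo "$1 is Python $have, but $2 or newer is required"
        return 1
    fi
}
```

With the virtual environment from the example above, the check would be check_python .venv/bin/python 3.10, where 3.10 stands for whatever minimum your target release requires.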
(Alternative) Use pyenv to manually build and select a specific Python version
Alternatively, you can use pyenv and pyenv-virtualenv.
Install pyenv and pyenv-virtualenv. Then, install the Python version that you need:
$ pyenv install "${YOUR_PYTHON_VERSION}"
Note
You may need to install the suggested build environment to build Python with pyenv.
Then, you can create multiple virtual environments per service. To create a virtual environment for Backend.AI Manager 22.09.x and automatically activate it, for example, you may run:
$ mkdir "${HOME}/manager"
$ cd "${HOME}/manager"
$ pyenv virtualenv "${YOUR_PYTHON_VERSION}" bai-22.09-manager
$ pyenv local bai-22.09-manager
$ pip install -U pip setuptools wheel
You also need to make pip available to the Python installation with the
latest wheel and setuptools packages, so that any non-binary extension
packages can be compiled and installed on your system.
Configure network aliases
Although not required, using network aliases instead of IP addresses can make
setup and operation easier. Edit the /etc/hosts file on each node and
append contents like the example below to access each server by its network
alias.
##### BEGIN for Backend.AI services #####
10.20.30.10 bai-m1 # management node 01
10.20.30.20 bai-a01 # agent node 01 (GPU 01)
10.20.30.22 bai-a02 # agent node 02 (GPU 02)
##### END for Backend.AI services #####
Note that the IP addresses should be accessible from other nodes, if you are installing on multiple servers.
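When the same block has to be added on several nodes, an idempotent append avoids duplicate entries if the step is run twice. The sketch below uses the BEGIN marker line as the guard and the example addresses from above; add_hosts_block is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: append the Backend.AI alias block to a hosts file only when the
# BEGIN marker is not already present. add_hosts_block is an illustrative name;
# the addresses are the example values and must be adapted to your network.

add_hosts_block() {
    hosts_file="$1"
    marker='##### BEGIN for Backend.AI services #####'

    # -F matches the marker literally; a missing file counts as "not present".
    if grep -qF -- "$marker" "$hosts_file" 2>/dev/null; then
        echo "alias block already present in $hosts_file"
        return 0
    fi

    cat >> "$hosts_file" <<'EOF'
##### BEGIN for Backend.AI services #####
10.20.30.10 bai-m1   # management node 01
10.20.30.20 bai-a01  # agent node 01 (GPU 01)
10.20.30.22 bai-a02  # agent node 02 (GPU 02)
##### END for Backend.AI services #####
EOF
    echo "alias block appended to $hosts_file"
}
```

Writing to /etc/hosts requires root privileges, so it is safer to try the function on a copy of the file first. Afterwards, getent hosts bai-m1 should print the configured address.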
Setup accelerators
If there are accelerators (e.g., GPU) on the server, you have to install the vendor-specific drivers and libraries to make sure the accelerators are properly set up and working. Please refer to the vendor documentation for the details.
To integrate NVIDIA GPUs,
Install the NVIDIA driver and CUDA toolkit.
Install the NVIDIA container toolkit (nvidia-docker2).
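A rough way to confirm the NVIDIA setup is to check that the driver responds and that Docker lists an nvidia runtime. The sketch below assumes that the container toolkit registered a runtime named nvidia with the Docker daemon, which is what the nvidia-docker2 package does; other configurations may expose GPUs without it, so treat a negative result as a hint rather than proof. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: check the NVIDIA driver and the Docker runtime registration.
# has_nvidia_runtime and check_nvidia are illustrative helper names.

has_nvidia_runtime() {
    # Reads `docker info` text on stdin; true when the Runtimes line lists nvidia.
    grep -E '^[[:space:]]*Runtimes:' | grep -qw nvidia
}

check_nvidia() {
    # nvidia-smi is installed with the driver and fails if the driver is not loaded.
    nvidia-smi >/dev/null 2>&1 || { echo "nvidia-smi failed: driver not working"; return 1; }

    docker info 2>/dev/null | has_nvidia_runtime \
        || { echo "Docker does not list an nvidia runtime"; return 1; }

    echo "NVIDIA driver and Docker runtime look ready"
}
```

Run check_nvidia on each GPU node after installing the driver and the container toolkit.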
Pull container images
For compute nodes, you need to pull some container images that are required for creating a compute session. Lablup provides a set of open container images and you may pull the following starter images:
docker pull cr.backend.ai/stable/filebrowser:21.02-ubuntu20.04
docker pull cr.backend.ai/stable/python:3.9-ubuntu20.04
docker pull cr.backend.ai/stable/python-pytorch:1.11-py38-cuda11.3
docker pull cr.backend.ai/stable/python-tensorflow:2.7-py38-cuda11.3
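On nodes with slow or unreliable network access, pulling the images in a loop makes it easier to see which ones failed. This is a sketch; pull_images is a hypothetical helper, and the DOCKER variable is an assumption added here so that the command can be swapped out (for example, for a dry run).

```shell
#!/bin/sh
# Sketch: pull a list of images and report any that fail.
# pull_images is an illustrative name; DOCKER optionally overrides the CLI used.

pull_images() {
    failed=0
    for image in "$@"; do
        if ! "${DOCKER:-docker}" pull "$image"; then
            echo "failed to pull: $image" >&2
            failed=$((failed + 1))
        fi
    done
    # Succeed only when every image was pulled.
    [ "$failed" -eq 0 ]
}
```

For the starter images listed above, this could be called as pull_images cr.backend.ai/stable/filebrowser:21.02-ubuntu20.04 cr.backend.ai/stable/python:3.9-ubuntu20.04, followed by the remaining image names.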