Backend.AI

Backend.AI is a streamlined, container-based computing cluster platform that hosts popular computing/ML frameworks and diverse programming languages, with pluggable heterogeneous accelerator support including CUDA GPU, ROCm GPU, TPU, IPU and other NPUs.

It allocates and isolates the underlying computing resources for multi-tenant computation sessions, on demand or in batches, using customizable job schedulers and its own orchestrator. All of its functions are exposed as REST/GraphQL/WebSocket APIs.
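
For illustration, here is a minimal sketch of that flow using the Python client SDK: it asks the manager to create a compute session, runs a short snippet inside the container, and tears the session down. The exact names (Session, ComputeSession.get_or_create, execute, destroy), the result format, and the image reference are assumptions made for this sketch; consult the client SDK documentation for the authoritative API.

    # Minimal sketch using the Python client SDK; API names and the image tag are assumptions.
    # The endpoint and credentials are expected to come from the environment
    # (e.g., BACKEND_ENDPOINT, BACKEND_ACCESS_KEY, BACKEND_SECRET_KEY).
    from ai.backend.client.session import Session

    with Session() as api:
        # Request a new session; the manager schedules it onto an available agent.
        sess = api.ComputeSession.get_or_create('cr.backend.ai/stable/python:3.9-ubuntu20.04')
        try:
            # Run a short snippet inside the container and print its console output.
            result = sess.execute(code='print("hello from Backend.AI")')
            for stream, text in result.get('console', []):
                print(f'[{stream}] {text}', end='')
        finally:
            # Release the container and its allocated resources.
            sess.destroy()

The same flow is also available through the unified backend.ai CLI and the other client SDKs.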

Contents in This Repository

This repository contains all open-source server-side components and the client SDK for Python as a reference implementation of API clients.

Directory Structure

  • src/ai/backend/: Source code
    • manager/: Manager
    • manager/api: Manager API handlers
    • agent/: Agent
    • agent/docker/: Agent's Docker backend
    • agent/k8s/: Agent's Kubernetes backend
    • kernel/: Agent's kernel runner counterpart
    • runner/: Agent's in-kernel prebuilt binaries
    • helpers/: Agent's in-kernel helper package
    • common/: Shared utilities
    • client/: Client SDK
    • cli/: Unified CLI for all components
    • storage/: Storage proxy
    • storage/api: Storage proxy's manager-facing and client-facing APIs
    • web/: Web UI server
      • static/: Backend.AI WebUI release artifacts
    • plugin/: Plugin subsystem
    • test/: Integration test suite
    • testutils/: Shared utilities used by unit tests
    • meta/: Legacy meta package
  • docs/: Unified documentation
  • tests/
    • manager/, agent/, ...: Per-component unit tests
  • configs/
    • manager/, agent/, ...: Per-component sample configurations
  • docker/: Dockerfiles for auxiliary containers
  • fixtures/
    • manager/, ...: Per-component fixtures for development setup and tests
  • plugins/: A directory to place plugins such as accelerators, monitors, etc.
  • scripts/: Scripts to assist development workflows
    • install-dev.sh: The single-node development setup script from the working copy
  • stubs/: Type annotation stub packages written by us
  • tools/: A directory to host Pants-related tooling
  • dist/: A directory to put build artifacts (.whl files) and Pants-exported virtualenvs
  • changes/: News fragments for towncrier
  • pants.toml: The Pants configuration
  • pyproject.toml: Tooling configuration (towncrier, pytest, mypy)
  • BUILD: The root build config file
  • **/BUILD: Per-directory build config files
  • BUILD_ROOT: An indicator to mark the build root directory for Pants
  • requirements.txt: The unified requirements file
  • *.lock, tools/*.lock: The dependency lock files
  • docker-compose.*.yml: Per-version recommended halfstack container configs
  • README.md: This file
  • MIGRATION.md: The migration guide for updating between major releases
  • VERSION: The unified version declaration

Server-side components are licensed under LGPLv3 to promote non-proprietary open innovation in the open-source community while other shared libraries and client SDKs are distributed under the MIT license.

There is no obligation to open your service/system code if you just run the server-side components as-is (e.g., run them as daemons or import them without modification in your code). Please contact us (contact-at-lablup-com) for commercial consulting and more licensing details/options for individual use cases.

Getting Started

Installation for Single-node Development

Run scripts/install-dev.sh after cloning this repository.

This script checks the availability of all required dependencies, such as Docker, and bootstraps a development setup. Note that it requires sudo and a modern Python installed on the host system, which must be Linux (a Debian- or RHEL-like distribution) or macOS.

Installation for Multi-node Tests & Production

Please consult our documentation for community-supported materials. Contact the sales team (contact@lablup.com) for professional paid support and deployment options.

Accessing Compute Sessions (aka Kernels)

Backend.AI provides websocket tunneling into individual computation sessions (containers), so that users can securely access in-container applications directly from their browsers or the client CLI.

  • Jupyter: data scientists' favorite tool
    • Most container images have intrinsic Jupyter and JupyterLab support.
  • Web-based terminal
    • All container sessions have intrinsic ttyd support.
  • SSH
    • All container sessions have intrinsic SSH/SFTP/SCP support with auto-generated per-user SSH keypair. PyCharm and other IDEs can use on-demand sessions using SSH remote interpreters.
  • VSCode
    • Most container sessions have intrinsic web-based VSCode support.

Working with Storage

Backend.AI provides an abstraction layer on top of existing network-based storage (e.g., NFS/SMB), called vfolders (virtual folders). Each vfolder works like cloud storage: it can be mounted into any computation session and shared between users and user groups with differentiated privileges.
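
As a rough illustration, the sketch below uses the Python client SDK to create a vfolder, upload a local file into it, and mount it when launching a session. The class and method names (VFolder.create, upload, the mounts parameter), the mount path, and the image tag are assumptions for this sketch rather than a verified API reference.

    # Minimal vfolder sketch using the Python client SDK; API names are assumptions.
    from ai.backend.client.session import Session

    with Session() as api:
        # Create a virtual folder backed by the configured storage volume.
        api.VFolder.create('mydata')
        # Upload a local dataset file into the vfolder.
        api.VFolder('mydata').upload(['./dataset.csv'])
        # Mount the vfolder into a new compute session so the data is visible
        # inside the container (conventionally under /home/work/mydata).
        sess = api.ComputeSession.get_or_create(
            'cr.backend.ai/stable/python:3.9-ubuntu20.04',
            mounts=['mydata'],
        )
        sess.destroy()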

Major Components

Manager

It routes external API requests from front-end services to individual agents. It also monitors and scales the cluster of multiple agents (a few tens to hundreds).

Agent

It manages individual server instances and launches/destroys Docker containers where REPL daemons (kernels) run. Each agent on a new EC2 instance registers itself to the instance registry via heartbeats.

Storage Proxy

It provides a unified abstraction over multiple different network storage devices with vendor-specific enhancements such as real-time performance metrics and filesystem operation acceleration APIs.

Webserver

It hosts the SPA (single-page application) packaged from our web UI codebase for end-users and basic administration tasks.

Synchronizing the static Backend.AI WebUI version:

$ scripts/download-webui-release.sh <target version to download>

Kernels

Computing environment recipes (Dockerfiles) used to build the container images that run on top of the Backend.AI platform.

Jail

A programmable sandbox, written in Rust, that uses ptrace-based system call filtering.

Hook

A set of libc overrides for resource control and web-based interactive stdin (paired with agents).

Client SDK Libraries

We offer client SDKs in popular programming languages. These SDKs are freely available under the MIT License to ease integration with both commercial and non-commercial software products and services.

Plugins

Legacy Components

These components still exist but are no longer actively maintained.

Media

The front-end support libraries to handle multimedia outputs (e.g., SVG plots, animated vector graphics).

  • The Python package (lablup) is installed inside kernel containers.
  • To interpret and display media generated by the Python package, you need to load the Javascript part in the front-end.
  • https://github.com/lablup/backend.ai-media

IDE and Editor Extensions

We now recommend using in-kernel applications such as JupyterLab, Visual Studio Code Server, or native SSH connections to kernels via our client SDK or desktop apps.

Python Version Compatibility

Backend.AI Core Version    Python Version    Pantsbuild Version
24.03.x / 24.09.x          3.12.x            2.21.x
23.03.x / 23.09.x          3.11.x            2.19.x
22.03.x / 22.09.x          3.10.x            -
21.03.x / 21.09.x          3.8.x             -
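
For convenience, the mapping above can be restated as a small check before running the development setup; this helper is purely illustrative and not part of the repository.

    # Illustrative only: restate the compatibility table and verify the local interpreter.
    import sys

    REQUIRED_PYTHON = {
        '24.03': (3, 12), '24.09': (3, 12),
        '23.03': (3, 11), '23.09': (3, 11),
        '22.03': (3, 10), '22.09': (3, 10),
        '21.03': (3, 8),  '21.09': (3, 8),
    }

    core = '24.09'  # the Backend.AI core series you plan to work on
    expected = REQUIRED_PYTHON[core]
    assert sys.version_info[:2] == expected, (
        f'Backend.AI {core}.x requires Python {expected[0]}.{expected[1]}.x, '
        f'but this interpreter is {sys.version_info.major}.{sys.version_info.minor}'
    )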

License

Refer to the LICENSE file.
