1. Reorganize Model Service logic
2. Implement RBAC across all actions
3. Improve Agent and kernel architecture
4. Implement S3 volume compatibility
5. Add GPU monitoring metrics
6. Add Kafka compatibility for message queue
Due by June 30, 2025 • 7/11 issues closed

25.6 refers to the 6th sprint version of 2025. It is designated as an LTS version and will be supported for stability until March 31, 2026.

- Cloud auto-scaling using CSP APIs
- Compose stack (single-session / multi-session) to bundle multiple models and services
- More RBAC support (introducing new features based on it, such as project administrators)
- Account manager for SSO
- More Relay-compliant GraphQL schema (maybe full migration at some point)
- Raftify-based HA setup
- Extensive Prometheus/OpenTelemetry integration across the entire project
- Enhanced container registry integration (per-project registry, per-project quota, etc.)
- Unified storage resource group (storage proxy + storage agent with "direct access" SFTP/filebrowser containers)
- VFolder abstractions for object storage buckets (MinIO / S3)
- Rolling update of Backend.AI cluster (or at least agents)
- Multi-license support in a single license server
- Retirement of keypair resource policies and migration to user resource policies
  - Also need to update the owner-access-key option to use user identities
- Project-first architecture – per-user "workspace"
- Project-level sharing of sessions (#2346)
- Project-level container image visibility (including user-committed images)
- User and vfolder operation audit logs
- Migration of resource allocation maps from agent to manager for more holistic scheduling optimization (e.g., guaranteeing no fragmentation of GPUs)
- Hierarchical managers to parallelize per-resource-group scheduling and idle checks
- Session template revamps
- Live propagation of configurations (e.g., fGPU options) via etcd watch
- Easier (multi-node) installation of the open-source edition
- Logging contexts and request IDs
- Make idle checkers scoped within resource groups
- Optimized App Proxy traffic routing (probably via native modules and/or with Cilium)
- User-defined network partitions via flexible SDN control plane integration
- Snapshot and lineage tracking of vfolders (when the underlying storage backend supports it)
- Virtual agents to proxy external container orchestrators and node pools
- Experimental agent backends like Singularity and native processes
- Improved documentation for various plugins and SDK
Overdue by 2 month(s) • Due by March 31, 2025 • 150/172 issues closed

- Priority scheduler
- Customizable agent selection strategy
- Model store
- Running arbitrary unlabeled container images
- Expanded container registry type support (GitLab, GitHub Enterprise, AWS ECS)
- Trash bins
Overdue by 8 month(s) • Due by September 30, 2024 • 246/288 issues closed

Issues with no specific deadline.
No due date • 25/59 issues closed