From 6a902abfe281c930f9e710dea1ed1e35f8972883 Mon Sep 17 00:00:00 2001 From: Sherin Date: Wed, 26 Feb 2025 09:44:55 +0200 Subject: [PATCH 1/4] Added overview --- docs/admin/config/clusters.md | 8 +- .../cluster-setup/cluster-setup-intro.md | 2 +- docs/admin/runai-setup/installation-types.md | 4 +- docs/developer/overview-developer.md | 2 +- docs/home/components.md | 42 --------- docs/home/documentation-library.md | 63 +++++++++++++ docs/home/overview.md | 92 ++++++++++++------- mkdocs.yml | 2 +- 8 files changed, 131 insertions(+), 84 deletions(-) delete mode 100644 docs/home/components.md create mode 100644 docs/home/documentation-library.md diff --git a/docs/admin/config/clusters.md b/docs/admin/config/clusters.md index a4f2276bf6..5a338e087b 100644 --- a/docs/admin/config/clusters.md +++ b/docs/admin/config/clusters.md @@ -156,7 +156,7 @@ Before starting, make sure you have the following: * Try to identify the problem from the logs. If you cannot resolve the issue, continue to the next step. 5. Contact Run:ai’s support - * If the issue persists, [contact Run:ai’s support](../../home/overview.md#how-to-get-support) for assistance. + * If the issue persists, [contact Run:ai’s support](../../home/documentation-library.md#how-to-get-support) for assistance. ??? "Cluster has service issues" __Description__: When a cluster's status is _Has service issues_, it means that one or more Run:ai services running in the cluster are not available. @@ -194,7 +194,7 @@ Before starting, make sure you have the following: ``` 4. Contact Run:ai’s Support - * If the issue persists, contact [contact Run:ai’s support](../../home/overview.md#how-to-get-support) for assistance. + * If the issue persists, [contact Run:ai’s support](../../home/documentation-library.md#how-to-get-support) for assistance. ??? "Cluster is waiting to connect" __Description__: When the cluster's status is ‘waiting to connect’, it means that no communication from the cluster services reaches the Run:ai Platform. This may be due to networking issues or issues with Run:ai services. @@ -285,7 +285,7 @@ Before starting, make sure you have the following: * Try to identify the problem from the logs. If you cannot resolve the issue, continue to the next step 5. Contact Run:ai’s support - * If the issue persists, [contact Run:ai’s support](../../home/overview.md#how-to-get-support) for assistance. + * If the issue persists, [contact Run:ai’s support](../../home/documentation-library.md#how-to-get-support) for assistance. ??? "Cluster is missing prerequisites" __Description__: When a cluster's status displays Missing prerequisites, it indicates that at least one of the Mandatory Prerequisites has not been fulfilled. In such cases, Run:ai services may not function properly. @@ -316,5 +316,5 @@ Before starting, make sure you have the following: * This section provides detailed information about any missing resources or prerequisites. Review this information to identify what is needed 5. Contact Run:ai’s support - * If the issue persists, [contact Run:ai’s support](../../home/overview.md#how-to-get-support) for assistance. + * If the issue persists, [contact Run:ai’s support](../../home/documentation-library.md#how-to-get-support) for assistance. 
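Before contacting support, it helps to capture the state of the Run:ai services and their recent logs so the support team can triage faster. The commands below are an illustrative sketch: Run:ai installs into the `runai` namespace, but the exact pod and deployment names vary by version and installation:

```bash
# List the Run:ai services and their current status
kubectl get pods -n runai

# Tail the recent logs of one service (the deployment name here is illustrative)
kubectl logs -n runai deploy/runai-operator --tail=100
```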
diff --git a/docs/admin/runai-setup/cluster-setup/cluster-setup-intro.md b/docs/admin/runai-setup/cluster-setup/cluster-setup-intro.md index 8e092046b8..dd950e4d2e 100644 --- a/docs/admin/runai-setup/cluster-setup/cluster-setup-intro.md +++ b/docs/admin/runai-setup/cluster-setup/cluster-setup-intro.md @@ -8,7 +8,7 @@ This section is a step-by-step guide for setting up a Run:ai cluster. * A Run:ai cluster connects to the Run:ai control plane on the cloud. The control plane provides a control point as well as a monitoring and control user interface for Administrators and Researchers. * A customer may have multiple Run:ai Clusters, all connecting to a single control plane. -For additional details see the [Run:ai system components](../../../home/components.md) +For additional details see the [Run:ai system components](../../../home/overview.md#runai-system-components) ## Documents diff --git a/docs/admin/runai-setup/installation-types.md b/docs/admin/runai-setup/installation-types.md index d4fa524bbb..d2b55be2c4 100644 --- a/docs/admin/runai-setup/installation-types.md +++ b/docs/admin/runai-setup/installation-types.md @@ -3,8 +3,8 @@ Run:ai consists of two components: -* The Run:ai [Cluster](../../home/components.md#runai-cluster). One or more data-science GPU clusters hosted by the customer (on-prem or cloud). -* The Run:ai [Control plane](../../home/components.md#components). A single entity that monitors clusters, sets priorities, and business policies. +* The Run:ai [Cluster](../../home/overview.md#runai-cluster). One or more data-science GPU clusters hosted by the customer (on-prem or cloud). +* The Run:ai [Control plane](../../home/overview.md#runai-control-plane). A single entity that monitors clusters, sets priorities, and business policies. There are two main installation options: diff --git a/docs/developer/overview-developer.md b/docs/developer/overview-developer.md index 8815e6bd3e..11f8ba33fc 100644 --- a/docs/developer/overview-developer.md +++ b/docs/developer/overview-developer.md @@ -11,7 +11,7 @@ Developers can access Run:ai through various programmatic interfaces. ## API Architecture -Run:ai is composed of a single, multi-tenant control plane. Each tenant can be connected to one or more GPU clusters. See [Run:ai system components](../home/components.md) for detailed information. +Run:ai is composed of a single, multi-tenant control plane. Each tenant can be connected to one or more GPU clusters. See [Run:ai system components](../home/overview.md#runai-system-components) for detailed information. The following programming interfaces are available: diff --git a/docs/home/components.md b/docs/home/components.md deleted file mode 100644 index 59fecb170a..0000000000 --- a/docs/home/components.md +++ /dev/null @@ -1,42 +0,0 @@ -# Run:ai System Components - -## Components - -Run:ai is made up of two components: - -* The __Run:ai cluster__ provides scheduling services and workload management. -* The __Run:ai control plane__ provides resource management, Workload submission and cluster monitoring. - -Technology-wise, both are installed over a [Kubernetes](https://kubernetes.io){target=_blank} Cluster. - -Run:ai users: - -* Researchers submit Machine Learning workloads via the Run:ai Console, the Run:ai Command-Line Interface (CLI), or directly by sending YAML files to Kubernetes. -* Administrators monitor and set priorities via the Run:ai User Interface - -![multi-cluster-architecture](img/multi-cluster-architecture.png) - -## Run:ai Cluster - -* Run:ai comes with its own Scheduler. 
The Run:ai scheduler extends the Kubernetes scheduler. It uses business rules to schedule workloads sent by Researchers. -* Run:ai schedules _Workloads_. Workloads include the actual researcher code running as a Kubernetes container, together with all the system resources required to run the code, such as user storage, network endpoints to access the container etc. -* The cluster uses an outbound-only, secure connection to synchronize with the Run:ai control plane. Information includes meta-data sync and various metrics on Workloads, Nodes etc. -* The Run:ai cluster is installed as a [Kubernetes Operator](https://kubernetes.io/docs/concepts/extend-kubernetes/operator/){target=_blank} -* Run:ai is installed in its own Kubernetes _namespace_ named __runai__ -* Workloads are run in the context of Run:ai __Projects__. Each Project is mapped to a Kubernetes namespace with its own settings and access control. -## Run:ai Control Plane on the cloud -The Run:ai control plane is used by multiple customers (tenants) to manage resources (such as Projects & Departments), submit Workloads and monitor multiple clusters. -A single Run:ai customer (tenant) defined in the control-plane, can manage multiple Run:ai clusters. So a single customer, can manage mutltiple GPU clusters in multiple locations/subnets from a single interface. -## Self-hosted Control-Plane -The Run:ai control plane can also be locally installed. To understand the various installation options see the [installation types](../admin/runai-setup/installation-types.md) document. - - - - - diff --git a/docs/home/documentation-library.md b/docs/home/documentation-library.md new file mode 100644 index 0000000000..0df55ef97a --- /dev/null +++ b/docs/home/documentation-library.md @@ -0,0 +1,63 @@ +# Run:ai Documentation Library + + +Welcome to the Run:ai documentation area. For an introduction to the Run:ai platform, see [Run:ai platform](https://www.run.ai/platform/){target=_blank} on the run.ai website. + +The Run:ai documentation targets four personas: + +* __Infrastructure Administrator__ — An IT person responsible for the installation, setup, and IT maintenance of the Run:ai product. Infrastructure Administrator documentation can be found [here](../admin/overview-administrator.md). + +* __Platform Administrator__ — Responsible for the day-to-day administration of the product. Platform Administrator documentation can be found [here](../platform-admin/overview.md). + + +* __Researcher__ — Using Run:ai to spin up notebooks, submit Workloads, prompt models, etc. Researcher documentation can be found [here](../Researcher/overview-researcher.md). + +* __Developer__ — Using various APIs to automate work with Run:ai. The Developer documentation can be found [here](../developer/overview-developer.md). + +## How to Get Support + +To get support, use the following channels: + +* On the Run:ai user interface at `.run.ai`, use the 'Contact Support' link on the top right. + +* Or submit a ticket by clicking the button below: + +[Submit a Ticket](https://runai.secure.force.com/casesupport/CreateCaseForm){target=_blank .md-button .custom-ticket-button} + + + +## Community + +Run:ai provides its customers with access to the _Run:ai Customer Community portal_ to submit tickets, track ticket progress, and update support cases. + +[Customer Community Portal](https://runai-support.force.com/community/s/){target=_blank .md-button .custom-ticket-button} + +Reach out to customer support for credentials. 
+ + ## Run:ai Cloud Status Page + +Run:ai cloud availability is monitored at [status.run.ai](https://status.run.ai){target=_blank}. + +## Collect Logs to Send to Support + +As an IT Administrator, you can collect Run:ai logs to send to support. For more information, see [logs collection](../admin/troubleshooting/logs-collection.md). + +## Example Code + +Code for the Docker images referred to on this site is available at [https://github.com/run-ai/docs/tree/master/quickstart](https://github.com/run-ai/docs/tree/master/quickstart){target=_blank}. + +The following images are used throughout the documentation: + +| Image | Description | Source | +|--------|-------------|--------| +| [runai.jfrog.io/demo/quickstart](https://runai.jfrog.io/artifactory/demo/quickstart){target=_blank} | Basic training image. Multi-GPU support | [https://github.com/run-ai/docs/tree/master/quickstart/main](https://github.com/run-ai/docs/tree/master/quickstart/main){target=_blank} | +| [runai.jfrog.io/demo/quickstart-distributed](https://runai.jfrog.io/artifactory/demo/quickstart-distributed){target=_blank} | Distributed training using MPI and Horovod | [https://github.com/run-ai/docs/tree/master/quickstart/distributed](https://github.com/run-ai/docs/tree/master/quickstart/distributed){target=_blank} | +| [zembutsu/docker-sample-nginx](https://hub.docker.com/r/zembutsu/docker-sample-nginx) | Build (interactive) with Connected Ports | [https://github.com/zembutsu/docker-sample-nginx](https://github.com/zembutsu/docker-sample-nginx){target=_blank} | +| [runai.jfrog.io/demo/quickstart-x-forwarding](https://runai.jfrog.io/artifactory/demo/quickstart-x-forwarding){target=_blank} | Use X11 forwarding from Docker image | [https://github.com/run-ai/docs/tree/master/quickstart/x-forwarding](https://github.com/run-ai/docs/tree/master/quickstart/x-forwarding){target=_blank} | +| [runai.jfrog.io/demo/pycharm-demo](https://runai.jfrog.io/artifactory/demo/pycharm-demo){target=_blank} | Image used for tool integration (PyCharm and VSCode) | [https://github.com/run-ai/docs/tree/master/quickstart/python%2Bssh](https://github.com/run-ai/docs/tree/master/quickstart/python%2Bssh){target=_blank} | +| [runai.jfrog.io/demo/example-triton-client](https://runai.jfrog.io/artifactory/demo/example-triton-client){target=_blank} and [runai.jfrog.io/demo/example-triton-server](https://runai.jfrog.io/artifactory/demo/example-triton-server){target=_blank} | Basic Inference | [https://github.com/run-ai/models/tree/main/models/triton](https://github.com/run-ai/models/tree/main/models/triton){target=_blank} | + +## Contributing to the documentation + +This documentation is made better by individuals from our customer and partner community. If you see something worth fixing, please comment at the bottom of the page or create a pull request via GitHub. The public GitHub repository can be found on the top-right of this page. diff --git a/docs/home/overview.md b/docs/home/overview.md index 0df55ef97a..c4e7a6b1c9 100644 --- a/docs/home/overview.md +++ b/docs/home/overview.md @@ -1,63 +1,89 @@ -# Run:ai Documentation Library +# Overview +Run:ai is a GPU orchestration and optimization platform that helps organizations maximize compute utilization for AI workloads. By optimizing the use of expensive compute resources, Run:ai accelerates AI development cycles and drives faster time-to-market for AI-powered innovations. -Welcome to the Run:ai documentation area. 
For an introduction about what is the Run:ai Platform see [Run:ai platform](https://www.run.ai/platform/){target=_blank} on the run.ai website. +Built on Kubernetes, Run:ai supports dynamic GPU allocation, workload submission, workload scheduling, and resource sharing, ensuring that AI teams get the compute power they need while IT teams maintain control over infrastructure efficiency. -The Run:ai documentation is targeting four personas: +## Benefits of using Run:ai -* __Infrastructure Administrator__ - An IT person, responsible for the installation, setup and IT maintenance of the Run:ai product. Infrastructure Administrator documentation can be found [here](../admin/overview-administrator.md). +### Cost optimization -* __Platform Administrator__ - Responsible for the day-to-day administration of the product. Platform Administrator documentation can be found [here](../platform-admin/overview.md). +The high cost of AI infrastructure makes cost optimization a crucial consideration for organizations. Run:ai contributes to cost savings through: +* Improved hardware utilization, accelerating return on investment. +* Automated resource allocation, minimizing manual intervention and management overhead. +* Detailed analytics and reporting, enabling organizations to make data-driven decisions about resource provisioning and resource costs. -* __Researcher__ — Using Run:ai to spin up notebooks, submit Workloads, prompt models, etc. Researcher documentation can be found [here](../Researcher/overview-researcher.md). +### Accelerated time-to-market -* __Developer__ — Using various APIs to automate work with Run:ai. The Developer documentation can be found [here](../developer/overview-developer.md). +Run:ai accelerates development cycles by: -## How to Get Support +* Increasing access to computational resources and shared resource pooling. +* Reducing wait times for computational resources. +* Enabling faster experimentation and streamlining the transition from development to production. -To get support use the following channels: +### Scalability -* On the Run:ai user interface at `.run.ai`, use the 'Contact Support' link on the top right. +Run:ai is designed to handle large-scale GPU clusters of thousands of nodes and to manage a high throughput of workloads, making it ideal for extensive and demanding environments. -* Or submit a ticket by clicking the button below: +## How Run:ai helps your organization -[Submit a Ticket](https://runai.secure.force.com/casesupport/CreateCaseForm){target=_blank .md-button .custom-ticket-button} +### For infrastructure administrators +Run:ai centralizes cluster management and optimizes infrastructure control by offering: -## Community +* [**Centralized cluster management**](../admin/config/clusters.md) – Manage all clusters from a single platform, ensuring consistency and control across environments. -Run:ai provides its customers with access to the _Run:ai Customer Community portal_ to submit tickets, track ticket progress and update support cases. +* [**Usage monitoring and capacity planning**](../platform-admin/performance/dashboard-analysis.md) – Gain real-time and historical insights into GPU consumption across clusters to optimize resource allocation and plan future capacity needs efficiently. -[Customer Community Portal](https://runai-support.force.com/community/s/){target=_blank .md-button .custom-ticket-button} +* [**Policy enforcement**](../platform-admin/workloads/policies/overview.md) – Define and enforce security and usage policies to align GPU consumption with business and compliance requirements. -Reach out to customer support for credentials. +* [**Enterprise-grade authentication**](../admin/authentication/authentication-overview.md) – Integrate with your organization's identity provider for streamlined authentication (Single Sign On) and role-based access control (RBAC). 
+ +* **Kubernetes-native application** – Install as a Kubernetes-native application, seamlessly extending Kubernetes with a cloud-native experience and standard operational practices (install, upgrade, configure). -## Community +### For platform administrators -Run:ai provides its customers with access to the _Run:ai Customer Community portal_ to submit tickets, track ticket progress and update support cases. +Run:ai simplifies AI infrastructure management by providing a structured approach to managing AI initiatives, resources, and user access. It enables platform administrators to maintain control, efficiency, and scalability across their infrastructure: -[Customer Community Portal](https://runai-support.force.com/community/s/){target=_blank .md-button .custom-ticket-button} +* [**AI Initiative structuring and management**](../platform-admin/aiinitiatives/overview.md#mapping-your-organization) – Map and set up AI initiatives according to your organization's structure, ensuring clear resource allocation. +* [**Centralized GPU resource management**](../platform-admin/aiinitiatives/overview.md#mapping-your-resources) – Enable seamless sharing and pooling of GPUs across multiple users, reducing idle time and optimizing utilization. +* [**User and access control**](../platform-admin/aiinitiatives/overview.md#assigning-users-to-projects-and-departments) – Assign users (AI practitioners, ML engineers) to specific projects and departments to manage access and enforce security policies, utilizing role-based access control (RBAC) to ensure permissions align with user roles. +* [**Workload scheduling**](../Researcher/scheduling/how-the-scheduler-works.md) – Use scheduling to prioritize and allocate GPUs based on workload needs. +* [**Monitoring and insights**](../platform-admin/performance/dashboard-analysis.md) – Track real-time and historical data on GPU usage to monitor resource consumption and optimize costs. -Reach out to customer support for credentials. +### For AI Practitioners +Run:ai empowers data scientists and ML engineers by providing: -## Run:ai Cloud Status Page +* [**Optimized workload scheduling**](../Researcher/scheduling/how-the-scheduler-works.md) – Ensure high-priority jobs get GPU resources. Workloads dynamically receive resources based on demand. +* [**Fractional GPU usage**](../Researcher/scheduling/fractions.md) – Request and utilize only a fraction of a GPU's memory, ensuring efficient resource allocation and leaving room for other workloads (see the example following this list). +* [**AI initiatives lifecycle support**](../platform-admin/workloads/overviews/introduction-to-workloads.md) – Run your entire AI initiative lifecycle – Jupyter Notebooks, training jobs, and inference workloads – efficiently. +* [**Interactive session**](../platform-admin/workloads/overviews/workload-types.md) – Ensure an uninterrupted experience when working in Jupyter Notebooks, without the GPU being taken away mid-session. +* [**Scalability for training and inference**](../platform-admin/workloads/overviews/workload-types.md) – Supports distributed training across multiple GPUs and auto-scales inference workloads. +* [**Integrations**](../platform-admin/integrations/integration-overview.md) – Integrate with popular ML frameworks and tools – PyTorch, TensorFlow, XGBoost, Knative, Spark, Kubeflow Pipelines, Apache Airflow, Argo Workflows, Ray, and more. +* [**Flexible workload submission**](../platform-admin/workloads/overviews/introduction-to-workloads.md) – Submit workloads using the Run:ai UI, API, or CLI, or run third-party workloads. 
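For example, a Researcher can request a fraction of a GPU for an interactive session directly from the CLI, as referenced in the fractional GPU bullet above. The command is an illustrative sketch: the workload name, image, and project are placeholders, and flags can differ between CLI versions:

```bash
# Submit an interactive workspace that requests 0.5 of a GPU,
# leaving the remaining fraction available for other workloads
runai submit my-notebook --interactive --gpu 0.5 \
  --image jupyter/scipy-notebook --project team-a
```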
-Run:ai cloud availability is monitored at [status.run.ai](https://status.run.ai){target=_blank}. +## Run:ai system components -## Collect Logs to Send to Support +Run:ai is made up of two components, both installed over a [Kubernetes](https://kubernetes.io) cluster: -As an IT Administrator, you can collect Run:ai logs to send to support. For more information see [logs collection](../admin/troubleshooting/logs-collection.md). +* **Run:ai cluster** – Provides scheduling and workload management, extending Kubernetes-native capabilities. -## Example Code +* **Run:ai control plane** – Provides resource management, workload submission, and cluster monitoring and analytics. -Code for the Docker images referred to on this site is available at [https://github.com/run-ai/docs/tree/master/quickstart](https://github.com/run-ai/docs/tree/master/quickstart){target=_blank}. ![multi-cluster-architecture](img/multi-cluster-architecture.png) -The following images are used throughout the documentation: +### Run:ai cluster -| Image | Description | Source | -|--------|-------------|--------| -| [runai.jfrog.io/demo/quickstart](https://runai.jfrog.io/artifactory/demo/quickstart){target=_blank} | Basic training image. Multi-GPU support | [https://github.com/run-ai/docs/tree/master/quickstart/main](https://github.com/run-ai/docs/tree/master/quickstart/main){target=_blank} | +The Run:ai cluster is responsible for scheduling AI workloads and efficiently allocating GPU resources across users and projects: -| [runai.jfrog.io/demo/quickstart-distributed](https://runai.jfrog.io/artifactory/demo/quickstart-distributed){target=_blank} | Distributed training using MPI and Horovod | [https://github.com/run-ai/docs/tree/master/quickstart/distributed](https://github.com/run-ai/docs/tree/master/quickstart/distributed){target=_blank} | -| [zembutsu/docker-sample-nginx](https://hub.docker.com/r/zembutsu/docker-sample-nginx) | Build (interactive) with Connected Ports | [https://github.com/zembutsu/docker-sample-nginx](https://github.com/zembutsu/docker-sample-nginx){target=_blank} | -| [runai.jfrog.io/demo/quickstart-x-forwarding](https://runai.jfrog.io/artifactory/demo/quickstart-x-forwarding){target=_blank} | Use X11 forwarding from Docker image | [https://github.com/run-ai/docs/tree/master/quickstart/x-forwarding](https://github.com/run-ai/docs/tree/master/quickstart/x-forwarding){target=_blank} | -| [runai.jfrog.io/demo/pycharm-demo](https://runai.jfrog.io/artifactory/demo/pycharm-demo){target=_blank} | Image used for tool integration (PyCharm and VSCode) | [https://github.com/run-ai/docs/tree/master/quickstart/python%2Bssh](https://github.com/run-ai/docs/tree/master/quickstart/python%2Bssh){target=_blank} | -| [runai.jfrog.io/demo/example-triton-client](https://runai.jfrog.io/artifactory/demo/example-triton-client){target=_blank} and [runai.jfrog.io/demo/example-triton-server](https://runai.jfrog.io/artifactory/demo/example-triton-server){target=_blank} | Basic Inference | [https://github.com/run-ai/models/tree/main/models/triton](https://github.com/run-ai/models/tree/main/models/triton){target=_blank} | +* [**Run:ai Scheduler**](../Researcher/scheduling/the-runai-scheduler.md) – Applies AI-aware rules to efficiently schedule workloads submitted by AI practitioners. 
+* [**Workload management**](../platform-admin/workloads/overviews/introduction-to-workloads.md) – Handles workload management, which includes the researcher code running as a Kubernetes container and the system resources required to run the code, such as storage, credentials, network endpoints to access the container, and so on. +* [**Kubernetes operator-based deployment**](https://kubernetes.io/docs/concepts/extend-kubernetes/operator/) – Installed as a Kubernetes Operator to automate deployment, upgrades, and configuration of Run:ai cluster services. +* **Storage** – Supports Kubernetes-native storage using [Storage Classes](https://kubernetes.io/docs/concepts/storage/storage-classes/), allowing organizations to bring their own storage solutions. It also integrates with [external storage solutions](../platform-admin/workloads/assets/overview.md) such as Git, S3, and NFS to support various data requirements (see the sample StorageClass sketch at the end of this section). +* **Secured communication** – Uses an outbound-only, secured (SSL) connection to synchronize with the Run:ai control plane. +* **Private** – Run:ai only synchronizes metadata and operational metrics (e.g., workloads, nodes) with the control plane. No proprietary data, model artifacts, or user data sets are ever transmitted, ensuring full data privacy and security. -## Contributing to the documentation +### Run:ai control plane -This documentation is made better by individuals from our customer and partner community. If you see something worth fixing, please comment at the bottom of the page or create a pull request via GitHub. The public GitHub repository can be found on the top-right of this page. +The Run:ai control plane provides a centralized management interface for organizations to oversee their GPU infrastructure across multiple locations/subnets, accessible via Web UI, [API ](api-reference/)and [CLI.](cli-reference/) The control plane can be deployed on the cloud or on-premise for organizations that require local control over their infrastructure (self-hosted). Self-hosted installation supports both connected and air-gapped. See [self-hosted installation](https://app.gitbook.com/s/Lalm8gxJMp5dA0whA4ip/overview) for more details. +* [**Multi-cluster management**](../admin/config/clusters.md) – Manages multiple Run:ai clusters for a single tenant across different locations and subnets from a single unified interface. +* [**Resource and access management**](../platform-admin/aiinitiatives/overview.md) – Allows administrators to define Projects, Departments, and user roles, enforcing policies for fair resource distribution. +* [**Workload submission and monitoring**](../platform-admin/workloads/overviews/managing-workloads.md) – Allows teams to submit workloads, track usage, and monitor GPU performance in real time (see the illustrative API call at the end of this section). 
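To illustrate the Kubernetes-native storage model noted in the Storage bullet above, an administrator can expose an existing provisioner through a standard StorageClass that workloads then reference via PersistentVolumeClaims. This is a generic sketch; the class and provisioner names are placeholders for your own storage driver:

```bash
# Create a plain Kubernetes StorageClass; nothing Run:ai-specific is required
kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard-nfs           # referenced by PVCs that workloads mount
provisioner: example.com/nfs   # placeholder: your CSI driver or provisioner
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
EOF
```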
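As a sketch of the programmatic access mentioned above, a monitoring script might authenticate with an application token and poll a workloads endpoint. The endpoint paths and payload fields shown here are illustrative assumptions rather than the authoritative contract, so check them against the API reference:

```bash
# Illustrative only: obtain a token for an API application, then list workloads.
# Requires curl and jq; <company> and the credentials are placeholders, and the
# field names below are assumptions to be verified against the API reference.
TOKEN=$(curl -s -X POST "https://<company>.run.ai/api/v1/token" \
  -H "Content-Type: application/json" \
  -d '{"grantType": "app_token", "appId": "<app-id>", "appSecret": "<app-secret>"}' \
  | jq -r '.accessToken')

curl -s "https://<company>.run.ai/api/v1/workloads" \
  -H "Authorization: Bearer $TOKEN"
```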
diff --git a/mkdocs.yml b/mkdocs.yml index 91c98ae869..2e2bd0b35c 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -188,7 +188,7 @@ plugins: nav: - Home: - 'Overview': 'home/overview.md' - - 'System Components' : 'home/components.md' + - 'Documentation Library' : 'home/documentation-library.md' - 'Whats New' : - 'Run:ai SaaS Updates' : 'home/saas-updates.md' - 'Version 2.20' : 'home/whats-new-2-20.md' From 095371feaab5d64fdfa4706ae0460b03688b7efb Mon Sep 17 00:00:00 2001 From: Sherin Date: Wed, 26 Feb 2025 12:08:16 +0200 Subject: [PATCH 2/4] Update overview.md --- docs/home/overview.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/home/overview.md b/docs/home/overview.md index c4e7a6b1c9..7942b60d68 100644 --- a/docs/home/overview.md +++ b/docs/home/overview.md @@ -82,7 +82,7 @@ The Run:ai cluster is responsible for scheduling AI workloads and efficiently al ### Run:ai control plane -The Run:ai control plane provides a centralized management interface for organizations to oversee their GPU infrastructure across multiple locations/subnets, accessible via Web UI, [API ](api-reference/)and [CLI.](cli-reference/) The control plane can be deployed on the cloud or on-premise for organizations that require local control over their infrastructure (self-hosted). Self-hosted installation supports both connected and air-gapped. See [self-hosted installation](https://app.gitbook.com/s/Lalm8gxJMp5dA0whA4ip/overview) for more details. +The Run:ai control plane provides a centralized management interface for organizations to oversee their GPU infrastructure across multiple locations/subnets, accessible via Web UI, [API](../developer/overview-developer.md) and [CLI](../Researcher/cli-reference/). The control plane can be deployed on the cloud or on-premise for organizations that require local control over their infrastructure - self-hosted. Self-hosted installation supports both connected and air-gapped. See [self-hosted installation](../admin/runai-setup/self-hosted/overview.md) for more details. * [**Multi-cluster management** ](../admin/config/clusters.md)– Manages multiple Run:ai clusters for a single tenant across different locations and subnets from a single unified interface. * [**Resource and access management** ](../platform-admin/aiinitiatives/overview.md)– Allows administrators to define Projects, Departments and user roles, enforcing policies for fair resource distribution. From 5ca386c357d10dc5d684f4cb233c70fffeb1dba4 Mon Sep 17 00:00:00 2001 From: Sherin Date: Wed, 26 Feb 2025 12:14:03 +0200 Subject: [PATCH 3/4] Update overview.md --- docs/home/overview.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/home/overview.md b/docs/home/overview.md index 7942b60d68..200e71a5b9 100644 --- a/docs/home/overview.md +++ b/docs/home/overview.md @@ -82,7 +82,7 @@ The Run:ai cluster is responsible for scheduling AI workloads and efficiently al ### Run:ai control plane -The Run:ai control plane provides a centralized management interface for organizations to oversee their GPU infrastructure across multiple locations/subnets, accessible via Web UI, [API](../developer/overview-developer.md) and [CLI](../Researcher/cli-reference/). The control plane can be deployed on the cloud or on-premise for organizations that require local control over their infrastructure - self-hosted. Self-hosted installation supports both connected and air-gapped. See [self-hosted installation](../admin/runai-setup/self-hosted/overview.md) for more details. 
+The Run:ai control plane provides a centralized management interface for organizations to oversee their GPU infrastructure across multiple locations/subnets, accessible via Web UI, [API](../developer/overview-developer.md) and [CLI](../Researcher/cli-reference/new-cli/runai.md). The control plane can be deployed on the cloud or on-premise for organizations that require local control over their infrastructure (self-hosted). Self-hosted installation supports both connected and air-gapped environments. See [self-hosted installation](../admin/runai-setup/self-hosted/overview.md) for more details. * [**Multi-cluster management**](../admin/config/clusters.md) – Manages multiple Run:ai clusters for a single tenant across different locations and subnets from a single unified interface. * [**Resource and access management**](../platform-admin/aiinitiatives/overview.md) – Allows administrators to define Projects, Departments, and user roles, enforcing policies for fair resource distribution. From 51e059677374bec19df074c546db43dd66589566 Mon Sep 17 00:00:00 2001 From: Sherin Date: Wed, 26 Feb 2025 13:00:33 +0200 Subject: [PATCH 4/4] Update overview.md --- docs/home/overview.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/home/overview.md b/docs/home/overview.md index 200e71a5b9..93ffe518d9 100644 --- a/docs/home/overview.md +++ b/docs/home/overview.md @@ -48,7 +48,7 @@ Run:ai simplifies AI infrastructure management by providing a structured approac * [**Workload scheduling**](../Researcher/scheduling/how-the-scheduler-works.md) – Use scheduling to prioritize and allocate GPUs based on workload needs. * [**Monitoring and insights**](../platform-admin/performance/dashboard-analysis.md) – Track real-time and historical data on GPU usage to monitor resource consumption and optimize costs. -### For AI Practitioners +### For AI practitioners Run:ai empowers data scientists and ML engineers by providing: