From 0222156af3f3a5ae14bb231bcd51388c4c129b05 Mon Sep 17 00:00:00 2001
From: Yaron
Date: Tue, 6 Aug 2024 13:44:52 +0300
Subject: [PATCH] fix-anchors

---
 docs/Researcher/scheduling/GPU-time-slicing-scheduler.md | 2 +-
 docs/Researcher/scheduling/dynamic-gpu-fractions.md | 2 +-
 docs/admin/performance/dashboard-analysis.md | 2 +-
 docs/admin/runai-setup/cluster-setup/cluster-install.md | 4 ++--
 .../runai-setup/cluster-setup/cluster-prerequisites.md | 2 +-
 docs/admin/runai-setup/cluster-setup/cluster-upgrade.md | 2 +-
 docs/admin/runai-setup/config/dr.md | 2 +-
 docs/admin/runai-setup/config/ha.md | 4 ++--
 docs/admin/runai-setup/config/org-cert.md | 2 +-
 docs/admin/runai-setup/maintenance/node-downtime.md | 2 +-
 docs/admin/runai-setup/self-hosted/k8s/backend.md | 2 +-
 docs/admin/runai-setup/self-hosted/k8s/cluster.md | 2 +-
 docs/admin/runai-setup/self-hosted/k8s/preparations.md | 2 +-
 .../admin/runai-setup/self-hosted/k8s/project-management.md | 2 +-
 docs/admin/runai-setup/self-hosted/k8s/upgrade.md | 4 ++--
 docs/admin/runai-setup/self-hosted/ocp/cluster.md | 2 +-
 docs/admin/runai-setup/self-hosted/ocp/upgrade.md | 6 +++---
 docs/admin/troubleshooting/troubleshooting.md | 4 ++--
 docs/admin/workloads/README.md | 2 +-
 docs/developer/deprecated/inference/setup.md | 2 +-
 docs/home/whats-new-2-13.md | 4 ++--
 docs/home/whats-new-2-15.md | 6 +++---
 docs/home/whats-new-2-16.md | 3 +--
 docs/home/whats-new-2-17.md | 2 +-
 graveyard/whats-new-2-14.md | 6 +++---
 25 files changed, 36 insertions(+), 37 deletions(-)

diff --git a/docs/Researcher/scheduling/GPU-time-slicing-scheduler.md b/docs/Researcher/scheduling/GPU-time-slicing-scheduler.md
index 2f210b0b29..371a938646 100644
--- a/docs/Researcher/scheduling/GPU-time-slicing-scheduler.md
+++ b/docs/Researcher/scheduling/GPU-time-slicing-scheduler.md
@@ -11,7 +11,7 @@ Run:ai supports simultaneous submission of multiple workloads to a single GPU wh

 ## New Time-slicing scheduler by Run:ai

-To provide customers with predictable and accurate GPU compute resources scheduling, Run:ai is introducing a new feature called Time-slicing GPU scheduler which adds **fractional compute** capabilities on top of other existing Run:ai **memory fractions** capabilities. Unlike the default NVIDIA GPU orchestrator which doesn’t provide the ability to split or limit the runtime of each workload, Run:ai created a new mechanism that gives each workload **exclusive** access to the full GPU for a **limited** amount of time ([lease time](#timeslicing-plan-and-lease-times)) in each scheduling cycle ([plan time](#timeslicing-plan-and-lease-times)). This cycle repeats itself for the lifetime of the workload.
+To provide customers with predictable and accurate GPU compute resources scheduling, Run:ai is introducing a new feature called Time-slicing GPU scheduler which adds **fractional compute** capabilities on top of other existing Run:ai **memory fractions** capabilities. Unlike the default NVIDIA GPU orchestrator which doesn’t provide the ability to split or limit the runtime of each workload, Run:ai created a new mechanism that gives each workload **exclusive** access to the full GPU for a **limited** amount of time ([lease time](#time-slicing-plan-and-lease-times)) in each scheduling cycle ([plan time](#time-slicing-plan-and-lease-times)). This cycle repeats itself for the lifetime of the workload.

 Using the GPU runtime this way guarantees a workload is granted its requested GPU compute resources proportionally to its requested GPU fraction.
diff --git a/docs/Researcher/scheduling/dynamic-gpu-fractions.md b/docs/Researcher/scheduling/dynamic-gpu-fractions.md
index a5841dfb37..3d29b1648c 100644
--- a/docs/Researcher/scheduling/dynamic-gpu-fractions.md
+++ b/docs/Researcher/scheduling/dynamic-gpu-fractions.md
@@ -71,7 +71,7 @@ The supported values depend on the label used. You can use them in either the UI

 ## Compute Resources UI with Dynamic Fractions support

 To enable the UI elements for Dynamic Fractions, press *Settings*, *General*, then open the *Resources* pane and toggle *GPU Resource Optimization*. This enables all the UI features related to *GPU Resource Optimization* for the whole tenant. There are other per cluster or per node-pool configurations that should be configured in order to use the capabilities of ‘GPU Resource Optimization’ See the documentation for each of these features.
-Once the ‘GPU Resource Optimization’ feature is enabled, you will be able to create *Compute Resources* with the *GPU Portion (Fraction)* Limit and *GPU Memory Limit*. In addition, you will be able to view the workloads’ utilization vs. Request and Limit parameters in the [Metrics](../../admin/workloads/submitting-workloads.md#workloads-table) pane for each workload.
+Once the ‘GPU Resource Optimization’ feature is enabled, you will be able to create *Compute Resources* with the *GPU Portion (Fraction)* Limit and *GPU Memory Limit*. In addition, you will be able to view the workloads’ utilization vs. Request and Limit parameters in the Metrics pane for each workload.

 ![GPU Limit](img/GPU-resource-limit-enabled.png)

diff --git a/docs/admin/performance/dashboard-analysis.md b/docs/admin/performance/dashboard-analysis.md
index a57ac7318a..f05db08995 100644
--- a/docs/admin/performance/dashboard-analysis.md
+++ b/docs/admin/performance/dashboard-analysis.md
@@ -25,7 +25,7 @@ These dashboards give system administrators the ability to drill down to see det

 There are 5 dashboards:

-* [**GPU/CPU Overview**](#gpucpu-overview-dashboard) dashboard—Provides information about what is happening right now in the cluster.
+* [**GPU/CPU Overview**](#gpucpu-overview-dashboard-new-and-legacy) dashboard—Provides information about what is happening right now in the cluster.
 * [**Quota Management**](#quota-management-dashboard) dashboard—Provides information about quota utilization.
 * [**Analytics**](#analytics-dashboard) dashboard—Provides long term analysis of cluster behavior.
 * [**Multi-Cluster Overview**](#multi-cluster-overview-dashboard) dashboard—Provides a more holistic, multi-cluster view of what is happening right now. The dashboard is intended for organizations that have more than one connected cluster.

diff --git a/docs/admin/runai-setup/cluster-setup/cluster-install.md b/docs/admin/runai-setup/cluster-setup/cluster-install.md
index 29402d5c57..9a7fa3aa4d 100644
--- a/docs/admin/runai-setup/cluster-setup/cluster-install.md
+++ b/docs/admin/runai-setup/cluster-setup/cluster-install.md
@@ -33,7 +33,7 @@ On the next page:

 ## Verify your cluster's health

 * Verify that the cluster status in the Run:ai Control Plane's [Clusters Table](#cluster-table) is `Connected`.
-* Go to the [Overview Dashboard](../../performance/dashboard-analysis.md#overview-dashboard) and verify that the number of GPUs on the top right reflects your GPU resources on your cluster and the list of machines with GPU resources appears on the bottom line.
+* Go to the [Overview Dashboard](../../performance/dashboard-analysis.md#gpucpu-overview-dashboard-new-and-legacy) and verify that the number of GPUs on the top right reflects your GPU resources on your cluster and the list of machines with GPU resources appears on the bottom line.
 * In case of issues, see the [Troubleshooting guide](../../troubleshooting/cluster-health-check.md).

 ## Researcher Authentication

@@ -69,7 +69,7 @@ The following table describes the different statuses that a cluster could be in.
 | Service issues | At least one of the *Services* is not working properly. You can view the list of nonfunctioning services for more information |
 | Connected | All services are connected and up and running. |

-See the [Troubleshooting guide](../../troubleshooting/cluster-health-check.md#verifying-cluster-health) to help troubleshoot issues in the cluster.
+See the [Troubleshooting guide](../../troubleshooting/cluster-health-check.md) to help troubleshoot issues in the cluster.

 ## Customize your installation

diff --git a/docs/admin/runai-setup/cluster-setup/cluster-prerequisites.md b/docs/admin/runai-setup/cluster-setup/cluster-prerequisites.md
index e3b39aa8b7..58b5de1852 100644
--- a/docs/admin/runai-setup/cluster-setup/cluster-prerequisites.md
+++ b/docs/admin/runai-setup/cluster-setup/cluster-prerequisites.md
@@ -69,7 +69,7 @@ For information on supported versions of managed Kubernetes, it's important to c

 For an up-to-date end-of-life statement of Kubernetes see [Kubernetes Release History](https://kubernetes.io/releases/){target=_blank}.

 !!! Note
-    Run:ai allows scheduling of Jobs with PVCs. See for example the command-line interface flag [--pvc-new](../../../Researcher/cli-reference/runai-submit.md#new-pvc-stringarray). A Job scheduled with a PVC based on a specific type of storage class (a storage class with the property `volumeBindingMode` equals to `WaitForFirstConsumer`) will [not work](https://kubernetes.io/docs/concepts/storage/storage-capacity/){target=_blank} on Kubernetes 1.23 or lower.
+    Run:ai allows scheduling of Jobs with PVCs. See for example the command-line interface flag [--pvc-new](../../../Researcher/cli-reference/runai-submit.md#--new-pvc--stringarray). A Job scheduled with a PVC based on a specific type of storage class (a storage class with the property `volumeBindingMode` equals to `WaitForFirstConsumer`) will [not work](https://kubernetes.io/docs/concepts/storage/storage-capacity/){target=_blank} on Kubernetes 1.23 or lower.
#### Pod Security Admission

diff --git a/docs/admin/runai-setup/cluster-setup/cluster-upgrade.md b/docs/admin/runai-setup/cluster-setup/cluster-upgrade.md
index 1e90b20fc2..0f4bd4c0f3 100644
--- a/docs/admin/runai-setup/cluster-setup/cluster-upgrade.md
+++ b/docs/admin/runai-setup/cluster-setup/cluster-upgrade.md
@@ -71,7 +71,7 @@ The process:

 ## Verify Successful Installation

-See [Verify your installation](cluster-install.md#verify-your-installation) on how to verify a Run:ai cluster installation
+See [Verify your installation](cluster-install.md#verify-your-clusters-health) on how to verify a Run:ai cluster installation

diff --git a/docs/admin/runai-setup/config/dr.md b/docs/admin/runai-setup/config/dr.md
index 4549781c72..1edd04ffe2 100644
--- a/docs/admin/runai-setup/config/dr.md
+++ b/docs/admin/runai-setup/config/dr.md
@@ -33,7 +33,7 @@ Run:ai stores metric history using [Thanos](https://github.com/thanos-io/thanos)

 ### Backing up Control-Plane Configuration

-The installation of the Run:ai control plane can be [configured](../self-hosted/k8s/backend.md#optional-additional-configurations). The configuration is provided as `--set` command in the helm installation. These changes will be preserved on upgrade, but will not be preserved on uninstall or on damage to Kubernetes. Thus, it is best to back up these customizations. For a list of customizations used during the installation, run:
+The installation of the Run:ai control plane can be [configured](../self-hosted/k8s/backend.md#additional-runai-configurations-optional). The configuration is provided as `--set` command in the helm installation. These changes will be preserved on upgrade, but will not be preserved on uninstall or upon damage to Kubernetes. Thus, it is best to back up these customizations. For a list of customizations used during the installation, run:

 `helm get values runai-backend -n runai-backend`

diff --git a/docs/admin/runai-setup/config/ha.md b/docs/admin/runai-setup/config/ha.md
index 2368a0321b..6035af31d3 100644
--- a/docs/admin/runai-setup/config/ha.md
+++ b/docs/admin/runai-setup/config/ha.md
@@ -11,7 +11,7 @@ A different scenario is a high transaction load, leading to system overload. To

 ### Run:ai system workers

-The Run:ai control plane allows the **optional** [gathering of Run:ai pods into specific nodes](../self-hosted/k8s/preparations.md#optional-mark-runai-system-workers). When this feature is used, it is important to set more than one node as a Run:ai system worker. Otherwise, the horizontal scaling described below will not span multiple nodes, and the system will remain with a single point of failure.
+The Run:ai control plane allows the **optional** [gathering of Run:ai pods into specific nodes](../self-hosted/k8s/preparations.md#mark-runai-system-workers-optional). When this feature is used, it is important to set more than one node as a Run:ai system worker. Otherwise, the horizontal scaling described below will not span multiple nodes, and the system will remain with a single point of failure.

 ### Horizontal Scalability of Run:ai services

@@ -40,7 +40,7 @@ Run:ai uses three third parties which are managed as Kubernetes StatefulSets:

 ### Run:ai system workers

-The Run:ai cluster allows the **mandatory** [gathering of Run:ai pods into specific nodes](../self-hosted/k8s/preparations.md#optional-mark-runai-system-workers). When this feature is used, it is important to set more than one node as a Run:ai system worker. Otherwise, the horizontal scaling described below may not span multiple nodes, and the system will remain with a single point of failure.
+The Run:ai cluster allows the **mandatory** [gathering of Run:ai pods into specific nodes](../self-hosted/k8s/preparations.md#mark-runai-system-workers-optional). When this feature is used, it is important to set more than one node as a Run:ai system worker. Otherwise, the horizontal scaling described below may not span multiple nodes, and the system will remain with a single point of failure.

 ### Prometheus

diff --git a/docs/admin/runai-setup/config/org-cert.md b/docs/admin/runai-setup/config/org-cert.md
index 12cd79c656..0ad2aac17a 100644
--- a/docs/admin/runai-setup/config/org-cert.md
+++ b/docs/admin/runai-setup/config/org-cert.md
@@ -24,7 +24,7 @@ kubectl -n runai-backend create secret generic runai-ca-cert \
 --from-file=runai-ca.pem=
 ```

-* As part of the installation instructions you need to create a secret for [runai-backend-tls](../self-hosted/k8s/backend.md#domain-certificate). Use the local certificate authority instead.
+* As part of the installation instructions, you need to create a secret for [runai-backend-tls](../self-hosted/k8s/preparations.md#domain-certificate). Use the local certificate authority instead.
 * Install the control plane, add the following flag to the helm command `--set global.customCA.enabled=true`

 ## Cluster Installation

diff --git a/docs/admin/runai-setup/maintenance/node-downtime.md b/docs/admin/runai-setup/maintenance/node-downtime.md
index bf9292a428..6df0118d54 100644
--- a/docs/admin/runai-setup/maintenance/node-downtime.md
+++ b/docs/admin/runai-setup/maintenance/node-downtime.md
@@ -64,7 +64,7 @@ kubectl taint nodes runai=drain:NoExecute-
 kubectl delete node
 ```

-However, if you plan to bring back the node, you will need to rejoin the node into the cluster. See [Rejoin](#Rejoin-a-Node-into-the-Kubernetes-Cluster).
+However, if you plan to bring back the node, you will need to rejoin the node into the cluster. See [Rejoin](#rejoin-a-node-into-the-kubernetes-cluster).

diff --git a/docs/admin/runai-setup/self-hosted/k8s/backend.md b/docs/admin/runai-setup/self-hosted/k8s/backend.md
index 19d4ad2563..0da4d6b440 100644
--- a/docs/admin/runai-setup/self-hosted/k8s/backend.md
+++ b/docs/admin/runai-setup/self-hosted/k8s/backend.md
@@ -17,7 +17,7 @@ Run the helm command below:
 --set global.domain= # (1)
 ```

- 1. Domain name described [here](prerequisites.md#domain-name).
+ 1. Domain name described [here](preparations.md#domain-certificate).

 !!! Info
     To install a specific version, add `--version ` to the install command. You can find available versions by running `helm search repo -l runai-backend`.

diff --git a/docs/admin/runai-setup/self-hosted/k8s/cluster.md b/docs/admin/runai-setup/self-hosted/k8s/cluster.md
index 31e3969863..1ba5754586 100644
--- a/docs/admin/runai-setup/self-hosted/k8s/cluster.md
+++ b/docs/admin/runai-setup/self-hosted/k8s/cluster.md
@@ -22,7 +22,7 @@ Install prerequisites as per [cluster prerequisites](../../cluster-setup/cluster

 * Do not add the helm repository and do not run `helm repo update`.
 * Instead, edit the `helm upgrade` command.
 * Replace `runai/runai-cluster` with `runai-cluster-.tgz`.
- * Add `--set global.image.registry=` where the registry address is as entered in the [preparation section](./preparations.md#runai-software-files)
+ * Add `--set global.image.registry=` where the registry address is as entered in the [preparation section](./preparations.md#software-artifacts)

 The command should look like the following:

diff --git a/docs/admin/runai-setup/self-hosted/k8s/preparations.md b/docs/admin/runai-setup/self-hosted/k8s/preparations.md
index e834a975f9..483f754864 100644
--- a/docs/admin/runai-setup/self-hosted/k8s/preparations.md
+++ b/docs/admin/runai-setup/self-hosted/k8s/preparations.md
@@ -96,7 +96,7 @@ kubectl label node node-role.kubernetes.io/runai-system=true

 ### External Postgres database (optional)

-If you have opted to use an [external PostgreSQL database](prerequisites.md#external-postgresql-database-optional), you need to perform initial setup to ensure successful installation. Follow these steps:
+If you have opted to use an [external PostgreSQL database](prerequisites.md#external-postgres-database-optional), you need to perform initial setup to ensure successful installation. Follow these steps:

 1. Create a SQL script file, edit the parameters below, and save it locally:
    * Replace `` with a dedicate database name for RunAi in your PostgreSQL database.

diff --git a/docs/admin/runai-setup/self-hosted/k8s/project-management.md b/docs/admin/runai-setup/self-hosted/k8s/project-management.md
index 3dae532faa..0a8942352b 100644
--- a/docs/admin/runai-setup/self-hosted/k8s/project-management.md
+++ b/docs/admin/runai-setup/self-hosted/k8s/project-management.md
@@ -22,7 +22,7 @@ This process may **need to be altered** if,

 Run:ai allows the **association** of a Run:ai Project with any existing Kubernetes namespace:

-* When [setting up](cluster.md#customize-installation) a Run:ai cluster, Disable namespace creation by setting the cluster flag `createNamespaces` to `false`.
+* When [setting up](cluster.md#optional-customize-installation) a Run:ai cluster, Disable namespace creation by setting the cluster flag `createNamespaces` to `false`.
 * Using the Run:ai User Interface, create a new Project ``. A namespace will **not** be created.
 * Associate and existing namepace `` with the Run:ai project by running:

diff --git a/docs/admin/runai-setup/self-hosted/k8s/upgrade.md b/docs/admin/runai-setup/self-hosted/k8s/upgrade.md
index cb321ab766..2bae1b4b07 100644
--- a/docs/admin/runai-setup/self-hosted/k8s/upgrade.md
+++ b/docs/admin/runai-setup/self-hosted/k8s/upgrade.md
@@ -30,7 +30,7 @@ If you are installing an air-gapped version of Run:ai, The Run:ai tar file conta
 === "Airgapped"
     * Ask for a tar file `runai-air-gapped-.tar.gz` from Run:ai customer support. The file contains the new version you want to upgrade to. `` is the updated version of the Run:ai control plane.
-    * Upload the images as described [here](preparations.md#runai-software-files).
+    * Upload the images as described [here](preparations.md#software-artifacts).

 ## Before upgrade

@@ -94,7 +94,7 @@ kubectl delete ing -n runai-backend runai-backend-ingress
 The Run:ai control-plane installation has been rewritten and is no longer using a _backend values file_. Instead, to customize the installation use standard `--set` flags. If you have previously customized the installation, you must now extract these customizations and add them as `--set` flag to the helm installation:

 * Find previous customizations to the control plane if such exist. Run:ai provides a utility for that here `https://raw.githubusercontent.com/run-ai/docs/v2.13/install/backend/cp-helm-vals-diff.sh`. For information on how to use this utility please contact Run:ai customer support.
-* Search for the customizations you found in the [optional configurations](./backend.md#optional-additional-configurations) table and add them in the new format.
+* Search for the customizations you found in the [optional configurations](./backend.md#additional-runai-configurations-optional) table and add them in the new format.

 ## Upgrade Control Plane

diff --git a/docs/admin/runai-setup/self-hosted/ocp/cluster.md b/docs/admin/runai-setup/self-hosted/ocp/cluster.md
index 1b20c02767..e290fd6f08 100644
--- a/docs/admin/runai-setup/self-hosted/ocp/cluster.md
+++ b/docs/admin/runai-setup/self-hosted/ocp/cluster.md
@@ -48,7 +48,7 @@ The last namespace (`runai-scale-adjust`) is only required if the cluster is a c
 * Do not add the helm repository and do not run `helm repo update`.
 * Instead, edit the `helm upgrade` command.
 * Replace `runai/runai-cluster` with `runai-cluster-.tgz`.
- * Add `--set global.image.registry=` where the registry address is as entered in the [preparation section](./preparations.md#runai-software-files)
+ * Add `--set global.image.registry=` where the registry address is as entered in the [preparation section](./preparations.md#software-artifacts)
 * Add `--set global.customCA.enabled=true` and perform the instructions for [local certificate authority](../../config/org-cert.md).

 The command should look like the following:

diff --git a/docs/admin/runai-setup/self-hosted/ocp/upgrade.md b/docs/admin/runai-setup/self-hosted/ocp/upgrade.md
index 3423cd2f59..f12e8a9564 100644
--- a/docs/admin/runai-setup/self-hosted/ocp/upgrade.md
+++ b/docs/admin/runai-setup/self-hosted/ocp/upgrade.md
@@ -29,7 +29,7 @@ If you are installing an air-gapped version of Run:ai, The Run:ai tar file conta
 === "Airgapped"
     * Ask for a tar file `runai-air-gapped-.tar.gz` from Run:ai customer support. The file contains the new version you want to upgrade to. `` is the updated version of the Run:ai control plane.
-    * Upload the images as described [here](preparations.md#runai-software-files).
+    * Upload the images as described [here](preparations.md#software-artifacts).

 ## Before upgrade

@@ -47,7 +47,7 @@ kubectl delete secret -n runai-backend runai-backend-postgresql
 kubectl delete sts -n runai-backend keycloak runai-backend-postgresql
 ```

-Then upgrade the control plane as described [below](#upgrade-the-control-plane). Before upgrading, find customizations and merge them as discussed below.
+Then upgrade the control plane as described [below](#upgrade-control-plane). Before upgrading, find customizations and merge them as discussed below.

 ### Upgrade from version 2.9, 2.10 or 2.11

@@ -72,7 +72,7 @@ kubectl patch pvc -n runai-backend pvc-postgresql -p '{"metadata": {"annotation
 The Run:ai control-plane installation has been rewritten and is no longer using a _backend values file_. Instead, to customize the installation use standard `--set` flags. If you have previously customized the installation, you must now extract these customizations and add them as `--set` flag to the helm installation:

 * Find previous customizations to the control plane if such exist. Run:ai provides a utility for that here `https://raw.githubusercontent.com/run-ai/docs/v2.13/install/backend/cp-helm-vals-diff.sh`. For information on how to use this utility please contact Run:ai customer support.
-* Search for the customizations you found in the [optional configurations](./backend.md#optional-additional-configurations) table and add them in the new format.
+* Search for the customizations you found in the [optional configurations](./backend.md#additional-runai-configurations-optional) table and add them in the new format.

 ## Upgrade Control Plane

diff --git a/docs/admin/troubleshooting/troubleshooting.md b/docs/admin/troubleshooting/troubleshooting.md
index c1bd740216..f826fbe7e2 100644
--- a/docs/admin/troubleshooting/troubleshooting.md
+++ b/docs/admin/troubleshooting/troubleshooting.md
@@ -61,7 +61,7 @@ Add verbosity to Prometheus as describe [here](diagnostics.md).
 Verify that there are no errors. If there are connectivity-related errors you may need to:

- * Check your firewall for outbound connections. See the required permitted URL list in [Network requirements](../runai-setup/cluster-setup/cluster-prerequisites.md#network-access-requirements.md).
+ * Check your firewall for outbound connections. See the required permitted URL list in [Network requirements](../runai-setup/cluster-setup/cluster-prerequisites.md#network-access-requirements).
 * If you need to set up an internet proxy or certificate, please contact Run:ai customer support.

@@ -250,7 +250,7 @@ __Resolution__
 * Run: `runai pods -n runai | grep agent`. See that the agent is in _Running_ state. Select the agent's full name and run: `kubectl logs -n runai runai-agent-`.
- * Verify that there are no errors. If there are connectivity-related errors you may need to check your firewall for outbound connections. See the required permitted URL list in [Network requirements](../runai-setup/cluster-setup/cluster-prerequisites.md#network-requirements).
+ * Verify that there are no errors. If there are connectivity-related errors you may need to check your firewall for outbound connections. See the required permitted URL list in [Network requirements](../runai-setup/cluster-setup/cluster-prerequisites.md#network-access-requirements).
 * If you need to set up an internet proxy or certificate, please contact Run:ai customer support.

 ??? "Jobs are not syncing"

diff --git a/docs/admin/workloads/README.md b/docs/admin/workloads/README.md
index 125df62201..d19550f04d 100644
--- a/docs/admin/workloads/README.md
+++ b/docs/admin/workloads/README.md
@@ -31,7 +31,7 @@ Third party integrations are tools that Run:ai supports and manages. These are t
 1. Smart gang scheduling (workload aware).
 2. Specific workload aware visibility so that different kinds of pods are identified as a single workload (for example, GPU Utilization, workload view, dashboards).

-For more information, see [Supported integrations](#supported-integrations).
+For more information, see [Supported integrations](#third-party-integrations).

 ### Typical Kubernetes workloads

diff --git a/docs/developer/deprecated/inference/setup.md b/docs/developer/deprecated/inference/setup.md
index 7e620d67fc..2d76550693 100644
--- a/docs/developer/deprecated/inference/setup.md
+++ b/docs/developer/deprecated/inference/setup.md
@@ -47,7 +47,7 @@ kubectl get pods -n runai --selector=app=runai-mps-server -o wide

 * Verify that all mps-server pods are in `Running` state.

-* Submit a workload with MPS enabled using the [--mps](../../../Researcher/cli-reference/runai-submit.md#mig-profile-string) flag. Then run:
+* Submit a workload with MPS enabled using the [--mps](../../../Researcher/cli-reference/runai-submit.md#--mig-profile-string) flag. Then run:

 ```
 runai list
 ```

diff --git a/docs/home/whats-new-2-13.md b/docs/home/whats-new-2-13.md
index 4617fce223..18ebbb8128 100644
--- a/docs/home/whats-new-2-13.md
+++ b/docs/home/whats-new-2-13.md
@@ -145,8 +145,8 @@ The association between workspaces and node pools is done using *Compute resourc
 ## Installation

 * The manual process of upgrading Kubernetes CRDs is no longer needed when upgrading to the most recent version (2.13) of Run:ai.
-* From Run:ai 2.12 and above, the control-plane installation has been simplified and no longer requires the creation of a *backend values* file. Instead, install directly using `helm` as described in [Install the Run:ai Control Plane](../admin/runai-setup/self-hosted/k8s/backend.md#install-the-control-plane).
-* From Run:ai 2.12 and above, the air-gapped, control-plane installation now generates a `custom-env.yaml` values file during the [preparation](../admin/runai-setup/self-hosted/k8s/preparations.md#prepare-installation-artifacts) stage. This is used when installing the [control-plane](../admin/runai-setup/self-hosted/k8s/backend.md#install-the-control-plane).
+* From Run:ai 2.12 and above, the control-plane installation has been simplified and no longer requires the creation of a *backend values* file. Instead, install directly using `helm` as described in [Install the Run:ai Control Plane](../admin/runai-setup/self-hosted/k8s/backend.md#install-the-runai-control-plane).
+* From Run:ai 2.12 and above, the air-gapped, control-plane installation now generates a `custom-env.yaml` values file during the [preparation](../admin/runai-setup/self-hosted/k8s/preparations.md#software-artifacts) stage. This is used when installing the [control-plane](../admin/runai-setup/self-hosted/k8s/backend.md#install-the-runai-control-plane).

 ### Known issues

diff --git a/docs/home/whats-new-2-15.md b/docs/home/whats-new-2-15.md
index be70fcf204..4c823b01dc 100644
--- a/docs/home/whats-new-2-15.md
+++ b/docs/home/whats-new-2-15.md
@@ -38,7 +38,7 @@ date: 2023-Dec-3
 * Improved support for Kubeflow Notebooks. Run:ai now supports the scheduling of Kubeflow notebooks with fractional GPUs. Kubeflow notebooks are identified automatically and appear with a dedicated icon in the *Jobs* UI.
 * Improved the *Trainings* and *Workspaces* forms. Now the runtime field for *Command* and *Arguments* can be edited directly in the new *Workspace* or *Training* creation form.
-* Added new functionality to the Run:ai CLI that allows submitting a workload with multiple service types at the same time in a CSV style format. Both the CLI and the UI now offer the same functionality. For more information, see [runai submit](../Researcher/cli-reference/runai-submit.md#s----service-type-string).
+* Added new functionality to the Run:ai CLI that allows submitting a workload with multiple service types at the same time in a CSV style format. Both the CLI and the UI now offer the same functionality. For more information, see [runai submit](../Researcher/cli-reference/runai-submit.md#-s----service-type-string).
 * Improved functionality in the `runai submit` command so that the port for the container is specified using the `nodeport` flag. For more information, see `runai submit` [--service-type](../Researcher/cli-reference/runai-submit.md#-s-service-type-string) `nodeport`.

 #### Credentials

@@ -51,7 +51,7 @@

 #### Volumes and Storage

-* Added support for Ephemeral volumes in *Workspaces*. Ephemeral storage is temporary storage that gets wiped out and lost when the workspace is deleted. Adding Ephemeral storage to a workspace ties that storage to the lifecycle of the *Workspace* to which it was added. Ephemeral storage is added to the *Workspace* configuration form in the *Volume* pane. For configuration information, see [Create a new workspace](../Researcher/user-interface/workspaces/create/workspace-v2.md#create-a-new-workspace).
+* Added support for Ephemeral volumes in *Workspaces*. Ephemeral storage is temporary storage that gets wiped out and lost when the workspace is deleted. Adding Ephemeral storage to a workspace ties that storage to the lifecycle of the *Workspace* to which it was added. Ephemeral storage is added to the *Workspace* configuration form in the *Volume* pane. For configuration information, see [Create a new workspace](../Researcher/user-interface/workspaces/create/workspace-v2.md).

 #### Templates

@@ -67,7 +67,7 @@ date: 2023-Dec-3

 #### Auto Delete Jobs

-* Added new functionality to the UI and CLI that provides configuration options to automatically delete jobs after a specified amount of time upon completion. Auto-deletion provides more efficient use of resources and makes it easier for researchers to manage their jobs. For more configuration options in the UI, see *Auto deletion* (Step 9) in [Create a new workspace](../Researcher/user-interface/workspaces/create/workspace-v2.md#create-a-new-workspace). For more information on the CLI flag, see [--auto-deletion-time-after-completion](../Researcher/cli-reference/runai-submit.md#-auto-deletion-time-after-completion).
+* Added new functionality to the UI and CLI that provides configuration options to automatically delete jobs after a specified amount of time upon completion. Auto-deletion provides more efficient use of resources and makes it easier for researchers to manage their jobs. For more configuration options in the UI, see *Auto deletion* (Step 9) in [Create a new workspace](../Researcher/user-interface/workspaces/create/workspace-v2.md). For more information on the CLI flag, see [--auto-deletion-time-after-completion](../Researcher/cli-reference/runai-submit.md#-auto-deletion-time-after-completion).

 ### Run:ai Administrator

diff --git a/docs/home/whats-new-2-16.md b/docs/home/whats-new-2-16.md
index fad9843629..19c3ba3db9 100644
--- a/docs/home/whats-new-2-16.md
+++ b/docs/home/whats-new-2-16.md
@@ -42,7 +42,6 @@ date: 2023-Dec-4
 * Added a chart displaying the number of free GPUs per node. Free GPU are GPUs that have not been allocated to a workload.
 * Added a dashlet that displays the total vs. ready resources for GPUs and CPUs. The dashlet indicates how many total nodes are in the platform, and how many are available.
- For more information, see [Total and Ready GPU or CPU Nodes](../admin/performance/dashboard-analysis.md#total-and-ready-gpu-or-cpu-nodes).

 * Added additional columns to the consumption report for both *Projects* and *Departments* tables. The new columns are:

@@ -58,7 +57,7 @@

 #### Policies

-* Added new *Policy Manager. The new *Policy Manager* provides administrators the ability to impose restrictions and default vaules on system resources. The new *Policy Manager* provides a YAML editor for configuration of the policies. Administrators can easily add both *Workspace* or *Training* policies. The editor makes it easy to see the configuration that has been applied and provides a quick and easy method to edit the policies. The new *Policy Editor* brings other important policy features such as the ability to see non-compliant resources in workloads. For more information, see [Policies](../admin/workloads/policies/README.md#policies).
+* Added new *Policy Manager. The new *Policy Manager* provides administrators the ability to impose restrictions and default values on system resources. The new *Policy Manager* provides a YAML editor for the configuration of the policies. Administrators can easily add both *Workspace* or *Training* policies. The editor makes it easy to see the configuration that has been applied and provides a quick and easy method to edit the policies. The new *Policy Editor* brings other important policy features such as the ability to see non-compliant resources in workloads. For more information, see [Policies](../admin/workloads/policies/README.md#policy-editor-ui).
 * Added a new policy manager. Enabling the *New Policy Manager* provides new tools to discover how resources are not compliant. Non-compliant resources and will appear greyed out and cannot be selected. To see how a resource is not compliant, press on the clipboard icon in the upper right hand corner of the resource. Policies can also be applied to specific scopes within the Run:ai platform. For more information, see [Viewing Project Policies](../admin/aiinitiatives/org/projects.md#adding-a-new-project).

diff --git a/docs/home/whats-new-2-17.md b/docs/home/whats-new-2-17.md
index b41968086f..050f6fdc86 100644
--- a/docs/home/whats-new-2-17.md
+++ b/docs/home/whats-new-2-17.md
@@ -126,7 +126,7 @@ Deprecated features will be available for **two** versions ahead of the notifica

 ### API support and endpoint deprecations

-The endpoints and parameters specified in the API reference are the ones that are officially supported by Run:ai. For more information about Run:ai's API support policy and deprecation process, see [Developer overview](../developer/overview-developer.md#administrator-api).
+The endpoints and parameters specified in the API reference are the ones that are officially supported by Run:ai. For more information about Run:ai's API support policy and deprecation process, see [Developer overview](../developer/overview-developer.md#control-plane-api).

 #### Deprecated APIs and API fields

diff --git a/graveyard/whats-new-2-14.md b/graveyard/whats-new-2-14.md
index 28fdac3746..cacf8cd78a 100644
--- a/graveyard/whats-new-2-14.md
+++ b/graveyard/whats-new-2-14.md
@@ -11,7 +11,7 @@ TODO Add RBAC old--new conversion table here. -->

 ### Auto delete jobs

-* Added new functionality to the UI and CLI that provides configuration options which automatically delete jobs after a specified amount of time. Auto-deletion provides more efficient use of resources and makes it easier for researchers to manage their jobs. For more configuration options in the UI, see *Auto deletion* (Step 9) in [Create a new workspace](../Researcher/user-interface/workspaces/create/workspace-v2.md#create-a-new-workspace). For more information on the CLI flag, see [--auto-deletion-time-after-completion](../Researcher/cli-reference/runai-submit.md).
+* Added new functionality to the UI and CLI that provides configuration options which automatically delete jobs after a specified amount of time. Auto-deletion provides more efficient use of resources and makes it easier for researchers to manage their jobs. For more configuration options in the UI, see *Auto deletion* (Step 9) in [Create a new workspace](../Researcher/user-interface/workspaces/create/workspace-v2.md). For more information on the CLI flag, see [--auto-deletion-time-after-completion](../Researcher/cli-reference/runai-submit.md).

 ### Multiple service types

@@ -26,7 +26,7 @@ TODO Add RBAC old--new conversion table here. -->

 ### Ephemeral volumes

-* Added support for Ephemeral volumes in *Workspaces*. Ephemeral storage is tied to the lifecycle of the *Workspace*, which is temporary storage that gets wiped out and lost when the workspace is deleted. Ephemeral storage is added to the *Workspace* configuration form in the *Volume* pane. For configuration information, see [Create a new workspace](../Researcher/user-interface/workspaces/create/workspace-v2.md#create-a-new-workspace).
+* Added support for Ephemeral volumes in *Workspaces*. Ephemeral storage is tied to the lifecycle of the *Workspace*, which is temporary storage that gets wiped out and lost when the workspace is deleted. Ephemeral storage is added to the *Workspace* configuration form in the *Volume* pane. For configuration information, see [Create a new workspace](../Researcher/user-interface/workspaces/create/workspace-v2.md).

 ### Email notifications

@@ -34,7 +34,7 @@ TODO Add RBAC old--new conversion table here. -->

 ### CLI improvements

-* Improved functionality in the `runai submit` command so that the port for the container is specified using the `nodeport` flag. For more information, see `runai submit`, [--service-type](../Researcher/cli-reference/runai-submit.md#s----service-type-string) `nodeport`.
+* Improved functionality in the `runai submit` command so that the port for the container is specified using the `nodeport` flag. For more information, see `runai submit`, [--service-type](../Researcher/cli-reference/runai-submit.md) `nodeport`.

 ### Policy improvements
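
All of the anchors touched above are MkDocs slugs, which the Python-Markdown `toc` extension derives from the heading text: lowercase it, drop punctuation other than spaces and hyphens, and turn runs of spaces into single hyphens. So a heading presumably named `GPU/CPU Overview Dashboard (New and Legacy)` yields `#gpucpu-overview-dashboard-new-and-legacy`. Below is a minimal sketch of a checker for catching stale slugs like the ones this patch fixes; the script name is hypothetical and the slug rules approximate the defaults, so treat it as a sketch rather than part of the patch:

```
#!/usr/bin/env bash
# check-anchor.sh (hypothetical helper, not part of this patch)
# Usage: ./check-anchor.sh <markdown-file> <anchor-slug>
set -euo pipefail

file="$1" anchor="$2"

# Approximate the Python-Markdown toc slugify: lowercase, keep only
# alphanumerics, spaces, and hyphens, then hyphenate runs of spaces.
slugify() {
  tr '[:upper:]' '[:lower:]' <<<"$1" \
    | sed -e 's/[^a-z0-9 -]//g' -e 's/ \{1,\}/-/g'
}

# Compare the wanted slug against every ATX heading in the file.
while IFS= read -r heading; do
  if [ "$(slugify "$heading")" = "$anchor" ]; then
    echo "OK: #$anchor resolves in $file"
    exit 0
  fi
done < <(sed -nE 's/^#{1,6} +//p' "$file")

echo "MISSING: #$anchor not found in $file" >&2
exit 1
```

For example, `./check-anchor.sh docs/admin/performance/dashboard-analysis.md gpucpu-overview-dashboard-new-and-legacy` should print `OK` against the renamed heading, while the old `gpucpu-overview-dashboard` slug would report `MISSING`.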