Commit c2eb3a7

Merge pull request #887 from run-ai/reference-fixes ("Reference fixes")

1 parent bf22d88

File tree

8 files changed: +6, -29 lines

docs/Researcher/Walkthroughs/quickstart-overview.md

Lines changed: 0 additions & 1 deletion

```diff
@@ -7,7 +7,6 @@ Follow the Quickstart documents below to learn more:
 * [Interactive build sessions with externalized services](walkthrough-build-ports.md)
 * [Using GPU Fractions](walkthrough-fractions.md)
 * [Distributed Training](walkthrough-distributed-training.md)
-* [Hyperparameter Optimization](walkthrough-hpo.md)
 * [Over-Quota, Basic Fairness & Bin Packing](walkthrough-overquota.md)
 * [Fairness](walkthrough-queue-fairness.md)
 * [Inference](quickstart-inference.md)
```

docs/Researcher/best-practices/env-variables.md

Lines changed: 0 additions & 7 deletions

```diff
@@ -13,13 +13,6 @@ Run:ai provides the following environment variables:
 Note that the Job can be deleted and then recreated with the same name. A Job UUID will be different even if the Job names are the same.
 
 
-## Identifying a Pod
-
-With [Hyperparameter Optimization](../Walkthroughs/walkthrough-hpo.md), experiments are run as _Pods_ within the Job. Run:ai provides the following environment variables to identify the Pod.
-
-* ``POD_INDEX`` - An index number (0, 1, 2, 3....) for a specific Pod within the Job. This is useful for Hyperparameter Optimization to allow easy mapping to individual experiments. The Pod index will remain the same if restarted (due to a failure or preemption). Therefore, it can be used by the Researcher to identify experiments.
-* ``POD_UUID`` - a unique identifier for the Pod. if the Pod is restarted, the Pod UUID will change.
-
 ## GPU Allocation
 
 Run:ai provides an environment variable, visible inside the container, to help identify the number of GPUs allocated for the container. Use `RUNAI_NUM_OF_GPUS`
```
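For context, the `RUNAI_NUM_OF_GPUS` variable that remains in the doc can be read inside the container as follows. This is a minimal sketch: the variable name comes from the documentation above, while the fallback default of `"0"` for local testing is an assumption of this example.

```python
import os

# RUNAI_NUM_OF_GPUS is documented above as set by Run:ai inside the container.
# Defaulting to "0" when the variable is unset is an assumption for local runs.
num_gpus = int(os.environ.get("RUNAI_NUM_OF_GPUS", "0"))
print(f"GPUs allocated to this container: {num_gpus}")
```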

docs/Researcher/cli-reference/runai-submit.md

Lines changed: 0 additions & 8 deletions

````diff
@@ -50,14 +50,6 @@ runai submit --name frac05 -i gcr.io/run-ai-demo/quickstart -g 0.5
 
 (see: [GPU fractions Quickstart](../Walkthroughs/walkthrough-fractions.md)).
 
-Hyperparameter Optimization
-
-```console
-runai submit --name hpo1 -i gcr.io/run-ai-demo/quickstart-hpo -g 1 \
-  --parallelism 3 --completions 12 -v /nfs/john/hpo:/hpo
-```
-
-(see: [hyperparameter optimization Quickstart](../Walkthroughs/walkthrough-hpo.md)).
 
 Submit a Job without a name (automatically generates a name)
 
````
docs/Researcher/scheduling/the-runai-scheduler.md

Lines changed: 0 additions & 2 deletions

```diff
@@ -226,5 +226,3 @@ To search for good hyperparameters, Researchers typically start a series of smal
 
 With HPO, the Researcher provides a single script that is used with multiple, varying, parameters. Each run is a *pod* (see definition above). Unlike Gang Scheduling, with HPO, pods are **independent**. They are scheduled independently, started, and end independently, and if preempted, the other pods are unaffected. The scheduling behavior for individual pods is exactly as described in the Scheduler Details section above for Jobs.
 In case node pools are enabled, if the HPO workload has been assigned with more than one node pool, the different pods might end up running on different node pools.
-
-For more information on Hyperparameter Optimization in Run:ai see [here](../Walkthroughs/walkthrough-hpo.md)
```
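The HPO model described in this file (independent pods, each running one parameter combination) is commonly implemented by mapping a pod's index to a point in a parameter grid; the `POD_INDEX` variable removed elsewhere in this commit was documented for exactly that purpose. A minimal illustrative sketch, with grid values invented for the example:

```python
import itertools
import os

# Hypothetical hyperparameter grid; the values are invented for illustration.
grid = list(itertools.product([0.1, 0.01, 0.001],   # learning rates
                              [32, 64, 128, 256]))  # batch sizes

# Each independent pod selects one combination via its index
# (POD_INDEX in the Run:ai docs edited by this commit).
pod_index = int(os.environ.get("POD_INDEX", "0"))
lr, batch_size = grid[pod_index % len(grid)]
print(f"pod {pod_index}: lr={lr}, batch_size={batch_size}")
```

Note that the `runai submit` example removed in this same commit used `--completions 12`, which matches a 12-point grid like the 3 × 4 one sketched here.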

docs/admin/troubleshooting/cluster-health-check.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -186,7 +186,7 @@ kubectl get cm runai-public -oyaml
 
 ### Resources not deployed / System unavailable / Reconciliation failed
 
-1. Run the [Preinstall diagnostic script](cluster-prerequisites.md#pre-install-script) and check for issues.
+1. Run the [Preinstall diagnostic script](../runai-setup/cluster-setup/cluster-prerequisites.md#pre-install-script) and check for issues.
 2. Run
 
 ```
````

docs/admin/workloads/README.md

Lines changed: 2 additions & 2 deletions

```diff
@@ -121,8 +121,8 @@ To get the full experience of Run:ai’s environment and platform use the follow
 
 * [Workspaces](../../Researcher/user-interface/workspaces/overview.md#getting-familiar-with-workspaces)
 * [Trainings](../../Researcher/user-interface/trainings.md#trainings) (Only available when using the *Jobs* view)
-* [Distributed trainings](../../Researcher/user-interface/trainings.md#trainings)
-* [Deployment](../admin-ui-setup/deployments.md#viewing-and-submitting-deployments)
+* [Distributed training](../../Researcher/user-interface/trainings.md#trainings)
+* Deployments.
 
 ## Supported integrations
```

docs/admin/workloads/inference-overview.md

Lines changed: 3 additions & 4 deletions

```diff
@@ -30,13 +30,12 @@ Run:ai provides *Inference* services as an equal part together with the other tw
 
 * Multiple replicas will appear in Run:ai as a single *Inference* workload. The workload will appear in all Run:ai dashboards and views as well as the Command-line interface.
 
-* Inference workloads can be submitted via Run:ai [user interface](../admin-ui-setup/deployments.md) as well as [Run:ai API](../../developer/cluster-api/workload-overview-dev.md). Internally, spawning an Inference workload also creates a Kubernetes *Service*. The service is an end-point to which clients can connect.
+* Inference workloads can be submitted via Run:ai user interface as well as [Run:ai API](../../developer/cluster-api/workload-overview-dev.md). Internally, spawning an Inference workload also creates a Kubernetes *Service*. The service is an end-point to which clients can connect.
 
 ## Autoscaling
 
 To withstand SLA, *Inference* workloads are typically set with *auto scaling*. Auto-scaling is the ability to add more computing power (Kubernetes pods) when the load increases and shrink allocated resources when the system is idle.
-
-There are a number of ways to trigger autoscaling. Run:ai supports the following:
+There are several ways to trigger autoscaling. Run:ai supports the following:
 
 | Metric | Units | Run:ai name |
 |-----------------|--------------|-----------------|
@@ -45,7 +44,7 @@ There are a number of ways to trigger autoscaling. Run:ai supports the following
 
 The Minimum and Maximum number of replicas can be configured as part of the autoscaling configuration.
 
-Autoscaling also supports a scale to zero policy with *Throughput* and *Concurrency* metrics, meaning that given enough time under the target threshold, the number of replicas will be scaled down to 0.
+Autoscaling also supports a scale-to-zero policy with *Throughput* and *Concurrency* metrics, meaning that given enough time under the target threshold, the number of replicas will be scaled down to 0.
 
 This has the benefit of conserving resources at the risk of a delay from "cold starting" the model when traffic resumes.
 
```
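The autoscaling policy edited in this file (a per-replica target metric, min/max replica bounds, and scale to zero after sustained idle) can be sketched conceptually. This is not Run:ai's actual algorithm; the function, its parameters, and the clamping logic are illustrative assumptions:

```python
import math

def desired_replicas(concurrency: float, target_per_replica: float,
                     min_replicas: int, max_replicas: int,
                     idle_seconds: float, scale_to_zero_after: float) -> int:
    """Illustrative autoscaling decision; not Run:ai's implementation.

    concurrency: current in-flight requests across all replicas.
    target_per_replica: configured target value for the chosen metric.
    """
    # Scale to zero only after sustained time with no traffic.
    if concurrency == 0 and idle_seconds >= scale_to_zero_after:
        return 0
    # Otherwise size to the load and clamp to the configured bounds.
    wanted = math.ceil(concurrency / target_per_replica)
    return max(min_replicas, min(max_replicas, wanted))
```

With a target of 4 concurrent requests per replica, 10 in-flight requests would ask for 3 replicas, while a long-idle service would drop to 0 and incur the "cold start" delay described above when traffic resumes.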

mkdocs.yml

Lines changed: 0 additions & 4 deletions

```diff
@@ -113,9 +113,6 @@ plugins:
     'admin/runai-setup/cluster-setup/researcher-authentication.md' : 'admin/runai-setup/authentication/sso.md'
     'admin/researcher-setup/cli-troubleshooting.md' : 'admin/troubleshooting/troubleshooting.md'
     'developer/deprecated/inference/submit-via-yaml.md' : 'developer/cluster-api/other-resources.md'
-    'Researcher/researcher-library/rl-hpo-support.md' : 'Researcher/scheduling/hpo.md'
-    'Researcher/researcher-library/researcher-library-overview.md' : 'Researcher/scheduling/hpo.md'
-
 nav:
   - Home:
     - 'Overview': 'index.md'
@@ -217,7 +214,6 @@ nav:
     - 'Dashboard Analysis' : 'admin/admin-ui-setup/dashboard-analysis.md'
     - 'Jobs' : 'admin/admin-ui-setup/jobs.md'
     - 'Credentials' : 'admin/admin-ui-setup/credentials-setup.md'
-    - 'Deployments' : 'admin/admin-ui-setup/deployments.md'
     - 'Templates': 'admin/admin-ui-setup/templates.md'
   - 'Troubleshooting' :
     - 'Cluster Health' : 'admin/troubleshooting/cluster-health-check.md'
```
