Small fixes #924

Merged
merged 3 commits on Aug 7, 2024
2 changes: 1 addition & 1 deletion docs/Researcher/best-practices/researcher-notifications.md
@@ -11,7 +11,7 @@ date: 2024-Jul-4

Managing numerous data science workloads requires monitoring various stages, including submission, scheduling, initialization, execution, and completion. Additionally, handling suspensions and failures is crucial for ensuring timely workload completion. Email Notifications address this need by sending alerts for critical workload life cycle changes. This empowers data scientists to take necessary actions and prevent delays.

- Once the system administrator configures the email notifications, users will receive notifications about their jobs that transition from one status to another. In addition, the user will get warning notifications before workload termination due to project-defined timeouts. Details included in the email are:
+ Once the system administrator [configures the email notifications](../../admin/runai-setup/notifications/notifications.md), users will receive notifications about their jobs that transition from one status to another. In addition, the user will get warning notifications before workload termination due to project-defined timeouts. Details included in the email are:

* Workload type
* Project and cluster information
4 changes: 2 additions & 2 deletions docs/Researcher/cli-reference/runai-submit-dist-TF.md
@@ -75,7 +75,7 @@ runai submit-dist tf --name distributed-job --workers=2 -g 1 \

#### --create-home-dir

- > Create a temporary home directory for the user in the container. Data saved in this directory will not be saved when the container exits. For more information see [non root containers](../../admin/runai-setup/config/non-root-containers.md).
+ > Create a temporary home directory for the user in the container. Data saved in this directory will not be saved when the container exits. For more information see [non root containers](../../admin/authentication/non-root-containers.md).
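As a usage sketch (the image name is a placeholder, not part of this PR; the other flags follow the example at the top of this file):

```sh
# Hypothetical sketch: give the container user a temporary home directory.
# Anything written under it is discarded when the container exits.
runai submit-dist tf --name distributed-job --workers=2 -g 1 \
  -i my-registry/tf-training:latest \
  --create-home-dir
```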

#### -e `<stringArray>` | --environment `<stringArray>`

@@ -335,7 +335,7 @@ runai submit-dist tf --name distributed-job --workers=2 -g 1 \

#### --run-as-user

- > Run in the context of the current user running the Run:ai command rather than the root user. While the default container user is _root_ (same as in Docker), this command allows you to submit a Job running under your Linux user. This would manifest itself in access to operating system resources, in the owner of new folders created under shared directories, etc. Alternatively, if your cluster is connected to Run:ai via SAML, you can map the container to use the Linux UID/GID which is stored in the organization's directory. For more information see [non root containers](../../admin/runai-setup/config/non-root-containers.md).
+ > Run in the context of the current user running the Run:ai command rather than the root user. While the default container user is _root_ (same as in Docker), this command allows you to submit a Job running under your Linux user. This would manifest itself in access to operating system resources, in the owner of new folders created under shared directories, etc. Alternatively, if your cluster is connected to Run:ai via SAML, you can map the container to use the Linux UID/GID which is stored in the organization's directory. For more information see [non root containers](../../admin/authentication/non-root-containers.md).
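A minimal sketch under the same assumptions (placeholder image):

```sh
# Hypothetical sketch: run the workers under the submitting user's Linux
# UID/GID instead of root, so files created on shared storage are owned
# by that user.
runai submit-dist tf --name distributed-job --workers=2 -g 1 \
  -i my-registry/tf-training:latest \
  --run-as-user
```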

### Scheduling

4 changes: 2 additions & 2 deletions docs/Researcher/cli-reference/runai-submit-dist-mpi.md
@@ -78,7 +78,7 @@ You can start an unattended mpi training Job of name dist1, based on Project *te

#### --create-home-dir

- > Create a temporary home directory for the user in the container. Data saved in this directory will not be saved when the container exits. For more information see [non root containers](../../admin/runai-setup/config/non-root-containers.md).
+ > Create a temporary home directory for the user in the container. Data saved in this directory will not be saved when the container exits. For more information see [non root containers](../../admin/authentication/non-root-containers.md).

#### -e `<stringArray>` | --environment `<stringArray>`

@@ -334,7 +334,7 @@ You can start an unattended mpi training Job of name dist1, based on Project *te

#### --run-as-user

- > Run in the context of the current user running the Run:ai command rather than the root user. While the default container user is _root_ (same as in Docker), this command allows you to submit a Job running under your Linux user. This would manifest itself in access to operating system resources, in the owner of new folders created under shared directories, etc. Alternatively, if your cluster is connected to Run:ai via SAML, you can map the container to use the Linux UID/GID which is stored in the organization's directory. For more information see [non root containers](../../admin/runai-setup/config/non-root-containers.md).
+ > Run in the context of the current user running the Run:ai command rather than the root user. While the default container user is _root_ (same as in Docker), this command allows you to submit a Job running under your Linux user. This would manifest itself in access to operating system resources, in the owner of new folders created under shared directories, etc. Alternatively, if your cluster is connected to Run:ai via SAML, you can map the container to use the Linux UID/GID which is stored in the organization's directory. For more information see [non root containers](../../admin/authentication/non-root-containers.md).

### Scheduling

4 changes: 2 additions & 2 deletions docs/Researcher/cli-reference/runai-submit-dist-pytorch.md
@@ -82,7 +82,7 @@ runai submit-dist pytorch --name distributed-job --workers=2 -g 1 \

#### --create-home-dir

- > Create a temporary home directory for the user in the container. Data saved in this directory will not be saved when the container exits. For more information see [non root containers](../../admin/runai-setup/config/non-root-containers.md).
+ > Create a temporary home directory for the user in the container. Data saved in this directory will not be saved when the container exits. For more information see [non root containers](../../admin/authentication/non-root-containers.md).

#### -e `<stringArray>` | --environment `<stringArray>`

@@ -342,7 +342,7 @@ runai submit-dist pytorch --name distributed-job --workers=2 -g 1 \

#### --run-as-user

- > Run in the context of the current user running the Run:ai command rather than the root user. While the default container user is _root_ (same as in Docker), this command allows you to submit a Job running under your Linux user. This would manifest itself in access to operating system resources, in the owner of new folders created under shared directories, etc. Alternatively, if your cluster is connected to Run:ai via SAML, you can map the container to use the Linux UID/GID which is stored in the organization's directory. For more information see [non root containers](../../admin/runai-setup/config/non-root-containers.md).
+ > Run in the context of the current user running the Run:ai command rather than the root user. While the default container user is _root_ (same as in Docker), this command allows you to submit a Job running under your Linux user. This would manifest itself in access to operating system resources, in the owner of new folders created under shared directories, etc. Alternatively, if your cluster is connected to Run:ai via SAML, you can map the container to use the Linux UID/GID which is stored in the organization's directory. For more information see [non root containers](../../admin/authentication/non-root-containers.md).

### Scheduling

4 changes: 2 additions & 2 deletions docs/Researcher/cli-reference/runai-submit-dist-xgboost.md
@@ -70,7 +70,7 @@ runai submit-dist xgboost --name distributed-job --workers=2 -g 1 \

#### --create-home-dir

- > Create a temporary home directory for the user in the container. Data saved in this directory will not be saved when the container exits. For more information see [non root containers](../../admin/runai-setup/config/non-root-containers.md).
+ > Create a temporary home directory for the user in the container. Data saved in this directory will not be saved when the container exits. For more information see [non root containers](../../admin/authentication/non-root-containers.md).

#### -e `<stringArray>` | --environment `<stringArray>`

@@ -326,7 +326,7 @@ runai submit-dist xgboost --name distributed-job --workers=2 -g 1 \

#### --run-as-user

- > Run in the context of the current user running the Run:ai command rather than the root user. While the default container user is _root_ (same as in Docker), this command allows you to submit a Job running under your Linux user. This would manifest itself in access to operating system resources, in the owner of new folders created under shared directories, etc. Alternatively, if your cluster is connected to Run:ai via SAML, you can map the container to use the Linux UID/GID which is stored in the organization's directory. For more information see [non root containers](../../admin/runai-setup/config/non-root-containers.md).
+ > Run in the context of the current user running the Run:ai command rather than the root user. While the default container user is _root_ (same as in Docker), this command allows you to submit a Job running under your Linux user. This would manifest itself in access to operating system resources, in the owner of new folders created under shared directories, etc. Alternatively, if your cluster is connected to Run:ai via SAML, you can map the container to use the Linux UID/GID which is stored in the organization's directory. For more information see [non root containers](../../admin/authentication/non-root-containers.md).

### Scheduling

4 changes: 2 additions & 2 deletions docs/Researcher/cli-reference/runai-submit.md
@@ -144,7 +144,7 @@ runai submit --job-name-prefix -i gcr.io/run-ai-demo/quickstart -g 1

#### --create-home-dir

- > Create a temporary home directory for the user in the container. Data saved in this directory will not be saved when the container exits. For more information see [non root containers](../../admin/runai-setup/config/non-root-containers.md).
+ > Create a temporary home directory for the user in the container. Data saved in this directory will not be saved when the container exits. For more information see [non root containers](../../admin/authentication/non-root-containers.md).

#### -e `<stringArray>` | --environment `<stringArray>`

@@ -400,7 +400,7 @@ runai submit --job-name-prefix -i gcr.io/run-ai-demo/quickstart -g 1

#### --run-as-user

- > Run in the context of the current user running the Run:ai command rather than the root user. While the default container user is *root* (same as in Docker), this command allows you to submit a Job running under your Linux user. This would manifest itself in access to operating system resources, in the owner of new folders created under shared directories, etc. Alternatively, if your cluster is connected to Run:ai via SAML, you can map the container to use the Linux UID/GID which is stored in the organization's directory. For more information see [non root containers](../../admin/runai-setup/config/non-root-containers.md).
+ > Run in the context of the current user running the Run:ai command rather than the root user. While the default container user is *root* (same as in Docker), this command allows you to submit a Job running under your Linux user. This would manifest itself in access to operating system resources, in the owner of new folders created under shared directories, etc. Alternatively, if your cluster is connected to Run:ai via SAML, you can map the container to use the Linux UID/GID which is stored in the organization's directory. For more information see [non root containers](../../admin/authentication/non-root-containers.md).

### Scheduling

2 changes: 1 addition & 1 deletion docs/Researcher/overview-researcher.md
@@ -3,7 +3,7 @@ title: Researcher Documentation Overview
---
# Overview: Researcher Documentation

- Researchers use Run:ai to submit jobs.
+ _Researchers_, or _AI practitioners_, use Run:ai to submit Workloads.

As part of the Researcher documentation you will find:

@@ -18,7 +18,7 @@ then run `id`, you will see the **root** user.

## Use Run:ai flags to limit root access

- There are two [runai submit](../../../Researcher/cli-reference/runai-submit.md) flags which control user identity at the Researcher level:
+ There are two [runai submit](../../Researcher/cli-reference/runai-submit.md) flags that control user identity at the Researcher level:

* The flag `--run-as-user` starts the container with a specific user. The user is the current Linux user (see below for other behaviors if used in conjunction with Single sign-on).
* The flag `--prevent-privilege-escalation` prevents the container from elevating its own privileges into `root` (e.g. running `sudo` or changing system files). A combined sketch of both flags follows this list.
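A combined sketch (job name and image are placeholders):

```sh
# Hypothetical sketch: start the container under the current Linux user and
# block privilege escalation back to root (e.g. via sudo).
runai submit my-job -i my-registry/my-image:latest -g 1 \
  --run-as-user --prevent-privilege-escalation
```

Running `id` inside the container should then report your UID/GID rather than `root`.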
@@ -50,7 +50,7 @@ then verify that you cannot run `su` to become root within the container.
### Setting a Cluster-Wide Default


- The two flags are voluntary. They are not enforced by the system. It is however possible to enforce them using [Policies](../../workloads/policies/policies.md). Polices allow an Administrator to force compliance on both the User Interface and Command-line interface.
+ The two flags are voluntary. They are not enforced by the system. It is however possible to enforce them using [Policies](../workloads/policies/policies.md). Policies allow an Administrator to force compliance on both the User Interface and Command-line interface.


## Passing user identity
@@ -60,7 +60,7 @@ A best practice is to store the user identifier (UID) and the group identifier (G

To perform this, you must:

- * Set up [single sign-on](../../authentication/authentication-overview.md). Perform the steps for UID/GID integration.
+ * Set up [single sign-on](authentication-overview.md). Perform the steps for UID/GID integration.
* Run: `runai login` and enter your credentials
* Use the flag `--run-as-user` (a combined sketch follows)
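Put together, a hypothetical session looks like this (job name and image are placeholders):

```sh
# Log in so Run:ai can resolve the UID/GID stored in the organization's
# directory (requires the single sign-on setup above).
runai login

# Submit under that identity.
runai submit my-job -i my-registry/my-image:latest -g 1 --run-as-user
```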

10 changes: 5 additions & 5 deletions docs/admin/overview-administrator.md
@@ -6,9 +6,9 @@ The Infrastructure Administrator is an IT person, responsible for the installati
As part of the Infrastructure Administrator documentation you will find:

* Install Run:ai
- * How to set up and modify a GPU cluster with Run:ai.
+ * Set up a Run:ai Cluster.
* Set up Researchers to work with Run:ai.
- * Configure the Run:ai system
- * Setup users by connecting Run:ai to an identity provider.
- * IT maintenance of the Run:ai system
- * Troubleshooting Run:ai and understanding cluster health.
+ * IT Configuration of the Run:ai system
+ * Connect Run:ai to an identity provider.
+ * Maintenance & monitoring of the Run:ai system
+ * Troubleshooting.
9 changes: 6 additions & 3 deletions docs/admin/runai-setup/config/overview.md
@@ -9,9 +9,12 @@ This section provides a list of installation-related articles dealing with a wid
| Article | Purpose |
|---------------------------------------------------------|-----------|
| [Designating Specific Role Nodes](node-roles.md) | Set one or more designated Run:ai system nodes or limit Run:ai monitoring and scheduling to specific nodes in the cluster. |
| [Setup Project-based Researcher Access Control](../../authentication/researcher-authentication.md) | Enable Run:ai access control at the __Project__ level. |
| [Single sign-on](../../authentication/authentication-overview.md) | Integrate with the organization's Identity Provider to provide single sign-on for Run:ai |
| [Review Kubernetes Access provided to Run:ai](access-roles.md) | In Restrictive Kubernetes environments such as when using OpenShift, understand and control what Kubernetes roles are provided to Run:ai |
| [External access to Containers](allow-external-access-to-containers.md) | Understand the available options for Researchers to access containers from the outside |
| [User Identity in Container](non-root-containers.md) | The identity of the user in the container determines its access to cluster resources. The document explains multiple ways to propagate the user identity into the container. |
| [Install the Run:ai Administrator Command-line Interface](cli-admin-install.md) | The Administrator command-line is useful in a variety of flows such as cluster upgrade, node setup etc. |
| [Set Node affinity with cloud node pools](node-affinity-with-cloud-node-pools.md) | Set node affinity when using a cloud provider for your cluster |
| [Local Certificate Authority](org-cert.md) | For self-hosted Run:ai environments, specifically air-gapped installation, set up a local certificate authority to allow customers to safely connect to Run:ai |
| [Backup & Restore](dr.md) | For self-hosted Run:ai environments, set up a scheduled backup of Run:ai data |
| [High Availability](ha.md) | Configure Run:ai such that it will continue to provide service even if parts of the system are down. |
| [Scaling](large-clusters.md) | Scale the Run:ai cluster and the Run:ai control-plane to withstand large transaction loads |
| [Emails and system notification](../notifications/notifications.md) | Configure e-mail notification |
2 changes: 1 addition & 1 deletion docs/home/overview.md
@@ -45,7 +45,7 @@ Run:ai cloud availability is monitored at [status.run.ai](https://status.run.ai)

As an IT Administrator, you can collect Run:ai logs to send to support:

- * Install the [Run:ai Administrator command-line interface](admin/runai-setup/config/cli-admin-install.md).
+ * Install the [Run:ai Administrator command-line interface](../admin/runai-setup/config/cli-admin-install.md).
* Run `runai-adm collect-logs`. The command will generate a compressed file containing all of the existing Run:ai log files.
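For example (a sketch; only the documented command is assumed):

```sh
# Gather the existing Run:ai log files into a compressed archive for support.
runai-adm collect-logs
```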

!!! Note
2 changes: 1 addition & 1 deletion docs/platform-admin/overview.md
@@ -7,7 +7,7 @@ The Platform Administrator is responsible for the day-to-day administration of t
As part of the Platform Administrator documentation you will find:


- * Provide the right access to system users.
+ * Provide the right access level to users.
* Configure Run:ai meta-data such as Projects, Departments, Node pools etc.
* Setup Workload Policies and Assets
* Analyze system performance and perform suggested actions.
4 changes: 2 additions & 2 deletions docs/snippets/common-submit-cli-commands.md
@@ -37,7 +37,7 @@

#### --create-home-dir

- > Create a temporary home directory for the user in the container. Data saved in this directory will not be saved when the container exits. For more information see [non root containers](../admin/runai-setup/config/non-root-containers.md).
+ > Create a temporary home directory for the user in the container. Data saved in this directory will not be saved when the container exits. For more information see [non root containers](../admin/authentication/non-root-containers.md).

#### -e `<stringArray>` | --environment `<stringArray>`

@@ -265,7 +265,7 @@

#### --run-as-user

- > Run in the context of the current user running the Run:ai command rather than the root user. While the default container user is _root_ (same as in Docker), this command allows you to submit a Job running under your Linux user. This would manifest itself in access to operating system resources, in the owner of new folders created under shared directories, etc. Alternatively, if your cluster is connected to Run:ai via SAML, you can map the container to use the Linux UID/GID which is stored in the organization's directory. For more information see [non root containers](../admin/runai-setup/config/non-root-containers.md).
+ > Run in the context of the current user running the Run:ai command rather than the root user. While the default container user is _root_ (same as in Docker), this command allows you to submit a Job running under your Linux user. This would manifest itself in access to operating system resources, in the owner of new folders created under shared directories, etc. Alternatively, if your cluster is connected to Run:ai via SAML, you can map the container to use the Linux UID/GID which is stored in the organization's directory. For more information see [non root containers](../admin/authentication/non-root-containers.md).

### Scheduling
