|
| 1 | +--- |
| 2 | +title: Data Volumes |
| 3 | +summary: This article explains what data volumes are and how to configure them for use in the Run:ai platform. |
| 4 | +authors: |
| 5 | + - Jason Novich |
| 6 | +date: 2024-Jun-19 |
| 7 | +--- |
| 8 | + |
| 9 | +Data Volumes let you store, manage, and share AI training data within your Run:ai environment. This functionality promotes collaboration, simplifies data access control, and streamlines the AI development lifecycle. |
| 10 | + |
| 11 | +## What are Data Volumes |
| 12 | + |
| 13 | +Data Volumes are snapshots of datasets stored in Kubernetes Persistent Volume Claims (PVCs). They act as a central repository for training data, and offer several key benefits. |
| 14 | + |
| 15 | +* Managed with dedicated permissions: Data admins, a new role within Run:ai, have exclusive control over data volume creation, data population, and sharing. |
| 16 | +* Shared between multiple scopes: Unlike other Run:ai data sources, data volumes can be shared across projects, departments, or clusters. This promotes data reuse and collaboration within your organization. |
| 17 | +* Coupled to workloads in the submission process: Like other Run:ai data sources, data volumes can be attached to AI workloads during submission by specifying the data path within the workload environment. |
| 18 | + |
| 19 | +!!! Note |
| 20 | + Data volumes are not versioned. |
| 21 | + |
| 22 | +## Data volumes use cases |
| 23 | + |
| 24 | +The following are typical use cases for Data Volumes: |
| 25 | + |
| 26 | +* Sharing large data sets with multiple researchers in your organization: Data often originates in a remote location. Even after it is moved into the cluster, sharing it with multiple users is still hard. Data volumes let you share it while keeping control over who can access it. |
| 27 | +* Sharing data created during the AI work cycle: When you need to share training results, generated data sets, or other artifacts with team members, a data volume lets you package that data and share it with your colleagues. |
| 28 | + |
| 29 | +## Data volumes authorization |
| 30 | + |
| 31 | +The `Data Volumes Administrator` role is dedicated to managing data volumes. |
| 32 | + |
| 33 | +The role contains two permission entities: |
| 34 | + |
| 35 | +* Data volumes - CRUD |
| 36 | +* Data volumes - sharing list - CRUD |
| 37 | + |
| 38 | +Data volumes (the scope must include the data volume's origin project): |
| 39 | + |
| 40 | +* Can create a data volume (DV) in the scope |
| 41 | +* Can read a DV in the scope |
| 42 | +* Can update a DV in the scope |
| 43 | +* Can delete a DV in the scope (even if the DV is shared outside its scope) |
| 44 | + |
| 45 | +Data volumes - sharing list: |
| 46 | + |
| 47 | +* Can share a DV in the scope |
| 48 | +* Can unshare a DV from the scope |
| 49 | + |
| 50 | +### Data volume administrator permissions |
| 51 | + |
| 52 | +| Entity | Permissions | |
| 53 | +| --- | --- | |
| 54 | +| Data volumes | CRUD | |
| 55 | +| Data volumes - sharing list | CRUD | |
| 56 | +| Account | R | |
| 57 | +| Department | R | |
| 58 | +| Project | R | |
| 59 | +| Jobs | R | |
| 60 | +| Workloads | R | |
| 61 | +| Cluster | R | |
| 62 | +| Overview dashboard | R | |
| 63 | +| Consumption dashboard | R | |
| 64 | +| Analytics dashboard | R | |
| 65 | +| Policies | R | |
| 67 | +| Workspaces | R | |
| 68 | +| Trainings | R | |
| 69 | +| Environments | R | |
| 70 | +| Compute resources | R | |
| 71 | +| Templates | R | |
| 72 | +| Data source | R | |
| 73 | +| Inferences | R | |
| 74 | + |
| 75 | +### Data volume permissions for each role |
| 76 | + |
| 77 | +| Role | DV permissions | |
| 78 | +| --- | --- | |
| 79 | +| Data volume administrator | DV CRUD, Sharing CRUD | |
| 80 | +| System administrator | DV CRUD, Sharing CRUD | |
| 81 | +| Department admin | DV CRUD, Sharing CRUD | |
| 82 | +| Department viewer | DV R, Sharing R | |
| 83 | +| Research manager | DV CRUD, Sharing CRUD | |
| 84 | +| Editor | DV CRUD | |
| 85 | +| L1 researcher | DV CRUD | |
| 86 | +| L2 researcher | DV R | |
| 87 | +| ML engineer | DV R | |
| 88 | +| Assets admins | DV R | |
| 89 | +| Application admin | DV R | |
| 90 | +| Cloud operator | DV CRUD, Sharing CRUD | |
| 91 | +| Viewer | DV R | |
| 92 | + |
| 93 | +## Using Data volumes |
| 94 | + |
| 95 | +This section outlines how to create and share data volumes, and how researchers use them when submitting workloads. |
| 96 | + |
| 97 | +### Creating Data Volumes |
| 98 | + |
| 99 | +!!! Note |
| 100 | + Data volume admins can create data volumes within specific projects. Because data volumes are created from PVCs, a PVC must already exist in the namespace of the Run:ai project so that Run:ai can access it and create the data volume from it. Once the DV is created, the admin manages its sharing configuration. |
| 101 | + |
| 102 | +Data volumes are created through the Run:ai API. See the endpoint reference: |
| 103 | + |
| 104 | +[Data Volumes](https://app.run.ai/api/docs#tag/Data-Volumes) |
| 105 | + |
| 106 | +### Sharing Data volumes |
| 107 | + |
| 108 | +Sharing permissions are a sub-entity of the data volume management permissions, which means the two can be assigned independently: a user can have permission to create a DV but not to share it, and vice versa. A data volume can be shared with one or more scopes. Users in every scope with which the DV is shared can use it in their workloads. |
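
The separation between data volume permissions and sharing permissions can be illustrated with a small model. The following is a minimal sketch, not Run:ai code: the permission strings for the four roles are copied from the role table in this article, while the dictionary layout, the `DataVolume` class, and the `is_allowed` and `share` helpers are hypothetical names introduced only for illustration. It also assumes that scopes are flat names (the real platform has a hierarchy of clusters, departments, and projects), that a DV is always usable in its origin project, and that sharing requires the "create" permission on the sharing list.

```python
from dataclasses import dataclass, field

# Permission strings taken from the role table in this article.
# The structure itself is an illustrative assumption, not a Run:ai API.
ROLE_PERMISSIONS = {
    "Data volume administrator": {"dv": "CRUD", "sharing": "CRUD"},
    "Department viewer": {"dv": "R", "sharing": "R"},
    "Editor": {"dv": "CRUD", "sharing": ""},
    "Viewer": {"dv": "R", "sharing": ""},
}

ACTIONS = {"create": "C", "read": "R", "update": "U", "delete": "D"}


def is_allowed(role: str, entity: str, action: str) -> bool:
    """Return True if the role's permission string for the entity includes the action."""
    granted = ROLE_PERMISSIONS.get(role, {}).get(entity, "")
    return ACTIONS[action] in granted


@dataclass
class DataVolume:
    """Hypothetical stand-in for a data volume and the scopes it is shared with."""
    name: str
    origin_project: str
    shared_scopes: set = field(default_factory=set)

    def usable_in(self, scope: str) -> bool:
        # Assumption: a DV is usable in its origin project and in every shared scope.
        return scope == self.origin_project or scope in self.shared_scopes


def share(role: str, volume: DataVolume, scope: str) -> None:
    """Share a volume with a scope if the role holds the sharing permission."""
    # Assumption: sharing maps to the "create" action on the sharing list.
    if not is_allowed(role, "sharing", "create"):
        raise PermissionError(f"{role} cannot share data volumes")
    volume.shared_scopes.add(scope)


if __name__ == "__main__":
    dv = DataVolume(name="training-set", origin_project="project-a")
    share("Data volume administrator", dv, "project-b")
    print(dv.usable_in("project-b"))
    print(is_allowed("Editor", "dv", "create"), is_allowed("Editor", "sharing", "create"))
```

In this model an Editor can create a data volume but cannot share it, which mirrors the independence of the two permission entities described above.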