## Getting CoCos

1. Fork the [CoCos repository](https://github.com/ultravioletrs/cocos) to your GitHub account.
2. Clone your fork:

   ```shell
   git clone <your-fork-url> $SOMEPATH/cocos
   cd $SOMEPATH/cocos
   ```

## Build Environment

The project uses Go and Protocol Buffers. Make sure the following tools are installed (a quick version check follows the list):

- [Go](https://go.dev/doc/install) 1.20 or later
- [Protocol Buffers](https://grpc.io/docs/languages/go/quickstart/)
- [GNU Make](https://www.gnu.org/software/make/)
- [QEMU-KVM](https://www.qemu.org/) for running local VMs
- Optional: [Buildroot](https://buildroot.org/) when building the HAL image
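
A quick way to confirm the tools are available on your `PATH` (the version hints in the comments are only illustrative):

```shell
go version                     # expect go1.20 or later
protoc --version               # any recent libprotoc release
make --version
qemu-system-x86_64 --version
```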

### Building All Services

Run `make` in the repository root to compile the Agent, CLI, and Manager. Artifacts are placed in the `build` directory. You can also build a single component:

```shell
make cli      # produces ./build/cocos-cli
make manager  # produces ./build/cocos-manager
make agent    # produces ./build/cocos-agent
```

### Building the HAL Image

The HAL is a minimal Linux distribution used inside the confidential VM. To build it, clone Buildroot and run:

```shell
git clone https://github.com/buildroot/buildroot.git
cd buildroot
git checkout 2024.11-rc2
make BR2_EXTERNAL=../cocos/hal/linux cocos_defconfig
make menuconfig # optional, for additional configuration
make
```

The kernel image and root filesystem appear in `buildroot/output/images`. Copy `bzImage` and `rootfs.cpio.gz` to `cmd/manager/img` when testing locally.
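
For example, assuming `buildroot` was cloned next to `cocos` (adjust the paths to your layout), you can copy the artifacts from inside the `buildroot` directory:

```shell
mkdir -p ../cocos/cmd/manager/img
cp output/images/bzImage output/images/rootfs.cpio.gz ../cocos/cmd/manager/img/
```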

### Testing the HAL Image

After building, you can boot a VM that runs the Agent using QEMU. Substitute the paths for your system:

```shell
sudo find / -name OVMF_CODE.fd
OVMF_CODE=/usr/share/OVMF/OVMF_CODE.fd
sudo find / -name OVMF_VARS.fd
OVMF_VARS=/usr/share/OVMF/OVMF_VARS.fd

KERNEL=buildroot/output/images/bzImage
INITRD=buildroot/output/images/rootfs.cpio.gz
IGVM=svsm/bin/coconut-qemu.igvm
ENV_PATH=<path>/<to>/<env_directory>
CERT_PATH=<path>/<to>/<cert_directory>

sudo qemu-system-x86_64 \
  -enable-kvm \
  -cpu EPYC-v4 \
  -machine q35 \
  -smp 4,maxcpus=16 \
  -m 8G,slots=5,maxmem=30G \
  -netdev user,id=vmnic,hostfwd=tcp::7022-:7002 \
  -device virtio-net-pci,disable-legacy=on,iommu_platform=true,netdev=vmnic,romfile= \
  -machine confidential-guest-support=sev0,memory-backend=ram1,igvm-cfg=igvm0 \
  -object memory-backend-memfd,id=ram1,size=8G,share=true,prealloc=false,reserve=false \
  -object sev-snp-guest,id=sev0,cbitpos=51,reduced-phys-bits=1 \
  -object igvm-cfg,id=igvm0,file=$IGVM \
  -kernel $KERNEL \
  -append "console=null quiet" \
  -initrd $INITRD \
  -nographic \
  -monitor pty \
  -monitor unix:monitor,server,nowait \
  -fsdev local,id=env_fs,path=$ENV_PATH,security_model=mapped \
  -device virtio-9p-pci,fsdev=env_fs,mount_tag=env_share \
  -fsdev local,id=cert_fs,path=$CERT_PATH,security_model=mapped \
  -device virtio-9p-pci,fsdev=cert_fs,mount_tag=certs_share
```

The default login password is `root`.

### Testing the Agent Independently

The Agent, once started, waits to connect to a computations management server via gRPC. For testing purposes you can also start the Agent independently of the VM:

```shell
cd cocos

AGENT_CVM_GRPC_URL=<cvms_server_host:port> \
AGENT_CVM_GRPC_CLIENT_CERT=<path-to-client-cert> \
AGENT_CVM_GRPC_CLIENT_KEY=<path-to-client-key> \
AGENT_CVM_GRPC_SERVER_CA_CERT=<path-to-server-ca-cert> \
go run cmd/agent/main.go \
  -algo-path <path-to-algorithm> \
  -public-key-path <path-to-public-key> \
  -attested-tls-bool <true|false> \
  -data-paths <comma-separated-data-paths> \
  -client-ca-file <path-to-client-ca-file> \
  -ca-url <ca-url-if-attestedTLS-true> \
  -cvm-id <cvm-id-if-attestedTLS-true>
```

Once up, the Agent attempts to connect to the computations management server at `AGENT_CVM_GRPC_URL`. If the Agent and the computations management server run on the same host, the server's local IP address is sufficient. If the Agent runs inside the VM, the server must be reachable from the VM, e.g. via the public IP address of the computations management server.

Using `localhost` as the `AGENT_CVM_GRPC_URL` therefore only works when the Agent runs outside a VM on the same host as the computations management server; from inside a VM, `localhost` refers to the VM itself and the connection will fail.
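
For example, to point the Agent at a server running on another machine (the address and port below are placeholders; use your server's actual values):

```shell
# On the server host, list addresses the Agent can reach:
hostname -I

# Then point the Agent at one of them:
export AGENT_CVM_GRPC_URL=192.0.2.10:7001
```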

A running computations management server is required for the Agent to function: the Agent connects to it and waits for a computation manifest. Instructions for running a test computations management server are provided in the [CVMs server documentation](/docs/getting-started.md#run-the-server).

### Testing the Manager

A simple gRPC server is provided under `test/cvms/main.go` for development. Start it by following the instructions in the [CVMs server documentation](/docs/getting-started.md#run-the-server).

Create `img` and `tmp` directories inside `cmd/manager` and copy the built kernel and root filesystem from the Buildroot output into `cmd/manager/img`. Then run the Manager:

```shell
cd cmd/manager
MANAGER_QEMU_SMP_MAXCPUS=4 \
MANAGER_GRPC_URL=localhost:7002 \
MANAGER_LOG_LEVEL=debug \
MANAGER_QEMU_USE_SUDO=false \
MANAGER_QEMU_ENABLE_SEV_SNP=false \
MANAGER_QEMU_SEV_SNP_CBITPOS=51 \
MANAGER_QEMU_OVMF_CODE_FILE=/usr/share/edk2/x64/OVMF_CODE.fd \
MANAGER_QEMU_OVMF_VARS_FILE=/usr/share/edk2/x64/OVMF_VARS.fd \
./build/cocos-manager
```

The Manager starts a gRPC server and waits for client connections, which are used to create and manage VMs. More information on running the Manager can be found in the [Manager docs](/docs/manager.md).

### Manager Environment Configuration

When running under systemd or via `make run`, the Manager reads its variables from `/etc/cocos/cocos-manager.env`. This file defines gRPC options and numerous `MANAGER_QEMU_*` settings controlling the VM image, memory size, and CPU parameters. Adjust these values before starting the service if custom resources or ports are required.

Example entries from `cocos-manager.env`:

```shell
# Manager Service Configuration
MANAGER_GRPC_PORT=6101
MANAGER_GRPC_HOST=0.0.0.0

# QEMU Configuration
MANAGER_QEMU_MEMORY_SIZE=25G
MANAGER_QEMU_OVMF_CODE_FILE=/usr/share/edk2/x64/OVMF_CODE.fd
```

### Running Manager as a Service

The repository provides a systemd unit at `init/systemd/cocos-manager.service`. Install the binary, configuration, and unit file with:

```shell
sudo make install_service
```

Start the Manager via systemd:

```shell
sudo systemctl start cocos-manager.service
```

You can also run `make run` to install the service and immediately start it.
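
If the Manager should come back up automatically after a reboot, you can additionally enable the unit (standard systemd usage, not specific to CoCos):

```shell
sudo systemctl enable cocos-manager.service
```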

## Code Generation

Whenever `.proto` files are modified, regenerate the Go sources with:

```shell
make protoc
```

Mocks for unit tests rely on method signatures. Refresh them after interface changes:

```shell
make mocks
```

---

## Building a Custom Computation Management Server

To integrate with CoCos agents, implement a gRPC server that conforms to the [`cvms.proto`](https://github.com/ultravioletrs/cocos/blob/main/agent/cvms/cvms.proto) interface.

The core method to implement is:

```proto
rpc Process(stream ClientStreamMessage) returns (stream ServerStreamMessage);
```

This is a **bi-directional streaming RPC** where the **client (the CoCos agent)** and the **server (your control plane)** exchange messages continuously over a long-lived connection.

### Server-Side Requests

The server sends the following messages to the agent:

- **`ComputationRunReq`**:
  Triggers execution of a new computation. Includes details of the computation and the datasets required.

- **`RunReqChunks`**:
  Used to stream large payloads (e.g., binaries or configs). Sent in sequence before the computation starts.

- **`AgentStateReq`**:
  Requests a snapshot of the agent's current state.

- **`StopComputation`**:
  Instructs the agent to stop a running computation gracefully.

- **`DisconnectReq`**:
  Instructs the agent to close the current connection, e.g. when a CVM is being terminated.

### Agent-Side Responses

The agent responds with the following messages:

- **`RunResponse`**:
  Acknowledges receipt and execution of a computation run. Includes the computation ID and an error, if any.

- **`AgentLog`**:
  Streams runtime logs from the agent, useful for observability and debugging.

- **`AgentEvent`**:
  Reports events from the stages the agent goes through during a computation.

- **`AttestationResponse`**:
  Provides cryptographic proof that the workload runs in a trusted execution environment.

- **`StopComputationResponse`**:
  Confirms that a stop request was honored and the computation terminated.

### Example Handler in Go

```go
func (s *server) Process(stream cvms.Service_ProcessServer) error {
    for {
        msg, err := stream.Recv()
        if err != nil {
            return err
        }

        switch m := msg.Message.(type) {
        case *cvms.ClientStreamMessage_RunRes:
            handleRunResponse(m.RunRes)
        case *cvms.ClientStreamMessage_Attestation:
            validateAttestation(m.Attestation)
        // Handle other message types accordingly
        }

        // Example request: ask for agent state
        _ = stream.Send(&cvms.ServerStreamMessage{
            Message: &cvms.ServerStreamMessage_AgentStateReq{
                AgentStateReq: &cvms.AgentStateReq{Id: "agent-1"},
            },
        })
    }
}
```

### Hints

- Use **chunked messages (`RunReqChunks`)** for large uploads.
- Maintain **connection health** by periodically sending `AgentStateReq` or heartbeat pings.

---

## Running Tests

Execute all unit tests across packages with:

```shell
go test ./...
```

Run `make mocks` first if new interfaces were introduced.
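
You can also run the tests of a single package, or with coverage reporting; the `./agent/...` path below is just an example:

```shell
go test -v -cover ./agent/...
```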

## Troubleshooting

Zombie `qemu-system-x86_64` processes can linger after failed runs. Remove them with:

```shell
pkill -f qemu-system-x86_64
```

If any remain visible in `ps aux | grep qemu-system-x86_64`, terminate them manually with `kill -9 <PID>`.

Check the Manager service status with:

```shell
sudo systemctl status cocos-manager.service
```

View recent logs or follow output using `journalctl`:

```shell
journalctl -u cocos-manager.service
```
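
To follow the output live as the Manager runs, add the standard `-f` flag:

```shell
journalctl -u cocos-manager.service -f
```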

## Repository Structure

- `agent/` – Agent service code and gRPC definitions
- `cmd/` – Entry points for the CLI, Agent, and Manager binaries
- `hal/` – Hardware Abstraction Layer build files
- `manager/` – Manager service, QEMU helpers, and API definitions
- `scripts/` – Build scripts such as the attestation policy helper
- `test/` – Manual test harnesses and sample servers

## Contributing

1. Create a feature branch in your fork.
2. Ensure `make` completes successfully and `go test ./...` passes.
3. Open a pull request with a detailed description of your changes.

## Further Documentation

Additional guides and design documents are available on the [official documentation site](https://docs.cocos.ultraviolet.rs) and in the component `README` files.