Commit eaaac14

CD-60 - Update developer guide (#116)
* update dev-guide
* lint
* surround shell block with white space
* update dev guide
* add info on ip
* update dev-guide
* fix sidebar
* remove excess info from menu items
* fix typo

Signed-off-by: WashingtonKK <washingtonkigan@gmail.com>
1 parent 08bae63 commit eaaac14

1 file changed: +247 -69 lines changed
docs/developer-guide.md

Lines changed: 247 additions & 69 deletions
Original file line number | Diff line number | Diff line change
@@ -2,146 +2,324 @@
22

33
## Getting CoCos
44

5-
CoCos is found on the [CoCos repository](https://github.com/ultravioletrs/cocos). You should fork the repository in order to make changes to the repository. After forking the repository, you can clone it as follows:
5+
1. Fork the [CoCos repository](https://github.com/ultravioletrs/cocos) to your GitHub account.
6+
2. Clone your fork:
67

7-
```shell
8-
git clone <forked repository> $SOMEPATH/cocos
9-
cd $SOMEPATH/cocos
10-
```
8+
```shell
9+
git clone <your-fork-url> $SOMEPATH/cocos
10+
cd $SOMEPATH/cocos
11+
```
1112

12-
## Building
13+
## Build Environment
1314

14-
### Prerequisites
15+
The project uses Go and Protocol Buffers. Make sure the following tools are installed:
1516

17+
- [Go](https://go.dev/doc/install) 1.20 or later
1618
- [Protocol Buffers](https://grpc.io/docs/languages/go/quickstart/)
17-
- [Golang](https://go.dev/doc/install)
19+
- [GNU Make](https://www.gnu.org/software/make/)
20+
- [QEMU-KVM](https://www.qemu.org/) for running local VMs
21+
- Optional: [Buildroot](https://buildroot.org/) when building the HAL image
22+
23+
### Building All Services
1824

19-
### Build All Services
25+
Run `make` in the repository root to compile the Agent, CLI and Manager. Artifacts are placed in the `build` directory. You can also build a single component:
2026

21-
Use the GNU Make tool to build all CoCos services `make`. Build artifacts will be put in the build directory.
27+
```shell
28+
make cli # produces ./build/cocos-cli
29+
make manager # produces ./build/cocos-manager
30+
make agent # produces ./build/cocos-agent
31+
```
2232

23-
### Building HAL
33+
### Building the HAL Image
2434

25-
To build the custom linux image that will host agent, run:
35+
The HAL is a minimal Linux distribution used inside the confidential VM. To build it, clone Buildroot and run:
2636

2737
```shell
2838
git clone https://github.com/buildroot/buildroot.git
2939
cd buildroot
30-
git checkout 2024.11-rc2
40+
git checkout 2024.11-rc2
3141
make BR2_EXTERNAL=../cocos/hal/linux cocos_defconfig
32-
make menuconfig #optional for additional configuration
42+
make menuconfig # optional, for additional configuration
3343
make
3444
```
3545

36-
#### Testing HAL image
46+
The kernel image and root filesystem appear in `buildroot/output/images`. Copy `bzImage` and `rootfs.cpio.gz` to `cmd/manager/img` when testing locally.
3747

38-
##### Launch the VM
48+
### Testing the HAL Image
3949

40-
To launch the virtual machine containing agent for testing purposes, run:
50+
After building, you can boot a VM that runs the Agent using QEMU. Substitute the paths for your system:
4151

4252
```shell
4353
sudo find / -name OVMF_CODE.fd
44-
# => /usr/share/OVMF/OVMF_CODE.fd
4554
OVMF_CODE=/usr/share/OVMF/OVMF_CODE.fd
46-
4755
sudo find / -name OVMF_VARS.fd
48-
# => /usr/share/OVMF/OVMF_VARS.fd
4956
OVMF_VARS=/usr/share/OVMF/OVMF_VARS.fd
5057

51-
KERNEL="buildroot/output/images/bzImage"
52-
INITRD="buildroot/output/images/rootfs.cpio.gz"
58+
KERNEL=buildroot/output/images/bzImage
59+
INITRD=buildroot/output/images/rootfs.cpio.gz
60+
IGVM=svsm/bin/coconut-qemu.igvm
61+
ENV_PATH=<path>/<to>/<env_directory>
62+
CERT_PATH=<path>/<to>/<cert_directory>
5363

54-
qemu-system-x86_64 \
64+
sudo qemu-system-x86_64 \
5565
-enable-kvm \
5666
-cpu EPYC-v4 \
5767
-machine q35 \
58-
-smp 4 \
59-
-m 25G,slots=5,maxmem=30G \
60-
-no-reboot \
61-
-drive if=pflash,format=raw,unit=0,file=$OVMF_CODE,readonly=on \
62-
-netdev user,id=vmnic,hostfwd=tcp::7020-:7002 \
68+
-smp 4,maxcpus=16 \
69+
-m 8G,slots=5,maxmem=30G \
70+
-netdev user,id=vmnic,hostfwd=tcp::7022-:7002 \
6371
-device virtio-net-pci,disable-legacy=on,iommu_platform=true,netdev=vmnic,romfile= \
72+
-machine confidential-guest-support=sev0,memory-backend=ram1,igvm-cfg=igvm0 \
73+
-object memory-backend-memfd,id=ram1,size=8G,share=true,prealloc=false,reserve=false \
74+
-object sev-snp-guest,id=sev0,cbitpos=51,reduced-phys-bits=1 \
75+
-object igvm-cfg,id=igvm0,file=$IGVM \
6476
-kernel $KERNEL \
65-
-append "earlyprintk=serial console=ttyS0" \
77+
-append "console=null quiet" \
6678
-initrd $INITRD \
6779
-nographic \
6880
-monitor pty \
6981
-monitor unix:monitor,server,nowait \
70-
-fsdev local,id=cert_fs,path=/home/sammyk/Documents/certs,security_model=mapped \
71-
-device virtio-9p-pci,fsdev=cert_fs,mount_tag=certs_share \
72-
-fsdev local,id=env_fs,path=/home/sammyk/Documents/env,security_model=mapped \
73-
-device virtio-9p-pci,fsdev=env_fs,mount_tag=env_share
82+
-fsdev local,id=env_fs,path=$ENV_PATH,security_model=mapped \
83+
-device virtio-9p-pci,fsdev=env_fs,mount_tag=env_share \
84+
-fsdev local,id=cert_fs,path=$CERT_PATH,security_model=mapped \
85+
-device virtio-9p-pci,fsdev=cert_fs,mount_tag=certs_share
7486
```
7587

76-
The default password is `root`.
88+
The default login password is `root`.
7789

78-
### Testing Agent Independently
90+
### Testing the Agent Independently
7991

80-
Agent once started will wait to receive its configuration via v-sock. For testing purposes you can use the script in `cocos/test/manual/agent-config`. This script sends agent config and also receives logs and events from agent. Once the VM is launched you can send config including computation manifest to agent as follows:
92+
With a VM running, the Agent connects to a computations management server over gRPC. You can also start the Agent on its own for testing:
8193

8294
```shell
8395
cd cocos
84-
go run ./test/manual/agent-config/main.go <data-path> <algo-path> <public-key-path> <attested-tls-bool>
96+
97+
AGENT_CVM_GRPC_URL=<cvms_server_host:port> \
98+
AGENT_CVM_GRPC_CLIENT_CERT=<path-to-client-cert> \
99+
AGENT_CVM_GRPC_CLIENT_KEY=<path-to-client-key> \
100+
AGENT_CVM_GRPC_SERVER_CA_CERT=<path-to-server-ca-cert> \
101+
go run cmd/agent/main.go \
102+
-algo-path <path-to-algorithm> \
103+
-public-key-path <path-to-public-key> \
104+
-attested-tls-bool <true|false> \
105+
-data-paths <comma-separated-data-paths> \
106+
-client-ca-file <path-to-client-ca-file> \
107+
-ca-url <ca-url-if-attestedTLS-true> \
108+
-cvm-id <cvm-id-if-attestedTLS-true>
85109
```
86110

87-
### Testing Manager
111+
Once started, the Agent attempts to connect to the computations management server at `AGENT_CVM_GRPC_URL`. If the Agent and the server run on the same host, the server's local IP address suffices. If the Agent runs inside a VM, provide the public IP address of the computations management server (reachable over the internet) so that the Agent can connect to it.
88112

89-
Manager is a gRPC client and needs gRPC sever to connect to. We have an example server for testing purposes in `test/computations`. Run the server as follows:
113+
Using `localhost` in `AGENT_CVM_GRPC_URL` works only when the Agent runs outside a VM and the computations management server runs on the same host. If the Agent runs inside a VM, `localhost` refers to the guest itself, so the connection will fail.
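
The `localhost` caveat can be checked before launching an Agent inside a VM. The following is a minimal sketch, not part of CoCos: `isLoopbackTarget` is a hypothetical helper, and port `7001` is an arbitrary example value. It only inspects the address string and does not attempt a connection.

```go
package main

import (
	"fmt"
	"net"
)

// isLoopbackTarget reports whether a host:port address points at the local
// machine, either by the name "localhost" or by a loopback IP such as
// 127.0.0.1 or ::1. (Hypothetical helper for illustration only.)
func isLoopbackTarget(addr string) (bool, error) {
	host, _, err := net.SplitHostPort(addr)
	if err != nil {
		return false, fmt.Errorf("invalid address %q: %w", addr, err)
	}
	if host == "localhost" {
		return true, nil
	}
	ip := net.ParseIP(host)
	return ip != nil && ip.IsLoopback(), nil
}

func main() {
	// Example values only; substitute the address you plan to use.
	for _, addr := range []string{"localhost:7001", "127.0.0.1:7001", "203.0.113.10:7001"} {
		loopback, err := isLoopbackTarget(addr)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s loopback=%v\n", addr, loopback)
	}
}
```

If the check reports a loopback address and the Agent is meant to run inside a VM, choose an address that is reachable from the guest instead.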
90114

91-
```shell
92-
go run ./test/computations/main.go /path/to/algo/file /path/to/public/key/file <attested_tls_bool> /path/to/data/file1.zip path/to/data/file2.zip path/to/data/file3.zip
93-
```
115+
A running computations management server is required for the Agent to function. The Agent will connect to the server and wait for a computation manifest. Instructions for running a test computations management server are provided in the [CVMs server documentation](/docs/getting-started.md#run-the-server).
94116

95-
#### Run Manager
117+
### Testing the Manager
96118

97-
Create two directories in `cocos/cmd/manager`, the directories are `img` and `tmp`.
98-
Copy `rootfs.cpio.gz` and `bzImage` from the buildroot output directory files to `cocos/cmd/manager/img`.
119+
A simple gRPC server is provided under `test/cvms/main.go` for development. Start it with the instructions in the [CVMs server documentation](/docs/getting-started.md#run-the-server).
99120

100-
Next run manager client.
121+
Create `img` and `tmp` directories inside `cmd/manager` and copy the built kernel and rootfs there. Then run the Manager:
101122

102123
```shell
103124
cd cmd/manager
104-
MANAGER_GRPC_HOST=localhost \
105-
MANAGER_GRPC_PORT=7002 \
125+
MANAGER_QEMU_SMP_MAXCPUS=4 \
126+
MANAGER_GRPC_URL=localhost:7002 \
106127
MANAGER_LOG_LEVEL=debug \
107128
MANAGER_QEMU_USE_SUDO=false \
108-
MANAGER_QEMU_ENABLE_SEV=false \
109-
MANAGER_QEMU_SEV_CBITPOS=51 \
110-
MANAGER_QEMU_OVMF_CODE_FILE=/usr/share/edk2/ovmf/OVMF_CODE.fd \
111-
MANAGER_QEMU_OVMF_VARS_FILE=/usr/share/edk2/ovmf/OVMF_VARS.fd \
129+
MANAGER_QEMU_ENABLE_SEV_SNP=false \
130+
MANAGER_QEMU_SEV_SNP_CBITPOS=51 \
131+
MANAGER_QEMU_OVMF_CODE_FILE=/usr/share/edk2/x64/OVMF_CODE.fd \
132+
MANAGER_QEMU_OVMF_VARS_FILE=/usr/share/edk2/x64/OVMF_VARS.fd \
112133
./build/cocos-manager
113134
```
114135

115-
This will result in manager sending a whoIam request to manager-server. Manager server will then launch a VM with agent running and having received the computation manifest.
136+
The Manager starts a gRPC server and waits for client connections, which can be used to create and manage VMs. For more information on running the Manager, see the [Manager docs](/docs/manager.md).
137+
138+
### Manager Environment Configuration
139+
140+
When running under systemd or via `make run`, the Manager reads variables from
141+
`/etc/cocos/cocos-manager.env`. This file defines gRPC options and numerous
142+
`MANAGER_QEMU_*` settings controlling the VM image, memory size and CPU
143+
parameters. Adjust these values before starting the service if custom resources
144+
or ports are required.
145+
146+
Example entries from `cocos-manager.env`:
147+
148+
```shell
149+
# Manager Service Configuration
150+
MANAGER_GRPC_PORT=6101
151+
MANAGER_GRPC_HOST=0.0.0.0
152+
153+
# QEMU Configuration
154+
MANAGER_QEMU_MEMORY_SIZE=25G
155+
MANAGER_QEMU_OVMF_CODE_FILE=/usr/share/edk2/x64/OVMF_CODE.fd
156+
```
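
To sanity-check such a file before starting the service, a small reader for `KEY=VALUE` lines can help. This is a sketch under assumptions: `parseEnv` is a hypothetical helper, it handles only plain `KEY=VALUE` lines with `#` comments (no quoting or variable expansion), and it does not reflect how the Manager itself loads its configuration.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseEnv reads KEY=VALUE lines, skipping blank lines and lines that start
// with '#'. Everything after the first '=' is treated as the value.
// (Hypothetical helper; quoting and expansion are not handled.)
func parseEnv(content string) (map[string]string, error) {
	vars := make(map[string]string)
	scanner := bufio.NewScanner(strings.NewReader(content))
	for lineNo := 1; scanner.Scan(); lineNo++ {
		line := strings.TrimSpace(scanner.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		key, value, found := strings.Cut(line, "=")
		if !found || strings.TrimSpace(key) == "" {
			return nil, fmt.Errorf("line %d: expected KEY=VALUE, got %q", lineNo, line)
		}
		vars[strings.TrimSpace(key)] = strings.TrimSpace(value)
	}
	return vars, scanner.Err()
}

func main() {
	// Sample content taken from the example entries above.
	sample := `# Manager Service Configuration
MANAGER_GRPC_PORT=6101
MANAGER_GRPC_HOST=0.0.0.0

# QEMU Configuration
MANAGER_QEMU_MEMORY_SIZE=25G
`
	vars, err := parseEnv(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("listen address:", vars["MANAGER_GRPC_HOST"]+":"+vars["MANAGER_GRPC_PORT"])
	fmt.Println("memory:", vars["MANAGER_QEMU_MEMORY_SIZE"])
}
```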
157+
158+
### Running Manager as a Service
159+
160+
The repository provides a systemd unit at `init/systemd/cocos-manager.service`.
161+
Install the binary, configuration and unit file with:
162+
163+
```shell
164+
sudo make install_service
165+
```
166+
167+
Start the Manager via systemd:
168+
169+
```shell
170+
sudo systemctl start cocos-manager.service
171+
```
172+
173+
You can also run `make run` to install the service and immediately start it.
174+
175+
## Code Generation
116176

117-
## Protobuf
177+
Whenever `.proto` files are modified, regenerate the Go sources with:
118178

119-
If you've made any changes to .proto files, you should call protoc command prior to compiling individual microservices.
179+
```shell
180+
make protoc
181+
```
120182

121-
To do this by hand, execute:
122-
`make protoc`
183+
Mocks for unit tests rely on method signatures. Refresh them after interface changes:
184+
185+
```shell
186+
make mocks
187+
```
188+
189+
---
190+
191+
## Building a Custom Computation Management Server
192+
193+
To integrate with CoCos agents, implement a gRPC server that conforms to the [`cvms.proto`](https://github.com/ultravioletrs/cocos/blob/main/agent/cvms/cvms.proto) interface.
194+
195+
The core method to implement is:
196+
197+
```proto
198+
rpc Process(stream ClientStreamMessage) returns (stream ServerStreamMessage);
199+
```
123200

124-
## Mocks
201+
This is a **bi-directional streaming RPC** where the **client (CoCos agent)** and the **server (your control plane)** exchange messages continuously over a long-lived connection.
125202

126-
To run tests, some of the services are mocked and these need to be updated if the function signatures are changed.
203+
### Server-Side Requests
127204

128-
To do this, execute:
129-
`make mocks`
205+
The server sends the following messages to the agent:
206+
207+
- **`ComputationRunReq`**:
208+
Triggers execution of a new computation. Includes details of the computation and datasets required.
209+
210+
- **`RunReqChunks`**:
211+
Used to stream large payloads (e.g., binaries or configs). Sent in sequence before the computation starts.
212+
213+
- **`AgentStateReq`**:
214+
Requests a snapshot of the agent's current state.
215+
216+
- **`StopComputation`**:
217+
Instructs the agent to stop a running computation gracefully.
218+
219+
- **`DisconnectReq`**:
220+
Tells the agent to close the current connection, typically as part of terminating a CVM.
221+
222+
### Agent-Side Responses
223+
224+
The agent responds with the following messages:
225+
226+
- **`RunResponse`**:
227+
Acknowledges receipt and execution of a computation run. Includes the computation ID and an error, if any.
228+
229+
- **`AgentLog`**:
230+
Streams runtime logs from the agent, useful for observability and debugging.
231+
232+
- **`AgentEvent`**:
233+
Reports events from the steps the agent carries out during a computation.
234+
235+
- **`AttestationResponse`**:
236+
Provides cryptographic proof of a trusted execution environment.
237+
238+
- **`StopComputationResponse`**:
239+
Confirms that a stop request was honored and the computation terminated.
240+
241+
### Example Handler in Go
242+
243+
```go
244+
func (s *server) Process(stream cvms.Service_ProcessServer) error {
245+
	for {
246+
		msg, err := stream.Recv()
247+
		if err != nil {
248+
			return err
249+
		}
250+
251+
		switch m := msg.Message.(type) {
252+
		case *cvms.ClientStreamMessage_RunRes:
253+
			handleRunResponse(m.RunRes)
254+
		case *cvms.ClientStreamMessage_Attestation:
255+
			validateAttestation(m.Attestation)
256+
		// Handle other types accordingly
257+
		}
258+
259+
		// Example request: ask for agent state
260+
		_ = stream.Send(&cvms.ServerStreamMessage{
261+
			Message: &cvms.ServerStreamMessage_AgentStateReq{
262+
				AgentStateReq: &cvms.AgentStateReq{Id: "agent-1"},
263+
			},
264+
		})
265+
	}
266+
}
267+
```
268+
269+
### Hints
270+
271+
- Use **chunked messages (`RunReqChunks`)** for large uploads.
272+
- Maintain **connection health** by periodically sending `AgentStateReq` or heartbeat pings.
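
The chunking hint can be illustrated without the generated gRPC types. The sketch below only shows how a large payload might be cut into fixed-size pieces before each piece is wrapped in a `RunReqChunks` message. `splitChunks` and the 16-byte chunk size are hypothetical, and the actual fields of `RunReqChunks` should be taken from `cvms.proto`.

```go
package main

import "fmt"

// splitChunks divides data into consecutive slices of at most size bytes.
// The returned slices share memory with data; copy them if data is reused.
// (Hypothetical helper; wrapping each piece in a gRPC message is left out.)
func splitChunks(data []byte, size int) ([][]byte, error) {
	if size <= 0 {
		return nil, fmt.Errorf("chunk size must be positive, got %d", size)
	}
	chunks := make([][]byte, 0, (len(data)+size-1)/size)
	for start := 0; start < len(data); start += size {
		end := start + size
		if end > len(data) {
			end = len(data)
		}
		chunks = append(chunks, data[start:end])
	}
	return chunks, nil
}

func main() {
	payload := []byte("example computation manifest payload")
	chunks, err := splitChunks(payload, 16)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for i, c := range chunks {
		fmt.Printf("chunk %d: %d bytes\n", i, len(c))
	}
}
```

In a real server, each piece would be sent in order on the stream before the computation starts, as described for `RunReqChunks` above.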
273+
274+
---
275+
276+
## Running Tests
277+
278+
Execute all unit tests across packages with:
279+
280+
```shell
281+
go test ./...
282+
```
283+
284+
Run `make mocks` first if new interfaces were introduced.
130285

131286
## Troubleshooting
132287

133-
If you run `ps aux | grep qemu-system-x86_64` and it returns give you something like this:
288+
Stale `qemu-system-x86_64` processes can linger after failed runs and may show up as defunct (zombie) entries. Terminate leftover processes with:
289+
290+
```shell
291+
pkill -f qemu-system-x86_64
292+
```
293+
294+
If any remain visible in `ps aux | grep qemu-system-x86_64`, terminate them manually with `kill -9 <PID>`.
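
A defunct entry can be recognized in `ps aux` output by a `STAT` value that starts with `Z`. A zombie has already exited, so it disappears only once its parent reaps it or exits. The sketch below assumes the standard `ps aux` column order (`USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND`); `isZombie` is a hypothetical helper, not part of CoCos.

```go
package main

import (
	"fmt"
	"strings"
)

// isZombie reports whether one line of `ps aux` output describes a zombie
// process, that is, whether its STAT column (the 8th field) starts with "Z".
// (Hypothetical helper; assumes the standard `ps aux` column order.)
func isZombie(psLine string) bool {
	fields := strings.Fields(psLine)
	if len(fields) < 8 {
		return false
	}
	return strings.HasPrefix(fields[7], "Z")
}

func main() {
	// Sample lines for illustration; real input would come from `ps aux`.
	lines := []string{
		"sammy 13913 0.0 0.0 0 0 pts/2 Z+ 20:17 0:00 [qemu-system-x86] <defunct>",
		"root 1 0.0 0.1 1000 500 ? Ss 10:00 0:01 /sbin/init",
	}
	for _, line := range lines {
		fmt.Printf("zombie=%v  %s\n", isZombie(line), line)
	}
}
```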
295+
296+
Check the Manager service status with:
134297

135298
```shell
136-
sammy 13913 0.0 0.0 0 0 pts/2 Z+ 20:17 0:00 [qemu-system-x86] <defunct>
299+
sudo systemctl status cocos-manager.service
137300
```
138301

139-
means that the a QEMU virtual machine that is currently defunct, meaning that it is no longer running. More precisely, the defunct process in the output is also known as a ["zombie" process](https://en.wikipedia.org/wiki/Zombie_process).
302+
View recent logs or follow output using `journalctl`:
303+
304+
```shell
305+
journalctl -u cocos-manager.service
306+
```
307+
308+
## Repository Structure
309+
310+
- `agent/` – Agent service code and gRPC definitions
311+
- `cmd/` – Entry points for CLI, Agent and Manager binaries
312+
- `hal/` – Hardware Abstraction Layer build files
313+
- `manager/` – Manager service, QEMU helpers and API definitions
314+
- `scripts/` – Build scripts such as the attestation policy helper
315+
- `test/` – Manual test harnesses and sample servers
316+
317+
## Contributing
140318

141-
### Kill `qemu-system-x86_64` Processes
319+
1. Create a feature branch in your fork.
320+
2. Ensure `make` completes successfully and `go test ./...` passes.
321+
3. Open a pull request with a detailed description of your changes.
142322

143-
To kill any leftover `qemu-system-x86_64` processes, use
144-
`pkill -f qemu-system-x86_64`
145-
The pkill command is used to kill processes by name or by pattern. The `-f` flag to specify that we want to kill processes that match the pattern `qemu-system-x86_64`. It sends the SIGKILL signal to all processes that are running `qemu-system-x86_64`.
323+
## Further Documentation
146324

147-
If this does not work, i.e. if `ps aux | grep qemu-system-x86_64` still outputs `qemu-system-x86_64` related process(es), you can kill the unwanted process with `kill -9 <PID>`, which also sends a SIGKILL signal to the process.
325+
Additional guides and design documents are available on the [official documentation site](https://docs.cocos.ultraviolet.rs) and in component `README` files.
