Commit e6dacf9

runtime: use cgroup CPU limit to set GOMAXPROCS
This CL adds two related features enabled by default via compatibility GODEBUGs containermaxprocs and updatemaxprocs.

On Linux, containermaxprocs makes the Go runtime consider cgroup CPU bandwidth limits (quota/period) when setting GOMAXPROCS. If the cgroup limit is lower than the number of logical CPUs available, then the cgroup limit takes precedence.

On all OSes, updatemaxprocs makes the Go runtime periodically recalculate the default GOMAXPROCS value and update GOMAXPROCS if it has changed. If GOMAXPROCS is set manually, this update does not occur. This is intended primarily to detect changes to cgroup limits, but it applies on all OSes because the CPU affinity mask can change as well.

The runtime only considers the limit in the leaf cgroup (the one that actually contains the process), caching the CPU limit file descriptor(s), which are periodically reread for updates. This is a small departure from the original proposed design. It will not consider limits of parent cgroups (which may be lower than the leaf), and it will not detect cgroup migration after process start.

We can consider changing this in the future, but the simpler approach is less invasive and poses less risk to packages that have some awareness of runtime internals. For example, if the runtime periodically opens new files during execution, file descriptor leak detection is difficult to implement in a stable way.

For #73193.

Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest
Change-Id: I6a6a636c631c1ae577fb8254960377ba91c5dc98
Reviewed-on: https://go-review.googlesource.com/c/go/+/670497
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
1 parent f12c66f commit e6dacf9

File tree

24 files changed

+1327
-43
lines changed

api/next/73193.txt

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+pkg runtime, func SetDefaultGOMAXPROCS() #73193

doc/godebug.md

Lines changed: 11 additions & 0 deletions
@@ -169,6 +169,17 @@ Go command will follow symlinks to regular files embedding files.
 The default value `embedfollowsymlinks=0` does not allow following
 symlinks. `embedfollowsymlinks=1` will allow following symlinks.

+Go 1.25 added a new `containermaxprocs` setting that controls whether the Go
+runtime will consider cgroup CPU limits when setting the default GOMAXPROCS.
+The default value `containermaxprocs=1` will use cgroup limits in addition to
+the total logical CPU count and CPU affinity. `containermaxprocs=0` will
+disable consideration of cgroup limits. This setting only affects Linux.
+
+Go 1.25 added a new `updatemaxprocs` setting that controls whether the Go
+runtime will periodically update GOMAXPROCS for new CPU affinity or cgroup
+limits. The default value `updatemaxprocs=1` will enable periodic updates.
+`updatemaxprocs=0` will disable periodic updates.
+
 Go 1.25 corrected the semantics of contention reports for runtime-internal locks,
 and so removed the [`runtimecontentionstacks` setting](/pkg/runtime#hdr-Environment_Variable).


doc/next/4-runtime.md

Lines changed: 25 additions & 0 deletions
@@ -17,6 +17,31 @@ This program will now print:

 panic: PANIC [recovered, repanicked]

+<!-- go.dev/issue/73193 -->
+
+The default behavior of `GOMAXPROCS` has changed. In prior versions of Go,
+`GOMAXPROCS` defaulted to the number of logical CPUs available at startup
+([runtime.NumCPU]). Go 1.25 introduces two changes:
+
+1. On Linux, the runtime considers the CPU bandwidth limit of the cgroup
+   containing the process, if any. If the CPU bandwidth limit is lower than the
+   number of logical CPUs available, `GOMAXPROCS` will default to the lower
+   limit. In container runtime systems like Kubernetes, cgroup CPU bandwidth
+   limits generally correspond to the "CPU limit" option. The Go runtime does
+   not consider the "CPU requests" option.
+
+2. On all OSes, the runtime periodically updates `GOMAXPROCS` if the number
+   of logical CPUs available or the cgroup CPU bandwidth limit changes.
+
+Both of these behaviors are automatically disabled if `GOMAXPROCS` is set
+manually via the `GOMAXPROCS` environment variable or a call to
+[runtime.GOMAXPROCS]. They can also be disabled explicitly with the [GODEBUG
+settings](/doc/godebug) `containermaxprocs=0` and `updatemaxprocs=0`,
+respectively.
+
+In order to support reading updated cgroup limits, the runtime will keep cached
+file descriptors for the cgroup files for the duration of the process lifetime.
+
 <!-- go.dev/issue/71546 -->

 On Linux systems with kernel support for anonymous VMA names
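The first change in the hunk above (taking the lower of the logical CPU count and the cgroup bandwidth limit) can be sketched roughly as follows. `defaultProcs` is a hypothetical helper, not the runtime's code, and rounding a fractional limit up to a whole CPU is an assumption, since the release note does not say how fractional limits are handled.

```go
package main

import (
	"fmt"
	"math"
)

// defaultProcs is a hypothetical helper. It returns the lower of the logical
// CPU count and the cgroup bandwidth limit (quota/period). A negative quota
// means "no limit".
//
// Assumption: a fractional limit is rounded up to a whole number of CPUs.
func defaultProcs(numCPU int, quota, period int64) int {
	if quota < 0 || period <= 0 {
		return numCPU
	}
	limit := int(math.Ceil(float64(quota) / float64(period)))
	if limit < 1 {
		limit = 1
	}
	if limit < numCPU {
		return limit
	}
	return numCPU
}

func main() {
	fmt.Println(defaultProcs(16, 500000, 100000)) // limit of 5 CPUs on a 16-CPU machine
	fmt.Println(defaultProcs(4, 500000, 100000))  // limit is above the CPU count
	fmt.Println(defaultProcs(16, -1, 100000))     // no limit configured
}
```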
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+The new [SetDefaultGOMAXPROCS] function sets `GOMAXPROCS` to the runtime
+default value, as if the `GOMAXPROCS` environment variable is not set. This is
+useful for enabling the [new `GOMAXPROCS` default](#runtime) if it has been
+disabled by the `GOMAXPROCS` environment variable or a prior call to
+[GOMAXPROCS].

src/go/build/deps_test.go

Lines changed: 2 additions & 0 deletions
@@ -795,6 +795,8 @@ var depsRules = `
 	FMT, compress/gzip, embed, encoding/binary < encoding/json/internal/jsontest;
 	CGO, internal/syscall/unix < net/internal/cgotest;
 	FMT < math/big/internal/asmgen;
+
+	FMT, testing < internal/cgrouptest;
 `

 // listStdPkgs returns the same list of packages as "go list std".
Lines changed: 206 additions & 0 deletions
@@ -0,0 +1,206 @@
+// Copyright 2025 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Package cgrouptest provides best-effort helpers for running tests inside a
+// cgroup.
+package cgrouptest
+
+import (
+	"fmt"
+	"internal/runtime/cgroup"
+	"os"
+	"path/filepath"
+	"slices"
+	"strconv"
+	"strings"
+	"syscall"
+	"testing"
+)
+
+type CgroupV2 struct {
+	orig string
+	path string
+}
+
+func (c *CgroupV2) Path() string {
+	return c.path
+}
+
+// Path to cpu.max.
+func (c *CgroupV2) CPUMaxPath() string {
+	return filepath.Join(c.path, "cpu.max")
+}
+
+// Set cpu.max. Pass -1 for quota to disable the limit.
+func (c *CgroupV2) SetCPUMax(quota, period int64) error {
+	q := "max"
+	if quota >= 0 {
+		q = strconv.FormatInt(quota, 10)
+	}
+	buf := fmt.Sprintf("%s %d", q, period)
+	return os.WriteFile(c.CPUMaxPath(), []byte(buf), 0)
+}
+
+// InCgroupV2 creates a new v2 cgroup, migrates the current process into it,
+// and then calls fn. When fn returns, the current process is migrated back to
+// the original cgroup and the new cgroup is destroyed.
+//
+// If a new cgroup cannot be created, the test is skipped.
+//
+// This must not be used in parallel tests, as it affects the entire process.
+func InCgroupV2(t *testing.T, fn func(*CgroupV2)) {
+	mount, rel := findCurrent(t)
+	parent := findOwnedParent(t, mount, rel)
+	orig := filepath.Join(mount, rel)
+
+	// Make sure the parent allows children to control cpu.
+	b, err := os.ReadFile(filepath.Join(parent, "cgroup.subtree_control"))
+	if err != nil {
+		t.Skipf("unable to read cgroup.subtree_control: %v", err)
+	}
+	if !slices.Contains(strings.Fields(string(b)), "cpu") {
+		// N.B. We should have permission to add cpu to
+		// subtree_control, but it seems like a bad idea to change this
+		// on a high-level cgroup that probably has lots of existing
+		// children.
+		t.Skipf("Parent cgroup %s does not allow children to control cpu, only %q", parent, string(b))
+	}
+
+	path, err := os.MkdirTemp(parent, "go-cgrouptest")
+	if err != nil {
+		t.Skipf("unable to create cgroup directory: %v", err)
+	}
+	// Important: defer cleanups so they run even in the event of panic.
+	//
+	// TODO(prattmic): Consider running everything in a subprocess just so
+	// we can clean up if it throws or otherwise doesn't run the defers.
+	defer func() {
+		if err := os.Remove(path); err != nil {
+			// Not much we can do, but at least inform of the
+			// problem.
+			t.Errorf("Error removing cgroup directory: %v", err)
+		}
+	}()
+
+	migrateTo(t, path)
+	defer migrateTo(t, orig)
+
+	c := &CgroupV2{
+		orig: orig,
+		path: path,
+	}
+	fn(c)
+}
+
+// Returns the mount and relative directory of the current cgroup the process
+// is in.
+func findCurrent(t *testing.T) (string, string) {
+	// Find the path to our current CPU cgroup. Currently this package is
+	// only used for CPU cgroup testing, so the distinction of different
+	// controllers doesn't matter.
+	var scratch [cgroup.ParseSize]byte
+	buf := make([]byte, cgroup.PathSize)
+	n, err := cgroup.FindCPUMountPoint(buf, scratch[:])
+	if err != nil {
+		t.Skipf("cgroup: unable to find current cgroup mount: %v", err)
+	}
+	mount := string(buf[:n])
+
+	n, ver, err := cgroup.FindCPURelativePath(buf, scratch[:])
+	if err != nil {
+		t.Skipf("cgroup: unable to find current cgroup path: %v", err)
+	}
+	if ver != cgroup.V2 {
+		t.Skipf("cgroup: running on cgroup v%d want v2", ver)
+	}
+	rel := string(buf[1:n])       // The returned path always starts with /, skip it.
+	rel = filepath.Join(".", rel) // Make sure this isn't empty string at root.
+	return mount, rel
+}
+
+// Returns a parent directory in which we can create our own cgroup subdirectory.
+func findOwnedParent(t *testing.T, mount, rel string) string {
+	// There are many ways cgroups may be set up on a system. We don't try
+	// to cover all of them, just common ones.
+	//
+	// To start with, systemd:
+	//
+	// Our test process is likely running inside a user session, in which
+	// case we are likely inside a cgroup that looks something like:
+	//
+	//   /sys/fs/cgroup/user.slice/user-1234.slice/user@1234.service/vte-spawn-1.scope/
+	//
+	// Possibly with additional slice layers between user@1234.service and
+	// the leaf scope.
+	//
+	// On new enough kernel and systemd versions (exact versions unknown),
+	// full unprivileged control of the user's cgroups is permitted
+	// directly via the cgroup filesystem. Specifically, the
+	// user@1234.service directory is owned by the user, as are all
+	// subdirectories.
+
+	// We want to create our own subdirectory that we can migrate into and
+	// then manipulate at will. It is tempting to create a new subdirectory
+	// inside the current cgroup we are already in, however that will likely
+	// not work. cgroup v2 only allows processes to be in leaf cgroups. Our
+	// current cgroup likely contains multiple processes (at least this one
+	// and the cmd/go test runner). If we make a subdirectory and try to
+	// move our process into that cgroup, then the subdirectory and parent
+	// would both contain processes. Linux won't allow us to do that [1].
+	//
+	// Instead, we will simply walk up to the highest directory that our
+	// user owns and create our new subdirectory. Since that directory
+	// already has a bunch of subdirectories, it must not directly contain
+	// any processes.
+	//
+	// (This would fall apart if we were already in the highest directory
+	// we own, such as if there was simply a single cgroup for the entire
+	// user. Luckily systemd at least does not do this.)
+	//
+	// [1] Minor technicality: By default a new subdirectory has no cgroup
+	// controller (they must be explicitly enabled in the parent's
+	// cgroup.subtree_control). Linux will allow moving processes into a
+	// subdirectory that has no controllers while there are still processes
+	// in the parent, but it won't allow adding controllers until the parent
+	// is empty. As far as I can tell, the only purpose of this is to allow
+	// reorganizing processes into a new set of subdirectories and then
+	// adding controllers once done.
+	root, err := os.OpenRoot(mount)
+	if err != nil {
+		t.Fatalf("error opening cgroup mount root: %v", err)
+	}
+
+	uid := os.Getuid()
+	var prev string
+	for rel != "." {
+		fi, err := root.Stat(rel)
+		if err != nil {
+			t.Fatalf("error stating cgroup path: %v", err)
+		}
+
+		st := fi.Sys().(*syscall.Stat_t)
+		if int(st.Uid) != uid {
+			// Stop at first directory we don't own.
+			break
+		}
+
+		prev = rel
+		rel = filepath.Join(rel, "..")
+	}
+
+	if prev == "" {
+		t.Skipf("No parent cgroup owned by UID %d", uid)
+	}
+
+	// We actually want the last directory where we were the owner.
+	return filepath.Join(mount, prev)
+}
+
+// Migrate the current process to the cgroup directory dst.
+func migrateTo(t *testing.T, dst string) {
+	pid := []byte(strconv.FormatInt(int64(os.Getpid()), 10))
+	if err := os.WriteFile(filepath.Join(dst, "cgroup.procs"), pid, 0); err != nil {
+		t.Skipf("Unable to migrate into %s: %v", dst, err)
+	}
+}
Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+// Copyright 2025 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package cgrouptest
+
+import (
+	"fmt"
+	"testing"
+)
+
+func TestInCgroupV2(t *testing.T) {
+	InCgroupV2(t, func(c *CgroupV2) {
+		fmt.Println("Created", c.Path())
+		if err := c.SetCPUMax(500000, 100000); err != nil {
+			t.Errorf("Error setting cpu.max: %v", err)
+		}
+	})
+}
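The test above writes a quota of 500000 and a period of 100000 to cpu.max, which corresponds to a bandwidth limit of 5 CPUs (quota/period). The following sketch shows the string format that SetCPUMax produces and one way to turn it back into a CPU count. `formatCPUMax` and `cpusFromCPUMax` are hypothetical helpers written for illustration; they are not part of the commit.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// formatCPUMax mirrors the string built by SetCPUMax: a negative quota is
// written as "max", meaning no limit.
func formatCPUMax(quota, period int64) string {
	q := "max"
	if quota >= 0 {
		q = strconv.FormatInt(quota, 10)
	}
	return fmt.Sprintf("%s %d", q, period)
}

// cpusFromCPUMax is a hypothetical helper that converts cpu.max contents
// into a CPU count (quota/period). ok is false when there is no limit or
// the input is malformed.
func cpusFromCPUMax(s string) (cpus float64, ok bool) {
	f := strings.Fields(s)
	if len(f) != 2 || f[0] == "max" {
		return 0, false
	}
	quota, err1 := strconv.ParseInt(f[0], 10, 64)
	period, err2 := strconv.ParseInt(f[1], 10, 64)
	if err1 != nil || err2 != nil || period <= 0 {
		return 0, false
	}
	return float64(quota) / float64(period), true
}

func main() {
	s := formatCPUMax(500000, 100000)
	cpus, ok := cpusFromCPUMax(s)
	fmt.Println(s, cpus, ok)
}
```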

src/internal/coverage/pkid.go

Lines changed: 17 additions & 6 deletions
@@ -45,18 +45,29 @@ package coverage
 // as opposed to a fixed list.

 var rtPkgs = [...]string{
+	"internal/asan",
+	"internal/byteorder",
+	"internal/coverage/rtcov",
 	"internal/cpu",
+	"internal/bytealg",
 	"internal/goarch",
-	"internal/runtime/atomic",
-	"internal/goos",
+	"internal/abi",
 	"internal/chacha8rand",
+	"internal/godebugs",
+	"internal/goexperiment",
+	"internal/goos",
+	"internal/msan",
+	"internal/profilerecord",
+	"internal/race",
+	"internal/runtime/atomic",
+	"internal/runtime/exithook",
+	"internal/runtime/gc",
+	"internal/runtime/math",
+	"internal/runtime/strconv",
 	"internal/runtime/sys",
-	"internal/abi",
 	"internal/runtime/maps",
-	"internal/runtime/math",
-	"internal/bytealg",
-	"internal/goexperiment",
 	"internal/runtime/syscall",
+	"internal/runtime/cgroup",
 	"internal/stringslite",
 	"runtime",
 }

src/internal/godebugs/table.go

Lines changed: 2 additions & 0 deletions
@@ -26,6 +26,7 @@ type Info struct {
 // (Otherwise the test in this package will fail.)
 var All = []Info{
 	{Name: "asynctimerchan", Package: "time", Changed: 23, Old: "1"},
+	{Name: "containermaxprocs", Package: "runtime", Changed: 25, Old: "0"},
 	{Name: "dataindependenttiming", Package: "crypto/subtle", Opaque: true},
 	{Name: "decoratemappings", Package: "runtime", Opaque: true, Changed: 25, Old: "0"},
 	{Name: "embedfollowsymlinks", Package: "cmd/go"},
@@ -61,6 +62,7 @@ var All = []Info{
 	{Name: "tlsmlkem", Package: "crypto/tls", Changed: 24, Old: "0", Opaque: true},
 	{Name: "tlsrsakex", Package: "crypto/tls", Changed: 22, Old: "1"},
 	{Name: "tlsunsafeekm", Package: "crypto/tls", Changed: 22, Old: "1"},
+	{Name: "updatemaxprocs", Package: "runtime", Changed: 25, Old: "0"},
 	{Name: "winreadlinkvolume", Package: "os", Changed: 23, Old: "0"},
 	{Name: "winsymlink", Package: "os", Changed: 23, Old: "0"},
 	{Name: "x509keypairleaf", Package: "crypto/tls", Changed: 23, Old: "0"},
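The two new table entries record that the default changed in Go 1.25 (`Changed: 25`) and that the value `"0"` restores the earlier behavior (`Old: "0"`). Assuming the usual GODEBUG compatibility rule, under which a main module declaring an older Go version keeps the old behavior, the selection can be sketched as below. The `Info` type here copies only the fields used, and `defaultsFor` and `sample` are hypothetical names, not the toolchain's implementation.

```go
package main

import "fmt"

// Info copies only the fields of the table entries that this sketch uses.
type Info struct {
	Name    string
	Package string
	Changed int    // minor Go version in which the default changed (25 means Go 1.25)
	Old     string // value that restores the behavior from before the change
}

// sample holds the two entries added by this commit.
var sample = []Info{
	{Name: "containermaxprocs", Package: "runtime", Changed: 25, Old: "0"},
	{Name: "updatemaxprocs", Package: "runtime", Changed: 25, Old: "0"},
}

// defaultsFor is a hypothetical helper. Given the minor Go version declared
// by the main module, it returns the settings that would fall back to their
// old values under the assumed compatibility rule.
func defaultsFor(goMinor int, all []Info) map[string]string {
	m := make(map[string]string)
	for _, info := range all {
		if info.Changed != 0 && goMinor < info.Changed {
			m[info.Name] = info.Old
		}
	}
	return m
}

func main() {
	fmt.Println(defaultsFor(24, sample)) // module declares go 1.24
	fmt.Println(defaultsFor(25, sample)) // module declares go 1.25
}
```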
