Condition for garbage collection #159

penguin-wwy · 2021-12-08T09:21:01Z

Brief summary

Incrementing the GC count in certain places (where a reference is added), so that a collection is triggered earlier, can improve GC behavior as well as CPU and memory performance.

I collected garbage collection statistics for each pyperformance benchmark (running with --process 1):

Benchmark	Objects scanned by GC	Unreachable objects	Collection rate	GC count
2to3	632894	189410	29.93%	347
chameleon	97061	376	0.39%	4
chaos	51337	340	0.66%	4
deltablue	235147	122109	51.93%	176
django_template	163196	243	0.15%	4
dulwich_log	65010	332	0.51%	4
fannkuch	51047	293	0.57%	4
float	1197364	38193	3.19%	596
go	54422	4374	8.04%	8
hexiom	808	128	15.84%	1
json_dumps	51115	320	0.63%	4
json_loads	51169	322	0.63%	4
mako	66121	752	1.14%	4
nqueens	51540	1064	2.06%	16
pathlib	688	32	4.65%	1
pidigits	51068	302	0.59%	4
pyflate	6671	381	5.71%	5
regex_v8	788	125	15.86%	1
richards	51580	317	0.61%	4
sqlalchemy_declarative	496704	26105	5.26%	402
sqlalchemy_imperative	134419	89508	66.59%	90
tornado_http	129716	4993	3.85%	78
xml_etree_process	296655	22528	7.59%	354

The collection rate is very low in many cases: the collector scanned a large number of objects but reclaimed only a small fraction of them.
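For reference, the "Collection rate" column is the ratio of unreachable objects to scanned objects. A minimal sketch of that arithmetic, using three rows from the table above (the helper name `collection_rate` is only for illustration):

```python
def collection_rate(unreachable, scanned):
    """Percentage of scanned objects that turned out to be unreachable."""
    return 100.0 * unreachable / scanned


# (objects scanned, unreachable objects) copied from the table above
rows = {
    "2to3": (632894, 189410),
    "chameleon": (97061, 376),
    "deltablue": (235147, 122109),
}

for name, (scanned, unreachable) in rows.items():
    print(f"{name}: {collection_rate(unreachable, scanned):.2f}%")
```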

Most objects are freed by plain reference counting; the GC only has to handle unreachable reference cycles.
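The split between reference counting and the cycle collector can be observed from Python. The sketch below assumes CPython semantics (immediate deallocation when the refcount hits zero); `Node` and `demo` are illustrative names, not part of the measurements above.

```python
import gc
import weakref


class Node:
    """A plain user-defined object; instances can take part in cycles."""


def demo():
    was_enabled = gc.isenabled()
    gc.disable()  # keep automatic collections out of the way for the demo
    try:
        gc.collect()  # start from a clean state

        # Acyclic object: freed as soon as its last reference goes away.
        a = Node()
        a_ref = weakref.ref(a)
        del a
        freed_by_refcount = a_ref() is None

        # Reference cycle: the refcount never drops to zero on its own.
        b = Node()
        b.self_ref = b
        b_ref = weakref.ref(b)
        del b
        survives_refcount = b_ref() is not None

        # The cycle collector finds the unreachable cycle and frees it.
        unreachable = gc.collect()
        freed_by_gc = b_ref() is None
        return freed_by_refcount, survives_refcount, unreachable, freed_by_gc
    finally:
        if was_enabled:
            gc.enable()
```

The exact number returned by `gc.collect()` depends on the CPython version (for example, whether the instance `__dict__` is a separate object), so only "at least one" should be relied on.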

The count that triggers a collection is incremented in _PyObject_GC_Alloc. This encodes the assumption that the more objects are allocated, the more reference cycles exist. However, the correlation between the two may be very weak. A collection is triggered when:

gcstate->generations[0].count > gcstate->generations[0].threshold
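The same two quantities are visible from Python through `gc.get_count()` and `gc.get_threshold()`. The following is a rough sketch for inspecting them; `gen0_status` and `allocation_delta` are hypothetical helper names, and the exact count values depend on the CPython version and build.

```python
import gc


def gen0_status():
    """Return (count, threshold, would_trigger) for the youngest generation."""
    count0 = gc.get_count()[0]
    threshold0 = gc.get_threshold()[0]
    return count0, threshold0, count0 > threshold0


def allocation_delta(n=100):
    """How far the generation-0 count moves after allocating n lists."""
    was_enabled = gc.isenabled()
    gc.disable()  # avoid a collection resetting the count mid-measurement
    try:
        before = gc.get_count()[0]
        keep_alive = [[] for _ in range(n)]  # container allocations
        after = gc.get_count()[0]
    finally:
        if was_enabled:
            gc.enable()
    return after - before
```

Note that allocating plain integers or strings would not move the count at all, since only container (GC-enabled) types are counted.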

In theory, a reference cycle can only arise when a container object refers to another container object. So I added a new metric to the statistics.

inc(gc.count) :- store_attr(owner, name, value),  is_container_obj(value)

The rule above says: when a reference is stored and the value is a container object, increment the GC count.
In CPython this is equivalent to the following check.

int
PyObject_SetAttr(PyObject *v, PyObject *name, PyObject *value)
{
    ...
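    /* value is NULL when the attribute is being deleted, so guard it. */
    if (value != NULL && (Py_TYPE(value)->tp_flags & Py_TPFLAGS_HAVE_GC)) {
        /* The stored value is a container: count the new edge. */
        _PyThreadState_GET()->interp->gc.generations[0].count++;
    }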
    ...
}

Similar cases include list_append, dict_set_item, and so on.
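To make the idea concrete, here is a toy model in Python of a generation-0 counter that is advanced both by allocations and by stores of container values. It is only a sketch, not CPython's real code: `ToyGen0`, `on_alloc`, `on_store` and `simulate` are invented names, and the model assumes the threshold is checked only on allocation. `is_container_obj` mirrors the `Py_TPFLAGS_HAVE_GC` test by reading `type.__flags__`.

```python
HAVE_GC = 1 << 14  # numeric value of Py_TPFLAGS_HAVE_GC in CPython's object.h


def is_container_obj(value):
    """True if the value's type takes part in cyclic GC."""
    return bool(type(value).__flags__ & HAVE_GC)


class ToyGen0:
    """Toy model of the generation-0 counter (illustration only)."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.count = 0
        self.collections = 0

    def on_alloc(self):
        # Node metric: every container allocation bumps the count.
        # Assumption: the threshold is only checked at allocation time.
        self.count += 1
        if self.count > self.threshold:
            self.collections += 1
            self.count = 0

    def on_store(self, value):
        # Edge metric: storing a reference to a container bumps the count.
        if is_container_obj(value):
            self.count += 1


def simulate():
    gen0 = ToyGen0(threshold=5)
    for _ in range(3):
        gen0.on_alloc()                  # count reaches 3
    for value in ([], {}, set(), 42, "text"):
        gen0.on_store(value)             # only the three containers count
    gen0.on_alloc()                      # count passes the threshold here
    return gen0
```

With allocations alone, four allocations would stay below a threshold of 5; the three container stores are what push the counter over it one allocation earlier.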

The following table shows the same statistics when the threshold is set to (10_000, 10, 10). The (up) and (down) marks indicate changes relative to the first table.

Benchmark	Objects scanned by GC	Unreachable objects	Collection rate	GC count
2to3	518375 (down)	182269	35.16% (up)	121
chameleon	98826	1382	1.40% (up)	20
chaos	47361 (down)	7136	15.07% (up)	112
deltablue	84956 (down)	31257	36.79%	76
django_template	166484	595	0.36% (up)	8
dulwich_log	53202 (down)	2675	5.03% (up)	40
fannkuch	41081 (down)	573	1.39% (up)	8
float	735463 (down)	11069	1.51%	172
go	65590	24091	36.73% (up)	316
hexiom	24930	1285	5.15%	19
json_dumps	41159 (down)	576	1.40% (up)	8
json_loads	41400 (down)	1085	2.62% (up)	16
mako	50783 (down)	1523	3.00% (up)	8
nqueens	42524 (down)	2365	5.56% (up)	36
pathlib	32327	2432	7.52% (up)	40
pidigits	41102 (down)	576	1.40% (up)	8
pyflate	46365	17277	37.26% (up)	269
regex_v8	1011	125	12.36%	1
richards	41849 (down)	829	1.98% (up)	12
sqlalchemy_declarative	56865 (down)	5688	10.00% (up)	86
sqlalchemy_imperative	49927 (down)	44266	88.66% (up)	37
tornado_http	100859 (down)	3581	3.55%	55
xml_etree_process	189226 (down)	4064	2.15%	64

The CPU performance also improved.

./bin/python3 -m pyperf compare_to -G --table ../main/main.json opt_mix1000.json --min-speed 2 --table-format md

Benchmark	main	opt_mix1000
xml_etree_parse	162 ms	136 ms: 1.19x faster
scimark_lu	124 ms	115 ms: 1.08x faster
sqlalchemy_declarative	136 ms	127 ms: 1.07x faster
unpickle_list	5.59 us	5.22 us: 1.07x faster
spectral_norm	115 ms	107 ms: 1.07x faster
float	91.9 ms	87.1 ms: 1.06x faster
sqlalchemy_imperative	21.7 ms	20.6 ms: 1.05x faster
xml_etree_iterparse	106 ms	101 ms: 1.05x faster
deltablue	4.71 ms	4.49 ms: 1.05x faster
pidigits	216 ms	206 ms: 1.05x faster
python_startup	7.13 ms	6.89 ms: 1.03x faster
scimark_sparse_mat_mult	4.87 ms	4.71 ms: 1.03x faster
python_startup_no_site	5.17 ms	5.02 ms: 1.03x faster
json_loads	29.5 us	28.6 us: 1.03x faster
richards	58.8 ms	57.3 ms: 1.03x faster
pickle_dict	29.8 us	29.1 us: 1.03x faster
telco	7.12 ms	6.95 ms: 1.02x faster
regex_v8	25.4 ms	25.9 ms: 1.02x slower
nqueens	87.5 ms	90.5 ms: 1.03x slower
unpickle_pure_python	272 us	282 us: 1.04x slower
chaos	85.1 ms	90.0 ms: 1.06x slower
unpickle	15.7 us	16.7 us: 1.06x slower
Geometric mean	(ref)	1.01x faster

I am still testing the memory footprint; a simple test shows no significant increase.
In addition, more precise metrics and thresholds might give better results.

ericsnowcurrently · 2021-12-08T16:17:31Z

ericsnowcurrently
Dec 8, 2021
Maintainer

Thanks for the info! It looks like you put a good amount of time into this analysis, which everyone appreciates.

It sounds like you are saying that by manually advancing the GC generation in certain places we can get improved GC behavior, as well as better CPU and memory performance. Did I understand that right? Just to be sure, would you mind adding a brief summary of your key observation/proposal at the top of your original post, just so there's no confusion? Thanks!

Also, I'm afraid I'm missing some context. Would you mind filling in the gaps for me for the parts below?

I collected the triggers of garbage collection on the pyperformance (running with --process 1),

What is the full command you used?

Theoretically, only if a container object ref to another container object, reference cycles maybe happen. So, I added a new metric to the statistics.
inc(gc.count) :- store_attr(owner, value),  is_container_obj(value)

Is that CPython code or some declarative query or what? Where does that code go?

Threshold set (10000, 10, 10)

Where does one set that?

3 replies

brandtbucher Dec 8, 2021
Maintainer

Threshold set (10000, 10, 10)

Where does one set that?

You can tweak the GC's thresholds like this by calling gc.set_threshold(10_000, 10, 10).
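A small sketch of how that call might be wrapped so the previous thresholds are restored afterwards (`with_threshold` is a hypothetical helper, not part of the `gc` module):

```python
import gc


def with_threshold(threshold, func, *args):
    """Run func(*args) under temporary GC thresholds, then restore them."""
    old = gc.get_threshold()
    gc.set_threshold(*threshold)
    try:
        return func(*args)
    finally:
        gc.set_threshold(*old)


# Example: read back the generation-0 threshold while it is in effect.
gen0_threshold = with_threshold((10_000, 10, 10), lambda: gc.get_threshold()[0])
```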

penguin-wwy Dec 9, 2021
Author

@ericsnowcurrently Thanks for your advice :)

Your understanding is right. In short, I am looking for the best time to start a GC run.
The core question is which metrics best reflect the complexity of the object reference graph.

Count for gc trigger increases in _PyObject_GC_Alloc, this represents an assumption that the more objects alloc, the more reference cycles.

The current method considers only the number of nodes (the number of GC-tracked objects).
I also include the number of edges (references between GC-tracked objects), because this is the cheapest addition to implement.

Other notes

What is the full command you used?

For each benchmark in pyperformance, I registered a GC callback with gc.callbacks.append(gc_call_back) at the start and removed it at the end.
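A callback of this kind might look roughly like the sketch below. It is not the script used for the tables: `GCStats` and `measure` are invented names, and the `info` dict passed to `gc.callbacks` only documents `generation`, `collected` and `uncollectable`, so the scanned-object totals would need another source.

```python
import gc


class GCStats:
    """Accumulates per-collection numbers reported through gc.callbacks."""

    def __init__(self):
        self.collections = 0
        self.collected = 0

    def __call__(self, phase, info):
        # The callback fires twice per collection: "start" and "stop".
        if phase == "stop":
            self.collections += 1
            self.collected += info["collected"]


def measure(func):
    """Run func() with a stats callback installed, then remove it."""
    stats = GCStats()
    gc.callbacks.append(stats)
    try:
        func()
    finally:
        gc.callbacks.remove(stats)
    return stats


def make_cycle():
    # A list that refers to itself can only be freed by the cycle collector.
    cycle = []
    cycle.append(cycle)
    del cycle
    gc.collect()


stats = measure(make_cycle)
print(stats.collections, stats.collected)
```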

Is that CPython code or some declarative query or what? Where does that code go?

This code expresses the logic of my rule (in a Datalog dialect). The actual implementation is very simple:
python/cpython@main...penguin-wwy:gc_cond

ericsnowcurrently Dec 9, 2021
Maintainer

Thanks for the info.
