
Global counter for malloc has measurable overhead #141

Open
@qinsoon

We have a global variable that counts malloc'd bytes and is updated on every malloc call. When multiple threads call malloc concurrently, they contend on this counter, which adds measurable overhead.

The following was measured with the multithreaded benchmarks from Julia GCBenchmarks, using 8 mutator threads. For a fair comparison, both builds return 0 from vm_live_bytes(); the no-malloc-counter build additionally omits the malloc counter update. The results show measurable overhead for some benchmarks, e.g. a 2% slowdown for mergesort_parallel.

MMTK_MIN_HSIZE=31650 MMTK_MAX_HSIZE=31650 /home/yilin/Code/julia_workspace/julia/julia-mmtk-immix-release-no-malloc-counter/usr/bin/julia --project=/home/yilin/Code/julia_workspace/GCBenchmarks /home/yilin/Code/julia_workspace/GCBenchmarks/run_benchmarks.jl multithreaded mergesort_parallel mergesort_parallel -n 1 --threads=8
| benchmark | build | total time | gc time | mutator time | total time error |
| --- | --- | --- | --- | --- | --- |
| multithreaded-big_arrays-issue-52937 | julia-mmtk-immix(6.0x minheap,.multithreaded-8) | 7328.7 | 0 | 7328.7 | 3.26144 |
| multithreaded-big_arrays-issue-52937 | julia-mmtk-immix-no-malloc-counter(6.0x minheap,.multithreaded-8) | 7345.78 | 0 | 7345.78 | 2.8509 |
| multithreaded-big_arrays-objarray | julia-mmtk-immix(6.0x minheap,.multithreaded-8) | 7279.05 | 0 | 7279.05 | 7.97443 |
| multithreaded-big_arrays-objarray | julia-mmtk-immix-no-malloc-counter(6.0x minheap,.multithreaded-8) | 7288.47 | 0 | 7288.47 | 6.95254 |
| multithreaded-binary_tree-tree_immutable | julia-mmtk-immix(6.0x minheap,.multithreaded-8) | 2233.35 | 360.83 | 1872.52 | 3.61634 |
| multithreaded-binary_tree-tree_immutable | julia-mmtk-immix-no-malloc-counter(6.0x minheap,.multithreaded-8) | 2231.79 | 360.56 | 1871.23 | 3.18454 |
| multithreaded-binary_tree-tree_mutable | julia-mmtk-immix(6.0x minheap,.multithreaded-8) | 3130.31 | 640.23 | 2490.08 | 6.81284 |
| multithreaded-binary_tree-tree_mutable | julia-mmtk-immix-no-malloc-counter(6.0x minheap,.multithreaded-8) | 3132.71 | 641.74 | 2490.97 | 6.62351 |
| multithreaded-mergesort_parallel-mergesort_parallel | julia-mmtk-immix(6.0x minheap,.multithreaded-8) | 20202.5 | 0 | 20202.5 | 811.654 |
| multithreaded-mergesort_parallel-mergesort_parallel | julia-mmtk-immix-no-malloc-counter(6.0x minheap,.multithreaded-8) | 20648 | 0 | 20648 | 608.926 |
| multithreaded-mm_divide_and_conquer-mm_divide_and_conquer | julia-mmtk-immix(6.0x minheap,.multithreaded-8) | 791.47 | 0 | 791.47 | 1.83954 |
| multithreaded-mm_divide_and_conquer-mm_divide_and_conquer | julia-mmtk-immix-no-malloc-counter(6.0x minheap,.multithreaded-8) | 797.59 | 0 | 797.59 | 1.93677 |

One way to mitigate this is to update the global counter less frequently. Each thread could keep a local counter of malloc'd bytes and add it to the global counter only once every X bytes allocated (X could be 16K or so).
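
A minimal sketch of that idea in Rust, to make the proposal concrete. Everything here is an assumption for illustration rather than existing MMTk code: the names `GLOBAL_MALLOC_BYTES`, `LOCAL_MALLOC_BYTES`, `count_malloc`, `flush_malloc_count` and `global_malloc_bytes` are hypothetical, and `FLUSH_THRESHOLD` stands in for X with the 16K value suggested above.

```rust
use std::cell::Cell;
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical shared counter of malloc'd bytes (not the actual MMTk variable).
static GLOBAL_MALLOC_BYTES: AtomicUsize = AtomicUsize::new(0);

// Assumed flush threshold: "X" from the proposal, set to 16 KiB.
const FLUSH_THRESHOLD: usize = 16 * 1024;

thread_local! {
    // Bytes malloc'd by this thread that are not yet added to the global counter.
    static LOCAL_MALLOC_BYTES: Cell<usize> = Cell::new(0);
}

/// Record `size` malloc'd bytes. The shared atomic is only touched once the
/// thread-local pending count reaches FLUSH_THRESHOLD.
fn count_malloc(size: usize) {
    LOCAL_MALLOC_BYTES.with(|local| {
        let pending = local.get() + size;
        if pending >= FLUSH_THRESHOLD {
            GLOBAL_MALLOC_BYTES.fetch_add(pending, Ordering::Relaxed);
            local.set(0);
        } else {
            local.set(pending);
        }
    });
}

/// Publish whatever this thread still holds locally, e.g. before the thread
/// exits or before the global value is read for a GC decision.
fn flush_malloc_count() {
    LOCAL_MALLOC_BYTES.with(|local| {
        let pending = local.replace(0);
        if pending > 0 {
            GLOBAL_MALLOC_BYTES.fetch_add(pending, Ordering::Relaxed);
        }
    });
}

/// Read the global counter. It may lag behind the true total by up to
/// FLUSH_THRESHOLD bytes per thread that has not flushed yet.
fn global_malloc_bytes() -> usize {
    GLOBAL_MALLOC_BYTES.load(Ordering::Relaxed)
}
```

The trade-off is accuracy: with 8 mutator threads and a 16K threshold, the global value could under-report by roughly 128K at any moment, so a reader that needs an exact figure would have to make every thread call something like `flush_malloc_count()` first.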
