Skip to content

Commit 5aa0c82

Browse files
committed
Merge bitcoin/bitcoin#25325: Add pool based memory resource
9f947fc Use PoolAllocator for CCoinsMap (Martin Leitner-Ankerl) 5e4ac5a Call ReallocateCache() on each Flush() (Martin Leitner-Ankerl) 1afca6b Add PoolResource fuzzer (Martin Leitner-Ankerl) e19943f Calculate memory usage correctly for unordered_maps that use PoolAllocator (Martin Leitner-Ankerl) b8401c3 Add pool based memory resource & allocator (Martin Leitner-Ankerl) Pull request description: A memory resource similar to `std::pmr::unsynchronized_pool_resource`, but optimized for node-based containers. The goal is to be able to cache more coins with the same memory usage, and allocate/deallocate faster. This is a reimplementation of #22702. The goal was to implement it in a way that is simpler to review & test * There is now a generic `PoolResource` for allocating/deallocating memory. This has practically the same API as `std::pmr::memory_resource`. (Unfortunately I cannot use std::pmr because libc++ simply doesn't implement that API). * Thanks to sipa there is now a fuzzer for PoolResource! On a fast machine I ran it for ~770 million executions without finding any issue. * The estimation of the correct node size is now gone, PoolResource now has multiple pools and just needs to be created large enough to have space for the unordered_map nodes. I ran benchmarks with #22702, mergebase, and this PR. Frequency locked Intel i7-8700, clang++ 13.0.1 to reindex up to block 690000. ```sh bitcoind -dbcache=5000 -assumevalid=00000000000000000002a23d6df20eecec15b21d32c75833cce28f113de888b7 -reindex-chainstate -printtoconsole=0 -stopatheight=690000 ``` The performance is practically identical with #22702, just 0.4% slower. It's ~21% faster than master: ![Progress in Million Transactions over Time(2)](https://user-images.githubusercontent.com/14386/173288685-91952ade-f304-4825-8bfb-0725a71ca17b.png) ![Size of Cache in MiB over Time](https://user-images.githubusercontent.com/14386/173291421-e6b410be-ac77-479b-ad24-5fafcebf81eb.png) Note that on cache drops mergebase's memory doesnt go so far down because it does not free the `CCoinsMap` bucket array. ![Size of Cache in Million tx over Time(1)](https://user-images.githubusercontent.com/14386/173288703-a80c9c9e-93c8-4a16-9df8-610c89c61cc4.png) ACKs for top commit: LarryRuane: ACK 9f947fc achow101: re-ACK 9f947fc john-moffett: ACK 9f947fc jonatack: re-ACK 9f947fc Tree-SHA512: 48caf57d1775875a612b54388ef64c53952cd48741cacfe20d89049f2fb35301b5c28e69264b7d659a3ca33d4c714d47bafad6fd547c4075f08b45acc87c0f45
2 parents 3a93957 + 9f947fc commit 5aa0c82

16 files changed

+1004
-24
lines changed

src/Makefile.am

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -260,6 +260,7 @@ BITCOIN_CORE_H = \
260260
shutdown.h \
261261
signet.h \
262262
streams.h \
263+
support/allocators/pool.h \
263264
support/allocators/secure.h \
264265
support/allocators/zeroafterfree.h \
265266
support/cleanse.h \

src/Makefile.bench.include

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -42,6 +42,7 @@ bench_bench_bitcoin_SOURCES = \
4242
bench/nanobench.h \
4343
bench/peer_eviction.cpp \
4444
bench/poly1305.cpp \
45+
bench/pool.cpp \
4546
bench/prevector.cpp \
4647
bench/rollingbloom.cpp \
4748
bench/rpc_blockchain.cpp \

src/Makefile.test.include

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -116,6 +116,7 @@ BITCOIN_TESTS =\
116116
test/pmt_tests.cpp \
117117
test/policy_fee_tests.cpp \
118118
test/policyestimator_tests.cpp \
119+
test/pool_tests.cpp \
119120
test/pow_tests.cpp \
120121
test/prevector_tests.cpp \
121122
test/raii_event_tests.cpp \
@@ -301,6 +302,7 @@ test_fuzz_fuzz_SOURCES = \
301302
test/fuzz/partially_downloaded_block.cpp \
302303
test/fuzz/policy_estimator.cpp \
303304
test/fuzz/policy_estimator_io.cpp \
305+
test/fuzz/poolresource.cpp \
304306
test/fuzz/pow.cpp \
305307
test/fuzz/prevector.cpp \
306308
test/fuzz/primitives_transaction.cpp \

src/Makefile.test_util.include

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -15,6 +15,7 @@ TEST_UTIL_H = \
1515
test/util/logging.h \
1616
test/util/mining.h \
1717
test/util/net.h \
18+
test/util/poolresourcetester.h \
1819
test/util/random.h \
1920
test/util/script.h \
2021
test/util/setup_common.h \

src/bench/pool.cpp

Lines changed: 50 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,50 @@
1+
// Copyright (c) 2022 The Bitcoin Core developers
2+
// Distributed under the MIT software license, see the accompanying
3+
// file COPYING or http://www.opensource.org/licenses/mit-license.php.
4+
5+
#include <bench/bench.h>
6+
#include <support/allocators/pool.h>
7+
8+
#include <unordered_map>
9+
10+
template <typename Map>
11+
void BenchFillClearMap(benchmark::Bench& bench, Map& map)
12+
{
13+
size_t batch_size = 5000;
14+
15+
// make sure each iteration of the benchmark contains exactly 5000 inserts and one clear.
16+
// do this at least 10 times so we get reasonable accurate results
17+
18+
bench.batch(batch_size).minEpochIterations(10).run([&] {
19+
auto rng = ankerl::nanobench::Rng(1234);
20+
for (size_t i = 0; i < batch_size; ++i) {
21+
map[rng()];
22+
}
23+
map.clear();
24+
});
25+
}
26+
27+
static void PoolAllocator_StdUnorderedMap(benchmark::Bench& bench)
28+
{
29+
auto map = std::unordered_map<uint64_t, uint64_t>();
30+
BenchFillClearMap(bench, map);
31+
}
32+
33+
static void PoolAllocator_StdUnorderedMapWithPoolResource(benchmark::Bench& bench)
34+
{
35+
using Map = std::unordered_map<uint64_t,
36+
uint64_t,
37+
std::hash<uint64_t>,
38+
std::equal_to<uint64_t>,
39+
PoolAllocator<std::pair<const uint64_t, uint64_t>,
40+
sizeof(std::pair<const uint64_t, uint64_t>) + 4 * sizeof(void*),
41+
alignof(void*)>>;
42+
43+
// make sure the resource supports large enough pools to hold the node. We do this by adding the size of a few pointers to it.
44+
auto pool_resource = Map::allocator_type::ResourceType();
45+
auto map = Map{0, std::hash<uint64_t>{}, std::equal_to<uint64_t>{}, &pool_resource};
46+
BenchFillClearMap(bench, map);
47+
}
48+
49+
BENCHMARK(PoolAllocator_StdUnorderedMap, benchmark::PriorityLevel::HIGH);
50+
BENCHMARK(PoolAllocator_StdUnorderedMapWithPoolResource, benchmark::PriorityLevel::HIGH);

src/coins.cpp

Lines changed: 10 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -34,7 +34,7 @@ size_t CCoinsViewBacked::EstimateSize() const { return base->EstimateSize(); }
3434

3535
CCoinsViewCache::CCoinsViewCache(CCoinsView* baseIn, bool deterministic) :
3636
CCoinsViewBacked(baseIn), m_deterministic(deterministic),
37-
cacheCoins(0, SaltedOutpointHasher(/*deterministic=*/deterministic))
37+
cacheCoins(0, SaltedOutpointHasher(/*deterministic=*/deterministic), CCoinsMap::key_equal{}, &m_cache_coins_memory_resource)
3838
{}
3939

4040
size_t CCoinsViewCache::DynamicMemoryUsage() const {
@@ -253,9 +253,12 @@ bool CCoinsViewCache::BatchWrite(CCoinsMap &mapCoins, const uint256 &hashBlockIn
253253

254254
bool CCoinsViewCache::Flush() {
255255
bool fOk = base->BatchWrite(cacheCoins, hashBlock, /*erase=*/true);
256-
if (fOk && !cacheCoins.empty()) {
257-
/* BatchWrite must erase all cacheCoins elements when erase=true. */
258-
throw std::logic_error("Not all cached coins were erased");
256+
if (fOk) {
257+
if (!cacheCoins.empty()) {
258+
/* BatchWrite must erase all cacheCoins elements when erase=true. */
259+
throw std::logic_error("Not all cached coins were erased");
260+
}
261+
ReallocateCache();
259262
}
260263
cachedCoinsUsage = 0;
261264
return fOk;
@@ -314,7 +317,9 @@ void CCoinsViewCache::ReallocateCache()
314317
// Cache should be empty when we're calling this.
315318
assert(cacheCoins.size() == 0);
316319
cacheCoins.~CCoinsMap();
317-
::new (&cacheCoins) CCoinsMap(0, SaltedOutpointHasher(/*deterministic=*/m_deterministic));
320+
m_cache_coins_memory_resource.~CCoinsMapMemoryResource();
321+
::new (&m_cache_coins_memory_resource) CCoinsMapMemoryResource{};
322+
::new (&cacheCoins) CCoinsMap{0, SaltedOutpointHasher{/*deterministic=*/m_deterministic}, CCoinsMap::key_equal{}, &m_cache_coins_memory_resource};
318323
}
319324

320325
void CCoinsViewCache::SanityCheck() const

src/coins.h

Lines changed: 19 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -11,6 +11,7 @@
1111
#include <memusage.h>
1212
#include <primitives/transaction.h>
1313
#include <serialize.h>
14+
#include <support/allocators/pool.h>
1415
#include <uint256.h>
1516
#include <util/hasher.h>
1617

@@ -131,7 +132,23 @@ struct CCoinsCacheEntry
131132
CCoinsCacheEntry(Coin&& coin_, unsigned char flag) : coin(std::move(coin_)), flags(flag) {}
132133
};
133134

134-
typedef std::unordered_map<COutPoint, CCoinsCacheEntry, SaltedOutpointHasher> CCoinsMap;
135+
/**
136+
* PoolAllocator's MAX_BLOCK_SIZE_BYTES parameter here uses sizeof the data, and adds the size
137+
* of 4 pointers. We do not know the exact node size used in the std::unordered_node implementation
138+
* because it is implementation defined. Most implementations have an overhead of 1 or 2 pointers,
139+
* so nodes can be connected in a linked list, and in some cases the hash value is stored as well.
140+
* Using an additional sizeof(void*)*4 for MAX_BLOCK_SIZE_BYTES should thus be sufficient so that
141+
* all implementations can allocate the nodes from the PoolAllocator.
142+
*/
143+
using CCoinsMap = std::unordered_map<COutPoint,
144+
CCoinsCacheEntry,
145+
SaltedOutpointHasher,
146+
std::equal_to<COutPoint>,
147+
PoolAllocator<std::pair<const COutPoint, CCoinsCacheEntry>,
148+
sizeof(std::pair<const COutPoint, CCoinsCacheEntry>) + sizeof(void*) * 4,
149+
alignof(void*)>>;
150+
151+
using CCoinsMapMemoryResource = CCoinsMap::allocator_type::ResourceType;
135152

136153
/** Cursor for iterating over CoinsView state */
137154
class CCoinsViewCursor
@@ -220,6 +237,7 @@ class CCoinsViewCache : public CCoinsViewBacked
220237
* declared as "const".
221238
*/
222239
mutable uint256 hashBlock;
240+
mutable CCoinsMapMemoryResource m_cache_coins_memory_resource{};
223241
mutable CCoinsMap cacheCoins;
224242

225243
/* Cached dynamic memory usage for the inner Coin objects. */

src/memusage.h

Lines changed: 20 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -7,6 +7,7 @@
77

88
#include <indirectmap.h>
99
#include <prevector.h>
10+
#include <support/allocators/pool.h>
1011

1112
#include <cassert>
1213
#include <cstdlib>
@@ -166,6 +167,25 @@ static inline size_t DynamicUsage(const std::unordered_map<X, Y, Z>& m)
166167
return MallocUsage(sizeof(unordered_node<std::pair<const X, Y> >)) * m.size() + MallocUsage(sizeof(void*) * m.bucket_count());
167168
}
168169

170+
template <class Key, class T, class Hash, class Pred, std::size_t MAX_BLOCK_SIZE_BYTES, std::size_t ALIGN_BYTES>
171+
static inline size_t DynamicUsage(const std::unordered_map<Key,
172+
T,
173+
Hash,
174+
Pred,
175+
PoolAllocator<std::pair<const Key, T>,
176+
MAX_BLOCK_SIZE_BYTES,
177+
ALIGN_BYTES>>& m)
178+
{
179+
auto* pool_resource = m.get_allocator().resource();
180+
181+
// The allocated chunks are stored in a std::list. Size per node should
182+
// therefore be 3 pointers: next, previous, and a pointer to the chunk.
183+
size_t estimated_list_node_size = MallocUsage(sizeof(void*) * 3);
184+
size_t usage_resource = estimated_list_node_size * pool_resource->NumAllocatedChunks();
185+
size_t usage_chunks = MallocUsage(pool_resource->ChunkSizeBytes()) * pool_resource->NumAllocatedChunks();
186+
return usage_resource + usage_chunks + MallocUsage(sizeof(void*) * m.bucket_count());
169187
}
170188

189+
} // namespace memusage
190+
171191
#endif // BITCOIN_MEMUSAGE_H

0 commit comments

Comments
 (0)