# sfu-arch/minuet-tracer

## Requirements

- A C++17-compatible compiler (for the C++ version)

```sh
# Install miniconda and activate the environment
# https://www.anaconda.com/docs/getting-started/miniconda/install
pip3 install open3d
pip3 install pybind11
sudo apt-get install zlib1g-dev
```

Note: install pybind11 within conda. If you install it in the system Python, CMake will have trouble finding the pybind11 headers; see CI.yml for the CMake command.

This repository contains tools for tracing the memory behavior of a Minuet-style point-cloud pipeline. There are three main scripts:

- `minuet_mapping.py` and its C++ version `minuet_trace_cpp` (about 6x faster than the Python version)
- `mem_trace_reader.py` — reads the memory trace files generated by `minuet_mapping.py` and the C++ version
- `read_pcl.py` — loads a point cloud from a file, applies voxel downsampling, and visualizes the original and downsampled point clouds side by side using Open3D; the trace reader can be hooked up via pybind11

## Point cloud mapping tracer

```sh
# Python version. Feature complete.
python3 minuet_trace.py --config config.json

# C++ version. Scatter pending.
cd c++; cmake -B build; cd build; make
./minuet_trace_cpp --config ../config.json
```

Output files produced:

```sh
$ ls out/
checksums.json       kernel_map.bin       metadata.bin.gz
gather_trace.bin.gz  kernel_map.bin.gz    scatter_trace.bin.gz
gemms.bin.gz         map_trace.bin.gz
```

```sh
# To read the map trace file
python3 mem_trace_reader.py --trace-file out/map_trace.bin.gz --sizeof-addr 4

# To read the GEMM list
$ python3 gemms_trace_reader.py --trace-file ./out/gemms.bin.gz
Successfully loaded configuration from /Users/ashriram/Desktop/minuet/config.json
{'num_offsets': 1, 'gemm_M': 4, 'gemm_N': 1, 'padding': 0}
{'num_offsets': 2, 'gemm_M': 4, 'gemm_N': 2, 'padding': 2}
{'num_offsets': 2, 'gemm_M': 4, 'gemm_N': 2, 'padding': 2}

# Gather reader
python3 mem_trace_reader.py --trace-file out/gather_trace.bin.gz --sizeof-addr 8

# Scatter reader
python3 mem_trace_reader.py --trace-file out/scatter_trace.bin.gz --sizeof-addr 8
```

- `map_trace.bin.gz`: Memory access trace for the mapping phase.
- `kernel_map.bin.gz`: Kernel map data (not a trace).
- `gather_trace.bin.gz`: Memory access trace for the gather phase.
- `scatter_trace.bin.gz`: Memory access trace for the scatter phase.
- `gemms.bin.gz`: GEMM operations trace.
- `metadata.bin.gz`: Compressed metadata about the simulation, such as the number of threads, sizes of various data structures, and base addresses for tensors.

The main timing-simulation loop proceeds in phases: MAP -> GATHER -> GEMM (one line at a time) -> SCATTER.

## Config file

```json
{
  "NUM_THREADS": 8,
  "SIZE_KEY": 4,
  "SIZE_INT": 4,
  "SIZE_WEIGHT": 4,
  "I_BASE": "0x10000000",
  "QK_BASE": "0x20000000",
  "QI_BASE": "0x30000000",
  "QO_BASE": "0x40000000",
  "PIV_BASE": "0x50000000",
  "KM_BASE": "0x60000000",
  "WO_BASE": "0x80000000",
  "IV_BASE": "0x100000000",
  "WV_BASE": "0xF00000000",
  "GEMM_ALIGNMENT": 4,
  "GEMM_WT_GROUP": 2,
  "GEMM_SIZE": 4,
  "NUM_TILES_GATHER": 4,
  "TILE_FEATS_GATHER": 16,
  "BULK_FEATS_GATHER": 4,
  "N_THREADS_GATHER": 1,
  "debug": true,
  "output_dir": "./trace_out_python/",
  "NUM_PIVOTS": 2
}
```
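
For instance, a small helper (not part of the repo; shown only as a sketch) can load this config and convert the hex-string base addresses to integers:

```python
import json

def load_config(path):
    """Load config.json, converting hex-string *_BASE addresses to ints.

    Illustrative helper; the repo's scripts may parse the config differently.
    """
    with open(path) as f:
        cfg = json.load(f)
    for key, val in cfg.items():
        if isinstance(val, str) and key.endswith("_BASE"):
            cfg[key] = int(val, 16)  # "0x10000000" -> 268435456
    return cfg
```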

Description of the parameters:

- `NUM_THREADS`: Number of virtual threads to simulate for the mapping phase.
- `SIZE_KEY`: Size of a key in bytes (e.g., for coordinates).
- `SIZE_INT`: Size of an integer in bytes.
- `SIZE_WEIGHT`: Size of a weight value in bytes.
- `I_BASE`, `QK_BASE`, `QI_BASE`, `QO_BASE`, `PIV_BASE`, `KM_BASE`, `WO_BASE`, `IV_BASE`, `WV_BASE`: Base addresses (in hexadecimal) for the data structures (tensors) used in the simulation. I: input, QK: query keys, PIV: pivot keys, KM: kernel map, IV: input feature vectors, GEMM_BASE: buffers for GEMM.
- `GEMM_ALIGNMENT`: Target matrix size for GEMM; the number of inputs fused.
- `GEMM_WT_GROUP`: Maximum number of weights per group (the break-out condition for groups).

This project simulates the memory access patterns of a Minuet-like kernel map algorithm. It includes both a C++ version and a Python version. The simulation employs a fixed pool of virtual threads to model parallel processing. The primary output is a compressed binary file (`map_trace.bin.gz` for the Python version, `map_trace_cpp.bin.gz` for the C++ version) containing the recorded memory accesses, which can be used for analyzing memory behavior, locality, and potential bottlenecks.

Input format: The input to the simulation is a set of 3D coordinates representing points in space. These coordinates are quantized and packed into 12-bit integer keys. The input data is generated within the simulation itself, so no external input files are required.
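
To illustrate the quantize-and-pack idea, here is a minimal sketch that packs three coordinates into a single integer key. The bit width per axis (`BITS`) is hypothetical; the repo's actual key layout may differ:

```python
BITS = 10              # bits per axis -- illustrative, not the repo's actual width
MASK = (1 << BITS) - 1

def pack_key(x, y, z):
    """Pack quantized (x, y, z) into one integer key (illustrative layout)."""
    return ((x & MASK) << (2 * BITS)) | ((y & MASK) << BITS) | (z & MASK)

def unpack_key(k):
    """Recover the (x, y, z) coordinates from a packed key."""
    return (k >> (2 * BITS)) & MASK, (k >> BITS) & MASK, k & MASK
```

Packed keys let the pipeline sort and compare points as plain integers.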

## Output Trace Format

| Section | Field | Data Type | Description |
|---|---|---|---|
| Header | Entry Count | `uint32_t` | Total number of trace entries that follow |
| Trace Entry | phase_id | `uint8_t` | Integer ID of the computational phase during which the access occurred. Dynamically assigned in the `write_gmem_trace` function; the mapping is printed to the console |
| | thread_id | `uint8_t` | The ID of the virtual thread that performed the memory access |
| | op_id | `uint8_t` | Integer ID for the memory operation type (0 for Read 'R', 1 for Write 'W') |
| | tensor_id | `uint8_t` | Integer ID of the tensor region being accessed |
| | addr_int | `uint32_t` | The memory address that was accessed. Addresses are truncated to 32 bits in the output file |

Note: this covers only the mapping phase, so a 32-bit address space is enough for the memory trace file. If actual feature vectors are fetched, the address space will need to be 64-bit.
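
A minimal reader for this layout might look as follows. Little-endian packing is an assumption here (the format table above does not specify byte order), and the function name is hypothetical:

```python
import gzip
import struct

def read_map_trace(path, sizeof_addr=4):
    """Yield trace entries from a gzipped trace file in the layout above.

    Assumes little-endian packing. Use sizeof_addr=8 for gather/scatter traces.
    """
    fmt = "<BBBB" + ("I" if sizeof_addr == 4 else "Q")
    entry_size = struct.calcsize(fmt)
    with gzip.open(path, "rb") as f:
        (count,) = struct.unpack("<I", f.read(4))   # header: entry count
        for _ in range(count):
            phase_id, thread_id, op_id, tensor_id, addr = struct.unpack(
                fmt, f.read(entry_size))
            yield {"phase": phase_id, "thread": thread_id,
                   "op": "R" if op_id == 0 else "W",
                   "tensor": tensor_id, "addr": addr}
```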

## Input format

- `coords`: A Python list of tuples, where each tuple `(x, y, z)` gives the integer coordinates of a point. Example: `coords = [(1,5,0),(0,1,1),(0,0,2),(0,0,3)]`
- `offsets`: A Python list of 3D offset tuples `(dx, dy, dz)`. These offsets are applied to the input coordinates to generate query points. Example: `offsets = [(dx,dy,dz) for dx in (-1,0,1) for dy in (-1,0,1) for dz in (-1,0,1)]`
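
Using the example values above, query generation amounts to shifting every coordinate by every offset, while `qii`/`qki` record which input and which offset produced each query (a sketch of the Build-Queries idea, not the repo's exact code):

```python
coords = [(1, 5, 0), (0, 1, 1), (0, 0, 2), (0, 0, 3)]
offsets = [(dx, dy, dz) for dx in (-1, 0, 1)
                        for dy in (-1, 0, 1)
                        for dz in (-1, 0, 1)]

# Each query point is an input coordinate shifted by one offset.
queries, qii, qki = [], [], []
for i, (x, y, z) in enumerate(coords):
    for k, (dx, dy, dz) in enumerate(offsets):
        queries.append((x + dx, y + dy, z + dz))
        qii.append(i)   # query input-index
        qki.append(k)   # query kernel/offset-index
```

With 4 points and 27 offsets this yields 108 queries.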
| Parameter | Description | Default Value |
|---|---|---|
| `debug` (boolean) | If `True`, enables additional print statements for debugging. | `True` |
| `NUM_THREADS` (integer) | The number of virtual threads to simulate for parallel processing. | 4 |
| `I_TILES` (integer) | The number of input elements per tile, used by `make_tiles_and_pivots` during the backward binary search phase. | 2 |

Tensor base addresses — these hexadecimal values define the starting memory addresses for the data structures (tensors) used in the simulation:

| Parameter | Description | Default Value |
|---|---|---|
| `I_BASE` | Base address for input point coordinates. | `0x10000000` |
| `QK_BASE` | Base address for query keys. | `0x20000000` |
| `QI_BASE` | Base address for the query input-index array. | `0x30000000` |
| `QO_BASE` | Base address for the query offset-index array. | `0x40000000` |
| `PIV_BASE` | Base address for tile pivot keys. | `0x50000000` |
| `TILE_BASE` | Alias for `I_BASE`, used for tile data reads. | `0x10000000` |
| `KM_BASE` | Base address for kernel map writes. | `0x60000000` |
| `WO_BASE` | Base address for reading weight offsets. | `0x80000000` |
| `IV_BASE` | Base address for input feature vectors. Currently unused. | `0x100000000` |
| `WV_BASE` | Base address for weight values. | `0x800000000` |

Data type sizes:

| Parameter | Description | Default Value |
|---|---|---|
| `SIZE_KEY` (integer) | Size of a key in bytes (e.g., for coordinates). | 4 (32-bit) |
| `SIZE_INT` (integer) | Size of an integer in bytes. | 4 |
| `SIZE_WEIGHT` (integer) | Size of a weight value in bytes. | 4 |

## Core Simulation Phases

Simulates the following key phases:

  1. Radix Sort Simulation (Radix-Sort & Dedup):
    • Input 3D coordinates are quantized and packed into 32-bit integer keys.
    • Memory access patterns for a radix sort are simulated on these keys (reads from an input array, writes to an auxiliary array, then swap).
    • The keys are then explicitly sorted using std::sort and deduplicated to obtain unique input keys.
  2. Build Queries (Build-Queries):
    • Query keys are generated by applying a set of 3D offsets to each unique input key.
    • Associated metadata arrays are created: qii (query input-index), qki (query kernel/offset-index), and woffs (packed weight-offset keys).
  3. Sort Query Keys (Sort-QKeys):
    • This phase, mirroring the Python script's logic, currently involves assigning the previously generated query keys. No actual sorting operation is performed within this named phase in the C++ version, matching the Python script's behavior.
  4. Tile and Pivot Generation (Tile-Pivots):
    • The unique sorted input keys are partitioned into smaller collections called "tiles."
    • The first key of each tile is stored as a "pivot." Memory write accesses for these pivot keys are recorded.
  5. Lookup (Lookup):
    • This is the main multi-threaded phase where query keys are matched against the tiled input data.
    • The workload (queries) is divided among NUM_THREADS worker threads.
    • Each thread processes its assigned queries:
      • It performs a backward search on the global pivot keys to efficiently identify a relevant tile for the current query key.
      • It then performs a forward scan within that specific tile to find an exact match for the query key.
      • If a match is found, an entry detailing the match (target coordinates, original input coordinates, offset coordinates) is added to a shared KernelMap data structure.
    • Memory accesses (reads for query keys, pivot keys, tile data; writes for kernel map entries) are recorded by each thread into a thread-local trace buffer. These local traces are then merged into the global memory trace vector, protected by a mutex. The KernelMap is also protected by a separate mutex.
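
The lookup phase (backward search on pivots, then a forward scan within the selected tile) can be sketched as follows. This is a simplified, single-threaded illustration with hypothetical names, not the repo's implementation:

```python
from bisect import bisect_right

def lookup(query_keys, pivots, tiles):
    """Match each query key against tiled, sorted input keys.

    pivots[t] is the first key of tiles[t]; both are sorted ascending.
    Returns a map from matched query key to the tile index it was found in.
    """
    kernel_map = {}
    for q in query_keys:
        # Backward search: the last pivot <= q identifies the candidate tile.
        t = bisect_right(pivots, q) - 1
        if t < 0:
            continue  # q precedes all pivots; no tile can contain it
        # Forward scan within that tile for an exact match.
        for key in tiles[t]:
            if key == q:
                kernel_map.setdefault(q, []).append(t)
                break
    return kernel_map
```

In the real simulation each thread handles a slice of the queries, records its accesses in a thread-local buffer, and the results are merged under mutexes as described above.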

## Building the C++ Project

### Prerequisites

- A C++17-compliant compiler (e.g., GCC, Clang, MSVC).
- CMake (version 3.10 or higher).
- Zlib development libraries (for gzip compression).
  - On Debian/Ubuntu: `sudo apt-get install zlib1g-dev`
  - On macOS (using Homebrew): `brew install zlib`

### Build Steps

1. Clone or download the project source code.
2. Create a build directory and navigate into it:
   ```sh
   mkdir build
   cd build
   ```
3. Run CMake to configure the project:
   ```sh
   cmake ..
   ```
   If Zlib is installed in a non-standard location, you might need to help CMake find it (e.g., `cmake .. -DCMAKE_PREFIX_PATH=/path/to/zlib_install_dir`).
4. Compile the project:
   ```sh
   make
   ```

## Running the Simulation

After a successful build, an executable named `minuet_trace_cpp` is produced in the build directory.

To run the simulation:

```sh
./minuet_trace_cpp --config ../config.json
```

The program will:

- Print information about each phase to the console.
- If `debug` is enabled (the default), print detailed outputs such as sorted unique keys, segmented query arrays, and the final kernel map.
- Print all recorded memory trace entries to the console.
- Generate the `map_trace_cpp.bin.gz` file in the current working directory (the build directory, if run from there).

## Project Structure

```
minuet_trace_cpp/
├── CMakeLists.txt          # CMake build script
├── include/
│   └── minuet_trace.hpp    # Header file with declarations and template definitions
└── src/
    ├── main.cpp            # Main application logic, orchestrates simulation phases
    └── minuet_trace.cpp    # Implementation of core simulation functions
```

## Trace Reader

This script reads the memory trace file generated by the Minuet-like kernel map simulation. It provides functionality to filter the trace by computational phase or memory operation type (read/write) and visualize the memory access patterns using Open3D.

```sh
# How to run (--sizeof-addr 4 for the map trace, 8 for gather/scatter traces)
python mem_trace_reader.py map_trace.bin.gz --sizeof-addr 4

# Filter trace by phase
python mem_trace_reader.py map_trace.bin.gz --filter-phase "Lookup"

# Filter trace by operation
python mem_trace_reader.py map_trace.bin.gz --filter-op "W"

# Generate memory access pattern plots
python mem_trace_reader.py map_trace.bin.gz --plot

# Save plot to file
python mem_trace_reader.py map_trace.bin.gz --plot --plot-file memory_access.png
```
