[BE] Refactor: Modularize extract_source_mappings.py for improved maintainability #44


Closed · FindHao wants to merge 2 commits

Conversation

@FindHao (Member) commented on Jul 25, 2025

The tritonparse/tritonparse/extract_source_mappings.py script was a large, monolithic file containing logic for IR parsing, source mapping, event diffing, and file processing. This made it difficult to read, maintain, and extend.

This pull request refactors the script by breaking it down into smaller, single-responsibility modules. This improves the overall code structure, enhances readability, and makes future development easier.

Description of Changes

The core logic of extract_source_mappings.py has been split into several new modules within the tritonparse/tritonparse/ directory:

  • extract_source_mappings.py (Modified): Now serves as a clean command-line entry point, containing only argument parsing and the main execution call; see the sketch after this list.
  • trace_processor.py (New): The main orchestrator that handles the processing of trace files. It contains the core logic previously found in parse_single_file and parse_single_trace_content.
  • ir_parser.py (New): Contains all functions related to parsing Intermediate Representations (TTIR, TTGIR, PTX, AMDGCN), including loc directive extraction.
  • mapper.py (New): Responsible for creating the bidirectional source mappings between Python source and various IRs.
  • event_diff.py (New): Provides functionality to compare a list of events and generate a summary of their differences.
  • sourcemap_utils.py (New): A collection of general-purpose utility functions (e.g., dictionary flattening) used by the new modules.
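
As referenced above, here is a minimal sketch of what the slimmed-down entry point could look like after the split. The helper `process_trace_file` and the CLI arguments are illustrative assumptions for this example, not the actual `trace_processor.py` API.

```python
# Hypothetical sketch of the thin entry point; process_trace_file and the
# CLI flags are illustrative stand-ins, not the real tritonparse API.
import argparse


def process_trace_file(input_path: str, output_dir: str) -> None:
    """Stand-in for the orchestrator exposed by trace_processor.py."""
    print(f"Processing {input_path} -> {output_dir}")


def main() -> None:
    parser = argparse.ArgumentParser(
        description="Extract source mappings from a Triton trace file."
    )
    parser.add_argument("input", help="Path to the input trace file")
    parser.add_argument("-o", "--output-dir", default=".", help="Output directory")
    args = parser.parse_args()

    # The entry point only parses arguments; all processing logic lives in
    # the orchestrator module.
    process_trace_file(args.input, args.output_dir)


if __name__ == "__main__":
    main()
```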

Test Plan:

% python -m unittest tests.test_tritonparse -v 
test_convert (tests.test_tritonparse.TestTritonparseCPU.test_convert)
Test convert function with various data types ... ok
test_complex_kernels (tests.test_tritonparse.TestTritonparseCUDA.test_complex_kernels)
A more complex test case involving two distinct Triton kernels, one of which uses autotuning. ... Temporary directory: /tmp/tmphy6i6cw8
--- Testing Matmul Kernel (3 launches) ---
WARNING:tritonparse.structured_logging:fn JitFunctionInfo(module='tests.test_tritonparse', name='TestTritonparseCUDA.test_complex_kernels.<locals>.matmul_kernel', jit_function=JITFunction(tests.test_tritonparse:TestTritonparseCUDA.test_complex_kernels.<locals>.matmul_kernel)) launch_metadata is not None: <function add_launch_metadata at 0x7a6893ca80e0>. It will be overridden by tritonparse.
WARNING:tritonparse.structured_logging:fn JitFunctionInfo(module='tests.test_tritonparse', name='TestTritonparseCUDA.test_complex_kernels.<locals>.matmul_kernel', jit_function=JITFunction(tests.test_tritonparse:TestTritonparseCUDA.test_complex_kernels.<locals>.matmul_kernel)) launch_metadata is not None: <function add_launch_metadata at 0x7a6893ca80e0>. It will be overridden by tritonparse.
Matmul Launch 1 (16x16 @ 16x16) done.
Matmul Launch 2 (32x16 @ 16x32) done.
Matmul Launch 3 (16x32 @ 32x16) done.

--- Testing Fused Op Kernel (4 launches) ---
Fused Op Launch 1: scale=1.0, activation=None
Fused Op Launch 2: scale=2.5, activation=None
Fused Op Launch 3: scale=1.0, activation='relu'
WARNING:tritonparse.structured_logging:fn JitFunctionInfo(module='tests.test_tritonparse', name='TestTritonparseCUDA.test_complex_kernels.<locals>.fused_op_kernel', jit_function=JITFunction(tests.test_tritonparse:TestTritonparseCUDA.test_complex_kernels.<locals>.fused_op_kernel)) launch_metadata is not None: <function add_launch_metadata at 0x7a6893ca80e0>. It will be overridden by tritonparse.
Fused Op Launch 4: scale=1.0, activation='relu', different size
All kernels executed.
tritonparse log file list: /tmp/tmpeo5py734/log_file_list.json
INFO:tritonparse:Copying parsed logs from /tmp/tmpeo5py734 to /tmp/tmphy6i6cw8/parsed_output_complex

================================================================================
📁 TRITONPARSE PARSING RESULTS
================================================================================
📂 Parsed files directory: /tmp/tmphy6i6cw8/parsed_output_complex
📊 Total files generated: 2

📄 Generated files:
--------------------------------------------------
   1. 📝 dedicated_log_triton_trace_findhao__mapped.ndjson.gz (170.3KB)
   2. 📝 log_file_list.json (181B)
================================================================================
✅ Parsing completed successfully!
================================================================================

✓ Generated 1 log files
✓ Generated 2 parsed files
✓ Found 1 .json files and 1 .ndjson.gz files
Checking launch_diff events in dedicated_log_triton_trace_findhao__mapped.ndjson.gz
  Line 402: Found launch_diff event (count: 1)
  Line 897: Found launch_diff event (count: 2)
  Line 1384: Found launch_diff event (count: 3)
  Line 1388: Found launch_diff event (count: 4)
  Line 1392: Found launch_diff event (count: 5)
✓ Total launch_diff events found: 5
✓ Verified 5 launch_diff events in parsed output
✓ Cleaned up temporary directory
ok
test_extract_python_source_info (tests.test_tritonparse.TestTritonparseCUDA.test_extract_python_source_info)
Test extract_python_source_info function ... ok
test_whole_workflow (tests.test_tritonparse.TestTritonparseCUDA.test_whole_workflow)
Test unified_parse functionality ... Temporary directory: /tmp/tmpik7n136f
Found 1 log files in /tmp/tmpik7n136f/logs: ['dedicated_log_triton_trace_findhao_.ndjson']
  Line 1: event_type = 'compilation' (unique hash: 258d1b0a...)
  Line 2: event_type = 'launch' (count: 1)
  Line 3: event_type = 'launch' (count: 2)
Event type counts: {'launch': 2, 'compilation': 1} (unique compilation hashes: 1)
✓ Verified correct event type counts: 1 unique compilation hash, 2 launch events
tritonparse log file list: /tmp/tmp53z4ny7b/log_file_list.json
INFO:tritonparse:Copying parsed logs from /tmp/tmp53z4ny7b to /tmp/tmpik7n136f/parsed_output

================================================================================
📁 TRITONPARSE PARSING RESULTS
================================================================================
📂 Parsed files directory: /tmp/tmpik7n136f/parsed_output
📊 Total files generated: 2

📄 Generated files:
--------------------------------------------------
   1. 📝 dedicated_log_triton_trace_findhao__mapped.ndjson.gz (7.9KB)
   2. 📝 log_file_list.json (181B)
================================================================================
✅ Parsing completed successfully!
================================================================================

✓ Cleaned up temporary directory
ok

----------------------------------------------------------------------
Ran 4 tests in 16.816s

OK
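
For reference, a minimal sketch of the kind of check the output above reports: streaming the gzipped NDJSON output and counting lines whose `event_type` is `launch_diff`. The `event_type` field name is taken from the test output; the function name and line format are hypothetical, not the project's actual test helper.

```python
# Minimal sketch of counting launch_diff events in a parsed .ndjson.gz file.
# Assumes each non-empty line is a JSON object with an "event_type" field,
# as the test output above suggests.
import gzip
import json


def count_launch_diff_events(path: str) -> int:
    count = 0
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue
            event = json.loads(line)
            if event.get("event_type") == "launch_diff":
                count += 1
                print(f"  Line {line_no}: Found launch_diff event (count: {count})")
    return count
```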

FindHao added 2 commits July 24, 2025 22:29
**Title:** Refactor: Modularize `extract_source_mappings.py` for improved maintainability

The commit message mirrors the pull request description above, with two renames worth noting: `trace_processor.py` was initially named `processor.py`, and `event_diff.py` was initially named `launch_diff.py`.
Additionally, the `sourcemap_constants.py` file was **removed**, and its contents (regex patterns and constant lists) were moved into the modules where they are directly used (`ir_parser.py` and `event_diff.py`) to reduce the number of files and improve cohesion.
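
As an illustration of the relocated regex patterns, here is a hedged sketch of the `loc` directive extraction that `ir_parser.py` performs on TTIR/TTGIR text. The pattern and the returned mapping are assumptions based on standard MLIR `#loc` syntax, not code copied from the module.

```python
# Hedged sketch of #loc directive extraction from MLIR-style IR (TTIR/TTGIR).
# The regex and the returned mapping shape are illustrative assumptions.
import re
from typing import Dict, Tuple

# Matches location definitions such as:  #loc3 = loc("/path/to/kernel.py":42:8)
LOC_DEF_RE = re.compile(
    r'#loc(?P<id>\d*)\s*=\s*loc\("(?P<file>[^"]+)":(?P<line>\d+):(?P<col>\d+)\)'
)


def extract_loc_definitions(ir_text: str) -> Dict[str, Tuple[str, int, int]]:
    """Map each #loc identifier to its (file, line, column) source position."""
    locs: Dict[str, Tuple[str, int, int]] = {}
    for match in LOC_DEF_RE.finditer(ir_text):
        key = f"#loc{match.group('id')}"
        locs[key] = (
            match.group("file"),
            int(match.group("line")),
            int(match.group("col")),
        )
    return locs
```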
@facebook-github-bot added the CLA Signed label on Jul 25, 2025
@davidberard98 left a comment:

Awesome work!

@facebook-github-bot (Contributor) commented:

@FindHao has imported this pull request. If you are a Meta employee, you can view this in D78982556.

@facebook-github-bot (Contributor) commented:

@FindHao merged this pull request in b1f70ba.
