Add TBE data configuration reporter to TBE forward (v3) #4455

Open: wants to merge 1 commit into base: main

@gchalump gchalump commented Jul 8, 2025

Summary:
Re-land attempt of D75462895

Add TBE data configuration reporter to TBE forward call.

The reporter captures the TBE data configuration at the SplitTableBatchedEmbeddingBagsCodegen forward call. The output is a TBEDataConfig object, which is written to one or more JSON files. The configuration of its environment variables and an example of its usage are described below.

Just Knobs for enablement

Environment Variables


The Reporter relies on several environment variables to control its behavior. Below is a description of each variable:

  • FBGEMM_REPORT_INPUT_PARAMS_INTERVAL:

    • Description: Determines the interval at which reports are generated, specified as a number of iterations.
    • Example Value: 1 (report every iteration)
  • FBGEMM_REPORT_INPUT_PARAMS_ITER_START:

    • Description: Specifies the start of the iteration range to capture reports. Default: 0.
    • Example Value: 0 (start reporting from the first iteration)
  • FBGEMM_REPORT_INPUT_PARAMS_ITER_END:

    • Description: Specifies the end of the iteration range to capture reports. Use -1 to report until the last iteration. Default: -1.
    • Example Value: -1 (report until the last iteration)
  • FBGEMM_REPORT_INPUT_PARAMS_BUCKET:

    • Description: Specifies the name of the Manifold bucket where the report data will be saved.
    • Example Value: tlparse_reports
  • FBGEMM_REPORT_INPUT_PARAMS_PATH_PREFIX:

    • Description: Defines the path prefix where the report files will be stored. The path will be created if it does not exist.
    • Example Value: tree/tests/
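
For illustration, the variables above could be read into a small config object as sketched below. This is a minimal sketch, not the reporter's actual code: `ReporterEnvConfig` and `load_reporter_env_config` are hypothetical names, and the interval default of 1 is an assumption (the description only gives 1 as an example value).

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Common prefix shared by all reporter environment variables.
_PREFIX = "FBGEMM_REPORT_INPUT_PARAMS_"


@dataclass(frozen=True)
class ReporterEnvConfig:
    """Hypothetical container for the reporter's environment settings."""

    interval: int                # report every `interval` iterations
    iter_start: int              # first iteration eligible for reporting
    iter_end: int                # last eligible iteration, -1 = no upper bound
    bucket: Optional[str]        # Manifold bucket name, None if unset
    path_prefix: Optional[str]   # path prefix for report files, None if unset


def load_reporter_env_config(
    env: Mapping[str, str] = os.environ,
) -> ReporterEnvConfig:
    """Parse the reporter variables from `env` (defaults to the process env)."""
    return ReporterEnvConfig(
        # Assumption: default interval of 1; the description states no default.
        interval=int(env.get(_PREFIX + "INTERVAL", "1")),
        # Defaults of 0 and -1 are the ones stated in the description.
        iter_start=int(env.get(_PREFIX + "ITER_START", "0")),
        iter_end=int(env.get(_PREFIX + "ITER_END", "-1")),
        bucket=env.get(_PREFIX + "BUCKET"),
        path_prefix=env.get(_PREFIX + "PATH_PREFIX"),
    )
```

Passing a plain dict instead of `os.environ` keeps the parsing easy to exercise in a unit test without touching the process environment.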

Example Usage


Below is an example command demonstrating how to use the FBGEMM Reporter with specific environment variable settings:

FBGEMM_REPORT_INPUT_PARAMS_INTERVAL=2 FBGEMM_REPORT_INPUT_PARAMS_ITER_START=3 \
FBGEMM_REPORT_INPUT_PARAMS_BUCKET=tlparse_reports FBGEMM_REPORT_INPUT_PARAMS_PATH_PREFIX=tree/tests/ \
buck2 run mode/opt //deeplearning/fbgemm/fbgemm_gpu/bench:split_table_batched_embeddings -- device --iters 2

Explanation

The above settings will report iter 3 and iter 5.

  • FBGEMM_REPORT_INPUT_PARAMS_INTERVAL=2: The reporter will generate a report every 2 iterations.
  • FBGEMM_REPORT_INPUT_PARAMS_ITER_START=3: The reporter will start generating reports from iteration 3 (iterations are counted from 0).
  • FBGEMM_REPORT_INPUT_PARAMS_ITER_END=-1 (Default): The reporter will continue to generate reports until the last iteration.
  • FBGEMM_REPORT_INPUT_PARAMS_BUCKET=tlparse_reports: The reports will be saved in the tlparse_reports bucket.
  • FBGEMM_REPORT_INPUT_PARAMS_PATH_PREFIX=tree/tests/: The reports will be stored with the path prefix tree/tests/. For Manifold, make sure all folders within the path exist.

Note on Benchmark example

With the --iters 2 option, the benchmark executes 6 forward calls in total: 3 (2 iterations plus 1 warmup) for the forward benchmark and another 3 (2 iterations plus 1 warmup) for the backward benchmark. Iteration numbering starts from 0.
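
The selection of reported iterations can be sketched as follows. This assumes a report fires when the 0-based iteration lies within the configured range (ITER_END treated as inclusive, with -1 meaning no upper bound) and is a whole number of intervals past ITER_START. `should_report` and `reported_iterations` are hypothetical helpers written for this example, not the reporter's API.

```python
from typing import List


def should_report(
    iteration: int, interval: int, iter_start: int = 0, iter_end: int = -1
) -> bool:
    """Return True if a report would be written at this 0-based iteration."""
    if interval <= 0:
        return False  # assumption: a non-positive interval disables reporting
    if iteration < iter_start:
        return False  # before the start of the reporting range
    if iter_end != -1 and iteration > iter_end:
        return False  # past the (assumed inclusive) end of the range
    # Report on iter_start, iter_start + interval, iter_start + 2 * interval, ...
    return (iteration - iter_start) % interval == 0


def reported_iterations(
    total_forward_calls: int, interval: int, iter_start: int = 0, iter_end: int = -1
) -> List[int]:
    """List the iterations reported across `total_forward_calls` forward calls."""
    return [
        i
        for i in range(total_forward_calls)
        if should_report(i, interval, iter_start, iter_end)
    ]


# Benchmark example: 6 forward calls, INTERVAL=2, ITER_START=3, ITER_END=-1.
print(reported_iterations(6, interval=2, iter_start=3))  # [3, 5]
```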



Other changes included in this Diff:

  • Update the build dependencies of the tbe_data_config* files
  • Remove the shutil and numpy.random libraries, as they cause incompatibility errors
  • Add a non-OSS test that writes the extracted config data JSON file to Manifold

Differential Revision: D76992907


netlify bot commented Jul 8, 2025

Deploy Preview for pytorch-fbgemm-docs ready!

Name Link
🔨 Latest commit e4fd0fe
🔍 Latest deploy log https://app.netlify.com/projects/pytorch-fbgemm-docs/deploys/68714b9881741300089b0b83
😎 Deploy Preview https://deploy-preview-4455--pytorch-fbgemm-docs.netlify.app

@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D76992907

gchalump added a commit to gchalump/FBGEMM that referenced this pull request Jul 8, 2025
Summary:
X-link: facebookresearch/FBGEMM#1516


Re-land attempt of D75462895

# Add TBE data configuration reporter to TBE forward call.
The reporter captures the TBE data configuration at the `SplitTableBatchedEmbeddingBagsCodegen` ***forward*** call. The output is a `TBEDataConfig` object, which is written to one or more JSON files. The configuration of its environment variables and an example of its usage are described below.

## Just Knobs for enablement
 - `fbgemm_gpu/features:TBE_REPORT_INPUT_PARAMS` is added to enable the reporter (https://www.internalfb.com/intern/justknobs/?name=fbgemm_gpu%2Ffeatures)
    - The default is `False`; set this flag to enable the reporter.
    - To enable it locally, use:
       ```
       jk canary set fbgemm_gpu/features:TBE_REPORT_INPUT_PARAMS --on --ttl 600
       ```


## Environment Variables
---------------------

The Reporter relies on several environment variables to control its behavior. Below is a description of each variable:

-   **FBGEMM_REPORT_INPUT_PARAMS_INTERVAL**:
    -   **Description**: Determines the interval at which reports are generated, specified as a number of iterations.
    -   **Example Value**: `1` (report every iteration)

-   **FBGEMM_REPORT_INPUT_PARAMS_ITER_START**:
    -   **Description**: Specifies the start of the iteration range to capture reports. Default: `0`.
    -   **Example Value**: `0` (start reporting from the first iteration)

-   **FBGEMM_REPORT_INPUT_PARAMS_ITER_END**:
    -   **Description**: Specifies the end of the iteration range to capture reports. Use `-1` to report until the last iteration. Default: `-1`.
    -   **Example Value**: `-1` (report until the last iteration)

-   **FBGEMM_REPORT_INPUT_PARAMS_BUCKET**:
    -   **Description**: Specifies the name of the Manifold bucket where the report data will be saved.
    -   **Example Value**: `tlparse_reports`

-   **FBGEMM_REPORT_INPUT_PARAMS_PATH_PREFIX**:
    -   **Description**: Defines the path prefix where the report files will be stored. The path will be created if it does not exist.
    -   **Example Value**: `tree/tests/`
## Example Usage
-------------

Below is an example command demonstrating how to use the FBGEMM Reporter with specific environment variable settings:

```
FBGEMM_REPORT_INPUT_PARAMS_INTERVAL=2 FBGEMM_REPORT_INPUT_PARAMS_ITER_START=3 \
FBGEMM_REPORT_INPUT_PARAMS_BUCKET=tlparse_reports FBGEMM_REPORT_INPUT_PARAMS_PATH_PREFIX=tree/tests/ \
buck2 run mode/opt //deeplearning/fbgemm/fbgemm_gpu/bench:split_table_batched_embeddings -- device --iters 2
```

**Explanation**

The above settings will report `iter 3` and `iter 5`.

*   **FBGEMM_REPORT_INPUT_PARAMS_INTERVAL=2**: The reporter will generate a report every 2 iterations.
*   **FBGEMM_REPORT_INPUT_PARAMS_ITER_START=3**: The reporter will start generating reports from iteration 3 (iterations are counted from 0).
*   **FBGEMM_REPORT_INPUT_PARAMS_ITER_END=-1 (Default)**: The reporter will continue to generate reports until the last iteration.
*   **FBGEMM_REPORT_INPUT_PARAMS_BUCKET=tlparse_reports**: The reports will be saved in the `tlparse_reports` bucket.
*   **FBGEMM_REPORT_INPUT_PARAMS_PATH_PREFIX=tree/tests/**: The reports will be stored with the path prefix `tree/tests/`. For Manifold, make sure all folders within the path exist.

**Note on Benchmark example**

With the `--iters 2` option, the benchmark executes 6 forward calls in total: 3 (2 iterations plus 1 warmup) for the forward benchmark and another 3 (2 iterations plus 1 warmup) for the backward benchmark. Iteration numbering starts from 0.


---
---
## Other changes included in this Diff:
  - Update the build dependencies of the `tbe_data_config*` files
  - Remove the `shutil` and `numpy.random` libraries, as they cause incompatibility errors
  - Add a non-OSS test that writes the extracted config data JSON file to Manifold

Differential Revision: D76992907
@gchalump gchalump force-pushed the export-D76992907 branch from 729330e to 866a2b1 Compare July 8, 2025 23:26
@gchalump gchalump force-pushed the export-D76992907 branch from 866a2b1 to 66ed94d Compare July 9, 2025 22:05
gchalump added a commit to gchalump/FBGEMM that referenced this pull request Jul 9, 2025
gchalump added a commit to gchalump/FBGEMM that referenced this pull request Jul 9, 2025
@gchalump gchalump force-pushed the export-D76992907 branch from 66ed94d to 0a3c0d1 Compare July 9, 2025 22:06
gchalump added a commit to gchalump/FBGEMM that referenced this pull request Jul 9, 2025
@gchalump gchalump force-pushed the export-D76992907 branch from 0a3c0d1 to acc065f Compare July 9, 2025 22:10
gchalump added a commit to gchalump/FBGEMM that referenced this pull request Jul 9, 2025
@gchalump gchalump force-pushed the export-D76992907 branch 2 times, most recently from df98c37 to 882ed07 Compare July 10, 2025 15:53
gchalump added a commit to gchalump/FBGEMM that referenced this pull request Jul 10, 2025
gchalump added a commit to gchalump/FBGEMM that referenced this pull request Jul 10, 2025
Summary:
X-link: facebookresearch/FBGEMM#1516


Re-land attempt of D75462895

# Add TBE data configuration reporter to TBE forward call.
The reporter reports TBE data configuration at the `SplitTableBatchedEmbeddingBagsCodegen` ***forward*** call. The output is a `TBEDataConfig` object, which is written to a JSON file(s). The configuration of its environment variables and an example of its usage is described below.

## Just Knobs for enablement
 - fbgemm_gpu/features:TBE_REPORT_INPUT_PARAMS is added for enablement of the reporter (https://www.internalfb.com/intern/justknobs/?name=fbgemm_gpu%2Ffeatures)
    - Default is set to `False`, enable this flag to enable reporter.
    - To enable it locally use:
       ```
       jk canary set fbgemm_gpu/features:TBE_REPORT_INPUT_PARAMS --on --ttl 600
       ```


## Environment Variables
---------------------

The Reporter relies on several environment variables to control its behavior. Below is a description of each variable:

  -  **FBGEMM_REPORT_INPUT_PARAMS_INTERVAL**:
      -  **Description**: Determines the interval at which reports are generated. This is specified in terms of the number of iterations.
      -   **Example Value**: `1` (report every iteration)

   -  **FBGEMM_REPORT_INPUT_PARAMS_ITER_START**:
      -   ***Description**: Specifies the start of the iteration range to capture reports. Default 0.
      -   ***Example Value**: `0` (start reporting from the first iteration)

    -   **FBGEMM_REPORT_INPUT_PARAMS_ITER_END**:
      -   ***Description**: Specifies the end of the iteration range to capture reports. Use `-1` to report until the last iteration. Default -1.
      -   ***Example Value**: `-1` (report until the last iteration)

-   **FBGEMM_REPORT_INPUT_PARAMS_BUCKET**:
    *   **Description**: Specifies the name of the Manifold bucket where the report data will be saved.
    *   **Example Value**: `tlparse_reports`

-  **FBGEMM_REPORT_INPUT_PARAMS_PATH_PREFIX**:
    -  **Description**: Defines the path prefix where the report files will be stored. Path will be created if not exist.
    -   **Example Value**: `tree/tests/`
## Example Usage
-------------

Below is an example command demonstrating how to use the FBGEMM Reporter with specific environment variable settings:

```
FBGEMM_REPORT_INPUT_PARAMS_INTERVAL=2  FBGEMM_REPORT_INPUT_PARAMS_ITER_START=3
FBGEMM_REPORT_INPUT_PARAMS_BUCKET=tlparse_reports FBGEMM_REPORT_INPUT_PARAMS_PATH_PREFIX=tree/tests/ buck2 run mode/opt //deeplearning/fbgemm/fbgemm_gpu/bench:split_table_batched_embeddings -- device --iters 2
```

**Explanation**

The above settings report `iter 3` and `iter 5`.

*   **FBGEMM_REPORT_INPUT_PARAMS_INTERVAL=2**: The reporter generates a report every 2 iterations.
*   **FBGEMM_REPORT_INPUT_PARAMS_ITER_START=3**: The reporter starts generating reports from iteration 3.
*   **FBGEMM_REPORT_INPUT_PARAMS_ITER_END=-1 (Default)**: The reporter continues to generate reports until the last iteration.
*   **FBGEMM_REPORT_INPUT_PARAMS_BUCKET=tlparse_reports**: The reports are saved in the `tlparse_reports` bucket.
*   **FBGEMM_REPORT_INPUT_PARAMS_PATH_PREFIX=tree/tests/**: The reports are stored with the path prefix `tree/tests/`. For Manifold, make sure all folders within the path exist.

**Note on Benchmark example**

Note that with the `--iters 2` option, the benchmark executes 6 forward calls in total: 3 calls (2 iterations plus 1 warmup) for the forward benchmark and another 3 calls (2 iterations plus 1 warmup) for the backward benchmark. Iterations are counted from 0, so the calls are iterations 0 through 5.
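
The selection of reported iterations in this example can be sketched as follows. `reported_iterations` is a hypothetical helper, not FBGEMM code. It assumes that reports are taken every `interval` iterations counted from `iter_start`, and that `iter_end` is inclusive, with `-1` meaning "until the last iteration". The actual reporter may differ in these details.

```python
from typing import List


def reported_iterations(
    num_calls: int, interval: int, iter_start: int = 0, iter_end: int = -1
) -> List[int]:
    """Return the 0-based forward-call indices that would be reported.

    Assumptions (not confirmed by the PR description): counting starts at
    iter_start, and iter_end is inclusive unless it is -1.
    """
    if interval < 1:
        raise ValueError("interval must be a positive number of iterations")
    # -1 means "no upper bound", so stop at the last executed call.
    last = num_calls - 1 if iter_end == -1 else min(iter_end, num_calls - 1)
    return list(range(iter_start, last + 1, interval))


# 6 forward calls, INTERVAL=2, ITER_START=3, ITER_END left at its default.
print(reported_iterations(num_calls=6, interval=2, iter_start=3))  # [3, 5]
```

Under these assumptions the result matches the `iter 3` and `iter 5` reports described above for 6 forward calls.
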


---
---
## Other changes included in this Diff:
  - Update the build dependencies of the tbe_data_config* files
  - Remove the `shutil` and `numpy.random` libraries, as they cause an incompatibility error.
  - Add a non-OSS test that writes the extracted config data JSON file to Manifold

Differential Revision: D76992907
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D76992907

gchalump added a commit to gchalump/FBGEMM that referenced this pull request Jul 10, 2025
Summary:
X-link: facebookresearch/FBGEMM#1516

Pull Request resolved: pytorch#4455

Re-land attempt of D75462895


gchalump added a commit to gchalump/FBGEMM that referenced this pull request Jul 10, 2025
@gchalump gchalump force-pushed the export-D76992907 branch 2 times, most recently from 55238b1 to 14cc1b8 Compare July 11, 2025 17:29
gchalump added a commit to gchalump/FBGEMM that referenced this pull request Jul 11, 2025
gchalump added a commit to gchalump/FBGEMM that referenced this pull request Jul 11, 2025

