
Performance Investigation

Sherlock edited this page Mar 12, 2021 · 21 revisions

Profiling Tools

  • nvprof

    • Try running with and without --print-gpu-summary
    • Try --profile-child-processes
    • Action: profile a training run
  • Visual Profiler UI

    • Use ruler to measure a time span
    • Identify the top hitters in kernels
    • Compare two sets of profiling results to identify the performance gap
    • Can you identify the start/end of a train_step from the timeline view?
  • torch profiler

  • Linux perf

Subtopic 3

CUDA Kernels Optimization

ExecutionProvider

What is an execution provider? What problems does it solve? (execution_provider.h)

CPU and CUDA are the most commonly used EPs in training (cpu/cuda_execution_provider.cc)

How do you register an execution provider with a session, or through the ORTModule interface?
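As a starting point for the session half of this question, here is a minimal sketch using the onnxruntime Python package, where providers are passed to InferenceSession in priority order. The helper names select_providers and create_session and the model path argument are hypothetical. The sketch does not cover ORTModule.

```python
from typing import List, Sequence


def select_providers(preferred: Sequence[str], available: Sequence[str]) -> List[str]:
    """Keep the preferred providers that are available, preserving priority order."""
    return [p for p in preferred if p in available]


def create_session(model_path: str):
    """Create an InferenceSession that prefers CUDA and falls back to CPU.

    onnxruntime is imported lazily so select_providers can be used without it.
    """
    import onnxruntime as ort

    providers = select_providers(
        ["CUDAExecutionProvider", "CPUExecutionProvider"],
        ort.get_available_providers(),
    )
    # Providers earlier in the list get the first chance to claim graph nodes.
    return ort.InferenceSession(model_path, providers=providers)
```

Calling session.get_providers() on the returned object shows which providers were actually registered, which is a quick check that a CUDA build is being used.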

What does ExecutionProvider::GetCapability() do?
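One way to think about this question: each provider is asked which parts of the graph it can run, and nodes are then assigned to providers in priority order. The toy model below illustrates only that idea. ToyProvider, get_capability, and partition are hypothetical Python stand-ins, not the C++ interface in execution_provider.h, which works on graph views and returns subgraphs rather than node indices.

```python
from typing import Dict, List, Set


class ToyProvider:
    """Hypothetical stand-in for an execution provider (not the ORT C++ class)."""

    def __init__(self, name: str, supported_ops: Set[str]):
        self.name = name
        self.supported_ops = supported_ops

    def get_capability(self, nodes: List[str]) -> List[int]:
        """Return the indices of the nodes this provider claims it can run."""
        return [i for i, op in enumerate(nodes) if op in self.supported_ops]


def partition(nodes: List[str], providers: List[ToyProvider]) -> Dict[int, str]:
    """Assign each node to the first provider (in priority order) that claims it."""
    assignment: Dict[int, str] = {}
    for provider in providers:
        for i in provider.get_capability(nodes):
            # A node already claimed by a higher-priority provider stays there.
            assignment.setdefault(i, provider.name)
    return assignment


# Hypothetical graph and operator coverage, chosen only for illustration.
nodes = ["MatMul", "Gelu", "NonZero", "Add"]
cuda = ToyProvider("CUDA", {"MatMul", "Gelu", "Add"})
cpu = ToyProvider("CPU", {"MatMul", "Gelu", "NonZero", "Add"})

if __name__ == "__main__":
    print(partition(nodes, [cuda, cpu]))
```

In this toy run the one node the higher-priority provider does not claim falls through to the CPU provider, which mirrors why a fallback provider is listed last.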
