- Prerequisites
- Installation
- Understanding the Data Structure
- Method 1: Using Parselmouth (Recommended)
- Normalisation Techniques
- DTW Analysis
- Results
- Advanced Features - Not yet implemented
- Troubleshooting
- Python 3.6 or higher
- Audio files in WAV format
- Metadata CSV file with file information
- Basic understanding of audio processing concepts
```bash
pip install praat-parselmouth pandas matplotlib numpy
```
Your directory structure should look like this:
```
project/
├── metadata.csv
├── pitch_contour_extractor.py
├── pitch_normalised_contour_extractor.py
├── audio_subset/
│   ├── student/
│   │   ├── 20220408145911.wav
│   │   ├── 20220407121244.wav
│   │   └── ...
│   └── teacher/
│       ├── 20220330100622.wav
│       ├── 20220329084512.wav
│       └── ...
└── output/
    ├── pitch_plots/
    ├── pitch_plots_normalised/
    ├── pitch_normalised_analysis_summary.csv
    └── pitch_data_extracted.csv
```
Your `metadata.csv` should have the following columns:
| s_file | t_file | s_bpm | t_bpm | s_scale | t_scale | ground_truth |
|--------|--------|-------|-------|---------|---------|--------------|
| 20220408145911.wav | 20220330100622.wav | 110 | 110 | G3 | A3 | `"[['0', '10.33', 'P'], ['3.919', '4.203', 'T']]"` |
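The `ground_truth` column appears to store a stringified list of `[start, end, label]` triples. Assuming that format (inferred from the sample row above), it can be parsed back into Python objects with `ast.literal_eval`:

```python
import ast

# Sample value from the ground_truth column (format assumed from the row above)
row_value = "[['0', '10.33', 'P'], ['3.919', '4.203', 'T']]"
segments = ast.literal_eval(row_value)  # -> list of [start, end, label] triples
for start, end, label in segments:
    print(float(start), float(end), label)
```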
First, define a helper that extracts the pitch contour from a single file:

```python
import parselmouth
import numpy as np

def extract_pitch_contour(audio_file_path, pitch_floor=75, pitch_ceiling=600, time_step=0.01):
    """
    Extract pitch contour from an audio file using Parselmouth/Praat.
    """
    try:
        # Load the sound file
        sound = parselmouth.Sound(audio_file_path)

        # Extract pitch using Praat's algorithm
        pitch = sound.to_pitch(time_step=time_step,
                               pitch_floor=pitch_floor,
                               pitch_ceiling=pitch_ceiling)

        # Get the pitch values (Hz) and frame times (s)
        pitch_values = pitch.selected_array['frequency']
        times = pitch.xs()
        return times, pitch_values
    except Exception as e:
        print(f"Error processing {audio_file_path}: {e}")
        return None, None
```
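For example, using one of the sample files from the directory layout above:

```python
times, pitch_values = extract_pitch_contour("audio_subset/student/20220408145911.wav")
if times is not None:
    print(f"{len(times)} frames, {np.count_nonzero(pitch_values)} voiced")
```

Unvoiced frames are returned as 0 Hz, which is why later steps filter or mask zeros.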
```python
import pandas as pd
import matplotlib.pyplot as plt
import os

def process_metadata_and_plot(csv_file_path):
    """
    Complete pipeline to process metadata CSV and create pitch plots.
    """
    # Read metadata
    df = pd.read_csv(csv_file_path)

    # Create output directory
    os.makedirs("pitch_plots", exist_ok=True)

    results = []
    for idx, row in df.iterrows():
        print(f"Processing pair {idx + 1}/{len(df)}")

        # Construct file paths
        student_file = f"audio_subset/student/{row['s_file']}"
        teacher_file = f"audio_subset/teacher/{row['t_file']}"

        # Extract pitch contours
        student_times, student_pitch = extract_pitch_contour(student_file)
        teacher_times, teacher_pitch = extract_pitch_contour(teacher_file)

        if student_times is not None and teacher_times is not None:
            # Create comparison plot
            create_comparison_plot(
                student_times, student_pitch, row['s_file'], row['s_bpm'], row['s_scale'],
                teacher_times, teacher_pitch, row['t_file'], row['t_bpm'], row['t_scale'],
                idx
            )
            # Store results for further analysis
            results.append({
                'index': idx,
                'student_file': row['s_file'],
                'teacher_file': row['t_file'],
                'student_times': student_times,
                'student_pitch': student_pitch,
                'teacher_times': teacher_times,
                'teacher_pitch': teacher_pitch
            })
    return results
```
```python
def create_comparison_plot(s_times, s_pitch, s_file, s_bpm, s_scale,
                           t_times, t_pitch, t_file, t_bpm, t_scale, index):
    """
    Create a comparison plot for student and teacher pitch contours.
    """
    fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 8))

    # Clean pitch data (replace unvoiced 0 Hz frames with NaN so gaps are not drawn)
    s_pitch_clean = s_pitch.copy()
    s_pitch_clean[s_pitch_clean == 0] = np.nan
    t_pitch_clean = t_pitch.copy()
    t_pitch_clean[t_pitch_clean == 0] = np.nan

    # Plot student pitch contour
    ax1.plot(s_times, s_pitch_clean, 'b-', linewidth=1.5, label='Student')
    ax1.set_title(f"Student: {s_file} (BPM: {s_bpm}, Scale: {s_scale})")
    ax1.set_ylabel('Pitch (Hz)')
    ax1.grid(True, alpha=0.3)
    ax1.legend()

    # Plot teacher pitch contour
    ax2.plot(t_times, t_pitch_clean, 'r-', linewidth=1.5, label='Teacher')
    ax2.set_title(f"Teacher: {t_file} (BPM: {t_bpm}, Scale: {t_scale})")
    ax2.set_ylabel('Pitch (Hz)')
    ax2.set_xlabel('Time (s)')
    ax2.grid(True, alpha=0.3)
    ax2.legend()

    plt.tight_layout()

    # Save plot
    plot_filename = f"pitch_plots/comparison_{index:03d}.png"
    plt.savefig(plot_filename, dpi=300, bbox_inches='tight')
    plt.close()
    print(f"Saved: {plot_filename}")
```
Raw F0 values differ systematically between speakers, which is why normalisation is needed before comparison:

- Student (female): Typically 180-300 Hz range
- Teacher (male): Typically 80-180 Hz range
- Anatomical differences: Vocal fold size creates systematic frequency differences
- Comparison challenge: Raw values cannot be directly compared across speakers
**Z-score normalization:**

```
z = (frequency - speaker_mean) / speaker_std_dev
```

- Result: Mean = 0, standard deviation = 1
- Use when: Statistical analysis, machine learning
- Advantage: Both speakers on the same scale
- Caution: Unstable for flat contours (low standard deviation)
**Semitone conversion:**

```
semitones = 12 * log2(frequency / reference_frequency)
```

- Reference options: Speaker mean, 100 Hz, or a musical note
- Result: Perceptually meaningful intervals
- Use when: Musical analysis, cross-linguistic studies
- Advantage: Reflects human pitch perception
**Mean-centered Hz:**

```
centered = frequency - speaker_mean
```

- Result: Relative pitch movement in Hz
- Use when: Preserving Hz units is important
- Advantage: Simple and interpretable
- Best for: Student-teacher prosody comparison
**Pitch range normalization (POR):**

```
POR = (frequency - speaker_min) / (speaker_max - speaker_min)
```

- Result: Values between 0 and 1
- Use when: Comparing pitch range utilization
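For concreteness, here is a minimal sketch implementing all four methods on an array of voiced pitch values. The helper name and signature are illustrative, not part of the tutorial scripts:

```python
import numpy as np

def normalize_pitch(f0_hz, method="semitones", reference=None):
    """Normalize voiced pitch values in Hz (assumes unvoiced zeros are already removed)."""
    f0 = np.asarray(f0_hz, dtype=float)
    if method == "zscore":
        # Mean 0, standard deviation 1; unstable if the contour is flat
        return (f0 - f0.mean()) / f0.std()
    if method == "semitones":
        # Perceptual intervals relative to a reference (speaker mean by default)
        ref = reference if reference is not None else f0.mean()
        return 12 * np.log2(f0 / ref)
    if method == "centered":
        # Relative pitch movement, still in Hz
        return f0 - f0.mean()
    if method == "por":
        # Pitch range utilization, scaled to [0, 1]
        return (f0 - f0.min()) / (f0.max() - f0.min())
    raise ValueError(f"Unknown method: {method}")
```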
Recommended workflow:

- Extract raw pitch first using Praat/Parselmouth
- Apply normalization before plotting/analysis
- Choose a method based on your research goals:
  - Prosody comparison: mean-centered Hz or semitones
  - Statistical modeling: z-score normalization
  - Cross-speaker analysis: semitones relative to speaker mean
```python
# 1. Extract raw pitch (unnormalized)
pitch = sound.to_pitch()
pitch_values = pitch.selected_array['frequency']

# 2. Remove unvoiced frames (zeros)
voiced_frames = pitch_values[pitch_values > 0]

# 3. Apply normalization
speaker_mean = np.mean(voiced_frames)
normalized_pitch = voiced_frames - speaker_mean  # Mean-centered

# 4. Plot normalized contours
```
Key points:

- Raw Praat/Parselmouth output is NOT normalized; you must normalize manually
- Choose the normalization method based on your research questions
- Document which method was used, for reproducibility
- Consider voicing detection before normalization
The DTW pipeline processes your metadata.csv and audio files to produce comprehensive analysis results without requiring intermediate CSV storage. Here's what the pipeline does:
- Reads `metadata.csv` with student-teacher audio file pairs
- Extracts pitch contours from WAV files using Parselmouth
- Applies semitone normalization relative to each speaker's mean frequency
- Cost Matrix Computation: Uses a log-scale distance function (see the sketch after this list)
- Optimal Path Finding: Dynamic programming over the cost matrix
- SARGAM Note Mapping: Maps frequencies to Sa, Re, Ga, Ma, Pa, Dha, Ni
- Duration Metrics: Calculates student performance duration only
- Cost Aggregation: Both average and maximum methods for each note pair
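Below is a minimal sketch of the cost-matrix and path-finding steps. It assumes the frame-level cost is the absolute difference of log-frequencies; the pipeline's exact distance function and step constraints may differ:

```python
import numpy as np

def dtw_log_cost(student_f0, teacher_f0):
    """Illustrative DTW: log-scale cost matrix plus dynamic-programming alignment.

    Both inputs are 1-D arrays of voiced pitch values in Hz.
    """
    s = np.log(np.asarray(student_f0, dtype=float))
    t = np.log(np.asarray(teacher_f0, dtype=float))
    n, m = len(s), len(t)

    # Cost matrix: absolute difference of log-frequencies
    cost = np.abs(s[:, None] - t[None, :])

    # Accumulated cost via dynamic programming
    acc = np.full((n, m), np.inf)
    acc[0, 0] = cost[0, 0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            prev = min(
                acc[i - 1, j] if i > 0 else np.inf,                # step in student
                acc[i, j - 1] if j > 0 else np.inf,                # step in teacher
                acc[i - 1, j - 1] if i > 0 and j > 0 else np.inf,  # step in both
            )
            acc[i, j] = cost[i, j] + prev

    # Backtrack from the end to recover the optimal alignment path
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = [(a, b) for a, b in ((i - 1, j - 1), (i - 1, j), (i, j - 1))
                      if a >= 0 and b >= 0]
        i, j = min(candidates, key=lambda ij: acc[ij])
        path.append((i, j))
    path.reverse()
    return cost, acc, path
```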
Each pair's results are stored in a dictionary of the following form:

```python
result = {
    'pair_id': 'pair_0',
    'cost_matrix': numpy_array,        # NxM matrix of log distances
    'accumulated_cost': numpy_array,   # DTW accumulated costs
    'optimal_path': [(i, j), ...],     # List of aligned frame indices
    'total_dtw_cost': float,           # Final DTW cost
    'path_length': int,                # Number of alignment points
    'student_duration': float,         # Duration in seconds
    'note_correspondences': {
        'student_notes': ['Sa', 'Re', ...],
        'teacher_notes': ['Sa', 'Re', ...],
        'note_pair_costs': {('Sa', 'Sa'): [0.02, 0.03, ...], ...}
    },
    'cost_aggregation': {
        'average': {('Sa', 'Sa'): 0.025, ('Sa', 'Re'): 0.067, ...},
        'max': {('Sa', 'Sa'): 0.045, ('Sa', 'Re'): 0.089, ...}
    }
}
```
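The SARGAM note names above come from mapping frequencies to scale degrees. A hypothetical mapping, assuming an equal-tempered shuddha scale relative to the tonic Sa (the pipeline's actual mapping may differ, e.g. in temperament or scale choice):

```python
import numpy as np

# Hypothetical semitone offsets from the tonic (Sa) for a shuddha scale
SARGAM_OFFSETS = {0: 'Sa', 2: 'Re', 4: 'Ga', 5: 'Ma', 7: 'Pa', 9: 'Dha', 11: 'Ni'}

def frequency_to_sargam(freq_hz, tonic_hz):
    """Map a frequency to the nearest SARGAM note, folded into one octave."""
    semitones = int(round(12 * np.log2(freq_hz / tonic_hz))) % 12
    # Snap to the nearest scale degree (circular distance within the octave)
    nearest = min(SARGAM_OFFSETS,
                  key=lambda off: min(abs(off - semitones), 12 - abs(off - semitones)))
    return SARGAM_OFFSETS[nearest]

# With a G3 tonic (about 196 Hz, matching the s_scale column), 220 Hz maps to 'Re'
print(frequency_to_sargam(220.0, 196.0))
```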
The pipeline generates `dtw_analysis_results.csv` with the following columns:
- `pair_id`, `student_file`, `teacher_file`
- `total_dtw_cost`, `path_length`, `student_duration`
- `avg_cost_Sa_to_Sa`, `avg_cost_Sa_to_Re`, etc.
- `max_cost_Sa_to_Sa`, `max_cost_Sa_to_Re`, etc.
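A sketch of how the per-pair result dictionaries shown above could be flattened into these columns (`all_results` is assumed to be a list of such dictionaries):

```python
import pandas as pd

rows = []
for result in all_results:
    row = {
        'pair_id': result['pair_id'],
        'total_dtw_cost': result['total_dtw_cost'],
        'path_length': result['path_length'],
        'student_duration': result['student_duration'],
    }
    # Expand each aggregated note-pair cost into its own column
    for method, prefix in (('average', 'avg'), ('max', 'max')):
        for (s_note, t_note), cost in result['cost_aggregation'][method].items():
            row[f'{prefix}_cost_{s_note}_to_{t_note}'] = cost
    rows.append(row)

pd.DataFrame(rows).to_csv('dtw_analysis_results.csv', index=False)
```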
Typical usage of the DTW pipeline:

```python
# Extract pitch data directly
pitch_data = process_metadata_csv('metadata.csv', normalization_method='semitones')

# Initialize DTW analyzer
dtw_analyzer = DTWAnalyzer(pitch_data)

# Run analysis on all pairs
results = dtw_analyzer.run_full_analysis()

# Visualize the first pair
first_pair = list(results.keys())[0]
dtw_analyzer.visualize_cost_matrix(results[first_pair])
```
(DTW Plots yet to be added)
For more robust voicing decisions, pitch can be combined with intensity. Note that pitch and intensity are computed on different frame grids, so the intensity track is resampled at the pitch frame times before thresholding:

```python
def advanced_pitch_analysis(audio_file_path):
    """
    Enhanced pitch analysis with voice activity detection.
    """
    sound = parselmouth.Sound(audio_file_path)

    # Extract pitch and intensity
    pitch = sound.to_pitch()
    intensity = sound.to_intensity()

    # Get arrays
    pitch_values = pitch.selected_array['frequency']
    intensity_values = intensity.values.flatten()
    times = pitch.xs()

    # Resample the intensity track at the pitch frame times so the
    # two sequences can be compared element-wise
    intensity_at_pitch = np.interp(times, intensity.xs(), intensity_values)

    # Simple voice activity detection based on an intensity threshold
    intensity_threshold = np.percentile(intensity_at_pitch[intensity_at_pitch > 0], 25)
    voiced_frames = (pitch_values > 0) & (intensity_at_pitch > intensity_threshold)

    return {
        'times': times,
        'pitch': pitch_values,
        'intensity': intensity_at_pitch,
        'voiced': voiced_frames
    }
```
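A possible usage, again with a sample file from the layout above:

```python
analysis = advanced_pitch_analysis("audio_subset/student/20220408145911.wav")
voiced_f0 = analysis['pitch'][analysis['voiced']]
print(f"Median voiced F0: {np.median(voiced_f0):.1f} Hz")
```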
```python
def calculate_pitch_statistics(results):
    """
    Calculate detailed statistics for pitch data.
    """
    stats_data = []
    for result in results:
        if result['student_pitch'] is not None:
            # Keep only voiced frames (non-zero F0) for each speaker
            student_voiced = result['student_pitch'][result['student_pitch'] > 0]
            teacher_voiced = result['teacher_pitch'][result['teacher_pitch'] > 0]

            stats = {
                'file_pair': f"{result['student_file']} vs {result['teacher_file']}",
                'student_mean_f0': np.mean(student_voiced) if len(student_voiced) > 0 else np.nan,
                'student_std_f0': np.std(student_voiced) if len(student_voiced) > 0 else np.nan,
                'student_range_f0': np.ptp(student_voiced) if len(student_voiced) > 0 else np.nan,
                'teacher_mean_f0': np.mean(teacher_voiced) if len(teacher_voiced) > 0 else np.nan,
                'teacher_std_f0': np.std(teacher_voiced) if len(teacher_voiced) > 0 else np.nan,
                'teacher_range_f0': np.ptp(teacher_voiced) if len(teacher_voiced) > 0 else np.nan,
                'voiced_percentage_student': len(student_voiced) / len(result['student_pitch']) * 100,
                'voiced_percentage_teacher': len(teacher_voiced) / len(result['teacher_pitch']) * 100
            }
            stats_data.append(stats)
    return pd.DataFrame(stats_data)
```
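The two helpers chain together; for instance (writing the summary CSV named in the directory layout above):

```python
results = process_metadata_and_plot("metadata.csv")
stats_df = calculate_pitch_statistics(results)
stats_df.to_csv("pitch_data_extracted.csv", index=False)
print(stats_df.describe())
```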
- **Parselmouth Installation Issues:**

  ```bash
  # Try updating pip first
  pip install --upgrade pip

  # Then install parselmouth
  pip install praat-parselmouth

  # If that fails, try with conda
  conda install -c conda-forge praat-parselmouth
  ```
- **Audio File Format Issues:**
  - Ensure audio files are in WAV format
  - Check the sampling rate (44.1 kHz recommended)
  - Verify file paths are correct
- **Memory Issues with Large Files:**

  ```python
  # Process files in chunks for large datasets
  def process_in_chunks(df, chunk_size=10):
      for i in range(0, len(df), chunk_size):
          chunk = df.iloc[i:i+chunk_size]
          # Process chunk
          yield chunk
  ```
- **Pitch Detection Parameter Tuning:**

  ```python
  # Adjust parameters based on speaker characteristics
  # For male speakers:
  pitch_floor_male = 50
  pitch_ceiling_male = 300

  # For female speakers:
  pitch_floor_female = 100
  pitch_ceiling_female = 500

  # For children:
  pitch_floor_children = 150
  pitch_ceiling_children = 800
  ```
- **Prepare your data structure** (see the directory layout above)

- **Install dependencies:**

  ```bash
  pip install praat-parselmouth pandas matplotlib numpy
  ```

- **Run the main script:**

  ```bash
  python pitch_contour_extractor.py
  ```

- **Check outputs:**
  - Individual plots in the `pitch_plots/` directory
  - Summary statistics in `pitch_data_extracted.csv`
  - Console output for processing progress
This tutorial provides a comprehensive approach to pitch contour extraction and analysis using modern Python tools integrated with Praat's powerful acoustic analysis capabilities.