This repository contains example scripts and data to experiment with a Smart Meeting Agent pipeline for speech recognition and downstream text processing.
📁 About the Data
The data/ folder includes two .zip files. Each zip archive contains:
- .flac files – short audio segments (a few seconds each)
- .txt files – transcripts corresponding to each audio file
📚 Source
The data is sourced from LibriSpeech, a public-domain corpus of read speech derived from LibriVox audiobooks. LibriSpeech is designed to support research and development in automatic speech recognition (ASR).
For more information about the dataset and its subsets, please visit: https://www.openslr.org/12
Please note that the example code runs out of the box and produces a transcript for each audio file.
🚀 Prerequisites
To run this starter code, you will need:
- Python 3.11 or higher.
- On Puhti, the biopythontools module: `module load biopythontools/11.3.0_3.10.6_1.76`
🔧 Installation
- Clone this repository
- Create an environment and install the dependencies. Dependencies are listed in requirements.txt (for local use) and in env.yml (for use on Puhti).
For local usage, you can use either a Python virtual environment or a conda environment. The example below uses a Python virtual environment.
```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```
For usage on Puhti, it is recommended to use the Tykky container wrapper.
```bash
module purge
module load tykky
mkdir <install_dir>
conda-containerize new --prefix <install_dir> env.yml
export PATH="<install_dir>/bin:$PATH"
```
🛠️ What the Python Scripts Do
➡️ transcriber.py: Defines a Transcriber class that uses the Faster-Whisper ASR model to convert audio files (.flac, .wav, etc.) into plain English text. You can specify model size (e.g., "base", "small") and language.
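For orientation, here is a minimal sketch of what such a class might look like with the Faster-Whisper API; the class and argument names below are assumptions, so check transcriber.py for the actual interface.

```python
# Minimal sketch of a Faster-Whisper-based transcriber.
# The actual class in transcriber.py may use different names and arguments.
from faster_whisper import WhisperModel


class Transcriber:
    def __init__(self, model_size: str = "base", language: str = "en"):
        # Load the Faster-Whisper model; CPU + int8 keeps memory usage low.
        self.model = WhisperModel(model_size, device="cpu", compute_type="int8")
        self.language = language

    def transcribe(self, audio_path: str) -> str:
        # transcribe() returns a generator of segments plus metadata.
        segments, _info = self.model.transcribe(audio_path, language=self.language)
        return " ".join(segment.text.strip() for segment in segments)
```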
➡️ main.py: Runs the full pipeline. It takes an audio file path as input, uses the Transcriber to generate a transcript, and prints the result to the console. Example usage:
```bash
python3 main.py audio.flac
```
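A minimal sketch of such an entry point, reusing the hypothetical Transcriber interface shown above (the real main.py may handle arguments differently):

```python
# Minimal sketch of the pipeline entry point; the provided main.py may differ.
import sys

from transcriber import Transcriber


def main() -> None:
    if len(sys.argv) != 2:
        print("Usage: python3 main.py <audio_file>")
        sys.exit(1)

    # Model size and language are the tunable parameters mentioned above.
    transcriber = Transcriber(model_size="base", language="en")
    transcript = transcriber.transcribe(sys.argv[1])
    print(transcript)


if __name__ == "__main__":
    main()
```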
🧑🏫 What you have to do
Surprise us by being creative: build an AI agent that can listen to audio chats, convert them to text, and also summarize the meeting. For simplicity, starter code that converts audio into text has been provided for you. The dataset contains both audio and reference transcripts, so you can go further, for example by checking transcription accuracy or experimenting with summarization.
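As one possible starting point for the summarization step, you could chain a Hugging Face summarization pipeline after the transcript. Note that the transformers package and the model named below are not part of the provided starter code; this is only an illustrative sketch.

```python
# Illustrative sketch: summarize a transcript with a Hugging Face pipeline.
# transformers and the chosen model are assumptions, not provided dependencies.
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")


def summarize(transcript: str) -> str:
    # Long meetings may exceed the model's input limit; chunking the
    # transcript would be needed in practice.
    result = summarizer(transcript, max_length=130, min_length=30, do_sample=False)
    return result[0]["summary_text"]
```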
📝 License
This project is licensed under the MIT License - see the LICENSE file for details.
💼 Variables to tune
Feel free to change them, for example the ASR model size (e.g., "base", "small") and the language passed to the Transcriber.