SmolAnalyst is an AI agent that analyzes data files by generating and executing Python snippets. Built on Hugging Face's Smolagent, it primarily runs through a CLI tool but can be integrated into other environments.

SmolAnalyst lets you analyze data using natural language instructions. It uses Smolagent to generate Python code that processes your data files within a secure, containerized environment: all analysis tasks run inside containers, so you'll need either Podman or Docker installed on your system. SmolAnalyst handles the execution environment, file management, and security so you can focus on the analysis itself.
SmolAnalyst can be installed globally from PyPI using uv, a lightning-fast Python package manager:
```sh
# Install uv if you don't have it
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install SmolAnalyst globally
uv tool install smolanalyst
```
This will install SmolAnalyst in an isolated environment and make the `smolanalyst` command available globally in your terminal.
If you prefer using traditional pip:
```sh
pip install smolanalyst
```
SmolAnalyst follows a secure and efficient workflow for data analysis:

- **Configure**: Set up your LLM backend (only needed once):

  ```sh
  smolanalyst configure
  ```

- **Build**: Build the container image (required once after installing a new version):

  ```sh
  smolanalyst build
  ```

- **Run**: Execute an analysis task with your data files:

  ```sh
  smolanalyst run [files] -t "your analysis task"
  ```
When you run an analysis with SmolAnalyst, the following happens:
- Your source files are mounted as read-only in a container
- A temporary directory is mounted as writable for output files
- The container runs a Smolagent session with special instructions about your files
- The AI agent generates and executes Python code to analyze your data
- Output files are written to the temporary directory
- When the analysis completes, files from the temporary directory are copied to your current working directory
- If a file with the same name already exists, a timestamp is added to the filename to prevent overwriting
This containerized approach ensures that:
- Your original data remains untouched
- The AI agent can only access the files you explicitly provide
- The execution environment is isolated from your system for security
- All generated files are properly managed and accessible after analysis
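The copy-out step described above, with its timestamp de-duplication, could look roughly like the following sketch. This is a minimal illustration under assumed names (such as `copy_outputs` and the timestamp format), not SmolAnalyst's actual implementation:

```python
# Hypothetical sketch of the copy-out step: move results from the
# container's writable temp directory into the working directory,
# adding a timestamp suffix when a name collision would overwrite
# an existing file. Not SmolAnalyst's actual source code.
import shutil
import time
from pathlib import Path

def copy_outputs(tmp_dir: str, dest_dir: str = ".") -> None:
    for src in Path(tmp_dir).iterdir():
        if not src.is_file():
            continue
        dest = Path(dest_dir) / src.name
        if dest.exists():
            # e.g. box_shipped.xlsx -> box_shipped-20250101120000.xlsx
            stamp = time.strftime("%Y%m%d%H%M%S")
            dest = dest.with_name(f"{src.stem}-{stamp}{src.suffix}")
        shutil.copy2(src, dest)
```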
SmolAnalyst uses containerization to provide a secure, isolated environment for data analysis. It supports both Podman and Docker:
- Podman: Runs containers in rootless mode, providing enhanced security
- Docker: Widely available and familiar to many users
The CLI auto-detects which container engine is installed on your system. If both are available, it prefers Podman for its enhanced security features. You can also explicitly specify which engine to use with the `--engine` option:

```sh
smolanalyst build --engine docker
smolanalyst run data.csv --task "analyze this dataset" --engine docker
```
```sh
smolanalyst configure
```
This command will prompt you to provide the following:

- **model type**: The backend used to access the model. Use `hfapi` for the Hugging Face Inference API or `litellm` for LiteLLM.
- **model id**: The identifier of the model you want to use.
- **model api key**: Your API key for the selected provider.
- **model base**: The base URL of the model API (used for local deployments like Ollama).

After completing the prompts, a `config.json` file will be saved to your home directory with the configuration details.
```sh
smolanalyst build [--engine CONTAINER_ENGINE]
```

Options:

- `--engine` or `-e`: Container engine to use (`podman` or `docker`, auto-detected if not specified)
```sh
smolanalyst run [FILES] [--task TASK] [--engine CONTAINER_ENGINE]
```

Options:

- `--task` or `-t`: Description of the analysis task to perform
- `--engine` or `-e`: Container engine to use (`podman` or `docker`, auto-detected if not specified)
A high-performance model using the Hugging Face Inference API:
```json
{
  "type": "hfapi",
  "model_id": "Qwen/Qwen2.5-Coder-32B-Instruct",
  "api_key": "secret",
  "api_base": ""
}
```
A lightweight model with a generous free tier:
```json
{
  "type": "litellm",
  "model_id": "gemini/gemini-2.0-flash-lite",
  "api_key": "secret",
  "api_base": ""
}
```
Run a model locally using Ollama:
```json
{
  "type": "litellm",
  "model_id": "ollama/qwen2.5-coder:32b",
  "api_key": "",
  "api_base": "http://127.0.0.1:11434"
}
```
To start an analysis, run the following command:
```sh
smolanalyst run [files] -t "your task description"
```

- `[files]` is a list of zero or more file paths. These files will be explicitly referenced in SmolAnalyst's prompt.
- Use the `-t` option to specify a task directly. If omitted, SmolAnalyst will prompt you to describe the task.
Here's a practical example of using SmolAnalyst to analyze sales data:
```sh
smolanalyst run data/sales.xlsx -t "create a list of number of boxes shipped per salesman and save it in box_shipped.xlsx"
```
This command will:
- Mount the `data/sales.xlsx` file read-only in the container
- Ask the AI agent to analyze the data and create a list of boxes shipped per salesman
- Generate and save the results in a file called `box_shipped.xlsx`
- Copy the output file to your current directory
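Under the hood, the agent writes and runs ordinary pandas code. For a task like the one above, the generated snippet might look roughly like this (illustrative only; the column names `Salesman` and `Boxes Shipped` are assumptions about the spreadsheet's layout):

```python
# Illustrative example of the kind of code the agent might generate.
# The column names below are assumed; a real spreadsheet may differ.
import pandas as pd

sales = pd.read_excel("data/sales.xlsx")

# Sum the boxes shipped by each salesman.
boxes_per_salesman = (
    sales.groupby("Salesman")["Boxes Shipped"]
    .sum()
    .reset_index()
)

# Written to the container's writable directory, then copied out.
boxes_per_salesman.to_excel("box_shipped.xlsx", index=False)
```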
Execution Environment Restrictions:
- SmolAnalyst currently supports importing the `pandas` and `matplotlib` libraries
- It can write files using these libraries, but only within the container's writable directory
- Files are automatically copied to your current working directory after analysis
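For example, a generated snippet that produces a chart has to save it as a file rather than display it interactively. A plausible, purely illustrative snippet, assuming the `box_shipped.xlsx` output from the example above:

```python
# Illustrative matplotlib snippet; file and column names are assumed.
import matplotlib
matplotlib.use("Agg")  # headless backend: no display inside the container
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_excel("box_shipped.xlsx")
plt.figure(figsize=(8, 4))
plt.bar(df["Salesman"], df["Boxes Shipped"])
plt.ylabel("Boxes shipped")
plt.tight_layout()
# Saved to the writable directory, then copied to your working directory.
plt.savefig("boxes_per_salesman.png")
```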
SmolAnalyst uses containerization for security and isolation. You can choose between:

- Podman: Provides rootless containers for enhanced security and a reduced attack surface
- Docker: Widely available and familiar to many users

The container image includes a Python environment with essential data analysis libraries. When you run an analysis, your files are mounted into this container, and the AI agent generates and executes Python code to process your data.