SmolAnalyst is an AI agent that analyzes data files by generating and executing Python snippets. Built on Hugging Face's Smolagent, it primarily runs through a CLI tool but can be integrated into other environments.

SmolAnalyst lets you analyze data using natural language instructions. It uses Smolagent to generate Python code that processes your data files within a secure, containerized environment: all analysis tasks run inside containers, so you'll need either Podman or Docker installed on your system. SmolAnalyst handles the execution environment, file management, and security so you can focus on the analysis itself.
SmolAnalyst can be installed globally from PyPI using uv, a lightning-fast Python package manager:
```sh
# Install uv if you don't have it
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install SmolAnalyst globally
uv tool install smolanalyst
```
This will install SmolAnalyst in an isolated environment and make the `smolanalyst` command available globally in your terminal.
If you prefer using traditional pip:
```sh
pip install smolanalyst
```
SmolAnalyst follows a secure and efficient workflow for data analysis:

- **Configure**: Set up your LLM backend (only needed once):

  ```sh
  smolanalyst configure
  ```

- **Build**: Build the container image (required once after installing a new version):

  ```sh
  smolanalyst build
  ```

- **Run**: Execute an analysis task with your data files:

  ```sh
  smolanalyst run [files] -t "your analysis task"
  ```
When you run an analysis with SmolAnalyst, the following happens:
- Your source files are mounted as read-only in a container
- A temporary directory is mounted as writable for output files
- The container runs a Smolagent session with special instructions about your files
- The AI agent generates and executes Python code to analyze your data
- Output files are written to the temporary directory
- When the analysis completes, files from the temporary directory are copied to your current working directory
- If a file with the same name already exists, a timestamp is added to the filename to prevent overwriting
This containerized approach ensures that:
- Your original data remains untouched
- The AI agent can only access the files you explicitly provide
- The execution environment is isolated from your system for security
- All generated files are properly managed and accessible after analysis
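The copy-out step described above, with its timestamp de-duplication, could look roughly like the following sketch. This is a minimal illustration under assumed names (such as `copy_outputs` and the timestamp format), not SmolAnalyst's actual implementation:

```python
# Hypothetical sketch of the copy-out step: move results from the
# container's writable temp directory into the working directory,
# adding a timestamp suffix when a name collision would overwrite
# an existing file. Not SmolAnalyst's actual source code.
import shutil
import time
from pathlib import Path

def copy_outputs(tmp_dir: str, dest_dir: str = ".") -> None:
    for src in Path(tmp_dir).iterdir():
        if not src.is_file():
            continue
        dest = Path(dest_dir) / src.name
        if dest.exists():
            # e.g. box_shipped.xlsx -> box_shipped-20250101120000.xlsx
            stamp = time.strftime("%Y%m%d%H%M%S")
            dest = dest.with_name(f"{src.stem}-{stamp}{src.suffix}")
        shutil.copy2(src, dest)
```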
SmolAnalyst uses containerization to provide a secure, isolated environment for data analysis. It supports both Podman and Docker:
- Podman: Runs containers in rootless mode, providing enhanced security
- Docker: Widely available and familiar to many users
The CLI auto-detects which container engine is installed on your system. If both are available, it prefers Podman for its enhanced security features. You can also explicitly specify which engine to use with the `--engine` option:

```sh
smolanalyst build --engine docker
smolanalyst run data.csv --task "analyze this dataset" --engine docker
```
```sh
smolanalyst configure
```
This command will prompt you to provide the following:

- **model type**: The backend used to access the model. Use `hfapi` for the Hugging Face Inference API or `litellm` for LiteLLM.
- **model id**: The identifier of the model you want to use.
- **model api key**: Your API key for the selected provider.
- **model base**: The base URL of the model API (used for local deployments like Ollama).

After completing the prompts, a `config.json` file will be saved to your home directory with the configuration details.
```sh
smolanalyst build [--engine CONTAINER_ENGINE]
```

Options:

- `--engine` or `-e`: Container engine to use (`podman` or `docker`, auto-detected if not specified)
```sh
smolanalyst run [FILES] [--task TASK] [--engine CONTAINER_ENGINE]
```

Options:

- `--task` or `-t`: Description of the analysis task to perform
- `--engine` or `-e`: Container engine to use (`podman` or `docker`, auto-detected if not specified)
A high-performance model using the Hugging Face Inference API:
```json
{
  "type": "hfapi",
  "model_id": "Qwen/Qwen2.5-Coder-32B-Instruct",
  "api_key": "secret",
  "api_base": ""
}
```
A lightweight model with a generous free tier:
```json
{
  "type": "litellm",
  "model_id": "gemini/gemini-2.0-flash-lite",
  "api_key": "secret",
  "api_base": ""
}
```
Run a model locally using Ollama:
```json
{
  "type": "litellm",
  "model_id": "ollama/qwen2.5-coder:32b",
  "api_key": "",
  "api_base": "http://127.0.0.1:11434"
}
```
To start an analysis, run the following command:
```sh
smolanalyst run [files] -t "your task description"
```

- `[files]` is a list of zero or more file paths. These files will be explicitly referenced in SmolAnalyst's prompt.
- Use the `-t` option to specify a task directly. If omitted, SmolAnalyst will prompt you to describe the task.
Here's a practical example of using SmolAnalyst to analyze sales data:
```sh
smolanalyst run data/sales.xlsx -t "create a list of number of boxes shipped per salesman and save it in box_shipped.xlsx"
```
This command will:
- Mount the `data/sales.xlsx` file read-only in the container
- Ask the AI agent to analyze the data and create a list of boxes shipped per salesman
- Generate and save the results in a file called `box_shipped.xlsx`
- Copy the output file to your current directory
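Under the hood, the agent writes and runs ordinary pandas code. For a task like the one above, the generated snippet might look roughly like this (illustrative only; the column names `Salesman` and `Boxes Shipped` are assumptions about the spreadsheet's layout):

```python
# Illustrative example of the kind of code the agent might generate.
# The column names below are assumed; a real spreadsheet may differ.
import pandas as pd

sales = pd.read_excel("data/sales.xlsx")

# Sum the boxes shipped by each salesman.
boxes_per_salesman = (
    sales.groupby("Salesman")["Boxes Shipped"]
    .sum()
    .reset_index()
)

# Written to the container's writable directory, then copied out.
boxes_per_salesman.to_excel("box_shipped.xlsx", index=False)
```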
Execution Environment Restrictions:
- SmolAnalyst currently supports importing the `pandas` and `matplotlib` libraries
- It can write files using these libraries, but only within the container's writable directory
- Files are automatically copied to your current working directory after analysis
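For example, a generated snippet that produces a chart has to save it as a file rather than display it interactively. A plausible, purely illustrative snippet, assuming the `box_shipped.xlsx` output from the example above:

```python
# Illustrative matplotlib snippet; file and column names are assumed.
import matplotlib
matplotlib.use("Agg")  # headless backend: no display inside the container
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_excel("box_shipped.xlsx")
plt.figure(figsize=(8, 4))
plt.bar(df["Salesman"], df["Boxes Shipped"])
plt.ylabel("Boxes shipped")
plt.tight_layout()
# Saved to the writable directory, then copied to your working directory.
plt.savefig("boxes_per_salesman.png")
```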
SmolAnalyst uses containerization for security and isolation. You can choose between:

- Podman: Provides rootless containers for enhanced security and a reduced attack surface
- Docker: Widely available and familiar to many users

The container image includes a Python environment with essential data analysis libraries. When you run an analysis, your files are mounted into this container, and the AI agent generates and executes Python code to process your data.