This project demonstrates solving a 0/1 Knapsack Problem (a classic optimization problem) using a Genetic Algorithm (GA) implemented with the PyGAD library.
You have a set of items, each with a volume (m³) and price ($).
Your goal is to maximize the total value of selected items while keeping the total volume ≤ 3.0 m³.
The dataset is stored in items.csv
with columns:
Item
: Name of the itemVolume_m3
: Volume in cubic metersPrice
: Price in dollars
This is a version of the 0/1 knapsack problem, where each item can be either taken (1
) or not taken (0
).
Formally:
Maximize (objective):
Subject to:
Where
-
$v_i$ : value of item i in dollars -
$w_i$ : volume of item i in m³ -
$C$ : capacity (3.0 m³ in this case) -
$x_i$ : decision variable (0/1)
The knapsack problem is NP-hard:
- Exact algorithms (Dynamic Programming, Integer Linear Programming) exist and are efficient for small/medium sizes.
- But for large-scale, multi-objective, or non-linear constraints, these methods become less practical.
Genetic Algorithms (GAs) are metaheuristics inspired by natural selection. They don’t guarantee optimality but can find high-quality solutions in complex search spaces.
- Chromosome: a candidate solution (binary vector representing selected items).
- Gene: each item (0 = excluded, 1 = included).
- Population: a collection of chromosomes.
- Fitness function: evaluates solution quality (total value, with penalties for overweight).
- Selection: chooses parents based on fitness.
- Crossover: combines parents to form new solutions.
- Mutation: introduces randomness to maintain diversity.
- Termination: stop after a number of generations or convergence.
In this repo:
- Fitness = total item value + small bonus for fuller capacity.
- Penalty is applied if the solution exceeds the capacity.
- PyGAD handles selection, crossover, and mutation automatically.
- Load dataset (
items.csv
) with items, volumes, and prices. - Define fitness function:
- If total volume ≤ 3.0 → reward based on total value.
- If total volume > 3.0 → subtract large penalty.
- Add a small utilization bonus for near-full truck usage.
- Run GA with PyGAD:
gene_space=[0,1]
ensures binary encoding.- Runs for 800 generations with mutation and crossover.
- Output best solution: items selected, total volume, total value, and fitness.
Here are the key parameters used in this project’s GA configuration:
num_generations=800
→ number of evolutionary cycles to run. More generations allow better convergence but take longer.sol_per_pop=60
→ population size (number of candidate solutions per generation). Larger = more diversity, but more computation.num_genes=len(names)
→ chromosome length, equal to the number of items in the dataset.
gene_space=[0, 1]
→ restricts each gene to 0/1 (don’t take / take item).gene_type=int
→ ensures genes are stored as integers.allow_duplicate_genes=True
→ irrelevant for this binary problem, but required for consistency.
fitness_func=fitness_func
→ evaluates solutions based on total value, with penalties for exceeding capacity.
num_parents_mating=20
→ number of parents selected each generation.parent_selection_type="sss"
→ “stochastic universal sampling”, a fair selection proportional to fitness.keep_parents=4
→ elitism: best solutions are carried forward unchanged.
crossover_type="two_points"
→ two crossover points swap genes between parents.mutation_type="random"
→ randomly flips genes.mutation_percent_genes=12
→ percentage of genes mutated in each offspring (≈12%).
random_seed=SEED
→ ensures results are repeatable across runs.
uv add pygad pandas matplotlib
uv run main.py
Picked items:
- Refrigerator A vol=0.751 m^3 price=$999.90
- Notebook A vol=0.004 m^3 price=$2,499.90
...
Total volume: 2.798 m^3 (capacity 3.0 m^3)
Total value: $24,082.46
Fitness: 24,307.09
.
├── items.csv # Input dataset
├── main.py # PyGAD solution
└── README.md # Project documentation
- Exact Methods (OR):
- Integer Linear Programming (ILP) can solve this problem optimally in milliseconds for small item counts.
- Guarantees optimal solution.
- Heuristic Methods (GA):
- Useful when the problem grows large, has multiple constraints, or objectives are not linear.
- Provides near-optimal solution quickly, without complex modeling.
- Flexible to adapt to new rules (e.g., category constraints, synergies).
- Add an OR baseline using PuLP/OR-Tools for comparison.
- Explore multi-objective optimization (maximize value, minimize unused volume, diversify item categories).
- Experiment with different crossover/mutation strategies in GA.
- Scale dataset with 100s or 1000s of items to test GA’s robustness.