
Commit 4195921

committed
edit introduction
1 parent 7a6cd73 commit 4195921

17 files changed: +31318 -30942 lines changed

Udeneev2025Surrogate.pdf

181 KB
Binary file not shown.

code/data_generator.ipynb

Lines changed: 11 additions & 10 deletions
Large diffs are not rendered by default.

code/dataset/arch_dicts.json

Lines changed: 1 addition & 1 deletion
Large diffs are not rendered by default.

code/results/model_10_results.json

Lines changed: 3418 additions & 3418 deletions
Large diffs are not rendered by default.

code/results/model_1_results.json

Lines changed: 2845 additions & 2845 deletions
Large diffs are not rendered by default.

code/results/model_2_results.json

Lines changed: 2838 additions & 2838 deletions
Large diffs are not rendered by default.

code/results/model_3_results.json

Lines changed: 3029 additions & 3029 deletions
Large diffs are not rendered by default.

code/results/model_4_results.json

Lines changed: 3349 additions & 3349 deletions
Large diffs are not rendered by default.

code/results/model_5_results.json

Lines changed: 3463 additions & 3463 deletions
Large diffs are not rendered by default.

code/results/model_6_results.json

Lines changed: 2903 additions & 2903 deletions
Large diffs are not rendered by default.

code/results/model_7_results.json

Lines changed: 2964 additions & 2964 deletions
Large diffs are not rendered by default.

code/results/model_8_results.json

Lines changed: 2753 additions & 2753 deletions
Large diffs are not rendered by default.

code/results/model_9_results.json

Lines changed: 3158 additions & 3158 deletions
Large diffs are not rendered by default.

code/train_models.ipynb

Lines changed: 234 additions & 204 deletions
Large diffs are not rendered by default.

code/train_models.py

Lines changed: 214 additions & 0 deletions
@@ -0,0 +1,214 @@
import os
import json
import numpy as np
import torch
import nni
from torch.utils.data import SubsetRandomSampler
from torchvision import transforms
from torchvision.datasets import CIFAR10
from nni.nas.evaluator.pytorch import DataLoader, Classification
from nni.nas.hub.pytorch import DARTS as DartsSpace
from nni.nas.space import model_context
from tqdm import tqdm
from IPython.display import clear_output

ARCHITECTURES_PATH = "dataset/arch_dicts.json"
MAX_EPOCHS = 50
LEARNING_RATE = 1e-3
BATCH_SIZE = 256
CIFAR_MEAN = [0.49139968, 0.48215827, 0.44653124]
CIFAR_STD = [0.24703233, 0.24348505, 0.26158768]


def load_arch_dicts(json_path):
    """
    Load architecture dictionaries from a JSON file.

    Args:
        json_path (str): Path to the JSON file containing the architecture dictionaries.

    Returns:
        dict: A dictionary with the architecture configurations.
    """
    with open(json_path, "r") as f:
        arch_dicts = json.load(f)
    return arch_dicts


def get_data_loaders(batch_size=512):
    """
    Build the training and validation data loaders.

    Args:
        batch_size (int): Batch size for both data loaders. Defaults to 512.

    Returns:
        tuple: A pair of DataLoader objects:
            - search_train_loader: loader for the training split.
            - search_valid_loader: loader for the validation split.
    """
    transform = transforms.Compose(
        [
            transforms.RandomCrop(32, padding=4),
            transforms.RandomHorizontalFlip(),
            transforms.ToTensor(),
            transforms.Normalize(CIFAR_MEAN, CIFAR_STD),
        ]
    )

    train_data = nni.trace(CIFAR10)(
        root="./data", train=True, download=True, transform=transform
    )
    # Split the CIFAR-10 training set in half: one half for training, one for validation.
    num_samples = len(train_data)
    indices = np.random.permutation(num_samples)
    split = num_samples // 2

    search_train_loader = DataLoader(
        train_data,
        batch_size=batch_size,
        num_workers=6,
        sampler=SubsetRandomSampler(indices[:split]),
    )

    search_valid_loader = DataLoader(
        train_data,
        batch_size=batch_size,
        num_workers=6,
        sampler=SubsetRandomSampler(indices[split:]),
    )

    return search_train_loader, search_valid_loader


def train_model(
    architecture, train_loader, valid_loader, max_epochs=10, learning_rate=1e-3
):
    """
    Train a model defined by the given architecture.

    Args:
        architecture (dict): Architecture configuration used to instantiate the model.
        train_loader (DataLoader): DataLoader with the training data.
        valid_loader (DataLoader): DataLoader with the validation data.
        max_epochs (int, optional): Maximum number of training epochs. Defaults to 10.
        learning_rate (float, optional): Learning rate. Defaults to 1e-3.

    Returns:
        model: The trained model.
    """
    # Instantiate a fixed architecture from the DARTS search space.
    with model_context(architecture):
        model = DartsSpace(width=16, num_cells=3, dataset="cifar")

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    if torch.cuda.device_count() > 1:
        model = torch.nn.DataParallel(model)  # Enable multi-GPU training

    model.to(device)

    evaluator = Classification(
        learning_rate=learning_rate,
        weight_decay=1e-4,
        train_dataloaders=train_loader,
        val_dataloaders=valid_loader,
        max_epochs=max_epochs,
        num_classes=10,
        export_onnx=False,  # Disable ONNX export for this experiment
        fast_dev_run=False,  # Should be False for full training
    )

    evaluator.fit(model)
    return model


def evaluate_and_save_results(
    models, architectures, batch_size=512, num_workers=6, folder_name="results"
):
    """
    Evaluate the models on the CIFAR-10 test set and save the results to JSON files.

    Args:
        models (list): List of trained models.
        architectures (list): List of model architectures.
        batch_size (int, optional): Batch size for the test data loader. Defaults to 512.
        num_workers (int, optional): Number of worker processes for the data loader. Defaults to 6.
        folder_name (str, optional): Folder in which the results are saved. Defaults to "results".

    Raises:
        ValueError: If the number of models does not match the number of architectures.

    Results:
        For each model a JSON file is created containing:
        - architecture: the model architecture.
        - test_predictions: the model predictions on the test set.
        - test_accuracy: the model accuracy on the test set.
    """
    if len(models) != len(architectures):
        raise ValueError("The number of models and architectures must match")

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    os.makedirs(folder_name, exist_ok=True)

    transform = transforms.Compose(
        [
            transforms.ToTensor(),
            transforms.Normalize((0.4914, 0.4822, 0.4465), (0.247, 0.243, 0.261)),
        ]
    )
    test_dataset = CIFAR10(
        root="./data", train=False, download=True, transform=transform
    )
    test_loader = DataLoader(
        test_dataset, batch_size=batch_size, shuffle=False, num_workers=num_workers
    )

    for i, (model, architecture) in enumerate(zip(models, architectures)):
        model.to(device)
        model.eval()

        test_correct = 0
        test_total = 0
        test_preds = []

        with torch.no_grad():
            for images, labels in test_loader:
                images, labels = images.to(device), labels.to(device)
                outputs = model(images)
                _, predicted = torch.max(outputs, 1)
                test_preds.extend(predicted.cpu().tolist())
                test_correct += (predicted == labels).sum().item()
                test_total += labels.size(0)

        test_accuracy = test_correct / test_total

        result = {
            "architecture": architecture,
            "test_predictions": test_preds,
            "test_accuracy": test_accuracy,
        }

        file_name = f"model_{i+1}_results.json"
        file_path = os.path.join(folder_name, file_name)

        with open(file_path, "w") as f:
            json.dump(result, f, indent=4)

        print(f"Results for model_{i + 1} saved to {file_path}")


if __name__ == "__main__":
    arch_dicts = load_arch_dicts(ARCHITECTURES_PATH)  # Load the architecture dictionaries
    search_train_loader, search_valid_loader = get_data_loaders(
        batch_size=BATCH_SIZE
    )  # Build the CIFAR-10 data loaders

    models = []
    architectures = []
    for architecture in tqdm(arch_dicts):
        model = train_model(  # Train a model for this architecture
            architecture,
            search_train_loader,
            search_valid_loader,
            max_epochs=MAX_EPOCHS,
            learning_rate=LEARNING_RATE,
        )
        models.append(model)
        architectures.append(architecture)
        clear_output(wait=True)

    evaluate_and_save_results(
        models, architectures, batch_size=BATCH_SIZE
    )  # Save each architecture, its CIFAR-10 test predictions, and its accuracy
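
The per-model JSON files written above can later be combined into an ensemble. What follows is a minimal, illustrative sketch rather than part of the committed script: it loads the saved model_*_results.json files and reports majority-vote ensemble accuracy, assuming the predictions were stored with shuffle=False so that they align with the CIFAR-10 test set order.

# Illustrative sketch: combine the saved per-model predictions into a majority-vote ensemble.
import json
import glob
import numpy as np
from torchvision.datasets import CIFAR10

result_files = sorted(glob.glob("results/model_*_results.json"))
results = [json.load(open(path)) for path in result_files]

# Stack predictions: one row per model, one column per CIFAR-10 test image.
preds = np.array([r["test_predictions"] for r in results])
labels = np.array(CIFAR10(root="./data", train=False, download=True).targets)

# Majority vote over the models for every test image.
vote = lambda column: np.bincount(column, minlength=10).argmax()
ensemble_pred = np.apply_along_axis(vote, axis=0, arr=preds)

print("Individual accuracies:", [round(r["test_accuracy"], 4) for r in results])
print("Majority-vote ensemble accuracy:", (ensemble_pred == labels).mean())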

paper/main.tex

Lines changed: 26 additions & 7 deletions
@@ -94,8 +94,9 @@
 %\dedication{...}
 
 \abstract{
-The automated search for optimal neural network architectures is a challenging computational problem, and Neural Ensemble Search (NES) is even more complex. In this work, we propose a surrogate-based approach to estimate ensemble diversity. Neural architectures are represented as graphs, and their predictions on a dataset serve as training data for the surrogate function. Using this method, we develop an efficient NES framework that enables the selection of diverse and high-performing architectures. The resulting ensemble achieves superior predictive accuracy on CIFAR-10 compared to other one-shot NES methods, demonstrating the effectiveness of our approach.
+The automated search for optimal neural network architectures (NAS) is a challenging computational problem, and Neural Ensemble Search (NES) is even more complex. In this work, we propose a surrogate-based approach for ensemble creation. Neural architectures are represented as graphs, and their predictions on a dataset serve as training data for the surrogate function. Using this function, we develop an efficient NES framework that enables the selection of diverse and high-performing architectures. The resulting ensemble achieves superior predictive accuracy on CIFAR-10 compared to other one-shot NES methods, demonstrating the effectiveness of our approach.
 }
+
 \keywords{NES, GCN, triplet loss, surrogate function.}
 
 
@@ -104,19 +105,37 @@
 
 \section{Introduction}
 
-Neural network ensembles often demonstrate better generalization ability compared to single models, especially in classification and regression tasks \cite{E_Ren_2016, Hansen1990}. However, the key factor for a successful ensemble is not only the number of models but also their architectural diversity and ability to complement each other. Selecting an optimal architecture for even a single model is a challenging task, particularly when considering data-specific constraints and computational limitations \cite{B_Swarup_2023}.
+Neural network ensembles often demonstrate better accuracy than single models, especially in classification and regression tasks \cite{E_Ren_2016, Hansen1990}. This fact gives rise to the problem of constructing an efficient ensemble of models, known as Neural Ensemble Search (NES) \cite{Zaidi2021}. NES, in turn, relies on Neural Architecture Search (NAS) methods, which are extensively studied and applied to the search for individual neural network architectures and include evolutionary algorithms \cite{real2017large, real2019regularized}, reinforcement learning \cite{Zoph2017, xie2018snas, Liu2023}, and Bayesian optimization \cite{jin2019auto, kandasamy2018neural}. Selecting an optimal architecture for even a single model is a challenging task, particularly when considering data-specific constraints and computational limitations \cite{B_Swarup_2023}.
 
-One approach to automating ensemble construction is Neural Ensemble Search (NES) \cite{Zaidi2021}, which aims to find the optimal combination of neural networks. NES, in turn, relies on Neural Architecture Search (NAS) methods, which are extensively studied and applied to search for individual neural network architectures \cite{Zoph2017, Baeck2018, Liu2023}. Unlike traditional NAS, which focuses on finding a single model, NES is designed to efficiently combine multiple networks into an ensemble.
+The simplest approach to ensemble construction is DeepEns \cite{lakshminarayanan2017simple}, implemented through DARTS \cite{Liu2018}: several architectures are found by random search and then combined into an ensemble. Despite its simplicity in implementation and hyperparameter tuning, this method is computationally expensive. More sophisticated techniques, designed to combine multiple networks into an ensemble efficiently, are presented in several recent works \cite{pmlr-v180-shu22a, Zaidi2021, O_Chen_2021}.
 
-Modern NAS methods widely use surrogate functions to estimate architecture quality without requiring full model training \cite{Lu2022, Lu2020}. These functions significantly reduce computational costs, which is particularly important when searching for an optimal ensemble. For example, in \cite{Lu2022}, evolutionary algorithms were proposed in combination with surrogate models.
+Our research also adapts ideas from NAS to NES, specifically the use of a surrogate function. Many modern NAS methods use surrogate functions to estimate architecture quality without requiring full model training \cite{Lu2022, Lu2020, Calisto2021}. These functions significantly reduce computational costs, expanding the applicability of such methods. For example, in \cite{Lu2022}, evolutionary algorithms were combined with surrogate models for real-time semantic segmentation. In \cite{Calisto2021}, a Surrogate-assisted Multiobjective Evolutionary-based Algorithm (SaMEA) is used for 3D medical image segmentation.
 
-In this work, we propose a method for constructing neural network ensembles using a surrogate function that accounts for both model classification accuracy and architectural diversity. Diversity is crucial because ensembles consisting of similar models often fail to provide a significant performance gain. To achieve this, we encode architectures and their predictions on the CIFAR-10 dataset into a latent space \cite{S_Xue_2024}. Based on the encoded dataset, we train a Graph Convolutional Network (GCN) \cite{Kipf2017}. We claim that ensembles constructed in this manner achieve higher accuracy compared to one-shot models, such as DARTS \cite{Liu2018}, or single models.
+In this work, we propose a method for constructing neural network ensembles using a surrogate function that accounts for both model classification accuracy and architectural diversity. Diversity is crucial because ensembles consisting of similar models often fail to provide a significant performance gain. The surrogate function encodes an architecture into a latent space \cite{S_Xue_2024} that reflects both the diversity and the predictive ability of the architectures. Since a neural network architecture is represented as a graph, using a Graph Neural Network (GNN) \cite{Kipf2017} as the surrogate function \cite{wen2020neural} is natural. To train it to predict model diversity, we use triplet loss \cite{schroff2015facenet}, similar to \cite{S_Xue_2024}. We validate this approach on CIFAR-10, demonstrating the effectiveness of the surrogate function for predicting diversity and constructing ensembles. We claim that ensembles constructed in this manner achieve state-of-the-art accuracy among one-shot NES algorithms, such as DeepEns \cite{lakshminarayanan2017simple}.
 
 Main Contributions:
 
-1) We adapt surrogate functions for ensemble construction, taking into account both predictive performance and architectural diversity.
+1) We propose a method for encoding the DARTS \cite{Liu2018} search space into a representation suitable for training a Graph Neural Network (GNN), where graph nodes correspond to operations within the network.
+
+2) We propose a method for training the surrogate function to predict the diversity of architectures.
+
+3) We adapt surrogate functions for ensemble construction, taking into account both predictive performance and architectural diversity.
+
+
+\section{Problem statement}
+
+\subsection{Neural Architecture Search}
+
+Let $\mathcal{V} = \{1, \dots, N\}$ be the set of vertices, where $N$ is the number of vertices, and let $\mathcal{E} = \{(i, j) \in \mathcal{V} \times \mathcal{V} \mid i < j \}$ be the set of edges connecting them. Furthermore, let $\mathcal{O}$ denote the set of possible operations between vertices (e.g., pooling, convolutions, etc.). For
+each edge there is an operation $o \in \mathcal{O}$ that transmits information from one node
+to another. The neural architecture search (NAS) problem can then be formulated as finding an operation $o^{(i, j)} \in \mathcal{O}$ for each edge $(i, j)$.
+
+Consider $\alpha \in \mathcal{A}$ as a parameter vector representing the operations assigned to the edges. Then the NAS problem can be formulated as:
+
+\begin{equation} \begin{aligned} & \min_{\alpha \in \mathcal{A}} \mathcal{L}_{val}(\omega^*_{\alpha}, \alpha) \\ & \text{s.t.} \quad \omega^*_{\alpha} = \arg \min_{\omega \in \mathcal{W}} \mathcal{L}_{train}(\omega, \alpha) \end{aligned} \label{eq:nas_problem} \end{equation}
+
+where $\mathcal{W}$ is the set of all possible weights associated with the operations on all potential edges of the architecture. The main challenge is the vast architecture search space (e.g., in DARTS \cite{Liu2018}, it contains approximately $10^{25}$ architectures).
 
-2) We propose a method for encoding the DARTS search space into a representation suitable for training a Graph Convolutional Network (GCN), where graph nodes correspond to operations within the network.
 
 \bibliographystyle{unsrtnat}
 
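The introduction above describes training a GNN surrogate with a triplet loss so that architecturally and behaviorally similar models sit close together in a latent space. The following is a minimal sketch of that idea, not the committed implementation: the graph encoding (a normalized adjacency matrix A_hat and one-hot operation features X), the layer sizes, and the triplet selection strategy are all illustrative assumptions.

# Illustrative sketch only: a toy GCN surrogate trained with triplet loss.
# A_hat (normalized adjacency) and X (one-hot operation features) are assumed to come
# from an encoding of the architecture graph; sizes and mining strategy are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GCNEncoder(nn.Module):
    def __init__(self, in_dim, hidden_dim=64, embed_dim=32):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, embed_dim)

    def forward(self, a_hat, x):
        # Two graph-convolution steps: propagate node features along A_hat.
        h = F.relu(self.fc1(a_hat @ x))
        h = self.fc2(a_hat @ h)
        # Mean-pool node embeddings into a single architecture embedding.
        return h.mean(dim=0)


encoder = GCNEncoder(in_dim=8)  # e.g., 8 candidate operations, one-hot per node (assumed)
criterion = nn.TripletMarginLoss(margin=1.0)
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-3)


def training_step(anchor, positive, negative):
    # Each argument is an (a_hat, x) pair for one architecture graph.
    # Positives are architectures with similar test predictions, negatives dissimilar ones.
    optimizer.zero_grad()
    z_a, z_p, z_n = (encoder(a, x) for a, x in (anchor, positive, negative))
    loss = criterion(z_a.unsqueeze(0), z_p.unsqueeze(0), z_n.unsqueeze(0))
    loss.backward()
    optimizer.step()
    return loss.item()

Under this setup, architectures whose stored test_predictions disagree end up far apart in the embedding space, which is the diversity signal an ensemble selection step can exploit.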