This repo is the testbed for the paper GTV: Generating Tabular Data via Vertical Federated Learning, which was accepted at the 55th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN). The demo shown here runs GTV on the Loan dataset. The code contains three parts: D_2_0_G_0_2, D_2_0_G_2_0 and Evaluation. D_2_0_G_0_2 and D_2_0_G_2_0 are two configurations of GTV; Evaluation contains the code for evaluating the quality of the synthetic data. We use D_2_0_G_0_2 as the example for the training process; the same process applies to D_2_0_G_2_0.
Inside the folder D_2_0_G_0_2 there are three folders: Server, Client1 and Client2. The following example shows how to run GTV with one server and two clients.
Since we use PyTorch RPC as our communication method, we need to provide some parameters to set up GTV. For example, to run the demo on the Loan dataset with two clients, first enter the folder Server and run the following command:
python3 server.py -ip x.x.x.x -rank 0 -epochs 300 -batch_size 500 -n_critic 5 -world_size 3 -dataset Loan
server.py contains the server-side code of GTV; there is no training data on the server side. The command takes several parameters. ip is the IP address of the server. rank indicates the rank of the current node; it must be set to 0 for the server. epochs is the number of training epochs, but be careful: an epoch here corresponds to a round in the paper. world_size is the number of clients plus one for the server; for instance, setting world_size to 3 means we have two clients. dataset is the name of the training dataset used on the client side.
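The flags described above can be pictured with a small argparse sketch. This is not the actual server.py; it is a hypothetical parser that only mirrors the flag names from the command, and the default values and the rank range check are assumptions added for illustration. The address 10.0.0.1 is a placeholder.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags used in the command above."""
    parser = argparse.ArgumentParser(description="GTV launcher flags (illustrative sketch)")
    parser.add_argument("-ip", type=str, required=True, help="IP address of the server")
    parser.add_argument("-rank", type=int, required=True, help="0 for the server, 1..N for clients")
    # Defaults below are assumptions copied from the example command.
    parser.add_argument("-epochs", type=int, default=300, help="training epochs (rounds in the paper)")
    parser.add_argument("-batch_size", type=int, default=500)
    parser.add_argument("-n_critic", type=int, default=5)
    parser.add_argument("-world_size", type=int, required=True, help="number of clients plus one server")
    parser.add_argument("-dataset", type=str, default="Loan", help="name of the training dataset")
    return parser


def parse_and_check(argv):
    """Parse the flags and reject a rank outside [0, world_size)."""
    args = build_parser().parse_args(argv)
    if not 0 <= args.rank < args.world_size:
        raise ValueError("rank must lie in [0, world_size)")
    return args


if __name__ == "__main__":
    demo = "-ip 10.0.0.1 -rank 0 -epochs 300 -batch_size 500 -n_critic 5 -world_size 3 -dataset Loan"
    args = parse_and_check(demo.split())
    print("clients expected:", args.world_size - 1)
```

The range check reflects the rule that the server takes rank 0 and the clients take the remaining ranks below world_size.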
The Client1 and Client2 folders each contain the script client_loan.py. For a client to join GTV, run the following under the Client1 folder:
python3 client_loan.py -ip x.x.x.x -rank 1 -epochs 300 -batch_size 500 -n_critic 5 -world_size 3
And under the Client2 folder, run:
python3 client_loan.py -ip x.x.x.x -rank 2 -epochs 300 -batch_size 500 -n_critic 5 -world_size 3
The only difference between the two client commands is the -rank value. Each node in GTV must have its own unique rank. For N clients, the ranks should run from 1 to N without repetition. The -ip value is the server's IP address.
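The rank and world_size rules above can be sketched as a small helper that prints one command per node. The function name build_commands is hypothetical and not part of the repo. It assumes every client runs client_loan.py, which the repo only shows for Client1 and Client2, so the dataset argument changes the server flag only.

```python
def build_commands(server_ip, n_clients, epochs=300, batch_size=500, n_critic=5, dataset="Loan"):
    """Return {rank: command} for one server (rank 0) and n_clients clients (ranks 1..N)."""
    if n_clients < 1:
        raise ValueError("at least one client is required")
    world_size = n_clients + 1  # clients plus one server

    def flags(rank):
        # Same flag order as the commands shown above.
        return (f"-ip {server_ip} -rank {rank} -epochs {epochs} -batch_size {batch_size} "
                f"-n_critic {n_critic} -world_size {world_size}")

    commands = {0: f"python3 server.py {flags(0)} -dataset {dataset}"}
    for rank in range(1, n_clients + 1):
        # Assumption: each client folder contains client_loan.py.
        commands[rank] = f"python3 client_loan.py {flags(rank)}"
    return commands


if __name__ == "__main__":
    for rank, command in build_commands("x.x.x.x", 2).items():
        print(rank, command)
```

Generating the commands from a single client count keeps world_size and the ranks consistent across all nodes.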
- We tested our code under Ubuntu 20.04. In /etc/hosts, change the IP 127.0.1.1 to the machine's current IP in the subnet. This needs to be done on both the server and the client computers.
- The server and the clients also need to allow all connections from each other, using the following commands:
Server : sudo ufw allow from [Client IP]
Client : sudo ufw allow from [Server IP]
Be careful: even though our code specifies port 7788 for RPC, the PyTorch RPC library opens more ports in the background. Allowing only port 7788 is not enough for the server and the clients to connect.
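Whether the /etc/hosts change described above is still pending on a machine can be checked with a short script. This is only a sketch: the function name is hypothetical, and it only reports active 127.0.1.1 entries without editing the file.

```python
def loopback_hostname_entries(hosts_text):
    """Return (line_number, hostnames) for every active 127.0.1.1 entry."""
    hits = []
    for number, raw in enumerate(hosts_text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        if fields[0] == "127.0.1.1":
            hits.append((number, fields[1:]))
    return hits


if __name__ == "__main__":
    try:
        with open("/etc/hosts", encoding="utf-8") as handle:
            entries = loopback_hostname_entries(handle.read())
    except OSError as error:
        print("could not read /etc/hosts:", error)
    else:
        for number, names in entries:
            print(f"line {number}: 127.0.1.1 still maps {names}; replace it with the subnet IP")
        if not entries:
            print("no 127.0.1.1 entry found")
```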
On the Linux system where we ran our experiments, one possible error is:
RuntimeError: ECONNREFUSED: connection refused
This is because we use Gloo as the backend. By default, the Gloo backend tries to find the right network interface to use, but it sometimes picks the wrong one. In that case, we need to override it with:
export GLOO_SOCKET_IFNAME=eth0
Here eth0 is the network interface name; you may need to change it. Use ifconfig in the terminal to check the name.
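As an alternative to ifconfig, the interface names can also be listed from Python with the standard library. The sketch below uses hypothetical helper names and sets GLOO_SOCKET_IFNAME for the current process only. It assumes the variable is set before the RPC backend is initialized, and that socket.if_nameindex() is available, which is the case on Linux.

```python
import os
import socket


def list_interfaces():
    """Return the network interface names known to the operating system."""
    return [name for _index, name in socket.if_nameindex()]


def select_gloo_interface(name, available=None):
    """Set GLOO_SOCKET_IFNAME for this process after checking the name exists."""
    available = list_interfaces() if available is None else available
    if name not in available:
        raise ValueError(f"unknown interface {name!r}; available: {available}")
    # Same effect as `export GLOO_SOCKET_IFNAME=...`, but only for this process.
    os.environ["GLOO_SOCKET_IFNAME"] = name
    return name


if __name__ == "__main__":
    print("interfaces:", list_interfaces())
```

Checking the name first avoids exporting an interface that does not exist on the machine, which would otherwise only surface later as a connection error.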
The training process above generates data under the Server/generation/ folder. The outputs have names of the form sample_x.csv. For instance, sample_299.csv holds the synthetic data generated by GTV at the 300th epoch. One such output is written every 10 rounds.
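Picking the most recent output by hand is error-prone because the names sort lexicographically (sample_99.csv would come after sample_299.csv). A minimal sketch of a numeric lookup follows; the helper names are hypothetical, and the rule that index x corresponds to epoch x + 1 is an assumption generalized from the sample_299.csv example.

```python
import os
import re

_SAMPLE_RE = re.compile(r"^sample_(\d+)\.csv$")


def sample_epoch(name):
    """Return the 1-based epoch for a name like sample_299.csv, or None if it does not match."""
    match = _SAMPLE_RE.match(name)
    return int(match.group(1)) + 1 if match else None


def latest_sample(names):
    """Return the name with the highest epoch, ignoring names that do not match."""
    matched = [(sample_epoch(name), name) for name in names]
    matched = [(epoch, name) for epoch, name in matched if epoch is not None]
    return max(matched)[1] if matched else None


if __name__ == "__main__":
    folder = os.path.join("Server", "generation")
    if os.path.isdir(folder):
        print("latest output:", latest_sample(os.listdir(folder)))
    else:
        print("no generation folder found at", folder)
```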
We provide an IPython notebook, Loan_Evaluation_Demo.ipynb, under the Evaluation folder. Put the generated data into the corresponding folder under Evaluation/Fake_Datasets/ and check that the real data is under the Evaluation/Real_Datasets/ folder; then run the notebook to conduct the test. Machine learning utility and statistical similarity are both shown within the demo.
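Before opening the notebook, the two input folders can be checked with a few lines of Python. This is a hedged sketch with a hypothetical function name; it only verifies that Fake_Datasets and Real_Datasets exist and are non-empty, and makes no assumption about the subfolder layout the notebook expects.

```python
from pathlib import Path


def evaluation_inputs_ready(eval_root):
    """Return a list of problems found under the Evaluation folder (empty if none)."""
    root = Path(eval_root)
    problems = []
    for sub in ("Fake_Datasets", "Real_Datasets"):
        folder = root / sub
        if not folder.is_dir():
            problems.append(f"missing folder: {folder}")
        elif not any(folder.iterdir()):
            problems.append(f"empty folder: {folder}")
    return problems


if __name__ == "__main__":
    issues = evaluation_inputs_ready("Evaluation")
    print("ready" if not issues else "\n".join(issues))
```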
To cite this paper, you can use this BibTeX entry:
@inproceedings{zhao2025gtv,
title={{GTV}: Generating Tabular Data via Vertical Federated Learning},
author={Zhao, Zilong and Wu, Han and Van Moorsel, Aad and Chen, Lydia Y},
booktitle={2025 55th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)},
pages={33--46},
year={2025},
organization={IEEE}
}