Test the Best Hybrid Partition Generated by Hierarchical Community Detection Methods with k-NN Sparsification
This project induces a classifier from the CLUS framework or Random Forest on the best hybrid partition to enhance multilabel classification.
@misc{Gatto2025,
author = {Gatto, E. C.},
title = {Test Hybrid Partitions using Communities Detection Methods for Multilabel Classification},
year = {2025},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/cissagatto/TcpKnnH}}
}
- Hybrid partition testing for multilabel classification
- Utilizes Hierarchical Community Detection methods
- k-NN sparsification applied to the clustering step (see the illustrative sketch after this list)
- Compatible with CLUS-FRAMEWORK and Random Forest classifiers
- Supports multiple multilabel datasets
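The snippet below is a purely illustrative sketch of the k-NN sparsification idea applied to a label-similarity matrix; it is not the repository's implementation, and the function name and toy matrix are invented for the example.

```r
# Purely illustrative sketch (not the repository's code): k-NN sparsification
# keeps, for each label, only the edges to its k most similar labels before
# community detection is applied to the resulting graph.
knn_sparsify <- function(sim, k) {
  # sim: square label-similarity matrix (e.g., Jaccard); k: neighbors to keep
  sparse <- matrix(0, nrow = nrow(sim), ncol = ncol(sim),
                   dimnames = dimnames(sim))
  diag(sim) <- 0                              # ignore self-similarity
  for (i in seq_len(nrow(sim))) {
    keep <- order(sim[i, ], decreasing = TRUE)[seq_len(k)]
    sparse[i, keep] <- sim[i, keep]           # keep the k strongest neighbors
  }
  pmax(sparse, t(sparse))                     # symmetrize the k-NN graph
}

# Example with a toy 4-label similarity matrix and k = 2
set.seed(1)
toy <- matrix(runif(16), 4, 4)
toy <- (toy + t(toy)) / 2
knn_sparsify(toy, k = 2)
```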
Coming soon...
- libraries.R → Loads required libraries
- utils.R → Helper functions and preprocessing utilities
- run.R → Main script execution
- run-rf.R → Runs the Random Forest classifier
- validateMaF1.R → Validates the hybrid partitions with the Macro-F1 criterion
- validateSilho.R → Validates the hybrid partitions with the Silhouette coefficient criterion
- testMaF1.R → CLUS: tests the best hybrid partition chosen with the Macro-F1 criterion
- testSilho.R → CLUS: tests the best hybrid partition chosen with the Silhouette coefficient criterion
- test-asoc.R → Random Forest: tests the best hybrid partition chosen with the Silhouette coefficient criterion
- tcp.R → Runs the experiment
- config-files.R → Configuration file template
A file named datasets-original.csv must be placed in the project root. This file contains metadata about 90 multilabel datasets. To use a custom dataset, include it in this file with the following structure:
Parameter | Status | Description |
---|---|---|
Id | Mandatory | Unique integer identifier for the dataset |
Name | Mandatory | Dataset name (follow benchmark naming conventions) |
Domain | Optional | Dataset domain |
Instances | Mandatory | Total number of instances |
Attributes | Mandatory | Total number of attributes |
Labels | Mandatory | Total number of labels |
Inputs | Mandatory | Number of input attributes |
Cardinality | Optional | Cardinality value |
Density | Optional | Density value |
Max.freq | Optional | Maximum frequency |
Mean.IR | Optional | Mean imbalance ratio |
AttStart | Mandatory | Column index where attributes begin |
AttEnd | Mandatory | Column index where attributes end |
LabelStart | Mandatory | Column index where labels begin |
LabelEnd | Mandatory | Column index where labels end |
xn | Mandatory | X dimension of Kohonen map |
yn | Mandatory | Y dimension of Kohonen map |
gridn | Mandatory | X * Y value (must be square) |
max.neighbors | Mandatory | Maximum number of neighbors (Labels - 1) |
📖 Click here for a detailed explanation of these properties.
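As a minimal sketch, this metadata file can be consulted from R as shown below; it assumes the file sits in the project root and uses the column names from the table above, and the emotions dataset name is used only as an example.

```r
# Minimal sketch (not part of the repository code): read datasets-original.csv
# and select one dataset's metadata.
datasets <- read.csv("datasets-original.csv", stringsAsFactors = FALSE)

# Select a dataset by its Name column (e.g., "emotions")
ds <- datasets[datasets$Name == "emotions", ]

# Column ranges delimiting the input attributes and the labels
att.cols   <- ds$AttStart:ds$AttEnd
label.cols <- ds$LabelStart:ds$LabelEnd

cat("Labels:", ds$Labels, "| max.neighbors:", ds$max.neighbors, "\n")
```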
- The experiment requires X-Fold Cross-Validation files in tar.gz format.
- Download pre-generated 10-fold cross-validation files for multiple datasets here.
- For a new dataset, add it to datasets-original.csv and generate cross-validation files using this repository.
- The tar.gz file can be stored in any directory; set its absolute path in the configuration file (see the unpacking sketch below).
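As a rough illustration of how a fold archive might be unpacked before processing (the paths below are placeholders, not repository defaults):

```r
# Hypothetical sketch: unpack a dataset's 10-fold cross-validation archive into
# the temporary processing directory. Both paths below are placeholders.
dataset.tar <- "/home/user/Datasets/CrossValidation/emotions.tar.gz"
temp.dir    <- "/dev/shm/user/emotions"

dir.create(temp.dir, recursive = TRUE, showWarnings = FALSE)
untar(dataset.tar, exdir = temp.dir)
list.files(temp.dir, recursive = TRUE)   # inspect the extracted fold files
```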
Ensure Java, Python, and R dependencies are installed manually. This project does not provide automatic installation.
- Recommended: use the Conda environment:
  `conda env create -f AmbienteTeste.yaml`
- Alternatively, use Apptainer containers for SLURM cluster execution. Tutorial (Portuguese).
Create a CSV file with the following structure:
Config | Value |
---|---|
Dataset_Path | Absolute path to dataset tar.gz |
Temporary_Path | Path for temporary processing ¹ |
Partitions_Path | Path to partition files |
Validation | "Silhouette", "Macro-F1", etc. |
Similarity | "jaccard", "rogers", etc. |
Classifier | "clus" or "random-forests" |
Dataset_Name | Name from datasets-original.csv |
Number_Dataset | ID from datasets-original.csv |
Number_Folds | Cross-validation folds |
Number_Cores | Number of CPU cores to use |
R_clone | 1 = Upload results to cloud, 0 otherwise |
Save_csv_files | 1 = Save CSV files |
📌 ¹ Use directories like /dev/shm, /tmp, or /scratch.
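As an illustration, the snippet below builds a hypothetical configuration file for the emotions dataset; every path and the dataset ID are placeholders that must be replaced with values matching your setup and the corresponding row in datasets-original.csv.

```r
# Hypothetical example: building a configuration CSV for the emotions dataset.
# All paths and the dataset ID below are placeholders.
config <- data.frame(
  Config = c("Dataset_Path", "Temporary_Path", "Partitions_Path", "Validation",
             "Similarity", "Classifier", "Dataset_Name", "Number_Dataset",
             "Number_Folds", "Number_Cores", "R_clone", "Save_csv_files"),
  Value  = c("/home/user/Datasets/emotions.tar.gz",    # placeholder path
             "/dev/shm/user",                          # fast temporary directory
             "/home/user/Partitions/emotions",         # placeholder path
             "Silhouette", "jaccard", "random-forests",
             "emotions", "12",                         # "12" is a placeholder ID
             "10", "10", "0", "1")
)
write.csv(config, "jsrf-emotions.csv", row.names = FALSE)
```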
To obtain partitions, use this repository.
📥 Download partitions here.
- RStudio Version: 1.4.1106 (Ubuntu Bionic)
- R Language Version: 4.1.0 ("Camp Pontanezen")
- Parallel execution is highly recommended.
- In our experiments, we used 10 cores.
- Tested on Ubuntu 20.04.2 LTS (Focal Fossa) with an Intel Core i7-10750H processor.
Open a terminal, navigate to ~/TcpKnnH/examples, and execute:
Rscript tcp.R [absolute_path_to_config_file]
Example:
Rscript tcp.R "~/TcpKnnH/config-files/jaccard/Silhouette/random-forests/jsrf-emotions.csv"
[Click here]
This study was funded by:
- CAPES (Finance Code 001)
- CNPQ (Process Number 200371/2022-3)
- FAPESP
📧 Elaine Cecília Gatto – elainececiliagatto@gmail.com
Website | LinkedIn | GitHub | [YouTube](https://www