Virtual Screening

This repository contains Python scripts for ML based preprocessing for virtual screening:

Standard.py

Standardize.py provides functions for standardization, scaling, feature selection, and PCA preprocessing techniques.

Functions Provided:

standard_preprocessing(data): Performs standardization (Z-score normalization) on the input data.
maxmin_preprocessing(data): Performs min-max scaling on the input data.
correlation_feature_selection(data, r_threshold): Performs feature selection based on correlation with the output variable.
decision_tree_feature_selection(data, threshold): Performs feature selection using a decision tree model.
PCA_preprocessing(data, com_number): Performs Principal Component Analysis (PCA) for dimensionality reduction.
RFE_feature_selection(data, n_features_to_select): Performs Recursive Feature Elimination (RFE) using a decision tree model.

Example Usage:

import pandas as pd
from Standard import *

# Load your data (assuming 'data' is your DataFrame)
df = pd.read_csv("./your_data.csv")
df = df.set_index('Name')

# Perform preprocessing pipeline
processed_data = preprocessing_pipeline(
    data=df, 
    pipeline=['standardization', 'correlation'], 
    r_threshold=0.3
)
print(processed_data)

KNN_impute.py

KNN_impute.py is intended for K-Nearest Neighbors (KNN) imputation of missing values in datasets.

Functions Provided:

-- KNN_DataFill(data): implements KNN for filling missing values in a dataset.

Usage:

import pandas as pd
from KNN_DataFill import KNN_DataFill

# Load your data
df = pd.read_csv("./your_data.csv")
df = df.set_index('Name')

# Example usage for filling missing values
missing_value_per_col = round(len(df.columns) * 0.05)
col_dict = {}

for col in df.columns:
    inter_df = intersection_cols(df.drop(columns=[col]))
    inter_df[col] = df[col]
    add_missing_df = add_missing(inter_df, col, missing_value_per_col)
    
    predictor = KNN_DataFill(
        add_missing_df, 
        col, 
        set_PCA_dist=True, 
        threshold=0.6,
        n_components=500
    )
    predictor.fit(k=10)
    filled_df = predictor.predict()
    rmse = predictor.evaluation()
    col_dict[col] = rmse

print(col_dict)

License

This project is licensed under the MIT License - see the LICENSE file for details.

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
CNSData(940).xlsx		CNSData(940).xlsx
CNS_output.csv		CNS_output.csv
KNN_imput.py		KNN_imput.py
README.md		README.md
Standardize.py		Standardize.py
requirement.txt		requirement.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Virtual Screening

Standard.py

Functions Provided:

Example Usage:

KNN_impute.py

Functions Provided:

Usage:

License

About

Uh oh!

Releases

Packages

Languages

garh305111/VirtualScreening

Folders and files

Latest commit

History

Repository files navigation

Virtual Screening

Standard.py

Functions Provided:

Example Usage:

KNN_impute.py

Functions Provided:

Usage:

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages