- Genomic decomposition and reconstruction of non-tumor diploid subclones (2023)
- CLonal decomposition via Expectation-Maximization algorithm established in Non-Tumor setting
- Support multiple diploid sample
- Biallelic variants (Homo, 1/1) can degrade the performance of CLEMENT.
- python 3.6.x
- matplotlib 3.5.2
- seaborn 0.11.2
- numpy 1.21.5
- pandas 1.3.4
- scikit-learn 1.0.2
- scipy 1.7.3
- palettable 3.3.0
- 
git clone https://github.com/Yonsei-TGIL/CLEMENT.git 
 cd CLEMENT
 pip3 install .
- 
pip3 install git+https://github.com/Yonsei-TGIL/CLEMENT.git 
- pip3 install CLEMENTDNA
1.0.11 (Jan 1st, 2024)
As now of 1.0.4, CLEMENT only supports standardized TSV input. Examples of input file is shown in "example" directory.
- 1st column: mutation ID (CHR_POS is recommended)
- 2nd column: label (answer), if possible. If user don't know the label (answer), just set 0
- 3rd column: Depth1,Alt1,Depth2,Alt2....,Depth_n,Alt_n * should be comma-separated, and no space permitted
- 4th column: BQ1,BQ2....,BQ_n * should be comma-separated, and no space permitted. If absent, CLEMENT set default BQ as 20.
CLEMENT [OPTIONS]   
(Mandatory) These options are regarding User's input and output format
	--INPUT_TSV		Input data whether TSV. The tool automatically detects the number of samples
	--CLEMENT_DIR 		Directory where the outputs of CLEMENT be saved
These options are regarding downsizing User's input or not
	--RANDOM_PICK 		Set this variable to user want to downsize the sample. If user don't want to downsize, set -1. (default : -1).
These options are adjusting E-M algorithm parameter
	--NUM_CLONE_TRIAL_START 	Minimum number of expected cluster_hards (initation of K) 	(default: 3)
	--NUM_CLONE_TRIAL_END 		Maximum number of expected cluster_hards (termination of K)	 (default: 5)
	--TRIAL_NO 			Trial number in each candidate cluster_hard number. DO NOT recommend over 15 (default: 5)
	--FP_PRIOR FP_PRIOR   		Prior of false positive (FP). Recommendation : <= 0.1. (default : 0.01)
	--TN_PRIOR TN_PRIOR   		Prior of true negative (TN). Recommendation : > 0.99. (default : 0.99)
	--KMEANS_CLUSTERNO		Number of initial K-means cluster. Recommendation : 5~8 for one-sample, 8-15 for larger-sample (default: 8)
	--MIN_CLUSTER_SIZE		The minimum cluster size that is acceptable. Recommendation : 1-3% of total variants number 	(default: 9)
Other options
	--MODE			Selection of clustering method. "Hard": hard clustering only,  "Both": both hard and soft (fuzzy) clustering (default: "Both")
	--MAKEONE_STRICT  	1: strict, 2: lenient, 3: most lenient (default : 1)
	--SCORING		True : comparing with the answer set, False : just visualization (default: False)
	
Miscelleneous
	--FONT_FAMILY		Font family that displayed in the plots (default : "arial")
	--VISUALIZATION		Whether produce image in every E-M step (default: True)
	--IMAGE_FORMAT		Image format that displayed in the plots (default : jpg)
	--VERBOSE		0: no record,  1: simplified record,  2: verbose record (default: 2)
${CLEMENT_DIR}"/result"
- CLEMENT_decision CLEMENT's best recommendation among hard and soft clustering.
- CLEMENT_hard_1st CLEMENT's best decomposition by hard clustering.
- CLEMENT_hard.gapstatistics.txt Selecting the optimal K in hard clustering based on gap* stastics.
- CLEMENT_soft_1st CLEMENT's best decomposition by soft (fuzzy) clustering.
- membership.txt Membership assignment of all variants to each clusters.
- membership_count.txt Count matrix of the membership assignment to each clusters.
- mixture.txt Centroid of each clusters
DIR=[YOUR_DIRECTORY]
# Example 1
CLEMENT \
	--INPUT_TSV ${DIR}"/example/1.SimData/SimData_1D/n500_125x/lump/0.0/clone_4/1/1.txt" \
	--CLEMENT_DIR ${DIR}"/example/1.SimData/SimData_1D/n500_125x/lump/0.0/clone_4/1" \
  	--NUM_CLONE_TRIAL_START 1 \
	--NUM_CLONE_TRIAL_END 5 
# Example 2
CLEMENT \
	--INPUT_TSV ${DIR}"/example/2.CellData/MRS_2D/M1-8_M2-4/M1-8_M2-4_input.txt" \
	--CLEMENT_DIR ${DIR}"/example/2.CellData/MRS_2D/M1-8_M2-4"  \
	--NUM_CLONE_TRIAL_START 2 \
	--NUM_CLONE_TRIAL_END 6 \
	--RANDOM_PICK 500
goldpm1@yuhs.ac


