Code implementation of the ACL paper "Stepwise Reasoning Disruption Attack of LLMs"
This component modifies original questions while preserving their semantic meaning.
python QuestionModification.py \
--llm_name <model_name> \
--dataset <dataset_name> \
--few_shot <True/False>
Generates CoT solutions for modified questions.
python GetSolutionofQuestionModified.py \
--llm_name <model_name> \
--dataset <dataset_name> \
--few_shot <True/False>
Performs the SEED-P attack by introducing prior reasoning steps of the modified question.
python SEEDpAttack.py \
--llm_name <model_name> \
--dataset <dataset_name> \
--ratio <float> \
--few_shot <True/False>
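The three stages above run in a fixed order (question modification, solution generation, attack), so a small driver can help keep the arguments consistent across them. The following is a minimal sketch, not part of the repository: `build_command` and `build_pipeline` are hypothetical helper names, the ratio value in the usage example is only a placeholder, and the sketch assumes the scripts accept exactly the flags shown above.

```python
def build_command(script, llm_name, dataset, few_shot, ratio=None):
    """Return the argv list for one stage (hypothetical helper)."""
    cmd = ["python", script, "--llm_name", llm_name, "--dataset", dataset]
    # Only SEEDpAttack.py is documented as taking --ratio.
    if ratio is not None:
        cmd += ["--ratio", str(ratio)]
    # The README passes few_shot as the literal text True or False.
    cmd += ["--few_shot", str(few_shot)]
    return cmd


def build_pipeline(llm_name, dataset, few_shot, ratio):
    """Return the three stage commands in the order the README lists them."""
    return [
        build_command("QuestionModification.py", llm_name, dataset, few_shot),
        build_command("GetSolutionofQuestionModified.py", llm_name, dataset, few_shot),
        build_command("SEEDpAttack.py", llm_name, dataset, few_shot, ratio=ratio),
    ]


if __name__ == "__main__":
    # Placeholder arguments; substitute a real model and dataset name.
    for cmd in build_pipeline("<model_name>", "<dataset_name>", True, 0.5):
        print(" ".join(cmd))
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` so that a failing stage stops the run before later stages consume incomplete output.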
Evaluates accuracy and attack success rate (ASR).
Step 1: Run the baseline (no attack) for comparison:
python SEEDpAttack.py \
--llm_name <model_name> \
--dataset <dataset_name> \
--ratio 0.0 \
--few_shot <True/False>
Step 2: Compute ASR and Accuracy:
python Evaluation.py \
--llm_name <model_name> \
--dataset <dataset_name> \
--ratio <float> \
--few_shot <True/False>
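To make the two reported metrics concrete, here is a minimal sketch of how they could be computed from per-question correctness flags. It is an illustration only: the function names are hypothetical, and the ASR definition used here (the fraction of questions answered correctly at baseline, i.e. ratio 0.0, that become incorrect under attack) is an assumption. `Evaluation.py` may define ASR differently and reads its own result files, whose format is not described here.

```python
def accuracy(correct_flags):
    """Fraction of questions answered correctly (0.0 for an empty list)."""
    flags = list(correct_flags)
    if not flags:
        return 0.0
    return sum(1 for f in flags if f) / len(flags)


def attack_success_rate(baseline_correct, attacked_correct):
    """Assumed ASR: share of baseline-correct questions that flip to incorrect."""
    if len(baseline_correct) != len(attacked_correct):
        raise ValueError("baseline and attacked results must align per question")
    # Keep only questions the model solved without the attack.
    attackable = [a for b, a in zip(baseline_correct, attacked_correct) if b]
    if not attackable:
        return 0.0
    flipped = sum(1 for a in attackable if not a)
    return flipped / len(attackable)


if __name__ == "__main__":
    # Toy flags for four questions, made up purely for illustration.
    baseline = [True, True, False, True]
    attacked = [True, False, False, False]
    print("baseline accuracy:", accuracy(baseline))
    print("attacked accuracy:", accuracy(attacked))
    print("ASR:", attack_success_rate(baseline, attacked))
```

Conditioning on baseline-correct questions keeps questions the model already fails from being counted as attack successes, which is why the baseline run in Step 1 is needed before Step 2.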