Siamese Neural Network (SNN) for Near-Duplicate Detection in Web App Model Inference

🚀 Get Started

Follow these steps to set up the project by creating and activating the Conda environment. Use snn-ndd as the password when downloading resources.

1) Clone the repository

git clone https://github.com/ast-fortiss-tum/near-duplicate-detection-siamese-networks.git
cd near-duplicate-detection-siamese-networks

2) Create the Conda environment

The environment.yml already pins compatible versions.

conda env create --name snn-ndd -f environment.yml

What this does

  1. Creates a new Conda env called snn-ndd.
  2. Installs all Conda‑managed packages.

3) Activate the environment

conda activate snn-ndd
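
To sanity-check the installation, you can try the minimal sketch below. It is not part of the repository, and the package list is an assumption based on the embeddings used in this work (BERT, MarkupLM, doc2vec); consult environment.yml for the authoritative list.

# check_env.py - hypothetical helper, not part of the repository.
# Verifies that the core libraries assumed by the evaluation scripts import cleanly.
import importlib

# Assumed packages; adjust to match environment.yml.
for pkg in ("torch", "transformers", "gensim", "numpy", "pandas"):
    try:
        module = importlib.import_module(pkg)
        print(f"{pkg}: OK ({getattr(module, '__version__', 'unknown version')})")
    except ImportError as err:
        print(f"{pkg}: MISSING ({err})")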

💻 Apps for Web Testing Research

Applications evaluated in this work

| App name | Version | Technology | Description |
| --- | --- | --- | --- |
| Addressbook | 8.2.5 | PHP | A simple, web-based address & phone book, contact manager, and organizer. |
| Claroline | 1.11.10 | PHP | A collaborative learning environment, allowing teachers or education institutions to create and administer courses through the web. |
| Dimeshift | commit 261166d | Node.js, Backbone.js | Expense tracker. |
| MantisBT | 1.1.8 | PHP | Bug tracking system. |
| MRBS | 1.4.9 | PHP | A meeting room booking system. |
| Pagekit | 1.0.16 | PHP, Vue.js | A modular and lightweight CMS. |
| PetClinic | commit 6010d5 | Java Spring, Angular | Demo web application for managing a veterinary clinic. |
| Phoenix | 1.1.0 | Elixir, Phoenix Framework, React, Redux | Trello tribute built with Elixir, Phoenix Framework, Webpack, React, and Redux. |
| PPMA | 0.6.0 | PHP | A PHP password manager. |

📂 Repository Structure

├── resources/                # Downloaded baseline datasets, DOMs, baseline runners (JAR), and doc2vec embedding models
├── scripts/                  # Python scripts for running evaluations
│   ├── rq1/
│   │   ├── within-app-classification/
│   │   │   ├── bert_contrastive_classification.py
│   │   │   ├── doc2vec_contrastive_classification.py
│   │   │   └── markuplm_contrastive_classification.py
│   │   ├── across-app-classification/
│   │   │   └── [...same naming as above...]
│   │   └── baseline-classification/
│   │       └── run_baseline.py
│   ├── rq2/                   # RQ2 evaluation scripts
│   ├── rq3/                   # RQ3 evaluation scripts
│   └── rq4/                   # RQ4 evaluation scripts
├── results/                  # Generated experiment results
│   ├── rq1/
│   ├── rq2/
│   ├── rq3/
│   └── rq4/
├── baseline/                 # Baseline results
├── models/                   # Saved trained models
├── embedding/                # Cached embedding files
└── README.md

🖥️ Generate Reported Results

1) RQ1: Near-Duplicate Detection

a) SNN Evaluation

  1. Download resources.zip from the link below (password: snn-ndd):

    https://syncandshare.lrz.de/getlink/fiY4KcSXCwSLW8g8TVmnep/resources.zip
    
    1. Unpack the archive and move the resources folder to the project base directory

      unzip resources.zip -d resources

      You should now have the following folders:

      dataset
      scripts
      resources/
      ├── baseline-dataset/
      ├── baseline-runner/
      ├── doms/
      └── embedding-models/
      
    2. Run the evaluation (a batch-run sketch covering all combinations is shown at the end of this subsection)

      python scripts/rq1/<evaluation_setting>-app-classification/<embedding_type>_contrastive_classification.py

      • <evaluation_setting>:
        • within
        • across
      • <embedding_type>:
        • bert (adjust the variant inside the script for ModernBERT)
        • doc2vec
        • markuplm
    3. Outputs

      • Experiment results → results/rq1/
      • Trained models → models/
      • Cached embeddings → embedding/

Re-runs automatically reuse any existing models or embeddings; nothing is re-trained if the files are already present. If you want to use our trained models and embeddings, download them from the link below, place them separately at the top level of the project (./models, ./embeddings), and then generate the results.

https://syncandshare.lrz.de/getlink/fi14MEw1swySBLPA6emQKP/models_and_embeddings.zip
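
If you want to run all six setting/embedding combinations in one pass, you can adapt the batch-run sketch below. It is a minimal illustration, not part of the repository, and it assumes the scripts follow the naming scheme above and need no extra command-line arguments.

# run_all_rq1.py - hypothetical convenience wrapper, not part of the repository.
# Runs every <evaluation_setting>/<embedding_type> combination from the project base directory.
import subprocess
from pathlib import Path

SETTINGS = ("within", "across")
EMBEDDINGS = ("bert", "doc2vec", "markuplm")  # for ModernBERT, adjust the variant inside the bert script

for setting in SETTINGS:
    for embedding in EMBEDDINGS:
        script = Path("scripts/rq1") / f"{setting}-app-classification" / f"{embedding}_contrastive_classification.py"
        if not script.exists():
            print(f"skipping missing script: {script}")
            continue
        print(f"running {script} ...")
        subprocess.run(["python", str(script)], check=True)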

b) FragGen Evaluation

Run the evaluation from the project base directory

python scripts/rq1/baseline-classification/within_app_fraggen.py

c) Other Baseline Methods (WEBEMBED, RTED, PDIFF) Evaluation

Run the evaluation from the project base directory

python scripts/rq1/baseline-classification/<evaluation_setting>_app_baseline.py

  • <evaluation_setting>:
    • within
    • across

2) RQ2: Model Quality

a) SNN Evaluation

Run the command below from the project base directory. Results will be saved in results/rq2/.

python scripts/rq2/model-quality-snn.py

b) FragGen Evaluation

Run the command below from the project base directory. Results will be saved in results/rq2/.

python scripts/rq2/fraggen.py

c) WEBEMBED Evaluation

Run the commands below from the project base directory, one after the other. The first script generates intermediate result files in resources/csv_results_table/; the second script generates the RQ2 results and saves them in results/rq2/.

python scripts/rq2/a-webembed.py
python scripts/rq2/b-webembed.py

d) Other Baselines (RTED and PDIFF) Evaluation

Run the commands below from the project base directory, one after the other. The first script generates intermediate result files in resources/csv_results_table/; the second script generates the RQ2 results and saves them in results/rq2/. A sketch chaining both two-step pipelines follows these commands.

python scripts/rq2/a-other-baselines.py
python scripts/rq2/b-other-baselines.py
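
The pipelines in c) and d) follow the same two-step pattern, so a single wrapper can drive both. The sketch below is a hypothetical helper, not part of the repository; the intermediate directory comes from the text above, but the .csv extension of the intermediate files is an assumption.

# run_rq2_pipelines.py - hypothetical helper, not part of the repository.
# Chains each a-/b- script pair; the a-* scripts write intermediate files, the b-* scripts write results/rq2/.
import subprocess
from pathlib import Path

INTERMEDIATE_DIR = Path("resources/csv_results_table")

for first, second in [
    ("scripts/rq2/a-webembed.py", "scripts/rq2/b-webembed.py"),
    ("scripts/rq2/a-other-baselines.py", "scripts/rq2/b-other-baselines.py"),
]:
    subprocess.run(["python", first], check=True)
    csv_count = len(list(INTERMEDIATE_DIR.glob("*.csv"))) if INTERMEDIATE_DIR.exists() else 0
    print(f"{first}: {csv_count} intermediate CSV file(s) in {INTERMEDIATE_DIR}")
    subprocess.run(["python", second], check=True)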

3) RQ3: Code Coverage

To integrate the state abstraction function during crawling, follow the state vertex implementation and point its endpoint at your Flask app. Whenever the crawler needs to decide whether two states are near-duplicates, it makes a POST request to the endpoint with the necessary data; the Flask app acts as the State Abstraction Function (SAF) and returns the result. Run one of the commands below to start the Flask app.

python scripts/rq3/saf-snn.py # SNN methods
python scripts/rq3/saf-other-baseline.py # WEBEMBED methods
python scripts/rq3/saf-other-baseline.py # RTED and PDIFF
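
For reference, the sketch below shows how a client (for example, a modified state vertex in the crawler) could query the running SAF over HTTP. The port, route, payload fields, and response format are illustrative assumptions only; check the Flask app in scripts/rq3/ for the actual endpoint definition.

# saf_client_example.py - hypothetical client, not part of the repository.
# Illustrates the POST-based protocol between the crawler and the Flask SAF.
import requests  # assumes the requests package is available in the environment

# URL and JSON fields are placeholders; match them to the real Flask endpoint.
SAF_URL = "http://localhost:5000/equals"

payload = {
    "dom1": "<html><body>state A</body></html>",
    "dom2": "<html><body>state B</body></html>",
}

response = requests.post(SAF_URL, json=payload, timeout=30)
response.raise_for_status()
print(response.json())  # e.g. whether the two states are considered near-duplicates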

Use this example to evaluate the FragGen method directly, without an external SAF.

Code coverage varies depending on the application type: for PHP applications, server-side code coverage is measured, while for JavaScript applications, client-side coverage is measured. For crawling we used Crawljax (5.2.4-SNAPSHOT), which has compatibility constraints with certain versions of the Chrome driver. Alternative browsers (for example, Firefox) can be used if specified in the crawl configuration. Evaluating FragGen is only possible with Chrome, so we had to downgrade Chrome to 114.0.5735.90.


In PHP applications, code coverage measurement is conducted using the Xdebug 2.2.42 extension alongside the php-code-coverage 2.2.33 library. Crawling and code coverage measurement occur simultaneously: start code coverage before starting the crawl and stop it after the crawl completes to generate the coverage report.

Refer to the PHP Code Coverage README for more details.


For JavaScript applications, the process operates separately. Crawljax was executed for Dante, which generates the corresponding Selenium test cases. Once crawling is complete, Dante can use the generated test case files to produce JUnit test cases. Coverage in JavaScript applications is measured with cdp4j 3.0.81, a Java implementation of the Chrome DevTools protocol.

Refer to the JavaScript Code Coverage README for more details.

4) RQ4: Time Efficiency

a) SNN Training Times

SNN model training times are already recorded when you perform RQ1. In the results Excel file, you can find the training time for each model.
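
To collect those values programmatically, you could scan the RQ1 Excel files with pandas, as in the sketch below. The exact file and column names depend on the RQ1 scripts, so the "train" column filter is only an assumption; reading .xlsx files also requires an Excel engine such as openpyxl.

# read_training_times.py - hypothetical helper, not part of the repository.
# Prints any column mentioning training time from each Excel file under results/rq1/.
import pandas as pd
from pathlib import Path

for xlsx in Path("results/rq1").glob("*.xlsx"):
    df = pd.read_excel(xlsx)
    time_cols = [c for c in df.columns if "train" in str(c).lower()]
    if time_cols:
        print(xlsx.name)
        print(df[time_cols])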

b) Crawling Times

After crawling the application in RQ3, a crawling report is generated. You can find the crawling time in the result.json file under statistics → duration.
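
A minimal sketch for reading that value is shown below. The location of result.json depends on your Crawljax output directory (the path used here is a placeholder); the statistics → duration key path is as described above.

# read_crawl_duration.py - hypothetical helper, not part of the repository.
# Prints the crawl duration recorded by Crawljax in result.json (statistics -> duration).
import json
from pathlib import Path

report = Path("crawljax-output/result.json")  # placeholder: point this at your crawl report

with report.open() as f:
    data = json.load(f)

print("crawl duration:", data["statistics"]["duration"])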

c) Inference Times

Inference time is measured using 1,000 randomly selected pairs from our dataset.

  • RTED and PDiff: Compute the respective distances with the Crawljax implementation, then add the classifier inference time to obtain the total inference time.
  • FragGen: Use only the Crawljax classification time as the method's inference time.

The JAR file resources/baseline-runner/BaseLineRunner-1.0-SNAPSHOT.jar (generated from the Java project linked below) is used to obtain the distance-calculation times for RTED and PDiff and the total inference time for FragGen (a small arithmetic sketch follows the project link below).

Baseline Runner project files

https://syncandshare.lrz.de/getlink/fiT5bUvN5DJfJ5uZ8JxgzC/baseline-runner.zip
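
The sketch below only restates the arithmetic above for RTED and PDiff; the timing values are placeholders for the numbers you measure with the JAR (distance calculation) and with the classifier.

# combine_inference_times.py - hypothetical illustration, not part of the repository.
# Total RTED/PDiff inference time = distance-calculation time + classifier inference time.
distance_time_ms = 12.4    # placeholder: average per-pair distance time reported by the BaseLineRunner JAR
classifier_time_ms = 0.8   # placeholder: average per-pair classifier inference time

total_time_ms = distance_time_ms + classifier_time_ms
print(f"RTED/PDiff total inference time per pair: {total_time_ms:.1f} ms")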

SNN evaluation

Run the command below from the project base directory. Results will be saved in results/rq4/.

python scripts/rq4/snn_inference_time.py

WEBEMBED evaluation

Run the command below from the project base directory. Results will be saved in results/rq4/.

python scripts/rq4/webembed_baseline_inference_time.py

Other baselines (FragGen, RTED, PDiff) evaluation

Run the command below from the project base directory. Results will be saved in results/rq4/.

python scripts/rq4/javabased_baseline_inference_time.py
