    Repositories list

    • SafeAuto

      Public
      [ICML 2025] SafeAuto: Knowledge-Enhanced Safe Autonomous Driving with Multimodal Foundation Models
      Python
      Updated Jul 17, 2025
    • RedCode

      Public
      [NeurIPS'24] RedCode: Risky Code Execution and Generation Benchmark for Code Agents
      Python
      Updated Jul 11, 2025
    • UDora

      Public
      [ICML 2025] UDora: A Unified Red Teaming Framework against LLM Agents
      Python
      Updated Jun 24, 2025
    • PolyGuard

      Public
      Python
      Updated Jun 18, 2025
    • AdvAgent

      Public
      Jupyter Notebook
      Updated May 28, 2025
    • [NeurIPS 2024] Official implementation for "AgentPoison: Red-teaming LLM Agents via Memory or Knowledge Base Backdoor Poisoning"
      Python
      Updated Apr 12, 2025
    • MMDT

      Public
      Comprehensive Assessment of Trustworthiness in Multimodal Foundation Models
      Jupyter Notebook
      Updated Mar 15, 2025
    • aug-pe

      Public
      [ICML 2024 Spotlight] Differentially Private Synthetic Data via Foundation Model APIs 2: Text
      Python
      Updated Jan 11, 2025
    • FedGame

      Public
      Official implementation for the paper "FedGame: A Game-Theoretic Defense against Backdoor Attacks in Federated Learning" (NeurIPS 2023).
      Python
      Updated Oct 25, 2024
    • VFL-ADMM

      Public
      Improving Privacy-Preserving Vertical Federated Learning by Efficient Communication with ADMM (SaTML 2024)
      Python
      Updated Oct 21, 2024
    • A Comprehensive Assessment of Trustworthiness in GPT Models
      Python
      Updated Sep 16, 2024
    • helm

      Public
      Holistic Evaluation of Language Models (HELM), a framework to increase the transparency of language models (https://arxiv.org/abs/2211.09110).
      Python
      Updated Jun 12, 2024
    • [CCS 2023] Unraveling the Connections between Privacy and Certified Robustness in Federated Learning Against Poisoning Attacks
      Python
      Updated Feb 15, 2024
    • hf-blog

      Public
      Public repo for HF blog posts
      Jupyter Notebook
      Updated Jan 26, 2024
    • Python
      Updated Dec 25, 2023
    • TextGuard

      Public
      TextGuard: Provable Defense against Backdoor Attacks on Text Classification
      Python
      Updated Nov 7, 2023
    • InfoBERT

      Public
      [ICLR 2021] "InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective" by Boxin Wang, Shuohang Wang, Yu Cheng, Zhe Gan, Ruoxi Jia, Bo Li, Jingjing Liu
      Python
      Updated Oct 25, 2023
    • RAB: Provable Robustness Against Backdoor Attacks
      Python
      Updated Oct 3, 2023
    • [CCS 2021] TSS: Transformation-specific smoothing for robustness certification
      Roff
      Updated Oct 3, 2023
    • Federated Learning Framework Benchmark (UniFed)
      Python
      Updated Jun 14, 2023
    • SecretGen

      Public
      A general model inversion attack against large pre-trained models.
      Python
      Updated Apr 22, 2023
    • [NeurIPS 2021] "Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models" by Boxin Wang*, Chejian Xu*, Shuohang Wang, Zhe Gan, Yu Cheng, Jianfeng Gao, Ahmed Hassan Awadallah, Bo Li.
      Python
      Updated Apr 3, 2023
    • VeriGauge

      Public
      A unified toolbox for running major robustness verification approaches for DNNs. [S&P 2023]
      C
      Updated Mar 24, 2023
    • [NeurIPS 2022] Code for Certifying Some Distributional Fairness with Subpopulation Decomposition
      Python
      Updated Jan 3, 2023
    • CoPur

      Public
      CoPur: Certifiably Robust Collaborative Inference via Feature Purification (NeurIPS 2022)
      Python
      Updated Dec 7, 2022
    • Python
      Updated Dec 6, 2022
    • DMLW2022

      Public
      HTML
      Updated Dec 3, 2022
    • This repo keeps track of popular provable training and verification approaches towards robust neural networks, including leaderboards on popular datasets and paper categorization.
      Updated Oct 18, 2022
    • Python
      Updated Oct 11, 2022
    • CROP

      Public
      [ICLR 2022] CROP: Certifying Robust Policies for Reinforcement Learning through Functional Smoothing
      Python
      Updated Jun 16, 2022