This project explores the implementation of Role-Based Access Control (RBAC) in Large Language Models (LLMs). The objective is to train an LLM on an organization's data while ensuring that users can only access information they are permitted to see. By using an "Impersonation" token, the LLM summarizes data according to the user's permissions, with the aim of preserving data confidentiality and integrity.
- Objective: Gain a foundational understanding of RBAC principles.
- Key Topics:
  - Basic concepts of RBAC (roles, permissions, users, and role assignments) on Windows and *NIX systems.
  - How RBAC is implemented in software systems (see the sketch after this list).
- Resources:
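For a concrete starting point, here is a minimal sketch of the core RBAC data model in Python. The role and permission names are hypothetical examples, not part of this repository:

```python
# Minimal RBAC model: users are assigned roles; roles carry permissions.
# All role and permission names here are hypothetical examples.
ROLE_PERMISSIONS = {
    "hr_manager": {"read:salaries", "read:contracts"},
    "engineer": {"read:design_docs"},
}

USER_ROLES = {
    "alice": {"hr_manager"},
    "bob": {"engineer"},
}

def is_allowed(user: str, permission: str) -> bool:
    """A user may perform an action if any of their roles grants it."""
    return any(
        permission in ROLE_PERMISSIONS.get(role, set())
        for role in USER_ROLES.get(user, set())
    )

assert is_allowed("alice", "read:salaries")
assert not is_allowed("bob", "read:salaries")
```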
- Objective: Familiarize yourself with LLMs, their architecture, and training processes.
- Key Topics:
  - Overview of GPT-2 and other transformer-based models.
  - How LLMs are trained and fine-tuned (a loading sketch follows this list).
- Resources:
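To make the workflow concrete, the sketch below loads GPT-2 with the Hugging Face `transformers` library and generates a completion; fine-tuning on company data would start from this same checkpoint. The prompt text is an arbitrary example:

```python
# Load the pretrained GPT-2 checkpoint and run greedy generation.
# Assumes `transformers` and `torch` are installed.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

inputs = tokenizer("Summarize the quarterly report:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```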
- Objective: Explore how to integrate RBAC into the LLM training process.
- Key Topics:
  - Creating and managing "Impersonation" tokens (one possible format is sketched below).
  - Training an LLM on company data with RBAC considerations.
  - Techniques for ensuring data privacy and access control.
- Resources:
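The exact impersonation-token format is not fixed here; one plausible approach, sketched below, is to prepend a role marker to every training example so the model learns to condition its output on the caller's role. The token syntax and helper names are assumptions for illustration, not the repository's actual API:

```python
# Hypothetical impersonation-token scheme: each example is prefixed with a
# role marker such as "<|role:hr_manager|>". Names and format are assumptions.
def build_training_example(role: str, document: str, summary: str) -> str:
    """Format one fine-tuning example conditioned on a role."""
    return f"<|role:{role}|> {document}\nSummary: {summary}"

def build_prompt(user_role: str, document: str) -> str:
    """At inference time the serving layer injects the *authenticated*
    user's role; users must never supply this token themselves, or they
    could impersonate a more privileged role."""
    return f"<|role:{user_role}|> {document}\nSummary:"
```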
- Objective: Test the implementation and validate the effectiveness of RBAC in your LLM.
- Key Topics:
  - Evaluating the model's adherence to RBAC policies (a simple leakage probe is sketched after this list).
  - Measuring the performance and accuracy of the LLM in summarizing data.
- Resources:
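One simple way to check policy adherence, assuming known restricted "canary" strings have been planted in the privileged portion of the training data, is to query the model under a low-privilege role and flag any response that reproduces a canary. The canary strings below are placeholders:

```python
# Hypothetical leakage probe: canary strings stand in for content that only
# privileged roles should ever see in a model response.
RESTRICTED_CANARIES = ["salary", "termination date"]  # placeholder values

def leaks_restricted_data(model_output: str) -> bool:
    """True if a low-privilege response reproduces any canary string."""
    text = model_output.lower()
    return any(canary in text for canary in RESTRICTED_CANARIES)

def rbac_violation_rate(low_privilege_outputs: list[str]) -> float:
    """Fraction of low-privilege responses that leak restricted content."""
    if not low_privilege_outputs:
        return 0.0
    leaks = sum(map(leaks_restricted_data, low_privilege_outputs))
    return leaks / len(low_privilege_outputs)
```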
- `data/`: Contains the dataset used for training and evaluation.
- `results/`: Contains trained models and evaluation metrics.
- `/`: Contains the primary Jupyter notebook.
Contributions are welcome! Please follow the standard GitHub workflow for contributions: fork the repository, make your changes, and create a pull request.
This project is licensed under the MIT License. See the LICENSE file for more details.