fix(deps): update dependency pytorch-lightning to v2 [security] #33
Open
renovate wants to merge 1 commit into main from renovate/pypi-pytorch-lightning-vulnerability
Conversation
This PR contains the following updates:
pytorch-lightning: `^1.6.0` -> `^2.0.0`

GitHub Vulnerability Alerts
CVE-2024-8019
In lightning-ai/pytorch-lightning version 2.3.2, a vulnerability exists in the `LightningApp` when running on a Windows host. The vulnerability occurs at the `/api/v1/upload_file/` endpoint, allowing an attacker to write or overwrite arbitrary files by providing a crafted filename. This can lead to potential remote code execution (RCE) by overwriting critical files or placing malicious files in sensitive locations.

Release Notes
Lightning-AI/lightning (pytorch-lightning)
v2.4.0: Lightning v2.4 (Compare Source)
Lightning AI ⚡ is excited to announce the release of Lightning 2.4. This is mainly a compatibility upgrade for PyTorch 2.4 and Python 3.12, with a sprinkle of a few features and bug fixes.
Did you know? The Lightning philosophy extends beyond a boilerplate-free deep learning framework: We've been hard at work bringing you Lightning Studio. Code together, prototype, train, deploy, host AI web apps. All from your browser, with zero setup.
Changes
PyTorch Lightning
Added
- Added `dump_stats` flag to `AdvancedProfiler` (#19703)
- Added a `verbose` flag to the `seed_everything()` function (#20108)
- The `TQDMProgressBar` now provides an option to retain prior training epoch bars (#19578)
- Added the count of modules in train and eval mode to the `ModelSummary` table (#20159)

Changed
- Triggering `KeyboardInterrupt` (Ctrl+C) during `.fit()`, `.evaluate()`, `.test()` or `.predict()` now terminates all processes launched by the Trainer and exits the program (#19976)
- Changed how seeds are chosen for dataloader workers when using `seed_everything(..., workers=True)` (#20055)

Removed

Fixed
- Avoid `LightningCLI` saving hyperparameters with `class_path` and `init_args` since this would be a breaking change (#20068)
- Fixed an issue with `seed_everything()` (#20108)
- Fixed `_LoggerConnector`'s `_ResultMetric` to move all registered keys to the device of the logged value if needed (#19814)
- Fixed `_optimizer_to_device` logic for special 'step' key in optimizer state causing performance regression (#20019)
- Fixed the parameter count in `ModelSummary` when the model has distributed parameters (DTensor) (#20163)

Lightning Fabric
Added
- Added a `verbose` flag to the `seed_everything()` function (#20108)

Changed
- Changed how seeds are chosen for dataloader workers when using `seed_everything(..., workers=True)` (#20055)

Removed

Fixed
- Fixed an attribute error when loading a checkpoint using the `_lazy_load()` function (#20121)
- Fixed `_optimizer_to_device` logic for special 'step' key in optimizer state causing performance regression (#20019)

Full commit list: 2.3.0 -> 2.4.0
Contributors
We thank all our contributors who submitted pull requests for features, bug fixes and documentation updates.
New Contributors
Did you know?
Chuck Norris can solve NP-hard problems in polynomial time. In fact, any problem is easy when Chuck Norris solves it.
v2.3.3: Patch release v2.3.3 (Compare Source)
This release removes the code from the main `lightning` package that was reported in CVE-2024-5980.

v2.3.2: Patch release v2.3.2 (Compare Source)
Includes a minor bugfix that avoids a conflict between the entrypoint command and another package (#20041).
v2.3.1: Patch release v2.3.1 (Compare Source)
Includes minor bugfixes and stability improvements.
Full Changelog: Lightning-AI/pytorch-lightning@2.3.0...2.3.1
v2.3.0: Lightning v2.3: Tensor Parallelism and 2D Parallelism (Compare Source)
Lightning AI is excited to announce the release of Lightning 2.3 ⚡
Did you know? The Lightning philosophy extends beyond a boilerplate-free deep learning framework: We've been hard at work bringing you Lightning Studio. Code together, prototype, train, deploy, host AI web apps. All from your browser, with zero setup.
This release introduces experimental support for Tensor Parallelism and 2D Parallelism, PyTorch 2.3 support, and several bugfixes and stability improvements.
Highlights
Tensor Parallelism (beta)
Tensor parallelism (TP) is a technique that splits up the computation of selected layers across GPUs to save memory and speed up distributed models. To enable TP as well as other forms of parallelism, we introduce a `ModelParallelStrategy` for both the Lightning Trainer and Fabric. Under the hood, TP is enabled through new experimental PyTorch APIs like DTensor and `torch.distributed.tensor.parallel`.

PyTorch Lightning
Enabling TP in a model with PyTorch Lightning requires you to implement the `LightningModule.configure_model()` method, where you convert selected layers of the model into parallelized layers. This is an advanced feature, because it requires a deep understanding of the model architecture. Open the tutorial Studio to learn the basics of Tensor Parallelism. Full training example (requires at least 2 GPUs).
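The sketch below is illustrative only, not the code from the tutorial Studio: the toy two-layer model is made up, and the `device_mesh` attribute and `"tensor_parallel"` mesh dimension name follow the Lightning 2.3 documentation, so verify them against your installed version. The parallelization itself uses the public PyTorch DTensor APIs.

```python
import torch
import lightning.pytorch as L
from lightning.pytorch.strategies import ModelParallelStrategy
from torch.distributed.tensor.parallel import (
    ColwiseParallel,
    RowwiseParallel,
    parallelize_module,
)

class LitModel(L.LightningModule):
    def __init__(self):
        super().__init__()
        # Toy model; a real use case would parallelize large transformer blocks
        self.w1 = torch.nn.Linear(128, 256)
        self.w2 = torch.nn.Linear(256, 128)

    def configure_model(self):
        # Called by the Trainer once the device mesh is available; convert the
        # selected layers into tensor-parallel (sharded) layers here.
        tp_mesh = self.device_mesh["tensor_parallel"]  # attribute per the 2.3 TP docs
        plan = {"w1": ColwiseParallel(), "w2": RowwiseParallel()}
        parallelize_module(self, tp_mesh, plan)

trainer = L.Trainer(accelerator="gpu", devices=2, strategy=ModelParallelStrategy())
```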
Lightning Fabric
Applying TP in a model with Fabric requires you to implement a special function where you convert selected layers of the model into parallelized layers. This is an advanced feature, because it requires a deep understanding of the model architecture. Open the tutorial Studio to learn the basics of Tensor Parallelism.
Full training example (requires at least 2 GPUs).
2D Parallelism (beta)
Tensor Parallelism by itself can be very effective for efficient inference of very large models. For training, TP is typically combined with other forms of parallelism, such as FSDP, to increase throughput and scalability on large clusters with 100s of GPUs. The new `ModelParallelStrategy` in this release supports the combination of TP + FSDP, which is referred to as 2D parallelism. For an introduction to this feature, please also refer to the tutorial Studios (PyTorch Lightning, Lightning Fabric). At the moment, the PyTorch team is reimplementing FSDP under the name FSDP2 with the aim to make it compose well with other parallelisms such as TP. Therefore, for the experimental 2D parallelism support, you'll need to switch to using FSDP2 with the new `ModelParallelStrategy`. Please refer to our docs (PyTorch Lightning, Lightning Fabric) and stay tuned for future releases as these APIs mature.
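As a rough configuration sketch only: combining TP with FSDP2 is set up through the same strategy class. The `data_parallel_size` and `tensor_parallel_size` argument names below are taken from the 2.3 documentation and are an assumption to verify against your installed version.

```python
import lightning.pytorch as L
from lightning.pytorch.strategies import ModelParallelStrategy

# Shard each tensor-parallel group across 2 GPUs and apply FSDP2 across
# 2 data-parallel groups, i.e. 4 GPUs in total.
strategy = ModelParallelStrategy(data_parallel_size=2, tensor_parallel_size=2)
trainer = L.Trainer(accelerator="gpu", devices=4, strategy=strategy)
```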
Training Mode in Model Summary
The model summary table that gets displayed when you run `Trainer.fit()` now contains a new column "Mode" that shows the training mode each layer is in (#19468). A module in PyTorch is always either in `train` (default) or `eval` mode. This improvement should give users more visibility into the state of their model and help debug issues, for example when you need to make sure certain layers of the model are frozen.
Special Forward Methods in Fabric
Until now, Lightning Fabric warned the user in case the forward pass of the model or a subset of its modules was conducted through methods other than the dedicated `forward` method of the PyTorch module. The reason for this is that PyTorch needs to run special hooks in case of DDP/FSDP and other strategies to function properly, and not running through the real `forward` method would skip these hooks and lead to correctness issues. In Lightning Fabric 2.3, we added a feature to explicitly mark alternative forward methods so that Fabric can add the necessary rerouting behind the scenes:
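The sketch below is illustrative only: the toy module and its `generate()` method are made up, while `mark_forward_method()` is the new Fabric API (also listed in the changelog further down).

```python
import torch
from lightning.fabric import Fabric

class MyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Linear(16, 4)

    def forward(self, x):
        return self.net(x)

    def generate(self, x):
        # An alternative entry point that also runs a forward pass
        return self.net(x).argmax(dim=-1)

fabric = Fabric(accelerator="cpu")
model = fabric.setup(MyModel())

# Tell Fabric that `generate` performs a forward pass so the strategy hooks
# (e.g. for DDP/FSDP) are routed correctly instead of triggering a warning.
model.mark_forward_method("generate")
```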
Find the full example and more details in our docs.
Notable Changes
The 2.0 series of Lightning releases guarantees core API stability: No name changes, argument renaming, hook removals etc. on core interfaces (Trainer, LightningModule, etc.) unless a feature is specifically marked experimental. Here we list a few behavioral changes made in places where the change was justified if it significantly improves the user experience, improves performance, or fixes the correctness of a feature. These changes will likely not impact most users.
Skipping the training step in DDP
It is no longer allowed to skip `training_step()` by returning `None` in distributed training (#19918). The following usage was previously possible but would result in unpredictable hangs and timeouts in distributed training:
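For example, a `training_step()` along these lines (with a hypothetical `compute_loss()` helper) used to be a common way to skip problematic batches on a single device:

```python
import torch
import lightning.pytorch as L

class LitModel(L.LightningModule):
    def training_step(self, batch, batch_idx):
        loss = self.compute_loss(batch)  # hypothetical loss helper, not a Lightning API
        if torch.isnan(loss):
            # Returning None skips the step on a single device, but in DDP the other
            # ranks keep waiting for a gradient sync, causing hangs and timeouts.
            return None
        return loss
```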
We decided to raise an error if the user attempts to return `None` when running in a multi-GPU setting.

Miscellaneous Changes
The `prepare_data()` hook in `LightningModule` and `LightningDataModule` is now subject to a barrier without timeout to avoid long-running tasks from being interrupted (#19448). Similarly, the `Fabric.rank_zero_first` context manager in Fabric now uses an infinite barrier (#19448).

CHANGELOG
PyTorch Lightning
Added
- The `ModelSummary` and `RichModelSummary` callbacks now display the training mode of each layer in the column "Mode" (#19468)
- Added `load_from_checkpoint` support for `LightningCLI` when using dependency injection (#18105)
- Added `on_exception` hook to `LightningDataModule` (#19601)
- Added `ModelParallelStrategy` to support 2D parallelism (#19878, #19888)
- `torch.distributed.destroy_process_group` is now called in an atexit handler if the process group needs destruction (#19931)
- Added the `FSDPStrategy(device_mesh=...)` argument (#19504)

Changed
- The `prepare_data()` hook in `LightningModule` and `LightningDataModule` is now subject to a barrier without timeout to avoid long-running tasks from being interrupted (#19448)
- Changed the handling of `drop_last` for prediction (#19678)
- It is no longer allowed to skip `training_step()` by returning `None` in distributed training (#19918)

Removed
- Removed the Bagua integration (`Trainer(strategy="bagua")`) (#19445)

Fixed
- Fixed `WandbLogger.log_hyperparameters()` raising an error if hyperparameters are not JSON serializable (#19769)
- Fixed an issue with the `ModelCheckpoint(save_last=...)` argument (#19808)
- Fixed `epoch_loop.restarting` to avoid a full validation run after `LearningRateFinder` (#19818)

Lightning Fabric
Added
- Added `fabric consolidate` in the new CLI (#19560)
- Added `_FabricModule.mark_forward_method()` (#19690)
- Added `ModelParallelStrategy` to support 2D parallelism (#19846, #19852, #19870, #19872)
- `torch.distributed.destroy_process_group` is now called in an atexit handler if the process group needs destruction (#19931)
- Added the `FSDPStrategy(device_mesh=...)` argument (#19504)

Changed
- Renamed `lightning run model` to `fabric run` (#19442, #19527)
- The `Fabric.rank_zero_first` context manager now uses a barrier without timeout to avoid long-running tasks from being interrupted (#19448)
- Fabric now raises an error if you forget to call `fabric.backward()` when it is needed by the strategy or precision selection (#19447, #19493)
- `_BackwardSyncControl` can now control what to do when gradient accumulation is disabled (#19577)

Removed
Fixed
Full commit list: 2.2.0 -> 2.3.0
Contributors
We thank all our contributors who submitted pull requests for features, bug fixes and documentation updates.
New Contributors
Did you know?
Chuck Norris is a big fan and daily user of Lightning Studio.
v2.2.5: Patch release v2.2.5 (Compare Source)
PyTorch Lightning + Fabric
Fixed
Full Changelog: Lightning-AI/pytorch-lightning@2.2.4...2.2.5
v2.2.4: Patch release v2.2.4 (Compare Source)
App
Fixed
PyTorch
No Changes.
Fabric
No Changes.
Full Changelog: Lightning-AI/pytorch-lightning@2.2.3...2.2.4
v2.2.3: Patch release v2.2.3 (Compare Source)
PyTorch
Fixed
- Fixed `WandbLogger.log_hyperparameters()` raising an error if hyperparameters are not JSON serializable (#19769)

Fabric
No Changes.
Full Changelog: Lightning-AI/pytorch-lightning@2.2.2...2.2.3
v2.2.2: Patch release v2.2.2 (Compare Source)
PyTorch
Fixed
- Fixed an issue when using `torch.compile` as a decorator (#19627)
- Fixed an issue with `save_weights_only=True` (#19524)

Fabric
Fixed
- Fixed an issue when using `torch.compile` as a decorator (#19627)
- Fixed an issue with `Fabric.setup()` when using FSDP (#19755)

Full Changelog: Lightning-AI/pytorch-lightning@2.2.1...2.2.2
Contributors
@ankitgola005 @awaelchli @Borda @carmocca @dmitsf @dvoytan-spark @fnhirwa
v2.2.1: Patch release v2.2.1 (Compare Source)
PyTorch
Fixed
- Fixed an issue with `Trainer.accumulate_grad_batches` and `Trainer.log_every_n_steps` in `ThroughputMonitor` (#19470)

Fabric
Fixed
Full Changelog: Lightning-AI/pytorch-lightning@2.2.0post...2.2.1
Contributors
@Raalsky @awaelchli @carmocca @Borda
If we forgot someone due to not matching commit email with GitHub account, let us know :]
v2.2.0: Lightning v2.2 (Compare Source)
Lightning AI is excited to announce the release of Lightning 2.2 ⚡
Did you know? The Lightning philosophy extends beyond a boilerplate-free deep learning framework: We've been hard at work bringing you Lightning Studio. Code together, prototype, train, deploy, host AI web apps. All from your browser, with zero setup.
While our previous release was packed with many big new features, this time around we're rolling out mainly improvements based on feedback from the community. And of course, as the name implies, this release fully supports the latest PyTorch 2.2 🎉
Highlights
Monitoring Throughput
Lightning now has built-in utilities to measure throughput metrics such as batches/sec, samples/sec and Model FLOP Utilization (MFU) (#18848).
Trainer:
For the Trainer, this comes in the form of a `ThroughputMonitor` callback. In order to track samples/sec, you need to provide a function that tells the monitor how to extract the batch dimension from your input. Furthermore, if you want to track MFU, you can provide a sample forward pass and the `ThroughputMonitor` will automatically estimate the utilization based on the hardware you are running on. The results get automatically sent to the logger if one is configured on the Trainer.
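A minimal sketch for tracking samples/sec only, assuming the batch is a tensor with the batch dimension first; the `batch_size_fn` argument name follows the 2.2 documentation for the callback and should be double-checked against your version.

```python
import lightning.pytorch as L
from lightning.pytorch.callbacks import ThroughputMonitor

# Report samples/sec by telling the monitor how to read the batch size.
monitor = ThroughputMonitor(batch_size_fn=lambda batch: batch.size(0))
trainer = L.Trainer(max_epochs=1, callbacks=[monitor])
```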
Fabric:
For Fabric, the `ThroughputMonitor` is a simple utility object on which you call `.update()` and `compute_and_log()` during the training loop. Check out our TinyLlama LLM pretraining script for a full example using Fabric's `ThroughputMonitor`. The throughput utilities can report metrics such as batches/sec, samples/sec, and Model FLOP Utilization (MFU).
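A rough sketch of the Fabric flavor; the import path and the exact `update()` keyword arguments follow the 2.2 documentation and are an assumption to verify against your installed version. The batch size of 32 and the loop body are placeholders.

```python
import time
from lightning.fabric import Fabric
from lightning.fabric.utilities.throughput import ThroughputMonitor

fabric = Fabric(accelerator="cpu")
throughput = ThroughputMonitor(fabric)

t0 = time.perf_counter()
for step in range(1, 101):
    ...  # your forward/backward/optimizer step goes here
    if step % 10 == 0:
        throughput.update(
            time=time.perf_counter() - t0,  # cumulative wall-clock time
            batches=step,
            samples=step * 32,              # assuming a batch size of 32
        )
        throughput.compute_and_log(step=step)
```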
Improved Handling of Evaluation Mode
When you train a model and have validation enabled, the Trainer automatically calls `.eval()` when transitioning to the validation loop, and `.train()` when validation ends. Until now, this had the unfortunate side effect that any submodules in your LightningModule that were in evaluation mode got reset to train mode. In Lightning 2.2, the Trainer now captures the mode of every submodule before switching to validation, and restores the mode the modules were in when validation ends (#18951). This improvement will help users avoid silent correctness bugs and removes boilerplate code for managing frozen layers. If you have overridden any of the `LightningModule.on_{validation,test,predict}_model_{eval,train}` hooks, they will still get called and execute your custom logic, but they are no longer required if you added them to preserve the eval mode of frozen modules.

Converting FSDP Checkpoints
In the previous release, we introduced distributed checkpointing with FSDP to speed up saving and loading checkpoints for big models. These checkpoints are in a special format, saved in a folder with shards from each GPU in a separate file. While these checkpoints can be loaded back with the Lightning Trainer or Fabric very easily, they aren't easy to load or process externally. In Lightning 2.2, we introduced a CLI utility that lets you consolidate the checkpoint folder into a single file that can be loaded in raw PyTorch with `torch.load()`, for example (#19213). Given you saved a distributed checkpoint, you can then convert it like so:
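A sketch of the workflow, assuming the Trainer saved a sharded checkpoint folder named `my-checkpoint.ckpt`; the module path of the consolidation utility and the `.consolidated` output suffix follow the 2.2 documentation, so verify both against the linked docs.

```python
# Step 1 (run from the shell): consolidate the sharded checkpoint folder.
#
#   python -m lightning.pytorch.utilities.consolidate_checkpoint my-checkpoint.ckpt
#
# Step 2: the consolidated file is a regular PyTorch checkpoint.
import torch

checkpoint = torch.load("my-checkpoint.ckpt.consolidated")
print(checkpoint.keys())
```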
Read more about distributed checkpointing in our documentation: Trainer, Fabric.
Improvements to Compiling DDP/FSDP in Fabric
PyTorch 2.0+ introduced `torch.compile`, a powerful tool to speed up your models without changing the code. We have now added a comprehensive guide on how to use `torch.compile` correctly, with tips and tricks to help you troubleshoot common issues. On top of that, `Fabric.setup()` will now reapply `torch.compile` on top of DDP/FSDP if you are enabling these strategies (#19280). You might see fewer graph breaks, but there won't be any significant speed-ups with this. We introduced this mainly to make Fabric ready for future improvements from PyTorch to optimize distributed operations.
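A minimal sketch of the call order; the linear layer is a stand-in for your own `nn.Module`, and the script would typically be launched through `fabric run` or another distributed launcher.

```python
import torch
from lightning.fabric import Fabric

fabric = Fabric(strategy="ddp", devices=2)
fabric.launch()

model = torch.nn.Linear(32, 32)  # stand-in for your own model
model = torch.compile(model)     # compile first ...
model = fabric.setup(model)      # ... Fabric reapplies torch.compile on top of DDP
```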
Saving and Loading DataLoader State
If you use a dataloader/iterable that implements the `.state_dict()` and `.load_state_dict()` interface, the Trainer will now automatically save and load its state in the checkpoint (#19361). Note that the standard PyTorch DataLoader does not support this stateful interface; this feature only works with loaders that implement these two methods. A dataloader that supports full fault tolerance will be included in our upcoming release of Lightning Data, a library to optimize data preprocessing and streaming in the cloud. Stay tuned!
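To illustrate what such an interface looks like, here is a minimal hand-rolled iterable (not a Lightning or PyTorch class) exposing the two methods the Trainer looks for.

```python
class StatefulIterable:
    """Toy iterable whose progress can be checkpointed and resumed."""

    def __init__(self, data):
        self.data = list(data)
        self.index = 0

    def __iter__(self):
        while self.index < len(self.data):
            item = self.data[self.index]
            self.index += 1
            yield item

    def state_dict(self):
        # Saved into the Trainer checkpoint automatically in Lightning 2.2+
        return {"index": self.index}

    def load_state_dict(self, state):
        # Restored by the Trainer when resuming from the checkpoint
        self.index = state["index"]
```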
Non-strict Checkpoint Loading in Trainer
A feature that has been requested for a long time by the community is non-strict checkpoint loading. By default, a checkpoint in PyTorch is loaded with `strict=True` to ensure all keys in the saved checkpoint match what's in the model's state dict. However, in some use cases it might make sense to exclude certain weights from being included in the checkpoint. When resuming training, the user would then be required to set `strict=False`, which wasn't configurable until now. You can now set the attribute `strict_loading=False` on your LightningModule if you want to allow loading partial checkpoints (#19404). Full documentation here.
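A minimal sketch; the checkpoint path is a placeholder, and the tiny model and dataset exist only to make the example self-contained.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
import lightning.pytorch as L

class LitModel(L.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(4, 1)
        self.strict_loading = False  # tolerate missing keys when loading a checkpoint

    def training_step(self, batch, batch_idx):
        (x,) = batch
        return self.layer(x).pow(2).mean()

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)

loader = DataLoader(TensorDataset(torch.randn(8, 4)), batch_size=4)
trainer = L.Trainer(max_epochs=1, logger=False, enable_checkpointing=False)
# Resuming from a checkpoint that contains only part of the weights no longer fails:
trainer.fit(LitModel(), loader, ckpt_path="partial.ckpt")  # placeholder path
```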
Notable Changes
The 2.0 series of Lightning releases guarantees core API stability: No name changes, argument renaming, hook removals etc. on core interfaces (Trainer, LightningModule, etc.) unless a feature is specifically marked experimental. Here we list a few behavioral changes made in places where the change was justified if it significantly improves the user experience, improves performance, or fixes the correctness of a feature. These changes will likely not impact most users.
ModelCheckpoint's save-last Feature
In Lightn
Configuration
📅 Schedule: Branch creation - "" in timezone Asia/Tokyo, Automerge - At any time (no schedule defined).
🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.
♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
This PR was generated by Mend Renovate. View the repository job log.