-
Notifications
You must be signed in to change notification settings - Fork 427
add the forge folder #1387
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
add the forge folder #1387
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks for adding this! I have a handful of comments, but no major concerns
pipelining_fn: PipeliningFunction | None | ||
build_optimizers_fn: OptimizersBuilder | ||
build_lr_schedulers_fn: LRSchedulersBuilder | ||
build_tokenizer_fn: TokenizerBuilder | None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one we don't need, right?
float8: Float8 = field(default_factory=Float8) | ||
mx: MX = field(default_factory=MX) | ||
comm: Comm = field(default_factory=Comm) | ||
fault_tolerance: FaultTolerance = field(default_factory=FaultTolerance) | ||
experimental: Experimental = field(default_factory=Experimental) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the spirit of starting minimal I would leave most of these out for now. Lmk if you disagree though (I am also happy to come back later and add them as needed)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can do that. We probably want to keep the Comm
for debugging purpose. It is used in init_distributed
if job_config.experimental.custom_import: | ||
importlib.import_module(job_config.experimental.custom_import) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably don't need this, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, will remove
self.ft_manager = init_ft_manager(job_config) | ||
# If TorchFT is enabled, the dp_rank and dp_degree, which are used for | ||
# dataloader must be changed. | ||
if self.ft_manager.enabled: | ||
dp_degree, dp_rank = self.ft_manager.get_dp_info(dp_degree, dp_rank) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar comment here.. personally I would leave this out for the initial version (again, lmk if you disagree)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, will remove
self.tokenizer = tokenizer = ( | ||
self.train_spec.build_tokenizer_fn(job_config) | ||
if self.train_spec.build_tokenizer_fn is not None | ||
else None | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be done downstream, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
discussed offline, probably makes more sense to keep for now
if job_config.checkpoint.create_seed_checkpoint: | ||
init_device = "cpu" | ||
buffer_device = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand the purposes of this. Is the idea to do this in lieu of a separate checkpoint conversion script?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is more for debugging now. I can remove
@@ -0,0 +1,351 @@ | |||
# Copyright (c) Meta Platforms, Inc. and affiliates. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Skipping over reviewing this file since I assume it's meant more as a demonstration and not really intended to be used directly. Lmk if it warrants a closer look though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are right, no need to review too thoroughly for this
depending on #1384