Skip to content

add the forge folder #1387

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

add the forge folder #1387

wants to merge 1 commit into from

Conversation

tianyu-l
Copy link
Contributor

depending on #1384

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label Jul 13, 2025
@tianyu-l tianyu-l requested a review from ebsmothers July 13, 2025 07:21
Copy link
Contributor

@ebsmothers ebsmothers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks for adding this! I have a handful of comments, but no major concerns

pipelining_fn: PipeliningFunction | None
build_optimizers_fn: OptimizersBuilder
build_lr_schedulers_fn: LRSchedulersBuilder
build_tokenizer_fn: TokenizerBuilder | None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one we don't need, right?

Comment on lines +37 to +41
float8: Float8 = field(default_factory=Float8)
mx: MX = field(default_factory=MX)
comm: Comm = field(default_factory=Comm)
fault_tolerance: FaultTolerance = field(default_factory=FaultTolerance)
experimental: Experimental = field(default_factory=Experimental)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the spirit of starting minimal I would leave most of these out for now. Lmk if you disagree though (I am also happy to come back later and add them as needed)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can do that. We probably want to keep the Comm for debugging purpose. It is used in init_distributed

Comment on lines +69 to +70
if job_config.experimental.custom_import:
importlib.import_module(job_config.experimental.custom_import)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably don't need this, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, will remove

Comment on lines +98 to +102
self.ft_manager = init_ft_manager(job_config)
# If TorchFT is enabled, the dp_rank and dp_degree, which are used for
# dataloader must be changed.
if self.ft_manager.enabled:
dp_degree, dp_rank = self.ft_manager.get_dp_info(dp_degree, dp_rank)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar comment here.. personally I would leave this out for the initial version (again, lmk if you disagree)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, will remove

Comment on lines +122 to +126
self.tokenizer = tokenizer = (
self.train_spec.build_tokenizer_fn(job_config)
if self.train_spec.build_tokenizer_fn is not None
else None
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be done downstream, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

discussed offline, probably makes more sense to keep for now

Comment on lines +149 to +151
if job_config.checkpoint.create_seed_checkpoint:
init_device = "cpu"
buffer_device = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand the purposes of this. Is the idea to do this in lieu of a separate checkpoint conversion script?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is more for debugging now. I can remove

@@ -0,0 +1,351 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skipping over reviewing this file since I assume it's meant more as a demonstration and not really intended to be used directly. Lmk if it warrants a closer look though

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are right, no need to review too thoroughly for this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Meta Open Source bot.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants