
[file_packager.py] Add --modularize to file_packager.py #24737


Open
lkwinta wants to merge 8 commits into main from file_packager_modularize


@lkwinta commented Jul 18, 2025

I have added a modularize option flag to the file_packager.py script, like the one in the main JS generator. Related to issue #24504.

I think it will be helpful, since in the modern React world it is inconvenient to add generated files as <script/> tags.
This pull request introduces a new --modularize option to the file_packager.py tool, enabling the generation of modularized JavaScript output. It makes it possible to import the generated JS loading stub into an ES6 environment. The main module, into which the data files are loaded, is now passed to the generated function as an argument.

Use scenario:

```shell
emcc -s MODULARIZE -s EXPORT_NAME=MainModule ...
file_packager.py --modularize ...
```

```javascript
import MainModule from 'main_module_wasm';

// Two data files that will be loaded into the module at runtime
import { default as loadDataFile_specialName } from 'data_file_preload';
import loadDataFile from 'data_file_preload2';

var preMod = {
  // e.g. locateFile
};

var mod = MainModule(preMod);
mod.then((module) => {
  loadDataFile(module);
  loadDataFile_specialName(module);

  // module now has the data files loaded in its VFS
});
```
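For readers wondering how such a default export interacts with the module it receives, here is a simplified sketch of the general shape. It is not what `file_packager.py` actually emits: `loadDataFile`, the hard-coded `files` table and `makeMockModule` are hypothetical, and real output fetches the `.data` package rather than embedding bytes. The assumption carried over from this thread is only that the loader gets the module object and writes into its virtual filesystem through module helpers such as `FS_createPath` (mentioned later in the thread) and `FS_createDataFile` (assumed here).

```javascript
// Hypothetical sketch of a modularized data-file loader. In an ES module this
// function would be the file's `export default`; it is a plain function here so
// the sketch runs as an ordinary script.
function loadDataFile(Module) {
  // Hypothetical package contents; real output would fetch these bytes from the .data file.
  const files = [
    { dir: '/assets', name: 'hello.txt', bytes: new TextEncoder().encode('hello') },
  ];
  for (const f of files) {
    // Create the parent directory, then the file, through the module's own FS helpers.
    Module.FS_createPath('/', f.dir.slice(1), true, true);
    Module.FS_createDataFile(f.dir, f.name, f.bytes, true, true, true);
  }
}

// Minimal stand-in for an instantiated Emscripten module (assumption: only the two
// helpers used above are modelled, backed by a Map instead of a real VFS).
function makeMockModule() {
  const vfs = new Map();
  return {
    vfs,
    FS_createPath(parent, path) { vfs.set(parent + path, 'dir'); },
    FS_createDataFile(parent, name, data) { vfs.set(parent + '/' + name, data); },
  };
}

const mod = makeMockModule();
loadDataFile(mod);
console.log([...mod.vfs.keys()]); // [ '/assets', '/assets/hello.txt' ]
```

Because the loader only calls methods on the object it is handed, nothing in it needs to be async or wrapped in an inner factory.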

closes: #24504

@lkwinta force-pushed the file_packager_modularize branch from c4a662f to deac6ec July 22, 2025 20:00
@grzanka commented Jul 22, 2025

@sbc100 could you please take a look?

```python
if options.modularize and not options.from_emcc:
ret += '''
(() => {
var real_createModule = createModule;
```
Collaborator

Does this assume that createModule is the string passed to -sEXPORT_NAME?

Author

I was thinking about integrating that functionality the way emcc does, but I decided to keep it simple for now. I don't think it should be a problem: with an ES6 import, the function can be renamed anyway.

Author

I have changed that; the function is now named after the export_name option.

@sbc100 changed the title from [file_packager.py] Add modularize to file_packager.py to [file_packager.py] Add --modularize to file_packager.py Jul 22, 2025
@sbc100 (Collaborator) commented Jul 22, 2025

Can you explain a little more in the PR description exactly what --modularize is doing here? It looks like it's generating an ES6 module from the data files; is that right? How is the generated module supposed to interact with the program into which the files are to be loaded?

@lkwinta requested a review from sbc100 July 22, 2025 21:29
@lkwinta (Author) commented Jul 22, 2025

> Can you explain a little more in the PR description exactly what --modularize is doing here? It looks like it's generating an ES6 module from the data files; is that right? How is the generated module supposed to interact with the program into which the files are to be loaded?

I have added a use-case description.

As described in the related issue, we are trying to dynamically load large required dependencies inside web workers launched from a React application. Together with XHR lazy loading, this seems to be the only viable solution to our problem.

```python
ret = '''
if options.modularize:
ret = '''
var createModule = (() => {
```
Collaborator

Perhaps this should be called loadDataFile, as in your example?

Author

Maybe I should add a comment, or another import as a showcase. In the example it is indeed called that, but since it is a default export, the renaming syntax is

```javascript
import { default as loadDataFile } from 'data_file_preload';
```

instead of

```javascript
import { createModule as loadDataFile } from 'data_file_preload';
```

But maybe we should indeed integrate it with -s EXPORT_NAME.

```javascript
var createModule = (() => {

return (async function(moduleArg = {}) {
var Module = moduleArg;
```
Collaborator

Do we need this extra inner function here, or can we just do:

```javascript
export default function loadDataFile(Module) {
  ...
}
```

Also, I think the function doesn't need to be async, because it hooks into the Module object using addRunDependency / removeRunDependency. However, if the Module is already loaded and running, that might not work, so we might need to change that too.
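To illustrate why a synchronous loader can still hold back startup, here is a toy model of the run-dependency idea. `createToyRuntime` and its fields are hypothetical and much simpler than Emscripten's real `addRunDependency` / `removeRunDependency` machinery; the sketch only shows the principle that `main` waits until every registered dependency has been removed.

```javascript
// Toy model (hypothetical, not Emscripten's implementation): main() runs only
// after run() has been requested and every registered dependency is removed.
function createToyRuntime(main) {
  const rt = {
    pending: 0,
    calledRun: false,
    addRunDependency() { rt.pending++; },
    removeRunDependency() {
      rt.pending--;
      if (rt.pending === 0) rt.run();
    },
    run() {
      if (rt.pending > 0 || rt.calledRun) return; // still waiting, or already ran
      rt.calledRun = true;
      main();
    },
  };
  return rt;
}

const log = [];
const rt = createToyRuntime(() => log.push('main'));

// A loader registers its dependency synchronously; the actual fetch completes later.
rt.addRunDependency();
rt.run(); // blocked: the data is not ready yet
setTimeout(() => {
  log.push('data ready');
  rt.removeRunDependency(); // last dependency gone, so main() runs
}, 0);
```

In this model a dependency added after `calledRun` is already true never delays anything, which corresponds to the "already loaded and running" caveat above.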

Author

The function is async because I didn't touch the inner code, i.e. the web requests, which return promises, but I agree that it might need changing.

The inner function prevents calling it with the new keyword, just like emcc's ES6 export code.
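As a generic illustration of this kind of guard, a factory can reject `new` by checking `new.target`. The `createModule` body here is hypothetical; as noted further down in the thread, current emcc output no longer contains such a check (#23960), so this only demonstrates the mechanism.

```javascript
// Hypothetical factory that refuses to be called as a constructor.
function createModule(moduleArg = {}) {
  if (new.target) {
    throw new Error('createModule() should not be called with `new`');
  }
  const Module = moduleArg; // alias the argument rather than copying it
  Module.ready = true;
  return Module;
}

const m = createModule({});
console.log(m.ready); // true

try {
  new createModule({});
} catch (e) {
  console.log(e.message); // createModule() should not be called with `new`
}
```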

Collaborator

I'm not sure we need the generated code here to look like the emscripten-generated program. For example, the file loading code just takes a single Module argument, I think; it doesn't need the whole moduleArg dictionary thing.

Also, the emcc output no longer contains the new keyword check. See #23960.

Author

Okay, my bad, I was basing this on an emsdk release that doesn't have those changes.

What do you suggest passing into that function? Only the Module's FS and the things it uses, like locateFile?

Collaborator

I believe the generated code expected something called Module, which is the whole module object? But I could be wrong.

Collaborator

You could also call it with your module args object, and it looks like it will inject a Module.preRun in that case, i.e. it works whether it's passed the module args or the actual module.
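A simplified sketch of the behaviour described here, where the loader accepts either the module-args object or an already running module. Branching on `calledRun` and queuing into the `preRun` array follow Emscripten's module conventions, but `loadIntoModule` and `populateFS` are hypothetical names and this is not the generated code.

```javascript
// Hypothetical sketch: defer filesystem work until the runtime is ready.
function loadIntoModule(Module, populateFS) {
  if (Module.calledRun) {
    // Already-running module: its FS exists, so populate it immediately.
    populateFS(Module);
  } else {
    // Module-args object (or a module that has not started yet): queue the work
    // so the runtime calls it before main(), once FS is initialized.
    if (!Module.preRun) Module.preRun = [];
    Module.preRun.push(() => populateFS(Module));
  }
}

const calls = [];
const populate = (M) => calls.push(M.name);

const moduleArgs = { name: 'args' };
loadIntoModule(moduleArgs, populate);
console.log(calls.length, moduleArgs.preRun.length); // 0 1

const running = { name: 'running', calledRun: true };
loadIntoModule(running, populate);
console.log(calls); // [ 'running' ]
```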

Author

@lkwinta, Jul 23, 2025

> I believe the generated code expected something called Module, which is the whole module object? But I could be wrong.

Exactly, the generated module wants to use the module's functions, such as locateFile and FS_createPath. It probably modifies the module's internal state while creating files in the VFS.

You mean that I could just use it like this:

```javascript
var preMod = { /* ... */ };

var mod = MainModule(preMod);

createModule(preMod); // function from the file packager generated code
```

Isn't that the same as calling it with the mod argument, since MainModule(preMod) takes preMod by reference and extends it with fields? If that's not the case, I think we can't use it like this, because we need to fill in the virtual FS paths of the Module (mod).
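Whether passing `preMod` is equivalent to passing the resolved module depends on whether the factory hands back the same object it received. The mock below assumes it does, following the `var Module = moduleArg;` line quoted earlier in this thread; `mockMainModule` is hypothetical, so the real behaviour should be checked against the generated code of the emsdk version in use.

```javascript
// Hypothetical MODULARIZE-style factory that aliases its argument instead of copying it.
async function mockMainModule(moduleArg = {}) {
  var Module = moduleArg;          // same object the caller passed in
  Module.FS_createPath = () => {}; // runtime attaches its helpers to that object
  return Module;
}

const preMod = {};
const ready = mockMainModule(preMod).then((mod) => {
  // With aliasing, preMod and mod are one object, so a loader could be given either.
  console.log(mod === preMod, typeof preMod.FS_createPath); // true function
  return mod;
});
```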

Original use case was:

```html
<script>
  var Module = { /* ... */ };
</script>
<script src="load_wasm"></script>
<script src="load_datafile"></script>
<script>
  // use Module to access the wasm code
</script>
```

Since all the scripts operate in the same global context, the data-file script receives the whole Module object.

In @kripken's demo linked from the Emscripten docs, DOM manipulation is used to insert <script> tags at runtime to load the data file, but we can't use that approach because of React and Web Workers.

Collaborator

I don't think you need to use EXPORT_NAME here. Can't you just always call the function loadDataFile? Since it is the default export, I don't think it really matters what it is called.

Why not just:

```javascript
export default function loadDataFile(Module) {
  ...
}
```

Also, as I think I said already, I don't think we need the inner function here, do you? And I don't think anything needs to be async, because we never return a promise.

Author

Okay, now I see your point. I think I have simplified it now.

I will take a look at the tests that are failing for some reason.

@lkwinta requested a review from sbc100 July 23, 2025 11:58
@lkwinta force-pushed the file_packager_modularize branch from 268b91a to 6d11cc1 July 23, 2025 13:42
@lkwinta force-pushed the file_packager_modularize branch from 1b74c3e to 05f9693 July 23, 2025 21:41
Development

Successfully merging this pull request may close these issues.

Using standalone file_packager
3 participants