Skip to content

Delay relocation of data/bss init after MEMC init is completed #68888

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

talih0
Copy link
Contributor

@talih0 talih0 commented Feb 12, 2024

Currently, when relocating to external SRAM regions via MEMC drivers, the relocation/zero initialization is done before the MEMC drivers are initialized.

See issue #68884

This patch delays initialization after PRE_KERNEL_1 level.
The initialization is only delayed on devicetree memory regions with delay-relocation property set.

Note, MEMC drivers (i.e. Atmel SAM SMC) use POST_KERNEL level, so their initialization levels would have to be updated.

Signed-off-by: Andriy Gelman <andriy.gelman@gmail.com>
@talih0
Copy link
Contributor Author

talih0 commented Feb 12, 2024

@pdgendt, @nandojve, would you mind having a look please, as I believe it also applies to Atmel SAM SMC drivers.

@erwango
Copy link
Member

erwango commented Apr 8, 2024

I guess this would fix #43555

@cfriedt
Copy link
Member

cfriedt commented Apr 9, 2024

I guess this would fix #43555

I had a similar issue #39598 a couple of years ago. That would also be fixed by this (with the addition of a memc driver)

@erwango
Copy link
Member

erwango commented Apr 11, 2024

@talih0 Can you fix compliance issue reported by automated checks ?

Comment on lines +869 to +870
depends on ARCH_HAS_CODE_DATA_RELOCATION
depends on CODE_DATA_RELOCATION
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CODE_DATA_RELOCATION already depends on ARCH_HAS_CODE_DATA_RELOCATION, no need to add it again.

Suggested change
depends on ARCH_HAS_CODE_DATA_RELOCATION
depends on CODE_DATA_RELOCATION
depends on CODE_DATA_RELOCATION

@MaureenHelm MaureenHelm requested review from danieldegrasse and removed request for decsny May 16, 2024 15:27
@danieldegrasse
Copy link
Contributor

I wonder if we should have some form of device_is_ready check done before the relocation calls are made- since the code relocation script now has access to the devicetree, perhaps there is some way we can verify the MEMC device defining the memory region was initialized correctly?

The reason I ask is that if we simply blindly copy data into memory regions that may be set up by MEMC devices, we run the risk of crashing the chip. I have seen this with iMX.RT parts (which use external SDRAM setup by the bootloader). In this case, the chip will simply crash at initialization since the SRAM region is not accessible.

I'm not sure how we would best handle this (perhaps simply triggering a fault). I just would like to avoid the case where an external memory device fails to initialize, and the application simply crashes rather than printing some form of error message

@swift-tk
Copy link
Contributor

Is cache handled at this point? I see most of cache enables are in the level of PRE_KERNEL_1. The SoCs may have a default cacheable memory map for those regions, so it is best to perform a whole cache clean operation at the end of copy.

@swift-tk
Copy link
Contributor

There is another issue I just thought about. Since stimer init call is in INIT_LEVEL_PRE_KERNEL_2, if MEMC drivers have to be initialized in INIT_LEVEL_PRE_KERNEL_1, this means that the device init can not use any timer related kernel function like k_sleep, k_busy_wait.

Copy link

This pull request has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this pull request will automatically be closed in 14 days. Note, that you can always re-open a closed pull request at any time.

@github-actions github-actions bot added the Stale label Jul 17, 2024
@github-actions github-actions bot closed this Aug 1, 2024
@jhol
Copy link
Contributor

jhol commented Apr 2, 2025

There is another issue I just thought about. Since stimer init call is in INIT_LEVEL_PRE_KERNEL_2, if MEMC drivers have to be initialized in INIT_LEVEL_PRE_KERNEL_1, this means that the device init can not use any timer related kernel function like k_sleep, k_busy_wait.

I transition my memc driver to use k_busy_wait everywhere. This was necessary because otherwise it is not possible to use the SDRAM in a CONFIG_MULTITHREADING=n application.

In my opinion, this should be the policy across all the memc drivers. I can't think of anything useful the system could do with the time, because there's not going to be any other thread the memc initialization could yield to.

In which case your PR would be a welcome improvement to Zephyr IMO.

Comment on lines +618 to +625
if os.path.isfile(args.edt_pickle):
with open(args.edt_pickle, 'rb') as f:
edt = pickle.load(f)
for node in edt.compat2nodes["zephyr,memory-region"]:
delayed_relocate[node.props["zephyr,memory-region"].val] = node.props["delay-relocation"].val

else:
edt = None
Copy link
Contributor

@jhol jhol Apr 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you really want to handle the file not existing, it would be better instead to handle FileNotFoundError exceptions from open() so that there is no change of a race condition between the check and the use of the file.

But I think it's wrong to set edt = None if the file is not found. It should be set to None only if the args.edt_pickle is not None.

If the --edt_pickle option is given but the file fails to open, then it would probably best to just let the exception go out unhandled.

@erwango
Copy link
Member

erwango commented Apr 3, 2025

In which case your PR would be a welcome improvement to Zephyr IMO.

@talih0 Any plans to resurect this ? Otherwise @jhol would you be ready to taking it over ?

@jhol
Copy link
Contributor

jhol commented Apr 3, 2025

I did some work on amending the patch yesterday, and it does indeed solve the problem now. I have an SDRAM that is initialized POST_KERNEL, and the revocations are performed immediately after entering this run-level.

The main issue outstanding with the patch from my perspective is the way it reads from the EDT pickle. This design seems a little ugly to me. I would expect the information to have to somehow feed through Kconfig and CMake instead.

@jhol
Copy link
Contributor

jhol commented Apr 4, 2025

One issue I encountered doing relocations POST_KERNEL, is that certain static initialization features fail. For example, K_HEAP_DEFINE will fail because it will attempt to initialize a heap before the data has been loaded.

@talih0
Copy link
Contributor Author

talih0 commented Apr 4, 2025

@jh

In which case your PR would be a welcome improvement to Zephyr IMO.

@talih0 Any plans to resurect this ? Otherwise @jhol would you be ready to taking it over ?

@jhol, feel free to take over the patch. Thanks!

@jhol
Copy link
Contributor

jhol commented Apr 4, 2025

Happy to take it over, though it will be a few weeks before I have time to finish it off

@github-actions github-actions bot removed the Stale label Apr 5, 2025
@jhol
Copy link
Contributor

jhol commented Apr 11, 2025

One issue I encountered doing relocations POST_KERNEL, is that certain static initialization features fail. For example, K_HEAP_DEFINE will fail because it will attempt to initialize a heap before the data has been loaded.

Same problem applies to K_MEM_SLAB_DEFINE_STATIC. init_mem_slab_obj_core_list is called at PRE_KERNEL_1 initialization. The function uses STRUCT_SECTION_FOREACH to initialize memory slabs that may be not yet be available, and so causes a crash in early boot.

@VynDragon
Copy link
Contributor

Looking forward to this 👍

@jhol
Copy link
Contributor

jhol commented May 26, 2025

Looking forward to this 👍

I'll publish my latest attempt at the patch, but I'm not sure how we get around some of these issues. In the current version we can delay relocation until after memc device initialization. In practice though, the developer needs to ensure that none of the relocated memory is used before relocation occurs otherwise there will be early-boot crashes - this includes many device drivers, memory allocators and other system initialization code. In the current structure of Zephyr, this is extremely difficult and error-prone to ensure, and makes the patch rather useless.

So for my project, I abandoned the approach and instead divided my application into a Zephyr-powered bootloader which loads a main Zephyr application into RAM.

@jhol
Copy link
Contributor

jhol commented May 26, 2025

Note I've been doing some other work on the relocation code:

@VynDragon
Copy link
Contributor

Having the ability to delay the relocation is enough and not useless at all, it enables using the memc-setup memory for main app code at least, and other things as well if the user setups their priorities correctly, it's a very important capability.

@jhol
Copy link
Contributor

jhol commented May 29, 2025

Here is my current version of the patch: b3df801

The main complaint about it is that parsing the EDT pickle in gen_relocate_app.py is a rather sketchy design. I'm not sure how this could be handled better. I'm wondering if it would be better to replace the delay-relocation device-tree property with a Kconfig-driven approach.

Having the ability to delay the relocation is enough and not useless at all, it enables using the memc-setup memory for main app code at least, and other things as well if the user setups their priorities correctly, it's a very important capability.

It's certainly a capability many would like to have.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants