Skip to content

storage: disk_access: add disk_access_erase #75802

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

JordanYates
Copy link
Contributor

@JordanYates JordanYates commented Jul 12, 2024

Add the disk_access_erase function to complement the read and write commands.

Erasing through dedicated API's is often exposed as a more efficient operation by hardware than writing 0x00 or 0xFF manually to every byte to be erased.

Note: I suspect the NVME driver can support this through NVME_OPC_WRITE_ZEROES but I am unsure how to implement that.

Copy link
Contributor

@danieldegrasse danieldegrasse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get the idea here, but the core difference between the disk APIs and flash APIs has always been that disk devices will not offer an explicit erase command, and flash devices will.

While it is more efficient, or even required, to erase a flash device with a specific command disk devices do not have the same restrictions

Where are you running into a case that you need to write 0xFF to the disk device to erase it? Is this a case where you are using a flash device emulator layer on top of a disk device?

@JordanYates
Copy link
Contributor Author

I get the idea here, but the core difference between the disk APIs and flash APIs has always been that disk devices will not offer an explicit erase command, and flash devices will.

While it is more efficient, or even required, to erase a flash device with a specific command disk devices do not have the same restrictions

Where are you running into a case that you need to write 0xFF to the disk device to erase it? Is this a case where you are using a flash device emulator layer on top of a disk device?

I am using SD cards that are too large for the flash API together with an exFAT filesystem. Before creating the filesystem I wish to erase the entire SD card to ensure the entire memory range is in a known state at the start. The reason I require the known state is because I am using the exFAT f_expand functionality to pre-allocate large binary files.

The existing disk APIs provide no way to achieve this as far as I can tell.

As a side bonus it also makes resetting the state of disks between tests much easier.

@JordanYates JordanYates force-pushed the 240712_disk_access_erase branch from 71912b8 to b8be785 Compare July 12, 2024 08:26
@Laczen
Copy link
Contributor

Laczen commented Jul 12, 2024

I get the idea here, but the core difference between the disk APIs and flash APIs has always been that disk devices will not offer an explicit erase command, and flash devices will.

Hi @danieldegrasse, recent changes to the flash API (#71270) will no longer guarantee this difference. I have done my best to avoid this but to no avail. I guess in future disk API will be forced to work under the flash/zsai API or vice versa.

@JordanYates JordanYates force-pushed the 240712_disk_access_erase branch from b8be785 to 5c78191 Compare July 12, 2024 11:29
@danieldegrasse
Copy link
Contributor

I am using SD cards that are too large for the flash API together with an exFAT filesystem. Before creating the filesystem I wish to erase the entire SD card to ensure the entire memory range is in a known state at the start. The reason I require the known state is because I am using the exFAT f_expand functionality to pre-allocate large binary files.

Ok, thanks for the information- so this is essentially a performance optimization to better manage erasing the SD card? I'm not opposed to using all features of the hardware we have here, just trying to determine the best way to do so. I suppose that extending the disk API to support erases does not inherently break anything, provided filesystem implementations don't start relying on it being available.

I do think we need to make it clear in the documentation what the erase will actually set the blocks of the disk to. My read of the SD spec is that the erase command will set the blocks to "logical 0", IE 0x00. Flash erase of course sets the blocks to 0xFF. I don't have an SD card to test with, but if this it is the case we need to settle on one behavior- either erase sets all values of 0xFF or to 0x00.

@JordanYates
Copy link
Contributor Author

Ok, thanks for the information- so this is essentially a performance optimization to better manage erasing the SD card? I'm not opposed to using all features of the hardware we have here, just trying to determine the best way to do so.

I wouldn't really call it a performance optimization because manually writing 0x00 or 0xFF to 256GB of flash is unlikely to finish within several hours... Whereas the erase operation is a few hundred milliseconds. The first is simply not realistic.

SD cards also apparently perform better if they start from a completely erased state, as the benchmarking documents in the SD spec all say that the card must be completely block erased before running any profiling operations.

I suppose that extending the disk API to support erases does not inherently break anything, provided filesystem implementations don't start relying on it being available.

What would be the problem with filesystems relying on it?

I do think we need to make it clear in the documentation what the erase will actually set the blocks of the disk to. My read of the SD spec is that the erase command will set the blocks to "logical 0", IE 0x00. Flash erase of course sets the blocks to 0xFF. I don't have an SD card to test with, but if this it is the case we need to settle on one behavior- either erase sets all values of 0xFF or to 0x00.

I strongly disagree, what should drivers do if they erase the other way? Manually write the opposite value to the entire memory range, which in many cases (SPI NOR flash) then can't be updated without an additional erase? The function may as well be called disk_access_brick in that case.

The erase sets the blocks to the natural erase value of the device, which is either 0x00 or 0xFF. Different brands of SD card will erase to different values, which way they erase is determined by the DATA_STAT_AFTER_ERASE bit in the SCR register. Forcing the value after erase is unworkable, the value should be query-able at runtime (See this handy feature request from 4 years ago #24417).

@danieldegrasse
Copy link
Contributor

What would be the problem with filesystems relying on it?

If filesystems rely on the erase operation and it is not implemented for all disk devices, then we have a situation where some filesystems only work on a subset of disk devices. If possible, I want to avoid this (although I know it may realistically occur while a new API is being introduced)

I strongly disagree, what should drivers do if they erase the other way? Manually write the opposite value to the entire memory range, which in many cases (SPI NOR flash) then can't be updated without an additional erase? The function may as well be called disk_access_brick in that case.

The erase sets the blocks to the natural erase value of the device, which is either 0x00 or 0xFF. Different brands of SD card will erase to different values, which way they erase is determined by the DATA_STAT_AFTER_ERASE bit in the SCR register. Forcing the value after erase is unworkable, the value should be query-able at runtime (See this handy feature request from 4 years ago #24417).

Looking at Linux's usage of the MMC erase command, it seems that erase is used when a discard or trim request is sent: https://github.com/torvalds/linux/blob/f10b439638e2482a89a1a402941207f6d8791ff8/drivers/mmc/core/block.c#L1141. Perhaps disk_access_erase can be used this same way, where filesystems can freely use it to discard unused blocks, or users can use it to manually clear them? I suppose a filesystem does not need explicit knowledge of what value will be present in a disk's erased sectors, if it is only using the erase command to discard sectors no longer in use.

@JordanYates JordanYates force-pushed the 240712_disk_access_erase branch from 5c78191 to 8bcfd2f Compare July 29, 2024 10:11
Copy link
Contributor

@danieldegrasse danieldegrasse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I honestly think we might want to name this API disk_access_discard- what are your thoughts? My reasoning is as follows:

  • The closest API linux has to this is the "BLKDISCARD" ioctl in linux/fs.h. I think matching linux here makes sense, especially since we could support this with NVME devices as well.
  • using "discard" matches better with what is really happening at the controller level (IMO), namely that the sectors likely aren't being erased, just being presented to the user as erased.

@@ -100,6 +100,16 @@ int disk_access_read(const char *pdrv, uint8_t *data_buf,
int disk_access_write(const char *pdrv, const uint8_t *data_buf,
uint32_t start_sector, uint32_t num_sector);

/**
* @brief erase data from disk
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should probably note here that the value of the erased bytes is driver specific (or perhaps limit the acceptable values to 0xff and 0x0, but I'd prefer to leave it open ended)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One request here, otherwise I'm good with this change- we should document here that the erased data must read back as 0x0 or 0xFF. This way we align with what the test checks, and specifically exclude a disk operation like DISCARD from being implemented to satisfy this API.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a command to get the info, so with implementation of erase we also should provide the IOCTL or other means to get info on what erase value there is on the device.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally a chance to close the very first issue I submitted to Zephyr #24417 :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh. Yeah, still assigned to me, somehow. You can have it.


* Erase test: Verifies that the driver can consistently erase sectors. This test
follows the same flow as the write test, but at each step erases the data
written to the disk and reads it back to ensure all data is 0x00 or 0xFF. The
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't really imagine a reason that a disk would erase to anything besides 0x0 or 0xff, but if we are testing disks like this we need to document this as an API restriction in disk_access_erase. IE something like the following in the docstring:

Sectors may read as 0x00 or 0xFF after erasing, depending on the underlying device implementation

@JordanYates
Copy link
Contributor Author

namely that the sectors likely aren't being erased, just being presented to the user as erased.

What are the controllers that would behave this way? The implementations I added are all physically erasing the memory contents (to the best of my ability to verify), and I would be surprised if NVME_OPC_WRITE_ZEROES is not doing what it says on the tin.

@danieldegrasse
Copy link
Contributor

What are the controllers that would behave this way? The implementations I added are all physically erasing the memory contents (to the best of my ability to verify), and I would be surprised if NVME_OPC_WRITE_ZEROES is not doing what it says on the tin.

I don't know how the controllers themselves behave. The reason I believe the sectors wouldn't be erased is because erasing flash is typically a rather slow operation, and as you pointed out the SD erase command executes quickly.

Logically it seems like controller firmware would handle an erase request by setting some sort of flag for each sector indicating that (from the user's view) the sector was erased. That would reduce wear on the actual flash part versus actually erasing it, and improve execution time of the command.

Again, I want to emphasize this is a guess. I have not disassembled an SD card to inspect the data on the flash chip within after attempting an erase.

I'd agree that NVME_OPC_WRITE_ZEROES probably does exactly what it says on the tin- but I doubt the controller is really writing 0x0 to the underlying flash, as that would induce unnecessary wear.

My guess is the reason that Linux uses the phrase "DISCARD" over erase is that erasing for NVME and SD devices is generally just the controller "faking" an erase, by now reporting different data to the user. It may also simply be because the NVME specification uses "discard" to describe this operation. Really my preference comes down to the fact that Linux already uses "discard" to describe this, and generally following Linux makes sense. Is there a specific reason you'd prefer the "erase" name?

@JordanYates
Copy link
Contributor Author

Logically it seems like controller firmware would handle an erase request by setting some sort of flag for each sector indicating that (from the user's view) the sector was erased. That would reduce wear on the actual flash part versus actually erasing it, and improve execution time of the command.

I'm not actually sure how that would work in practice, given that erase is a byte level operation, marking entire sectors as "erased" seems like it would just make your life more complicated in the future. Doing a "lazy" erase seems unlikely for SD cards at least since the SD card association says to perform a full erase before doing any profiling. But I'm not an SD card engineer so maybe you're right.

Really my preference comes down to the fact that Linux already uses "discard" to describe this, and generally following Linux makes sense. Is there a specific reason you'd prefer the "erase" name?

Mostly because discard makes me think of a filesystem "delete" operation, which only updates the metadata of the file (especially given the BLKDISCARD definition exists in linux/fs.h). As opposed to this proposal, which is a disk level operation which at least attempts to perform an erase operation.

@de-nordic
Copy link
Contributor

de-nordic commented Aug 20, 2024

@danieldegrasse @JordanYates The Disk API can support trim and erase commands, but these differ and depend on device served, and may have different behavior on the same device. Here I will mention how eMMC defines the operations.

The TRIM operation returns write sector to device controller marking it as no longer used, and available for wear management etc. This may invoke immediate or delayed erase, but depends on several factors.
The ERASE command erases erase block group, which may consist of several write sectors.

If your intention is to pre-erase device before use, the second command is the one you are probably looking here.
The TRIM operation should be exposed for File Systems, as some of them are aware of it and will "return" write sector, to controller when no longer in use; in case of (e)MMC such write sector will be marked for erase, when such operation is possible, so it may be postponed, as it may be part of group that may not be erased at that time.

Depending on what the card is designated for, erase groups may be bigger and smaller.

Pre-earsing makes write faster: that is true, at least for erase requiring devices, because the write, on hardware level, is only flip bits from erase value to opposite; this means that if you write, for example, block of 512 bytes at once, what will happen is that the hardware will first erase the block and then write what you want; and that is not entirely true either, because you may get pre-erased block from controller. In the end you have a NAND or other flash/prom technology underneath, just one more controller between you and the "metal".

Pre-erasing also helps when you stream data continuously, but will probably just slightly help when you do random write operations, as the controller may need to copy sectors around.

Writing 0xff or 0x00 byte by byte does not help, because the device will try to erase and then write so in the worst case scenario you end up doing multiple erase cycles to single sector, or your data is slowly moved to next available sector with new 0xff added at the end. So no, writing 0xff or 0x00 does not work as erase, it is erase-then-write.

Block-oriented file systems do not care about device being pre-erased or not, they write blocks, expect to read blocks and decide on validity of data depending of position and content rather than whether it differs from erase value.

You can pick info on device erase value from device.

There is nice article here https://lwn.net/Articles/428584/ that touches on organization of SD devices and how write operations are affected.
You also need to take a look at the MMC Product Standard JESD84-A441 from JEDEC (https://www.jedec.org/system/files/docs/JESD84-A441.pdf) , as it describes the TRIM and ERASE operations and their secure variants.

So, in conclusion: if you plan to add the ERASE and TRIM commands to Disk Access, that is great, and will be useful for file systems that can use it, just take into account the differences and plan for supporting the currently know full set of them, even if you just want the base functionality for now (so the non-secure variants, for example).

@danieldegrasse
Copy link
Contributor

So, in conclusion: if you plan to add the ERASE and TRIM commands to Disk Access, that is great, and will be useful for file systems that can use it, just take into account the differences and plan for supporting the currently know full set of them, even if you just want the base functionality for now (so the non-secure variants, for example).

I agree here- I think the discussion is currently entirely around ERASE commands, not TRIM. My understanding (based on looking at the Linux kernel source) is that the BLKDISCARD IOCTL will result in an ERASE command being issued to the SD card. NVME devices will issue a dataset management command, with the "deallocate" attribute set to true.

This article seems to do a good job summarizing the different IOCTLs linux offers for discard: https://rwmj.wordpress.com/2014/03/11/blkdiscard-blkzeroout-blkdiscardzeroes-blksecdiscard/ (it may be outdated though)

I think to summarize, we could take one of two approaches here:

  • implement erase using disk_access_discard. On NVME devices we would send a dataset management command, similar to linux
  • implement erase using disk_access_erase. On NVME devices we would send a write zeroes command, like the BLKDISCARDZEROS IOCTL does.

Looking at this more, I am convinced disk_access_erase probably makes more sense. We are deviating from Linux here, but I think this API is more likely to be useful than disk_access_discard. Given that, @JordanYates could you take a look at clearing the CI failures?

@de-nordic
Copy link
Contributor

Erase then. OK, we could probably plan to modify fs_mkfs to issue the command, when requested by user, to devices that support it.

Here is additional explanation of Trim/erase/discard from the point of the MMC commands: https://matthewkipper.com/posts/emmc-wipe/, and it actually visible in the code link https://github.com/torvalds/linux/blob/f10b439638e2482a89a1a402941207f6d8791ff8/drivers/mmc/core/block.c#L1141 (taken from here #75802 (comment)), where mmc_blk_issue_trim_rq is doing TRIM, but the mmc_blk_issue_discard_rq calls the mmc_blk_issue_erase_rq with operation that has been set as preferred for given card.

The NVM actually calls the operation deallocate: https://nvmexpress.org/wp-content/uploads/NVMe-NVM-Command-Set-Specification-1.0a-2021.07.26-Ratified.pdf

@JordanYates JordanYates force-pushed the 240712_disk_access_erase branch from 8bcfd2f to cac0312 Compare August 26, 2024 11:17
@JordanYates
Copy link
Contributor Author

Given that, @JordanYates could you take a look at clearing the CI failures?

The CI failures were all unreleated as far as I can see, rebased on main

* @param[in] start_sector Start disk sector to erase
* @param[in] num_sector Number of disk sectors to erase
*/
int disk_access_erase(const char *pdrv, uint32_t start_sector,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danieldegrasse you are probably planning to design or oversee design of the other erase function like TRIM, secure erase and secure TRIM of MMC, so question to you: do you plan for additional callbacks or adding parameter to this one?
If the later is the case we could already do the type of erase parameter now, and avoid doing API change process in he future.

@@ -100,6 +100,16 @@ int disk_access_read(const char *pdrv, uint8_t *data_buf,
int disk_access_write(const char *pdrv, const uint8_t *data_buf,
uint32_t start_sector, uint32_t num_sector);

/**
* @brief erase data from disk
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a command to get the info, so with implementation of erase we also should provide the IOCTL or other means to get info on what erase value there is on the device.

for (int i = 0; i < num_sector; i++) {
ret = fs_write(&ctx->file, erase_bytes, LOOPBACK_SECTOR_SIZE);
if (ret < 0) {
LOG_ERR("Failed to erase backing file: %d", ret);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth reporting the sector number? Just asking

@Laczen
Copy link
Contributor

Laczen commented Aug 27, 2024

@JordanYates, maybe a bit late, but would it not be better to provide this as part of the ioctl function ? This could avoid any confusion regarding the confusion between block_size and erase_block_size when the ioctl function would work on erase_blocks.

@JordanYates JordanYates force-pushed the 240712_disk_access_erase branch from cac0312 to bc59a3b Compare August 27, 2024 11:32
@JordanYates
Copy link
Contributor Author

@JordanYates, maybe a bit late, but would it not be better to provide this as part of the ioctl function ? This could avoid any confusion regarding the confusion between block_size and erase_block_size when the ioctl function would work on erase_blocks.

I don't feel like erasing a disk is closer to DISK_IOCTL_GET_SECTOR_COUNT, DISK_IOCTL_CTRL_SYNC, DISK_IOCTL_CTRL_INIT, etc than disk_access_read and disk_access_write.

@Laczen
Copy link
Contributor

Laczen commented Aug 27, 2024

@JordanYates, maybe a bit late, but would it not be better to provide this as part of the ioctl function ? This could avoid any confusion regarding the confusion between block_size sector-size and erase_block_size size when the ioctl function would work on erase_blocks.

I don't feel like erasing a disk is closer to DISK_IOCTL_GET_SECTOR_COUNT, DISK_IOCTL_CTRL_SYNC, DISK_IOCTL_CTRL_INIT, etc than disk_access_read and disk_access_write.

Maybe this can be used as inspiration: http://elm-chan.org/fsw/ff/doc/dioctl.html. I am unsure if disk_access_read, ..._write are working on the same block definition as disk_access_erase. If they are not the api might be a bit misleading, moving it to the ioctl could take this away as it can be defined to be working on blocks and not sectors.

@JordanYates JordanYates force-pushed the 240712_disk_access_erase branch from bc59a3b to 1568cbf Compare August 27, 2024 23:14
@danieldegrasse
Copy link
Contributor

danieldegrasse commented Apr 8, 2025

So we will have like erase sector, write sector and read sector, where there may be different sizes? I think that this is less confusing than figuring out what page means. I am ok with that.

No, sorry if I wasn't clear- the idea would be that erase sector, write sector, and read sector are all the same size in bytes- even though the erase size might be larger. The disk_access.c file would enforce the requirement that when disk_access_erase was called the erase sector count is a valid multiple (which API consumers can find using the DISK_IOCTL_GET_ERASE_BLOCK_SZ IOCTL

Copy link

github-actions bot commented Jun 8, 2025

This pull request has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this pull request will automatically be closed in 14 days. Note, that you can always re-open a closed pull request at any time.

@github-actions github-actions bot added the Stale label Jun 8, 2025
@JordanYates JordanYates removed the Stale label Jun 8, 2025
Add a function for erasing a chunk of blocks.

The SD Card Association Pt1 Simplified Physical Layer Specification
recommends to erase many blocks simultaneously in order to enhance data
throughput (4.3.5.1).

Signed-off-by: Jordan Yates <jordan@embeint.com>
Add the `disk_access_erase` command to complement the read and write
commands.

Signed-off-by: Jordan Yates <jordan@embeint.com>
Implement the `disk_access_erase` function by calling out to the lower
layer SD card drivers.

Signed-off-by: Jordan Yates <jordan@embeint.com>
Implement `disk_access_erase` by setting all bytes to 0x00, with the
same bounds checking as `disk_access_write`.

Signed-off-by: Jordan Yates <jordan@embeint.com>
Implement `disk_access_erase` for flash disks.

Signed-off-by: Jordan Yates <jordan@embeint.com>
Implement `disk_access_erase`, which requires a sector sized buffer
of 0's to provide to `fs_write`.

Signed-off-by: Jordan Yates <jordan@embeint.com>
@JordanYates
Copy link
Contributor Author

Added :

  • erase functionality to sdmmc_stm32.c
  • Erase type enum parameter to disk_access_erase
  • Offset and length validation to disk_access_erase

Add support for erasing blocks to the STM32 SDMMC driver.

Signed-off-by: Jordan Yates <jordan@embeint.com>
Ensure that `sector-size` and `erase-size` match for the test.

Signed-off-by: Jordan Yates <jordan@embeint.com>
Test the disk erase functionality.

Signed-off-by: Jordan Yates <jordan@embeint.com>
@JordanYates JordanYates force-pushed the 240712_disk_access_erase branch from 7a6b9dd to 345e4bd Compare June 13, 2025 05:24
@JordanYates
Copy link
Contributor Author

Added missing loop on stm32_sdmmc_is_card_in_transfer for the STM32 SDMMC implementation

Copy link

@JordanYates JordanYates added this to the v4.2.0 milestone Jun 14, 2025
@danieldegrasse danieldegrasse modified the milestones: v4.2.0, v4.3.0 Jul 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants