-
Notifications
You must be signed in to change notification settings - Fork 7.7k
storage: disk_access: add disk_access_erase
#75802
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
storage: disk_access: add disk_access_erase
#75802
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get the idea here, but the core difference between the disk APIs and flash APIs has always been that disk devices will not offer an explicit erase command, and flash devices will.
While it is more efficient, or even required, to erase a flash device with a specific command disk devices do not have the same restrictions
Where are you running into a case that you need to write 0xFF to the disk device to erase it? Is this a case where you are using a flash device emulator layer on top of a disk device?
I am using SD cards that are too large for the flash API together with an exFAT filesystem. Before creating the filesystem I wish to erase the entire SD card to ensure the entire memory range is in a known state at the start. The reason I require the known state is because I am using the exFAT f_expand functionality to pre-allocate large binary files. The existing disk APIs provide no way to achieve this as far as I can tell. As a side bonus it also makes resetting the state of disks between tests much easier. |
71912b8
to
b8be785
Compare
Hi @danieldegrasse, recent changes to the flash API (#71270) will no longer guarantee this difference. I have done my best to avoid this but to no avail. I guess in future disk API will be forced to work under the flash/zsai API or vice versa. |
b8be785
to
5c78191
Compare
Ok, thanks for the information- so this is essentially a performance optimization to better manage erasing the SD card? I'm not opposed to using all features of the hardware we have here, just trying to determine the best way to do so. I suppose that extending the disk API to support erases does not inherently break anything, provided filesystem implementations don't start relying on it being available. I do think we need to make it clear in the documentation what the erase will actually set the blocks of the disk to. My read of the SD spec is that the erase command will set the blocks to "logical 0", IE 0x00. Flash erase of course sets the blocks to 0xFF. I don't have an SD card to test with, but if this it is the case we need to settle on one behavior- either erase sets all values of 0xFF or to 0x00. |
I wouldn't really call it a performance optimization because manually writing 0x00 or 0xFF to 256GB of flash is unlikely to finish within several hours... Whereas the erase operation is a few hundred milliseconds. The first is simply not realistic. SD cards also apparently perform better if they start from a completely erased state, as the benchmarking documents in the SD spec all say that the card must be completely block erased before running any profiling operations.
What would be the problem with filesystems relying on it?
I strongly disagree, what should drivers do if they erase the other way? Manually write the opposite value to the entire memory range, which in many cases (SPI NOR flash) then can't be updated without an additional erase? The function may as well be called The erase sets the blocks to the natural erase value of the device, which is either 0x00 or 0xFF. Different brands of SD card will erase to different values, which way they erase is determined by the |
If filesystems rely on the erase operation and it is not implemented for all disk devices, then we have a situation where some filesystems only work on a subset of disk devices. If possible, I want to avoid this (although I know it may realistically occur while a new API is being introduced)
Looking at Linux's usage of the MMC erase command, it seems that erase is used when a discard or trim request is sent: https://github.com/torvalds/linux/blob/f10b439638e2482a89a1a402941207f6d8791ff8/drivers/mmc/core/block.c#L1141. Perhaps |
5c78191
to
8bcfd2f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I honestly think we might want to name this API disk_access_discard
- what are your thoughts? My reasoning is as follows:
- The closest API linux has to this is the "BLKDISCARD" ioctl in linux/fs.h. I think matching linux here makes sense, especially since we could support this with NVME devices as well.
- using "discard" matches better with what is really happening at the controller level (IMO), namely that the sectors likely aren't being erased, just being presented to the user as erased.
@@ -100,6 +100,16 @@ int disk_access_read(const char *pdrv, uint8_t *data_buf, | |||
int disk_access_write(const char *pdrv, const uint8_t *data_buf, | |||
uint32_t start_sector, uint32_t num_sector); | |||
|
|||
/** | |||
* @brief erase data from disk |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should probably note here that the value of the erased bytes is driver specific (or perhaps limit the acceptable values to 0xff and 0x0, but I'd prefer to leave it open ended)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One request here, otherwise I'm good with this change- we should document here that the erased data must read back as 0x0
or 0xFF
. This way we align with what the test checks, and specifically exclude a disk operation like DISCARD from being implemented to satisfy this API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a command to get the info, so with implementation of erase we also should provide the IOCTL or other means to get info on what erase value there is on the device.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finally a chance to close the very first issue I submitted to Zephyr #24417 :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heh. Yeah, still assigned to me, somehow. You can have it.
|
||
* Erase test: Verifies that the driver can consistently erase sectors. This test | ||
follows the same flow as the write test, but at each step erases the data | ||
written to the disk and reads it back to ensure all data is 0x00 or 0xFF. The |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't really imagine a reason that a disk would erase to anything besides 0x0 or 0xff, but if we are testing disks like this we need to document this as an API restriction in disk_access_erase
. IE something like the following in the docstring:
Sectors may read as 0x00 or 0xFF after erasing, depending on the underlying device implementation
What are the controllers that would behave this way? The implementations I added are all physically erasing the memory contents (to the best of my ability to verify), and I would be surprised if |
I don't know how the controllers themselves behave. The reason I believe the sectors wouldn't be erased is because erasing flash is typically a rather slow operation, and as you pointed out the SD erase command executes quickly. Logically it seems like controller firmware would handle an erase request by setting some sort of flag for each sector indicating that (from the user's view) the sector was erased. That would reduce wear on the actual flash part versus actually erasing it, and improve execution time of the command. Again, I want to emphasize this is a guess. I have not disassembled an SD card to inspect the data on the flash chip within after attempting an erase. I'd agree that My guess is the reason that Linux uses the phrase "DISCARD" over erase is that erasing for NVME and SD devices is generally just the controller "faking" an erase, by now reporting different data to the user. It may also simply be because the NVME specification uses "discard" to describe this operation. Really my preference comes down to the fact that Linux already uses "discard" to describe this, and generally following Linux makes sense. Is there a specific reason you'd prefer the "erase" name? |
I'm not actually sure how that would work in practice, given that erase is a byte level operation, marking entire sectors as "erased" seems like it would just make your life more complicated in the future. Doing a "lazy" erase seems unlikely for SD cards at least since the SD card association says to perform a full erase before doing any profiling. But I'm not an SD card engineer so maybe you're right.
Mostly because discard makes me think of a filesystem "delete" operation, which only updates the metadata of the file (especially given the |
@danieldegrasse @JordanYates The Disk API can support trim and erase commands, but these differ and depend on device served, and may have different behavior on the same device. Here I will mention how eMMC defines the operations. The TRIM operation returns write sector to device controller marking it as no longer used, and available for wear management etc. This may invoke immediate or delayed erase, but depends on several factors. If your intention is to pre-erase device before use, the second command is the one you are probably looking here. Depending on what the card is designated for, erase groups may be bigger and smaller. Pre-earsing makes write faster: that is true, at least for erase requiring devices, because the write, on hardware level, is only flip bits from erase value to opposite; this means that if you write, for example, block of 512 bytes at once, what will happen is that the hardware will first erase the block and then write what you want; and that is not entirely true either, because you may get pre-erased block from controller. In the end you have a NAND or other flash/prom technology underneath, just one more controller between you and the "metal". Pre-erasing also helps when you stream data continuously, but will probably just slightly help when you do random write operations, as the controller may need to copy sectors around. Writing 0xff or 0x00 byte by byte does not help, because the device will try to erase and then write so in the worst case scenario you end up doing multiple erase cycles to single sector, or your data is slowly moved to next available sector with new 0xff added at the end. So no, writing 0xff or 0x00 does not work as erase, it is erase-then-write. Block-oriented file systems do not care about device being pre-erased or not, they write blocks, expect to read blocks and decide on validity of data depending of position and content rather than whether it differs from erase value. You can pick info on device erase value from device. There is nice article here https://lwn.net/Articles/428584/ that touches on organization of SD devices and how write operations are affected. So, in conclusion: if you plan to add the ERASE and TRIM commands to Disk Access, that is great, and will be useful for file systems that can use it, just take into account the differences and plan for supporting the currently know full set of them, even if you just want the base functionality for now (so the non-secure variants, for example). |
I agree here- I think the discussion is currently entirely around ERASE commands, not TRIM. My understanding (based on looking at the Linux kernel source) is that the This article seems to do a good job summarizing the different IOCTLs linux offers for discard: https://rwmj.wordpress.com/2014/03/11/blkdiscard-blkzeroout-blkdiscardzeroes-blksecdiscard/ (it may be outdated though) I think to summarize, we could take one of two approaches here:
Looking at this more, I am convinced |
Erase then. OK, we could probably plan to modify fs_mkfs to issue the command, when requested by user, to devices that support it. Here is additional explanation of Trim/erase/discard from the point of the MMC commands: https://matthewkipper.com/posts/emmc-wipe/, and it actually visible in the code link https://github.com/torvalds/linux/blob/f10b439638e2482a89a1a402941207f6d8791ff8/drivers/mmc/core/block.c#L1141 (taken from here #75802 (comment)), where The NVM actually calls the operation deallocate: https://nvmexpress.org/wp-content/uploads/NVMe-NVM-Command-Set-Specification-1.0a-2021.07.26-Ratified.pdf |
8bcfd2f
to
cac0312
Compare
The CI failures were all unreleated as far as I can see, rebased on main |
include/zephyr/storage/disk_access.h
Outdated
* @param[in] start_sector Start disk sector to erase | ||
* @param[in] num_sector Number of disk sectors to erase | ||
*/ | ||
int disk_access_erase(const char *pdrv, uint32_t start_sector, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@danieldegrasse you are probably planning to design or oversee design of the other erase function like TRIM, secure erase and secure TRIM of MMC, so question to you: do you plan for additional callbacks or adding parameter to this one?
If the later is the case we could already do the type of erase parameter now, and avoid doing API change process in he future.
@@ -100,6 +100,16 @@ int disk_access_read(const char *pdrv, uint8_t *data_buf, | |||
int disk_access_write(const char *pdrv, const uint8_t *data_buf, | |||
uint32_t start_sector, uint32_t num_sector); | |||
|
|||
/** | |||
* @brief erase data from disk |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a command to get the info, so with implementation of erase we also should provide the IOCTL or other means to get info on what erase value there is on the device.
for (int i = 0; i < num_sector; i++) { | ||
ret = fs_write(&ctx->file, erase_bytes, LOOPBACK_SECTOR_SIZE); | ||
if (ret < 0) { | ||
LOG_ERR("Failed to erase backing file: %d", ret); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth reporting the sector number? Just asking
@JordanYates, maybe a bit late, but would it not be better to provide this as part of the ioctl function ? This could avoid any confusion regarding the confusion between block_size and erase_block_size when the ioctl function would work on erase_blocks. |
cac0312
to
bc59a3b
Compare
I don't feel like erasing a disk is closer to |
Maybe this can be used as inspiration: http://elm-chan.org/fsw/ff/doc/dioctl.html. I am unsure if |
bc59a3b
to
1568cbf
Compare
No, sorry if I wasn't clear- the idea would be that erase sector, write sector, and read sector are all the same size in bytes- even though the erase size might be larger. The |
This pull request has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this pull request will automatically be closed in 14 days. Note, that you can always re-open a closed pull request at any time. |
Add a function for erasing a chunk of blocks. The SD Card Association Pt1 Simplified Physical Layer Specification recommends to erase many blocks simultaneously in order to enhance data throughput (4.3.5.1). Signed-off-by: Jordan Yates <jordan@embeint.com>
Add the `disk_access_erase` command to complement the read and write commands. Signed-off-by: Jordan Yates <jordan@embeint.com>
Implement the `disk_access_erase` function by calling out to the lower layer SD card drivers. Signed-off-by: Jordan Yates <jordan@embeint.com>
Implement `disk_access_erase` by setting all bytes to 0x00, with the same bounds checking as `disk_access_write`. Signed-off-by: Jordan Yates <jordan@embeint.com>
Implement `disk_access_erase` for flash disks. Signed-off-by: Jordan Yates <jordan@embeint.com>
Implement `disk_access_erase`, which requires a sector sized buffer of 0's to provide to `fs_write`. Signed-off-by: Jordan Yates <jordan@embeint.com>
d6cafb3
to
7a6b9dd
Compare
Added :
|
Add support for erasing blocks to the STM32 SDMMC driver. Signed-off-by: Jordan Yates <jordan@embeint.com>
Ensure that `sector-size` and `erase-size` match for the test. Signed-off-by: Jordan Yates <jordan@embeint.com>
Test the disk erase functionality. Signed-off-by: Jordan Yates <jordan@embeint.com>
7a6b9dd
to
345e4bd
Compare
Added missing loop on |
|
Add the
disk_access_erase
function to complement the read and write commands.Erasing through dedicated API's is often exposed as a more efficient operation by hardware than writing
0x00
or0xFF
manually to every byte to be erased.Note: I suspect the NVME driver can support this through
NVME_OPC_WRITE_ZEROES
but I am unsure how to implement that.