SI vs. IEC prefixes and bytes #10584

SolraBizna · 2023-11-16T06:55:16Z

This is a continuation of a conversation with @rlidwka in Why is Bevy's Coordinate System Right-Handed Y-Up? about SI prefixes and bytes. It's less off-topic here than it was there. (The heading says "Chat that doesn't fit anywhere else", and this doesn't fit anywhere else...)

Neither "byte", and especially not "word" are SI/SI-derived units, so SI standards are not applicable here.

This is the first time I've ever heard someone claim that SI prefixes are not applicable to non-SI units. In my circles they are widely used as general prefixes, including for currency and for otherwise unitless numbers. If you have cites for that, I would love to see them. (I'm not being sarcastic, please educate me if I'm wrong! I was sure that bit, and therefore octet/byte, was part of the SI family, but Google quickly corrected me on this. It's not been so helpful on the prefix applicability question.)

Going on a tangent here: Computer science does not use SI prefixes. Current recommendation, as far as I remember, is to use "kilobyte" as a unit of storage capacity, which is shorthanded as "KB" or occasionally "KiB" for clarity, and always means 2^10 bytes. I can provide citations, but I'm afraid it has nothing to do with Bevy really. Edit: I vaguely remember "decimal kilobyte" being a unit, but it is rarely used, so doesn't have a shorthand.

Mass storage manufacturers, communications standards, Apple (starting in 2009), every graphical file manager I've used on *nix, IEC 60027-2 (published in 1999), and every single lawsuit that has hinged on these prefixes all hold the opposite view. [Edit: It would be remiss not to add that Microsoft has still not gone along with IEC 60027-2, and many (but not all) command-line programs I've used on *nix have not done so either.]

For the record, "KiB" is the IEC 60027-2 abbreviation for "kibibyte".

As for the strangeness of using the "word" unit to count memory: in the PDP-7's era, it was common to measure storage in units of whatever word size was applicable to that architecture. In the PDP-7's case, that meant 18 bits per word. Its memory was not in any sense addressable in units smaller than a word, so this all made perfect sense at the time. Thus, the PDP-7 was marketed as having "4 kilowords", meaning 4096 words. (IEC 60027-2 was still almost half a century away.)

And if you think all that's bad... If you said "byte" at the time, you had a decent chance of being taken as meaning anywhere from 5 up to 9 bits, assuming the other person had even heard the term before. If you were lucky, it matched the bits-per-column of a punched card system that person was accustomed to. If you've ever wondered why RFCs often use the term "octet" instead of "byte", this is why; "octet" never means anything but eight bits. ...Except that some people have confusingly used "octet" to mean one octal digit (three bits), which... ugh, I'm still mad about that.

Random relevant fact: a double-sided high-density Sony floppy disk does not hold either 1.44 megabytes or 1.44 mebibytes, it actually holds 1440KiB (1 474 560 bytes). So not only are both meanings used in computer science, sometimes they're both used in the same unit!
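For anyone who wants to check the floppy arithmetic, here is a minimal Python sketch. The variable names are made up for illustration, and the label is held as an exact fraction so that no float rounding sneaks in.

```python
from fractions import Fraction

LABEL = Fraction(144, 100)  # the "1.44" printed on the disk, held exactly

si_megabytes = LABEL * 10**6        # reading 1: 1.44 MB (decimal)
iec_mebibytes = LABEL * 2**20       # reading 2: 1.44 MiB (binary)
mixed_unit = LABEL * 10**3 * 2**10  # reading 3: 1.44 "thousand KiB"

actual_bytes = 1440 * 1024          # 1440 KiB, the real capacity

print(int(si_megabytes))            # 1440000
print(float(iec_mebibytes))         # 1509949.44
print(int(mixed_unit))              # 1474560
print(mixed_unit == actual_bytes)   # True
```

Only the mixed reading, one decimal "kilo" stacked on one binary "kibi", reproduces the real capacity.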

rlidwka · 2023-11-17T08:49:53Z

Actual standards in use

I was about to cite JEDEC 100B.01, but unfortunately it's behind a paywall of some sort now, so I guess I'm not doing that (it defines both KB and KiB, whenever they are used as units of storage capacity, as 1024 bytes).

IEC vs JEDEC is being discussed a lot, so I think I'll just skip it.

About "kibiwords"

I was objecting to the use of "kibiword", because this introduces an ambiguity where there wasn't any. It is called kilowords; you can check any dictionary or any literature on the subject. And it is a factor of 2^10, because it is all but impossible to have a memory unit where 0x3e7 is a valid address but 0x3e8 is not.

By saying "kibiword", you are acting as if there are decimal kilowords you need to make distinction with. But there simply couldn't be decimal kiloword for technical reasons.

A bit of history and anti-capitalist rant

With bytes it is a little bit more difficult, because decimal prefixes are actually used by hard drive manufacturers. In fact, the whole distinction between GB and GiB was invented by them for the sole purpose of avoiding getting their asses sued for false advertisement.

It's no different than milk manufacturers who started using kilograms instead of liters. Funnily enough, even the margin is almost the same (1 kilobyte is 1.024 decimal kilobytes, 1 liter of milk is 1.035 kilograms). If you see it in a supermarket, you'll perceive no difference (that says 1, this says 1). But 3% adds up.

This is called "shrinkflation". One of the ways to screw over customers without visibly increasing the price (another being intentional quality decline, I don't remember a term for it). There is no technical reason for having decimal kilobytes, so I bet if the 10th power of 2 were just a little bit below 1000, this whole discussion wouldn't exist.

So, in my opinion, if you ever see a person use "kilobytes" and mean 1000, they are up to no good.

On applicability of metric prefixes

Prefixes "kilo", "mega", etc. are a part of natural language, where they can be used to mean:

1. exactly a factor of thousand
2. approximately a factor of thousand (e.g. 4K standard of screen resolution refers to exactly 3840)
3. or they can simply mean "a lot" (the word "megacorporation" certainly doesn't mean million of anything).

Metric prefixes are only well-defined when used with SI units. When used outside of SI, you'd be wise to double-check the meaning for each individual unit. In particular, it does not make any sense to use metric (base-10) units as is when the area you are describing is not using decimal system.

Computers and computer science are not base-10. The bases used are 2, 8, 16. If you use decimal kilobytes, you'd get rounding errors where you otherwise wouldn't, thus it makes sense to expect that kilobytes aren't actually decimal.

Another example that comes to mind here is time. Clocks aren't base-10 either; the bases used are 12 and 60. You can of course start counting time in kiloseconds, but don't expect to be well-understood by others. This example may sound ridiculous, but people tried to have 100 seconds in a minute (it was even the official French government standard for 6 months, if Wikipedia is correct on this).

My position on the matter

Suggestion here is: treat all units with SI prefixes as separate units. And remember, which units are used in which context with which multiplier.

Just memorize that kilometers are spelled km, and mean 1000 meters. Kilobytes are spelled KB (with capital K) and mean 1024 bytes. Kilograms are spelled kg, and are not a derivative unit at all (it's the other way around, gram is defined as 1/1000th of kg, for whatever reason). And kiloseconds aren't a thing anywhere except possibly astronomy.

Measuring storage space in decimal kilobytes is as ridiculous as measuring distance in "kibimeters". Different science area, different unit = different conventions.

5 replies

SolraBizna Nov 18, 2023
Author

(IEC vs JEDEC)

I couldn't find official JEDEC sources outside the paywall either, but I stumbled on several mentions that JEDEC uses SI-style names when not abbreviating but encourages IEC-style abbreviations for them. This half measure bothers me.

In any event, I certainly would have caused more confusion if I had written "Kiw" or "Kiword" instead of "kibiword", so I can't use the "Ki" compromise there.

… it is all but impossible to have a memory unit where 0x3e7 is a valid address but 0x3e8 is not.

By saying "kibiword", you are acting as if there are decimal kilowords you need to make distinction with. But there simply couldn't be decimal kiloword for technical reasons.

There is no technical reason for having decimal kilobytes …

… the whole distinction between GB and GiB was invented by [drive manufacturers] for the sole purpose of avoiding getting their asses sued for false advertisement …

So, in my opinion, if you ever see a person use "kilobytes" and mean 1000, they are up to no good.

Measuring storage space in decimal kilobytes is as ridiculous as measuring distance in "kibimeters". Different science area, different unit = different conventions.

A Commodore 64 Datasette could store 90 kilobytes in a 30-minute cassette. 90 000 bytes, not 92 160.

The IBM 650 typically shipped with two kilowords of drum memory: kilowords, not kibiwords. 2000, not 2048. On that system, 0x7cf was a valid address and 0x7d0 was not. Customers who replaced an aging "4 kiloword" IBM 650 with a brand-new "4 kiloword" PDP-7 were pleasantly surprised at the 96 "bonus words" (after they got over not having to optimize their machine code for drum rotation anymore).
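The address and "bonus word" figures can be checked with a few lines of Python. This is only a sketch of the arithmetic, and the constant names are illustrative.

```python
DECIMAL_KILOWORD = 1000  # "kilo" as used for the IBM 650 drum
BINARY_KILOWORD = 1024   # "kilo" as used for the PDP-7 (a kibiword)

# A two-kiloword drum has addresses 0 through 1999.
drum_words = 2 * DECIMAL_KILOWORD
last_valid_address = drum_words - 1
first_invalid_address = drum_words
print(hex(last_valid_address), hex(first_invalid_address))  # 0x7cf 0x7d0

# Two machines both sold as "4 kilowords".
bonus_words = 4 * BINARY_KILOWORD - 4 * DECIMAL_KILOWORD
print(bonus_words)  # 96
```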

There's no technical reason that we should have to actually build a memory in units of powers of two, and that goes double when it's not made out of individually-manufactured components storing individual bits. Continuous memory has always been measured with decimal prefixes. The weirdness of the Sony floppy size comes from this. 2.88 kilosectors. Exactly 2880 sectors of 512* bytes each. Marketing GB instead of GiB is undeniably shrinkflationally convenient for hard drive manufacturers (and TB instead of TiB even more so), but it's not why they started doing it.

*My memory has faded on the details and Google is unhelpful, but these 512-byte sectors contained 8192 bits spent to encode 4096 data bits (the 512 bytes programs see) and a clock signal, plus some extra bits before and after the data for tracking and error correction. Dividing it into 512-byte sectors was convenient for most software, but the hardware didn't care either way, and some software used alternative encodings that didn't work that way. And if you've ever played an original PlayStation game, you've used a system that mixed 2048 and 2336 byte sectors on the same medium, within the same filesystem.
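To put numbers on the GB-versus-GiB and TB-versus-TiB gap mentioned above, here is a small Python sketch; `binary_to_decimal_ratio` is a hypothetical helper, not something from any library.

```python
PREFIX_PAIRS = ["kibi vs kilo", "mebi vs mega", "gibi vs giga", "tebi vs tera"]

def binary_to_decimal_ratio(power: int) -> float:
    """How many decimal units fit in one binary unit at the given prefix step."""
    return 1024**power / 1000**power

for power, pair in enumerate(PREFIX_PAIRS, start=1):
    gap_percent = (binary_to_decimal_ratio(power) - 1) * 100
    print(f"{pair}: binary unit is {gap_percent:.1f}% larger")
```

The gap grows from roughly 2.4% at the kilo step to roughly 10% at the tera step, which is why the choice of prefix matters more on large drives.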

Suggestion here is: treat all units with SI prefixes as separate units. And remember, which units are used in which context with which multiplier.

Can you think of any context outside of computers where "kilo", "mega", etc. were used, with exact meaning, as prefixes to mean something other than multiples of 10^3? (I can't, but that doesn't mean they don't exist.)

(replying to nothing specifically:)

I cut my teeth in an environment where "ton" was three different units depending on context, and that's rubbish. If someone quotes me a figure in unspecified tons, I make a best faith effort to figure out which one they mean, and if I can't, and the exact value matters, I'll ask. If I quote a figure to someone else, I'll always use metric tons and I'll always say "metric tons", so no one can ever be confused about which ton I mean.

I've also worked with people who had the opposite "," vs. "." polarity from me, so I don't write "12,345.67" or "12.345,67", I write "12 345.67". No one will see that non-breaking space and treat it as the radix point, and unless I have exactly three digits after the radix point, even someone used to using "." for grouping will realize what I mean without having to ask.

If someone says "eight gigabytes" and the exact value matters, I'll try to figure out whether they mean 2^30 or 10^9. If they're talking about a DRAM module, 2^30 is a safe guess. If they're looking at a file manager, now I need to know whose logo is attached to their OS. If the next words out of their mouth are "per second", an entire new dimension of marginally-critical matters of interpretation arises. Are they reading directly from a Windows Explorer file transfer dialog (2^30), or a network monitoring tool (10^9)? Did they take a quoted gigabits/second figure from a network monitoring tool and divide by eight (correct if it's a framed rate), or by ten (correct if it's a raw symbol rate and 8b/10b encoding is in use)? Cherry on top, none of these possibilities can swing the exact number by anywhere near the casual, routine "I rounded it" margin of ±50%. So if the exact value matters, I'll just ask, and it almost never does, so I almost never do.
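A rough Python sketch of that decision tree for a "per second" figure follows. The function and its parameter names are assumptions made for illustration; the divide-by-eight and divide-by-ten rules are the ones described above.

```python
GIGA = 10**9  # decimal reading of "giga"
GIBI = 2**30  # binary reading of "giga"

def bytes_per_second(gigabits_per_second: float, line_bits_per_byte: int) -> float:
    """Convert a quoted Gbit/s figure to bytes/s.

    line_bits_per_byte is 8 for a framed rate, or 10 for a raw symbol
    rate when 8b/10b encoding is in use.
    """
    return gigabits_per_second * GIGA / line_bits_per_byte

framed = bytes_per_second(1, 8)   # 125 000 000 bytes/s
raw = bytes_per_second(1, 10)     # 100 000 000 bytes/s

# The same byte rate expressed with each meaning of "giga".
for rate in (framed, raw):
    print(f"{rate / GIGA:.4f} (10^9 bytes)/s = {rate / GIBI:.4f} (2^30 bytes)/s")

# Largest factor by which two readings of one figure can disagree:
# the prefix ambiguity times the 8-versus-10 ambiguity.
worst_case = (GIBI / GIGA) * (10 / 8)
print(f"worst-case factor: {worst_case:.3f}")
```

The worst-case factor comes out near 1.34, which fits the point that none of these readings moves the number by as much as ±50%.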

At no point in that tree is there a node where I say "Excuse me, did you mean gibibytes? As you should well know, IEC 60027-2 (published in 1999) …" and "correct" their usage. I've been talking about my own usage this whole time! In some contexts, "kilo" can be used to "correctly" mean 10^3 or 2^10, and that is enough reason for me to avoid using it in those contexts.

Before IEC 60027-2, the prefix "kilo" meant either 10^3 or 2^10 based on context, and there wasn't a term I could use that would never mislead anyone. After IEC 60027-2, "kilo" is still ambiguous, but now I can use "kibi" when I mean 2^10 and nobody will think I could mean 10^3. Twenty-five years and tens of thousands of invocations in, this is the first time I've ever had it called into question.

When I'm describing the crystal frequency of a quartz watch, I sometimes say it's tuned to "32 KiHz" (kibihertz) because that's a more convenient representation than the common "32.768 kHz" (kilohertz). I might expect someone not to have seen Ki/kibi before and not know what it is, but if they have, I can rest easy knowing that they will definitely take it to mean 2^10 and not some other value. (This is not a hypothetical example, I've used kibihertz at least half a dozen times and was instantly understood every time.)
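The convenience of "32 KiHz" is easy to see in code. This is just an arithmetic sketch; the halving loop is my own illustration of why a power of two is a handy frequency, not a description of any particular watch circuit.

```python
KIBI = 2**10

crystal_hz = 32 * KIBI
print(crystal_hz)           # 32768
print(crystal_hz == 2**15)  # True

# Count how many times the frequency can be halved before reaching 1 Hz.
halvings = 0
frequency = crystal_hz
while frequency > 1:
    frequency //= 2
    halvings += 1
print(halvings)             # 15
```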

If I'm describing a program that uses roughly 10 000 000 bytes of memory I'll say it uses "ten megabytes". If I'm describing a computer with 16 777 216 bytes of address space I can say it can address "exactly sixteen mebibytes". If it's not a convenient power of two and the exact value matters, I won't use a prefix at all, I'll just give a figure in bytes. This is common practice among my friends and colleagues as well.
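The "exact value" half of that habit could be sketched in Python like this. `describe_exact` is a hypothetical helper written for this example; it uses a binary prefix only when the byte count is a whole multiple of it, and otherwise falls back to plain bytes.

```python
def describe_exact(n_bytes: int) -> str:
    """Describe a byte count exactly, using an IEC prefix only when it divides evenly."""
    # Try the largest prefix first so 2**30 reads as 1 gibibyte, not 1024 mebibytes.
    for power, prefix in ((4, "tebi"), (3, "gibi"), (2, "mebi"), (1, "kibi")):
        unit = 1024**power
        if n_bytes >= unit and n_bytes % unit == 0:
            count = n_bytes // unit
            noun = "byte" if count == 1 else "bytes"
            return f"{count} {prefix}{noun}"
    return f"{n_bytes} bytes"

print(describe_exact(16_777_216))  # 16 mebibytes
print(describe_exact(4096))        # 4 kibibytes
print(describe_exact(10_000_000))  # 10000000 bytes
```

The approximate case ("ten megabytes" for roughly 10 000 000 bytes) is deliberately left out, since rounding for casual use is a judgment call rather than a rule.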

If I were in a situation where kibimeters provided a convenient representation of a distance, I might use them. I can't say for sure because I've never been in such a situation.

I could have avoided all of this rigamarole by just saying the PDP-7 shipped with 4096 words of memory. Next time I will.

Asides

1 kilobyte is 1.024 decimal kilobytes, 1 liter of milk is 1.035 kilograms

I didn't know that one. That's a little uncanny. (Insert canned milk joke here?)

Residents of the US get some far more blatant number massaging in food.

… intentional quality decline, I don't remember a term for it

Shrinkflation gets used to describe this too, and that usage didn't bother me until you brought it to my attention. Maybe we could call it suckflation?

I might've inserted a rant about planned obsolescence or right to repair here, but I think anyone who enters the discussion page of a Rust-all-the-way-down game engine is going to be in complete agreement on those topics. (I'm aware of the irony of writing this post on a system with a 24-step battery replacement procedure that had to be reverse engineered.)

If you use decimal kilobytes, you'd get rounding errors where you otherwise wouldn't, thus it makes sense to expect that kilobytes aren't actually decimal.

By rounding errors, do you mean inexact display, or inexact calculation? Your display is inexact as soon as you're chopping off digits. It's true that dividing by a power of 10 gives inexact results in a binary system of math, but inexact results still must be rounded correctly. When your phone's map app displays a distance of "1296m" as "1.3km", that's not a rounding error, that's rounding. I would expect just as many incorrect rounds for any given multiplier, just because most people don't think about rounding when they do math on computers.

There is a commonly cited example, that goes something like:

Because computers don't think in decimal, they can't even represent "0.1". They have "0.1000000000000000055511151231257827021181583404541015625" instead.

Decimal 0.1's binary representation has an infinite number of digits, but for any given floating point format and rounding mode, there is one particular representation it exactly specifies. If you input "0.1" as a double precision value, and the software displays "0.1000000000000000055511151231257827021181583404541015625", the software is wrong. That result is seen if you use this simple algorithm for converting the fractional part to decimal:

while n != 0
  n := n * 10
  output_digit(floor(n) % 10)
  n := n - floor(n)

but this algorithm is not what the standards specify. The correct value to display is "0.1", because that's the minimum number of decimal digits necessary to exactly specify that specific double-precision value. The other 54 digits aren't adding any additional precision, they just make things clunkier. Which is why, if you open a calculator app (or your preferred REPL) and enter "0.1", it displays "0.1".
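Both behaviours can be reproduced in Python. The sketch below follows the digit-by-digit loop above, with one substitution that should be stated plainly: it runs the loop on `fractions.Fraction` so that each multiply-and-subtract step is exact, instead of on floats. `naive_fraction_digits` is a name invented for this example.

```python
from fractions import Fraction

def naive_fraction_digits(x: float) -> str:
    """Expand the fractional part of a double digit by digit until nothing is left."""
    n = Fraction(x) % 1  # exact rational value of the double's fractional part
    digits = []
    while n != 0:
        n *= 10
        digits.append(str(int(n)))  # the integer part is the next decimal digit
        n -= int(n)
    return "0." + "".join(digits)

print(naive_fraction_digits(0.1))
# 0.1000000000000000055511151231257827021181583404541015625

# Python's repr() prints the shortest string that round-trips to the same double.
print(repr(0.1))  # 0.1

# Both strings name the same double-precision value.
print(float(naive_fraction_digits(0.1)) == float("0.1"))  # True
```

The loop terminates because every finite double is a fraction whose denominator is a power of two, so its decimal expansion is finite.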

Prefixes "kilo", "mega", etc. are a part of natural language, where they can be used to mean:

1. exactly a factor of thousand
2. approximately a factor of thousand (e.g. 4K standard of screen resolution refers to exactly 3840)
3. or they can simply mean "a lot" (the word "megacorporation" certainly doesn't mean million of anything).

Re 2: I have seen screens described as 4K with resolutions as low as 2560x1440 and as high as 6144x(I don't remember). The first "4K display" I ever saw had the extremely weird resolution of 3020x1698. There are ways to justify any point on this spectrum. I don't go around "correcting" people when they say 4K, but I don't use the term myself—I quote an exact resolution instead. And I have never once heard anyone pronounce "4K" as "four kilo", even though "kilo" is where the "K" comes from.

Re 3: "mega" and "giga" meant "very large" before the SI prefixes were invented, and did not lose that meaning just because they gained another. ~~Likewise, "kilo" still means "thousand" in French.~~ [Edit: not sure enough about this to assert it]

Re my example above about tons, "ton" also has a colloquial meaning of "a lot". This is not an imprecise invocation of a standard unit, this is the word having an existing meaning that doesn't attempt to be a unit. I don't see any ambiguity there. I do sometimes replace my own casual usage of "ton" with "metric ton", but I do that because it's funny, not because of unit activism.

The word "theory" has a specific meaning in science, and a broader meaning in other contexts. When someone says "I have a theory", I don't respond with "Wait. Do you mean an explanation that has been proven through repeated and systematic testing, or did you actually mean to say hypothesis?" But I will say "hypothesis" myself. (Is a pattern emerging?)

You can of course start counting time in kiloseconds, but don't expect to be well-understood by others. This example may sound ridiculous, but people tried to have 100 seconds in a minute (it was even the official French government standard for 6 months, if Wikipedia is correct on this).

The worst part of that was the redefinition of the second. One decimal second was equal to 0.864 customary seconds. I'm glad the attempt failed, but I do have to admire the audacity of it.

A Deepness in the Sky features a spacefaring human civilization that uses kiloseconds, megaseconds, gigaseconds, etc. It never quite stops feeling silly, but it sort of works in a non-planetbound context. The fact that the human circadian rhythm naturally drifts between 86 and 88 "ksec", and the issues that would arise from the inconvenience of or variance within that number, are simply never addressed in the text.
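For the curious, the 0.864 figure and the kilosecond range above work out as follows. This Python sketch assumes the decimal day was split into 10 hours of 100 minutes of 100 seconds; that breakdown is my understanding and is not stated in the thread. `ksec_to_hours` is an illustrative helper.

```python
from fractions import Fraction

CUSTOMARY_SECONDS_PER_DAY = 24 * 60 * 60  # 86 400
DECIMAL_SECONDS_PER_DAY = 10 * 100 * 100  # assumption: 10 h x 100 min x 100 s

# Length of one decimal second, measured in customary seconds.
decimal_second = Fraction(CUSTOMARY_SECONDS_PER_DAY, DECIMAL_SECONDS_PER_DAY)
print(float(decimal_second))  # 0.864

def ksec_to_hours(ksec: int) -> Fraction:
    """Convert kiloseconds to customary hours, exactly."""
    return Fraction(ksec * 1000, 3600)

print(CUSTOMARY_SECONDS_PER_DAY / 1000)    # 86.4 ksec in a 24-hour day
print(f"{float(ksec_to_hours(86)):.2f}")   # 23.89 hours
print(f"{float(ksec_to_hours(88)):.2f}")   # 24.44 hours
```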

...

In conclusion, thanks for taking the time and effort to communicate your thoughts on these matters. And also thanks for reading and/or skimming the ranting of a grumpy old fart who has just decided to start pronouncing KiCAD as "one point oh two four kiloCADs" just to troll people.

SolraBizna Nov 18, 2023
Author

Residents of the US get some far more blatant number massaging in food.

Entertaining and informative YouTube video on this topic: https://www.youtube.com/watch?v=mxNPpte_6m4

rlidwka Nov 19, 2023

You won this argument.

If people are using KiHz, which I didn't know, a binary prefix becomes required for disambiguation. And once you have that, retroactively applying it to KB is the logical next step.

I don't particularly like the naming of it, calling them "binary kilobytes" vs "decimal/metric kilobytes" is clearer imho. But that's not an argument against prefix itself.

I couldn't find official JEDEC sources outside the paywall either, but I stumbled on several mentions that JEDEC uses SI-style names

I found that it's actually available free of charge, but it forces you to register.

But the relevant quotes from that standard are all quoted verbatim here:

https://www.jedec.org/standards-documents/dictionary/terms/mega-m-prefix-units-semiconductor-storage-capacity

SolraBizna Nov 19, 2023
Author

I don't particularly like the naming of it, calling them "binary kilobytes" vs "decimal/metric kilobytes" is clearer imho. But that's not an argument against prefix itself.

Well, those terms seem perfectly clear to me too, so... 👍

Bonus: "decimal kilobyte" or "metric kilobyte" is a way to disambiguate a 10^3 kilobyte, which is not otherwise possible. And "ten point nine six metric megabytes" sounds badass.

I found that it's actually available free of charge, but it forces you to register.

I am amused that neither of us is willing to register just to download this, even though it's free of charge.

(jedec.org link)

That is the least self-confident dictionary definition I have ever seen. o_o

SinsOfSeven Jun 15, 2025

Moved from the other thread.

Going on a tangent here: Computer science does not use SI prefixes. Current recommendation, as far as I remember, is to use "kilobyte" as a unit of storage capacity, which is shorthanded as "KB" or occasionally "KiB" for clarity, and always means 2^10 bytes. I can provide citations, but I'm afraid it has nothing to do with Bevy really. Edit: I vaguely remember "decimal kilobyte" being a unit, but it is rarely used, so doesn't have a shorthand.

Just a little follow-up on this particular topic. Generally speaking, people confuse these terms. The standard terms are actually base 10, but it is so common for "KB" to mean 1024 bytes that you'll often see it used that way in software. If you want to be specific about the binary meaning, say kibibyte, mebibyte, etc.
https://en.wikipedia.org/wiki/Byte

There should probably also be an implied DaB, DaiB, HB, HiB, though speaking of 10s and 100s of bytes doesn't work well with the 10.24 and 102.4, which is probably why they didn't bother: you wouldn't be able to use a computer to represent .24 of a byte. But if you wanted to use them to talk about data throughput or something, you probably could.
