View Full Version : File Size vs "size on disk"


liltaz
11-25-2002, 04:18 PM
When I click on a file and go to properties to see the file size, I see two sizes. I see size and "size on disk" what is the difference? The size on disk is ALWAYS bigger and sometimes as much as twice as big.

What is size on disk? Why is it bigger?

Diesel
11-25-2002, 05:03 PM
Size refers to the file's actual byte count. Size on disk refers to the amount of space taken up by the clusters allocated to the file.

Drives are organized by fixed-size units called clusters. The size of a cluster depends on several factors, the most crucial being file system and partition size.

Some background to help you understand:
Back in the days of DOS and Win95, the only usable file system was FAT16 (also known as FAT). Because hard drives were so small back then (1GB was considered HUGE), FAT's limitations were minor. One of them was a maximum partition size of 2GB.
With FAT16, the cluster size within a partition was determined by the size of the partition. Here is a chart showing the default cluster size for various FAT16 partition sizes:

Drive Size           FAT Type   Sectors       Cluster
(logical volume)                Per Cluster   Size
------------------   --------   -----------   -------
0 MB - 15 MB         12-bit     8             4K
16 MB - 127 MB       16-bit     4             2K
128 MB - 255 MB      16-bit     8             4K
256 MB - 511 MB      16-bit     16            8K
512 MB - 1023 MB     16-bit     32            16K
1024 MB - 2048 MB    16-bit     64            32K
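
For anyone who wants to play with the numbers, here is a minimal Python sketch of the chart as a lookup. The table values are copied from the chart above; the names `FAT_DEFAULTS` and `default_cluster_kb` are made up for illustration and are not part of any Windows tool.

```python
# Default FAT cluster sizes, copied from the chart above.
# Each entry: (largest volume size in MB, cluster size in KB).
FAT_DEFAULTS = [
    (15, 4),      # 12-bit FAT
    (127, 2),
    (255, 4),
    (511, 8),
    (1023, 16),
    (2048, 32),
]

def default_cluster_kb(volume_mb):
    """Return the default cluster size in KB for a volume of volume_mb megabytes."""
    for max_mb, cluster_kb in FAT_DEFAULTS:
        if volume_mb <= max_mb:
            return cluster_kb
    raise ValueError("the chart stops at 2048 MB")

print(default_cluster_kb(100))   # 2
print(default_cluster_kb(1024))  # 32
```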

Now, with small drives, this wasn't a problem because partitions were small. Once drives started growing, the cluster size issue became problematic.

Here's why:
When you write a file to disk, the smallest unit it can take up is 1 cluster. If you have a 1024MB (1GB) partition, and the default cluster size is 32KB, a 1KB file will take up a full 32KB cluster.
Since you can't have more than one file occupying a cluster, the remainder of that cluster is considered wasted space. In this example, a 1KB file is taking up 32KB of disk space, and wasting 31KB of that space.
Say you have a 68KB file. That file will be written across several clusters, since a cluster is smaller than the actual file. Thus, a 68KB file will take up 2 full 32KB clusters, and 4KB of a third cluster, while also wasting 28KB of that third cluster. While the file is only 68KB, it's taking up 96KB of disk space.

As you can see, with a lot of small files, you can waste a ton of space on such a drive.
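
The rounding described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic (round the file size up to a whole number of clusters); the function name `size_on_disk` is made up here and is not how Windows itself computes the figure.

```python
KB = 1024

def size_on_disk(file_size, cluster_size):
    """Bytes allocated to a file: whole clusters only, rounded up."""
    clusters = -(-file_size // cluster_size)  # ceiling division
    return clusters * cluster_size

# The two examples from the post, on a partition with 32KB clusters.
for size_kb in (1, 68):
    used_kb = size_on_disk(size_kb * KB, 32 * KB) // KB
    print(f"{size_kb}KB file -> {used_kb}KB on disk, {used_kb - size_kb}KB wasted")
```

It prints 32KB on disk (31KB wasted) for the 1KB file and 96KB on disk (28KB wasted) for the 68KB file, matching the figures above.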

To remedy the situation, MS unveiled FAT32 with Windows 95 OSR2, and most home users first got it with Windows 98. FAT32 not only allowed for significantly larger partition sizes, but also reduced the default cluster size dramatically. Under FAT16, a 2GB partition used a default cluster size of 32KB. Under FAT32, that was reduced to 4KB. A 16GB partition only used 8KB clusters, and up to 32GB uses only 16KB clusters.
As you can see, the savings on such large drives can be significant, resulting in more usable drive space.
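
To get a feel for the savings, here is a rough Python sketch that totals the wasted space for the same files under a 32KB and a 4KB cluster size. The list of file sizes is invented purely for illustration, and `slack` is a hypothetical helper name.

```python
KB = 1024

def slack(file_size, cluster_size):
    """Bytes left unused in the last cluster of a file."""
    return -file_size % cluster_size

# Hypothetical mix of file sizes in bytes, made up for this example.
files = [1 * KB, 3 * KB, 10 * KB, 68 * KB, 500 * KB]

for label, cluster in (("FAT16, 32KB clusters", 32 * KB),
                       ("FAT32, 4KB clusters", 4 * KB)):
    wasted = sum(slack(size, cluster) for size in files)
    print(f"{label}: {wasted // KB}KB wasted")
```

With these five files the 32KB clusters waste 122KB, while the 4KB clusters waste 6KB.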

With Windows XP, the home user was introduced to NTFS. Under NTFS, a 1GB partition uses 2KB clusters, and anything above that up to 2TB (terabytes) only uses 4KB clusters.

So, the reason you're seeing that difference is that when you select Properties, the Size on disk reading accounts for the wasted space attributable to the cluster size on the disk.

Note: MS has a chart with the cluster sizes listed on TechNet (http://www.microsoft.com/technet/treeview/default.asp?url=/TechNet/prodtechnol/winxppro/reskit/prkc_fil_lxty.asp).

liltaz
11-26-2002, 10:50 AM
Thank you, Dan!

I'm guessing there's no way to get my wasted space back? Defragmenting or something?

Diesel
11-26-2002, 11:10 AM
Well, it's not really an issue of "getting it back", since that's just how the space is used. There's not really any way around it.

Defragmenting won't help with this. It rewrites the files to take up contiguous clusters, but the last cluster of each file still holds the same slack it held before, so none of that space is reclaimed.

What file system are you using on the partition in question?

The only way you'd be able to reclaim the space, so to speak, is to use a more efficient file system like FAT32 or NTFS.

liltaz
11-26-2002, 01:09 PM
FAT 32 I believe...

Diesel
11-26-2002, 02:26 PM
Well, your only other real option is to make a change to the file system, which will require formatting the hard drive.

You can force a particular cluster size using the format command. However, you will need to copy your data to another location, perform the format, then restore it to the original location.
Also, there's sometimes a slight performance hit associated with the smaller cluster sizes because the drive has to do more work to read the data across several clusters, thus, more system overhead.

Really, that's your only option, other than just living with it.

azraeldarkchyld
12-20-2004, 07:38 AM
When I click on a file and go to properties to see the file size, I see two sizes. I see size and "size on disk" what is the difference? The size on disk is ALWAYS bigger and sometimes as much as twice as big.

What is size on disk? Why is it bigger?

Currently, checking the size of all the files I have on C drive I get the following results...

SIZE: 107 GB (115,937,379,363 bytes)
SIZE ON DISK: 107 GB (115,808,757,886 bytes)

So, my obvious question is, with the above explanation stating that the 'Size on disk' is always larger because of wasted space due to clusters... how can the size on disk be smaller... over a hundred megs smaller. I mean surely it is the size on disk that counts, once you are out of space on your hard disk you are out of space completely. This seems to imply that I must be able to get more than my 111 Gigs before my hard drive is too full to save anything.

My ideas were
a) a buffer so that your hard drive is never completely full.
b) some of the size is including what is on RAM at the time.
c) compressed and rar'd files - but this doesn't make sense because compressing an 11KB word document down to 1 KB uses 4KB on disk... in other words the cluster explanation from above.

In reality I have no idea and would LOVE someone to tell me how that works...
:eek:

Diesel
12-20-2004, 09:20 AM
NTFS file-system level compression, most likely.

If file and folder compression is enabled, Windows can compress rarely used files automatically. It LOVES to do this with the Disk Cleanup wizard, if you run that in lieu of doing disk space management on your own. It also does this for Windows Update files.

The compressed files and folders will show up as blue by default in Windows Explorer, and you can quickly see them in C:\WINDOWS on an XP system. If you keep up with Windows Update, you should have a slew of blue names in the folder starting with $NTUninstallKB.

On a system that hasn't been rebuilt in a while, these directories show up as follows:
Size: 329MB (345,425,335 bytes)
Size on disk: 216MB (227,331,556 bytes)

Now, to expand out to try to explain a possible reason why you're seeing this discrepancy on your drive...
Note that just in the patches and service pack files detailed above, there's a difference of over 113MB.
Now, keep in mind that with the NTFS default cluster size of 4KB, the MOST that could possibly be wasted by any one file is just under 4KB (a file whose last cluster holds only a single byte). You're not going to have too many instances of this type of waste over the course of your hard drive, so figure you're typically wasting 1-2KB per file, on average. Over the course of a whole hard drive, there is no way you're going to have enough waste to completely offset the 113MB difference. 13MB is possible, although extreme, which could explain the difference you're seeing.
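
The figures quoted in this thread can be lined up with a short Python sketch. The byte counts are copied from the posts above; the 2KB average slack is the rough guess from this post, and the file count passed to `estimated_slack` (a made-up helper) is purely hypothetical, so treat the last line as an illustration rather than a measurement.

```python
MB = 1024 * 1024

# Byte counts quoted in the thread.
drive_size, drive_on_disk = 115_937_379_363, 115_808_757_886
patch_size, patch_on_disk = 345_425_335, 227_331_556

drive_gap = drive_size - drive_on_disk      # how much smaller "size on disk" is
patch_saving = patch_size - patch_on_disk   # saved by compression in the patch folders

print(f"whole drive:   size on disk is {drive_gap / MB:.1f} MB smaller")
print(f"patch folders: size on disk is {patch_saving / MB:.1f} MB smaller")

def estimated_slack(file_count, avg_slack_bytes=2048):
    """Rough total cluster waste, assuming avg_slack_bytes lost per file."""
    return file_count * avg_slack_bytes

# 5,000 files is an invented number, only there to show the scale.
print(f"slack for 5,000 files: {estimated_slack(5_000) / MB:.1f} MB")
```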

As for your ideas:
a) Buffers don't work that way. They simply hold the data temporarily while waiting for the drive to get the write head in position. This is a performance increaser, and takes place in a matter of milliseconds. It is handled entirely by the drive itself, and the OS is blissfully unaware that it even exists.
b) The OS reads the disk information from the file system. The RAM is not part of the disk system, and does not factor into the file system information.
c) With ZIP, RAR, and other compressed file formats, the compression is occurring at the file level, so the archive is counted at its own full size, and the OS does not account for the compression taking place within the archive. All it sees is the ZIP or RAR file, and reports the space usage of that file. This differs from what I explained above, since the compression Windows does is at the file system level, not at the file level where ZIP and RAR take place.

Monster
12-22-2004, 04:21 PM
BTW, FAT32, FAT16 and FAT partitions can also be compressed. On Windows 95, 98 and ME, there's a menu called "Start" -> "Accessories" -> "System Utilities" and there should be a "Compress Drive ..." menu item. (On Windows 3.1 even there should be something similar called "DriveSpace")

Compressing a huge drive that was previously uncompressed can take hours, but it's worth it! :)

There should also be alternative file systems like NTFS or ReiserFS for Windows 95/98/ME -- however you can't just convert the partition, I think.

On Windows NT, 2000 and XP, you can convert a FAT32 partition into a NTFS partition without data loss. Look in the Windows Help Center (on XP) or in Windows Help (NT/2000) for a description how to do this.

To compress an uncompressed NTFS drive on NT/2000/XP, just right-click on the drive under "My Computer" select "Properties" then check the "compress files" box and hit "Apply" or "OK". This way you can compress entire partitions.

(btw, NTFS is not as safe as Microsoft wants to suggest to us -- recently I had so much trouble with it on a damaged hard drive, that I began using GNU/Linux and ReiserFS as an alternative; while Windows would not even boot anymore and destroyed the entire NTFS partition, GNU/Linux was able to repair itself time after time -- I was able to rescue all my data onto a new drive; I will never use Windows again, because I lost a lot of important data in the past months. NTFS is just as unreliable as FAT.)

Diesel
12-22-2004, 06:09 PM
Some good points, Monster, but it should be pointed out that DriveSpace and NTFS file-system compression are two completely different things.

DriveSpace simply compresses a volume, regardless of the file system. EVERYTHING on that volume gets compressed, and when it's accessed, it needs to get decompressed. This introduces a HUGE performance hit on all data accesses to that volume.
This also would not explain the issues that azraeldarkchyld is seeing, because the volume would appear larger than it physically is, and the disk information would appear normally for files on that disk. It doesn't compress the files and folders on the disk in the same way, so only the volume would look larger, but the data on it would appear normally sized (IOW, it won't differentiate with Size on disk the same way).

NTFS compression happens at the file system level, not the volume level, so the compression happens on-the-fly on specific files and folders, and not the entire volume. This means that while rarely accessed files and folders are compressed, the rest of the data on the drive remains in its normal, uncompressed state. As such, the performance hit only occurs when you try to access older compressed files, and performance remains normal when accessing regularly used files and folders.

Volume-wide compression such as DriveSpace, generally speaking, is a BAD thing. Should the compressed volume get corrupted, you will lose all data on the entire volume, not just individual files and folders.

NTFS file and folder compression is only slightly safer. If something gets corrupted, you only lose data in those portions where the corruption occurred.

Even Microsoft does not recommend applying NTFS or DriveSpace compression wholesale to an entire partition. It's generally a very bad practice, and has been for years. It puts your data at an unnecessary risk by adding an extra layer of potential failure. With disk space as cheap as it is, compression is completely unnecessary in all but the MOST extreme of circumstances.

Monster> I'm not surprised that you ended up losing all of your data, since you apparently are enamored with drive compression across the whole drive. Again, even MS doesn't recommend this configuration... unfortunately, you found out the hard way exactly why it's not recommended. Textbook example.

Monster
12-22-2004, 11:55 PM
What I learnt was that NTFS is a toy filing system which isn't suitable for real-world applications.

I don't care about "do's and dont's" with a filing system, which is supposed to work correctly under all conditions.

Besides, I did not know about the NTFS limitations that you told me, and I would expect XP to alert me if I compress an entire hard drive, in that case. There's no mention of such a limitation in the XP Online Help Center.

Performance-wise, compressing an entire partition has speed benefits, because generally nowadays the hard drive access time is far slower than CPU processing time. Compression/decompression has no significant CPU overhead. If you have plenty of files on a hard drive, you can save a lot of space.

Both NTFS and FAT have the grave design error that they do not checksum disk blocks, and hence errors in disk blocks cannot be detected. There are no mirror copies of important blocks.

If the IFS kit for Windows wouldn't be unnecessarily expensive, I would've developed a new file system for Windows already. Now, since I'll be using only GNU/Linux in the future, there'll be only a version for that, but I doubt there'll be a necessity. ReiserFS is a journalling file system that can actually keep its promises and it will continue to get better.

NTFS does not keep its promised data safety, which was -- for me -- a reason to buy Windows XP in the first place. Now that this proved to be wrong as well (like other false statements made in XP advertisements, like "your computer will be faster"), there's no reason for me to use Windows XP anymore.

This incident has just proven to me again that Microsoft is unable to write an operating system that is suitable for daily use. Just look at all the BS you have to go thru as a Windows user, like reinstalling the system every now and then, having to reboot now and again, having to use virus scanners and defragmenters, having to patch the OS every other day, ... just to mention a few. To me, Windows and especially Microsoft Office are the biggest software project failures in history, still commercially successful, but a failure nonetheless. Because they cannot keep fooling their users like that forever ... someday it will fall back on them. Can you say "Quality Control"?

I had XP right from the beginning, which means I've seen a lot of untested code, like the so-called "multiuser" capability, which doesn't exist. I tried to work using a user with limited rights, but a lot of desktop applets wouldn't play along. This gave a ridiculous image of a "professional" operating system.

Then I already knew why some people called "Windows NT" a "toy operating system". A former customer of my former employer would've said "banana software -- matures at the customer's place".

If a small software company would deliver software as bad as Microsoft's, customers would call and complain every day until all problems were fixed. Just imagine what would happen if there's ever a class action lawsuit filed against Microsoft, then boom, no more Microsoft.

Diesel
12-23-2004, 05:52 AM
Sorry, I just find it kinda laughable that you find NTFS unsuitable, and a "toy filing system", when the vast majority of Fortune 500 companies find it perfectly fine for their data.

NTFS is not something new to XP, as you seem to imply. It's been around since the NT 3.51 days, then NT4, 2000, and now 2003. It's the default file system on ALL of MS's server platform, which just happens to be the platform that the vast majority of businesses, small, large, and global, all run.
In my professional career, I've overseen, collectively, several thousand Windows servers, every last one of them running NTFS. Know how many times we've lost a whole partition due to corruption? Once.

Just because yours managed to get screwed up using it in a config that no one recommends doesn't mean that it doesn't work. Every single person who's ever taken an MCSE certification exam knows you shouldn't use disk compression or file system compression across an entire volume.

Best practices are a perfect compendium of things that you're able to do with an OS and its utilities, but probably shouldn't. Don't blame XP for not holding your hand through a process that it's able to perform. If you have the ability to write a new file system, you certainly have the ability to look up a best practices document.

Monster
12-23-2004, 11:35 PM
Problem is, that end users don't have access to MCSE's. What you're saying sounds like Hitchhiker's Guide to the Galaxy to me: "This is Prostetnic Vogon Jeltz of the Galactic Hyperspace Planning Council," the voice continued. "As you will no doubt be aware, the plans for development of the outlying regions of the Galaxy require the building of a hyperspatial express route through your star system, and regrettably your planet is one of those scheduled for demolition. The process will take slightly less than two of your Earth minutes. Thank you." ... "There's no point in acting all surprised about it. All the planning charts and demolition orders have been on display in your local planning department on Alpha Centauri for fifty of your Earth years, so you've had plenty of time to lodge any formal complaint and it's far too late to start making a fuss about it now."

It might be, that for server farms fostered by MCSE's, a Windows server is a viable solution, but for end users, Windows has become increasingly unbearable in the past years, and not just because of all the security holes. Just take a look around you.

As an end customer, I feel tricked by Microsoft into using their operating systems for many years.

Master Noodle
12-23-2004, 11:43 PM
HOT DAMN! This is one confusing thread!

Diesel
12-24-2004, 09:48 AM
Problem is, that end users don't have access to MCSE's.

It might be, that for server farms fostered by MCSE's, a Windows server is a viable solution, but for end users, Windows has become increasingly unbearable in the past years, and not just because of all the security holes. Just take a look around you.

As an end customer, I feel tricked by Microsoft into using their operating systems for many years.

The item you're neglecting to account for in your logic is that the settings that MS recommends are set that way BY DEFAULT. You had to actually go in and change them yourself to bring them out of recommended spec, and then you're blaming MS because of your mistake.

With the various server farms I've worked with, in terms of the file systems, it's literally a "set-it-and-forget-it" solution. You choose the file system. You format it. You use it.

Funny thing is that it's exactly the same way on XP Home and Pro.

YOU were the one who decided to tweak things outside of recommended specs, and when it all blew up on you... of course, it was MS's fault.

Monster
12-24-2004, 04:12 PM
Well, I don't have to buy an OS if I have to leave everything at default settings. That's just ridiculous. If an option exists -- bam! I use it! Shocks and horrors! MS better learn how to write software that actually works.

I know the kind of attitude tho, it's very widespread in the computer pro arena.

I'm a different kind of person than those -- I want to write software that actually works, is easy to use and foolproof.

I've been a programmer for 23 years now, and I'm upset how some people work in that field. Why not put quality at the top of the feature list?

It's a simple matter, really.

BRiT
12-24-2004, 11:19 PM
It's not the developers' fault, it's the fault of the project managers and the others in upper management who decide when a product is finished. If you really were a developer that worked for a real company on real products, you would know this by now.

Diesel
12-25-2004, 01:03 AM
If an option exists -- bam! I use it!

That's the problem right there.
Again, Best Practices. Every OS has a published list available, as do many applications.
Just because they give you options doesn't mean that you should just go ahead and use them without knowing the ramifications.

Aros
12-25-2004, 03:59 AM
I bought a 160GB Hard Disk, which showed up as a 137GB Hard Disk. Quite a difference, there was no way to fix this. It's not just a visual error, I can't store more than (about) 137GB on it.

Monster
12-25-2004, 04:55 AM
It's not the developers' fault, it's the fault of the project managers and the others in upper management who decide when a product is finished. If you really were a developer that worked for a real company on real products, you would know this by now.

I'm in Germany, and my employers never gave me the option of failing the product or project schedule.

For small companies like ours, meeting the deadlines is essential for survival, and also fixing bugs immediately at customer's request.

BRiT
12-25-2004, 11:58 AM
I bought a 160GB Hard Disk, which showed up as a 137GB Hard Disk. Quite a difference, there was no way to fix this. It's not just a visual error, I can't store more than (about) 137GB on it.

That's the fault of Marketing, and why most people feel marketing is worthless. They typically mislead consumers by using the bigger-is-better ploy despite the reality of the situation. There's lies, damned lies, and then marketing.

WaterB
12-25-2004, 03:03 PM
I bought a 160GB Hard Disk, which showed up as a 137GB Hard Disk. Quite a difference, there was no way to fix this. It's not just a visual error, I can't store more than (about) 137GB on it.


Aros.. that's a different issue. You should look on your motherboard manufacturer's site to see if they have a new BIOS that will allow you to see past 137GB

BRiT
12-25-2004, 07:39 PM
Doh. I totally forgot about that issue. However, even with a newer BIOS, a 160GB drive will only provide 149GB of usable storage because 160 Marketing GB (base 1000) = 149 Real GB (base 1024). Hence my comment about marketing being evil.

Diesel
12-25-2004, 08:45 PM
Just as an FYI, what BRiT is talking about is the difference between gibibytes (GiB, real base 1024) and gigabytes (GB, marketing base 1000).

Drive manufacturers measure the capacity in base-1000 gigabytes for marketing purposes to inflate the perceived capacity of a drive, so a 160GB drive is actually marketed using 160*1000*1000*1000, which comes out to 160,000,000,000 bytes.
However, when the OS recognizes the drive, it recognizes it using base 1024, so 160,000,000,000 bytes /1024/1024/1024 = 149.01, which it displays as roughly 149GB. File system formatting overhead then takes a little more off the usable space.
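
The conversion is easy to check with a couple of lines of Python. This is just the arithmetic from the post; the constant names `GB` and `GiB` are chosen here for readability.

```python
GB = 1000 ** 3    # decimal gigabyte, as printed on the drive's box
GiB = 1024 ** 3   # binary unit, which Windows also labels "GB"

advertised = 160 * GB
print(f"{advertised:,} bytes")        # 160,000,000,000 bytes
print(f"{advertised / GiB:.2f} GiB")  # 149.01 GiB
```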

For such a confusing issue, hopefully that makes things clearer.

Monster
12-26-2004, 02:31 AM
The international SI system defines "giga" to mean "billion of", i.e. 1 giga-byte would be 1,000,000,000 bytes. However, in the computer world, factor 1024 is often used instead of factor 1000 (for kilobytes), so 1 giga-byte in the computer world would be 1,073,741,824 bytes. There are various writing conventions that are being used to differentiate between the two, like "Gigabyte", "GByte" or "GB". The latter two usually refer to 1024 (for kilobytes). An uppercase "B" usually means bytes, while a lowercase "b" usually means bits. So, "1 Gb" would be one gigabit, not gigabyte. With bits, factors of 1000 are always being used.

However, the reason why 160 GB drives cannot be fully used is in the operating system. I had the case a couple of months ago, when I was installing a 160 GB drive, and only after making an install CD of Windows XP with Service Pack 2 (how that can be done can be found on the XPSP2 Microsoft pages), Windows XP SP2 recognized a 158 GB drive during XP installation. An unpatched XP shows far lower sizes. GNU/Linux also cannot use the full size yet. (BIOS setting: LBA)

Diesel
12-26-2004, 11:36 AM
However, the reason why 160 GB drives cannot be fully used is in the operating system. I had the case a couple of months ago, when I was installing a 160 GB drive, and only after making an install CD of Windows XP with Service Pack 2 (how that can be done can be found on the XPSP2 Microsoft pages), Windows XP SP2 recognized a 158 GB drive during XP installation. An unpatched XP shows far lower sizes. GNU/Linux also cannot use the full size yet. (BIOS setting: LBA)

Your first statement is inaccurate. In addition to explaining why the OS doesn't see the full size of the drive in your first paragraph, the OS does not determine how much of the space can be used. That's determined by the file system, and how much overhead the formatting of that file system introduces.

As for why your SP2 install worked... that's because MS didn't support 48-bit LBA until Service Pack 1. In order for Windows XP to recognize drives >137GB, you needed to have BIOS support and OS support. BIOS support is determined by the motherboard manufacturer, and the OS support was added in SP1.
http://support.microsoft.com/default.aspx?scid=kb;en-us;303013
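
For the curious, the 137GB figure falls out of the older 28-bit LBA addressing that 48-bit LBA replaced. The sketch below assumes 512-byte sectors, which is not stated in the thread but was the usual sector size for drives of that era; `lba_limit_bytes` is a made-up helper name.

```python
SECTOR = 512  # bytes per sector (assumed; typical for drives of that era)

def lba_limit_bytes(address_bits, sector_size=SECTOR):
    """Largest capacity addressable with the given number of LBA bits."""
    return (2 ** address_bits) * sector_size

limit_28 = lba_limit_bytes(28)
print(f"{limit_28:,} bytes")            # 137,438,953,472 bytes
print(f"{limit_28 / 1000**3:.1f} GB")   # 137.4 GB (base 1000)
print(f"{limit_28 / 1024**3:.0f} GiB")  # 128 GiB (base 1024)
```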

Further, GNU/Linux basically ignores the BIOS output in terms of drive size support. If you use Linux fdisk, you can input the physical drive parameters manually, and it presents the full size of the drive to the OS. I have a 200GB drive running on my RH9 box with no problems. Auto-detection probably would not have worked properly.

chisholm
12-23-2005, 07:35 AM
I have a web site which I plan to up-load to a free web hosting service. Their disk space limit is 50 MB. My system runs Windows 2000 Professional. Windows 2000 currently reports the folder (and all the subfolders and files in it) "size" as 41 MB, but reports the "size on disk" as 48 MB.

When I've finished the site, the folder (and all the subfolders and files in it) "size" will still be less than 50 MB but the "size on disk" may exceed 50 MB.
In this case, would I be exceeding the allowed disk space on the web server, which is 50 MB?

Diesel
12-23-2005, 07:53 AM
Size on disk on your system *should* be the same as theirs, assuming they use default cluster sizes when formatting their drives.
Only way to tell is to try it.

chisholm
12-24-2005, 09:18 AM
I have another web site, hosted by my ISP as part of their standard service package. In this case, their disk space limit is 5 MB. Windows 2000 reports the disk space occupied by this same web site, on my HDD, as follows:-

Size: 4.18 MB
Size on disk: 7.85 MB
Thus it would appear that the actual space occupied by it, on my ISP's web server, corresponds to the lower figure.

I'd be interested to hear from anyone else who can throw some light on this issue of actual space occupied on a web server versus the hosting service's stated disk space limit.


Merry Christmas, everyone - or happy Hanukkah, or whatever, according to your religion.

Robert T. Chisholm

Diesel
12-24-2005, 11:05 AM
I'd be interested to hear from anyone else who can throw some light on this issue of actual space occupied on a web server versus the hosting service's stated disk space limit.

It's no different than what's occurring on your local disk.
It depends on what file system they're using, the size of the volume they have your data stored on, and the block size they're using in the formatting of that volume.

With 4k blocks, you'll have a minimum of wasted space. If they have you on a 2TB volume where they've set the block size to 64k, you'll have a great amount of wasted space. So, the size of the data and the size of the data on disk will vary greatly.
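
One way to see how much the block size matters for a folder full of small web files is to total the folder under different block sizes. The Python sketch below is only an estimate under stated assumptions: it rounds each file up to whole blocks and ignores compression, directory entries and other file system overhead. The function name `folder_sizes` and the two demo files are invented for illustration.

```python
import os
import tempfile

def folder_sizes(root, block_size):
    """Return (total size, total rounded up to whole blocks) for files under root."""
    logical = allocated = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            size = os.path.getsize(os.path.join(dirpath, name))
            logical += size
            allocated += -(-size // block_size) * block_size  # round up
    return logical, allocated

# Demo on a throwaway folder holding a 1-byte file and a 5,000-byte file.
with tempfile.TemporaryDirectory() as site:
    for name, size in (("index.html", 1), ("logo.gif", 5000)):
        with open(os.path.join(site, name), "wb") as f:
            f.write(b"x" * size)
    for block in (4 * 1024, 64 * 1024):
        total, on_disk = folder_sizes(site, block)
        print(f"{block // 1024}K blocks: size {total:,}, on disk {on_disk:,}")
```

For the two demo files, 5,001 bytes of data occupy 12,288 bytes with 4K blocks and 131,072 bytes with 64K blocks. Pointing `folder_sizes` at a real site folder gives a rough idea of what a host with a different block size might count.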

chisholm
12-25-2005, 09:41 AM
O.K., "Diesel", many thanks for the insight.

From this, it's obvious that the only way to answer the question is to try up-loading the finished web site to the hosting provider and see what happens.

Merry Christmas!

Robert T. Chisholm

baysurfer
09-03-2008, 02:21 PM
Hi, just curious, which file size do hosting companies use to assess charges to their customers? I assume the 'size on disk' but just wanted to validate. Thanks.:shrug:

BRiT
09-05-2008, 06:38 PM
It should always be size on disk, since their charge is based on how much space on disc is used. This usually isn't an issue for a hosting company which uses some form of Unix as the OS.

Junglizm
09-06-2008, 04:37 PM
Just as an FYI, what BRiT is talking about is the difference between gibibytes (GiB, real base 1024) and gigabytes (GB, marketing base 1000).

Marketing has nothing to do with it. Engineers use base 10 (SI/decimal prefix) as their standard, while OSes report disk size in base 2 (binary prefix). What your OS is reporting is actually MiB (http://en.wikipedia.org/wiki/Mebibyte) and GiB (http://en.wikipedia.org/wiki/Gibibyte) rather than MB and GB.

Diesel
09-06-2008, 05:25 PM
Junglizm> The origin of MiB and GiB is based in the marketing used by the hard drive manufacturers, in that they used a measurement not in common usage, which let them show larger capacity numbers than the ones used by pretty much every computer system in existence (and it is still that way today). Note that (according to the article you referenced) the actual measurement definitions weren't approved by the IEC until 1998, long after the measurement had become commonplace from a manufacturing standpoint.

By that point, hard drive manufacturers had been using the base 10 measurement for so long, while computer users were seeing significantly less space because the OS (and all other computing measurements, for that matter) used the base 2 measurement, that a definition came about to help differentiate the two.

For the sake of argument to explain the "why", let's say a hard drive maker is coming up against the limits of areal density at the time, which is a constant struggle that they'll always be involved in. They can either push out a drive listed as 100GB with 100,000,000,000 bytes of capacity using one measurement, or push even harder against the areal density limit to come out with the same drive, same 100GB capacity, with 107,374,182,400 bytes by using the commonly accepted measurement.
If you're a hard drive manufacturer, it's a simple decision to not try to force over 7 billion extra bytes onto a set of platters that you're already struggling to get up to the right capacity.

Also note (from the Megabyte article (http://en.wikipedia.org/wiki/Megabyte) ) that Megabyte was and still is commonly calculated from using the base 2-based value of 1024 (or 2^10).

For historical reference, while the terms MiB and GiB have been approved by the IEC since 1998, they really haven't gained popular acceptance until the last 3-4 years. When this question was originally posed in 2002, the terms weren't even common knowledge (or common usage) among most computer scientists, much less end-users.

Junglizm
09-06-2008, 08:08 PM
Yes, the MiB/GiB designations were created to explain the discrepancy, but that's not why the discrepancy exists. Engineers have long used base 10 as their standard system, being human rather than computers. The manufacturers simply sell media based on their system, rather than how the OS reports it. It doesn't really make sense to do it any other way, since they ARE producing drives of the proper size. If anything, OSes should be reporting in the same format the engineers use, not the other way around.

The origin of MiB and GiB is based in the marketing used by the hard drive manufacturers, in that they used a measurement not in common usage

How exactly is base-10 uncommon in engineering or bandwidth/storage measurements?

Also note (from the Megabyte article ) that Megabyte was and still is commonly calculated from using the base 2-based value of 1024 (or 2^10).

From the article:

A megabyte is a unit of information or computer storage equal to either 10^6 (1,000,000) bytes or 2^20 (1,048,576) bytes, depending on context. In rare cases, it is used to mean 1000×1024 (1,024,000) bytes. It is commonly abbreviated as Mbyte or MB (compare Mb, for the megabit). The term megabyte was coined in 1970.[1]

1. 1,000,000 bytes (1000^2, 10^6): This is the definition recommended by the International System of Units (SI) and the International Electrotechnical Commission IEC. This definition is used in networking contexts and most storage media, particularly hard drives, Flash-based storage, and DVDs, and is also consistent with the other uses of the SI prefix in computing, such as CPU clock speeds or measures of performance.
2. 1,048,576 bytes (1024^2, 2^20): This definition is most commonly used in reference to computer memory, but most software that display file size or drive capacity, including file managers also use this definition. See Consumer confusion (in the "gigabyte" article).

The base 10 method is consistent in networking, storage and clock speed - the latter of which predated the computer by some 50 years. This is not a marketing scheme, it's simply a longstanding engineering convention.

Diesel
09-06-2008, 09:23 PM
If it's not a matter of marketing, then PLEASE explain to me how, to this day, when used in the context of hard drives, they are marketed (as in "the box is labelled...") using GB, but the sizes are based on GiB?

In terms of base-10 being consistently used in networking contexts, I suggest it STRONGLY depends on the context. IP addressing schemes are 100% binary in nature, and are only displayed in decimal or hex for user-friendliness. In terms of bandwidth measurements, it's almost always been my experience that terms like Kb and Mb are again using base-2 calculations, never base-10.
So right there, the consistency argument goes right out the window.

Clock speed is something that's time-based, so it makes practically no sense to use base-2 calculations in that regard. It's only common sense that particular sector uses base-10.

In fact, I'll go so far as to say that, based on my years of experience in IT, the ONLY area where the base-10 calculations are being referenced consistently is in the engineering sector. Everyone else in the real world pretty much consistently uses base-2 calculations for storage and networking.
Unless, of course, they're ignorant of the difference. Or they work in the marketing department.

Junglizm
09-07-2008, 12:10 AM
Dan, in network engineering the SI prefixes are very common.

Cisco, Juniper, Adtran, Foundry et al will tell you that:

1000b = 1kb
1000B = 1KB
1000000b = 1Mb
1000000000b = 1Gb
etc

Again, it is generally only the OS that reports in base-2.
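
As a rough illustration of that gap, here is a small Python sketch that formats the same byte count both ways. The helper names `format_si` and `format_binary` are made up for this example and are not taken from any OS or vendor tool; the 500,000,000,000-byte drive is likewise just an assumed figure.

```python
def format_si(n_bytes):
    """Format a byte count with decimal (SI) prefixes: 1 kB = 1000 B."""
    units = ["B", "kB", "MB", "GB", "TB"]
    value = float(n_bytes)
    for unit in units:
        # Stop at the first unit where the value fits, or at the largest unit.
        if value < 1000 or unit == units[-1]:
            return f"{value:.2f} {unit}"
        value /= 1000


def format_binary(n_bytes):
    """Format a byte count with binary (IEC) prefixes: 1 KiB = 1024 B."""
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    value = float(n_bytes)
    for unit in units:
        if value < 1024 or unit == units[-1]:
            return f"{value:.2f} {unit}"
        value /= 1024


drive_bytes = 500_000_000_000  # assumed example: a drive sold as "500GB"
print(format_si(drive_bytes))      # the figure on the box
print(format_binary(drive_bytes))  # the figure a base-2 OS would report
```

Same number of bytes on the platters either way; only the divisor changes.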

If it's not a matter of marketing, then PLEASE explain to me how, to this day, when used in the context of hard drives, they are marketed (as in "the box is labelled...") using GB, but the sizes are based on GiB?
Because the storage is created in GBs - kilo, mega and giga mean 10^3, 10^6 and 10^9, respectively - these are conventions that go back to the metric system and engineering measurements of oscillation, watts, joules etc. It's the OS that reports the capacity in KiB, MiB or GiB. The engineers are using the same system they always have. Perhaps this is not the best way to describe binary storage, but it's certainly not a marketing scheme; it's a holdover from pre-binary systems. kilo, mega and giga (http://en.wikipedia.org/wiki/Kilo-) were borrowed by the binary system to (incorrectly) approximate their base-2 numbers.

http://en.wikipedia.org/wiki/Giga
When referring to computing information units, such as gigabit or gigabyte, giga- can sometimes mean 1,073,741,824 (2^30) (though such use is incorrect (http://physics.nist.gov/cuu/pdf/sp811.pdf) [pdf]) and is better used only to denote strictly 1,000,000,000 (10^9).
http://en.wikipedia.org/wiki/Kilo-
Officially adopted in 1795 (though in common use before that), it comes from the Greek χίλιοι ("khilioi"), meaning thousand.
