addaon 7 hours ago

An okay overview of some high level context for on-disk storage, but it's perhaps more useful to say that disk hardware (and memory hardware) present an abstraction of a bunch of bits. Even for DRAM, there isn't a one-to-one mapping between capacitors the fab etches into the silicon and bits that your software can access at a given physical address. At the lowest level, defective rows are bypassed and remapped. At the next level up, ECC means that a single bit can never be (reliably) pointed at on its own -- instead, the data of, say, 64 bits is smeared across 72 capacitors. For disks, this gets even worse, both because the hardware itself is less reliable and because the slow speed allows more and more tricks to be played. A bunch of bits get mapped to a bunch of blocks, but blocks get remapped, bits within blocks get error corrected, multiple bits are stored in a single physical element, etc.
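The ECC point can be shown at toy scale. Server DRAM typically uses a SECDED code that stores 64 data bits across 72 cells; the sketch below uses a small Hamming code (12 bits carrying 8 data bits) to illustrate the same idea, that no single stored bit maps to a single addressable bit, and a flipped cell can be corrected. Everything here (names, sizes) is my own illustration, not any real controller's scheme:

```python
def hamming_encode(data_bits):
    """Encode 8 data bits into a 12-bit Hamming codeword.

    Parity bits live at the power-of-two positions (1, 2, 4, 8);
    data bits fill the remaining positions.
    """
    n = 12
    code = [0] * (n + 1)  # 1-indexed for convenience
    data_positions = [p for p in range(1, n + 1) if p & (p - 1)]  # non-powers of two
    for pos, bit in zip(data_positions, data_bits):
        code[pos] = bit
    # Each parity bit covers the positions whose index has that bit set,
    # chosen so that XOR-ing the indices of all set bits yields 0.
    for par in (1, 2, 4, 8):
        x = 0
        for p in range(1, n + 1):
            if p != par and (p & par) and code[p]:
                x ^= 1
        code[par] = x
    return code[1:]


def hamming_decode(codeword):
    """Correct up to one flipped bit and return the 8 data bits."""
    code = [0] + list(codeword)
    syndrome = 0
    for p in range(1, len(code)):
        if code[p]:
            syndrome ^= p
    if syndrome:  # a non-zero syndrome is the index of the flipped position
        code[syndrome] ^= 1
    data_positions = [p for p in range(1, len(code)) if p & (p - 1)]
    return [code[p] for p in data_positions]
```

Note that the "address" of a data bit inside the codeword has nothing to do with its position in the original byte, which is the abstraction-gap point in miniature.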

  • yapyap 6 hours ago

    I imagine the OP's article is aimed at people newer to the world of computers, and his approach to bits, while not perfect, is good enough; better than confusing the reader, IMO. Your detail would probably be useful for people already deeper into the world of computers, but I doubt the people who understand what you're talking about need a reminder of what's on their disks. It's handy to keep in mind who is being written for.

    • analog31 4 hours ago

      My advice to the novice is to learn architecture at the level of something like an 8-bit PC, and to think of more advanced features as solutions to problems inherent in the systems of that era. Alternatively, an 8-bit microcontroller such as an 8051 has a similarly primitive architecture.

ggm 3 hours ago

Most of the complications can be learned after you get comfortable with a basic model. It is entirely true that things have got more complicated, but the key concepts, and most importantly (to me) the language of what disks are, come from their history. The whole block/sector/inner/outer, cache/written, and addressing models come from the realities of spinning objects. We didn't inherit very many concepts from mercury delay lines in the longer term, but we did from core memory, because addressing models "made sense" in the X/Y plane model it exposed, and we carried some of that into the future, and into disk sector/block models.

Shingled, SMR, CVR, checksums, RAID, RAM backing, the impact of VM models, L1 and L2 cache, unified file buffer caches... it's all add-ons which assume you have the basic language around disk "concepts".
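That inherited geometry language survives in the classic mapping between flat logical block addresses and cylinder/head/sector triples that old BIOSes used. A rough sketch; the 16-head/63-sector geometry is a once-common illustrative fiction, not any real drive's layout:

```python
def lba_to_chs(lba, heads=16, sectors_per_track=63):
    """Map a logical block address onto the old cylinder/head/sector model.

    Sectors are 1-indexed by convention; cylinders and heads start at 0.
    The default geometry values are illustrative, not from a real drive.
    """
    cylinder = lba // (heads * sectors_per_track)
    head = (lba // sectors_per_track) % heads
    sector = (lba % sectors_per_track) + 1
    return cylinder, head, sector
```

Modern drives just expose the flat LBA space and keep the real (remapped, error-corrected) geometry internal, which is the point: the vocabulary outlived the mechanism.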

Liftyee 6 hours ago

For my previously shallow level of understanding, this was an insightful article that showed me a little of how the filesystem actually works. I'm vaguely aware of abstractions at the hardware level (especially with solid-state memory controllers, wear-levelling...), but that's another layer of abstraction down from the one explained here. I'll learn the magic of working around nanoscale physics another day.

The author seems to have a number of explanations of this quality. I've put the one about git submodules on my reading list.