What explains the APFS sibling volumes architecture ( / & Data )

As a system & security administrator, I started installing a lot of Unix systems 20 years ago, inside critical infrastructures, with a dual-volume layout for security purposes:

volume                  mount options
------------------------------------------------
/                       ro
/var                    rw, nosuid, nodev
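
A layout like this is easy to audit mechanically. Here is a minimal sketch, assuming the mount options have already been collected as plain comma-separated strings (for example from the output of mount). The names REQUIRED, parse_options and missing_options are hypothetical helpers invented for this illustration, not part of any system tool.

```python
# Hypothetical policy table mirroring the layout above (illustration only).
REQUIRED = {
    "/": {"ro"},
    "/var": {"rw", "nosuid", "nodev"},
}


def parse_options(text):
    """Split a comma-separated option string such as 'rw, nosuid, nodev'."""
    return {opt.strip() for opt in text.split(",") if opt.strip()}


def missing_options(mounts):
    """Return {mount_point: missing required options} for non-compliant mounts.

    `mounts` maps a mount point to its option string; a mount point that is
    absent from `mounts` is reported as missing all of its required options.
    """
    problems = {}
    for point, required in REQUIRED.items():
        actual = parse_options(mounts.get(point, ""))
        missing = required - actual
        if missing:
            problems[point] = missing
    return problems


if __name__ == "__main__":
    # /var is mounted without nodev here, so it is reported.
    print(missing_options({"/": "ro", "/var": "rw, nosuid"}))
```

An empty result means every listed mount point carries at least the options the policy requires.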

Everything that had to be modifiable by an end user or an admin, yet referenced from /, was defined through simple symbolic links:

/tmp        --> /var/tmp
/home       --> /var/home
/local      --> /var/local
/opt        --> /var/opt
/private    --> /var/private
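
To make the link layout concrete, here is a small sketch that recreates it inside a scratch directory instead of a real root volume, then resolves a path through the links. It assumes a POSIX system where unprivileged symlinks are allowed. LINKS, build_layout and resolve are names made up for this example.

```python
import os
import tempfile

# Links from the layout above: name under / -> directory under /var.
LINKS = {
    "tmp": "var/tmp",
    "home": "var/home",
    "local": "var/local",
    "opt": "var/opt",
    "private": "var/private",
}


def build_layout(root):
    """Create var/<dir> and the relative symlinks under `root` (a scratch dir)."""
    for name, target in LINKS.items():
        os.makedirs(os.path.join(root, target), exist_ok=True)
        # The link target is relative to the directory containing the link.
        os.symlink(target, os.path.join(root, name))


def resolve(root, path):
    """Follow symlinks in `path` and return the result relative to `root`."""
    real_root = os.path.realpath(root)
    real_path = os.path.realpath(os.path.join(root, path))
    return os.path.relpath(real_path, real_root)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        build_layout(root)
        for name in LINKS:
            print("/" + name, "-->", "/" + resolve(root, name))
```

On a real system the links would live on the read-only / volume and the targets on the separately mounted /var volume; the scratch directory only imitates the path resolution.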

Through many tests, and under the daily pressure of real attacks, this configuration made it impossible to damage the system, even as root. Many attacks struck us ( ~ 20 / day )… none succeeded ( at least as far as I was aware, and I wasn't fired ).

Why did Apple choose a rather more complex way to build a similar architecture with 2 volumes:

volume                  mount options
------------------------------------------------
/                       ro
/System/Volumes/Data    rw, nosuid, nodev

with a new concept of firmlinks, which is not compatible with any other Unix FS, which led Apple to put fundamental components of their new APFS outside of the FS internals ( in plain old files ), and which is rather tricky for system and security administrators to understand and manage?

To give just one example of a highly disappointing point: it is no longer possible to make a quick carbon copy of a volume with tools as simple as cp or rsync, because of new extended attributes.

Real life teaches us every day that complexity is one of the biggest enemies of performance and security.

What are the advantages of this sibling volumes architecture? ( I am not talking here about the real internal advantages of APFS versus HFS, traditional Unix UFS or ZFS, which I easily grasped and verified in real life. )

Answered by DTS Engineer in 798802022

What are the advantages of this sibling volumes architecture?

I can't give an authoritative answer, but I can give you a general answer. Basically, there are two main issues:

A) At a very high level, any change to the overall file system layout is inherently very risky. It would certainly simplify the overall architecture if we could consolidate EVERYTHING into two ("read only"/"read write") or (possibly) three ("RO system, RW system, RW user") volumes, but it would also require completely restructuring the entire system layout. That's going to break a lot of stuff and I'm not sure the benefit justifies it (particularly given B).

B) On the technical side, there is an issue with an architecture that's based purely on symbolic links:

/                       ro
/var                    rw, nosuid, nodev

The issue here is, once multiple volumes are involved, how does the "/" volume determine which volume its links should target? The obvious answer would be to use the volume name, however, that won't work because:

  • The "/" volume should include NO user information AT ALL and, ideally, should minimize any kind of "configuration specific" data.

  • We want the data and system volume to be strongly linked so that, for example, volume name collisions won't change the data volume of a configured system volume.

One thing to note here is that both of these issues would still be there even if you completely reorganized the hierarchy to rigidly segregate data into exactly two "buckets". "How do I find these directories?" simply changes to "how do I find var?"

These issues are why "volume groups" were created: the volume group connects a system and data volume with each other, so that the boot system "knows" to treat them as a matched "set". With that construct, a firmlink is basically a symbolic link that's referenced "through" the volume group, instead of solely through its simple path.
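
The name-collision problem described above can be illustrated with a toy model. This is only a sketch under loud assumptions: the Volume class, its group_id field and both lookup functions are hypothetical and say nothing about how APFS actually stores or resolves volume groups. It only shows why a lookup keyed on a shared group identifier stays unambiguous where a lookup keyed on the volume name does not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Volume:
    name: str      # user-visible name, may collide across disks
    role: str      # "system" or "data"
    group_id: str  # hypothetical stand-in for a volume-group identifier


def find_data_by_name(volumes, name):
    """Name-based lookup: every data volume carrying that name."""
    return [v for v in volumes if v.role == "data" and v.name == name]


def find_data_by_group(volumes, system):
    """Group-based lookup: data volumes in the same group as `system`."""
    return [v for v in volumes
            if v.role == "data" and v.group_id == system.group_id]


if __name__ == "__main__":
    system = Volume("Macintosh HD", "system", "group-A")
    volumes = [
        system,
        Volume("Data", "data", "group-A"),
        Volume("Data", "data", "group-B"),  # e.g. a second install elsewhere
    ]
    print(len(find_data_by_name(volumes, "Data")))   # two candidates
    print(len(find_data_by_group(volumes, system)))  # one match
```

In this model the system volume stores nothing about the data volume itself (no name, no path), which matches the goal of keeping configuration-specific data off "/".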

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

The "/" volume should include NO user information AT ALL and, ideal, should minimize any kind of "configuration specific" data.

This is perfectly achieved with a simple directory /var inside the / volume ( RO ) and a bunch of symbolic links ( the traditional kind ) pointing to directories inside the var volume ( RW ) mounted on /var, as laid out in my OQ.

As a concrete example, there is a need to clearly separate system applications ( RO ) and user installed applications ( RW ). This is achieved through the use of 2 clearly distinct directories and one preconfigured symbolic link inside / ( RO ⇒ non modifiable ):

User visible path     real path to        volume
------------------------------------------------
/Applications         /Applications       /
/local/Applications   /var/Applications   /var


And that's all folks!

Thus my OQ stands: why didn't Apple choose such a basic, simple and efficient construct with just 2 basic volumes?

Real life teaches us every day that complexity is one of the biggest enemies of performance and security.

Thus my OQ stands: why didn't Apple choose such a basic, simple and efficient construct with just 2 basic volumes?

I'll try and summarize my earlier answer.

  1. The size and skill range of our user base and app ecosystem meant that any file system reorganization was going to be extremely disruptive.

  2. We needed a construct that would tightly bind the system and data volume in a way that a simple symbolic link would not. The construct we created for that was the firmlink. Note, given our security and technical requirements, that construct would have been necessary even if we reorganized the file system layout.

  3. Firmlinks were then used to avoid that major file system reorganization.

Note that #1 here is extremely critical. Moving directories around may be the simplest solution, but that doesn't matter if it breaks critical apps that users rely on, not to mention generating mass confusion and support calls.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware
