---
title: "The Architecture of Sadness: Determining the Subprocess Architecture Rules for MacOS Rosetta"
date: 2024-02-10
categories: ["Troubleshooting", "Guides"]
draft: true
meta: true
description: TODO
---

Note: In this post, x86 and “Intel” are synonymous with the x86_64 CPU architecture compiler identifier code. arm64, “ARM”, and “M1”/“M2” (in the context of architectures) are synonymous with the aarch64 identifier code.

Recap #

A while back, I wrote a post about my journey to diagnosing an issue regarding dtrace when spawned by a cargo plugin on MacOS.

In short, what I found was that a workstation-provisioning tool I used had installed cargo as an x86-only binary. When cargo spawned its plugin subprocess, that plugin subprocess spawned dtrace (a universal binary) as x86. An issue occurred when dtrace tried to attach to a traced subprocess that was ARM-only, which makes sense given that DTrace was running as x86.

Copied from that post, the stack of subprocesses and compile/runtime architectures involved looks like this:

my shell (zsh): ARM-only, run as ARM
    \_ cargo: x86-only, run as x86
        \_ ~/.cargo/bin/flamegraph: ARM-only, run as ARM
            \_ sudo: universal, run as x86
                \_ dtrace: universal, run as x86
                    \_ program being profiled, ARM-only, run as ARM

Fixing the issue involved doing either one of:

  1. Installing cargo as an ARM binary instead of Intel. I did this as a short-term fix and everything worked.
  2. Updating the code of ~/.cargo/bin/flamegraph to run arch -arch arm64 sudo dtrace ... instead of just sudo dtrace, which would "force" the architecture of the spawned program to ARM (a sketch of the idea follows this list). I put up a PR to do this in flamegraph, which worked when I tested it, even when using an x86-only cargo.
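
Here's a minimal Rust sketch of the shape of that second fix, assuming a hypothetical dtrace_command helper of my own invention rather than the actual flamegraph patch:

use std::process::Command;

// Hypothetical helper (not the actual flamegraph patch) sketching the
// shape of the fix: wrap the dtrace invocation in `arch -arch arm64`
// so dtrace launches as ARM no matter how *this* process was launched.
fn dtrace_command(dtrace_args: &[&str]) -> Command {
    // Note: cfg! is resolved at compile time. A robust fix would detect
    // the host at runtime, since this whole bug is about binaries built
    // for one architecture running on another.
    if cfg!(all(target_os = "macos", target_arch = "aarch64")) {
        let mut cmd = Command::new("arch");
        cmd.args(["-arch", "arm64", "sudo", "dtrace"]);
        cmd.args(dtrace_args);
        cmd
    } else {
        let mut cmd = Command::new("sudo");
        cmd.arg("dtrace");
        cmd.args(dtrace_args);
        cmd
    }
}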

That brings us to the present. Hypothesis obtained and tested, fix implemented. Case closed.

Just One More Thing #

Hang on a sec, though. Something doesn’t quite add up here. Like, observed behavior doesn’t lie, sure … but the theory as to why this behavior is occurring isn’t consistent. I was assuming that the architecture “launch preference” of subprocesses is inherited from the architecture that their parent processes are run as, as if it were an environment variable.

But that's clearly not happening: the flamegraph program is and always has been ARM-only on my system, yet sudo and dtrace still ran as x86. So the preference must be passed down to sudo/dtrace from the grandparent cargo process, skipping right over the ARM-only parent.

And if architecture preference is inherited from the grandparent, why isn’t it inherited from the great-grandparent, my ARM-only shell?

zac@atropos ~ ∴ file $SHELL
/opt/homebrew/bin/zsh: Mach-O 64-bit executable arm64

What are the actual rules here?

In this post, I’ll document my journey to enumerating the MacOS subprocess architecture preference rules both experimentally and via MacOS sources.

Assembling the Tools #

To figure out what the behavior is here, I want three programs:

  1. A program that spawns as a subprocess whatever arguments are passed to it, compiled as x86-only.
  2. The same program, compiled as ARM-only.
  3. A program that prints out the architecture that it’s being run as, compiled as a universal binary.

Before I start writing code, I need to make sure I can build Intel binaries in the first place. To do that, I install a Rust toolchain for x86:

∴ rustup target add x86_64-apple-darwin
info: downloading component 'rust-std' for 'x86_64-apple-darwin'
info: installing component 'rust-std' for 'x86_64-apple-darwin'

I remember fifteen years ago when cross-compiling required hours of work, arcane knowledge, and lots of luck. Sure is easier these days! Now let's write some code.

First, the spawner:

use std::env;
use std::process::{exit, Command};

fn main() {
    // Skip the first argument, the name of *this* program.
    let mut args = env::args().skip(1);
    let exitstatus = Command::new(args.next().expect("At least one arg needed"))
        .args(args)
        .spawn()
        .unwrap()
        .wait()
        .unwrap();
    // code() is None when the child was killed by a signal; report a
    // generic failure in that case rather than exiting with the raw
    // wait(2) status, which exit() would truncate to its low 8 bits.
    exit(exitstatus.code().unwrap_or(1));
}

Does it work?

∴ cargo run --bin spawn echo foo bar
   Compiling untitled v0.1.0 (/Users/zac/Desktop/Projects/Personal/interviewing)
    Finished dev [unoptimized + debuginfo] target(s) in 0.14s
     Running `target/debug/spawn echo foo bar`
foo bar

Sweet! Let’s build two of them: spawn_arm and spawn_intel, where the names correspond to the architecture which the spawner is compiled for rather than the architecture it necessarily spawns processes as.

∴ cargo build --target=x86_64-apple-darwin --bin spawn
∴ file target/x86_64-apple-darwin/debug/spawn
target/x86_64-apple-darwin/debug/spawn: Mach-O 64-bit executable x86_64
∴ mv target/x86_64-apple-darwin/debug/spawn /usr/local/bin/spawn_intel
∴ cargo build --target=aarch64-apple-darwin --bin spawn
∴ file target/aarch64-apple-darwin/debug/spawn
target/aarch64-apple-darwin/debug/spawn: Mach-O 64-bit executable arm64
∴ mv target/aarch64-apple-darwin/debug/spawn /usr/local/bin/spawn_arm
∴ spawn_intel
thread 'main' panicked at src/bin/spawn.rs:7:30:
At least one arg needed

Great, two spawners on my path. Now, onto the “print my architecture” program:

fn main() {
    // These cfg attributes are resolved at compile time, so each slice
    // of a universal binary bakes in only its own println!.
    #[cfg(target_arch = "x86_64")]
    println!("Running as x86_64");
    #[cfg(target_arch = "aarch64")]
    println!("Running as arm64");
}

We’ll call that program printarch.

Now, let’s make a universal binary out of it. Cargo doesn’t natively support this yet, but the feature request thread for adding that ability has instructions on how to do it by hand.

First, we compile it for ARM and Intel:

∴ cargo build --target=aarch64-apple-darwin --bin printarch
   Compiling untitled v0.1.0 (/Users/zac/Desktop/Projects/Personal/interviewing)
    Finished dev [unoptimized + debuginfo] target(s) in 0.12s
∴ cargo build --target=x86_64-apple-darwin --bin printarch
   Compiling untitled v0.1.0 (/Users/zac/Desktop/Projects/Personal/interviewing)
    Finished dev [unoptimized + debuginfo] target(s) in 0.12s

Looks good. Now make a universal binary out of the two halves using the MacOS-supplied lipo tool:

∴ lipo -create -output printarch target/aarch64-apple-darwin/debug/printarch target/x86_64-apple-darwin/debug/printarch
∴ file printarch
printarch: Mach-O universal binary with 2 architectures: [x86_64:Mach-O 64-bit executable x86_64] [arm64:Mach-O 64-bit executable arm64]
printarch (for architecture x86_64):	Mach-O 64-bit executable x86_64
printarch (for architecture arm64):	Mach-O 64-bit executable arm64
∴ mv printarch /usr/local/bin/printarch

Awesome. Now, does printarch accurately answer which architecture it’s run as?

∴ printarch
Running as arm64
∴ arch -arch arm64 printarch
Running as arm64
∴ arch -arch x86_64 printarch
Running as x86_64

Cool, that’s all our tools done. Let’s get to experimenting!

You might be wondering why I didn't use the arch tool for this. It can both print out the current architecture (just arch with no arguments) and spawn child processes at a selected architecture via e.g. arch -arch arm64 my_child_process. Two reasons why I didn't: for the arch-printing, arch actually can't print its own arch when running under arch (that is, arch -arch arm64 arch fails); and for the spawning, I explicitly don't want to set the architecture of spawned processes every time, because I want to observe what the default arch preferences are.

Experiments #

We know the "default" arch is ARM (presumably because it's native, or because my shell is ARM). Let's check the spawners:

zac@atropos ~ ∴ spawn_intel printarch
Running as x86_64
zac@atropos ~ ∴ spawn_arm printarch
Running as arm64

As expected. What about a sandwich of “intel -> arm -> universal”?

∴ spawn_intel spawn_arm printarch
Running as x86_64

That’s the bastard! The architecture preference is inherited from the grandparent, not the parent.

But I can force it back with arch, right?

∴ spawn_intel arch -arch arm64 printarch
Running as arm64

Right. Now onto the interesting stuff. Can arch reset that secret architecture preference for all processes below it, or what? Let's try "intel spawner -> arch to force arm64 -> arm spawner -> universal":

∴ spawn_intel arch -arch arm64 spawn_arm printarch
Running as arm64

Huh. Okay, it seems like arch does reset the architecture preference. In other words, subprocesses looking "up" for a preference of what arch to launch as will look "up" until they find arch or … what, exactly? It couldn't be just "until they find arch or an Intel binary", right?

∴ arch -arch arm64 spawn_arm spawn_intel printarch
Running as x86_64

right?

Okay, that’s just lame. And you really can’t reset it? Not even With Feeling (tm)?

∴ arch -arch arm64 spawn_arm spawn_intel spawn_arm spawn_arm spawn_arm spawn_arm spawn_arm spawn_arm printarch
Running as x86_64

Huh. Huh.

And just for completeness, “arch sandwiches” don’t change things?

∴ spawn_intel arch -arch arm64 spawn_arm spawn_intel arch -arch arm64 spawn_arm spawn_intel printarch
Running as x86_64

As expected, given the weird rules we’re operating under. What were those again?

Observed Rules #

This is an example of what we in the business refer to with the advanced software engineering term dumb as shit behavior.

In short, this is the asymmetry that’s confusing:

∴ spawn_intel spawn_arm printarch
Running as x86_64
∴ spawn_arm spawn_intel printarch
Running as x86_64

The rules for launching a universal binary on an ARM mac appear to be:

  1. If the binary being launched is single-architecture, launch it as that architecture (under Rosetta 2 if it's Intel-only).
  2. If an explicit override is in effect (arch -arch or its equivalents), launch the universal binary as the requested architecture.
  3. Otherwise, if any process in the parent chain (up to the nearest arch override) is running as x86_64, launch it as x86_64.
  4. Otherwise, launch it natively, as arm64.

Step 3 is the weird bit. Without step 3, I'd say that behavior is defensible{{< sidenote >}}It'd also be fine if arch -arch overriding didn't propagate and arch wrapping was needed each time, but I suspect that would have created lots of "calico" (and therefore either hard-to-debug or outright broken) environments back when ARM was new, when people used suites of multiple programs that ran in lots of shell/interpreter wrapping layers.{{< /sidenote >}}.
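
To make that concrete, here's a toy Rust sketch encoding those rules as I understand them. The names and the chain representation are mine; this is a mental model of the observed behavior, not Apple's actual algorithm:

#[derive(Clone, Copy)]
enum Ancestor {
    // The process at this link was launched via `arch -arch <name>`.
    ArchOverride(&'static str),
    // The process is simply running as this architecture.
    RunningAs(&'static str),
}

// Decide which architecture a universal binary launches as, walking the
// parent chain from nearest ancestor upward.
fn universal_launch_arch(chain: &[Ancestor]) -> &'static str {
    for ancestor in chain.iter().copied() {
        match ancestor {
            // An explicit arch override wins and ends the search...
            Ancestor::ArchOverride(arch) => return arch,
            // ...but any x86_64 process below the nearest override
            // taints everything beneath it.
            Ancestor::RunningAs("x86_64") => return "x86_64",
            Ancestor::RunningAs(_) => {}
        }
    }
    // No taint and no override: launch natively.
    "arm64"
}

fn main() {
    use Ancestor::*;
    // spawn_intel -> spawn_arm -> printarch: the Intel grandparent wins.
    assert_eq!(
        universal_launch_arch(&[RunningAs("arm64"), RunningAs("x86_64")]),
        "x86_64"
    );
    // spawn_intel -> arch -arch arm64 -> spawn_arm -> printarch: reset works.
    assert_eq!(
        universal_launch_arch(&[RunningAs("arm64"), ArchOverride("arm64"), RunningAs("x86_64")]),
        "arm64"
    );
    // arch -arch arm64 -> spawn_arm -> spawn_intel -> printarch: re-tainted.
    assert_eq!(
        universal_launch_arch(&[RunningAs("x86_64"), RunningAs("arm64"), ArchOverride("arm64")]),
        "x86_64"
    );
}

The three asserts mirror the experiments above, with each chain ordered from the process being launched upward.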

Is it just an x86_64 thing? #

How about other non-native architectures? Unfortunately, the only non-native architecture supported in Rosetta 2 on an M1 Mac is x86_64, so running a 32-bit x86 program fails no matter how it is launched. Similarly, 32-bit ARM has never been supported on any Mac as far as I know, so that’s right out.

I could test this behavior on older environments (say, an Intel Mac that can still run 32-bit x86 binaries, or a PowerPC-to-Intel machine under the original Rosetta)…but I'm not gonna. Because lazy. Also, those are rarer environments than my relatively mainstream M1 Mac. Unlike the hardware mismatch between ARM and Intel, those environments fade away with routine updates, so their presence will diminish much faster with time than x86_64 Macs and x86_64 binaries will.

Other methods of overriding arch-selection: ARCHPREFERENCE #

man arch documents the ARCHPREFERENCE environment variable, which can be used to override the architecture launch-attempt precedence for universal binaries. At first I was confused about why this env variable wasn't working, but upon a closer reading of the manpage, it turns out it's only consulted in the presence of arch-the-command. So this works:

∴ ARCHPREFERENCE=x86_64,arm64 arch printarch
Running as x86_64

…but this doesn’t:

∴ ARCHPREFERENCE=x86_64,arm64 printarch
Running as arm64

I wonder which takes higher precedence, the presence of an x86_64 binary in the parent-PID chain or the environment variable?

∴ ARCHPREFERENCE=arm64,x86_64 arch spawn_intel printarch
Running as x86_64

Okay, Intel binaries take precedence over … uh … the precedence. They don’t poke “through” arch, presumably?

∴ ARCHPREFERENCE=arm64,x86_64 spawn_intel arch printarch
Running as arm64

Okay, so the above theory is still sound. ARCHPREFERENCE only affects the behavior of the arch command, which doesn't seem to taint the PID chain for (or against) Intel any differently than our home-rolled spawn_intel does. Just to be doubly sure of that, let's make a universal spawner, run it as Intel, and make sure that it pollutes the spawn tree in the same way:

∴ lipo -create -output spawn_universal $(which spawn_arm) $(which spawn_intel)
∴ arch -arch x86_64 spawn_universal printarch
Running as x86_64
∴ arch -arch x86_64 spawn_universal spawn_arm printarch
Running as x86_64
∴ arch -arch x86_64 spawn_universal arch -arm64 spawn_arm printarch
Running as arm64

Blech. Okay.

Other methods of overriding arch-selection: Symlinks to arch #

There’s a third way of overriding arch externally (according to the manpage):

  1. Rename your Universal executable or move it out of your PATH.
  2. Make a .plist file whose name matches your original binary's prefix, in one of several possible locations; I chose ~/Library/archSettings/printarch.plist. That .plist should contain a precedence list of architecture-launch values, as well as the full path to your new/renamed universal binary, like this:
<plist version="1.0">
<dict>
   <key>ExecutablePath</key>
   <string>/path/to/renamed/or/moved/printarch</string>
   <key>PreferredOrder</key>
   <array>
           <string>x86_64</string>
           <string>arm64</string>
   </array>
   <key>PropertyListVersion</key>
   <string>1.0</string>
</dict>
</plist>
  3. Make a symlink to the arch binary itself somewhere on your PATH, with the same name as your original binary, e.g. ln -s $(which arch) /usr/local/bin/printarch.

This works, and always runs printarch in x86_64 mode:

∴ printarch
Running as x86_64
∴ spawn_arm printarch
Running as x86_64
∴ arch -arch arm64 printarch
arch: posix_spawnattr_setbinpref_np only copied 4 of 5

At first I was confused as to how to avoid the error in the last command and force the .plist-overridden command back to ARM, especially considering I had copied the .plist file verbatim (changing only ExecutablePath) from the example in man arch. However, upon re-reading that same manpage, I saw the following architecture list:

i386     32-bit intel
x86_64   64-bit intel
x86_64h  64-bit intel (haswell)
arm64    64-bit arm
arm64e   64-bit arm (Apple Silicon)  

Even though at least one of those is totally unavailable on my system, and even though the example .plist contains <string>arm64</string>, and even though literally nothing else in any component of this investigation cares about the distinction between arm64 and arm64e, I figured I'd try that, because it said Apple Silicon in big friendly letters, and because why not?

After changing <string>arm64</string> to <string>arm64e</string> in ~/Library/archSettings/printarch.plist, I get:

∴ arch -arch arm64 printarch
Running as arm64

So the explicit commandline flag takes precedence over the .plist file, so long as the .plist file says arm64e.

Just for added confusion and hilarity, it turns out that (regardless of what the .plist says) specifying arm64e on the commandline is just silently ignored{{< sidenote >}}It may be that rustc just won't emit programs at the special arm64e architecture, and that if I pistol-whipped the tier 3 support for arm64e into place some of this confusion would disappear. But I'm not holding my breath.{{< /sidenote >}}, even though that posix_spawnattr_setbinpref_np error occurs if it's not set in the .plist:

∴ arch -arch arm64e printarch
Running as x86_64

I am … not a fan of this arch-overriding-at-a-distance system. Fundamentally, the pattern of "stash or rename your binary and then make a symlink to a magic system program in its place, which indirects architecture selection via a specially-named magic prefs file" is extremely unusual, and frustrating due to its PATH-dependence. And that's before we consider the fact that the tooling is inconsistent with regard to architecture naming (hell, triply inconsistent if we count both the arm64/arm64e annoyance and the broadly silly arm64/aarch64 Linux/Mac nomenclatural fracas), and that the documentation is sparse, and where it exists it contains ambiguous verbiage, options that don't exist for the hardware, and incorrect examples.

Better alternatives exist: xattrs on binaries or directories containing them, environment variables that take effect on any exec (not just the one performed by arch), or the placement of binaries on paths rooted in specific (hardcoded or system-wide preference-specified) directories would all have been simpler and easier-to-understand ways of addressing the problem of persistent architecture-selection overrides.

Finding Proof #

Anyway, back to the core bad behavior, the “tainting” of process trees by the presence of a single x86_64 binary.

Nothing in the arch manual discusses this behavior. Similarly, the terse Rosetta 2 developer docs don't discuss it either. There's some additional discussion of setting architecture priority and forcing Rosetta 2 on or off in the Universal binary docs, but all of that seems to deal with yet another system for architecture-preference overrides, this time one that only works for .app bundles rather than free-floating binaries. That system seems likely to be the one that drives the "Open using Rosetta" checkbox in Finder's Get Info window for .apps. That's not relevant to our CLI-only struggles, though.

So is that it? We settle for observed-behavior and call this theory confirmed? It’s not like we can check the source code of a closed-source OS, right?

…right?

Into the XNUexus #

Thanks to a sterling comment on my questions related to this on StackOverflow, I was directed to the MacOS kernel source, which is open, despite most of the MacOS userland and drivers being closed source. Beyond the complex legal history of Mach, XNU, and Darwin that makes this openness possible, I also think it makes business sense: Apple's competitive edge is largely to be found in their hardware and their GUI/application userland, which is why their drivers and much of their userspace stay closed-source. But is Apple in any way competitively disadvantaged by putting (or, more accurately, leaving) their kernel under a FOSS license? I don't think so.

Reading through the docs, I found that when arch(1) launches the program it wraps, the libc-level functions posix_spawnattr_getbinpref_np(3) and posix_spawnattr_setbinpref_np(3) are called before exec(3)/posix_spawn(2) to modify MacOS-specific process-launch flags. However, those functions' documentation doesn't say anything about where a process's default value for spawn-binpref comes from.
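
To see that libc-level dance concretely, here's a rough Rust sketch that sets a binary preference on the spawn attributes and then spawns our universal printarch. The extern declarations are hand-written from the manpage signatures, and the cpu_type_t constants come from <mach/machine.h>; treat it as an illustration of the mechanism, not production code:

use std::ffi::{c_char, c_void, CString};
use std::ptr;

// On MacOS, posix_spawnattr_t is a pointer type under the hood.
type PosixSpawnattrT = *mut c_void;
type CpuTypeT = i32;

// From <mach/machine.h>: CPU_TYPE_{X86,ARM} | CPU_ARCH_ABI64.
const CPU_TYPE_X86_64: CpuTypeT = 0x0100_0007;
const CPU_TYPE_ARM64: CpuTypeT = 0x0100_000C;

extern "C" {
    fn posix_spawnattr_init(attr: *mut PosixSpawnattrT) -> i32;
    fn posix_spawnattr_setbinpref_np(
        attr: *mut PosixSpawnattrT,
        count: usize,
        pref: *mut CpuTypeT,
        ocount: *mut usize,
    ) -> i32;
    fn posix_spawn(
        pid: *mut i32,
        path: *const c_char,
        file_actions: *const c_void,
        attrp: *const PosixSpawnattrT,
        argv: *const *mut c_char,
        envp: *const *mut c_char,
    ) -> i32;
    fn waitpid(pid: i32, status: *mut i32, options: i32) -> i32;
}

fn main() {
    let path = CString::new("/usr/local/bin/printarch").unwrap();
    unsafe {
        let mut attr: PosixSpawnattrT = ptr::null_mut();
        assert_eq!(posix_spawnattr_init(&mut attr), 0);

        // Launch-attempt precedence: try the x86_64 slice first, then
        // arm64. This is the knob arch(1) turns on our behalf.
        let mut prefs = [CPU_TYPE_X86_64, CPU_TYPE_ARM64];
        let mut copied = 0usize;
        let rc = posix_spawnattr_setbinpref_np(
            &mut attr,
            prefs.len(),
            prefs.as_mut_ptr(),
            &mut copied,
        );
        // The same kind of check that produces arch's
        // "only copied 4 of 5" complaint.
        assert_eq!((rc, copied), (0, prefs.len()));

        let argv = [path.as_ptr() as *mut c_char, ptr::null_mut()];
        let envp = [ptr::null_mut()];
        let mut pid = 0;
        assert_eq!(
            posix_spawn(&mut pid, path.as_ptr(), ptr::null(), &attr, argv.as_ptr(), envp.as_ptr()),
            0
        );
        waitpid(pid, ptr::null_mut(), 0);
    }
}

If my reading of the manpages is right, running this should print "Running as x86_64" even in an otherwise untainted ARM shell, since the x86_64 slice sits first in the preference list.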