---
title: "The Architecture of Sadness: Determining the Subprocess Architecture Rules for MacOS Rosetta"
date: 2024-02-10
categories: ['Troubleshooting', 'Guides']
draft: true
meta: true
description: TODO
---

<!-- TODO: backtick x86 arch to arch(10284) man number -->
Note: In this post, "x86" and "Intel" are synonymous with the `x86_64` CPU architecture compiler-identifier; `arm64`, "ARM", and "M1"/"M2" (in the context of architectures) are synonymous with the `aarch64` identifier.
## Recap
A while back, I wrote a post about my journey to diagnosing an issue regarding `dtrace` when spawned by a `cargo` plugin on MacOS.

In short, what I found was that a workstation-provisioning tool I used had installed `cargo` as an x86-only binary. When `cargo` spawned its plugin subprocess, that plugin subprocess spawned `dtrace` (a universal binary) as x86. An issue occurred when `dtrace` tried to attach to a traced subprocess that was ARM-only, which makes sense given that DTrace was running as x86.
Copied from that post, the stack of subprocesses and compile/runtime architectures involved looks like this:
```
my shell (zsh): ARM-only, run as ARM
 \_ cargo: x86-only, run as x86
    \_ ~/.cargo/bin/flamegraph: ARM-only, run as ARM
       \_ sudo: universal, run as x86
          \_ dtrace: universal, run as x86
             \_ program being profiled: ARM-only, run as ARM
```
Fixing the issue involved doing either one of:

- Installing `cargo` as an ARM binary instead of Intel. I did this as a short-term fix and everything worked.
- Updating the code of `~/.cargo/bin/flamegraph` to run `arch -arch arm64 sudo dtrace ...` instead of just `sudo dtrace`, which would "force" the architecture of the spawned program to ARM (the shape of that change is sketched below). I put up a PR to do this in `flamegraph`, which worked when I tested it, even when using an x86-only `cargo`.
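For the curious, the shape of that second fix is small. Here's a minimal sketch in Rust; it's illustrative only, not the actual `flamegraph` code, and the `dtrace_command` helper plus its `-l` argument are hypothetical stand-ins:

```rust
use std::process::Command;

// Hypothetical helper: wrap the real invocation in `arch -arch arm64` so
// the child launches as ARM regardless of what architecture *this*
// process happens to be running as.
fn dtrace_command(dtrace_args: &[&str]) -> Command {
    let mut cmd = Command::new("arch");
    cmd.args(["-arch", "arm64", "sudo", "dtrace"]);
    cmd.args(dtrace_args);
    cmd
}

fn main() {
    // e.g. just list the available probes
    let status = dtrace_command(&["-l"])
        .status()
        .expect("failed to spawn arch/sudo/dtrace");
    println!("exited with {status}");
}
```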
That brings us to the present. Hypothesis obtained and tested, fix implemented. Case closed.
## Just One More Thing
Hang on a sec, though. Something doesn’t quite add up here. Like, observed behavior doesn’t lie, sure … but the theory as to why this behavior is occurring isn’t consistent. I was assuming that the architecture “launch preference” of subprocesses is inherited from the architecture that their parent processes are run as, as if it were an environment variable.
But that's clearly not happening, since the `flamegraph` program is and always has been ARM-only on my system. So the preference is clearly being inherited by `sudo`/`dtrace` not from their parent, but from the grandparent `cargo` process.
And if architecture preference is inherited from the grandparent, why isn’t it inherited from the great-grandparent, my ARM-only shell?
```
zac@atropos ~ ∴ file $SHELL
/opt/homebrew/bin/zsh: Mach-O 64-bit executable arm64
```
What are the actual rules here?
In this post, I’ll document my journey to enumerating the MacOS subprocess architecture preference rules both experimentally and via MacOS sources.
## Assembling the Tools
To figure out what the behavior is here, I want three programs:
- A program that spawns as a subprocess whatever arguments are passed to it, compiled as x86-only.
- The same program, compiled as ARM-only.
- A program that prints out the architecture that it’s being run as, compiled as a universal binary.
Before I start writing code, I need to make sure I can build Intel binaries in the first place. To do that, I add the x86 target to my Rust toolchain:
```
∴ rustup target add x86_64-apple-darwin
info: downloading component 'rust-std' for 'x86_64-apple-darwin'
info: installing component 'rust-std' for 'x86_64-apple-darwin'
```
I remember fifteen years ago when cross compiling required hours of work, arcane knowledge, and lots of luck. Sure is easier these days! Now let’s write some code.
First, the spawner:
```rust
use std::env;
use std::os::unix::process::ExitStatusExt;
use std::process::{exit, Command};

fn main() {
    // Skip the first argument, the name of *this* program.
    let mut args = env::args().skip(1);
    let exitstatus = Command::new(args.next().expect("At least one arg needed"))
        .args(args)
        .spawn()
        .unwrap()
        .wait()
        .unwrap();
    exit(exitstatus.into_raw());
}
```
Does it work?
```
∴ cargo run --bin spawn echo foo bar
   Compiling untitled v0.1.0 (/Users/zac/Desktop/Projects/Personal/interviewing)
    Finished dev [unoptimized + debuginfo] target(s) in 0.14s
     Running `target/debug/spawn echo foo bar`
foo bar
```
Sweet! Let's build two of them: `spawn_arm` and `spawn_intel`, where the names correspond to the architecture the spawner is compiled for, rather than the architecture it necessarily spawns processes as.
```
∴ cargo build --target=x86_64-apple-darwin --bin spawn
∴ file target/x86_64-apple-darwin/debug/spawn
target/x86_64-apple-darwin/debug/spawn: Mach-O 64-bit executable x86_64
∴ mv target/x86_64-apple-darwin/debug/spawn /usr/local/bin/spawn_intel
∴ cargo build --target=aarch64-apple-darwin --bin spawn
∴ file target/aarch64-apple-darwin/debug/spawn
target/aarch64-apple-darwin/debug/spawn: Mach-O 64-bit executable arm64
∴ mv target/aarch64-apple-darwin/debug/spawn /usr/local/bin/spawn_arm
∴ spawn_intel
thread 'main' panicked at src/bin/spawn.rs:7:30:
At least one arg needed
```
Great, two spawners on my path. Now, onto the “print my architecture” program:
```rust
fn main() {
    #[cfg(target_arch = "x86_64")]
    println!("Running as x86_64");
    #[cfg(target_arch = "aarch64")]
    println!("Running as arm64");
}
```
We'll call that program `printarch`.
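(As an aside, the standard library also exposes the compiled-for architecture as `std::env::consts::ARCH`, which would make `printarch` a one-liner. Note that it's still a compile-time constant, so each slice of a universal binary reports its own arch, same as the `cfg` version:)

```rust
fn main() {
    // ARCH is fixed at compile time: "x86_64" or "aarch64" per slice.
    println!("Running as {}", std::env::consts::ARCH);
}
```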
Now, let’s make a universal binary out of it. Cargo doesn’t natively support this yet, but the feature request thread for adding that ability has instructions on how to do it by hand.
First, we compile it for ARM and Intel:
```
∴ cargo build --target=aarch64-apple-darwin --bin printarch
   Compiling untitled v0.1.0 (/Users/zac/Desktop/Projects/Personal/interviewing)
    Finished dev [unoptimized + debuginfo] target(s) in 0.12s
∴ cargo build --target=x86_64-apple-darwin --bin printarch
   Compiling untitled v0.1.0 (/Users/zac/Desktop/Projects/Personal/interviewing)
    Finished dev [unoptimized + debuginfo] target(s) in 0.12s
```
Looks good. Now make a universal binary out of the two halves using the MacOS-supplied `lipo` tool:
```
∴ lipo -create -output printarch target/aarch64-apple-darwin/debug/printarch target/x86_64-apple-darwin/debug/printarch
∴ file printarch
printarch: Mach-O universal binary with 2 architectures: [x86_64:Mach-O 64-bit executable x86_64] [arm64:Mach-O 64-bit executable arm64]
printarch (for architecture x86_64): Mach-O 64-bit executable x86_64
printarch (for architecture arm64): Mach-O 64-bit executable arm64
∴ mv printarch /usr/local/bin/printarch
```
Awesome. Now, does `printarch` accurately answer which architecture it's run as?
```
∴ printarch
Running as arm64
∴ arch -arch arm64 printarch
Running as arm64
∴ arch -arch x86_64 printarch
Running as x86_64
```
Cool, that’s all our tools done. Let’s get to experimenting!
You might be wondering why I didn't use the `arch` tool for this. It can both print out the architecture (just `arch` with no arguments) and spawn child processes at a selected architecture via e.g. `arch -arch arm64 my_child_process`. Two reasons why I didn't do that: for the arch-printing, `arch` actually can't print its own arch when running under `arch` (that is, `arch -arch arm64 arch` fails), and for the spawning, I explicitly don't want to set the architecture of spawned processes every time; I want to observe what the default arch preferences are.
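One more note on tooling: `printarch` answers "what arch was I compiled/launched as" via compile-time `cfg`. If you instead want to ask at runtime "am I being translated by Rosetta right now?", Apple documents the `sysctl.proc_translated` sysctl for exactly that. A minimal sketch, with `sysctlbyname` declared by hand so no crates are needed:

```rust
use std::os::raw::{c_char, c_int, c_void};

extern "C" {
    // From libSystem; declared by hand to keep this dependency-free.
    fn sysctlbyname(
        name: *const c_char,
        oldp: *mut c_void,
        oldlenp: *mut usize,
        newp: *mut c_void,
        newlen: usize,
    ) -> c_int;
}

fn main() {
    let mut translated: c_int = 0;
    let mut len = std::mem::size_of::<c_int>();
    // Apple-documented semantics: 1 if this process is translated
    // (Rosetta 2), 0 if native; absent entirely on Rosetta-less systems.
    let ret = unsafe {
        sysctlbyname(
            b"sysctl.proc_translated\0".as_ptr() as *const c_char,
            &mut translated as *mut c_int as *mut c_void,
            &mut len,
            std::ptr::null_mut(),
            0,
        )
    };
    match (ret, translated) {
        (0, 1) => println!("Running under Rosetta 2 translation"),
        (0, _) => println!("Running natively"),
        _ => println!("sysctl.proc_translated unavailable; assuming native"),
    }
}
```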
## Experiments
We know the “default” arch (presumably because it’s native, or because my shell’s ARM) is ARM. Let’s check the spawners:
```
zac@atropos ~ ∴ spawn_intel printarch
Running as x86_64
zac@atropos ~ ∴ spawn_arm printarch
Running as arm64
```
As expected. What about a sandwich of “intel -> arm -> universal”?
```
∴ spawn_intel spawn_arm printarch
Running as x86_64
```
That’s the bastard! The architecture preference is inherited from the grandparent, not the parent.
But I can force it back with `arch`, right?
```
∴ spawn_intel arch -arch arm64 printarch
Running as arm64
```
Right. Now onto the interesting stuff. Does `arch` reset that secret architecture preference for all processes below it, or what? Let's try "intel spawner -> arch to force arm64 -> arm spawner -> universal":
```
∴ spawn_intel arch -arch arm64 spawn_arm printarch
Running as arm64
```
Huh. Okay, it seems like `arch` does reset the architecture preference. In other words, subprocesses looking "up" for a preference of what arch to launch as will look "up" until they find `arch` or … what, exactly? It couldn't be just "until they find `arch` or an Intel binary", right?
```
∴ arch -arch arm64 spawn_arm spawn_intel printarch
Running as x86_64
```
… right?
Okay, that’s just lame. And you really can’t reset it? Not even With Feeling (tm)?
```
∴ arch -arch arm64 spawn_arm spawn_intel spawn_arm spawn_arm spawn_arm spawn_arm spawn_arm spawn_arm printarch
Running as x86_64
```
Huh. Huh.
And just for completeness, "`arch` sandwiches" don't change things?
```
∴ spawn_intel arch -arch arm64 spawn_arm spawn_intel arch -arch arm64 spawn_arm spawn_intel printarch
Running as x86_64
```
As expected, given the weird rules we’re operating under. What were those again?
## Observed Rules
This is an example of what we in the business refer to with the advanced software engineering term *dumb as shit behavior*.
In short, this is the asymmetry that’s confusing:
```
∴ spawn_intel spawn_arm printarch
Running as x86_64
∴ spawn_arm spawn_intel printarch
Running as x86_64
```
The rules for launching a universal binary on an ARM Mac appear to be: search up the process tree, nearest (parent) first, and at each PID:

1. If at PID 1, use its architecture (the native arch, unless someone really crazy is running launchd via Rosetta somehow).
2. If the PID is `arch -arch $something`, use the `$something` architecture.
3. If the PID is an `x86_64` binary, use `x86_64`.
4. Else, check the next PID up the tree.
Step 3 is the weird bit. Without step 3, I'd say that behavior is defensible.{{< sidenote >}}It would arguably also be defensible if `arch -arch` overriding didn't propagate and `arch` wrapping was needed each time, but I suspect that would have created lots of "calico" (and therefore either hard-to-debug or outright broken) environments back when ARM was new, when people used suites of multiple programs that ran in lots of shell/interpreter wrapping layers.{{< /sidenote >}}
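To make that rule set concrete, here it is transcribed as a tiny decision function. This is purely illustrative pseudocode-in-Rust over a hypothetical ancestor list, not any real MacOS API:

```rust
// Illustrative only: the observed launch rules for a universal binary on an
// ARM Mac, encoded as a walk up a hypothetical parent-process chain.
#[allow(non_camel_case_types)]
#[derive(Clone, Copy, PartialEq, Debug)]
enum Arch {
    Arm64,
    X86_64,
}

enum Ancestor {
    // An `arch -arch $something` wrapper: forces an explicit preference.
    ArchCommand(Arch),
    // An ordinary binary; an x86_64-only one taints everything below it.
    Binary(Arch),
}

// Walk ancestors nearest-parent-first; fall back to PID 1's (native) arch.
fn launch_arch(ancestors: &[Ancestor], pid1_arch: Arch) -> Arch {
    for ancestor in ancestors {
        match ancestor {
            Ancestor::ArchCommand(forced) => return *forced, // step 2
            Ancestor::Binary(Arch::X86_64) => return Arch::X86_64, // step 3
            Ancestor::Binary(_) => continue, // step 4: keep climbing
        }
    }
    pid1_arch // step 1
}

fn main() {
    use Ancestor::*;
    use Arch::*;

    // spawn_intel -> spawn_arm -> printarch: tainted to x86_64.
    assert_eq!(launch_arch(&[Binary(Arm64), Binary(X86_64)], Arm64), X86_64);

    // spawn_intel -> arch -arch arm64 -> spawn_arm -> printarch: forced back.
    assert_eq!(
        launch_arch(&[Binary(Arm64), ArchCommand(Arm64), Binary(X86_64)], Arm64),
        Arm64
    );

    println!("rule sketch matches the observations above");
}
```

The asserts mirror the `spawn_intel spawn_arm printarch` and `spawn_intel arch -arch arm64 spawn_arm printarch` observations from earlier.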
## Is it just an `x86_64` thing?
How about other non-native architectures? Unfortunately, the only non-native architecture supported by Rosetta 2 on an M1 Mac is `x86_64`, so running a 32-bit x86 program fails no matter how it is launched. Similarly, 32-bit ARM has never been supported on any Mac as far as I know, so that's right out.
I could test this behavior on older environments:

- Rust toolchains built against system frameworks pre-XCode-13
- Pre-MacOS-11 environments
- A 64-bit Intel Mac running a Rust toolchain from before the removal of 32-bit Intel targets, to see if `x86`/`x86_64` mismatches trigger the same weirdness
…but I'm not gonna. Because lazy. Also, those are rarer environments than my relatively mainstream M1 Mac. Unlike the hardware mismatch between ARM and Intel, the rarity of each of those environments can be resolved with updates, so their presence will diminish much faster with time than the existence of `x86_64` Macs and `x86_64` binaries.
## Other methods of overriding arch-selection: `ARCHPREFERENCE`
`man arch` documents the `ARCHPREFERENCE` environment variable, which can be used to override architecture launch-attempt precedence for universal binaries. At first I was confused about why this env variable was not working, but upon closer reading of the manpage, it is only consulted in the presence of `arch`-the-command. So this works:
```
∴ ARCHPREFERENCE=x86_64,arm64 arch printarch
Running as x86_64
```
…but this doesn’t:
```
∴ ARCHPREFERENCE=x86_64,arm64 printarch
Running as arm64
```
I wonder which takes higher precedence: the presence of an `x86_64` binary in the parent-PID chain, or the environment variable?
```
∴ ARCHPREFERENCE=arm64,x86_64 arch spawn_intel printarch
Running as x86_64
```
Okay, Intel binaries take precedence over … uh … the precedence. They don't poke "through" `arch`, presumably?
```
∴ ARCHPREFERENCE=arm64,x86_64 spawn_intel arch printarch
Running as arm64
```
Okay, so the above theory is still sound. `ARCHPREFERENCE` only affects the behavior of the `arch` command, which doesn't seem to taint the PID chain for (or against) Intel any differently than our home-rolled `spawn_intel` does. Just to be doubly sure of that, let's make a universal spawner, run it as Intel, and make sure that it pollutes the spawn tree in the same way:
```
∴ lipo -create -output spawn_universal $(which spawn_arm) $(which spawn_intel)
∴ arch -arch x86_64 spawn_universal printarch
Running as x86_64
∴ arch -arch x86_64 spawn_universal spawn_arm printarch
Running as x86_64
∴ arch -arch x86_64 spawn_universal arch -arm64 spawn_arm printarch
Running as arm64
```
Blech. Okay.
## Other methods of overriding arch-selection: Symlinks to `arch`
There's a third way of overriding architecture selection externally (according to the manpage):
- Rename your universal executable or move it out of your `PATH`.
- Make a `.plist` file whose name matches your original binary's prefix in one of several locations; I chose `~/Library/archSettings/printarch.plist`. That `.plist` should contain a precedence list of architecture-launch values, as well as the full path to your new/renamed universal binary, like this:
```xml
<plist version="1.0">
<dict>
    <key>ExecutablePath</key>
    <string>/path/to/renamed/or/moved/printarch</string>
    <key>PreferredOrder</key>
    <array>
        <string>x86_64</string>
        <string>arm64</string>
    </array>
    <key>PropertyListVersion</key>
    <string>1.0</string>
</dict>
</plist>
```
- Make a symlink somewhere on your `PATH`, with the same name as your original binary, pointing at the `arch` binary itself, e.g. `ln -s $(which arch) /usr/local/bin/printarch`.
This works, and always runs `printarch` in `x86_64` mode:
```
∴ printarch
Running as x86_64
∴ spawn_arm printarch
Running as x86_64
∴ arch -arch arm64 printarch
arch: posix_spawnattr_setbinpref_np only copied 4 of 5
```
At first I was confused as to how to avoid the error in that last command and force the `.plist`-overridden command back to ARM, especially considering I had copied the `.plist` file verbatim (changing only `ExecutablePath`) from the example in `man arch`. However, upon re-reading that same manpage, I saw the following architecture list:
```
i386     32-bit intel
x86_64   64-bit intel
x86_64h  64-bit intel (haswell)
arm64    64-bit arm
arm64e   64-bit arm (Apple Silicon)
```
Even though at least one of those is totally unavailable on my system, and even though the example `.plist` contains `<string>arm64</string>`, and even though literally nothing else in any component of this investigation cares about the distinction between `arm64` and `arm64e`, I figured I'd try it, because it said "Apple Silicon" in big friendly letters, and because why not?
After changing `<string>arm64</string>` to `<string>arm64e</string>` in `~/Library/archSettings/printarch.plist`, I get:
```
∴ arch -arch arm64 printarch
Running as arm64
```
So the explicit commandline flag takes precedence over the `.plist` file, so long as the `.plist` file says `arm64e`.
Just for added confusion and hilarity, it turns out that (regardless of what the `.plist` says) specifying `arm64e` on the commandline is just silently ignored,{{< sidenote >}}My understanding is that `rustc` just won't emit programs at the special `arm64e` architecture, and that if I pistol-whipped the tier 3 support for `arm64e` into place, some of this confusion would disappear. But I'm not holding my breath.{{< /sidenote >}} even though that `posix_spawnattr_setbinpref_np` error occurs if it's not set in the `.plist`:
```
∴ arch -arch arm64e printarch
Running as x86_64
```
I am … not a fan of this arch-overriding-at-a-distance system. Fundamentally, the pattern of "stash or rename your binary and then make a symlink to a magic system program in its place, which indirects architecture selection via a specially-named magic prefs file" is extremely unusual, and frustrating due to its `PATH`-dependence. And that's before we consider the fact that the tooling is inconsistent with regards to architecture naming (hell, triply inconsistent if we count both the `arm64`/`arm64e` annoyance and the broadly silly `arm64`/`aarch64` Linux/Mac nomenclatural fracas), the documentation is sparse, and what documentation exists contains ambiguous verbiage, options that don't exist for the hardware, and incorrect examples.
Better alternatives exist: `xattrs` on binaries (or the directories containing them), environment variables that take effect on any `exec` (not just the one performed by `arch`), or the placement of binaries on paths rooted in specific (hardcoded or system-wide-preference-specified) directories would all have been simpler and easier-to-understand ways of addressing the problem of persistent architecture-selection overrides.
## Finding Proof
Anyway, back to the core bad behavior: the "tainting" of process trees by the presence of a single `x86_64` binary.
Nothing in the `arch` manual discusses this behavior. Similarly, the terse Rosetta 2 developer docs don't discuss it either. There's some additional discussion of setting architecture priority and forcing Rosetta 2 on or off in the universal binary docs, but all of that seems to deal with yet another system for architecture-preference overrides, this time one that only works for `.app` bundles rather than free-floating binaries. That system seems likely to be the one that drives the "Open using Rosetta" Finder checkbox in the info panel of `.app`s. That's not relevant to our CLI-only struggles, though.
So is that it? We settle for observed behavior and call this theory confirmed? It's not like we can check the source code of a closed-source OS, right?
…right?
## Into the XNUexus
Thanks to a sterling comment on my questions related to this on StackOverflow, I was directed to the MacOS kernel source, which is open, despite most of the MacOS userland and drivers being closed source. In addition to the complex legal history of Mach, XNU, and Darwin making this openness possible, I also think it makes business sense: Apple's competitive edge is largely to be found in their hardware and their GUI/application userland. As a result, their drivers and much of userspace are closed-source. But is Apple in any way competitively disadvantaged by putting (or, more accurately, leaving) their kernel under a FOSS license? I don't think so.
Reading through the docs, I found that when `arch(1)` launches the program it wraps, at the `libc` level the functions `posix_spawnattr_getbinpref_np(3)` and `posix_spawnattr_setbinpref_np(3)` are called before `exec(3)`/`posix_spawn(2)` to modify MacOS-specific process-launch flags. However, those functions' documentation doesn't say anything about where a process's default value for the spawn binpref comes from.
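To ground that, here's roughly the dance `arch` appears to perform at spawn time, sketched in Rust against the Darwin signatures documented in those manpages. The extern declarations are hand-written, the constant comes from `<mach/machine.h>`, and `printarch` is our universal test binary from earlier. Consider it a sketch under those assumptions, not `arch`'s actual source:

```rust
use std::ffi::CString;
use std::os::raw::{c_char, c_int, c_void};
use std::ptr;

// On Darwin, posix_spawnattr_t is an opaque pointer (void *) and
// cpu_type_t is an integer. From <mach/machine.h>:
// CPU_TYPE_X86_64 = CPU_TYPE_X86 | CPU_ARCH_ABI64.
type PosixSpawnAttr = *mut c_void;
type CpuType = c_int;
const CPU_TYPE_X86_64: CpuType = 7 | 0x0100_0000;

extern "C" {
    fn posix_spawnattr_init(attr: *mut PosixSpawnAttr) -> c_int;
    fn posix_spawnattr_setbinpref_np(
        attr: *mut PosixSpawnAttr,
        count: usize,
        pref: *mut CpuType,
        ocount: *mut usize,
    ) -> c_int;
    fn posix_spawnp(
        pid: *mut c_int,
        file: *const c_char,
        file_actions: *const c_void,
        attrp: *const PosixSpawnAttr,
        argv: *const *const c_char,
        envp: *const *const c_char,
    ) -> c_int;
    fn waitpid(pid: c_int, status: *mut c_int, options: c_int) -> c_int;
}

fn main() {
    unsafe {
        let mut attr: PosixSpawnAttr = ptr::null_mut();
        assert_eq!(posix_spawnattr_init(&mut attr), 0);

        // Ask the kernel to prefer the x86_64 slice of the spawned binary.
        // ocount reports how many of our preferences were actually copied,
        // presumably the source of that "only copied 4 of 5" error earlier.
        let mut pref = [CPU_TYPE_X86_64];
        let mut copied: usize = 0;
        assert_eq!(
            posix_spawnattr_setbinpref_np(&mut attr, pref.len(), pref.as_mut_ptr(), &mut copied),
            0
        );
        assert_eq!(copied, 1);

        // Spawn printarch (searched on PATH) with an empty environment,
        // then wait for it so its output lands before we exit.
        let prog = CString::new("printarch").unwrap();
        let argv = [prog.as_ptr(), ptr::null()];
        let mut pid: c_int = 0;
        assert_eq!(
            posix_spawnp(&mut pid, prog.as_ptr(), ptr::null(), &attr, argv.as_ptr(), ptr::null()),
            0
        );
        let mut status: c_int = 0;
        waitpid(pid, &mut status, 0);
    }
}
```

If the binpref attribute really is the mechanism in play, running this should launch `printarch` as x86_64 no matter what architecture the spawning process itself runs as.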