The Architecture of Annoyance: Diagnosing issues involving Cargo and DTrace on MacOS
This post is a lengthy narration of some debugging/resolution I did involving MacOS, `dtrace`, Rust, and cross-architecture binary execution. It has deliberately not been edited for brevity by removing red herrings (threads I pulled on that turned out to be unrelated), both because those other threads might be useful to some readers, and because I think it is important to explain your mistakes as well as your successes.
If you’re having a similar issue to the one discussed here and just want it fixed, jump to the tl;dr at the end.
The problem #
I wanted to use flamegraph-rs to profile a program on MacOS 13. When I followed the quick-start guide, I ended up with an error:
> cargo install flamegraph
> cargo flamegraph --root --bin profile-me
Finished release [optimized + debuginfo] target(s) in 0.00s
dtrace: invalid probe specifier profile-997 /pid == $target/ { @[ustack(100)] = count(); }: "/usr/lib/dtrace/darwin.d", line 26: syntax error near "uthread_t"
failed to sample program
That was annoying. I spend ~~too much~~ a lot of time customizing my pathing and shell environment, so I hypothesized that it might be a misconfiguration of my laptop. No such luck; after backing out changes and trying with a more-or-less pristine shell profile, the issue remained.
SIP #
I then double checked my SIP status. I've used `dtrace` (which `flamegraph-rs` is relying on for its profiling ingest) before on this laptop, and know that it needs certain parts of SIP disabled (at least, it does in order to work fully with all probes and with all binaries on the host), so I had done that previously. Had it been subsequently re-enabled due to an OS reinstall or upgrade?
> csrutil status
System Integrity Protection status: disabled.
Nope, still disabled.
Research: DTrace `mach_kernel` symbols #
Next, I tried Google. `invalid probe specifier profile dtrace` turned up some promising issues (1, 2) with `flamegraph-rs`. Those seemed to be related to the `flamegraph` CLI, and mitigable by passing certain arguments to bypass buggy code. No such luck; the error remained.
`syntax error near "uthread_t"` yielded a couple of hits, including one person on the hellsite having the same issue. No fixes, though.
Giving up on a path that wasn't returning more than handfuls of search results per query, I opened `/usr/lib/dtrace/darwin.d`. This appears to be the dtrace stdlib equivalent for MacOS. While I don't understand the syntax in detail, the code looked relatively straightforward. The error was on line 26, right at the top of the very large file:
> sed -n 26p /usr/lib/dtrace/darwin.d
inline uthread_t uthread = (mach_kernel`uthread_t)(&curthread[1]); /* ` */
That appeared to be looking up a symbol in `mach_kernel`'s … namespace? Import? Library? Not sure of the DTrace-specific term here, but that definitely seemed to be addressing the `uthread_t` symbol in the `mach_kernel` container, which was not defined in `/usr/lib/dtrace/darwin.d`.
As a brief digression: at this point I noticed that both `flamegraph-rs` and my Rust distribution were a lot newer than that file, so I figured it couldn't hurt to upgrade MacOS to Sonoma, version 14. I didn't expect this would resolve the issue, but I upgraded anyway just to be on the safe side and get most of the tooling onto versions of platform components that it had likely been developed and tested against.
The upgrade went without issue, and sure enough, the problem remained. Back to the DTrace files. Googling `dtrace mach_kernel uthread_t` turned up a blast-from-the-past forum thread from the bad old OSX 10.5 hackintosh days. There was a time when I visited insanelymac forums daily to try to stabilize my highly questionable OSX-where-it-didn't-belong installations.
Anyway, the thread indicated that the `mach_kernel` referenced in DTrace's stdlib was indeed a sort of intrinsic import (aka what would be in the prelude in Rust, or builtins in Python). DTrace allows you to list all such available symbols:
> sudo dtrace -l | wc -l
482916
Neat! Do any of them contain `mach_kernel`?
> sudo dtrace -l | grep mach_kernel | wc -l
0
The forum users in 2007 indicated that they had resolved missing-symbol issues by placing `mach_kernel.ctfsys` on their filesystem such that the symbols could be resolved. Did I have that?
> sudo find / | grep mach_kernel.ctfsys | wc -l
0
Nope. OK Google, how do I get it? Yet again, lots of results from the early 2000s. Some folks indicated that if you'd deleted that file you shouldn't reboot your Mac or it would fail to boot because you had ~~ripped out its heart~~ removed its kernel. Others indicated that you had to reveal the hidden file. But none of their suggestions worked for me; no surprise, given that nearly two decades of OS changes had taken place since those suggestions were authored.
So why weren't the kernel symbols available? Searching down that rabbit hole led me to this thread, which tersely discusses DTrace `mach_kernel` symbols' absence on an older version of MacOS, Monterey. Apparently that version of MacOS stopped shipping with some developer tools that contained kernel symbols for import use by DTrace.
Despite the older version of MacOS under discussion, this firmed up my hypothesis that MacOS at some point stopped automatically including libraries/headers needed for DTrace to function in some important ways, and probably still did not include them as of MacOS 13/14. Apparently those files are available in the "Kernel Debug Kit", or KDK, package from Apple. That package requires an Apple developer account (the free kind, not the $100 kind) to download and install, which is annoying: why can't it be included with the Xcode Command Line Tools, or the full distribution of Xcode? Regardless, this was still an easy solution to try.
I created a MacOS developer account and installed the KDK, then tried symbol enumeration again:
> sudo dtrace -l | grep mach_kernel | wc -l
7018
That looked a lot better. Did `flamegraph-rs` work?
> cargo flamegraph --root --bin profile-me
Finished release [optimized + debuginfo] target(s) in 0.01s
dtrace: invalid probe specifier profile-997 /pid == $target/ { @[ustack(100)] = count(); }: "/usr/lib/dtrace/darwin.d", line 26: syntax error near "uthread_t"
failed to sample program
Nope. That was frustrating. Was DTrace only partially able to resolve those symbols? Did it need some sort of update to its search path?
Peeling away layers: direct DTrace invocation #
It was time to get more info about how `flamegraph-rs` was invoking DTrace. Maybe verbose mode would help?
> cargo flamegraph --verbose --root --bin profile-me
build command: "cargo" "build" "--release" "--bin" "profile-me" "--message-format=json-render-diagnostics"
Finished release [optimized + debuginfo] target(s) in 0.00s
command "sudo" "dtrace" "-x" "ustackframes=100" "-n" "profile-997 /pid == $target/ { @[ustack(100)] = count(); }" "-o" "cargo-flamegraph.stacks" "-c" "/Users/zac/profile_me/target/release/profile-me"
dtrace: invalid probe specifier profile-997 /pid == $target/ { @[ustack(100)] = count(); }: "/usr/lib/dtrace/darwin.d", line 26: syntax error near "uthread_t"
failed to sample program
Nice, a copy of the `sudo dtrace ...` command it was running internally. I tried that command manually:
> sudo dtrace -x ustackframes=100 -n 'profile-997 /pid == $target/ { @[ustack(100)] = count(); }' -o cargo-flamegraph.stacks -c /Users/zac/profile_me/target/release/profile-me
dtrace: description 'profile-997 ' matched 1 probe
123
dtrace: pid 10042 has exited
Now that was interesting. It worked correctly when invoked manually, but failed when run via `cargo flamegraph`. This started feeling a lot more like the first thing I suspected (my customized shell environment getting in the way), and less like an issue around obscure kernel symbol packages (spoiler: the KDK/kernel symbols were indeed a red herring. See the bottom of this post for a tl;dr on how to fix this issue).
What’s the difference? #
I looked around the `flamegraph-rs` source and didn't see it obviously messing with any env variables for the `std::process::Command` object through which it spawned `sudo dtrace`, so it theoretically should have been invoked more or less the same way as my `sudo dtrace` shell command. There were a few ways I could think of that `cargo flamegraph`'s invocation of `dtrace` could be different from my own:
- `sudo dtrace` was invoked from `cargo`, a program other than an interactive shell.
- `sudo dtrace` was invoked in a directory different from the place I ran it in my shell.
- `sudo dtrace` was invoked with a customized environment, and my cursory search of the `flamegraph-rs` source code had missed where that was happening.
- `sudo` did something weird/different when invoked via a program other than an interactive shell, and that difference propagated to the `sudo`-invoked `dtrace` program and messed it up.
- Something else (sandboxing, `chroot`ing, setuid, non-`CLOEXEC` file descriptors passed to the spawned process, and so on).
Does invoking `sudo dtrace` from something other than a shell cause issues? #
I tackled those in order. For the first one, I wrote a simple Rust program that ran the command verbatim, largely copied from the `flamegraph-rs` source:
use std::process::Command;
fn main() {
    let mut command = Command::new("sudo");
    command.arg("dtrace");
    let dtrace_script = format!(
        "profile-{} /pid == $target/ \
        {{ @[ustack(100)] = count(); }}",
        997
    );
    command.arg("-x");
    command.arg("ustackframes=100");
    command.arg("-n");
    command.arg(&dtrace_script);
    command.arg("-o");
    command.arg("cargo-flamegraph.stacks");
    command.arg("-c");
    command.arg("/Users/zac/profile_me/target/release/profile-me");
    let exit_status = command.spawn().unwrap().wait().unwrap();
    println!("Success? {}", exit_status.success());
}
Sure enough, that repro’d:
> cargo run --bin dtrace-spawner
Finished dev [unoptimized + debuginfo] target(s) in 0.00s
Running `target/debug/dtrace-spawner`
dtrace: invalid probe specifier profile-997 /pid == $target/ { @[ustack(100)] = count(); }: "/usr/lib/dtrace/darwin.d", line 26: syntax error near "uthread_t"
Success? false
What the hell is special about `cargo`? #
I'd removed the `flamegraph-rs` layer of wrapping. How about `cargo` itself? I ran the program that spawned `dtrace` manually:
> ./target/debug/dtrace-spawner
dtrace: description 'profile-997 ' matched 1 probe
123
dtrace: pid 20984 has exited
Success? true
Ah ha! Something about `cargo`'s internal invocation machinery (of cargo plugins or of regular package binaries) was the culprit.
A different environment #
`cargo` verbose mode, tell me what you're doing?
> cargo run -vv --bin dtrace-spawner
...irrelevant
Finished dev [unoptimized + debuginfo] target(s) in 0.00s
Running `CARGO=/Users/zac/.rustup/toolchains/nightly-aarch64-apple-darwin/bin/cargo CARGO_MANIFEST_DIR=/path/to/project CARGO_PKG_AUTHORS='' CARGO_PKG_DESCRIPTION='' CARGO_PKG_HOMEPAGE='' CARGO_PKG_LICENSE='' CARGO_PKG_LICENSE_FILE='' CARGO_PKG_NAME=untitled CARGO_PKG_README='' CARGO_PKG_REPOSITORY='' CARGO_PKG_RUST_VERSION='' CARGO_PKG_VERSION=0.1.0 CARGO_PKG_VERSION_MAJOR=0 CARGO_PKG_VERSION_MINOR=1 CARGO_PKG_VERSION_PATCH=0 CARGO_PKG_VERSION_PRE='' DYLD_FALLBACK_LIBRARY_PATH='/path/to/project/target/debug/deps:/path/to/project/target/debug:/Users/zac/.rustup/toolchains/nightly-aarch64-apple-darwin/lib/rustlib/aarch64-apple-darwin/lib:/Users/zac/.rustup/toolchains/nightly-aarch64-apple-darwin/lib:/Users/zac/lib:/usr/local/lib:/usr/lib' target/debug/dtrace-spawner`
dtrace: invalid probe specifier profile-997 /pid == $target/ { @[ustack(100)] = count(); }: "/usr/lib/dtrace/darwin.d", line 26: syntax error near "uthread_t"
Success? false
Okay, it was definitely setting a bunch of env variables that could change subcommand behavior. Most of them looked related to Cargo itself, though `DYLD_FALLBACK_LIBRARY_PATH` looked suspicious. Before I got into that, though, I wanted to know: was cargo allowing pre-existing env variables from my shell through, or was it running the internal command with an empty env (as if it were wrapped in `env -i`)?
I updated my program's `main` to dump the env (sure, I could have used `std::env`, but this was quicker):
let mut command = Command::new("env");
command.spawn().unwrap().wait().unwrap();
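(For completeness, the `std::env` route would have been about the same amount of code. An untested sketch, not what I actually ran:)

fn main() {
    // Same idea as spawning `env`, but in-process. Note that env::vars()
    // panics on non-Unicode values; env::vars_os() avoids that.
    let mut vars: Vec<(String, String)> = std::env::vars().collect();
    vars.sort();
    for (key, value) in vars {
        println!("{key}={value}");
    }
}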
Then I compared the output of that command when run on its own and inside `cargo run`:
> diff <(cargo run --quiet --bin print-env | sort) <(./target/debug/print-env | sort)
1,17d0
< CARGO=/Users/zac/.rustup/toolchains/nightly-aarch64-apple-darwin/bin/cargo
< CARGO_HOME=/Users/zac/.cargo
< CARGO_MANIFEST_DIR=/path/to/project
< CARGO_PKG_AUTHORS=
< CARGO_PKG_DESCRIPTION=
< CARGO_PKG_HOMEPAGE=
< CARGO_PKG_LICENSE=
< CARGO_PKG_LICENSE_FILE=
< CARGO_PKG_NAME=untitled
< CARGO_PKG_README=
< CARGO_PKG_REPOSITORY=
< CARGO_PKG_RUST_VERSION=
< CARGO_PKG_VERSION=0.1.0
< CARGO_PKG_VERSION_MAJOR=0
< CARGO_PKG_VERSION_MINOR=1
< CARGO_PKG_VERSION_PATCH=0
< CARGO_PKG_VERSION_PRE=
24d6
< DYLD_FALLBACK_LIBRARY_PATH=/path/to/project/target/debug/deps:/path/to/project/target/debug:/Users/zac/.rustup/toolchains/nightly-aarch64-apple-darwin/lib/rustlib/aarch64-apple-darwin/lib:/Users/zac/.rustup/toolchains/nightly-aarch64-apple-darwin/lib:/Users/zac/lib:/usr/local/lib:/usr/lib
62,64d43
< RUSTUP_HOME=/Users/zac/.rustup
< RUSTUP_TOOLCHAIN=nightly-aarch64-apple-darwin
< RUST_RECURSION_COUNT=1
84c63
< __CF_USER_TEXT_ENCODING=0x1F5:0x0:0x0
---
> __CF_USER_TEXT_ENCODING=0x0:0:0
Okay, that confirmed that `cargo run` wasn't internally stripping things out of the environment. I was actually a little bit surprised by that, since I could see a build system wanting to maximize determinism in that way. Regardless, my leading suspect for the issue was now the fact that `cargo run` was setting `DYLD_FALLBACK_LIBRARY_PATH`, where my regular shell didn't have it set at all. Could that library search path be affecting how and where `dtrace` was able to access the symbols it needed for tracing?
I tried running `dtrace` standalone with just that env set:
> export DYLD_FALLBACK_LIBRARY_PATH=/path/to/project/target/debug/deps:/path/to/project/target/debug:/Users/zac/.rustup/toolchains/nightly-aarch64-apple-darwin/lib/rustlib/aarch64-apple-darwin/lib:/Users/zac/.rustup/toolchains/nightly-aarch64-apple-darwin/lib:/Users/zac/lib:/usr/local/lib:/usr/lib
> sudo dtrace -x ustackframes=100 -n 'profile-997 /pid == $target/ { @[ustack(100)] = count(); }' -o cargo-flamegraph.stacks -c /Users/zac/profile_me/target/release/profile-me
dtrace: description 'profile-997 ' matched 1 probe
123
dtrace: pid 24926 has exited
Damn. It worked with `DYLD_FALLBACK_LIBRARY_PATH` overridden, so that wasn't it. Could it be another environment variable? I tried forcibly setting all of the variables I had observed in `cargo run` but not my regular shell before running the working `dtrace` invocations:
> export CARGO=/Users/zac/.rustup/toolchains/nightly-aarch64-apple-darwin/bin/cargo
> export CARGO_HOME=/Users/zac/.cargo
> export CARGO_MANIFEST_DIR=/path/to/project
> export CARGO_PKG_AUTHORS=
> export CARGO_PKG_DESCRIPTION=
> export CARGO_PKG_HOMEPAGE=
> export CARGO_PKG_LICENSE=
> export CARGO_PKG_LICENSE_FILE=
> export CARGO_PKG_NAME=untitled
> export CARGO_PKG_README=
> export CARGO_PKG_REPOSITORY=
> export CARGO_PKG_RUST_VERSION=
> export CARGO_PKG_VERSION=0.1.0
> export CARGO_PKG_VERSION_MAJOR=0
> export CARGO_PKG_VERSION_MINOR=1
> export CARGO_PKG_VERSION_PATCH=0
> export CARGO_PKG_VERSION_PRE=
> export DYLD_FALLBACK_LIBRARY_PATH=/path/to/project/target/debug/deps:/path/to/project/target/debug:/Users/zac/.rustup/toolchains/nightly-aarch64-apple-darwin/lib/rustlib/aarch64-apple-darwin/lib:/Users/zac/.rustup/toolchains/nightly-aarch64-apple-darwin/lib:/Users/zac/lib:/usr/local/lib:/usr/lib
> export RUSTUP_HOME=/Users/zac/.rustup
> export RUSTUP_TOOLCHAIN=nightly-aarch64-apple-darwin
> export RUST_RECURSION_COUNT=1
> export __CF_USER_TEXT_ENCODING=0x1F5:0x0:0x0
> ./target/debug/dtrace-spawner
dtrace: description 'profile-997 ' matched 1 probe
123
dtrace: pid 28382 has exited
Success? true
> sudo dtrace -x ustackframes=100 -n 'profile-997 /pid == $target/ { @[ustack(100)] = count(); }' -o cargo-flamegraph.stacks -c /Users/zac/profile_me/target/release/profile-me
dtrace: description 'profile-997 ' matched 1 probe
123
dtrace: pid 28464 has exited
So it wasn't the environment at all. Bummer. What else could be different about `cargo run`? Barring wacky stuff like sandboxing (is `seccomp` or `pledge` even a thing on MacOS? Would it be anything short of insane for `cargo` to do that even if it were?), the first thing I could think of was the working directory.
Are the working directories the same? #
I updated my `main` to print out the directory via `println!("{:?}", std::env::current_dir())`, but it was the same (and correctly inherited from the shell) inside and outside `cargo run`.
Is it `sudo`? #
This was getting really weird. Why would `dtrace` malfunction when called inside `cargo`, but not when called from any other program?
Ah, but I wasn't running `dtrace`. I was running `sudo dtrace`. What if I moved `sudo` into the shell, and invoked `dtrace` directly in my command? I converted my dtrace invoker to not prepend `sudo`, like so:
use std::process::Command;
fn main() {
    let mut command = Command::new("dtrace");
    let dtrace_script = format!(
        "profile-{} /pid == $target/ \
        {{ @[ustack(100)] = count(); }}",
        997
    );
    command.arg("-n");
    command.arg(&dtrace_script);
    ...
…and invoked it via `cargo run`. Sure enough, it failed with `DTrace requires additional privileges`. That was expected; let's try with `sudo`:
> sudo cargo run --bin dtrace-spawner
dtrace: invalid probe specifier profile-997 /pid == $target/ { @[ustack(100)] = count(); }: "/usr/lib/dtrace/darwin.d", line 26: syntax error near "uthread_t"
Success? false
Wat. It was cargo causing things to break. What was `cargo run` doing?
Breathe in, breathe out #
Now, at this point I was investigating purely because I was curious (and annoyed, stubborn, and invested). I had a working means of generating `dtrace` output for my program, and was able to feed that into the flame grapher of my choice without using `flamegraph-rs`, and if time were of the essence I could have stopped investigating this a while ago.
But that would have been ~~too pragmatic~~ no fun.
At this point I'd been at this for a while. I walked around the block, asked some friends and colleagues about the issue, Googled a bit, and gathered my thoughts. Nothing obvious emerged, but it was worth recapping where I was at:
- If I invoked `sudo dtrace ... /my/program` directly in the shell, it worked.
- If I invoked that same exact command via `process::Command` from a Rust program, it worked.
- If I invoked that `dtrace`-spawning Rust program via `cargo run`, or (I hypothesized) if `flamegraph-rs` performed that invocation of `dtrace` inside its Cargo plugin machinery, it failed. So Something (tm) was different about the `cargo` command invocation environment, and that Something (tm) propagated to any `cargo`-invoked code that itself spawned `dtrace`.
At this point I was running out of ideas. Before I started down even deeper rabbit holes (reading `cargo` source, `dtrace` source, trying on a different installation of MacOS, stuff like that), I decided to really thoroughly check my work.
Checking my assumptions: environment variables #
First assumption: the environment did not influence the behavior of `dtrace`. Rather than playing around with `export`ing stuff in my shell, I wanted to verify that very firmly by running `sudo dtrace` in a fully empty environment. I updated my Rust program to do the following:
let mut command = Command::new("sudo");
command.arg("/usr/bin/env");
command.env_clear();
let exit_status = command.spawn().unwrap().wait().unwrap();
println!("Success? {}", exit_status.success());
No environment variables whatsoever. Sure enough, that indicated very few environment variables set; only things that came from `sudo`, like `HOME`, `SUDO_USER`, and so on.
Okay, so let's try running `dtrace` with a similarly-stripped environment:
let mut command = Command::new("sudo");
command.env_clear();
command.arg("/usr/sbin/dtrace");
let dtrace_script = format!(
"profile-{} /pid == $target/ \
{{ @[ustack(100)] = count(); }}",
997
);
command.arg("-x");
command.arg("ustackframes=100");
command.arg("-n");
command.arg(&dtrace_script);
command.arg("-o");
command.arg("cargo-flamegraph.stacks");
command.arg("-c");
command.arg("/Users/zac/profile_me/target/release/profile-me");
let exit_status = command.spawn().unwrap().wait().unwrap();
println!("Success? {}", exit_status.success());
Same error.
Checking my assumptions: working directory #
Next assumption to check: the working directory was the same. Was it really? Did I truly trust `std::env::current_dir()`? I switched to using the `pwd` command instead:
let mut command = Command::new("sudo");
command.arg("/bin/pwd");
command.env_clear();
let exit_status = command.spawn().unwrap().wait().unwrap();
println!("Success? {}", exit_status.success());
Sure enough, it printed the same directory both inside and outside of `cargo run`, which matched `std::env::current_dir()`. But was it really the same directory? I was distrusting everything at this point.
// st_ino() comes from the macOS-specific MetadataExt trait.
use std::os::macos::fs::MetadataExt;
let dir = std::env::current_dir().unwrap();
let metadata = std::fs::metadata(dir).unwrap();
println!("inode: {}", metadata.st_ino());
The inode matched inside and outside of `cargo`, inside and outside of `sudo`, `sudo cargo`, etc. So no sneaky `chroot`ing.
What now? #
Before venturing into the unknown, I wanted to check one last set of ground truths: were my installations of `cargo`, `sudo`, `dtrace`, and any other programs involved in this issue correct? I started probing the installation binaries of each application I was interacting with just to make sure.
And I hit pay dirt.
To be honest, I only performed this check because I wasn't sure if (for example) I had installed a hacked-up/patched version of `cargo`/`dtrace` sometime long ago and forgotten to uninstall it. While that didn't turn out to be the case, I'm really glad I checked this.
Making sure the binaries are correct #
I’m on an M1 (that is, ARM) Mac laptop with Rosetta installed, so I can run x86 binaries too.
I started by checking the location and provenance (install date, checksum, and whether it matched other Macs I had access to) of each program involved. I'll skip most of the details as to how I did that, as they're irrelevant, but I'll include the output for probing the `file` type of each involved binary:
> file /Users/zac/profile_me/target/release/profile-me
/Users/zac/profile_me/target/release/profile-me: Mach-O 64-bit executable arm64
That means that the program I’m profiling is an ARM-only binary, which matches my MacOS architecture. Good so far…
> file $(which sudo)
/usr/bin/sudo: setuid cannot open: Permission denied
Weird, but making a "`sudo` sandwich" (not to be confused with `sudo make me a sandwich`) works:
> sudo file $(which sudo)
/usr/bin/sudo: setuid Mach-O universal binary with 2 architectures: [x86_64:Mach-O 64-bit executable x86_64] [arm64e:Mach-O 64-bit executable arm64e]
/usr/bin/sudo (for architecture x86_64): Mach-O 64-bit executable x86_64
/usr/bin/sudo (for architecture arm64e): Mach-O 64-bit executable arm64e
That's a universal (ARM- and x86-compatible) binary, so that seems OK. How about `dtrace`?
> file $(which dtrace)
/usr/sbin/dtrace: Mach-O universal binary with 2 architectures: [x86_64:Mach-O 64-bit executable x86_64] [arm64e:Mach-O 64-bit executable arm64e]
/usr/sbin/dtrace (for architecture x86_64): Mach-O 64-bit executable x86_64
/usr/sbin/dtrace (for architecture arm64e): Mach-O 64-bit executable arm64e
Same deal. And `cargo`?
> file $(which cargo)
/Users/zac/.cargo/bin/cargo: Mach-O 64-bit executable x86_64
Son of a–
There it was. The `cargo` binary was x86-only.
Putting the pieces together #
Cargo can build Rust artifacts for any of the Rust toolchains/targets you have installed, so the binary architecture of the `cargo` program itself had never mattered to me until now. But now it mattered a lot.
See, the way Rosetta/binary compatibility on MacOS works (if you have Rosetta installed, which I do) is this:
- When you launch a program, MacOS tries to run that program on the architecture of the launching environment. In other words, if you compile `/bin/bash` as the `x86_64` architecture, open a shell, and run `ls`, MacOS tries to launch an `x86_64` `ls` binary (using Rosetta to run it on the CPU if the CPU is not x86) if it can.
- If the binary in question (in this example, whichever `ls` is found first on `PATH`) does not support that preferred architecture, MacOS will run it as whatever architecture it does support.
- You can override the architecture the OS launches programs at in a couple of ways: via environment variables or by running binaries wrapped in the `arch` program. So `arch -arch arm64 ls` forces the use of `ls` as an `arm64` binary.
This gets complicated when it comes to programs that run other programs, like shells, programs that invoke other programs (like my `process::Command` usage), `sudo`, debuggers like `dtrace`… and `cargo`.
In those situations, you can think of it as if a magical environment variable were set when wrapper programs launch that says “what architecture should programs launched by this wrapper prefer to run as?”. No such variable exists, but the effect is the same, and the value is chosen based on the architecture for which the wrapper program was compiled.
Now, that imaginary variable-but-not is not a "force this architecture" bit; it merely specifies a preference. If a program-spawning binary for `x86_64` invokes a binary on the filesystem that is ARM-only, then the wrapped binary is run in ARM mode.
However, if the program-spawner invokes a "universal" binary (one which contains program code for multiple architectures), it prefers to launch that binary in the architecture matching the architecture of the spawner.
The resulting matrix of “spawner architecture/override vs what spawned binaries run as” looks like this:
| | binary is universal | binary is x86 | binary is ARM |
|---|---|---|---|
| launcher is x86 | binary run as x86 | binary run as x86 | binary run as ARM |
| launcher is ARM | binary run as ARM | binary run as x86 | binary run as ARM |
| launcher forces x86 | binary run as x86 | binary run as x86 | error |
| launcher forces ARM | binary run as ARM | error | binary run as ARM |
Note that this behavior is totally unaffected by the native hardware architecture of the system; it depends only on the architecture of the binaries being run. That is going to be important.
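If you ever want to check which cell of that table a given process actually landed in, the kernel will tell you: the `sysctl.proc_translated` sysctl reports whether the calling process is running under Rosetta translation. Here's a minimal sketch of that check, assuming the `libc` crate; I didn't need this during the investigation, it's just a handy illustration:

use std::ffi::CString;

/// Returns Some(true) if the calling process is running under Rosetta 2
/// translation, Some(false) if it is running natively, and None if the
/// sysctl doesn't exist (e.g. an Intel Mac with no Rosetta installed).
fn running_under_rosetta() -> Option<bool> {
    let name = CString::new("sysctl.proc_translated").ok()?;
    let mut value: libc::c_int = 0;
    let mut size = std::mem::size_of::<libc::c_int>() as libc::size_t;
    let rc = unsafe {
        libc::sysctlbyname(
            name.as_ptr(),
            &mut value as *mut libc::c_int as *mut libc::c_void,
            &mut size,
            std::ptr::null_mut(),
            0,
        )
    };
    if rc == 0 { Some(value == 1) } else { None }
}

fn main() {
    // For an ARM-only Rust binary this prints "aarch64" and (on my machine)
    // "Some(false)"; an x86-only binary on Apple silicon prints "Some(true)".
    println!(
        "compiled for {}, running under Rosetta: {:?}",
        std::env::consts::ARCH,
        running_under_rosetta()
    );
}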
In this case, both `sudo` and `dtrace` are universal binaries. My `/Users/zac/profile_me/target/release/profile-me` program-to-profile is ARM-only, because that's the native architecture of my Mac and what my Rust build toolchain is configured to emit by default.
And `cargo` is `x86_64`-only. (At no point here do we care about the architecture for which `cargo` builds packages: Cargo, regardless of its binary architecture, works just fine at invoking the Rust toolchain such that output Rust program binaries of any architecture (selected by the user) can be produced, cross-compiling or not. For the present issue, we are only interested in `cargo` as a command runner: i.e. when `cargo flamegraph` or `cargo run` spawn pre-existing binaries on the system, what architecture do they run as?) So the failure pathology is this:
- `cargo` is launched as an `x86_64`-only binary. Again, there's nothing otherwise special about Cargo-the-app in this scenario. It's just a program that launches other programs.
- Internally, `flamegraph-rs`'s `cargo` plugin is built and integrated with `cargo` via normal Rust binary compilation, making an ARM-only binary in `~/.cargo/bin/flamegraph`.
- `cargo flamegraph` spawns `~/.cargo/bin/flamegraph` from within the x86-only `cargo` program. That binary is ARM-only, so it is run in ARM mode.
- `~/.cargo/bin/flamegraph` then spawns `sudo`, which is a universal binary and can run as either architecture.
- MacOS, bafflingly, chooses to run `sudo` as x86 because the outer `cargo` program is also x86.
- `sudo` then runs `dtrace`, which is also a universal binary.
- DTrace is invoked as x86 because `sudo` is x86.
- DTrace then tries to attach its debugging probes to an ARM-only binary (the program being profiled).
- Because DTrace's internal kernel integration is highly architecture/symbol/binary-layout specific, this fails: x86 `dtrace` assumes it'll be attached to x86 programs, which is not the case here.
- This causes a DTrace startup error: `"/usr/lib/dtrace/darwin.d", line 26: syntax error near "uthread_t"`. In this context, that error is saying "architecture mismatch: x86 DTrace has no idea how to initialize for a running non-x86 process".
The hierarchy looks like this:
my shell (zsh): ARM-only, run as ARM
 \_ cargo: x86-only, run as x86
    \_ ~/.cargo/bin/flamegraph: ARM-only, run as ARM
       \_ sudo: universal, run as x86
          \_ dtrace: universal, run as x86
             \_ program being profiled: ARM-only, run as ARM
And because DTrace's architecture needs to match the architecture of the program it's hooked up to, a failure occurs.
The Quick Fix #
Now that I knew the issue, two options for a fix presented themselves: a quick one and a PR to `flamegraph`.
The quick fix was pretty straightforward.
First, I got rid of my x86 `cargo` and associated tooling:
> brew uninstall rustup
Uninstalling /opt/homebrew/Cellar/rustup-init/1.26.0_1... (9 files, 6.7MB)
> rm -rf ~/.rustup ~/.cargo
> cargo
zsh: command not found: cargo
> rustc
zsh: command not found: rustc
Goodbye, Rust! But not for long:
> brew update
Already up-to-date.
> brew install rustup
...
Please run `rustup-init` to initialize `rustup` and install other Rust components.
==> Summary
🍺 /opt/homebrew/Cellar/rustup-init/1.26.0_1: 9 files, 6.7MB
> rustup-init
...
`rustup-init` had a lot of output and prompts, but when it finished I had a minty-fresh `aarch64-apple-darwin` Rust toolchain installed.
But had it worked?
> file $(which cargo)
/Users/zac/.cargo/bin/cargo: Mach-O 64-bit executable arm64
Whew. Now we go all the way back to the start:
> cargo install flamegraph
> cargo flamegraph --root --bin profile-me
Finished release [optimized + debuginfo] target(s) in 0.00s
dtrace: description 'profile-997 ' matched 1 probe
Running as arm64
dtrace: pid 71881 has exited
writing flamegraph to "flamegraph.svg"
Finally! It worked! I’d post the flamegraph if it mattered, but after spending all day on this I don’t want to deal with whatever is causing my CMS’s image transformation pipeline to choke on that file. So just take my word for it.
How did I get here? #
But how did I end up with an x86-only `cargo` binary on an ARM Mac in the first place?
After poking around a bit, I discovered that my workstation provisioning scripts run using Puppet, a common automation tool. Puppet automates a bunch of stuff, like `brew install`ing packages I use, running `rustup init` and `rustup target add ...`, and so on.
It turns out that part of the invocation machinery for Puppet, called Puppet Bolt, is installed on MacOS as a binary `.dmg` file. That `.dmg` is, at the time of this writing, only available as x86, meaning that whenever I used Puppet to manage Rust/`rustup`, Puppet was running in an environment that preferred x86 execution for sub-sub-sub-processes, causing `rustup` to install `cargo` for x86 only.
Puppet and Bolt are not special here. They're just commands that run other commands, and so on down the chain. The problem is more general: the presence of an x86 invoker anywhere in the tree encourages the use (and, in the case of compilers for binaries like `cargo`, the creation) of x86 binaries for all processes "below" that invoker.
It’s quite possible (I’d even say “likely”) that many folks are affected by issues like this regardless of whether they provision with Puppet or just shell commands–after all, what if the old version of bash/zsh/$SHELL you’re using is x86 only? The same preference propagation will occur to commands it spawns.
This isn't a new class of problem. At one point, Homebrew itself had "split" its entire package management landscape into x86 and ARM versions, both of which could be present on the same computer. Which `brew` you used (and which architecture of packages it preferred to install) depended on a byzantine combination of environment variables/pathing, how you installed Homebrew originally, and what binary architecture of shell/automation you were calling `brew` from. While that's thankfully mostly a thing of the past now that most packages have good support for ARM, it still occasionally crops up with folks I do workstation support for: if they set up Homebrew/their Mac way back during the tumultuous times when ARM was brand new and Rosetta was used really heavily everywhere, they can have significant issues today trying to get things to run natively whenever possible.
Short version of the "original sin" that got me into this situation: a workstation provisioning tool ran as x86 because its providers only distribute it as such. When it ran `rustup`, the x86-only `cargo` was installed.
The less-quick fix #
To prevent other users of `flamegraph-rs` from getting into this situation, I made a PR to address the issue by forcing `dtrace` to run as the host's native architecture regardless of the architecture preference implicitly expressed by any parent processes.
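The core idea is simply to wrap the `dtrace` invocation in `arch` so that the preference chain described above gets reset. A rough sketch of the approach (not the literal code from the PR; it assumes the spawning binary was itself compiled for the host's native architecture, so `std::env::consts::ARCH` is a reasonable stand-in for "native"):

use std::process::Command;

fn main() {
    // Map Rust's compile-time arch name to the name `arch(1)` expects.
    let native_arch = match std::env::consts::ARCH {
        "aarch64" => "arm64",
        other => other, // e.g. "x86_64" passes through unchanged
    };
    // Wrapping the whole `sudo dtrace ...` invocation in `arch -arch <native>`
    // makes dtrace run natively no matter what architecture the parent
    // process (for example, an x86-only cargo) would otherwise prefer.
    let status = Command::new("arch")
        .arg("-arch")
        .arg(native_arch)
        .arg("sudo")
        .arg("dtrace")
        .args(["-x", "ustackframes=100"])
        .args(["-n", "profile-997 /pid == $target/ { @[ustack(100)] = count(); }"])
        .args(["-o", "cargo-flamegraph.stacks"])
        .args(["-c", "/Users/zac/profile_me/target/release/profile-me"])
        .status()
        .expect("failed to spawn arch/sudo/dtrace");
    println!("Success? {}", status.success());
}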
I also filed some issues:
- The issue with `flamegraph-rs` (which my PR attempts to fix).
  - (Edit) This fix was merged into `flamegraph-rs` version 0.5.1, so this particular tool is resilient to this issue. Thanks to Dirkjan Ochtman, the maintainer, for reviewing, testing, and cutting the release!
- An issue with the Puppet library that invokes `rustup`, encouraging maintainers of that library to force native architecture on `rustup` invocations.
  - (Edit) my fix was merged into `puppet-rustup` version 0.2.4, preventing Rust toolchains installed via that tool from experiencing this issue. Thanks to Daniel Parks, the maintainer, for reviewing, testing, and cutting the release!
- An issue with Puppet Bolt, encouraging its maintainers to publish ARM installers for Bolt to reduce the likelihood of future users encountering this issue.
  - (Edit) a fix was merged by the maintainers (thanks!). Artifacts built with that fix should be published to the Puppet Bolt artifact store some time soon, after which the Bolt homebrew cask formula can be updated to use them, hopefully resolving this issue once and for all.
- An issue ("Feedback Report") to Apple describing some problems I encountered when using the `ARCHPREFERENCE` environment variable to try to affect which architecture `dtrace`/`sudo` run as. In short, it seems like that variable (documented here) does not always have the effect it is documented to have on architecture preference selection.
Tl;dr and Morals of the Story #
If you're affected by some subpart of this specific issue (involving Puppet, Bolt, or `flamegraph-rs`) do whichever of the following are applicable to you in order to pull in the fixes listed above:
- Remove your `rustup` toolchain and `cargo` and reinstall them from the latest distributions available.
- Update Homebrew and your installation(s) of the `puppet-bolt` and `puppet-agent` casks.
- Update `puppet-rustup` in your Puppet/Bolt project.
- Update `flamegraph-rs` (either as a standalone Cargo-installed binary or a dependency of a Rust project).
If you find yourself in a similar situation, here are the cliff notes of what I learned:
- If `file $(which cargo)` indicates that the `cargo` binary's architecture doesn't match your Mac's architecture, problems may occur.
- If you're using cargo to run programs which themselves run programs (via `cargo run`-ing your program that runs other programs, or via cargo plugins like `flamegraph`), I'd consider it a best practice to ensure that the architecture of the `cargo` binary matches the architecture of the child (or grandchild etc.) programs being run within it.
- If you're using `cargo` to run programs that attach debuggers like DTrace to other binaries, you must ensure that the binary architecture of `cargo` matches the architecture of the program being debugged.
- If you find a mismatch in this area and are using rustup, completely remove rustup and then reinstall it, and the requisite toolchain(s), at the correct architecture.
- If `dtrace` on MacOS ever complains about syntax errors when loading its stdlib via files like `/usr/lib/dtrace/darwin.d`, this probably indicates an architecture mismatch between `dtrace` and the program you're tracing, and probably doesn't indicate a need to install additional symbols via a MacOS KDK.
- Even though Rosetta usually does the right thing and stays out of your way, it pays to understand its architecture-preference propagation rules.
But Wait! #
There’s more!
Astute readers will have noticed an inconsistency between the hypothesis regarding architecture-preference inheritance among process spawners, and the architectures of the spawners that were actually in play on my system.
Stay tuned for a sequel post (this one’s too long already) within the next week (edit: this was hilariously optimistic, it’s been two months and it’s still sitting in my drafts folder accumulating tweaks), in which I will dig into that and figure out what, exactly, the Rosetta arch-preference rules are.