llvm-project.git/lldb/source/Plugins/Process/Utility/StopInfoMachException.cpp, branch main

[lldb] Correct 32-bit build failure in StopInfoMachException.cpp (#159523)

2025-09-18T08:59:02+00:00

uintptr_t is usually a good idea when handling pointers, but lldb has to
handle 64-bit addresses that might be from a remote system, on a 32-bit
system.

So I've changed a few instances here to use addr_t which is 64-bit
everywhere.

Before we got:
https://lab.llvm.org/buildbot/#/builders/18/builds/21247

```
../llvm-project/lldb/source/Plugins/Process/Utility/StopInfoMachException.cpp:81:28: error: constexpr variable 'g_mte_tag_mask' must be initialized by a constant expression
   81 | static constexpr uintptr_t g_mte_tag_mask = (uintptr_t)0x0f << g_mte_tag_shift;
      |                            ^                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../llvm-project/lldb/source/Plugins/Process/Utility/StopInfoMachException.cpp:81:61: note: shift count 56 >= width of type 'uintptr_t' (aka 'unsigned int') (32 bits)
   81 | static constexpr uintptr_t g_mte_tag_mask = (uintptr_t)0x0f << g_mte_tag_shift;
      |                                                             ^
1 error generated.
```

Original code added by #159117.

[lldb] Recognize MTE fault Mach exceptions (#159117)

2025-09-17T18:20:52+00:00

Recognize an MTE tag fault Mach exception. A tag fault is an error
reported by Arm's Memory Tagging Extension (MTE) when a memory access
attempts to use a pointer with a tag that doesn't match the tag stored
with the memory. LLDB will print the tag and address to make the issue
easier to diagnose.

This was hand tested by debugging an MTE enabled binary on an iPhone 17
running iOS 26.

rdar://113575216

[lldb] Change lldb's breakpoint handling behavior, reland (#126988)

2025-02-13T19:30:10+00:00

lldb today has two rules: When a thread stops at a BreakpointSite, we
set the thread's StopReason to be "breakpoint hit" (regardless if we've
actually hit the breakpoint, or if we've merely stopped *at* the
breakpoint instruction/point and haven't tripped it yet). And second,
when resuming a process, any thread sitting at a BreakpointSite is
silently stepped over the BreakpointSite -- because we've already
flagged the breakpoint hit when we stopped there originally.

In this patch, I change lldb to only set a thread's stop reason to
breakpoint-hit when we've actually executed the instruction/triggered
the breakpoint. When we resume, we only silently step past a
BreakpointSite that we've registered as hit. We preserve this state
across inferior function calls that the user may do while stopped, etc.

Also, when a user adds a new breakpoint at $pc while stopped, or changes
$pc to be the address of a BreakpointSite, we will silently step past
that breakpoint when the process resumes. This is purely a UX call, I
don't think there's any person who wants to set a breakpoint at $pc and
then hit it immediately on resuming.

One non-intuitive UX from this change, butt is necessary: If you're
stopped at a BreakpointSite that has not yet executed, you `stepi`, you
will hit the breakpoint and the pc will not yet advance. This thread has
not completed its stepi, and the ThreadPlanStepInstruction is still on
the stack. If you then `continue` the thread, lldb will now stop and
say, "instruction step completed", one instruction past the
BreakpointSite. You can continue a second time to resume execution.

The bugs driving this change are all from lldb dropping the real stop
reason for a thread and setting it to breakpoint-hit when that was not
the case. Jim hit one where we have an aarch64 watchpoint that triggers
one instruction before a BreakpointSite. On this arch we are notified of
the watchpoint hit after the instruction has been unrolled -- we disable
the watchpoint, instruction step, re-enable the watchpoint and collect
the new value. But now we're on a BreakpointSite so the watchpoint-hit
stop reason is lost.

Another was reported by ZequanWu in
https://discourse.llvm.org/t/lldb-unable-to-break-at-start/78282 we
attach to/launch a process with the pc at a BreakpointSite and
misbehave. Caroline Tice mentioned it is also a problem they've had with
putting a breakpoint on _dl_debug_state.

The change to each Process plugin that does execution control is that

1. If we've stopped at a BreakpointSite that has not been executed yet,
we will call Thread::SetThreadStoppedAtUnexecutedBP(pc) to record that.
When the thread resumes, if the pc is still at the same site, we will
continue, hit the breakpoint, and stop again.

2. When we've actually hit a breakpoint (enabled for this thread or
not), the Process plugin should call
Thread::SetThreadHitBreakpointSite(). When we go to resume the thread,
we will push a step-over-breakpoint ThreadPlan before resuming.

The biggest set of changes is to StopInfoMachException where we
translate a Mach Exception into a stop reason. The Mach exception codes
differ in a few places depending on the target (unambiguously), and I
didn't want to duplicate the new code for each target so I've tested
what mach exceptions we get for each action on each target, and
reorganized StopInfoMachException::CreateStopReasonWithMachException to
document these possible values, and handle them without specializing
based on the target arch.

I first landed this patch in July 2024 via
https://github.com/llvm/llvm-project/pull/96260

but the CI bots and wider testing found a number of test case failures
that needed to be updated, I reverted it. I've fixed all of those issues
in separate PRs and this change should run cleanly on all the CI bots
now.

rdar://123942164

[lldb] Remove unfiltered stop reason propagation from StopInfoMachException (#122817)

2025-01-14T19:25:58+00:00

In the presence of OS plugins, StopInfoMachException currently
propagates breakpoint stop reasons even if those breakpoints were not
intended for a specific thread, effectively removing our ability to set
thread-specific breakpoints.

This was originally added in [1], but the motivation provided in the
comment does not seem strong enough to remove the ability to set
thread-specific breakpoints. The only way to break thread specific
breakpoints would be if a user set such a breakpoint and _then_ loaded
an OS plugin, a scenario which we would likely not want to support.

[1]:
https://github.com/swiftlang/llvm-project/commit/ab745c2ad865c07f3905482fd071ef36c024713a#diff-8ec6e41b1dffa7ac4b5841aae24d66442ef7ebc62c8618f89354d84594f91050R501

[lldb] Support overriding the disassembly CPU & features (#115382)

2024-11-12T00:27:15+00:00

Add the ability to override the disassembly CPU and CPU features through
a target setting (`target.disassembly-cpu` and
`target.disassembly-features`) and a `disassemble` command option
(`--cpu` and `--features`).

This is especially relevant for architectures like RISC-V which relies
heavily on CPU extensions.

The majority of this patch is plumbing the options through. I recommend
looking at DisassemblerLLVMC and the test for the observable change in
behavior.

Revert "[lldb] Change lldb's breakpoint handling behavior (#96260)"

2024-07-20T01:43:53+00:00

This reverts commit 05f0e86cc895181b3d2210458c78938f83353002.

The debuginfo dexter tests are failing, probably because the way
stepping over breakpoints has changed with my patches.  And there
are two API tests fails on the ubuntu-arm (32-bit) bot. I'll need
to investigate both of these, neither has an obvious failure reason.

[lldb] Change lldb's breakpoint handling behavior (#96260)

2024-07-20T00:26:13+00:00

lldb today has two rules: When a thread stops at a BreakpointSite, we
set the thread's StopReason to be "breakpoint hit" (regardless if we've
actually hit the breakpoint, or if we've merely stopped *at* the
breakpoint instruction/point and haven't tripped it yet). And second,
when resuming a process, any thread sitting at a BreakpointSite is
silently stepped over the BreakpointSite -- because we've already
flagged the breakpoint hit when we stopped there originally.

In this patch, I change lldb to only set a thread's stop reason to
breakpoint-hit when we've actually executed the instruction/triggered
the breakpoint. When we resume, we only silently step past a
BreakpointSite that we've registered as hit. We preserve this state
across inferior function calls that the user may do while stopped, etc.

Also, when a user adds a new breakpoint at $pc while stopped, or changes
$pc to be the address of a BreakpointSite, we will silently step past
that breakpoint when the process resumes. This is purely a UX call, I
don't think there's any person who wants to set a breakpoint at $pc and
then hit it immediately on resuming.

One non-intuitive UX from this change, but I'm convinced it is
necessary: If you're stopped at a BreakpointSite that has not yet
executed, you `stepi`, you will hit the breakpoint and the pc will not
yet advance. This thread has not completed its stepi, and the thread
plan is still on the stack. If you then `continue` the thread, lldb will
now stop and say, "instruction step completed", one instruction past the
BreakpointSite. You can continue a second time to resume execution. I
discussed this with Jim, and trying to paper over this behavior will
lead to more complicated scenarios behaving non-intuitively. And mostly
it's the testsuite that was trying to instruction step past a breakpoint
and getting thrown off -- and I changed those tests to expect the new
behavior.

The bugs driving this change are all from lldb dropping the real stop
reason for a thread and setting it to breakpoint-hit when that was not
the case. Jim hit one where we have an aarch64 watchpoint that triggers
one instruction before a BreakpointSite. On this arch we are notified of
the watchpoint hit after the instruction has been unrolled -- we disable
the watchpoint, instruction step, re-enable the watchpoint and collect
the new value. But now we're on a BreakpointSite so the watchpoint-hit
stop reason is lost.

Another was reported by ZequanWu in
https://discourse.llvm.org/t/lldb-unable-to-break-at-start/78282 we
attach to/launch a process with the pc at a BreakpointSite and
misbehave. Caroline Tice mentioned it is also a problem they've had with
putting a breakpoint on _dl_debug_state.

The change to each Process plugin that does execution control is that

1. If we've stopped at a BreakpointSite that has not been executed yet,
we will call Thread::SetThreadStoppedAtUnexecutedBP(pc) to record
that.  When the thread resumes, if the pc is still at the same site, we
will continue, hit the breakpoint, and stop again.

2. When we've actually hit a breakpoint (enabled for this thread or not),
the Process plugin should call Thread::SetThreadHitBreakpointSite().
When we go to resume the thread, we will push a step-over-breakpoint
ThreadPlan before resuming.

The biggest set of changes is to StopInfoMachException where we
translate a Mach Exception into a stop reason. The Mach exception codes
differ in a few places depending on the target (unambiguously), and I
didn't want to duplicate the new code for each target so I've tested
what mach exceptions we get for each action on each target, and
reorganized StopInfoMachException::CreateStopReasonWithMachException to
document these possible values, and handle them without specializing
based on the target arch.

rdar://123942164

[lldb] Tighten ABI assert in `StopInfoMachException::DeterminePtrauthFailure` (NFC) (#95015)

2024-06-10T17:49:16+00:00

This patch tightens the assert check for the ABISP object in
`StopInfoMachException::DeterminePtrauthFailure`.

This causes some failure when debugging on a system that doesn't have
pointer authentification support, like on Intel for instance.

rdar://129401926

Signed-off-by: Med Ismail Bennani

[lldb] Detect a Darwin kernel issue and work around it (#81573)

2024-02-14T21:06:20+00:00

On arm64 machines, when there is a hardware breakpoint or watchpoint
set, and lldb has instruction-stepped a thread, and then done a
Process::Resume, we will sometimes receive an extra "instruction step
completed" mach exception and the pc has not advanced. From a user's
perspective, they hit Continue and lldb stops again at the same spot.
From the testsuite's perspective, this has been a constant source of
testsuite failures for any test using hardware watchpoints and
breakpoints, the arm64 CI bots seem especially good at hitting this
issue.

Jim and I have been slowly looking at this for a few months now, and
finally I decided to try to detect this situation in lldb and silently
resume the process again when it happens.

We were already detecting this "got an insn-step finished mach exception
but this thread was not instruction stepping" combination in
StopInfoMachException where we take the mach exception and create a
StopInfo object for it. We had a lot of logging we used to understand
the failure as it was hit on the bots in assert builds.

This patch adds a new case to `Thread::GetPrivateStopInfo()` to call the
StopInfo's (new) `IsContinueInterrupted()` method. In
StopInfoMachException, where we previously had logging for assert
builds, I now note it in an ivar, and when
`Thread::GetPrivateStopInfo()` asks if this has happened, we check all
of the combination of events that this comes up: We have a hardware
breakpoint or watchpoint, we were not instruction stepping this thread
but got an insn-step mach exception, the pc is the same as the previous
stop's pc. And in that case, `Thread::GetPrivateStopInfo()` returns no
StopInfo -- indicating that this thread would like to resume execution.

The `Thread` object has two StackFrameLists, `m_curr_frames_sp` and
`m_prev_frames_sp`. When a thread resumes execution, we move
`m_curr_frames_sp` in to `m_prev_frames_sp` and when it stops executing,
w euse `m_prev_frames_sp` to seed the new `m_curr_frames_sp` if most of
the stack is the same as before.

In this same location, I now save the Thread's RegisterContext::GetPC
into an ivar, `m_prev_framezero_pc`. StopInfoMachException needs this
information to check all of the conditions I outlined above for
`IsContinueInterrupted`.

This has passed exhaustive testing and we do not have any testsuite
failures for hardware watchpoints and breakpoints due to this kernel bug
with the patch in place. In focusing on these tests for thousands of
runs, I have found two other uncommon race conditions for the
TestConcurrent* tests on arm64. TestConcurrentManyBreakpoints.py (which
uses no hardware watchpoint/breakpoints) will sometimes only have 99
breakpoints when it expects 100, and any of the concurrent tests using
the shared harness (I've seen it in
TestConcurrentWatchBreakDelay.py,
TestConcurrentTwoBreakpointsOneSignal.py,
TestConcurrentSignalDelayWatch.py) can fail when the test harness checks
that there is only one thread still running at the end, and it finds two
-- one of them under pthread_exit / pthread_terminate. Both of these
failures happen on github main without my changes, and with my changes -
they are unrelated race conditions in these tests, and I'm sure I'll be
looking into them at some point if they hit the CI bots with frequency.
On my computer, these are in the 0.3-0.5% of the time class. But the CI
bots do have different timing.

[lldb] Add support for large watchpoints in lldb (#79962)

2024-02-01T05:03:38+00:00

This patch is the next piece of work in my Large Watchpoint proposal,
https://discourse.llvm.org/t/rfc-large-watchpoint-support-in-lldb/72116

This patch breaks a user's watchpoint into one or more
WatchpointResources which reflect what the hardware registers can cover.
This means we can watch objects larger than 8 bytes, and we can watched
unaligned address ranges. On a typical 64-bit target with 4 watchpoint
registers you can watch 32 bytes of memory if the start address is
doubleword aligned.

Additionally, if the remote stub implements AArch64 MASK style
watchpoints (e.g. debugserver on Darwin), we can watch any power-of-2
size region of memory up to 2GB, aligned to that same size.

I updated the Watchpoint constructor and CommandObjectWatchpoint to
create a CompilerType of Array when the size of the watched
region is greater than pointer-size and we don't have a variable type to
use. For pointer-size and smaller, we can display the watched granule as
an integer value; for larger-than-pointer-size we will display as an
array of bytes.

I have `watchpoint list` now print the WatchpointResources used to
implement the watchpoint.

I added a WatchpointAlgorithm class which has a top-level static method
that takes an enum flag mask WatchpointHardwareFeature and a user
address and size, and returns a vector of WatchpointResources covering
the request. It does not take into account the number of watchpoint
registers the target has, or the number still available for use. Right
now there is only one algorithm, which monitors power-of-2 regions of
memory. For up to pointer-size, this is what Intel hardware supports.
AArch64 Byte Address Select watchpoints can watch any number of
contiguous bytes in a pointer-size memory granule, that is not currently
supported so if you ask to watch bytes 3-5, the algorithm will watch the
entire doubleword (8 bytes). The newly default "modify" style means we
will silently ignore modifications to bytes outside the watched range.

I've temporarily skipped TestLargeWatchpoint.py for all targets. It was
only run on Darwin when using the in-tree debugserver, which was a proxy
for "debugserver supports MASK watchpoints". I'll be adding the
aforementioned feature flag from the stub and enabling full mask
watchpoints when a debugserver with that feature is enabled, and
re-enable this test.

I added a new TestUnalignedLargeWatchpoint.py which only has one test
but it's a great one, watching a 22-byte range that is unaligned and
requires four 8-byte watchpoints to cover.

I also added a unit test, WatchpointAlgorithmsTests, which has a number
of simple tests against WatchpointAlgorithms::PowerOf2Watchpoints. I
think there's interesting possible different approaches to how we cover
these; I note in the unit test that a user requesting a watch on address
0x12e0 of 120 bytes will be covered by two watchpoints today, a
128-bytes at 0x1280 and at 0x1300. But it could be done with a 16-byte
watchpoint at 0x12e0 and a 128-byte at 0x1300, which would have fewer
false positives/private stops. As we try refining this one, it's helpful
to have a collection of tests to make sure things don't regress.

I tested this on arm64 macOS, (genuine) x86_64 macOS, and AArch64
Ubuntu. I have not modifed the Windows process plugins yet, I might try
that as a standalone patch, I'd be making the change blind, but the
necessary changes (see ProcessGDBRemote::EnableWatchpoint) are pretty
small so it might be obvious enough that I can change it and see what
the Windows CI thinks.

There isn't yet a packet (or a qSupported feature query) for the gdb
remote serial protocol stub to communicate its watchpoint capabilities
to lldb. I'll be doing that in a patch right after this is landed,
having debugserver advertise its capability of AArch64 MASK watchpoints,
and have ProcessGDBRemote add eWatchpointHardwareArmMASK to
WatchpointAlgorithms so we can watch larger than 32-byte requests on
Darwin.

I haven't yet tackled WatchpointResource *sharing* by multiple
Watchpoints. This is all part of the goal, especially when we may be
watching a larger memory range than the user requested, if they then add
another watchpoint next to their first request, it may be covered by the
same WatchpointResource (hardware watchpoint register). Also one "read"
watchpoint and one "write" watchpoint on the same memory granule need to
be handled, making the WatchpointResource cover all requests.

As WatchpointResources aren't shared among multiple Watchpoints yet,
there's no handling of running the conditions/commands/etc on multiple
Watchpoints when their shared WatchpointResource is hit. The goal beyond
"large watchpoint" is to unify (much more) the Watchpoint and Breakpoint
behavior and commands. I have a feeling I may be slowly chipping away at
this for a while.

Re-landing this patch after fixing two undefined behaviors in
WatchpointAlgorithms found by UBSan and by failures on different
CI bots.

rdar://108234227