llvm-project.git, branch users/guy-david/aarch64-post-truncating-store-simd

[AArch64] Post-truncating store for i8/i16 on lane zero

2025-10-20T12:45:11+00:00

[InstCombine] Move ptrtoaddr tests to InstSimplify (NFC)

2025-10-20T12:39:40+00:00

All the existing tests test code either in ConstantFolding or
InstSimplify, so move them to use -passes=instsimplify instead of
-passes=instcombine. This makes sure we keep InstSimplify coverage
even if there are subsuming InstCombine folds.

This requires writing some of the constant folding tests in a
different way, as InstSimplify does not try to re-fold already
existing constant expressions.

[Clang][NFC] Rename UnqualPtrTy to DefaultPtrTy (#163207)

2025-10-20T12:34:21+00:00

`UnqualPtrTy` didn't always match `llvm::PointerType::getUnqual`:
sometimes it returned a pointer that is not in address space 0 (notably
for SPIRV).

Since `UnqualPtrTy` was used as the "generic" or "default" pointer type,
this patch renames it to `DefaultPtrTy` to avoid confusion with LLVM's
`PointerType::getUnqual`.

[SLP]Do not pack div-like copyable values

2025-10-20T12:19:42+00:00

If a main instruction in the copyables is a div-like instruction, the
compiler cannot pack duplicates, extending with poisons, these
instructions, being vectorize, will result in undefined behavior.

Fixes #164185

[InstSimplify] Support ptrtoaddr in simplifyCastInst()

2025-10-20T12:18:34+00:00

Handle ptrtoaddr the same way as ptrtoint. The fold already only
operates on the index/address bits.

[mlir][docs] Add documentation for No-rollback Conversion Driver (#164071)

2025-10-20T12:04:30+00:00

Add documentation for the no-rollback conversion driver. Also improve
the documentation of the old rollback driver. In particular: which
modifications are performed immediately and which are delayed.

[AArch64] Improve lowering of GPR zeroing in copyPhysReg (#163059)

2025-10-20T11:59:58+00:00

This patch pivots GPR32 and GPR64 zeroing into distinct branches to
simplify the code an improve the lowering.

Zeroing GPR moves are now handled differently than non-zeroing ones.
Zero source registers WZR and XZR do not require register annotations of
undef, implicit and kill. The non-zeroing source now cannot process WZR
removing the ternary expression. This patch also moves GPR64 logic right
after GPR32 for better organization.

[AArch64][GlobalISel] Add rax1.ll test converage. NFC

2025-10-20T11:53:30+00:00

[lldb] Fix the "RegisterValue::SetValueFromData" method for 128-bit integer registers (#163646)

2025-10-20T11:48:00+00:00

Fix the `RegisterValue::SetValueFromData` method so that it works also
for 128-bit registers that contain integers.

Without this change, the `RegisterValue::SetValueFromData` method does
not work correctly
for 128-bit registers that contain (signed or unsigned) integers.

---

Steps to reproduce the problem:

(1)

Create a program that writes a 128-bit number to a 128-bit registers
`xmm0`. E.g.:
```
#include 

int main() {
  __asm__ volatile (
      "pinsrq $0, %[lo], %%xmm0\n\t"  // insert low 64 bits
      "pinsrq $1, %[hi], %%xmm0"    // insert high 64 bits
      :
      : [lo]"r"(0x7766554433221100),
        [hi]"r"(0xffeeddccbbaa9988)
  );
  return 0;
}
```

(2)

Compile this program with LLVM compiler:
```
$ $YOUR/clang -g -o main main.c
```

(3)

Modify LLDB so that when it will be reading value from the `xmm0`
register, instead of assuming that it is vector register, it will treat
it as if it contain an integer. This can be achieved e.g. this way:
```
diff --git a/lldb/source/Utility/RegisterValue.cpp b/lldb/source/Utility/RegisterValue.cpp
index 0e99451c3b70..a4b51db3e56d 100644
--- a/lldb/source/Utility/RegisterValue.cpp
+++ b/lldb/source/Utility/RegisterValue.cpp
@@ -188,6 +188,7 @@ Status RegisterValue::SetValueFromData(const RegisterInfo ®_info,
     break;
   case eEncodingUint:
   case eEncodingSint:
+  case eEncodingVector:
     if (reg_info.byte_size == 1)
       SetUInt8(src.GetMaxU32(&src_offset, src_len));
     else if (reg_info.byte_size <= 2)
@@ -217,23 +218,6 @@ Status RegisterValue::SetValueFromData(const RegisterInfo ®_info,
     else if (reg_info.byte_size == sizeof(long double))
       SetLongDouble(src.GetLongDouble(&src_offset));
     break;
-  case eEncodingVector: {
-    m_type = eTypeBytes;
-    assert(reg_info.byte_size <= kMaxRegisterByteSize);
-    buffer.bytes.resize(reg_info.byte_size);
-    buffer.byte_order = src.GetByteOrder();
-    if (src.CopyByteOrderedData(
-            src_offset,          // offset within "src" to start extracting data
-            src_len,             // src length
-            buffer.bytes.data(), // dst buffer
-            buffer.bytes.size(), // dst length
-            buffer.byte_order) == 0) // dst byte order
-    {
-      error = Status::FromErrorStringWithFormat(
-          "failed to copy data for register write of %s", reg_info.name);
-      return error;
-    }
-  }
   }
 
   if (m_type == eTypeInvalid)
```

(4)

Rebuild the LLDB.

(5)

Observe what happens how LLDB will print the content of this register
after it was initialized with 128-bit value.
```
$YOUR/lldb --source ./main
(lldb) target create main
Current executable set to '.../main' (x86_64).
(lldb) breakpoint set --file main.c --line 11
Breakpoint 1: where = main`main + 45 at main.c:11:3, address = 0x000000000000164d
(lldb) settings set stop-line-count-before 20
(lldb) process launch
Process 2568735 launched: '.../main' (x86_64)
Process 2568735 stopped
* thread #1, name = 'main', stop reason = breakpoint 1.1
    frame #0: 0x000055555555564d main`main at main.c:11:3
   1   	#include 
   2   	
   3   	int main() {
   4   	  __asm__ volatile (
   5   	      "pinsrq $0, %[lo], %%xmm0\n\t"  // insert low 64 bits
   6   	      "pinsrq $1, %[hi], %%xmm0"    // insert high 64 bits
   7   	      :
   8   	      : [lo]"r"(0x7766554433221100),
   9   	        [hi]"r"(0xffeeddccbbaa9988)
   10  	  );
-> 11  	  return 0;
   12  	}
(lldb) register read --format hex xmm0
    xmm0 = 0x7766554433221100ffeeddccbbaa9988
```

You can see that the upper and lower 64-bit wide halves are swapped.

---------

Co-authored-by: Matej Košík

[libcxx] Optimize std::generate for segmented iterators (#163006)

2025-10-20T11:37:33+00:00

Part of #102817.

This patch attempts to optimize the performance of `std::generate` for
segmented iterators. Below are the benchmark numbers from
`libcxx\test\benchmarks\algorithms\modifying\generate.bench.cpp`. Test
cases that use segmented iterators have also been added.

- before

```
std::generate(deque)/32           194 ns          193 ns      3733333
std::generate(deque)/50           276 ns          276 ns      2488889
std::generate(deque)/1024        5096 ns         5022 ns       112000
std::generate(deque)/8192       40806 ns        40806 ns        17231
```

- after

```
std::generate(deque)/32           106 ns          105 ns      6400000
std::generate(deque)/50           139 ns          138 ns      4977778
std::generate(deque)/1024        2713 ns         2699 ns       248889
std::generate(deque)/8192       18983 ns        19252 ns        37333
```

---------

Co-authored-by: A. Jiang