llvm-project.git/libclc, branch users/boomanaiden154/main.lit-remove-python-27-code-paths-in-builtin-diff

[NFC][libclc] Replace _CLC_V_V_VP_VECTORIZE macro with use of unary_def_with_ptr_scalarize.inc (#157002)

2025-09-09T00:11:27+00:00

Commit d50f2ef437ae removed _CLC_V_V_VP_VECTORIZE from the header file,
but the macro is still used in our downstream code:
https://github.com/intel/llvm/blob/0433e4d6f5c9/libclc/libspirv/lib/ptx-nvidiacl/math/modf.cl#L30
https://github.com/intel/llvm/blob/0433e4d6f5c9/libclc/libspirv/lib/ptx-nvidiacl/math/sincos.cl#L31

We can either revert d50f2ef437ae or replace the macro with
unary_def_with_ptr_scalarize.inc. This PR takes the latter approach.
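
As a rough illustration of what scalarizing a unary function with a
pointer argument means, here is a minimal C sketch. The names float2_t,
split_scalar and split_vec2 are hypothetical and are not libclc
identifiers; the actual .inc file is not reproduced here.

```c
/* Illustrative only: all names below are hypothetical, not libclc code. */
typedef struct { float s0, s1; } float2_t;

/* Scalar function with an out-pointer, in the style of modf: returns the
   fractional part and stores the truncated integral part through iptr.
   The int cast keeps the sketch libm-free; it is valid for small inputs. */
static float split_scalar(float x, float *iptr) {
    float whole = (float)(int)x; /* truncates toward zero */
    *iptr = whole;
    return x - whole;
}

/* Vector overload built by scalarizing: one scalar call per element, each
   writing to the matching element of the pointed-to vector. */
static float2_t split_vec2(float2_t x, float2_t *iptr) {
    float2_t r;
    r.s0 = split_scalar(x.s0, &iptr->s0);
    r.s1 = split_scalar(x.s1, &iptr->s1);
    return r;
}
```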

[libclc] Implement erf/erfc vector function with loop since scalar function is large (#157055)

2025-09-05T11:58:24+00:00

This PR reduces amdgcn--amdhsa.bc size by 1.8% and nvptx64--nvidiacl.bc
size by 4%.
The loop trip count is constant, so the backend can decide whether to
unroll.
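
The idea can be sketched in plain C as follows. This is only an
illustration under assumptions: big_scalar is a hypothetical stand-in
for a large scalar routine such as erf, and VEC_N and big_vec are made-up
names, not libclc code.

```c
#define VEC_N 4 /* constant trip count, known at compile time */

/* Hypothetical stand-in for a large scalar routine such as erf. */
static float big_scalar(float x) {
    return x * (1.0f + x * (0.5f + x * 0.25f));
}

/* Vector version written as a loop: the scalar body appears at a single
   call site instead of VEC_N expanded copies, and the compiler remains
   free to unroll because the trip count is a constant. */
static void big_vec(const float in[VEC_N], float out[VEC_N]) {
    for (int i = 0; i < VEC_N; ++i)
        out[i] = big_scalar(in[i]);
}
```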

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

[libclc] Override generic symbol using llvm-link --override flag instead of using weak linkage (#156778)

2025-09-05T11:58:07+00:00

Before this PR, weak linkage was applied to a few CLC generic functions
to allow a target-specific implementation to override the generic one.
However, weak linkage has the side effect of preventing
inter-procedural optimizations such as PostOrderFunctionAttrsPass,
because a weak function does not have an exact definition (as
determined by hasExactDefinition in the pass).

This PR resolves the issue by adding the --override flag for every
non-generic bitcode file in the llvm-link run. This approach eliminates
the need for weak linkage while still allowing a target-specific
implementation to override the generic one.
llvm-diff shows improved attribute deduction for some functions in
amdgcn--amdhsa.bc, e.g.
  %23 = tail call half @llvm.sqrt.f16(half %22)
=>
  %23 = tail call noundef half @llvm.sqrt.f16(half %22)
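
To illustrate the linkage distinction discussed in this commit, here is
a minimal C sketch, assuming a GCC- or Clang-compatible compiler that
accepts __attribute__((weak)). The function names are hypothetical and
are not libclc symbols, and the sketch does not reproduce the llvm-link
workflow itself.

```c
/* Hypothetical names; illustrative only. */

/* Weak definition: another object file may supply a replacement at link
   time, so the optimizer cannot assume this body is the one that runs. */
__attribute__((weak)) float generic_weak(float x) { return x * 2.0f; }

/* Ordinary (strong) definition: the body seen here is final, so facts
   derived from it can be used when optimizing callers. */
float generic_strong(float x) { return x * 2.0f; }

/* Both calls compute the same value; only the linkage differs. */
float caller(float x) { return generic_weak(x) + generic_strong(x); }
```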

[NFC][libclc] Set MACRO_ARCH to ${ARCH} unconditionally before customizing (#156789)

2025-09-04T23:35:40+00:00

Our downstream libclc adds a few more targets that customize build_flags
and opt_flags. In each customization block, MACRO_ARCH is defined to be
${ARCH}.
Hoisting the MACRO_ARCH definition out of the if-else-end block avoids
code duplication. It also avoids potential errors when the MACRO_ARCH
definition is forgotten, e.g. in https://github.com/intel/llvm/pull/19971.

[NFC][libclc] Remove unused -DCLC_INTERNAL build flag, remove unused M_LOG210 (#156590)

2025-09-04T22:44:37+00:00

[NFC][libclc] Move _CLC_V_V_VP_VECTORIZE macro into clc_lgamma_r.cl and delete clcmacro.h (#156280)

2025-09-03T00:23:01+00:00

clcmacro.h defines only _CLC_V_V_VP_VECTORIZE, which is used only in
clc/lib/generic/math/clc_lgamma_r.cl.

[libclc] update __clc_mem_fence: add MemorySemantic arg and use __builtin_amdgcn_fence for AMDGPU (#152275)

2025-09-01T03:03:45+00:00

AMDGPU requires a MemorySemantic argument, which identifies the memory
or address space to which the memory ordering applies.

The MemorySemantic argument is also necessary for implementing the
SPIR-V MemoryBarrier instruction. Additionally, the implementation of
__clc_mem_fence on Intel GPUs requires the MemorySemantic argument.

Using __builtin_amdgcn_fence for AMDGPU is a follow-up to
https://github.com/llvm/llvm-project/pull/151446#discussion_r2254006508

llvm-diff shows no change to nvptx64--nvidiacl.bc.

libclc: CMake: include GetClangResourceDir (#155836)

2025-08-28T16:56:33+00:00

`get_clang_resource_dir` is not guaranteed to be available. Ensure that
it is by including `GetClangResourceDir`.

[libclc] Only create a target per each compile command for cmake MSVC generator (#154479)

2025-08-21T23:45:42+00:00

The libclc sequential build issue addressed in commit 0c21d6b4c8ad is
specific to the cmake MSVC generator. Therefore, this PR avoids creating
a large number of targets when a non-MSVC generator is used, such as the
Ninja generator, which is used in pre-merge CI on Windows in the
llvm-project repo. We plan to migrate from the MSVC generator to the
Ninja generator in our downstream CI to fix the flaky cmake bug `Cannot
restore timestamp`, which might be related to the large number of targets.

[libclc] Use elementwise ctlz/cttz builtins for CLC clz/ctz (#154535)

2025-08-21T08:32:03+00:00

Using the elementwise builtin optimizes the vector case; instead of
scalarizing we can compile directly to the vector intrinsics.
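
For contrast, the scalarized form that this change moves away from can
be sketched in plain C as below. This is only an illustration: uint4_sk,
clz32 and clz_scalarized are hypothetical names, __builtin_clz is the
GCC/Clang scalar builtin (assuming a 32-bit unsigned int), and the
elementwise builtin itself is not shown.

```c
#include <stdint.h>

/* Hypothetical 4-element vector stand-in; not an OpenCL type. */
typedef struct { uint32_t s[4]; } uint4_sk;

/* Scalar count-leading-zeros. __builtin_clz is undefined for 0, so the
   zero case is handled explicitly; OpenCL clz returns the bit width. */
static uint32_t clz32(uint32_t x) {
    return x ? (uint32_t)__builtin_clz(x) : 32u;
}

/* Scalarized vector clz: one scalar call per element. An elementwise
   builtin lets the compiler handle the whole vector in one operation. */
static uint4_sk clz_scalarized(uint4_sk v) {
    uint4_sk r;
    for (int i = 0; i < 4; ++i)
        r.s[i] = clz32(v.s[i]);
    return r;
}
```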