summaryrefslogtreecommitdiff
path: root/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
AgeCommit message (Collapse)Author
2025-01-14[MemProf] Fix an assertion when writing distributed index for aliasee (#122946)Teresa Johnson
The ThinLTO index bitcode writer uses a helper forEachSummary to manage preparation and writing of summaries needed for each distributed index file. For alias summaries, it invokes the provided callback for the aliasee as well, as we at least need to produce a value id for the alias's summary. However, all summary generation for the aliasee itself should be skipped on calls when IsAliasee is true. We invoke the callback again if that value's summary is to be written as well. We were asserting in debug mode when invoking collectMemProfCallStacks, because a given stack id index was not in the StackIdIndicesToIndex map. It was not added because the forEachSummary invocation that records these ids in the map (invoked from the IndexBitcodeWriter constructor) was correctly skipping this handling when invoked for aliasees. We need the same guard in the invocation that calls collectMemProfCallStacks. Note that this doesn't cause any real problems in a non-asserts build as the missing map lookup will return the default 0 value from the map, which isn't used since we don't actually write the corresponding summary.
2025-01-13[IR] Introduce captures attribute (#116990)Nikita Popov
This introduces the `captures` attribute as described in: https://discourse.llvm.org/t/rfc-improvements-to-capture-tracking/81420 This initial patch only introduces the IR/bitcode support for the attribute and its in-memory representation as `CaptureInfo`. This will be followed by a patch to upgrade and remove the `nocapture` attribute, and then by actual inference/analysis support. Based on the RFC feedback, I've used a syntax similar to the `memory` attribute, though the only "location" that can be specified is `ret`. I've added some pretty extensive documentation to LangRef on the semantics. One non-obvious bit here is that using ptrtoint will not result in a "return-only" capture, even if the ptrtoint result is only used in the return value. Without this requirement we wouldn't be able to continue ordinary capture analysis on the return value.
2024-12-17[TySan] Add initial Type Sanitizer (LLVM) (#76259)Florian Hahn
This patch introduces the LLVM components of a type sanitizer: a sanitizer for type-based aliasing violations. It is based on Hal Finkel's https://reviews.llvm.org/D32198. C/C++ have type-based aliasing rules, and LLVM's optimizer can exploit these given TBAA metadata added by Clang. Roughly, a pointer of given type cannot be used to access an object of a different type (with, of course, certain exceptions). Unfortunately, there's a lot of code in the wild that violates these rules (e.g. for type punning), and such code often must be built with -fno-strict-aliasing. Performance is often sacrificed as a result. Part of the problem is the difficulty of finding TBAA violations. Hopefully, this sanitizer will help. For each TBAA type-access descriptor, encoded in LLVM's IR using metadata, the corresponding instrumentation pass generates descriptor tables. Thus, for each type (and access descriptor), we have a unique pointer representation. Excepting anonymous-namespace types, these tables are comdat, so the pointer values should be unique across the program. The descriptors refer to other descriptors to form a type aliasing tree (just like LLVM's TBAA metadata does). The instrumentation handles the "fast path" (where the types match exactly and no partial-overlaps are detected), and defers to the runtime to handle all of the more-complicated cases. The runtime, of course, is also responsible for reporting errors when those are detected. The runtime uses essentially the same shadow memory region as tsan, and we use 8 bytes of shadow memory, the size of the pointer to the type descriptor, for every byte of accessed data in the program. The value 0 is used to represent an unknown type. The value -1 is used to represent an interior byte (a byte that is part of a type, but not the first byte). The instrumentation first checks for an exact match between the type of the current access and the type for that address recorded in the shadow memory. If it matches, it then checks the shadow for the remainder of the bytes in the type to make sure that they're all -1. If not, we call the runtime. If the exact match fails, we next check if the value is 0 (i.e. unknown). If it is, then we check the shadow for the remainder of the byes in the type (to make sure they're all 0). If they're not, we call the runtime. We then set the shadow for the access address and set the shadow for the remaining bytes in the type to -1 (i.e. marking them as interior bytes). If the type indicated by the shadow memory for the access address is neither an exact match nor 0, we call the runtime. The instrumentation pass inserts calls to the memset intrinsic to set the memory updated by memset, memcpy, and memmove, as well as allocas/byval (and for lifetime.start/end) to reset the shadow memory to reflect that the type is now unknown. The runtime intercepts memset, memcpy, etc. to perform the same function for the library calls. The runtime essentially repeats these checks, but uses the full TBAA algorithm, just as the compiler does, to determine when two types are permitted to alias. In a situation where access overlap has occurred and aliasing is not permitted, an error is generated. Clang's TBAA representation currently has a problem representing unions, as demonstrated by the one XFAIL'd test in the runtime patch. We'll update the TBAA representation to fix this, and at the same time, update the sanitizer. When the sanitizer is active, we disable actually using the TBAA metadata for AA. This way we're less likely to use TBAA to remove memory accesses that we'd like to verify. As a note, this implementation does not use the compressed shadow-memory scheme discussed previously (http://lists.llvm.org/pipermail/llvm-dev/2017-April/111766.html). That scheme would not handle the struct-path (i.e. structure offset) information that our TBAA represents. I expect we'll want to further work on compressing the shadow-memory representation, but I think it makes sense to do that as follow-up work. It goes together with the corresponding clang changes (https://github.com/llvm/llvm-project/pull/76260) and compiler-rt changes (https://github.com/llvm/llvm-project/pull/76261) PR: https://github.com/llvm/llvm-project/pull/76259
2024-12-02[ThinLTO]Supports declaration import for global variables in distributed ↵Mingming Liu
ThinLTO (#117616) When `-import-declaration` option is enabled, declaration import is supported for functions. https://github.com/llvm/llvm-project/pull/88024 has the context for this option. This patch supports declaration import for global variables in distributed ThinLTO. The motivating use case is to propagate `dso_local` attribute of global variables across modules, to optimize global variable access when a binary is built with `-fno-direct-access-external-data`. * With `-fdirect-access-external-data`, non thread-local global variables will [have `dso_local` attributes](https://github.com/llvm/llvm-project/blob/fe3c23b439b9a2d00442d9bc6a4ca86f73066a3d/clang/lib/CodeGen/CodeGenModule.cpp#L1730-L1746). This optimizes the global variable access as shown by https://gcc.godbolt.org/z/vMzWcKdh3
2024-11-24[memprof] Speed up llvm-profdata (#117446)Kazu Hirata
CallStackRadixTreeBuilder::build takes the parameter MemProfFrameIndexes by value, involving copies: std::optional<const llvm::DenseMap<FrameIdTy, LinearFrameId>> MemProfFrameIndexes Then "build" makes another copy of MemProfFrameIndexe and passes it to encodeCallStack for every call stack, which is painfully slow. This patch changes the type to a pointer so that we don't have to make a copy every time we pass the argument. Without this patch, it takes 553 seconds to run "llvm-profdata merge" on a large MemProf raw profile. This patch shortenes that down to 67 seconds.
2024-11-22Reapply "[MemProf] Use radix tree for alloc contexts in bitcode summaries" ↵Teresa Johnson
(#117395) (#117404) This reverts commit fdb050a5024320ec29d2edf3f2bc686c3a84abaa, and restores ccb4702038900d82d1041ff610788740f5cef723, with a fix for build bot failures. Specifically, add ProfileData to the dependences of the BitWriter library, which was causing shared library builds of LLVM to fail. Reproduced the failure with a shared library build and confirmed this change fixes that build failure.
2024-11-22Revert "[MemProf] Use radix tree for alloc contexts in bitcode summaries" ↵Teresa Johnson
(#117395) Reverts llvm/llvm-project#117066 This is causing some build bot failures that need investigation.
2024-11-22[MemProf] Use radix tree for alloc contexts in bitcode summaries (#117066)Teresa Johnson
Leverage the support added to represent allocation contexts in a more compact way via a radix tree in the indexed profile to similarly reduce sizes of the bitcode summaries. For a large target, this reduced the size of the per-module summaries by about 18% and in the distributed combined index files by 28%.
2024-11-20[NFCI][WPD]Use unique string saver to store type id (#106932)Mingming Liu
Currently, both [TypeIdMap](https://github.com/llvm/llvm-project/blob/67a1fdb014790a38a205d28e1748634de34471dd/llvm/include/llvm/IR/ModuleSummaryIndex.h#L1356) and [TypeIdCompatibleVtableMap](https://github.com/llvm/llvm-project/blob/67a1fdb014790a38a205d28e1748634de34471dd/llvm/include/llvm/IR/ModuleSummaryIndex.h#L1363) keep type-id as `std::string` in the combined index for LTO indexing analysis. With this change, index uses a unique-string-saver to own the string copies and two maps above can use string references to save some memory. This shows a 3% memory reduction (from 8.2GiB to 7.9GiB) in an internal binary with high indexing memory usage.
2024-11-20[LLVM][NFC] Use `used`'s element type if available (#116804)Alex Voicu
When embedding, if `compiler.used` exists, we should re-use it's element type instead of blindly assuming it's an unqualified pointer.
2024-11-18[MemProf] Change the STACK_ID record to fixed width values (#116448)Teresa Johnson
The stack ids are hashes that are close to 64 bits in size, so emitting as a pair of 32-bit fixed-width values is more efficient than a VBR. This reduced the summary bitcode size for a large target by about 1%. Bump the index version and ensure we can read the old format.
2024-11-15[MemProf] Print full context hash when reporting hinted bytes (#114465)Teresa Johnson
Improve the information printed when -memprof-report-hinted-sizes is enabled. Now print the full context hash computed from the original profile, similar to what we do when reporting matching statistics. This will make it easier to correlate with the profile. Note that the full context hash must be computed at profile match time and saved in the metadata and summary, because we may trim the context during matching when it isn't needed for distinguishing hotness. Similarly, due to the context trimming, we may have more than one full context id and total size pair per MIB in the metadata and summary, which now get a list of these pairs. Remove the old aggregate size from the metadata and summary support. One other change from the prior support is that we no longer write the size information into the combined index for the LTO backends, which don't use this information, which reduces unnecessary bloat in distributed index files.
2024-11-13[DebugInfo] Add a specification attribute to LLVM DebugInfo (#115362)Augusto Noronha
Add a specification attribute to LLVM DebugInfo, which is analogous to DWARF's DW_AT_specification. According to the DWARF spec: "A debugging information entry that represents a declaration that completes another (earlier) non-defining declaration may have a DW_AT_specification attribute whose value is a reference to the debugging information entry representing the non-defining declaration." This patch allows types to be specifications of other types. This is used by Swift to represent generic types. For example, given this Swift program: ``` struct MyStruct<T> { let t: T } let variable = MyStruct<Int>(t: 43) ``` The Swift compiler emits (roughly) an unsubtituted type for MyStruct<T>: ``` DW_TAG_structure_type DW_AT_name ("MyStruct") // "$s1w8MyStructVyxGD" is a Swift mangled name roughly equivalent to // MyStruct<T> DW_AT_linkage_name ("$s1w8MyStructVyxGD") // other attributes here ``` And a specification for MyStruct<Int>: ``` DW_TAG_structure_type DW_AT_specification (<link to "MyStruct">) // "$s1w8MyStructVySiGD" is a Swift mangled name equivalent to // MyStruct<Int> DW_AT_linkage_name ("$s1w8MyStructVySiGD") DW_AT_byte_size (0x08) // other attributes here ```
2024-11-06[DebugInfo] Add num_extra_inhabitants to debug info (#112590)Augusto Noronha
An extra inhabitant is a bit pattern that does not represent a valid value for instances of a given type. The number of extra inhabitants is the number of those bit configurations. This is used by Swift to save space when composing types. For example, because Bool only needs 2 bit patterns to represent all of its values (true and false), an Optional<Bool> only occupies 1 byte in memory by using a bit configuration that is unused by Bool. Which bit patterns are unused are part of the ABI of the language. Since Swift generics are not monomorphized, by using dynamic libraries you can have generic types whose size, alignment, etc, are known only at runtime (which is why this feature is needed). This patch adds num_extra_inhabitants to LLVM-IR debug info and in DWARF as an Apple extension.
2024-10-26[rtsan][llvm][NFC] Rename sanitize_realtime_unsafe attr to ↵davidtrevelyan
sanitize_realtime_blocking (#113155) # What This PR renames the newly-introduced llvm attribute `sanitize_realtime_unsafe` to `sanitize_realtime_blocking`. Likewise, sibling variables such as `SanitizeRealtimeUnsafe` are renamed to `SanitizeRealtimeBlocking` respectively. There are no other functional changes. # Why? - There are a number of problems that can cause a function to be real-time "unsafe", - we wish to communicate what problems rtsan detects and *why* they're unsafe, and - a generic "unsafe" attribute is, in our opinion, too broad a net - which may lead to future implementations that need extra contextual information passed through them in order to communicate meaningful reasons to users. - We want to avoid this situation and make the runtime library boundary API/ABI as simple as possible, and - we believe that restricting the scope of attributes to names like `sanitize_realtime_blocking` is an effective means of doing so. We also feel that the symmetry between `[[clang::blocking]]` and `sanitize_realtime_blocking` is easier to follow as a developer. # Concerns - I'm aware that the LLVM attribute `sanitize_realtime_unsafe` has been part of the tree for a few weeks now (introduced here: https://github.com/llvm/llvm-project/pull/106754). Given that it hasn't been released in version 20 yet, am I correct in considering this to not be a breaking change?
2024-10-15[IR] Add `samesign` flag to icmp instruction (#111419)elhewaty
Inspired by https://discourse.llvm.org/t/rfc-signedness-independent-icmps/81423
2024-10-12[LLVM] New NoDivergenceSource function attribute (#111832)Tim Renouf
A call to a function that has this attribute is not a source of divergence, as used by UniformityAnalysis. That allows a front-end to use known-name calls as an instruction extension mechanism (e.g. https://github.com/GPUOpen-Drivers/llvm-dialects ) without such a call being a source of divergence.
2024-10-11[IR] Allow MDString in operand bundles (#110805)Serge Pavlov
This change implements support of metadata strings in operand bundle values. It makes possible calls like: call void @some_func(i32 %x) [ "foo"(i32 42, metadata !"abc") ] It requires some extension of the bitcode serialization. As SSA values and metadata are stored in different tables, there must be a way to distinguish them during deserialization. It is implemented by putting a special marker before the metadata index. The marker cannot be treated as a reference to any SSA value, so it unambiguously identifies metadata. It allows extending the bitcode serialization without breaking compatibility. Metadata as operand bundle values are intended to be used in floating-point function calls. They would represent the same information as now is passed by the constrained intrinsic arguments.
2024-09-19[LLVM][rtsan] Add `sanitize_realtime_unsafe` attribute (#106754)davidtrevelyan
2024-09-19Target ABI: improve call parameters extensions handling (#100757)Jonas Paulsson
For the purpose of verifying proper arguments extensions per the target's ABI, introduce the NoExt attribute that may be used by a target when neither sign- or zeroextension is required (e.g. with a struct in register). The purpose of doing so is to be able to verify that there is always one of these attributes present and by this detecting cases where sign/zero extension is actually missing. As a first step, this patch has the verification step done for the SystemZ backend only, but left off by default until all known issues have been addressed. Other targets/front-ends can now also add NoExt attribute where needed and do this check in the backend.
2024-09-08[Clang] C++20 Coroutines: Introduce Frontend Attribute ↵Yuxuan Chen
[[clang::coro_await_elidable]] (#99282) This patch is the frontend implementation of the coroutine elide improvement project detailed in this discourse post: https://discourse.llvm.org/t/language-extension-for-better-more-deterministic-halo-for-c-coroutines/80044 This patch proposes a C++ struct/class attribute `[[clang::coro_await_elidable]]`. This notion of await elidable task gives developers and library authors a certainty that coroutine heap elision happens in a predictable way. Originally, after we lower a coroutine to LLVM IR, CoroElide is responsible for analysis of whether an elision can happen. Take this as an example: ``` Task foo(); Task bar() { co_await foo(); } ``` For CoroElide to happen, the ramp function of `foo` must be inlined into `bar`. This inlining happens after `foo` has been split but `bar` is usually still a presplit coroutine. If `foo` is indeed a coroutine, the inlined `coro.id` intrinsics of `foo` is visible within `bar`. CoroElide then runs an analysis to figure out whether the SSA value of `coro.begin()` of `foo` gets destroyed before `bar` terminates. `Task` types are rarely simple enough for the destroy logic of the task to reference the SSA value from `coro.begin()` directly. Hence, the pass is very ineffective for even the most trivial C++ Task types. Improving CoroElide by implementing more powerful analyses is possible, however it doesn't give us the predictability when we expect elision to happen. The approach we want to take with this language extension generally originates from the philosophy that library implementations of `Task` types has the control over the structured concurrency guarantees we demand for elision to happen. That is, the lifetime for the callee's frame is shorter to that of the caller. The ``[[clang::coro_await_elidable]]`` is a class attribute which can be applied to a coroutine return type. When a coroutine function that returns such a type calls another coroutine function, the compiler performs heap allocation elision when the following conditions are all met: - callee coroutine function returns a type that is annotated with ``[[clang::coro_await_elidable]]``. - In caller coroutine, the return value of the callee is a prvalue that is immediately `co_await`ed. From the C++ perspective, it makes sense because we can ensure the lifetime of elided callee cannot exceed that of the caller if we can guarantee that the caller coroutine is never destroyed earlier than the callee coroutine. This is not generally true for any C++ programs. However, the library that implements `Task` types and executors may provide this guarantee to the compiler, providing the user with certainty that HALO will work on their programs. After this patch, when compiling coroutines that return a type with such attribute, the frontend checks that the type of the operand of `co_await` expressions (not `operator co_await`). If it's also attributed with `[[clang::coro_await_elidable]]`, the FE emits metadata on the call or invoke instruction as a hint for a later middle end pass to elide the elision. The original patch version is https://github.com/llvm/llvm-project/pull/94693 and as suggested, the patch is split into frontend and middle end solutions into stacked PRs. The middle end CoroSplit patch can be found at https://github.com/llvm/llvm-project/pull/99283 The middle end transformation that performs the elide can be found at https://github.com/llvm/llvm-project/pull/99285
2024-09-06[NFCI]Remove EntryCount from FunctionSummary and clean up surrounding ↵Mingming Liu
synthetic count passes. (#107471) The primary motivation is to remove `EntryCount` from `FunctionSummary`. This frees 8 bytes out of `sizeof(FunctionSummary)` (136 bytes as of https://github.com/llvm/llvm-project/commit/64498c54831bed9cf069e0923b9b73678c6451d8). While I'm at it, this PR clean up {SummaryBasedOptimizations, SyntheticCountsPropagation} since they were not used and there are no plans to further invest on them. With this patch, bitcode writer writes a placeholder 0 at the byte offset of `EntryCount` and bitcode reader can parse the function entry count at the correct byte offset. Added a TODO to stop writing `EntryCount` and bump bitcode version
2024-09-06Add usub_cond and usub_sat operations to atomicrmw (#105568)anjenner
These both perform conditional subtraction, returning the minuend and zero respectively, if the difference is negative.
2024-08-30Revert "[LLVM][rtsan] Add LLVM nosanitize_realtime attribute (#105447)" ↵Chris Apple
(#106743) This reverts commit 178fc4779ece31392a2cd01472b0279e50b3a199. This attribute was not needed now that we are using the lsan style ScopedDisabler for disabling this sanitizer See #106736 #106125 For more discussion
2024-08-27Reapply: Use an abbrev to reduce size of VALUE_GUID records in ThinLTO ↵Jan Voung
summaries (#106165) This retries #90692 which was reverted previously due to issues with lld-available being set, even if the copy of lld is not built from source. This does not change any code compared to #90692 to address the lld-available issue. The main change w.r.t, lld-available is xfailing tests in PR #99056 (until a longer term fix is available).
2024-08-26[LLVM][rtsan] Add LLVM nosanitize_realtime attribute (#105447)Chris Apple
2024-08-23[IR] Inroduce ModuleToSummariesForIndexTy (NFC) (#105906)Kazu Hirata
This patch introduces type alias ModuleToSummariesForIndexTy. I'm planning to change the type slightly to allow heterogeneous lookup (that is, std::map<K, V, std::less<>>) in a subsequent patch. The problem is that changing the type affects many places. Using a type alias reduces the impact.
2024-08-23[llvm] Use range-based for loops (NFC) (#105861)Kazu Hirata
2024-08-23[Bitcode] Use DenseSet instead of std::set (NFC) (#105851)Kazu Hirata
DefOrUseGUIDs is used only for membership checking purposes. We don't need std::set's strengths like iterators staying valid or the ability to traverse in a sorted order. While I am at it, this patch replaces count with contains for slightly increased readability.
2024-08-15[Bitcode] Use range-based for loops (NFC) (#104534)Kazu Hirata
2024-08-13[Bitcode] Use range-based for loops (NFC) (#103628)Kazu Hirata
2024-08-08[LLVM][rtsan] Add sanitize_realtime attribute for the realtime sanitizer ↵Chris Apple
(#100596) Add a new "sanitize_realtime" attribute, which will correspond to the nonblocking function effect in clang. This is used in the realtime sanitizer transform. Please see the [reviewer support document](https://github.com/realtime-sanitizer/radsan/blob/doc/review-support/doc/review.md) for what our next steps are. The original discourse thread can be found [here](https://discourse.llvm.org/t/rfc-nolock-and-noalloc-attributes/76837)
2024-07-25Remove the `x86_mmx` IR type. (#98505)James Y Knight
It is now translated to `<1 x i64>`, which allows the removal of a bunch of special casing. This _incompatibly_ changes the ABI of any LLVM IR function with `x86_mmx` arguments or returns: instead of passing in mmx registers, they will now be passed via integer registers. However, the real-world incompatibility caused by this is expected to be minimal, because Clang never uses the x86_mmx type -- it lowers `__m64` to either `<1 x i64>` or `double`, depending on ABI. This change does _not_ eliminate the SelectionDAG `MVT::x86mmx` type. That type simply no longer corresponds to an IR type, and is used only by MMX intrinsics and inline-asm operands. Because SelectionDAGBuilder only knows how to generate the operands/results of intrinsics based on the IR type, it thus now generates the intrinsics with the type MVT::v1i64, instead of MVT::x86mmx. We need to fix this before the DAG LegalizeTypes, and thus have the X86 backend fix them up in DAGCombine. (This may be a short-lived hack, if all the MMX intrinsics can be removed in upcoming changes.) Works towards issue #98272.
2024-07-19[CodeGen][ARM64EC] Add support for hybrid_patchable attribute. (#92965)Jacek Caban
2024-07-11[MemProf] Track and report profiled sizes through cloning (#98382)Teresa Johnson
If requested, via the -memprof-report-hinted-sizes option, track the total profiled size of each MIB through the thin link, then report on the corresponding allocation coldness after all cloning is complete. To save size, a different bitcode record type is used for the allocation info when the option is specified, and the sizes are kept separate from the MIBs in the index.
2024-07-08Reland "[ThinLTO][Bitcode] Generate import type in bitcode" (#97253)Mingming Liu
https://github.com/llvm/llvm-project/pull/87600 was reverted in order to revert https://github.com/llvm/llvm-project/commit/6262763341fcd71a2b0708cf7485f9abd1d26ba8. Now https://github.com/llvm/llvm-project/pull/95482 is fix forward for https://github.com/llvm/llvm-project/commit/6262763341fcd71a2b0708cf7485f9abd1d26ba8. This patch is a reland for https://github.com/llvm/llvm-project/pull/87600 **Changes on top of original patch** In `llvm/include/llvm/IR/ModuleSummaryIndex.h`, make the type of `GVSummaryPtrSet` an `unordered_set` which is more memory efficient when the number of elements is smaller than 128 [1] **Original commit message** For distributed ThinLTO, the LTO indexing step generates combined summary for each module, and postlink pipeline reads the combined summary which stores the information for link-time optimization. This patch populates the 'import type' of a summary in bitcode, and updates bitcode reader to parse the bit correctly. [1] https://github.com/llvm/llvm-project/blob/393eff4e02e7ab3d234d246a8d6912c8e745e6f9/llvm/lib/Support/SmallPtrSet.cpp#L43
2024-06-25[clang][Driver] Add HIPAMD Driver support for AMDGCN flavoured SPIR-V (#95061)Alex Voicu
This patch augments the HIPAMD driver to allow it to target AMDGCN flavoured SPIR-V compilation. It's mostly straightforward, as we re-use some of the existing SPIRV infra, however there are a few notable additions: - we introduce an `amdgcnspirv` offload arch, rather than relying on using `generic` (this is already fairly overloaded) or simply using `spirv` or `spirv64` (we'll want to use these to denote unflavoured SPIRV, once we bring up that capability) - initially it is won't be possible to mix-in SPIR-V and concrete AMDGPU targets, as it would require some relatively intrusive surgery in the HIPAMD Toolchain and the Driver to deal with two triples (`spirv64-amd-amdhsa` and `amdgcn-amd-amdhsa`, respectively) - in order to retain user provided compiler flags and have them available at JIT time, we rely on embedding the command line via `-fembed-bitcode=marker`, which the bitcode writer had previously not implemented for SPIRV; we only allow it conditionally for AMDGCN flavoured SPIRV, and it is handled correctly by the Translator (it ends up as a string literal) Once the SPIRV BE is no longer experimental we'll switch to using that rather than the translator. There's some additional work that'll come via a separate PR around correctly piping through AMDGCN's implementation of `printf`, for now we merely handle its flags correctly.
2024-06-21Add the 'initializes' attribute langref and support (#84803)Haopeng Liu
We propose adding a new LLVM attribute, `initializes((Lo1,Hi1),(Lo2,Hi2),...)`, which expresses the notion of memory space (i.e., intervals, in bytes) that the argument pointing to is initialized in the function. Will commit the attribute inferring in the follow-up PRs. https://discourse.llvm.org/t/rfc-llvm-new-initialized-parameter-attribute-for-improved-interprocedural-dse/77337
2024-06-20Revert "[DebugInfo][BPF] Add 'annotations' field for DIBasicType & DI… ↵eddyz87
(#96172) …SubroutineType (#91422)" This reverts commit 3ca17443ef4af21bdb1f3b4fbcfff672cbc6176c. As reported in [1,2] the commit above causes CI failure for powerpc-aix target. There is also a performance regression reported in [3]. Reverting to comply with the developer policy. [1] https://github.com/llvm/llvm-project/pull/91422#issuecomment-2179425473 [2] https://lab.llvm.org/buildbot/#/builders/64/builds/62 [3] https://github.com/llvm/llvm-project/pull/91422#issuecomment-2175631443
2024-06-18[DebugInfo][BPF] Add 'annotations' field for DIBasicType & DISubroutineType ↵eddyz87
(#91422) Extend `DIBasicType` and `DISubroutineType` with additional field `annotations`, e.g. as below: ``` !5 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed, annotations: !6) !6 = !{!7} !7 = !{!"btf:type_tag", !"tag1"} ``` The field would be used by BPF backend to generate DWARF attributes corresponding to `btf_type_tag` type attributes, e.g.: ``` 0x00000029: DW_TAG_base_type DW_AT_name ("int") DW_AT_encoding (DW_ATE_signed) DW_AT_byte_size (0x04) 0x0000002d: DW_TAG_LLVM_annotation DW_AT_name ("btf:type_tag") DW_AT_const_value ("tag1") ``` Such DWARF entries would be used to generate BTF definitions by tools like [pahole](https://github.com/acmel/dwarves). Note: similar fields with similar purposes are already present in DIDerivedType and DICompositeType. Currently "btf_type_tag" attributes are represented in debug information as 'annotations' fields in DIDerivedType with DW_TAG_pointer_type tag. The annotation on a pointer corresponds to pointee having the attributes in the final BTF. The discussion in [thread](https://lore.kernel.org/bpf/87r0w9jjoq.fsf@oracle.com/) came to conclusion, that such annotations should apply to the annotated type itself. Hence the necessity to extend `DIBasicType` & `DISubroutineType` types with 'annotations' field to represent cases like below: ``` int __attribute__((btf_type_tag("foo"))) bar; ``` This was previously tracked as differential revision: https://reviews.llvm.org/D143966
2024-06-10[LLVM][IR][Sanitizers] Add sanitize_numerical_stability attribute (#95051)Alexander Shaposhnikov
Add sanitize_numerical_stability attribute.
2024-06-05Revert "[ThinLTO][Bitcode] Generate import type in bitcode (#87600)" (#94502)Mingming Liu
This reverts commit 6262763341fcd71a2b0708cf7485f9abd1d26ba8, to prepare for the revert of https://github.com/llvm/llvm-project/pull/92718. https://github.com/llvm/llvm-project/pull/92718 causes LTO indexing OOM in some applications.
2024-06-04[IR] Remove support for icmp and fcmp constant expressions (#93038)Nikita Popov
Remove support for the icmp and fcmp constant expressions. This is part of: https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179 As usual, many of the updated tests will no longer test what they were originally intended to -- this is hard to preserve when constant expressions get removed, and in many cases just impossible as the existence of a specific kind of constant expression was the cause of the issue in the first place.
2024-06-03[MemProf] Determine stack id references in BitcodeWriter without sorting ↵Teresa Johnson
(#94285) A cycle profile of a thin link showed a lot of time spent in sort called from the BitcodeWriter, which was being used to compute the unique references to stack ids in the summaries emitted for each backend in a distributed thinlto build. We were also frequently invoking lower_bound to locate stack id indices in the resulting vector when writing out the referencing memprof records. Change this to use a map to uniquify the references, and to hold the index of the corresponding stack id in the StackIds vector, which is now populated at the same time. This reduced the time of a large thin link by about 10%.
2024-06-03BitcodeWriter: ensure `Buffer` is heap allocatedMircea Trofin
PR #92983 accidentally changed the buffer allocation in `llvm::WriteBitcodeToFile` to be allocated on the stack, which is problematic given it's a large-ish buffer (256K)
2024-05-30Unittests and usability for BitstreamWriter incremental flushing (#92983)Mircea Trofin
- added unittests for the raw_fd_stream output case. - the `BitstreamWriter` ctor was confusing, the relationship between the buffer and the file stream wasn't clear and in fact there was a potential bug in `BitcodeWriter` in the mach-o case, because that code assumed in-buffer only serialization. The incremental flushing behavior of flushing at end of block boundaries was an implementation detail that meant serializers not using blocks (for example) would need to know to check the buffer and flush. The bug was latent - in the sense that, today, because the stream being passed was not a `raw_fd_stream`, incremental buffering never kicked in. The new design moves the responsibility of flushing to the `BitstreamWriter`, and makes it work with any `raw_ostream` (but incrementally flush only in the `raw_fd_stream` case). If the `raw_ostream` is over a buffer - i.e. a `raw_svector_stream` - then it's equivalent to today's buffer case. For all other `raw_ostream` cases, buffering is an implementation detail. In all cases, the buffer is flushed (well, in the buffer case, that's a moot statement). This simplifies the state and state transitions the user has to track: you have a raw_ostream -> BitstreamWrite in it -> destroy the writer => the bitstream is completely written in your raw_ostream. The "buffer" case and the "raw_fd_stream" case become optimizations rather than imposing state transition concerns to the user.
2024-05-28[IR][AArch64][PAC] Add "ptrauth(...)" Constant to represent signed pointers. ↵Ahmed Bougacha
(#85738) This defines a new kind of IR Constant that represents a ptrauth signed pointer, as used in AArch64 PAuth. It allows representing most kinds of signed pointer constants used thus far in the llvm ptrauth implementations, notably those used in the Darwin and ELF ABIs being implemented for c/c++. These signed pointer constants are then lowered to ELF/MachO relocations. These can be simply thought of as a constant `llvm.ptrauth.sign`, with the interesting addition of discriminator computation: the `ptrauth` constant can also represent a combined blend, when both address and integer discriminator operands are used. Both operands are otherwise optional, with default values 0/null.
2024-05-27[IR] Add getelementptr nusw and nuw flags (#90824)Nikita Popov
This implements the `nusw` and `nuw` flags for `getelementptr` as proposed at https://discourse.llvm.org/t/rfc-add-nusw-and-nuw-flags-for-getelementptr/78672. The three possible flags are encapsulated in the new `GEPNoWrapFlags` class. Currently this class has a ctor from bool, interpreted as the InBounds flag. This ctor should be removed in the future, as code gets migrated to handle all flags. There are a few places annotated with `TODO(gep_nowrap)`, where I've had to touch code but opted to not infer or precisely preserve the new flags, so as to keep this as NFC as possible and make sure any changes of that kind get test coverage when they are made.
2024-05-22[ThinLTO][Bitcode] Generate import type in bitcode (#87600)Mingming Liu
For distributed ThinLTO, the LTO indexing step generates combined summary for each module, and postlink pipeline reads the combined summary which stores the information for link-time optimization. This patch populates the 'import type' of a summary in bitcode, and updates bitcode reader to parse the bit correctly.
2024-05-15Fix typo "indicies" (#92232)Jay Foad