llvm-project.git/flang/lib/Optimizer/CodeGen/FIROpPatterns.cpp, branch main

[Flang][OpenMP] Additional global address space modifications for device (#119585)

2025-09-17T01:27:03+00:00

A prior PR added a portion of the global address space modifications
required for declare target to, this PR seeks to add
 a small amount more leftover from that PR.
   
The intent is to allow for more correct IR that the backends (in
particular AMDGPU) can treat more aptly for optimisations and code
correctness
    
1/3 required PRs to enable declare target to mapping, should look at PR
3/3 to check for full green passes (this one will fail a number due to
some dependencies).
    
Co-authored-by: Raghu Maddhipatla raghu.maddhipatla@amd.com

[mlir][NFC] update `flang/lib` create APIs (12/n) (#149914)

2025-07-24T23:05:40+00:00

See https://github.com/llvm/llvm-project/pull/147168 for more info.

[flang] Emit `fir.global` in the global address space (#146653)

2025-07-02T15:15:22+00:00

Instead of emitting globals in the program/default address space, emit
them in the global address space. This also requires changes how address
of code-gen is handled, we need to cast to the default address space to
prevent code-gen issues.

[flang] Read the extra field from the in box when doing reboxing (#102992)

2024-08-14T18:23:56+00:00

Updated version of #102686. The issue was that in some rebox case the
addendum presence flag should be updated and not always taken from the
"from" box. This is the case when reboxing a fir.class to a fir.box that
doesn't require an addendum for example.

Open a new review since there is a bit of additional code in the CodeGen
part.

Revert "[flang] Read the extra field from the in box when doing reboxing" (#102931)

2024-08-12T16:35:50+00:00

Reverts llvm/llvm-project#102686 as it might be the source of buildbot
failures https://lab.llvm.org/buildbot/#/builders/143/builds/1392.

[flang] Read the extra field from the in box when doing reboxing (#102686)

2024-08-12T15:48:27+00:00

The extra field in the descriptor carries multiple information and
cannot be deducted anymore when doing a reboxing. This patch updates the
codegen to retrieve the extra field value from the inboc and set it in
the new box.

[flang] Match the type of the element size in the box in getValueFromBox (#100512)

2024-08-06T22:23:05+00:00

Currently, `%17 = fir.box_elesize %16 :
(!fir.class>>) -> i32`
is translated to
```
  %4 = getelementptr { ptr, i64, i32, i8, i8, i8, i8, ptr, [1 x i64] }, ptr %1, i32 0, i32 1
  %5 = load i32, ptr %4, align 4
```
The type of the element size is `i64`. The load essentially truncates
the value and yields incorrect result in the big endian environment. The
problem occurs in the `storage_size` intrinsic on a polymorphic
variable.

[Flang] Hoisting constant-sized allocas at flang codegen. (#95310)

2024-06-14T16:36:05+00:00

This change modifies the `AllocaOpConversion` in flang codegen to insert
constant-sized LLVM allocas at the entry block of `LLVMFuncOp` or
OpenACC/OpenMP Op, rather than in-place at the `fir.alloca`. This
effectively hoists constant-sized FIR allocas to the proper block.

When compiling the example subroutine below with `flang-new`, we get a
llvm.stacksave/stackrestore pair around a constant-sized `fir.alloca
i32`.

```
subroutine test(n)
    block
      integer :: n
      print *, n
    end block
  end subroutine test
```

Without the proposed change, downstream LLVM compilation cannot hoist
this constant-sized alloca out of the stacksave/stackrestore region
which may lead to missed downstream optimizations:

```
*** IR Dump After Safe Stack instrumentation pass (safe-stack) ***
define void @test_(ptr %0) !dbg !3 {
  %2 = call ptr @llvm.stacksave.p0(), !dbg !7
  %3 = alloca i32, i64 1, align 4, !dbg !8
  %4 = call ptr @_FortranAioBeginExternalListOutput(i32 6, ptr @_QQclX62c91d05f046c7a656e7978eb13f2821, i32 4), !dbg !9
  %5 = load i32, ptr %3, align 4, !dbg !10, !tbaa !11
  %6 = call i1 @_FortranAioOutputInteger32(ptr %4, i32 %5), !dbg !10
  %7 = call i32 @_FortranAioEndIoStatement(ptr %4), !dbg !9
  call void @llvm.stackrestore.p0(ptr %2), !dbg !15
  ret void, !dbg !16
}
```

With this change, the `llvm.alloca` is already hoisted out of the
stacksave/stackrestore region during flang codegen:

```
// -----// IR Dump After FIRToLLVMLowering (fir-to-llvm-ir) //----- //
  llvm.func @test_(%arg0: !llvm.ptr {fir.bindc_name = "n"}) attributes {fir.internal_name = "_QPtest"} {
    %0 = llvm.mlir.constant(4 : i32) : i32
    %1 = llvm.mlir.constant(1 : i64) : i64
    %2 = llvm.alloca %1 x i32 {bindc_name = "n"} : (i64) -> !llvm.ptr
    %3 = llvm.mlir.constant(6 : i32) : i32
    %4 = llvm.mlir.undef : i1
    %5 = llvm.call @llvm.stacksave.p0() {fastmathFlags = #llvm.fastmath} : () -> !llvm.ptr
    %6 = llvm.mlir.addressof @_QQclX62c91d05f046c7a656e7978eb13f2821 : !llvm.ptr
    %7 = llvm.call @_FortranAioBeginExternalListOutput(%3, %6, %0) {fastmathFlags = #llvm.fastmath} : (i32, !llvm.ptr, i32) -> !llvm.ptr
    %8 = llvm.load %2 {tbaa = [#tbaa_tag]} : !llvm.ptr -> i32
    %9 = llvm.call @_FortranAioOutputInteger32(%7, %8) {fastmathFlags = #llvm.fastmath} : (!llvm.ptr, i32) -> i1
    %10 = llvm.call @_FortranAioEndIoStatement(%7) {fastmathFlags = #llvm.fastmath} : (!llvm.ptr) -> i32
    llvm.call @llvm.stackrestore.p0(%5) {fastmathFlags = #llvm.fastmath} : (!llvm.ptr) -> ()
    llvm.return
  }
```

---------

Co-authored-by: Vijay Kandiah

[flang][fir] add codegen for fir.load of assumed-rank fir.box (#93569)

2024-05-30T07:30:27+00:00

- Update LLVM type conversion of assumed-rank fir.box/class to generate
the type of the maximum ranked descriptor. That way, alloca for assumed
rank descriptor copies are always big enough. This is needed in the
fir.load case that generates a new storage for the value
- Add a "computeBoxSize" helper to compute the dynamic size of a
descriptor.
- Use that size to generate an llvm.memcpy intrinsic to copy the input
descriptor into the new storage.

Looking at https://reviews.llvm.org/D108221?id=404635, it seems valid to
add the TBAA node on the memcpy, which I did.

In a further patch, I think we should likely always use a memcpy since
LLVM seems to have a better time optimizing it than fir.load/fir.store
patterns.

[flang] update fir.box_rank and fir.is_array codegen (#93541)

2024-05-28T15:32:27+00:00

fir.box_rank codegen was invalid, it was assuming the rank field in the
descriptor was an i32. This is not correct. Do not hard code the type,
use the named position to find the type, and convert as needed in the
patterns.