<feed xmlns='http://www.w3.org/2005/Atom'>
<title>llvm-project.git/flang/lib/Optimizer/Transforms/LoopVersioning.cpp, branch main</title>
<subtitle>Unnamed repository; edit this file 'description' to name the repository.
</subtitle>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/'/>
<entry>
<title>[mlir][NFC] update `flang/Optimizer/Transforms` create APIs (11/n)  (#149915)</title>
<updated>2025-07-21T23:37:17+00:00</updated>
<author>
<name>Maksim Levental</name>
<email>maksim.levental@gmail.com</email>
</author>
<published>2025-07-21T23:37:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=46f6df0848ea04449c6179ecdedc404ee5b5cf11'/>
<id>46f6df0848ea04449c6179ecdedc404ee5b5cf11</id>
<content type='text'>
See https://github.com/llvm/llvm-project/pull/147168 for more info.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
See https://github.com/llvm/llvm-project/pull/147168 for more info.</pre>
</div>
</content>
</entry>
<entry>
<title>[flang] Optimize redundant array repacking. (#147881)</title>
<updated>2025-07-14T16:41:42+00:00</updated>
<author>
<name>Slava Zakharin</name>
<email>szakharin@nvidia.com</email>
</author>
<published>2025-07-14T16:41:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=4775b9689844e1ec7b425e039aff4fcbb758b4d4'/>
<id>4775b9689844e1ec7b425e039aff4fcbb758b4d4</id>
<content type='text'>
This patch allows optimizing redundant array repacking, when
the source array is statically known to be contiguous.
This is part of the implementation plan for the array repacking
feature, though, it does not affect any real life use case
as long as FIR inlining is not a thing. I experimented with
simple cases of FIR inling using `-inline-all`, and I recorded
these cases in optimize-array-repacking.fir tests.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch allows optimizing redundant array repacking, when
the source array is statically known to be contiguous.
This is part of the implementation plan for the array repacking
feature, though, it does not affect any real life use case
as long as FIR inlining is not a thing. I experimented with
simple cases of FIR inling using `-inline-all`, and I recorded
these cases in optimize-array-repacking.fir tests.</pre>
</div>
</content>
</entry>
<entry>
<title>[Flang] Add missing dependent dialects to MLIR passes (#139260)</title>
<updated>2025-05-13T15:17:49+00:00</updated>
<author>
<name>Sergio Afonso</name>
<email>safonsof@amd.com</email>
</author>
<published>2025-05-13T15:17:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=7e8b3fea43f1dfa1d5611a70d887cba5d79b2da9'/>
<id>7e8b3fea43f1dfa1d5611a70d887cba5d79b2da9</id>
<content type='text'>
This patch updates several passes to include the DLTI dialect, since
their use of the `fir::support::getOrSetMLIRDataLayout()` utility
function could, in some cases, require this dialect to be loaded in
advance.

Also, the `CUFComputeSharedMemoryOffsetsAndSize` pass has been updated
with a dependency to the GPU dialect, as its invocation to
`cuf::getOrCreateGPUModule()` would result in the same kind of error if
no other operations or attributes from that dialect were present in the
input MLIR module.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch updates several passes to include the DLTI dialect, since
their use of the `fir::support::getOrSetMLIRDataLayout()` utility
function could, in some cases, require this dialect to be loaded in
advance.

Also, the `CUFComputeSharedMemoryOffsetsAndSize` pass has been updated
with a dependency to the GPU dialect, as its invocation to
`cuf::getOrCreateGPUModule()` would result in the same kind of error if
no other operations or attributes from that dialect were present in the
input MLIR module.</pre>
</div>
</content>
</entry>
<entry>
<title>[flang] Propagate contiguous attribute through HLFIR. (#138797)</title>
<updated>2025-05-13T01:33:47+00:00</updated>
<author>
<name>Slava Zakharin</name>
<email>szakharin@nvidia.com</email>
</author>
<published>2025-05-13T01:33:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=2d12d31f44acac54d7b2858624cb8a1db5a0a8ce'/>
<id>2d12d31f44acac54d7b2858624cb8a1db5a0a8ce</id>
<content type='text'>
This change allows marking more designators producing an opaque
box with 'contiguous' attribute, e.g. like in test1 case
in flang/test/HLFIR/propagate-contiguous-attribute.fir.
This would make isSimplyContiguous() return true for such
designators allowing merging hlfir.eval_in_mem with hlfir.assign
where the LHS is a contiguous array section.

Depends on #139003</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This change allows marking more designators producing an opaque
box with 'contiguous' attribute, e.g. like in test1 case
in flang/test/HLFIR/propagate-contiguous-attribute.fir.
This would make isSimplyContiguous() return true for such
designators allowing merging hlfir.eval_in_mem with hlfir.assign
where the LHS is a contiguous array section.

Depends on #139003</pre>
</div>
</content>
</entry>
<entry>
<title>[flang] Recognize fir.pack_array in LoopVersioning. (#133191)</title>
<updated>2025-03-31T18:41:43+00:00</updated>
<author>
<name>Slava Zakharin</name>
<email>szakharin@nvidia.com</email>
</author>
<published>2025-03-31T18:41:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=0ac8cb1b3df724f549a62f6b277745af3d50b23a'/>
<id>0ac8cb1b3df724f549a62f6b277745af3d50b23a</id>
<content type='text'>
This change enables LoopVersioning when `fir.pack_array` is met
in the def-use chain. It fixes a couple of huge performance regressions
caused by enabling `-frepack-arrays`.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This change enables LoopVersioning when `fir.pack_array` is met
in the def-use chain. It fixes a couple of huge performance regressions
caused by enabling `-frepack-arrays`.</pre>
</div>
</content>
</entry>
<entry>
<title>[Flang] Move non-common headers to FortranSupport (#124416)</title>
<updated>2025-02-06T14:29:10+00:00</updated>
<author>
<name>Michael Kruse</name>
<email>llvm-project@meinersbur.de</email>
</author>
<published>2025-02-06T14:29:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=b815a3942a0b0a9e7aab6b269ffdb0e93abc4368'/>
<id>b815a3942a0b0a9e7aab6b269ffdb0e93abc4368</id>
<content type='text'>
Move non-common files from FortranCommon to FortranSupport (analogous to
LLVMSupport) such that

* declarations and definitions that are only used by the Flang compiler,
but not by the runtime, are moved to FortranSupport

* declarations and definitions that are used by both ("common"), the
compiler and the runtime, remain in FortranCommon

* generic STL-like/ADT/utility classes and algorithms remain in
FortranCommon

This allows a for cleaner separation between compiler and runtime
components, which are compiled differently. For instance, runtime
sources must not use STL's `&lt;optional&gt;` which causes problems with CUDA
support. Instead, the surrogate header `flang/Common/optional.h` must be
used. This PR fixes this for `fast-int-sel.h`.

Declarations in include/Runtime are also used by both, but are
header-only. `ISO_Fortran_binding_wrapper.h`, a header used by compiler
and runtime, is also moved into FortranCommon.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Move non-common files from FortranCommon to FortranSupport (analogous to
LLVMSupport) such that

* declarations and definitions that are only used by the Flang compiler,
but not by the runtime, are moved to FortranSupport

* declarations and definitions that are used by both ("common"), the
compiler and the runtime, remain in FortranCommon

* generic STL-like/ADT/utility classes and algorithms remain in
FortranCommon

This allows a for cleaner separation between compiler and runtime
components, which are compiled differently. For instance, runtime
sources must not use STL's `&lt;optional&gt;` which causes problems with CUDA
support. Instead, the surrogate header `flang/Common/optional.h` must be
used. This PR fixes this for `fast-int-sel.h`.

Declarations in include/Runtime are also used by both, but are
header-only. `ISO_Fortran_binding_wrapper.h`, a header used by compiler
and runtime, is also moved into FortranCommon.</pre>
</div>
</content>
</entry>
<entry>
<title>[Flang][MLIR] Extend DataLayout utilities to have basic GPU Module support (#123149)</title>
<updated>2025-01-30T16:31:50+00:00</updated>
<author>
<name>agozillon</name>
<email>Andrew.Gozillon@amd.com</email>
</author>
<published>2025-01-30T16:31:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=4186805060bb06dc3362707d45e6f0f826208a8f'/>
<id>4186805060bb06dc3362707d45e6f0f826208a8f</id>
<content type='text'>
As there is now certain areas where we now have the possibility of
having either a ModuleOp or GPUModuleOp and both of these modules can
have DataLayout's and we may require utilising the DataLayout utilities
in these areas I've taken the liberty of trying to extend them for use
with both.

Those with more knowledge of how they wish the GPUModuleOp's to interact
with their parent ModuleOp's DataLayout may have further alterations
they wish to make in the future, but for the moment, it'll simply
utilise the basic data layout construction which I believe combines
parent and child datalayouts from the ModuleOp and GPUModuleOp. If there
is no GPUModuleOp DataLayout it should default to the parent ModuleOp.

It's worth noting there is some weirdness if you have two module
operations defining builtin dialect DataLayout Entries, it appears the
combinatorial functionality for DataLayouts doesn't support the merging
of these.

This behaviour is useful for areas like:
https://github.com/llvm/llvm-project/pull/119585/files#diff-19fc4bcb38829d085e25d601d344bbd85bf7ef749ca359e348f4a7c750eae89dR1412
where we have a crossroads between the two different module operations.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
As there is now certain areas where we now have the possibility of
having either a ModuleOp or GPUModuleOp and both of these modules can
have DataLayout's and we may require utilising the DataLayout utilities
in these areas I've taken the liberty of trying to extend them for use
with both.

Those with more knowledge of how they wish the GPUModuleOp's to interact
with their parent ModuleOp's DataLayout may have further alterations
they wish to make in the future, but for the moment, it'll simply
utilise the basic data layout construction which I believe combines
parent and child datalayouts from the ModuleOp and GPUModuleOp. If there
is no GPUModuleOp DataLayout it should default to the parent ModuleOp.

It's worth noting there is some weirdness if you have two module
operations defining builtin dialect DataLayout Entries, it appears the
combinatorial functionality for DataLayouts doesn't support the merging
of these.

This behaviour is useful for areas like:
https://github.com/llvm/llvm-project/pull/119585/files#diff-19fc4bcb38829d085e25d601d344bbd85bf7ef749ca359e348f4a7c750eae89dR1412
where we have a crossroads between the two different module operations.</pre>
</div>
</content>
</entry>
<entry>
<title>[flang] Enable loop-versioning for slices. (#120344)</title>
<updated>2024-12-23T15:53:10+00:00</updated>
<author>
<name>Slava Zakharin</name>
<email>szakharin@nvidia.com</email>
</author>
<published>2024-12-23T15:53:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=711419e3025678511e3d26c4c30d757f9029d598'/>
<id>711419e3025678511e3d26c4c30d757f9029d598</id>
<content type='text'>
Loops resulting from array expressions like array(:,i)
may be versioned for the unit stride of the innermost dimension,
when the initial array is an assumed-shape array (which are contiguous
in many Fortran programs).
This speeds up facerec for about 12% due to further vectorization
of the innermost loop produced for the total SUM reduction.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Loops resulting from array expressions like array(:,i)
may be versioned for the unit stride of the innermost dimension,
when the initial array is an assumed-shape array (which are contiguous
in many Fortran programs).
This speeds up facerec for about 12% due to further vectorization
of the innermost loop produced for the total SUM reduction.</pre>
</div>
</content>
</entry>
<entry>
<title>[flang] replace fir.complex usages with mlir complex (#110850)</title>
<updated>2024-10-03T15:10:57+00:00</updated>
<author>
<name>jeanPerier</name>
<email>jperier@nvidia.com</email>
</author>
<published>2024-10-03T15:10:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=c4204c0b29a6721267b1bcbaeedd7b1118e42396'/>
<id>c4204c0b29a6721267b1bcbaeedd7b1118e42396</id>
<content type='text'>
Core patch of
https://discourse.llvm.org/t/rfc-flang-replace-usages-of-fir-complex-by-mlir-complex-type/82292.
After that, the last step is to remove fir.complex from FIR types.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Core patch of
https://discourse.llvm.org/t/rfc-flang-replace-usages-of-fir-complex-by-mlir-complex-type/82292.
After that, the last step is to remove fir.complex from FIR types.</pre>
</div>
</content>
</entry>
<entry>
<title>[flang][debug] Support derived types. (#99476)</title>
<updated>2024-08-27T09:30:49+00:00</updated>
<author>
<name>Abid Qadeer</name>
<email>haqadeer@amd.com</email>
</author>
<published>2024-08-27T09:30:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/llvm-project.git/commit/?id=d07dc73bcfcd4026b956eb08b770ff0c47546b66'/>
<id>d07dc73bcfcd4026b956eb08b770ff0c47546b66</id>
<content type='text'>
This PR adds initial debug support for derived type. It handles
`RecordType` and generates appropriate `DICompositeTypeAttr`. The
`TypeInfoOp` is used to get information about the parent and location of
the derived type.

We use `getTypeSizeAndAlignment` to get the size and alignment of the
components of the derived types. This function needed a few changes to
be suitable to be used here:

1. The `getTypeSizeAndAlignment` errored out on unsupported type which
would not work with incremental way we are building debug support. A new
variant of this function has been that returns an std::optional. The original
function has been renamed to `getTypeSizeAndAlignmentOrCrash` as it
will call `TODO()` for unsupported types.

2. The Character type was returning size of just element and not the
whole string which has been fixed.

The testcase checks for offsets of the components which had to be
hardcoded in the test. So the testcase is currently enabled on x86_64.

With this PR in place, this is how the debugging of derived types look
like:

```
type :: t_date
    integer :: year, month, day
  end type

  type :: t_address
    integer :: house_number
  end type
  type, extends(t_address) :: t_person
    character(len=20) name
  end type
  type, extends(t_person)  :: t_employee
    type(t_date) :: hired_date
    real :: monthly_salary
  end type
  type(t_employee) :: employee

(gdb) p employee
$1 = ( t_person = ( t_address = ( house_number = 1 ), name = 'John', ' ' &lt;repeats 16 times&gt; ), hired_date = ( year = 2020, month = 1, day = 20 ), monthly_salary = 3.1400001 )
```</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This PR adds initial debug support for derived type. It handles
`RecordType` and generates appropriate `DICompositeTypeAttr`. The
`TypeInfoOp` is used to get information about the parent and location of
the derived type.

We use `getTypeSizeAndAlignment` to get the size and alignment of the
components of the derived types. This function needed a few changes to
be suitable to be used here:

1. The `getTypeSizeAndAlignment` errored out on unsupported type which
would not work with incremental way we are building debug support. A new
variant of this function has been that returns an std::optional. The original
function has been renamed to `getTypeSizeAndAlignmentOrCrash` as it
will call `TODO()` for unsupported types.

2. The Character type was returning size of just element and not the
whole string which has been fixed.

The testcase checks for offsets of the components which had to be
hardcoded in the test. So the testcase is currently enabled on x86_64.

With this PR in place, this is how the debugging of derived types look
like:

```
type :: t_date
    integer :: year, month, day
  end type

  type :: t_address
    integer :: house_number
  end type
  type, extends(t_address) :: t_person
    character(len=20) name
  end type
  type, extends(t_person)  :: t_employee
    type(t_date) :: hired_date
    real :: monthly_salary
  end type
  type(t_employee) :: employee

(gdb) p employee
$1 = ( t_person = ( t_address = ( house_number = 1 ), name = 'John', ' ' &lt;repeats 16 times&gt; ), hired_date = ( year = 2020, month = 1, day = 20 ), monthly_salary = 3.1400001 )
```</pre>
</div>
</content>
</entry>
</feed>
