Merge branch 'main' into users/mingmingl-llvm/samplefdo-profile-format

author: Mingming Liu <mingmingl@google.com> 2025-09-10 15:25:31 -0700
committer: GitHub <noreply@github.com> 2025-09-10 15:25:31 -0700
commit: 1417dafa1db9cb1b2b09438aa9f53ea5ab6e36e2
tree: 57f4b1f313c8cf74eed8819870f39c36ea263c68 /llvm/docs
parent: 898b813bc8a6d0276bf0f4769f5f2f64b34e632d
parent: b8cefcb601ddaa18482555c4ff363c01a270c2fe
32 files changed, 966 insertions, 561 deletions
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index b6d61a62f50f..37563203f2f8 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -517,19 +517,19 @@ Every processor supports every OS ABI (see :ref:`amdgpu-os`) with the following
 
      **GCN GFX12 (RDNA 4)** [AMD-GCN-GFX12-RDNA4]_
      -----------------------------------------------------------------------------------------------------------------------
-     ``gfx1200``                 ``amdgcn``   dGPU  - cumode          - Architected                   *TBA*
-                                                    - wavefrontsize64   flat
-                                                                        scratch                       .. TODO::
+     ``gfx1200``                 ``amdgcn``   dGPU  - cumode          - Architected                   - Radeon RX 9060
+                                                    - wavefrontsize64   flat                          - Radeon RX 9060 XT
+                                                                        scratch
                                                                       - Packed
-                                                                        work-item                       Add product
-                                                                        IDs                             names.
+                                                                        work-item
+                                                                        IDs
 
-     ``gfx1201``                 ``amdgcn``   dGPU  - cumode          - Architected                   *TBA*
-                                                    - wavefrontsize64   flat
-                                                                        scratch                       .. TODO::
+     ``gfx1201``                 ``amdgcn``   dGPU  - cumode          - Architected                   - Radeon RX 9070
+                                                    - wavefrontsize64   flat                          - Radeon RX 9070 XT
+                                                                        scratch                       - Radeon RX 9070 GRE
                                                                       - Packed
-                                                                        work-item                       Add product
-                                                                        IDs                             names.
+                                                                        work-item
+                                                                        IDs
 
      ``gfx1250``                 ``amdgcn``   APU                     - Architected                   *TBA*
                                                                         flat
@@ -537,6 +537,8 @@ Every processor supports every OS ABI (see :ref:`amdgpu-os`) with the following
                                                                       - Packed
                                                                         work-item                       Add product
                                                                         IDs                             names.
+                                                                      - Workgroup
+                                                                        Clusters
 
      =========== =============== ============ ===== ================= =============== =============== ======================
 
@@ -768,9 +770,6 @@ For example:
                                                   performant than code generated for XNACK replay
                                                   disabled.
 
-     cu-stores       TODO                         On GFX12.5, controls whether ``scope:SCOPE_CU`` stores may be used.
-                                                  If disabled, all stores will be done at ``scope:SCOPE_SE`` or greater.
-
      =============== ============================ ==================================================
 
 .. _amdgpu-target-id:
@@ -1098,6 +1097,22 @@ is conservatively correct for OpenCL.
                              - ``wavefront`` and executed by a thread in the
                                same wavefront.
 
+     ``cluster``             Synchronizes with, and participates in modification
+                             and seq_cst total orderings with, other operations
+                             (except image operations) for all address spaces
+                             (except private, or generic that accesses private)
+                             provided the other operation's sync scope is:
+
+                             - ``system``, ``agent`` or ``cluster`` and
+                               executed by a thread on the same cluster.
+                             - ``workgroup`` and executed by a thread in the
+                               same work-group.
+                             - ``wavefront`` and executed by a thread in the
+                               same wavefront.
+
+                             On targets that do not support workgroup cluster
+                             launch mode, this behaves like ``agent`` scope instead.
+
      ``workgroup``           Synchronizes with, and participates in modification
                              and seq_cst total orderings with, other operations
                              (except image operations) for all address spaces
@@ -1131,6 +1146,9 @@ is conservatively correct for OpenCL.
      ``agent-one-as``        Same as ``agent`` but only synchronizes with other
                              operations within the same address space.
 
+     ``cluster-one-as``      Same as ``cluster`` but only synchronizes with other
+                             operations within the same address space.
+
      ``workgroup-one-as``    Same as ``workgroup`` but only synchronizes with
                              other operations within the same address space.
 
@@ -1437,7 +1455,6 @@ The AMDGPU backend implements the following LLVM IR intrinsics.
                                                    Returns a pair for the swapped registers. The first element of the return corresponds
                                                    to the swapped element of the first argument.
 
-
   llvm.amdgcn.permlane32.swap                      Provide direct access to `v_permlane32_swap_b32` instruction on supported targets.
                                                    Swaps the values across lanes of first 2 operands. Rows 2 and 3 of the first operand are
                                                    swapped with rows 0 and 1 of the second operand (one row is 16 lanes).
@@ -1458,6 +1475,25 @@ The AMDGPU backend implements the following LLVM IR intrinsics.
                                                    - `v_mov_b32 <dest> <old>`
                                                    - `v_mov_b32 <dest> <src> <dpp_ctrl> <row_mask> <bank_mask> <bound_ctrl>`
 
+  :ref:`llvm.prefetch <int_prefetch>`              Implemented on gfx1250, ignored on earlier targets.
+                                                   First argument is flat, global, or constant address space pointer.
+                                                   Any other address space is not supported.
+                                                   On gfx125x generates flat_prefetch_b8 or global_prefetch_b8 and brings data to GL2.
+                                                   Second argument is rw and currently ignored. Can be 0 or 1.
+                                                   Third argument is locality, 0-3. Translates to memory scope:
+
+                                                   * 0 - SCOPE_SYS
+                                                   * 1 - SCOPE_DEV
+                                                   * 2 - SCOPE_SE
+                                                   * 3 - SCOPE_SE
+
+                                                   Note that SCOPE_CU is not generated and not safe on an invalid address.
+                                                   Fourth argument is cache type:
+
+                                                   * 0 - Instruction cache, currently ignored and no code is generated.
+                                                   * 1 - Data cache.
+
+                                                   Instruction cache prefetches are unsafe on invalid address.
   ==============================================   ==========================================================
 
 .. TODO::
@@ -5114,9 +5150,7 @@ The fields used by CP for code objects before V3 also match those specified in
                                                      and must be 0,
      >454    1 bit   ENABLE_SGPR_PRIVATE_SEGMENT
                      _SIZE
-     455     1 bit   USES_CU_STORES                  GFX12.5: Whether the ``cu-stores`` target attribute is enabled.
-                                                     If 0, then all stores are ``SCOPE_SE`` or higher.
-     457:456 2 bits                                  Reserved, must be 0.
+     457:455 3 bits                                  Reserved, must be 0.
      458     1 bit   ENABLE_WAVEFRONT_SIZE32         GFX6-GFX9
                                                        Reserved, must be 0.
                                                      GFX10-GFX11
@@ -18254,8 +18288,6 @@ terminated by an ``.end_amdhsa_kernel`` directive.
                                                                                   GFX942)
      ``.amdhsa_user_sgpr_private_segment_size``               0                   GFX6-GFX12   Controls ENABLE_SGPR_PRIVATE_SEGMENT_SIZE in
                                                                                                :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
-     ``.amdhsa_uses_cu_stores``                               0                   GFX12.5      Controls USES_CU_STORES in
-                                                                                               :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
      ``.amdhsa_wavefront_size32``                             Target              GFX10-GFX12  Controls ENABLE_WAVEFRONT_SIZE32 in
                                                               Feature                          :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
                                                               Specific
diff --git a/llvm/docs/AddingConstrainedIntrinsics.rst b/llvm/docs/AddingConstrainedIntrinsics.rst
index a6acb6b51536..bd14f121144c 100644
--- a/llvm/docs/AddingConstrainedIntrinsics.rst
+++ b/llvm/docs/AddingConstrainedIntrinsics.rst
@@ -20,13 +20,13 @@ Add the new intrinsic to the table of intrinsics::
 Add SelectionDAG node types
 ===========================
 
-Add the new STRICT version of the node type to the ISD::NodeType enum::
+Add the new ``STRICT`` version of the node type to the ``ISD::NodeType`` enum::
 
   include/llvm/CodeGen/ISDOpcodes.h
 
 Strict version name must be a concatenation of prefix ``STRICT_`` and the name
-of corresponding non-strict node name. For instance, strict version of the
-node FADD must be STRICT_FADD.
+of the corresponding non-strict node name. For instance, strict version of the
+node ``FADD`` must be ``STRICT_FADD``.
 
 Update mappings
 ===============
@@ -51,30 +51,30 @@ Update Selector components
 Building the SelectionDAG
 -------------------------
 
-The function SelectionDAGBuilder::visitConstrainedFPIntrinsic builds DAG nodes
-using mappings specified in ConstrainedOps.def. If however this default build is
+The ``SelectionDAGBuilder::visitConstrainedFPIntrinsic`` function builds DAG nodes
+using mappings specified in ``ConstrainedOps.def``. If however this default build is
 not sufficient, the build can be modified, see how it is implemented for
-STRICT_FP_ROUND. The new STRICT node will eventually be converted
-to the matching non-STRICT node. For this reason it should have the same
-operands and values as the non-STRICT version but should also use the chain.
-This makes subsequent sharing of code for STRICT and non-STRICT code paths
+``STRICT_FP_ROUND``. The new ``STRICT`` node will eventually be converted
+to the matching non-``STRICT`` node. For this reason it should have the same
+operands and values as the non-``STRICT`` version but should also use the chain.
+This makes subsequent sharing of code for ``STRICT`` and non-``STRICT`` code paths
 easier::
 
   lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
 
-Most of the STRICT nodes get legalized the same as their matching non-STRICT
-counterparts. A new STRICT node with this property must get added to the
-switch in SelectionDAGLegalize::LegalizeOp().::
+Most of the ``STRICT`` nodes get legalized the same as their matching non-``STRICT``
+counterparts. A new ``STRICT`` node with this property must get added to the
+switch in ``SelectionDAGLegalize::LegalizeOp()``::
 
   lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
 
 Other parts of the legalizer may need to be updated as well. Look for
-places where the non-STRICT counterpart is legalized and update as needed.
-Be careful of the chain since STRICT nodes use it but their counterparts
+places where the non-``STRICT`` counterpart is legalized and update as needed.
+Be careful of the chain since ``STRICT`` nodes use it but their counterparts
 often don't.
 
-The code to do the conversion or mutation of the STRICT node to a non-STRICT
-version of the node happens in SelectionDAG::mutateStrictFPToFP(). In most cases
+The conversion or mutation of the ``STRICT`` node to a non-``STRICT``
+version of the node happens in ``SelectionDAG::mutateStrictFPToFP()``. In most cases
 the function can do the conversion using information from ConstrainedOps.def. Be
 careful updating this function since some nodes have the same return type
 as their input operand, but some are different. Both of these cases must
@@ -82,13 +82,13 @@ be properly handled::
 
   lib/CodeGen/SelectionDAG/SelectionDAG.cpp
 
-Whether the mutation may happens or not, depends on how the new node has been
-registered in TargetLoweringBase::initActions(). By default all strict nodes are
+Whether the mutation happens or not depends on how the new node has been
+registered in ``TargetLoweringBase::initActions()``. By default, all strict nodes are
 registered with Expand action::
 
   lib/CodeGen/TargetLoweringBase.cpp
 
-To make debug logs readable it is helpful to update the SelectionDAG's
+To make debug logs readable, it is helpful to update the SelectionDAG's
 debug logger:::
 
   lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
diff --git a/llvm/docs/AdvancedBuilds.rst b/llvm/docs/AdvancedBuilds.rst
index ee178dd3772c..9e25355365a8 100644
--- a/llvm/docs/AdvancedBuilds.rst
+++ b/llvm/docs/AdvancedBuilds.rst
@@ -16,7 +16,7 @@ If **you are a new contributor**, please start with the :doc:`GettingStarted` or
 :doc:`CMake` pages. This page is intended for users doing more complex builds.
 
 Many of the examples below are written assuming specific CMake Generators.
-Unless otherwise explicitly called out these commands should work with any CMake
+Unless explicitly stated otherwise, these commands should work with any CMake
 generator.
 
 Many of the build configurations mentioned on this documentation page can be
@@ -30,14 +30,14 @@ Bootstrap Builds
 ================
 
 The Clang CMake build system supports bootstrap (aka multi-stage) builds. At a
-high level a multi-stage build is a chain of builds that pass data from one
+high level, a multi-stage build is a chain of builds that pass data from one
 stage into the next. The most common and simple version of this is a traditional
 bootstrap build.
 
 In a simple two-stage bootstrap build, we build clang using the system compiler,
 then use that just-built clang to build clang again. In CMake this simplest form
 of a bootstrap build can be configured with a single option,
-CLANG_ENABLE_BOOTSTRAP.
+``CLANG_ENABLE_BOOTSTRAP``.
 
 .. code-block:: console
 
@@ -52,9 +52,9 @@ configurations for each stage. The next series of examples utilize CMake cache
 scripts to provide more complex options.
 
 By default, only a few CMake options will be passed between stages.
-The list, called _BOOTSTRAP_DEFAULT_PASSTHROUGH, is defined in clang/CMakeLists.txt.
-To force the passing of the variables between stages, use the -DCLANG_BOOTSTRAP_PASSTHROUGH
-CMake option, each variable separated by a ";". As example:
+The list, called _BOOTSTRAP_DEFAULT_PASSTHROUGH, is defined in ``clang/CMakeLists.txt``.
+To force the passing of the variables between stages, use the ``-DCLANG_BOOTSTRAP_PASSTHROUGH``
+CMake option, each variable separated by a ";". For example:
 
 .. code-block:: console
 
@@ -65,9 +65,9 @@ CMake option, each variable separated by a ";". As example:
       <path to source>/llvm
   $ ninja stage2
 
-CMake options starting by ``BOOTSTRAP_`` will be passed only to the stage2 build.
-This gives the opportunity to use Clang specific build flags.
-For example, the following CMake call will enabled '-fno-addrsig' only during
+CMake options starting with ``BOOTSTRAP_`` will be passed only to the stage2 build.
+This gives the opportunity to use Clang-specific build flags.
+For example, the following CMake call will enable ``-fno-addrsig`` only during
 the stage2 build for C and C++.
 
 .. code-block:: console
@@ -77,7 +77,7 @@ the stage2 build for C and C++.
 The clang build system refers to builds as stages. A stage1 build is a standard
 build using the compiler installed on the host, and a stage2 build is built
 using the stage1 compiler. This nomenclature holds up to more stages too. In
-general a stage*n* build is built using the output from stage*n-1*.
+general, a stage*n* build is built using the output from stage*n-1*.
 
 Apple Clang Builds (A More Complex Bootstrap)
 =============================================
@@ -90,7 +90,7 @@ compiler is a balance of optimization vs build time because it is a throwaway.
 The stage2 compiler is the fully optimized compiler intended to ship to users.
 
 Setting up these compilers requires a lot of options. To simplify the
-configuration the Apple Clang build settings are contained in CMake Cache files.
+configuration, the Apple Clang build settings are contained in CMake Cache files.
 You can build an Apple Clang compiler using the following commands:
 
 .. code-block:: console
@@ -99,7 +99,7 @@ You can build an Apple Clang compiler using the following commands:
   $ ninja stage2-distribution
 
 This CMake invocation configures the stage1 host compiler, and sets
-CLANG_BOOTSTRAP_CMAKE_ARGS to pass the Apple-stage2.cmake cache script to the
+``CLANG_BOOTSTRAP_CMAKE_ARGS`` to pass the Apple-stage2.cmake cache script to the
 stage2 configuration step.
 
 When you build the stage2-distribution target it builds the minimal stage1
@@ -113,18 +113,18 @@ build configurations.
 Multi-stage PGO
 ===============
 
-Profile-Guided Optimizations (PGO) is a really great way to optimize the code
+Profile-Guided Optimization (PGO) is a really great way to optimize the code
 clang generates. Our multi-stage PGO builds are a workflow for generating PGO
 profiles that can be used to optimize clang.
 
 At a high level, the way PGO works is that you build an instrumented compiler,
 then you run the instrumented compiler against sample source files. While the
 instrumented compiler runs it will output a bunch of files containing
-performance counters (.profraw files). After generating all the profraw files
+performance counters (``.profraw`` files). After generating all the profraw files
 you use llvm-profdata to merge the files into a single profdata file that you
-can feed into the LLVM_PROFDATA_FILE option.
+can feed into the ``LLVM_PROFDATA_FILE`` option.
 
-Our PGO.cmake cache automates that whole process. You can use it for
+Our ``PGO.cmake`` cache automates that whole process. You can use it for
 configuration with CMake with the following command:
 
 .. code-block:: console
@@ -133,11 +133,11 @@ configuration with CMake with the following command:
       <path to source>/llvm
 
 There are several additional options that the cache file also accepts to modify
-the build, particularly the PGO_INSTRUMENT_LTO option. Setting this option to
+the build, particularly the ``PGO_INSTRUMENT_LTO`` option. Setting this option to
 Thin or Full will enable ThinLTO or full LTO respectively, further enhancing
 the performance gains from a PGO build by enabling interprocedural
 optimizations. For example, to run a CMake configuration for a PGO build
-that also enables ThinTLO, use the following command:
+that also enables ThinLTO, use the following command:
 
 .. code-block:: console
 
@@ -146,13 +146,13 @@ that also enables ThinTLO, use the following command:
       <path to source>/llvm
 
 By default, clang will generate profile data by compiling a simple
-hello world program.  You can also tell clang use an external
+hello world program.  You can also tell clang to use an external
 project for generating profile data that may be a better fit for your
 use case.  The project you specify must either be a lit test suite
-(use the CLANG_PGO_TRAINING_DATA option) or a CMake project (use the
-CLANG_PERF_TRAINING_DATA_SOURCE_DIR option).
+(use the ``CLANG_PGO_TRAINING_DATA`` option) or a CMake project (use the
+``CLANG_PERF_TRAINING_DATA_SOURCE_DIR`` option).
 
-For example, If you wanted to use the
+For example, if you wanted to use the
 `LLVM Test Suite <https://github.com/llvm/llvm-test-suite/>`_ to generate
 profile data you would use the following command:
 
@@ -162,8 +162,8 @@ profile data you would use the following command:
        -DBOOTSTRAP_CLANG_PGO_TRAINING_DATA_SOURCE_DIR=<path to llvm-test-suite> \
        -DBOOTSTRAP_CLANG_PGO_TRAINING_DEPS=runtimes
 
-The BOOTSTRAP\_ prefixes tells CMake to pass the variables on to the instrumented
-stage two build.  And the CLANG_PGO_TRAINING_DEPS option let's you specify
+The ``BOOTSTRAP\_`` prefix tells CMake to pass the variables on to the instrumented
+stage two build.  And the ``CLANG_PGO_TRAINING_DEPS`` option lets you specify
 additional build targets to build before building the external project.  The
 LLVM Test Suite requires compiler-rt to build, so we need to add the
 `runtimes` target as a dependency.
@@ -182,7 +182,7 @@ build directory. This takes a really long time because it builds clang twice,
 and you *must* have compiler-rt in your build tree.
 
 This process uses any source files under the perf-training directory as training
-data as long as the source files are marked up with LIT-style RUN lines.
+data as long as the source files are marked up with LIT-style ``RUN`` lines.
 
 After it finishes you can use :code:`find . -name clang.profdata` to find it, but it
 should be at a path something like:
@@ -191,7 +191,7 @@ should be at a path something like:
 
   <build dir>/tools/clang/stage2-instrumented-bins/utils/perf-training/clang.profdata
 
-You can feed that file into the LLVM_PROFDATA_FILE option when you build your
+You can feed that file into the ``LLVM_PROFDATA_FILE`` option when you build your
 optimized compiler.
 
 It may be necessary to build additional targets before running perf training, such as
@@ -214,7 +214,7 @@ The PGO cache generates the following additional targets:
 
 **stage2-instrumented-generate-profdata**
   Depends on stage2-instrumented and will use the instrumented compiler to
-  generate profdata based on the training files in clang/utils/perf-training
+  generate profdata based on the training files in ``clang/utils/perf-training``
 
 **stage2**
   Depends on stage2-instrumented-generate-profdata and will use the stage1
@@ -257,11 +257,11 @@ Then, build the BOLT-optimized binary by running the following ninja command:
   $ ninja clang-bolt
 
 If you're seeing errors in the build process, try building with a recent
-version of Clang/LLVM by setting the CMAKE_C_COMPILER and
-CMAKE_CXX_COMPILER flags to the appropriate values.
+version of Clang/LLVM by setting the ``CMAKE_C_COMPILER`` and
+``CMAKE_CXX_COMPILER`` flags to the appropriate values.
 
 It is also possible to use BOLT on top of PGO and (Thin)LTO for an even more
-significant runtime speedup. To configure a three stage PGO build with ThinLTO
+significant runtime speedup. To configure a three-stage PGO build with ThinLTO
 that optimizes the resulting binary with BOLT, use the following CMake
 configuration command:
 
@@ -282,14 +282,14 @@ Then, to build the final optimized binary, build the stage2-clang-bolt target:
 3-Stage Non-Determinism
 =======================
 
-In the ancient lore of compilers non-determinism is like the multi-headed hydra.
+In the ancient lore of compilers, non-determinism is like the multi-headed hydra.
 Whenever its head pops up, terror and chaos ensue.
 
-Historically one of the tests to verify that a compiler was deterministic would
-be a three stage build. The idea of a three stage build is you take your sources
+Historically, one of the tests to verify that a compiler was deterministic would
+be a three-stage build. The idea of a three-stage build is that you take your sources
 and build a compiler (stage1), then use that compiler to rebuild the sources
 (stage2), then you use that compiler to rebuild the sources a third time
-(stage3) with an identical configuration to the stage2 build. At the end of
+(stage3) with a configuration identical to the stage2 build. At the end of
 this, you have a stage2 and stage3 compiler that should be bit-for-bit
 identical.
 
@@ -301,4 +301,4 @@ following commands:
   $ cmake -G Ninja -C <path to source>/clang/cmake/caches/3-stage.cmake <path to source>/llvm
   $ ninja stage3
 
-After the build you can compare the stage2 and stage3 compilers.
+After the build, you can compare the stage2 and stage3 compilers.
diff --git a/llvm/docs/AliasAnalysis.rst b/llvm/docs/AliasAnalysis.rst
index 1830cca91504..af6da8cf64ea 100644
--- a/llvm/docs/AliasAnalysis.rst
+++ b/llvm/docs/AliasAnalysis.rst
@@ -45,7 +45,7 @@ query, respectively.
 The ``AliasAnalysis`` interface exposes information about memory, represented in
 several different ways.  In particular, memory objects are represented as a
 starting address and size, and function calls are represented as the actual
-``call`` or ``invoke`` instructions that performs the call.  The
+``call`` or ``invoke`` instructions that perform the call.  The
 ``AliasAnalysis`` interface also exposes some helper methods which allow you to
 get mod/ref information for arbitrary instructions.
 
@@ -119,20 +119,20 @@ Must, May, and No Alias Responses
 
 The ``NoAlias`` response may be used when there is never an immediate dependence
 between any memory reference *based* on one pointer and any memory reference
-*based* the other. The most obvious example is when the two pointers point to
+*based on* the other. The most obvious example is when the two pointers point to
 non-overlapping memory ranges. Another is when the two pointers are only ever
 used for reading memory. Another is when the memory is freed and reallocated
 between accesses through one pointer and accesses through the other --- in this
 case, there is a dependence, but it's mediated by the free and reallocation.
 
-As an exception to this is with the :ref:`noalias <noalias>` keyword;
+An exception to this is with the :ref:`noalias <noalias>` keyword;
 the "irrelevant" dependencies are ignored.
 
 The ``MayAlias`` response is used whenever the two pointers might refer to the
 same object.
 
 The ``PartialAlias`` response is used when the two memory objects are known to
-be overlapping in some way, regardless whether they start at the same address
+be overlapping in some way, regardless of whether they start at the same address
 or not.
 
 The ``MustAlias`` response may only be returned if the two memory objects are
@@ -205,15 +205,15 @@ satisfy the ``doesNotAccessMemory`` method also satisfy ``onlyReadsMemory``.
 Writing a new ``AliasAnalysis`` Implementation
 ==============================================
 
-Writing a new alias analysis implementation for LLVM is quite straight-forward.
+Writing a new alias analysis implementation for LLVM is quite straightforward.
 There are already several implementations that you can use for examples, and the
-following information should help fill in any details.  For examples, take a
+following information should help fill in any details.  For example, take a
 look at the `various alias analysis implementations`_ included with LLVM.
 
 Different Pass styles
 ---------------------
 
-The first step to determining what type of :doc:`LLVM pass <WritingAnLLVMPass>`
+The first step is to determine what type of :doc:`LLVM pass <WritingAnLLVMPass>`
 you need to use for your Alias Analysis.  As is the case with most other
 analyses and transformations, the answer should be fairly obvious from what type
 of problem you are trying to solve:
@@ -233,7 +233,7 @@ Your subclass of ``AliasAnalysis`` is required to invoke two methods on the
 ``AliasAnalysis`` base class: ``getAnalysisUsage`` and
 ``InitializeAliasAnalysis``.  In particular, your implementation of
 ``getAnalysisUsage`` should explicitly call into the
-``AliasAnalysis::getAnalysisUsage`` method in addition to doing any declaring
+``AliasAnalysis::getAnalysisUsage`` method in addition to declaring
 any pass dependencies your pass has.  Thus you should have something like this:
 
 .. code-block:: c++
@@ -243,7 +243,7 @@ any pass dependencies your pass has.  Thus you should have something like this:
     // declare your dependencies here.
   }
 
-Additionally, your must invoke the ``InitializeAliasAnalysis`` method from your
+Additionally, you must invoke the ``InitializeAliasAnalysis`` method from your
 analysis run method (``run`` for a ``Pass``, ``runOnFunction`` for a
 ``FunctionPass``, or ``InitializePass`` for an ``ImmutablePass``).  For example
 (as part of a ``Pass``):
@@ -344,7 +344,7 @@ The ``addEscapingUse`` method
 The ``addEscapingUse`` method is used when the uses of a pointer value have
 changed in ways that may invalidate precomputed analysis information.
 Implementations may either use this callback to provide conservative responses
-for points whose uses have change since analysis time, or may recompute some or
+for points whose uses have changed since analysis time, or may recompute some or
 all of their internal state to continue providing accurate responses.
 
 In general, any new use of a pointer value is considered an escaping use, and
@@ -379,16 +379,16 @@ also no way of setting a chain of analyses as the default.
 There is no way for transform passes to declare that they preserve
 ``AliasAnalysis`` implementations. The ``AliasAnalysis`` interface includes
 ``deleteValue`` and ``copyValue`` methods which are intended to allow a pass to
-keep an AliasAnalysis consistent, however there's no way for a pass to declare
+keep an AliasAnalysis consistent; however, there's no way for a pass to declare
 in its ``getAnalysisUsage`` that it does so. Some passes attempt to use
-``AU.addPreserved<AliasAnalysis>``, however this doesn't actually have any
+``AU.addPreserved<AliasAnalysis>``; however, this doesn't actually have any
 effect.
 
 Similarly, the ``opt -p`` option introduces ``ModulePass`` passes between each
 pass, which prevents the use of ``FunctionPass`` alias analysis passes.
 
 The ``AliasAnalysis`` API does have functions for notifying implementations when
-values are deleted or copied, however these aren't sufficient. There are many
+values are deleted or copied; however, these aren't sufficient. There are many
 other ways that LLVM IR can be modified which could be relevant to
 ``AliasAnalysis`` implementations which cannot be expressed.
 
@@ -406,7 +406,7 @@ unreliable.
 Many alias queries can be reformulated in terms of other alias queries. When
 multiple ``AliasAnalysis`` queries are chained together, it would make sense to
 start those queries from the beginning of the chain, with care taken to avoid
-infinite looping, however currently an implementation which wants to do this can
+infinite looping; however, currently an implementation which wants to do this can
 only start such queries from itself.
 
 Using alias analysis results
@@ -477,7 +477,7 @@ will help make sense of why things are designed the way they are.
 Using the ``AliasAnalysis`` interface directly
 ----------------------------------------------
 
-If neither of these utility class are what your pass needs, you should use the
+If neither of these utility classes is what your pass needs, you should use the
 interfaces exposed by the ``AliasAnalysis`` class directly.  Try to use the
 higher-level methods when possible (e.g., use mod/ref information instead of the
 `alias`_ method directly if possible) to get the best precision and efficiency.
@@ -488,7 +488,7 @@ Existing alias analysis implementations and clients
 If you're going to be working with the LLVM alias analysis infrastructure, you
 should know what clients and implementations of alias analysis are available.
 In particular, if you are implementing an alias analysis, you should be aware of
-the `the clients`_ that are useful for monitoring and evaluating different
+`the clients`_ that are useful for monitoring and evaluating different
 implementations.
 
 .. _various alias analysis implementations:
@@ -513,7 +513,7 @@ important facts:
 * Many common standard C library functions `never access memory or only read
   memory`_.
 * Pointers that obviously point to constant globals "``pointToConstantMemory``".
-* Function calls can not modify or references stack allocations if they never
+* Function calls cannot modify or reference stack allocations if they never
   escape from the function that allocates them (a common case for automatic
   arrays).
 
@@ -590,7 +590,7 @@ with any of the implementations above.
 The ``-adce`` pass
 ^^^^^^^^^^^^^^^^^^
 
-The ``-adce`` pass, which implements Aggressive Dead Code Elimination uses the
+The ``-adce`` pass, which implements Aggressive Dead Code Elimination, uses the
 ``AliasAnalysis`` interface to delete calls to functions that do not have
 side-effects and are not used.
 
@@ -602,7 +602,7 @@ transformations.  It uses the ``AliasAnalysis`` interface for several different
 transformations:
 
 * It uses mod/ref information to hoist or sink load instructions out of loops if
-  there are no instructions in the loop that modifies the memory loaded.
+  no instructions in the loop modify the memory loaded.
 
 * It uses mod/ref information to hoist function calls out of loops that do not
   write to memory and are loop-invariant.
@@ -615,7 +615,7 @@ The ``-argpromotion`` pass
 ^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 The ``-argpromotion`` pass promotes by-reference arguments to be passed in
-by-value instead.  In particular, if pointer arguments are only loaded from it
+by-value instead.  In particular, if pointer arguments are only loaded from, it
 passes in the value loaded instead of the address to the function.  This pass
 uses alias information to make sure that the value loaded from the argument
 pointer is not modified between the entry of the function and any load of the
diff --git a/llvm/docs/Benchmarking.rst b/llvm/docs/Benchmarking.rst
index cd7b835e47c5..d16896511445 100644
--- a/llvm/docs/Benchmarking.rst
+++ b/llvm/docs/Benchmarking.rst
@@ -17,24 +17,24 @@ for example.
 General
 ================================
 
-* Use a high resolution timer, e.g. perf under linux.
+* Use a high-resolution timer, e.g., perf under Linux.
 
 * Run the benchmark multiple times to be able to recognize noise.
 
 * Disable as many processes or services as possible on the target system.
 
-* Disable frequency scaling, turbo boost and address space
-  randomization (see OS specific section).
+* Disable frequency scaling, Turbo Boost and address space
+  randomization (see OS-specific section).
 
-* Static link if the OS supports it. That avoids any variation that
+* Use static linking if the OS supports it. That avoids any variation that
   might be introduced by loading dynamic libraries. This can be done
-  by passing ``-DLLVM_BUILD_STATIC=ON`` to cmake.
+  by passing ``-DLLVM_BUILD_STATIC=ON`` to CMake.
 
-* Try to avoid storage. On some systems you can use tmpfs. Putting the
+* Try to avoid storage. On some systems, you can use tmpfs. Putting the
   program, inputs and outputs on tmpfs avoids touching a real storage
   system, which can have a pretty big variability.
 
-  To mount it (on linux and freebsd at least)::
+  To mount it (on Linux and FreeBSD at least)::
 
     mount -t tmpfs -o size=<XX>g none dir_to_mount
 
@@ -52,7 +52,7 @@ Linux
      echo performance > $i
    done
 
-* Use https://github.com/lpechacek/cpuset to reserve cpus for just the
+* Use https://github.com/lpechacek/cpuset to reserve CPU cores for just the
   program you are benchmarking. If using perf, leave at least 2 cores
   so that perf runs in one and your program in another::
 
@@ -73,7 +73,7 @@ Linux
 
     cset shield --exec -- perf stat -r 10 <cmd>
 
-  This will run the command after ``--`` in the isolated cpus. The
+  This will run the command after ``--`` in the isolated CPU cores. The
   particular perf command runs the ``<cmd>`` 10 times and reports
   statistics.
 
@@ -82,6 +82,6 @@ With these in place you can expect perf variations of less than 0.1%.
 Linux Intel
 -----------
 
-* Disable turbo mode::
+* Disable Turbo Boost::
 
     echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo
diff --git a/llvm/docs/BigEndianNEON.rst b/llvm/docs/BigEndianNEON.rst
index 9f388519cea0..a11f292a5a81 100644
--- a/llvm/docs/BigEndianNEON.rst
+++ b/llvm/docs/BigEndianNEON.rst
@@ -1,5 +1,5 @@
 ==============================================
-Using ARM NEON instructions in big endian mode
+Using ARM NEON instructions in big-endian mode
 ==============================================
 
 .. contents::
@@ -8,16 +8,16 @@ Using ARM NEON instructions in big endian mode
 Introduction
 ============
 
-Generating code for big endian ARM processors is for the most part straightforward. NEON loads and stores however have some interesting properties that make code generation decisions less obvious in big endian mode.
+Generating code for big-endian ARM processors is straightforward for the most part. NEON loads and stores, however, have some interesting properties that make code generation decisions less obvious in big-endian mode.
 
 The aim of this document is to explain the problem with NEON loads and stores, and the solution that has been implemented in LLVM.
 
-In this document the term "vector" refers to what the ARM ABI calls a "short vector", which is a sequence of items that can fit in a NEON register. This sequence can be 64 or 128 bits in length, and can constitute 8, 16, 32 or 64 bit items. This document refers to A64 instructions throughout, but is almost applicable to the A32/ARMv7 instruction sets also. The ABI format for passing vectors in A32 is slightly different to A64. Apart from that, the same concepts apply.
+In this document, the term "vector" refers to what the ARM ABI calls a "short vector", which is a sequence of items that can fit in a NEON register. This sequence can be 64 or 128 bits in length, and can consist of 8-, 16-, 32- or 64-bit items. This document refers to A64 instructions throughout, but is mostly applicable to the A32/ARMv7 instruction sets as well. The ABI format for passing vectors in A32 is slightly different from that of A64. Apart from that, the same concepts apply.
 
 Example: C-level intrinsics -> assembly
 ---------------------------------------
 
-It may be helpful first to illustrate how C-level ARM NEON intrinsics are lowered to instructions.
+It may be helpful to first illustrate how C-level ARM NEON intrinsics are lowered to instructions.
 
 This trivial C function takes a vector of four ints and sets the zero'th lane to the value "42"::
 
@@ -26,7 +26,7 @@ This trivial C function takes a vector of four ints and sets the zero'th lane to
         return vsetq_lane_s32(42, p, 0);
     }
 
-arm_neon.h intrinsics generate "generic" IR where possible (that is, normal IR instructions not ``llvm.arm.neon.*`` intrinsic calls). The above generates::
+``arm_neon.h`` intrinsics generate "generic" IR where possible (that is, normal IR instructions, not ``llvm.arm.neon.*`` intrinsic calls). The above generates::
 
     define <4 x i32> @f(<4 x i32> %p) {
       %vset_lane = insertelement <4 x i32> %p, i32 42, i32 0
@@ -45,7 +45,7 @@ Problem
 
 The main problem is how vectors are represented in memory and in registers.
 
-First, a recap. The "endianness" of an item affects its representation in memory only. In a register, a number is just a sequence of bits - 64 bits in the case of AArch64 general purpose registers. Memory, however, is a sequence of addressable units of 8 bits in size. Any number greater than 8 bits must therefore be split up into 8-bit chunks, and endianness describes the order in which these chunks are laid out in memory.
+First, a recap. The "endianness" of an item affects its representation in memory only. In a register, a number is just a sequence of bits - 64 bits in the case of AArch64 general-purpose registers. Memory, however, is a sequence of addressable units of 8 bits in size. Any number greater than 8 bits must therefore be split up into 8-bit chunks, and endianness describes the order in which these chunks are laid out in memory.
 
 A "little-endian" layout has the least significant byte first (lowest in memory address). A "big-endian" layout has the *most* significant byte first. This means that when loading an item from big-endian memory, the lowest 8 bits in memory must go in the most significant 8 bits of the register, and so forth.
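
The byte ordering described above can be sketched in portable C++. This is only an illustration: the helper names (``storeLittleEndian``, ``storeBigEndian``, ``loadBigEndian``, ``checkEndianRoundTrip``) are hypothetical and not part of LLVM, and memory is modelled as a four-byte array.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Store a 32-bit value with the least significant byte at the lowest address.
std::array<std::uint8_t, 4> storeLittleEndian(std::uint32_t value) {
  std::array<std::uint8_t, 4> memory{};
  for (int i = 0; i < 4; ++i)
    memory[i] = static_cast<std::uint8_t>(value >> (8 * i));
  return memory;
}

// Store a 32-bit value with the most significant byte at the lowest address.
std::array<std::uint8_t, 4> storeBigEndian(std::uint32_t value) {
  std::array<std::uint8_t, 4> memory{};
  for (int i = 0; i < 4; ++i)
    memory[i] = static_cast<std::uint8_t>(value >> (8 * (3 - i)));
  return memory;
}

// Load from big-endian memory: the byte at the lowest address becomes the
// most significant byte of the register value.
std::uint32_t loadBigEndian(const std::array<std::uint8_t, 4> &memory) {
  std::uint32_t value = 0;
  for (int i = 0; i < 4; ++i)
    value = (value << 8) | memory[i];
  return value;
}

// Small self-check showing the two layouts and a big-endian round trip.
void checkEndianRoundTrip() {
  const std::uint32_t value = 0x11223344u;
  assert(storeLittleEndian(value)[0] == 0x44); // least significant byte first
  assert(storeBigEndian(value)[0] == 0x11);    // most significant byte first
  assert(loadBigEndian(storeBigEndian(value)) == value);
}
```

The register value is the same in both cases; only the order of the bytes in memory differs.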
 
@@ -58,7 +58,7 @@ A "little endian" layout has the least significant byte first (lowest in memory
     Big endian vector load using ``LDR``.
 
 
-A vector is a consecutive sequence of items that are operated on simultaneously. To load a 64-bit vector, 64 bits need to be read from memory. In little endian mode, we can do this by just performing a 64-bit load - ``LDR q0, [foo]``. However if we try this in big endian mode, because of the byte swapping the lane indices end up being swapped! The zero'th item as laid out in memory becomes the n'th lane in the vector.
+A vector is a consecutive sequence of items that are operated on simultaneously. To load a 64-bit vector, 64 bits need to be read from memory. In little-endian mode, we can do this by just performing a 64-bit load - ``LDR d0, [foo]``. However, if we try this in big-endian mode, the byte swapping causes the lane indices to be swapped! The zero'th item as laid out in memory becomes the n'th lane in the vector.
 
 .. figure:: ARM-BE-ld1.png
     :align: right
@@ -66,22 +66,22 @@ A vector is a consecutive sequence of items that are operated on simultaneously.
     Big endian vector load using ``LD1``. Note that the lanes retain the correct ordering.
 
 
-Because of this, the instruction ``LD1`` performs a vector load but performs byte swapping not on the entire 64 bits, but on the individual items within the vector. This means that the register content is the same as it would have been on a little endian system.
+Because of this, the ``LD1`` instruction performs a vector load but performs byte swapping not on the entire 64 bits, but on the individual items within the vector. This means that the register content is the same as it would have been on a little-endian system.
 
-It may seem that ``LD1`` should suffice to perform vector loads on a big endian machine. However there are pros and cons to the two approaches that make it less than simple which register format to pick.
+It may seem that ``LD1`` should suffice to perform vector loads on a big-endian machine. However, the two approaches each have pros and cons, which makes the choice of register format less than simple.
 
 There are two options:
 
     1. The content of a vector register is the same *as if* it had been loaded with an ``LDR`` instruction.
     2. The content of a vector register is the same *as if* it had been loaded with an ``LD1`` instruction.
 
-Because ``LD1 == LDR + REV`` and similarly ``LDR == LD1 + REV`` (on a big endian system), we can simulate either type of load with the other type of load plus a ``REV`` instruction. So we're not deciding which instructions to use, but which format to use (which will then influence which instruction is best to use).
+Because ``LD1 == LDR + REV`` and similarly ``LDR == LD1 + REV`` (on a big-endian system), we can simulate either type of load with the other type of load plus a ``REV`` instruction. So we're not deciding which instructions to use, but which format to use (which will then influence which instruction is best to use).
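
The relationship between the two loads can be checked with a small software model. The sketch below assumes a 64-bit vector of four 16-bit lanes on a big-endian system, with lane 0 in the least significant bits of the register. The function names (``modelLDR``, ``modelLD1_4h``, ``modelREV64_4h``, ``lane16``, ``checkLoadEquivalence``) are hypothetical; they model the behaviour described in the text and are not LLVM or ARM APIs.

```cpp
#include <cassert>
#include <cstdint>

// Model of a big-endian 64-bit LDR: the byte at the lowest address lands in
// the most significant byte of the register.
std::uint64_t modelLDR(const std::uint8_t (&memory)[8]) {
  std::uint64_t reg = 0;
  for (int i = 0; i < 8; ++i)
    reg = (reg << 8) | memory[i];
  return reg;
}

// Model of a big-endian LD1 with four 16-bit lanes: element i goes to lane i,
// and bytes are swapped within each element only.
std::uint64_t modelLD1_4h(const std::uint8_t (&memory)[8]) {
  std::uint64_t reg = 0;
  for (int lane = 0; lane < 4; ++lane) {
    std::uint64_t element =
        (static_cast<std::uint64_t>(memory[2 * lane]) << 8) | memory[2 * lane + 1];
    reg |= element << (16 * lane);
  }
  return reg;
}

// Model of a REV over 16-bit lanes: reverse the order of the four lanes.
std::uint64_t modelREV64_4h(std::uint64_t reg) {
  std::uint64_t out = 0;
  for (int lane = 0; lane < 4; ++lane) {
    std::uint64_t element = (reg >> (16 * lane)) & 0xFFFFu;
    out |= element << (16 * (3 - lane));
  }
  return out;
}

// Extract one 16-bit lane (lane 0 is the least significant 16 bits).
std::uint16_t lane16(std::uint64_t reg, int lane) {
  return static_cast<std::uint16_t>((reg >> (16 * lane)) & 0xFFFFu);
}

void checkLoadEquivalence() {
  // Four big-endian 16-bit elements: 0x000A, 0x000B, 0x000C, 0x000D.
  const std::uint8_t memory[8] = {0x00, 0x0A, 0x00, 0x0B, 0x00, 0x0C, 0x00, 0x0D};
  assert(lane16(modelLD1_4h(memory), 0) == 0x000A); // lanes keep memory order
  assert(lane16(modelLDR(memory), 0) == 0x000D);    // lanes end up reversed
  assert(modelLD1_4h(memory) == modelREV64_4h(modelLDR(memory)));
  assert(modelLDR(memory) == modelREV64_4h(modelLD1_4h(memory)));
}
```

In this model, either register format can be obtained from the other with a single lane reversal, which is why the choice is about the format rather than the instruction.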
 
 .. The 'clearer' container is required to make the following section header come after the floated
    images above.
 .. container:: clearer
 
-    Note that throughout this section we only mention loads. Stores have exactly the same problems as their associated loads, so have been skipped for brevity.
+    Note that throughout this section, we only mention loads. Stores have exactly the same problems as their associated loads, so have been skipped for brevity.
 
 
 Considerations
@@ -90,7 +90,7 @@ Considerations
 LLVM IR Lane ordering
 ---------------------
 
-LLVM IR has first class vector types. In LLVM IR, the zero'th element of a vector resides at the lowest memory address. The optimizer relies on this property in certain areas, for example when concatenating vectors together. The intention is for arrays and vectors to have identical memory layouts - ``[4 x i8]`` and ``<4 x i8>`` should be represented the same in memory. Without this property there would be many special cases that the optimizer would have to cleverly handle.
+LLVM IR has first-class vector types. In LLVM IR, the zero'th element of a vector resides at the lowest memory address. The optimizer relies on this property in certain areas, for example, when concatenating vectors together. The intention is for arrays and vectors to have identical memory layouts - ``[4 x i8]`` and ``<4 x i8>`` should be represented the same in memory. Without this property, there would be many special cases that the optimizer would have to cleverly handle.
 
 Use of ``LDR`` would break this lane ordering property. This doesn't preclude the use of ``LDR``, but we would have to do one of two things:
 
@@ -102,11 +102,11 @@ AAPCS
 
 The ARM procedure call standard (AAPCS) defines the ABI for passing vectors between functions in registers. It states:
 
-    When a short vector is transferred between registers and memory it is treated as an opaque object. That is a short vector is stored in memory as if it were stored with a single ``STR`` of the entire register; a short vector is loaded from memory using the corresponding ``LDR`` instruction. On a little-endian system this means that element 0 will always contain the lowest addressed element of a short vector; on a big-endian system element 0 will contain the highest-addressed element of a short vector.
+    When a short vector is transferred between registers and memory, it is treated as an opaque object. That is a short vector is stored in memory as if it were stored with a single ``STR`` of the entire register; a short vector is loaded from memory using the corresponding ``LDR`` instruction. On a little-endian system, this means that element 0 will always contain the lowest addressed element of a short vector; on a big-endian system element 0 will contain the highest-addressed element of a short vector.
 
     -- Procedure Call Standard for the ARM 64-bit Architecture (AArch64), 4.1.2 Short Vectors
 
-The use of ``LDR`` and ``STR`` as the ABI defines has at least one advantage over ``LD1`` and ``ST1``. ``LDR`` and ``STR`` are oblivious to the size of the individual lanes of a vector. ``LD1`` and ``ST1`` are not - the lane size is encoded within them. This is important across an ABI boundary, because it would become necessary to know the lane width the callee expects. Consider the following code:
+The use of ``LDR`` and ``STR`` as the ABI defines has at least one advantage over ``LD1`` and ``ST1``. ``LDR`` and ``STR`` are oblivious to the size of the individual lanes of a vector. ``LD1`` and ``ST1`` are not - the lane size is encoded within them. This is important across an ABI boundary because it would become necessary to know the lane width the callee expects. Consider the following code:
 
 .. code-block:: c
 
@@ -132,7 +132,7 @@ Alignment
 
 In strict alignment mode, ``LDR qX`` requires its address to be 128-bit aligned, whereas ``LD1`` only requires it to be as aligned as the lane size. If we canonicalised on using ``LDR``, we'd still need to use ``LD1`` in some places to avoid alignment faults (the result of the ``LD1`` would then need to be reversed with ``REV``).
 
-Most operating systems however do not run with alignment faults enabled, so this is often not an issue.
+Most operating systems, however, do not run with alignment faults enabled, so this is often not an issue.
 
 Summary
 -------
@@ -156,7 +156,7 @@ Implementation
 
 There are 3 parts to the implementation:
 
-    1. Predicate ``LDR`` and ``STR`` instructions so that they are never allowed to be selected to generate vector loads and stores. The exception is one-lane vectors [1]_ - these by definition cannot have lane ordering problems so are fine to use ``LDR``/``STR``.
+    1. Predicate ``LDR`` and ``STR`` instructions so that they are never allowed to be selected to generate vector loads and stores. The exception is one-lane vectors [1]_; by definition, these cannot have lane ordering problems so are fine to use ``LDR``/``STR``.
 
     2. Create code generation patterns for bitconverts that create ``REV`` instructions.
 
@@ -168,9 +168,9 @@ Bitconverts
 .. image:: ARM-BE-bitcastfail.png
     :align: right
 
-The main problem with the ``LD1`` solution is dealing with bitconverts (or bitcasts, or reinterpret casts). These are pseudo instructions that only change the compiler's interpretation of data, not the underlying data itself. A requirement is that if data is loaded and then saved again (called a "round trip"), the memory contents should be the same after the store as before the load. If a vector is loaded and is then bitconverted to a different vector type before storing, the round trip will currently be broken.
+The main problem with the ``LD1`` solution is dealing with bitconverts (or bitcasts, or reinterpret casts). These are pseudo instructions that only change the compiler's interpretation of data, not the underlying data itself. A requirement is that if data is loaded and then saved again (called a "round trip"), the memory contents should be the same after the store as before the load. If a vector is loaded and then bitconverted to a different vector type before being stored, the round trip will currently be broken.
 
-Take for example this code sequence::
+Take this code sequence, for example::
 
     %0 = load <4 x i32> %x
     %1 = bitcast <4 x i32> %0 to <2 x i64>
@@ -185,7 +185,7 @@ This would produce a code sequence such as that in the figure on the right. The
 .. image:: ARM-BE-bitcastsuccess.png
     :align: right
 
-Conceptually this is simple - we can insert a ``REV`` undoing the ``LD1`` of type ``X`` (converting the in-register representation to the same as if it had been loaded by ``LDR``) and then insert another ``REV`` to change the representation to be as if it had been loaded by an ``LD1`` of type ``Y``.
+Conceptually, this is simple - we can insert a ``REV`` undoing the ``LD1`` of type ``X`` (converting the in-register representation to the same as if it had been loaded by ``LDR``) and then insert another ``REV`` to change the representation to be as if it had been loaded by an ``LD1`` of type ``Y``.
 
 For the previous example, this would be::
 
@@ -201,4 +201,4 @@ For the previous example, this would be::
 
 It turns out that these ``REV`` pairs can, in almost all cases, be squashed together into a single ``REV``. For the example above, a ``REV128 4s`` + ``REV128 2d`` is actually a ``REV64 4s``, as shown in the figure on the right.
 
-.. [1] One lane vectors may seem useless as a concept but they serve to distinguish between values held in general purpose registers and values held in NEON/VFP registers. For example, an ``i64`` would live in an ``x`` register, but ``<1 x i64>`` would live in a ``d`` register.
+.. [1] One-lane vectors may seem useless as a concept, but they serve to distinguish between values held in general-purpose registers and values held in NEON/VFP registers. For example, an ``i64`` would live in an ``x`` register, but ``<1 x i64>`` would live in a ``d`` register.
diff --git a/llvm/docs/BitCodeFormat.rst b/llvm/docs/BitCodeFormat.rst
index 8a26b101c4bf..f686784d0a7c 100644
--- a/llvm/docs/BitCodeFormat.rst
+++ b/llvm/docs/BitCodeFormat.rst
@@ -87,9 +87,9 @@ Fixed Width Integers
 ^^^^^^^^^^^^^^^^^^^^
 
 Fixed-width integer values have their low bits emitted directly to the file.
-For example, a 3-bit integer value encodes 1 as 001.  Fixed width integers are
+For example, a 3-bit integer value encodes 1 as 001.  Fixed-width integers are
 used when there is a well-known number of options for a field.  For example,
-boolean values are usually encoded with a 1-bit wide integer.
+boolean values are usually encoded with a 1-bit-wide integer.
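
As a minimal sketch of the idea, the helper below renders the low bits of a value as text, most significant bit first, matching the ``001`` notation used above. Both function names are hypothetical, and ``widthForOptions`` is an illustrative assumption about how a field width could be chosen; neither is taken from the LLVM bitstream writer.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Return the low `width` bits of `value` as text, most significant bit first.
// `width` is expected to be at most 64.
std::string fixedWidthBits(std::uint64_t value, unsigned width) {
  std::string bits;
  for (unsigned i = width; i-- > 0;)
    bits.push_back(((value >> i) & 1u) ? '1' : '0');
  return bits;
}

// Smallest width (in bits) that can represent `numOptions` distinct options.
unsigned widthForOptions(std::uint64_t numOptions) {
  unsigned width = 0;
  while (width < 64 && (std::uint64_t{1} << width) < numOptions)
    ++width;
  return width;
}

void checkFixedWidth() {
  assert(fixedWidthBits(1, 3) == "001"); // the 3-bit example from the text
  assert(fixedWidthBits(1, 1) == "1");   // a boolean "true" in a 1-bit field
  assert(widthForOptions(2) == 1);       // two options fit in one bit
}
```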
 
 .. _Variable Width Integers:
 .. _Variable Width Integer:
@@ -229,7 +229,7 @@ Data Records
 ------------
 
 Data records consist of a record code and a number of (up to) 64-bit integer
-values.  The interpretation of the code and values is application specific and
+values.  The interpretation of the code and values is application-specific and
 may vary between different block types.  Records can be encoded either using an
 unabbrev record, or with an abbreviation.  In the LLVM IR format, for example,
 there is a record which encodes the target triple of a module.  The code is
@@ -469,19 +469,19 @@ Native Object File Wrapper Format
 =================================
 
 Bitcode files for LLVM IR may also be wrapped in a native object file
-(i.e. ELF, COFF, Mach-O).  The bitcode must be stored in a section of the object
-file named ``__LLVM,__bitcode`` for MachO or ``.llvmbc`` for the other object
+(i.e., ELF, COFF, Mach-O).  The bitcode must be stored in a section of the object
+file named ``__LLVM,__bitcode`` for Mach-O or ``.llvmbc`` for the other object
 formats. ELF objects additionally support a ``.llvm.lto`` section for
-:doc:`FatLTO`, which contains bitcode suitable for LTO compilation (i.e. bitcode
+:doc:`FatLTO`, which contains bitcode suitable for LTO compilation (i.e., bitcode
 that has gone through a pre-link LTO pipeline).  The ``.llvmbc`` section
 predates FatLTO support in LLVM, and may not always contain bitcode that is
-suitable for LTO (i.e. from ``-fembed-bitcode``).  The wrapper format is useful
+suitable for LTO (i.e., from ``-fembed-bitcode``).  The wrapper format is useful
 for accommodating LTO in compilation pipelines where intermediate objects must
 be native object files which contain metadata in other sections.
 
 Not all tools support this format.  For example, lld and the gold plugin will
 ignore the ``.llvmbc`` section when linking object files, but can use
-``.llvm.lto`` sections when passed the correct command line options.
+``.llvm.lto`` sections when passed the correct command-line options.
 
 .. _encoding of LLVM IR:
 
@@ -585,7 +585,7 @@ MODULE_CODE_VERSION Record
 ``[VERSION, version#]``
 
 The ``VERSION`` record (code 1) contains a single value indicating the format
-version. Versions 0, 1 and 2 are supported at this time. The difference between
+version. Versions 0, 1, and 2 are supported at this time. The difference between
 version 0 and 1 is in the encoding of instruction operands in
 each `FUNCTION_BLOCK`_.
 
@@ -1033,13 +1033,13 @@ in the file `Attributes.td
 .. note::
   The ``allocsize`` attribute has a special encoding for its arguments. Its two
   arguments, which are 32-bit integers, are packed into one 64-bit integer value
-  (i.e. ``(EltSizeParam << 32) | NumEltsParam``), with ``NumEltsParam`` taking on
+  (i.e., ``(EltSizeParam << 32) | NumEltsParam``), with ``NumEltsParam`` taking on
   the sentinel value -1 if it is not specified.
 
 .. note::
   The ``vscale_range`` attribute has a special encoding for its arguments. Its two
   arguments, which are 32-bit integers, are packed into one 64-bit integer value
-  (i.e. ``(Min << 32) | Max``), with ``Max`` taking on the value of ``Min`` if
+  (i.e., ``(Min << 32) | Max``), with ``Max`` taking on the value of ``Min`` if
   it is not specified.
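
The two packings described in the notes above can be sketched as follows. The function and type names are hypothetical and chosen for illustration; only the bit layout and the default values come from the notes. The sketch assumes C++17 for ``std::optional``.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Sentinel for an unspecified NumEltsParam: -1 as a 32-bit value.
constexpr std::uint32_t kNotSpecified = 0xFFFFFFFFu;

// Hypothetical holder for the decoded allocsize arguments.
struct AllocSizeArgs {
  std::uint32_t eltSizeParam;
  std::optional<std::uint32_t> numEltsParam;
};

// (EltSizeParam << 32) | NumEltsParam, with -1 when NumEltsParam is absent.
std::uint64_t packAllocSize(std::uint32_t eltSizeParam,
                            std::optional<std::uint32_t> numEltsParam) {
  return (static_cast<std::uint64_t>(eltSizeParam) << 32) |
         numEltsParam.value_or(kNotSpecified);
}

// Inverse of packAllocSize: split the 64-bit value and map -1 back to "absent".
AllocSizeArgs unpackAllocSize(std::uint64_t packed) {
  const auto eltSize = static_cast<std::uint32_t>(packed >> 32);
  const auto numElts = static_cast<std::uint32_t>(packed & 0xFFFFFFFFu);
  if (numElts == kNotSpecified)
    return {eltSize, std::nullopt};
  return {eltSize, numElts};
}

// (Min << 32) | Max, with Max taking the value of Min when it is absent.
std::uint64_t packVScaleRange(std::uint32_t min,
                              std::optional<std::uint32_t> max) {
  return (static_cast<std::uint64_t>(min) << 32) | max.value_or(min);
}

void checkAttributePacking() {
  assert(packAllocSize(1u, 2u) == 0x0000000100000002ull);
  assert(packAllocSize(0u, std::nullopt) == 0x00000000FFFFFFFFull);
  assert(!unpackAllocSize(0x00000000FFFFFFFFull).numEltsParam.has_value());
  assert(packVScaleRange(2u, std::nullopt) == 0x0000000200000002ull);
}
```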
 
 .. _TYPE_BLOCK:
@@ -1137,7 +1137,7 @@ TYPE_CODE_POINTER Record
 ``[POINTER, pointee type, address space]``
 
 The ``POINTER`` record (code 8) adds a pointer type to the type table. The
-operand fields are
+operand fields are:
 
 * *pointee type*: The type index of the pointed-to type
 
@@ -1155,7 +1155,7 @@ TYPE_CODE_FUNCTION_OLD Record
 ``[FUNCTION_OLD, vararg, ignored, retty, ...paramty... ]``
 
 The ``FUNCTION_OLD`` record (code 9) adds a function type to the type table.
-The operand fields are
+The operand fields are:
 
 * *vararg*: Non-zero if the type represents a varargs function
 
@@ -1173,7 +1173,7 @@ TYPE_CODE_ARRAY Record
 ``[ARRAY, numelts, eltty]``
 
 The ``ARRAY`` record (code 11) adds an array type to the type table.  The
-operand fields are
+operand fields are:
 
 * *numelts*: The number of elements in arrays of this type
 
@@ -1185,7 +1185,7 @@ TYPE_CODE_VECTOR Record
 ``[VECTOR, numelts, eltty]``
 
 The ``VECTOR`` record (code 12) adds a vector type to the type table.  The
-operand fields are
+operand fields are:
 
 * *numelts*: The number of elements in vectors of this type
 
@@ -1235,7 +1235,7 @@ TYPE_CODE_STRUCT_ANON Record
 ``[STRUCT_ANON, ispacked, ...eltty...]``
 
 The ``STRUCT_ANON`` record (code 18) adds a literal struct type to the type
-table. The operand fields are
+table. The operand fields are:
 
 * *ispacked*: Non-zero if the type represents a packed structure
 
@@ -1258,7 +1258,7 @@ TYPE_CODE_STRUCT_NAMED Record
 
 The ``STRUCT_NAMED`` record (code 20) adds an identified struct type to the
 type table, with a name defined by a previously encountered ``STRUCT_NAME``
-record. The operand fields are
+record. The operand fields are:
 
 * *ispacked*: Non-zero if the type represents a packed structure
 
@@ -1271,7 +1271,7 @@ TYPE_CODE_FUNCTION Record
 ``[FUNCTION, vararg, retty, ...paramty... ]``
 
 The ``FUNCTION`` record (code 21) adds a function type to the type table. The
-operand fields are
+operand fields are:
 
 * *vararg*: Non-zero if the type represents a varargs function
 
@@ -1294,7 +1294,7 @@ TYPE_CODE_TARGET_TYPE Record
 
 The ``TARGET_TYPE`` record (code 26) adds a target extension type to the type
 table, with a name defined by a previously encountered ``STRUCT_NAME`` record.
-The operand fields are
+The operand fields are:
 
 * *num_tys*: The number of parameters that are types (as opposed to integers)
 
diff --git a/llvm/docs/BranchWeightMetadata.rst b/llvm/docs/BranchWeightMetadata.rst
index 62204753e29b..3fa21720d25f 100644
--- a/llvm/docs/BranchWeightMetadata.rst
+++ b/llvm/docs/BranchWeightMetadata.rst
@@ -9,16 +9,16 @@ Introduction
 ============
 
 Branch Weight Metadata represents the weight of each branch as its likelihood of being taken
-(see :doc:`BlockFrequencyTerminology`). Metadata is assigned to an
-``Instruction`` that is a terminator as a ``MDNode`` of the ``MD_prof`` kind.
-The first operator is always a ``MDString`` node with the string
-"branch_weights".  Number of operators depends on the terminator type.
+(see :doc:`BlockFrequencyTerminology`). Metadata is assigned to a
+terminator ``Instruction`` as an ``MDNode`` of the ``MD_prof`` kind.
+The first operand is always an ``MDString`` node with the string
+"branch_weights".  The number of operands depends on the terminator type.
 
-Branch weights might be fetch from the profiling file, or generated based on
-`__builtin_expect`_ and `__builtin_expect_with_probability`_ instruction.
+Branch weights might be fetched from the profiling file or generated based on
+`__builtin_expect`_ and `__builtin_expect_with_probability`_ instructions.
 
-All weights are represented as an unsigned 32-bit values, where higher value
-indicates greater chance to be taken.
+All weights are represented as unsigned 32-bit values, where a higher value
+indicates a greater chance of being taken.
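
One way to read a set of weights is as relative frequencies: the probability of a successor is its weight divided by the sum of all weights. The sketch below illustrates that reading only. ``successorProbability`` is a hypothetical helper, the uniform fallback for an all-zero weight list is an assumption, and LLVM's own ``BranchProbability`` class uses fixed-point arithmetic rather than ``double``.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Probability of taking successor `index`, computed as weight / sum(weights).
// The sum is accumulated in 64 bits so that 32-bit weights cannot overflow it.
double successorProbability(const std::vector<std::uint32_t> &weights,
                            std::size_t index) {
  assert(index < weights.size() && "index must name an existing successor");
  std::uint64_t total = 0;
  for (std::uint32_t weight : weights)
    total += weight;
  if (total == 0) // assumption: treat all-zero weights as a uniform split
    return 1.0 / static_cast<double>(weights.size());
  return static_cast<double>(weights[index]) / static_cast<double>(total);
}

void checkSuccessorProbability() {
  // A conditional branch with weights 3 (true) and 1 (false).
  assert(successorProbability({3u, 1u}, 0) == 0.75);
  assert(successorProbability({3u, 1u}, 1) == 0.25);
}
```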
 
 Supported Instructions
 ======================
@@ -26,7 +26,7 @@ Supported Instructions
 ``BranchInst``
 ^^^^^^^^^^^^^^
 
-Metadata is only assigned to the conditional branches. There are two extra
+Metadata is only assigned to conditional branches. There are two extra
 operands for the true and the false branch.
 We optionally track if the metadata was added by ``__builtin_expect`` or
 ``__builtin_expect_with_probability`` with an optional field ``!"expected"``.
@@ -43,7 +43,7 @@ We optionally track if the metadata was added by ``__builtin_expect`` or
 ``SwitchInst``
 ^^^^^^^^^^^^^^
 
-Branch weights are assigned to every case (including the ``default`` case which
+Branch weights are assigned to every case (including the ``default`` case, which
 is always case #0).
 
 .. code-block:: none
@@ -74,7 +74,7 @@ Branch weights are assigned to every destination.
 
 Calls may have branch weight metadata, containing the execution count of
 the call. It is currently used in SamplePGO mode only, to augment the
-block and entry counts which may not be accurate with sampling.
+block and entry counts, which may not be accurate with sampling.
 
 .. code-block:: none
 
@@ -89,9 +89,9 @@ block and entry counts which may not be accurate with sampling.
 
 An invoke instruction may have branch weight metadata with one or two weights.
 The second weight is optional and corresponds to the unwind branch.
-If only one weight is set then it contains the execution count of the call
+If only one weight is set, then it contains the execution count of the call
 and is used in SamplePGO mode only as described for the call instruction. If both
-weights are specified then the second weight contains count of unwind branch
+weights are specified, then the second weight contains the count of unwind branch
 taken and the first weight contains the execution count of the call minus
 the count of unwind branch taken. Both weights specified are used to calculate
 BranchProbability as for BranchInst and for SamplePGO the sum of both weights
@@ -139,7 +139,7 @@ true, in other case condition is likely to be false. For example:
 ^^^^^^^^^^^^^^^^^^^^
 
 The ``exp`` parameter is the value. The ``c`` parameter is the expected
-value. If the expected value doesn't show on the cases list, the ``default``
+value. If the expected value doesn't appear in the cases list, the ``default``
 case is assumed to be likely taken.
 
 .. code-block:: c++
@@ -159,15 +159,15 @@ Built-in ``expect.with.probability`` Instruction
 ``__builtin_expect_with_probability(long exp, long c, double probability)`` has
 the same semantics as ``__builtin_expect``, but the caller provides the
 probability that ``exp == c``. The last argument ``probability`` must be
-constant floating-point expression and be in the range [0.0, 1.0] inclusive.
+a constant floating-point expression and be in the range [0.0, 1.0] inclusive.
 The usage is also similar as ``__builtin_expect``, for example:
 
 ``if`` statement
 ^^^^^^^^^^^^^^^^
 
-If the expect comparison value ``c`` is equal to 1(true), and probability
+If the expected comparison value ``c`` is equal to 1 (true), and probability
 value ``probability`` is set to 0.8, that means the probability of condition
-to be true is 80% while that of false is 20%.
+being true is 80% while that of false is 20%.
 
 .. code-block:: c++
 
@@ -178,8 +178,8 @@ to be true is 80% while that of false is 20%.
 ``switch`` statement
 ^^^^^^^^^^^^^^^^^^^^
 
-This is basically the same as ``switch`` statement in ``__builtin_expect``.
-The probability that ``exp`` is equal to the expect value is given in
+This is similar to the ``switch`` statement in ``__builtin_expect``.
+The probability that ``exp`` is equal to the expected value is given in
 the third argument ``probability``, while the probability of other value is
 the average of remaining probability(``1.0 - probability``). For example:
 
@@ -195,8 +195,8 @@ the average of remaining probability(``1.0 - probability``). For example:
 CFG Modifications
 =================
 
-Branch Weight Metatada is not proof against CFG changes. If terminator operands'
-are changed some action should be taken. In other case some misoptimizations may
+Branch Weight Metadata is not proof against CFG changes. If terminator operands
+are changed, some action should be taken. Otherwise, misoptimizations may
 occur due to incorrect branch prediction information.
 
 Function Entry Counts
@@ -212,7 +212,7 @@ invoked (in the case of instrumentation-based profiles). In the case of
 sampling-based profiles, this operand is an approximation of how many times
 the function was invoked.
 
-For example, in the code below, the instrumentation for function foo()
+For example, in the code below, the instrumentation for function ``foo()``
 indicates that it was called 2,590 times at runtime.
 
 .. code-block:: llvm
@@ -222,12 +222,12 @@ indicates that it was called 2,590 times at runtime.
   }
   !1 = !{!"function_entry_count", i64 2590}
 
-If "function_entry_count" has more than 2 operands, the later operands are
+If "function_entry_count" has more than 2 operands, the subsequent operands are
 the GUID of the functions that needs to be imported by ThinLTO. This is only
-set by sampling based profile. It is needed because the sampling based profile
+set by sampling-based profile. It is needed because the sampling-based profile
 was collected on a binary that had already imported and inlined these functions,
 and we need to ensure the IR matches in the ThinLTO backends for profile
 annotation. The reason why we cannot annotate this on the callsite is that it
-can only goes down 1 level in the call chain. For the cases where
-foo_in_a_cc()->bar_in_b_cc()->baz_in_c_cc(), we will need to go down 2 levels
-in the call chain to import both bar_in_b_cc and baz_in_c_cc.
+can only go down 1 level in the call chain. For the cases where
+``foo_in_a_cc()->bar_in_b_cc()->baz_in_c_cc()``, we will need to go down 2 levels
+in the call chain to import both ``bar_in_b_cc`` and ``baz_in_c_cc``.
diff --git a/llvm/docs/BuildingADistribution.rst b/llvm/docs/BuildingADistribution.rst
index d2f81f6fc3d2..10e571cdea3f 100644
--- a/llvm/docs/BuildingADistribution.rst
+++ b/llvm/docs/BuildingADistribution.rst
@@ -9,7 +9,7 @@ Introduction
 ============
 
 This document is geared toward people who want to build and package LLVM and any
-combination of LLVM sub-project tools for distribution. This document covers
+combination of its sub-project tools for distribution. This document covers
 useful features of the LLVM build system as well as best practices and general
 information about packaging LLVM.
 
@@ -20,7 +20,7 @@ workings of the builds described in the :doc:`AdvancedBuilds` document.
 General Distribution Guidance
 =============================
 
-When building a distribution of a compiler it is generally advised to perform a
+When building a distribution of a compiler, it is generally advised to perform a
 bootstrap build of the compiler. That means building a "stage 1" compiler with
 your host toolchain, then building the "stage 2" compiler using the "stage 1"
 compiler. This is done so that the compiler you distribute benefits from all the
@@ -57,7 +57,7 @@ at process launch time, which can be very slow for C++ code.
 
 The simplest example of building a distribution with reasonable performance is
 captured in the DistributionExample CMake cache file located at
-clang/cmake/caches/DistributionExample.cmake. The following command will perform
+``clang/cmake/caches/DistributionExample.cmake``. The following commands will perform
 and install the distribution build:
 
 .. code-block:: console
@@ -69,7 +69,7 @@ and install the distribution build:
 Difference between ``install`` and ``install-distribution``
 -----------------------------------------------------------
 
-One subtle but important thing to note is the difference between the ``install``
+One subtle but important difference to note is between the ``install``
 and ``install-distribution`` targets. The ``install`` target is expected to
 install every part of LLVM that your build is configured to generate except the
 LLVM testing tools. Alternatively the ``install-distribution`` target, which is
@@ -137,7 +137,7 @@ Special Notes for Library-only Distributions
 
 One of the most powerful features of LLVM is its library-first design mentality
 and the way you can compose a wide variety of tools using different portions of
-LLVM. Even in this situation using *BUILD_SHARED_LIBS* is not supported. If you
+LLVM. Even in this situation, using *BUILD_SHARED_LIBS* is not supported. If you
 want to distribute LLVM as a shared library for use in a tool, the recommended
 method is using *LLVM_BUILD_LLVM_DYLIB*, and you can use *LLVM_DYLIB_COMPONENTS*
 to configure which LLVM components are part of libLLVM.
@@ -147,7 +147,7 @@ Options for Optimizing LLVM
 ===========================
 
 There are four main build optimizations that our CMake build system supports.
-When performing a bootstrap build it is not beneficial to do anything other than
+When performing a bootstrap build, it is not beneficial to do anything other than
 setting *CMAKE_BUILD_TYPE* to ``Release`` for the stage-1 compiler. This is
 because the more intensive optimizations are expensive to perform and the
 stage-1 compiler is thrown away. All of the further options described should be
@@ -162,7 +162,7 @@ debug information and use ``-O3`` you can override the
 *CMAKE_<LANG>_FLAGS_RELWITHDEBINFO* option for C and CXX.
 DistributionExample.cmake does this.
 
-Another easy to use option is Link-Time-Optimization. You can set the
+Another easy-to-use option is Link-Time-Optimization. You can set the
 *LLVM_ENABLE_LTO* option on your stage-2 build to ``Thin`` or ``Full`` to enable
 building LLVM with LTO. These options will significantly increase link time of
 the binaries in the distribution, but it will create much faster binaries. This
@@ -175,7 +175,7 @@ in-tree profiling tests are very limited, and generating the profile takes a
 significant amount of time, but it can result in a significant improvement in
 the performance of the generated binaries.
 
-In addition to PGO profiling we also have limited support in-tree for generating
+In addition to PGO profiling, we also have limited in-tree support for generating
 linker order files. These files provide the linker with a suggested ordering for
 functions in the final binary layout. This can measurably speed up clang by
 physically grouping functions that are called temporally close to each other.
@@ -187,7 +187,7 @@ Options for Reducing Size
 =========================
 
 .. warning::
-  Any steps taken to reduce the binary size will come at a cost of runtime
+  Any steps taken to reduce binary size will come at the cost of runtime
   performance in the generated binaries.
 
 The simplest and least significant way to reduce binary size is to set the
@@ -197,7 +197,7 @@ both the least benefit to size and the least impact on performance.
 
 The most impactful way to reduce binary size is to dynamically link LLVM into
 all the tools. This reduces code size by decreasing duplication of common code
-between the LLVM-based tools. This can be done by setting the following two
+among the LLVM-based tools. This can be done by setting the following two
 CMake options to ``On``: *LLVM_BUILD_LLVM_DYLIB* and *LLVM_LINK_LLVM_DYLIB*.
 
 .. warning::
@@ -214,35 +214,35 @@ that are already documented include: *LLVM_TARGETS_TO_BUILD*, *LLVM_ENABLE_PROJE
 *LLVM_ENABLE_RUNTIMES*, *LLVM_BUILD_LLVM_DYLIB*, and *LLVM_LINK_LLVM_DYLIB*.
 
 **LLVM_ENABLE_RUNTIMES**:STRING
-  When building a distribution that includes LLVM runtime projects (i.e. libcxx,
+  When building a distribution that includes LLVM runtime projects (i.e., libcxx,
   compiler-rt, libcxxabi, libunwind...), it is important to build those projects
   with the just-built compiler.
 
 **LLVM_DISTRIBUTION_COMPONENTS**:STRING
-  This variable can be set to a semi-colon separated list of LLVM build system
+  This variable can be set to a semicolon-separated list of LLVM build system
   components to install. All LLVM-based tools are components, as well as most
   of the libraries and runtimes. Component names match the names of the build
   system targets.
 
 **LLVM_DISTRIBUTIONS**:STRING
-  This variable can be set to a semi-colon separated list of distributions. See
+  This variable can be set to a semicolon-separated list of distributions. See
   the :ref:`Multi-distribution configurations` section above for details on this
   and other CMake variables to configure multiple distributions.
 
 **LLVM_RUNTIME_DISTRIBUTION_COMPONENTS**:STRING
-  This variable can be set to a semi-colon separated list of runtime library
+  This variable can be set to a semicolon-separated list of runtime library
   components. This is used in conjunction with *LLVM_ENABLE_RUNTIMES* to specify
   components of runtime libraries that you want to include in your distribution.
   Just like with *LLVM_DISTRIBUTION_COMPONENTS*, component names match the names
   of the build system targets.
 
 **LLVM_DYLIB_COMPONENTS**:STRING
-  This variable can be set to a semi-colon separated name of LLVM library
+  This variable can be set to a semicolon-separated list of LLVM library
   components. LLVM library components are either library names with the LLVM
-  prefix removed (i.e. Support, Demangle...), LLVM target names, or special
+  prefix removed (i.e., Support, Demangle...), LLVM target names, or special
   purpose component names. The special purpose component names are:
 
-  #. ``all`` - All LLVM available component libraries
+  #. ``all`` - All available LLVM component libraries
   #. ``Native`` - The LLVM target for the Native system
   #. ``AllTargetsAsmParsers`` - All the included target ASM parsers libraries
   #. ``AllTargetsDescs`` - All the included target descriptions libraries
diff --git a/llvm/docs/CMake.rst b/llvm/docs/CMake.rst
index 30b71bffaf76..ce8355291931 100644
--- a/llvm/docs/CMake.rst
+++ b/llvm/docs/CMake.rst
@@ -40,7 +40,7 @@ We use here the command-line, non-interactive CMake interface.
    through the ``PATH`` environment variable.
 
 #. Create a build directory. Building LLVM in the source
-   directory is not supported. cd to this directory:
+   directory is not supported. ``cd`` to this directory:
 
    .. code-block:: console
 
@@ -108,7 +108,7 @@ Basic CMake usage
 
 This section explains basic aspects of CMake for daily use.
 
-CMake comes with extensive documentation, in the form of html files, and as
+CMake comes with extensive documentation, in the form of HTML files, and as
 online help accessible via the ``cmake`` executable itself. Execute ``cmake
 --help`` for further help options.
 
@@ -139,7 +139,7 @@ A given development platform can have more than one adequate
 generator. If you use Visual Studio, "NMake Makefiles" is a generator you can use
 for building with NMake. By default, CMake chooses the most specific generator
 supported by your development environment. If you want an alternative generator,
-you must tell this to CMake with the ``-G`` option.
+you must specify this to CMake with the ``-G`` option.
 
 .. todo::
 
@@ -206,7 +206,7 @@ used variables that control features of LLVM and enabled subprojects.
 
   * Optimizations make LLVM/Clang run faster but can be an impediment for
     step-by-step debugging.
-  * Builds with debug information can use a lot of RAM and disk space and is
+  * Builds with debug information can use a lot of RAM and disk space and are
     usually slower to run. You can improve RAM usage by using ``lld``, see
     the :ref:`LLVM_USE_LINKER <llvm_use_linker>` option.
   * Assertions are internal checks to help you find bugs. They typically slow
@@ -257,10 +257,10 @@ description is in `LLVM-related variables`_ below.
 
 **LLVM_PARALLEL_{COMPILE,LINK}_JOBS**:STRING
   Building the llvm toolchain can use a lot of resources, particularly
-  linking. These options, when you use the Ninja generator, allow you
+  during linking. These options, when you use the Ninja generator, allow you
   to restrict the parallelism. For example, to avoid OOMs or going
-  into swap, permit only one link job per 15GB of RAM available on a
-  32GB machine, specify ``-G Ninja -DLLVM_PARALLEL_LINK_JOBS=2``.
+  into swap, permit only one link job per 15 GB of RAM available on a
+  32 GB machine, specify ``-G Ninja -DLLVM_PARALLEL_LINK_JOBS=2``.
 
 **LLVM_TARGETS_TO_BUILD**:STRING
   Control which targets are enabled. For example, you may only need to enable
@@ -324,13 +324,13 @@ its enabled sub-projects. Nearly all of these variable names begin with
   Used to decide if LLVM should be built with ABI breaking checks or
   not.  Allowed values are `WITH_ASSERTS` (default), `FORCE_ON` and
   `FORCE_OFF`.  `WITH_ASSERTS` turns on ABI breaking checks in an
-  assertion enabled build.  `FORCE_ON` (`FORCE_OFF`) turns them on
+  assertion-enabled build.  `FORCE_ON` (`FORCE_OFF`) turns them on
   (off) irrespective of whether normal (`NDEBUG`-based) assertions are
   enabled or not.  A version of LLVM built with ABI breaking checks
   is not ABI compatible with a version built without it.
 
 **LLVM_ADDITIONAL_BUILD_TYPES**:LIST
-  Adding a semicolon separated list of additional build types to this flag
+  Adding a semicolon-separated list of additional build types to this flag
   allows for them to be specified as values in ``CMAKE_BUILD_TYPE`` without
   encountering a fatal error during the configuration process.
 
@@ -346,7 +346,7 @@ its enabled sub-projects. Nearly all of these variable names begin with
   determine it.
 
 **LLVM_FORCE_VC_REVISION**:STRING
-  Force a specific Git revision id rather than calling to git to determine it.
+  Force a specific Git revision id rather than calling git to determine it.
   This is useful in environments where git is not available or non-functional
   but the VC revision is available through other means.
 
@@ -358,9 +358,9 @@ its enabled sub-projects. Nearly all of these variable names begin with
   Adds benchmarks to the list of default targets. Defaults to OFF.
 
 **LLVM_BUILD_DOCS**:BOOL
-  Adds all *enabled* documentation targets (i.e. Doxygen and Sphinx targets) as
+  Adds all *enabled* documentation targets (i.e., Doxygen and Sphinx targets) as
   dependencies of the default build targets.  This results in all of the (enabled)
-  documentation targets being as part of a normal build.  If the ``install``
+  documentation targets being built as part of a normal build.  If the ``install``
   target is run, then this also enables all built documentation targets to be
   installed. Defaults to OFF.  To enable a particular documentation target, see
   ``LLVM_ENABLE_SPHINX`` and ``LLVM_ENABLE_DOXYGEN``.
@@ -414,17 +414,17 @@ its enabled sub-projects. Nearly all of these variable names begin with
   variables, respectively.
 
 **LLVM_CODE_COVERAGE_TARGETS**:STRING
-  If set to a semicolon separated list of targets, those targets will be used
+  If set to a semicolon-separated list of targets, those targets will be used
   to drive the code coverage reports. If unset, the target list will be
   constructed using the LLVM build's CMake export list.
 
 **LLVM_COVERAGE_SOURCE_DIRS**:STRING
-  If set to a semicolon separated list of directories, the coverage reports
+  If set to a semicolon-separated list of directories, the coverage reports
   will limit code coverage summaries to just the listed directories. If unset,
   coverage reports will include all sources identified by the tooling.
 
 **LLVM_CREATE_XCODE_TOOLCHAIN**:BOOL
-  macOS Only: If enabled, CMake will generate a target named
+  macOS only: If enabled, CMake will generate a target named
   'install-xcode-toolchain'. This target will create a directory at
   ``$CMAKE_INSTALL_PREFIX/Toolchains`` containing an xctoolchain directory which can
   be used to override the default system tools.
@@ -588,8 +588,8 @@ its enabled sub-projects. Nearly all of these variable names begin with
   Semicolon-separated list of projects to build, or *all* for building all
   (clang, lldb, lld, polly, etc) projects. This flag assumes that projects
   are checked out side-by-side and not nested, i.e. clang needs to be in
-  parallel of llvm instead of nested in ``llvm/tools``. This feature allows
-  to have one build for only LLVM and another for clang+llvm using the same
+  parallel to llvm instead of nested in ``llvm/tools``. This feature allows
+  having one build for only LLVM and another for clang+llvm using the same
   source checkout.
 
   The full list is:
@@ -657,11 +657,11 @@ its enabled sub-projects. Nearly all of these variable names begin with
 **LLVM_EXPERIMENTAL_TARGETS_TO_BUILD**:STRING
   Semicolon-separated list of experimental targets to build and linked into
   llvm. This will build the experimental target without needing it to add to the
-  list of all the targets available in the LLVM's main CMakeLists.txt.
+  list of all the targets available in the LLVM's main ``CMakeLists.txt``.
 
 **LLVM_EXTERNAL_PROJECTS**:STRING
   Semicolon-separated list of additional external projects to build as part of
-  llvm. For each project LLVM_EXTERNAL_<NAME>_SOURCE_DIR have to be specified
+  llvm. For each project, ``LLVM_EXTERNAL_<NAME>_SOURCE_DIR`` has to be specified
   with the path for the source code of the project. Example:
   ``-DLLVM_EXTERNAL_PROJECTS="Foo;Bar"
   -DLLVM_EXTERNAL_FOO_SOURCE_DIR=/src/foo
@@ -676,7 +676,7 @@ its enabled sub-projects. Nearly all of these variable names begin with
   to a valid path, then that project will not be built.
 
 **LLVM_EXTERNALIZE_DEBUGINFO**:BOOL
-  Generate dSYM files and strip executables and libraries (Darwin Only).
+  Generate dSYM files and strip executables and libraries (Darwin only).
   Defaults to OFF.
 
 **LLVM_ENABLE_EXPORTED_SYMBOLS_IN_EXECUTABLES**:BOOL
@@ -751,7 +751,7 @@ its enabled sub-projects. Nearly all of these variable names begin with
     $ D:\git> git clone https://github.com/mjansson/rpmalloc
     $ D:\llvm-project> cmake ... -DLLVM_INTEGRATED_CRT_ALLOC=D:\git\rpmalloc
 
-  This option needs to be used along with the static CRT, ie. if building the
+  This option needs to be used along with the static CRT, i.e., if building the
   Release target, add ``-DCMAKE_MSVC_RUNTIME_LIBRARY=MultiThreaded``.
   Note that rpmalloc is also supported natively in-tree, see option below.
 
@@ -778,7 +778,7 @@ its enabled sub-projects. Nearly all of these variable names begin with
 **LLVM_LIT_TOOLS_DIR**:PATH
   The path to GnuWin32 tools for tests. Valid on Windows host.  Defaults to
   the empty string, in which case lit will look for tools needed for tests
-  (e.g. ``grep``, ``sort``, etc.) in your ``%PATH%``. If GnuWin32 is not in your
+  (e.g., ``grep``, ``sort``, etc.) in your ``%PATH%``. If GnuWin32 is not in your
   ``%PATH%``, then you can set this variable to the GnuWin32 directory so that
   lit can find tools needed for tests in that directory.
 
@@ -797,7 +797,7 @@ its enabled sub-projects. Nearly all of these variable names begin with
   set to non-standard values.
 
 **LLVM_OPTIMIZED_TABLEGEN**:BOOL
-  If enabled and building a debug or asserts build, the CMake build system will
+  If enabled and building a debug or assert build, the CMake build system will
   generate a Release build tree to build a fully optimized tablegen for use
   during the build. Enabling this option can significantly speed up build times,
   especially when building LLVM in Debug configurations.
@@ -818,7 +818,7 @@ its enabled sub-projects. Nearly all of these variable names begin with
   ``LLVM_PARALLEL_{COMPILE,LINK,TABLEGEN}_JOBS`` variable is
   overwritten by computing the memory size divided by the
   specified value. The largest memory user is linking, but remember
-  that jobs in the other categories might run in parallel to the link
+  that jobs in the other categories might run in parallel with the link
   jobs, and you need to consider their memory requirements when
   in a memory-limited environment. Using a
   ``-DLLVM_RAM_PER_LINK_JOB=10000`` is a good approximation. On ELF
@@ -869,7 +869,7 @@ its enabled sub-projects. Nearly all of these variable names begin with
   the default set of UBSan flags.
 
 **LLVM_UNREACHABLE_OPTIMIZE**:BOOL
-  This flag controls the behavior of ``llvm_unreachable()`` in release build
+  This flag controls the behavior of ``llvm_unreachable()`` in a release build
   (when assertions are disabled in general). When ON (default) then
   ``llvm_unreachable()`` is considered "undefined behavior" and optimized as
   such. When OFF it is instead replaced with a guaranteed "trap".
@@ -879,9 +879,9 @@ its enabled sub-projects. Nearly all of these variable names begin with
 
 **LLVM_USE_LINKER**:STRING
   Add ``-fuse-ld={name}`` to the link invocation. The possible values depend on
-  your compiler, for clang the value can be an absolute path to your custom
+  your compiler. For clang, the value can be an absolute path to your custom
   linker, otherwise clang will prefix the name with ``ld.`` and apply its usual
-  search. For example to link LLVM with the Gold linker, cmake can be invoked
+  search. For example, to link LLVM with the Gold linker, cmake can be invoked
   with ``-DLLVM_USE_LINKER=gold``.
 
 **LLVM_USE_OPROFILE**:BOOL
@@ -892,11 +892,11 @@ its enabled sub-projects. Nearly all of these variable names begin with
 
 **LLVM_USE_RELATIVE_PATHS_IN_FILES**:BOOL
   Rewrite absolute source paths in sources and debug info to relative ones. The
-  source prefix can be adjusted via the LLVM_SOURCE_PREFIX variable.
+  source prefix can be adjusted via the ``LLVM_SOURCE_PREFIX`` variable.
 
 **LLVM_USE_RELATIVE_PATHS_IN_DEBUG_INFO**:BOOL
   Rewrite absolute source paths in debug info to relative ones. The source prefix
-  can be adjusted via the LLVM_SOURCE_PREFIX variable.
+  can be adjusted via the ``LLVM_SOURCE_PREFIX`` variable.
 
 **LLVM_USE_SANITIZER**:STRING
   Define the sanitizer used to build LLVM binaries and tests. Possible values
@@ -916,9 +916,9 @@ its enabled sub-projects. Nearly all of these variable names begin with
 
 **SPHINX_OUTPUT_HTML**:BOOL
   If enabled (and ``LLVM_ENABLE_SPHINX`` is enabled) then the targets for
-  building the documentation as html are added (but not built by default unless
+  building the documentation as HTML are added (but not built by default unless
   ``LLVM_BUILD_DOCS`` is enabled). There is a target for each project in the
-  source tree that uses sphinx (e.g.  ``docs-llvm-html``, ``docs-clang-html``
+  source tree that uses sphinx (e.g., ``docs-llvm-html``, ``docs-clang-html``
   and ``docs-lld-html``). Defaults to ON.
 
 **SPHINX_OUTPUT_MAN**:BOOL
@@ -973,15 +973,15 @@ A few notes about CMake Caches:
 
 - Order of command line arguments is important
 
-  - ``-D`` arguments specified before -C are set before the cache is processed and
+  - ``-D`` arguments specified before ``-C`` are set before the cache is processed and
     can be read inside the cache file
-  - ``-D`` arguments specified after -C are set after the cache is processed and
+  - ``-D`` arguments specified after ``-C`` are set after the cache is processed and
     are unset inside the cache file
 
 - All ``-D`` arguments will override cache file settings
 - CMAKE_TOOLCHAIN_FILE is evaluated after both the cache file and the command
   line arguments
-- It is recommended that all ``-D`` options should be specified *before* -C
+- It is recommended that all ``-D`` options be specified *before* ``-C``
 
 For more information about some of the advanced build configurations supported
 via Cache files see :doc:`AdvancedBuilds`.
@@ -1004,7 +1004,7 @@ Cross compiling
 
 See `this wiki page <https://gitlab.kitware.com/cmake/community/wikis/doc/cmake/CrossCompiling>`_ for
 generic instructions on how to cross-compile with CMake. It goes into detailed
-explanations and may seem daunting, but it is not. On the wiki page there are
+explanations and may seem daunting, but it is not. The wiki page has
 several examples including toolchain files. Go directly to the
 ``Information how to set up various cross compiling toolchains`` section
 for a quick solution.
@@ -1015,12 +1015,12 @@ cross-compiling.
 Embedding LLVM in your project
 ==============================
 
-From LLVM 3.5 onwards the CMake build system exports LLVM libraries as
+From LLVM 3.5 onward, the CMake build system exports LLVM libraries as
 importable CMake targets. This means that clients of LLVM can now reliably use
 CMake to develop their own LLVM-based projects against an installed version of
 LLVM regardless of how it was built.
 
-Here is a simple example of a CMakeLists.txt file that imports the LLVM libraries
+Here is a simple example of a ``CMakeLists.txt`` file that imports the LLVM libraries
 and uses them to build a simple application ``simple-tool``.
 
 .. code-block:: cmake
@@ -1054,9 +1054,9 @@ and uses them to build a simple application ``simple-tool``.
 
 The ``find_package(...)`` directive when used in CONFIG mode (as in the above
 example) will look for the ``LLVMConfig.cmake`` file in various locations (see
-cmake manual for details).  It creates an ``LLVM_DIR`` cache entry to save the
+CMake manual for details).  It creates an ``LLVM_DIR`` cache entry to save the
 directory where ``LLVMConfig.cmake`` is found or allows the user to specify the
-directory (e.g. by passing ``-DLLVM_DIR=/usr/lib/cmake/llvm`` to
+directory (e.g., by passing ``-DLLVM_DIR=/usr/lib/cmake/llvm`` to
 the ``cmake`` command or by setting it directly in ``ccmake`` or ``cmake-gui``).
 
 This file is available in two different locations.
@@ -1081,7 +1081,7 @@ The ``LLVMConfig.cmake`` file sets various useful variables. Notable variables
 include:
 
 ``LLVM_CMAKE_DIR``
-  The path to the LLVM CMake directory (i.e. the directory containing
+  The path to the LLVM CMake directory (i.e., the directory containing
   ``LLVMConfig.cmake``).
 
 ``LLVM_DEFINITIONS``
@@ -1106,7 +1106,7 @@ include:
   (${LLVM_PACKAGE_VERSION} VERSION_LESS "3.5")``.
 
 ``LLVM_TOOLS_BINARY_DIR``
-  The path to the directory containing the LLVM tools (e.g. ``llvm-as``).
+  The path to the directory containing the LLVM tools (e.g., ``llvm-as``).
 
 Notice that in the above example we link ``simple-tool`` against several LLVM
 libraries. The list of libraries is determined by using the
@@ -1122,7 +1122,7 @@ and will be removed in a future version of LLVM.
 Developing LLVM passes out of source
 ------------------------------------
 
-You can develop LLVM passes out of LLVM's source tree (i.e. against an
+You can develop LLVM passes out of LLVM's source tree (i.e., against an
 installed or built LLVM). An example of a project layout is provided below.
 
 .. code-block:: none
@@ -1197,6 +1197,6 @@ Windows
   Studio 2010 CMake generator. 0 means use all processors. Default is 0.
 
 **CMAKE_MT**:STRING
-  When compiling with clang-cl, CMake may use `llvm-mt` as the Manifest Tool
-  when available. `llvm-mt` is only present when libxml2 is found at build-time.
+  When compiling with clang-cl, CMake may use ``llvm-mt`` as the Manifest Tool
+  when available. ``llvm-mt`` is only present when libxml2 is found at build-time.
   To ensure using Microsoft's Manifest Tool set `CMAKE_MT=mt`.
diff --git a/llvm/docs/CMakePrimer.rst b/llvm/docs/CMakePrimer.rst
index d7895ce3b627..5d244f443fba 100644
--- a/llvm/docs/CMakePrimer.rst
+++ b/llvm/docs/CMakePrimer.rst
@@ -19,7 +19,7 @@ The LLVM project and many of the core projects built on LLVM build using CMake.
 This document aims to provide a brief overview of CMake for developers modifying
 LLVM projects or building their own projects on top of LLVM.
 
-The official CMake language references is available in the cmake-language
+The official CMake language reference is available in the cmake-language
 manpage and `cmake-language online documentation
 <https://cmake.org/cmake/help/v3.4/manual/cmake-language.7.html>`_.
 
@@ -27,7 +27,7 @@ manpage and `cmake-language online documentation
 ==============
 
 CMake is a tool that reads script files in its own language that describe how a
-software project builds. As CMake evaluates the scripts it constructs an
+software project builds. As CMake evaluates the scripts, it constructs an
 internal representation of the software project. Once the scripts have been
 fully processed, if there are no errors, CMake will generate build files to
 actually build the project. CMake supports generating build files for a variety
@@ -58,8 +58,8 @@ program. The example uses only CMake language-defined functions.
    project(HelloWorld)
    add_executable(HelloWorld HelloWorld.cpp)
 
-The CMake language provides control flow constructs in the form of foreach loops
-and if blocks. To make the example above more complicated you could add an if
+The CMake language provides control flow constructs in the form of ``foreach`` loops
+and ``if`` blocks. To make the example above more complicated, you could add an ``if``
 block to define "APPLE" when targeting Apple platforms:
 
 .. code-block:: cmake
@@ -77,7 +77,7 @@ Variables, Types, and Scope
 Dereferencing
 -------------
 
-In CMake variables are "stringly" typed. All variables are represented as
+In CMake, variables are "stringly" typed. All variables are represented as
 strings throughout evaluation. Wrapping a variable in ``${}`` dereferences it
 and results in a literal substitution of the name for the value. CMake refers to
 this as "variable evaluation" in their documentation. Dereferences are performed
@@ -115,8 +115,8 @@ evaluated as empty before add_executable is given its arguments.
 Lists
 -----
 
-In CMake lists are semi-colon delimited strings, and it is strongly advised that
-you avoid using semi-colons in lists; it doesn't go smoothly. A few examples of
+In CMake, lists are semicolon-delimited strings, and it is strongly advised that
+you avoid using semicolons in lists; it doesn't go smoothly. A few examples of
 defining lists:
 
 .. code-block:: cmake
@@ -132,7 +132,7 @@ Lists of Lists
 --------------
 
 One of the more complicated patterns in CMake is lists of lists. Because a list
-cannot contain an element with a semi-colon to construct a list of lists you
+cannot contain an element with a semicolon, to construct a list of lists you
 make a list of variable names that refer to other lists. For example:
 
 .. code-block:: cmake
@@ -160,15 +160,15 @@ the list.
 
 This pattern is used throughout CMake, the most common example is the compiler
 flags options, which CMake refers to using the following variable expansions:
-CMAKE_${LANGUAGE}_FLAGS and CMAKE_${LANGUAGE}_FLAGS_${CMAKE_BUILD_TYPE}.
+``CMAKE_${LANGUAGE}_FLAGS`` and ``CMAKE_${LANGUAGE}_FLAGS_${CMAKE_BUILD_TYPE}``.
 
 Other Types
 -----------
 
 Variables that are cached or specified on the command line can have types
 associated with them. The variable's type is used by CMake's UI tool to display
-the right input field. A variable's type generally doesn't impact evaluation,
-however CMake does have special handling for some variables such as PATH.
+the right input field. A variable's type generally doesn't impact evaluation;
+however, CMake does have special handling for some variables such as ``PATH``.
 You can read more about the special handling in `CMake's set documentation
 <https://cmake.org/cmake/help/v3.5/command/set.html#set-cache-entry>`_.
 
@@ -183,11 +183,11 @@ set in the scope they are included from, and all subdirectories.
 When a variable that is already set is set again in a subdirectory it overrides
 the value in that scope and any deeper subdirectories.
 
-The CMake set command provides two scope-related options. PARENT_SCOPE sets a
-variable into the parent scope, and not the current scope. The CACHE option sets
+The CMake set command provides two scope-related options. ``PARENT_SCOPE`` sets a
+variable into the parent scope, and not the current scope. The ``CACHE`` option sets
 the variable in the CMakeCache, which results in it being set in all scopes. The
-CACHE option will not set a variable that already exists in the CACHE unless the
-FORCE option is specified.
+``CACHE`` option will not set a variable that already exists in the ``CACHE`` unless the
+``FORCE`` option is specified.
 
 In addition to directory-based scope, CMake functions also have their own scope.
 This means variables set inside functions do not bleed into the parent scope.
@@ -213,7 +213,7 @@ If, ElseIf, Else
   `here <https://cmake.org/cmake/help/v3.4/command/if.html>`_. That resource is
   far more complete.
 
-In general CMake if blocks work the way you'd expect:
+In general, CMake ``if`` blocks work the way you'd expect:
 
 .. code-block:: cmake
 
@@ -225,7 +225,7 @@ In general CMake if blocks work the way you'd expect:
     message("do other other stuff")
   endif()
 
-The single most important thing to know about CMake's if blocks coming from a C
+The single most important thing to know about CMake's ``if`` blocks coming from a C
 background is that they do not have their own scope. Variables set inside
 conditional blocks persist after the ``endif()``.
 
@@ -317,20 +317,20 @@ Modules are CMake's vehicle for enabling code reuse. CMake modules are just
 CMake script files. They can contain code to execute on include as well as
 definitions for commands.
 
-In CMake macros and functions are universally referred to as commands, and they
+In CMake, macros and functions are universally referred to as commands, and they
 are the primary method of defining code that can be called multiple times.
 
 In LLVM we have several CMake modules that are included as part of our
 distribution for developers who don't build our project from source. Those
 modules are the fundamental pieces needed to build LLVM-based projects with
 CMake. We also rely on modules as a way of organizing the build system's
-functionality for maintainability and re-use within LLVM projects.
+functionality for maintainability and reuse within LLVM projects.
 
 Argument Handling
 -----------------
 
 When defining a CMake command handling arguments is very useful. The examples
-in this section will all use the CMake ``function`` block, but this all applies
+in this section will all use the CMake ``function`` block, but this applies
 to the ``macro`` block as well.
 
 CMake commands can have named arguments that are required at every call site. In
@@ -395,7 +395,7 @@ result in some unexpected behavior if using unreferenced variables. For example:
    # c
    # d
 
-Generally speaking this issue is uncommon because it requires using
+Generally speaking, this issue is uncommon because it requires using
 non-dereferenced variables with names that overlap in the parent scope, but it
 is important to be aware of because it can lead to subtle bugs.
 
@@ -424,7 +424,7 @@ LLVM.
 Useful Built-in Commands
 ========================
 
-CMake has a bunch of useful built-in commands. This document isn't going to
+CMake has a collection of useful built-in commands. This document isn't going to
 go into details about them because The CMake project has excellent
 documentation. To highlight a few useful functions see:
 
diff --git a/llvm/docs/CodeGenerator.rst b/llvm/docs/CodeGenerator.rst
index 8260b5c17342..7486054b55c9 100644
--- a/llvm/docs/CodeGenerator.rst
+++ b/llvm/docs/CodeGenerator.rst
@@ -90,8 +90,8 @@ This design has two important implications.  The first is that LLVM can support
 completely non-traditional code generation targets.  For example, the C backend
 does not require register allocation, instruction selection, or any of the other
 standard components provided by the system.  As such, it only implements these
-two interfaces, and does its own thing. Note that C backend was removed from the
-trunk since LLVM 3.1 release. Another example of a code generator like this is a
+two interfaces, and does its own thing. Note that the C backend was removed from the
+trunk in the LLVM 3.1 release. Another example of a code generator like this is a
 (purely hypothetical) backend that converts LLVM to the GCC RTL form and uses
 GCC to emit machine code for a target.
 
@@ -565,7 +565,7 @@ Conceptually a MI bundle is a MI with a number of other MIs nested within:
 MI bundle support does not change the physical representations of
 MachineBasicBlock and MachineInstr. All the MIs (including top level and nested
 ones) are stored as sequential list of MIs. The "bundled" MIs are marked with
-the 'InsideBundle' flag. A top level MI with the special BUNDLE opcode is used
+the 'InsideBundle' flag. A top-level MI with the special BUNDLE opcode is used
 to represent the start of a bundle. It's legal to mix BUNDLE MIs with individual
 MIs that are not inside bundles nor represent bundles.
 
@@ -575,7 +575,7 @@ The MachineBasicBlock iterator has been modified to skip over bundled MIs to
 enforce the bundle-as-a-single-unit concept. An alternative iterator
 instr_iterator has been added to MachineBasicBlock to allow passes to iterate
 over all of the MIs in a MachineBasicBlock, including those which are nested
-inside bundles. The top level BUNDLE instruction must have the correct set of
+inside bundles. The top-level BUNDLE instruction must have the correct set of
 register MachineOperand's that represent the cumulative inputs and outputs of
 the bundled MIs.
 
@@ -602,7 +602,7 @@ level, devoid of "high level" information like "constant pools", "jump tables",
 "global variables" or anything like that.  At this level, LLVM handles things
 like label names, machine instructions, and sections in the object file.  The
 code in this layer is used for a number of important purposes: the tail end of
-the code generator uses it to write a .s or .o file, and it is also used by the
+the code generator uses it to write a ``.s`` or ``.o`` file, and it is also used by the
 llvm-mc tool to implement standalone machine code assemblers and disassemblers.
 
 This section describes some of the important classes.  There are also a number
@@ -615,8 +615,8 @@ The ``MCStreamer`` API
 ----------------------
 
 MCStreamer is best thought of as an assembler API.  It is an abstract API which
-is *implemented* in different ways (e.g. to output a .s file, output an ELF .o
-file, etc) but whose API correspond directly to what you see in a .s file.
+is *implemented* in different ways (e.g. to output a ``.s`` file, output an ELF ``.o``
+file, etc) but whose API corresponds directly to what you see in a ``.s`` file.
 MCStreamer has one method per directive, such as EmitLabel, EmitSymbolAttribute,
 switchSection, emitValue (for .byte, .word), etc, which directly correspond to
 assembly level directives.  It also has an EmitInstruction method, which is used
@@ -629,7 +629,7 @@ higher level LLVM IR and Machine* constructs down to the MC layer, emitting
 directives through MCStreamer.
 
 On the implementation side of MCStreamer, there are two major implementations:
-one for writing out a .s file (MCAsmStreamer), and one for writing out a .o
+one for writing out a ``.s`` file (MCAsmStreamer), and one for writing out a ``.o``
 file (MCObjectStreamer).  MCAsmStreamer is a straightforward implementation
 that prints out a directive for each method (e.g. ``EmitValue -> .byte``), but
 MCObjectStreamer implements a full assembler.
@@ -639,7 +639,7 @@ Each target that needs it defines a class that inherits from it and is a lot
 like MCStreamer itself: It has one method per directive and two classes that
 inherit from it, a target object streamer and a target asm streamer. The target
 asm streamer just prints it (``emitFnStart -> .fnstart``), and the object
-streamer implement the assembler logic for it.
+streamer implements the assembler logic for it.
 
 To make llvm use these classes, the target initialization must call
 TargetRegistry::RegisterAsmStreamer and TargetRegistry::RegisterMCObjectStreamer
@@ -667,7 +667,7 @@ MCSymbols are created by MCContext and uniqued there.  This means that MCSymbols
 can be compared for pointer equivalence to find out if they are the same symbol.
 Note that pointer inequality does not guarantee the labels will end up at
 different addresses though.  It's perfectly legal to output something like this
-to the .s file:
+to the ``.s`` file:
 
 ::
 
@@ -685,7 +685,7 @@ subclassed by object file specific implementations (e.g. ``MCSectionMachO``,
 ``MCSectionCOFF``, ``MCSectionELF``) and these are created and uniqued by
 MCContext.  The MCStreamer has a notion of the current section, which can be
 changed with the SwitchToSection method (which corresponds to a ".section"
-directive in a .s file).
+directive in a ``.s`` file).
 
 .. _MCInst:
 
@@ -887,7 +887,7 @@ bundled into a single scheduling-unit node, and with immediate operands and
 other nodes that aren't relevant for scheduling omitted.
 
 The option ``-filter-view-dags`` allows to select the name of the basic block
-that you are interested to visualize and filters all the previous
+that you are interested in visualizing and filters all the previous
 ``view-*-dags`` options.
 
 .. _Build initial DAG:
@@ -944,7 +944,7 @@ The Legalize phase is in charge of converting a DAG to only use the operations
 that are natively supported by the target.
 
 Targets often have weird constraints, such as not supporting every operation on
-every supported datatype (e.g. X86 does not support byte conditional moves and
+every supported data type (e.g. X86 does not support byte conditional moves and
 PowerPC does not support sign-extending loads from a 16-bit memory location).
 Legalize takes care of this by open-coding another sequence of operations to
 emulate the operation ("expansion"), by promoting one type to a larger type that
@@ -1077,7 +1077,7 @@ for your target.  It has the following strengths:
   if your patterns make sense or not.
 
 * It can handle arbitrary constraints on operands for the pattern match.  In
-  particular, it is straight-forward to say things like "match any immediate
+  particular, it is straightforward to say things like "match any immediate
   that is a 13-bit sign-extended value".  For examples, see the ``immSExt16``
   and related ``tblgen`` classes in the PowerPC backend.
 
@@ -1129,7 +1129,7 @@ for your target.  It has the following strengths:
 
     def STWU  : DForm_1<37, (outs ptr_rc:$ea_res), (ins GPRC:$rS, memri:$dst),
                     "stwu $rS, $dst", LdStStoreUpd, []>,
-                    RegConstraint<"$dst.reg = $ea_res">, NoEncode<"$ea_res">;
+                    RegConstraint<"$dst.reg = $ea_res">;
 
     def : Pat<(pre_store GPRC:$rS, ptr_rc:$ptrreg, iaddroff:$ptroff),
               (STWU GPRC:$rS, iaddroff:$ptroff, ptr_rc:$ptrreg)>;
@@ -1615,7 +1615,7 @@ Since the MC layer works at the level of abstraction of object files, it doesn't
 have a notion of functions, global variables etc.  Instead, it thinks about
 labels, directives, and instructions.  A key class used at this time is the
 MCStreamer class.  This is an abstract API that is implemented in different ways
-(e.g. to output a .s file, output an ELF .o file, etc) that is effectively an
+(e.g. to output a ``.s`` file, output an ELF ``.o`` file, etc) that is effectively an
 "assembler API".  MCStreamer has one method per directive, such as EmitLabel,
 EmitSymbolAttribute, switchSection, etc, which directly correspond to assembly
 level directives.
@@ -1648,7 +1648,7 @@ three important things that you have to implement for your target:
 
 Finally, at your choosing, you can also implement a subclass of MCCodeEmitter
 which lowers MCInst's into machine code bytes and relocations.  This is
-important if you want to support direct .o file emission, or would like to
+important if you want to support direct ``.o`` file emission, or would like to
 implement an assembler for your target.
 
 Emitting function stack size information
@@ -1678,7 +1678,7 @@ Instructions in a VLIW target can typically be mapped to multiple functional
 units. During the process of packetizing, the compiler must be able to reason
 about whether an instruction can be added to a packet. This decision can be
 complex since the compiler has to examine all possible mappings of instructions
-to functional units. Therefore to alleviate compilation-time complexity, the
+to functional units. Therefore, to alleviate compilation-time complexity, the
 VLIW packetizer parses the instruction classes of a target and generates tables
 at compiler build time. These tables can then be queried by the provided
 machine-independent API to determine if an instruction can be accommodated in a
@@ -1729,7 +1729,7 @@ Instruction Alias Processing
 ----------------------------
 
 Once the instruction is parsed, it enters the MatchInstructionImpl function.
-The MatchInstructionImpl function performs alias processing and then does actual
+The MatchInstructionImpl function performs alias processing and then performs actual
 matching.
 
 Alias processing is the phase that canonicalizes different lexical forms of the
@@ -1934,7 +1934,7 @@ following constraints are met:
 
 * Caller and callee have matching return type or the callee result is not used.
 
-* If any of the callee arguments are being passed in stack, they must be
+* If any of the callee arguments are being passed on the stack, they must be
   available in caller's own incoming argument stack and the frame offsets must
   be the same.
 
@@ -2074,7 +2074,7 @@ character per operand with an optional special size. For example:
 The PowerPC backend
 -------------------
 
-The PowerPC code generator lives in the lib/Target/PowerPC directory.  The code
+The PowerPC code generator lives in the ``lib/Target/PowerPC`` directory.  The code
 generation is retargetable to several variations or *subtargets* of the PowerPC
 ISA; including ppc32, ppc64 and altivec.
 
@@ -2140,7 +2140,7 @@ previous frame pointer (r31.)  The entries in the linkage area are the size of a
 GPR, thus the linkage area is 24 bytes long in 32-bit mode and 48 bytes in
 64-bit mode.
 
-32 bit linkage area:
+32-bit linkage area:
 
 :raw-html:`<table  border="1" cellspacing="0">`
 :raw-html:`<tr>`
@@ -2169,7 +2169,7 @@ GPR, thus the linkage area is 24 bytes long in 32-bit mode and 48 bytes in
 :raw-html:`</tr>`
 :raw-html:`</table>`
 
-64 bit linkage area:
+64-bit linkage area:
 
 :raw-html:`<table border="1" cellspacing="0">`
 :raw-html:`<tr>`
diff --git a/llvm/docs/CommandGuide/llvm-ir2vec.rst b/llvm/docs/CommandGuide/llvm-ir2vec.rst
index 0c9fb6e94b6f..fc590a618031 100644
--- a/llvm/docs/CommandGuide/llvm-ir2vec.rst
+++ b/llvm/docs/CommandGuide/llvm-ir2vec.rst
@@ -13,7 +13,9 @@ DESCRIPTION
 
 :program:`llvm-ir2vec` is a standalone command-line tool for IR2Vec. It
 generates IR2Vec embeddings for LLVM IR and supports triplet generation 
-for vocabulary training. The tool provides three main subcommands:
+for vocabulary training.
+
+The tool provides three main subcommands:
 
 1. **triplets**: Generates numeric triplets in train2id format for vocabulary
    training from LLVM IR.
@@ -93,7 +95,7 @@ Example Usage:
 
 .. code-block:: bash
 
-   llvm-ir2vec embeddings --ir2vec-vocab-path=vocab.json --level=func input.bc -o embeddings.txt
+   llvm-ir2vec embeddings --ir2vec-vocab-path=vocab.json --ir2vec-kind=symbolic --level=func input.bc -o embeddings.txt
 
 OPTIONS
 -------
@@ -129,6 +131,16 @@ Subcommand-specific options:
 
    Process only the specified function instead of all functions in the module.
 
+.. option:: --ir2vec-kind=<kind>
+
+   Specify the kind of IR2Vec embeddings to generate. Valid values are:
+
+   * ``symbolic`` - Generate symbolic embeddings (default)
+   * ``flow-aware`` - Generate flow-aware embeddings
+
+   Flow-aware embeddings consider control flow relationships between instructions,
+   while symbolic embeddings focus on the symbolic representation of instructions.
+
 .. option:: --ir2vec-vocab-path=<path>
 
    Specify the path to the vocabulary file (required for embedding generation).
diff --git a/llvm/docs/CommandGuide/llvm-objcopy.rst b/llvm/docs/CommandGuide/llvm-objcopy.rst
index 35d907fbe44d..343e1d8dbac9 100644
--- a/llvm/docs/CommandGuide/llvm-objcopy.rst
+++ b/llvm/docs/CommandGuide/llvm-objcopy.rst
@@ -79,6 +79,15 @@ multiple file formats.
  Enable deterministic mode when copying archives, i.e. use 0 for archive member
  header UIDs, GIDs and timestamp fields. On by default.
 
+.. option:: --extract-section <section>=<file>
+
+ Extract the specified section ``<section>`` into the file ``<file>`` as a
+ separate object. Can be specified multiple times to extract multiple sections.
+ ``<file>`` is unrelated to the input and output files provided to
+ :program:`llvm-objcopy` and as such the normal copying and editing
+ operations will still be performed. No operations are performed on the sections
+ prior to dumping them.
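+
+ As a minimal sketch, assuming an input object ``input.o`` that contains a
+ ``.text`` section (all file names here are placeholders), the section could
+ be extracted into ``text.o`` while the normal copy is still written to
+ ``output.o``:
+
+ .. code-block:: console
+
+   $ llvm-objcopy --extract-section .text=text.o input.o output.o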
+
 .. option:: --globalize-symbol <symbol>
 
  Mark any defined symbols named ``<symbol>`` as global symbols in the output.
diff --git a/llvm/docs/Contributing.rst b/llvm/docs/Contributing.rst
index 28b28ffda429..78bb92e5fa68 100644
--- a/llvm/docs/Contributing.rst
+++ b/llvm/docs/Contributing.rst
@@ -4,7 +4,7 @@ Contributing to LLVM
 
 
 Thank you for your interest in contributing to LLVM! There are multiple ways to
-contribute, and we appreciate all contributions. In case you have questions,
+contribute, and we appreciate all contributions. If you have questions,
 you can either use the `Forum`_ or, for a more interactive chat, go to our
 `Discord server`_.
 
@@ -20,14 +20,14 @@ Ways to Contribute
 Bug Reports
 -----------
 If you are working with LLVM and run into a bug, we definitely want to know
-about it. Please let us know and follow the instructions in
+about it. Please follow the instructions in
 :doc:`HowToSubmitABug`  to create a bug report.
 
 Bug Fixes
 ---------
 If you are interested in contributing code to LLVM, bugs labeled with the
 `good first issue`_ keyword in the `bug tracker`_ are a good way to get familiar with
-the code base. If you are interested in fixing a bug please comment on it to
+the code base. If you are interested in fixing a bug, please comment on it to
 let people know you are working on it.
 
 Then try to reproduce and fix the bug with upstream LLVM. Start by building
@@ -43,8 +43,8 @@ There is a separate process to submit security-related bugs, see :ref:`report-se
 
 Bigger Pieces of Work
 ---------------------
-In case you are interested in taking on a bigger piece of work, a list of
-interesting projects is maintained at the `LLVM's Open Projects page`_. In case
+If you are interested in taking on a bigger piece of work, a list of
+interesting projects is maintained at the `LLVM's Open Projects page`_. If
 you are interested in working on any of these projects, please post on the
 `Forum`_, so that we know the project is being worked on.
 
@@ -62,7 +62,7 @@ Once you have a patch ready, it is time to submit it. The patch should:
 
 .. _format patches:
 
-Before sending a patch for review, please also try to ensure it is
+Before sending a patch for review, please also ensure it is
 formatted properly. We use ``clang-format`` for this, which has git integration
 through the ``git-clang-format`` script. On some systems, it may already be
 installed (or be installable via your package manager). If so, you can simply
@@ -108,7 +108,7 @@ you will likely want to run one of the following to add the changes to a commit:
 
 The LLVM project has migrated to GitHub Pull Requests as its review process.
 For more information about the workflow of using GitHub Pull Requests see our
-:ref:`GitHub <github-reviews>` documentation. We still have an read-only
+:ref:`GitHub <github-reviews>` documentation. We still have a read-only
 `LLVM's Phabricator <https://reviews.llvm.org>`_ instance.
 
 To make sure the right people see your patch, please select suitable reviewers
@@ -146,12 +146,12 @@ For developers to commit changes from Git
    See also :ref:`GitHub <github-reviews>` for more details on merging your changes
    into LLVM project monorepo.
 
-Once a patch is reviewed, you can select the "Squash and merge" button in the
+Once a pull request is approved, you can select the "Squash and merge" button in the
 GitHub web interface.
 
 When pushing directly from the command-line to the ``main`` branch, you will need
 to rebase your change. LLVM has a linear-history policy, which means
-that merge commits are not allowed and the ``main`` branch is configured to reject
+that merge commits are not allowed, and the ``main`` branch is configured to reject
 pushes that include merges.
 
 GitHub will display a message that looks like this:
@@ -173,8 +173,8 @@ Please ask for help if you're having trouble with your particular git workflow.
 Git pre-push hook
 ^^^^^^^^^^^^^^^^^
 
-We include an optional pre-push hook that run some sanity checks on the revisions
-you are about to push and ask confirmation if you push multiple commits at once.
+We include an optional pre-push hook that runs some sanity checks on the revisions
+you are about to push and asks for confirmation if you push multiple commits at once.
 You can set it up (on Unix systems) by running from the repository root:
 
 .. code-block:: console
diff --git a/llvm/docs/DebuggingLLVM.rst b/llvm/docs/DebuggingLLVM.rst
new file mode 100644
index 000000000000..37ba2c06b281
--- /dev/null
+++ b/llvm/docs/DebuggingLLVM.rst
@@ -0,0 +1,121 @@
+==============
+Debugging LLVM
+==============
+
+This document is a collection of tips and tricks for debugging LLVM
+using a source-level debugger. The assumption is that you are trying to
+figure out the root cause of a miscompilation in the program that you
+are compiling.
+
+Extract and rerun the compile command
+=====================================
+
+Extract the Clang command that produces the buggy code. The way to do
+this depends on the build system used by your program.
+
+- For Ninja-based build systems, you can pass ``-t commands`` to Ninja
+  and filter the output by the targeted source file name. For example:
+  ``ninja -t commands myprogram | grep path/to/file.cpp``.
+
+- For Bazel-based build systems using Bazel 9 or newer (not released yet
+  as of this writing), you can pass ``--output=commands`` to the ``bazel
+  aquery`` subcommand for a similar result. For example: ``bazel aquery
+  --output=commands 'deps(//myprogram)' | grep path/to/file.cpp``. Build
+  commands must generally be run from a subdirectory of the source
+  directory named ``bazel-$PROJECTNAME``. Bazel typically makes the target
+  paths of ``-o`` and ``-MF`` read-only when running commands outside
+  of a build, so it may be necessary to change or remove these flags.
+
+- A method that should work with any build system is to build your program
+  under `Bear <https://github.com/rizsotto/Bear>`_ and look for the
+  compile command in the resulting ``compile_commands.json`` file.
+
+Once you have the command, you can use the following steps to debug
+it. Note that any flags mentioned later in this document are LLVM flags,
+so they must be prefixed with ``-mllvm`` when passed to the Clang driver,
+e.g. ``-mllvm -print-after-all``.
+
+Understanding the source of the issue
+=====================================
+
+If you have a miscompilation introduced by a pass, it is
+frequently possible to identify the pass where things go wrong
+by searching a pass-by-pass printout, which is enabled using the
+``-print-after-all`` flag. Pipe stderr into ``less`` (append ``2>&1 |
+less`` to the command line) and use text search to move between passes
+(e.g. type ``/Dump After<Enter>``, ``n`` to move to the next pass,
+``N`` to move to the previous pass). If the name of the function
+containing the buggy IR is known, you can filter the output by passing
+``-filter-print-funcs=functionname``. You can sometimes pass ``-debug`` to
+get useful details about what passes are doing. See also `PrintPasses.cpp
+<https://github.com/llvm/llvm-project/blob/main/llvm/lib/IR/PrintPasses.cpp>`_
+for more useful options.
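+
+For illustration, a filtered pass-by-pass dump might be paged as in the
+following sketch. The source path, the output path, the optimization level and
+the function name ``functionname`` are placeholders for whatever your extracted
+command uses, and each LLVM flag is given its own ``-mllvm`` prefix:
+
+.. code-block:: console
+
+    $ clang++ -O2 -c path/to/file.cpp -o /tmp/file.o \
+        -mllvm -print-after-all \
+        -mllvm -filter-print-funcs=functionname 2>&1 | less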
+
+Creating a debug build of LLVM
+==============================
+
+The subsequent debugging steps require a debug build of LLVM. Pass
+``-DCMAKE_BUILD_TYPE=Debug`` to CMake in a separate build tree to create
+a debug build.
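+
+One possible invocation, assuming that the current directory is a checkout of
+the ``llvm-project`` monorepo, that Ninja is the generator, and that the
+hypothetical directory ``build-debug`` is used as the separate build tree:
+
+.. code-block:: console
+
+    $ cmake -G Ninja -S llvm -B build-debug \
+        -DCMAKE_BUILD_TYPE=Debug -DLLVM_ENABLE_PROJECTS=clang
+    $ ninja -C build-debug clang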
+
+Understanding where an instruction came from
+============================================
+
+A common debugging task involves understanding which part of the code
+introduced a buggy instruction. The pass-by-pass dump is sometimes enough,
+but for complex or unfamiliar passes, more information is often required.
+
+The first step is to record a run of the debug build of Clang under `rr
+<https://rr-project.org>`_ passing the LLVM flag ``-print-inst-addrs``
+together with ``-print-after-all`` and any desired filters. This will
+cause each instruction printed by LLVM to be suffixed with a comment
+showing the address of the ``Instruction`` object. You can then replay
+the run of Clang with ``rr replay``. Because ``rr`` is deterministic,
+the instruction will receive the same address during the replay, so
+you can break on the instruction's construction using a conditional
+breakpoint that checks for the address printed by LLVM, with commands
+such as the following:
+
+.. code-block:: text
+
+    b Instruction::Instruction if this == 0x12345678
+
+When the breakpoint is hit, you will likely be at the location where
+the instruction was created, so you can unwind the stack with ``bt``
+to see the stack trace. It is also possible that an instruction was
+created multiple times at the same address, so you may need to continue
+until reaching the desired location, but in the author's experience this
+is unlikely to occur.
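+
+Put together, a session might look like the following sketch. The build
+directory ``build-debug``, the source and output paths, the file ``dump.txt``
+and the address are all placeholders; the address must be copied from the
+output of the recorded run:
+
+.. code-block:: console
+
+    $ rr record build-debug/bin/clang++ -O2 -c path/to/file.cpp -o /tmp/file.o \
+        -mllvm -print-inst-addrs -mllvm -print-after-all 2> dump.txt
+    $ rr replay
+    (rr) b Instruction::Instruction if this == 0x12345678
+    (rr) c
+    (rr) bt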
+
+Identifying the source locations of instructions
+================================================
+
+To identify the source location that caused a particular instruction
+to be created, you can pass the LLVM flag ``-print-inst-debug-locs``.
+Each instruction printed by LLVM is then suffixed with the file and line
+number of the instruction according to the debug information. Note that
+this requires debug information to be enabled (e.g. pass ``-g`` to Clang).
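+
+As a sketch, assuming a placeholder source file ``path/to/file.cpp`` and
+combining the flag with the pass-by-pass dump described earlier:
+
+.. code-block:: console
+
+    $ clang++ -g -O2 -c path/to/file.cpp -o /tmp/file.o \
+        -mllvm -print-after-all -mllvm -print-inst-debug-locs 2>&1 | less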
+
+LLDB Data Formatters
+====================
+
+A handful of `LLDB data formatters
+<https://lldb.llvm.org/resources/dataformatters.html>`__ are
+provided for some of the core LLVM libraries. To use them, execute the
+following (or add it to your ``~/.lldbinit``)::
+
+  command script import /path/to/llvm/utils/lldbDataFormatters.py
+
+GDB pretty printers
+===================
+
+A handful of `GDB pretty printers
+<https://sourceware.org/gdb/onlinedocs/gdb/Pretty-Printing.html>`__ are
+provided for some of the core LLVM libraries. To use them, execute the
+following (or add it to your ``~/.gdbinit``)::
+
+  source /path/to/llvm/utils/gdb-scripts/prettyprinters.py
+
+It also might be handy to enable the `print pretty
+<https://sourceware.org/gdb/current/onlinedocs/gdb.html/Print-Settings.html>`__
+option to avoid data structures being printed as a big block of text.
diff --git a/llvm/docs/DeveloperPolicy.rst b/llvm/docs/DeveloperPolicy.rst
index eb59c4953dc2..b54f111ed091 100644
--- a/llvm/docs/DeveloperPolicy.rst
+++ b/llvm/docs/DeveloperPolicy.rst
@@ -296,54 +296,53 @@ Quality
 The minimum quality standards that any change must satisfy before being
 committed to the main development branch are:
 
-#. Code must adhere to the `LLVM Coding Standards <CodingStandards.html>`_.
+#. Code must adhere to the :doc:`LLVM Coding Standards <CodingStandards>`.
 
 #. Code must compile cleanly (no errors, no warnings) on at least one platform.
 
 #. Bug fixes and new features should `include a testcase`_ so we know if the
    fix/feature ever regresses in the future.
 
-#. Code must pass the ``llvm/test`` test suite.
-
-#. The code must not cause regressions on a reasonable subset of llvm-test,
-   where "reasonable" depends on the contributor's judgement and the scope of
-   the change (more invasive changes require more testing). A reasonable subset
-   might be something like "``llvm-test/MultiSource/Benchmarks``".
+#. Pull requests should build and pass premerge checks. For first-time
+   contributors, this will require an initial cursory review to run the checks.
 
 #. Ensure that links in source code and test files point to publicly available
-   resources and are used primarily to add additional information rather than
-   to supply critical context. The surrounding comments should be sufficient
-   to provide the context behind such links.
+   resources and are used primarily to add additional information rather than to
+   supply critical context. The surrounding comments should be sufficient to
+   provide the context behind such links.
 
 Additionally, the committer is responsible for addressing any problems found in
 the future that the change is responsible for.  For example:
 
-* The code should compile cleanly on all supported platforms.
+* The code needs to compile cleanly and pass tests on all stable `LLVM
+  buildbots <https://lab.llvm.org/buildbot/>`_.
 
-* The changes should not cause any correctness regressions in the ``llvm-test``
-  suite and must not cause any major performance regressions.
+* The changes should not cause any correctness regressions in the
+  `llvm-test-suite <https://github.com/llvm/llvm-test-suite>`_
+  and must not cause any major performance regressions.
 
 * The change set should not cause performance or correctness regressions for the
-  LLVM tools.
+  LLVM tools. See `llvm-compile-time-tracker.com <https://llvm-compile-time-tracker.com>`_.
 
 * The changes should not cause performance or correctness regressions in code
   compiled by LLVM on all applicable targets.
 
-* You are expected to address any `GitHub Issues <https://github.com/llvm/llvm-project/issues>`_ that
-  result from your change.
+* You are expected to address any `GitHub Issues
+  <https://github.com/llvm/llvm-project/issues>`_ that result from your change.
 
-We prefer for this to be handled before submission but understand that it isn't
-possible to test all of this for every submission.  Our build bots and nightly
-testing infrastructure normally finds these problems.  A good rule of thumb is
-to check the nightly testers for regressions the day after your change.  Build
-bots will directly email you if a group of commits that included yours caused a
+Our build bots and `nightly testing infrastructure
+<https://llvm.org/docs/lnt/intro.html>`_ find many of these issues. Build bots
+will directly email you if a group of commits that included yours caused a
 failure.  You are expected to check the build bot messages to see if they are
-your fault and, if so, fix the breakage.
+your fault and, if so, fix the breakage. However, keep in mind that if you
+receive such an email, it is highly likely that your change is not at fault.
+Changes are batched together precisely because these tests are generally too
+expensive to run continuously for every change.
+
+Commits that violate these quality standards may be reverted (see below). This
+is necessary when the change blocks other developers from making progress. The
+developer is welcome to re-commit the change after the problem has been fixed.
 
-Commits that violate these quality standards (e.g. are very broken) may be
-reverted. This is necessary when the change blocks other developers from making
-progress. The developer is welcome to re-commit the change after the problem has
-been fixed.
 
 .. _commit messages:
 
diff --git a/llvm/docs/ExceptionHandling.rst b/llvm/docs/ExceptionHandling.rst
index bb72e5a71a77..48974861b2a1 100644
--- a/llvm/docs/ExceptionHandling.rst
+++ b/llvm/docs/ExceptionHandling.rst
@@ -31,7 +31,7 @@ algorithm.  Thus, the specification is said to add "zero-cost" to the normal
 execution of an application.
 
 A more complete description of the Itanium ABI exception handling runtime
-support of can be found at `Itanium C++ ABI: Exception Handling
+support can be found at `Itanium C++ ABI: Exception Handling
 <http://itanium-cxx-abi.github.io/cxx-abi/abi-eh.html>`_. A description of the
 exception frame format can be found at `Exception Frames
 <http://refspecs.linuxfoundation.org/LSB_3.0.0/LSB-Core-generic/LSB-Core-generic/ehframechpt.html>`_,
@@ -145,7 +145,7 @@ exception. In those circumstances, the LLVM C++ front-end replaces the call with
 an ``invoke`` instruction. Unlike a call, the ``invoke`` has two potential
 continuation points:
 
-#. where to continue when the call succeeds as per normal, and
+#. where to continue when the call succeeds normally, and
 
 #. where to continue if the call raises an exception, either by a throw or the
    unwinding of a throw
@@ -280,7 +280,7 @@ Throw Filters
 -------------
 
 Prior to C++17, C++ allowed the specification of which exception types may be
-thrown from a function. To represent this, a top level landing pad may exist to
+thrown from a function. To represent this, a top-level landing pad may exist to
 filter out invalid types. To express this in LLVM code the :ref:`i_landingpad`
 will have a filter clause. The clause consists of an array of type infos.
 ``landingpad`` will return a negative value
@@ -437,7 +437,7 @@ exception handling frame that defines information common to all functions in the
 unit.
 
 The format of this call frame information (CFI) is often platform-dependent,
-however. ARM, for example, defines their own format. Apple has their own compact
+however. ARM, for example, defines its own format. Apple has its own compact
 unwind info format.  On Windows, another format is used for all architectures
 since 32-bit x86.  LLVM will emit whatever information is required by the
 target.
@@ -467,7 +467,7 @@ on Itanium C++ ABI platforms. The fundamental difference between the two models
 is that Itanium EH is designed around the idea of "successive unwinding," while
 Windows EH is not.
 
-Under Itanium, throwing an exception typically involves allocating thread local
+Under Itanium, throwing an exception typically involves allocating thread-local
 memory to hold the exception, and calling into the EH runtime. The runtime
 identifies frames with appropriate exception handling actions, and successively
 resets the register context of the current thread to the most recently active
@@ -482,7 +482,7 @@ release its memory, and resume normal control flow.
 The Windows EH model does not use these successive register context resets.
 Instead, the active exception is typically described by a frame on the stack.
 In the case of C++ exceptions, the exception object is allocated in stack memory
-and its address is passed to ``__CxxThrowException``. General purpose structured
+and its address is passed to ``__CxxThrowException``. General-purpose structured
 exceptions (SEH) are more analogous to Linux signals, and they are dispatched by
 userspace DLLs provided with Windows. Each frame on the stack has an assigned EH
 personality routine, which decides what actions to take to handle the exception.
@@ -504,7 +504,7 @@ The C++ personality also uses funclets to contain the code for catch blocks
 (i.e. all user code between the braces in ``catch (Type obj) { ... }``). The
 runtime must use funclets for catch bodies because the C++ exception object is
 allocated in a child stack frame of the function handling the exception. If the
-runtime rewound the stack back to frame of the catch, the memory holding the
+runtime rewound the stack back to the frame of the catch, the memory holding the
 exception would be overwritten quickly by subsequent function calls.  The use of
 funclets also allows ``__CxxFrameHandler3`` to implement rethrow without
 resorting to TLS. Instead, the runtime throws a special exception, and then uses
@@ -512,7 +512,7 @@ SEH (``__try / __except``) to resume execution with new information in the child
 frame.
 
 In other words, the successive unwinding approach is incompatible with Visual
-C++ exceptions and general purpose Windows exception handling. Because the C++
+C++ exceptions and general-purpose Windows exception handling. Because the C++
 exception object lives in stack memory, LLVM cannot provide a custom personality
 function that uses landingpads.  Similarly, SEH does not provide any mechanism
 to rethrow an exception or continue unwinding.  Therefore, LLVM must use the IR
@@ -780,8 +780,8 @@ structure, which funclet-based personalities may require.
 Exception Handling support on the target
 =================================================
 
-In order to support exception handling on particular target, there are a few
-items need to be implemented.
+In order to support exception handling on a particular target, there are a few
+items that need to be implemented.
 
 * CFI directives
 
@@ -791,7 +791,7 @@ items need to be implemented.
   to specify how to calculate the CFA (Canonical Frame Address) and how a register
   is restored from the address pointed to by the CFA with an offset. The assembler
   is instructed by CFI directives to build the ``.eh_frame`` section, which is used
-  by th unwinder to unwind stack during exception handling.
+  by the unwinder to unwind the stack during exception handling.
 
 * ``getExceptionPointerRegister`` and ``getExceptionSelectorRegister``
 
@@ -807,13 +807,13 @@ items need to be implemented.
   which adjusts the stack by offset and then jumps to the handler. ``__builtin_eh_return``
   is used in the GCC unwinder (`libgcc <https://gcc.gnu.org/onlinedocs/gccint/Libgcc.html>`_),
   but not in the LLVM unwinder (`libunwind <https://clang.llvm.org/docs/Toolchain.html#unwind-library>`_).
-  If you are on the top of ``libgcc`` and have particular requirement on your target,
+  If you are building on top of ``libgcc`` and have a particular requirement on your target,
   you have to handle ``EH_RETURN`` in ``TargetLowering``.
 
 If you don't leverage the existing runtime (``libstdc++`` and ``libgcc``),
-you have to take a look on `libc++ <https://libcxx.llvm.org/>`_ and
+you have to take a look at `libc++ <https://libcxx.llvm.org/>`_ and
 `libunwind <https://clang.llvm.org/docs/Toolchain.html#unwind-library>`_
-to see what have to be done there. For ``libunwind``, you have to do the following
+to see what has to be done there. For ``libunwind``, you have to do the following:
 
 * ``__libunwind_config.h``
 
@@ -821,11 +821,11 @@ to see what have to be done there. For ``libunwind``, you have to do the followi
 
 * ``include/libunwind.h``
 
-  Define enum for the target registers.
+  Define an enum for the target registers.
 
 * ``src/Registers.hpp``
 
-  Define ``Registers`` class for your target, implement setter and getter functions.
+  Define a ``Registers`` class for your target, and implement setter and getter functions.
 
 * ``src/UnwindCursor.hpp``
 
@@ -834,8 +834,8 @@ to see what have to be done there. For ``libunwind``, you have to do the followi
 
 * ``src/UnwindRegistersRestore.S``
 
-  Write an assembly function to restore all your target registers from the memory.
+  Write an assembly function to restore all your target registers from memory.
 
 * ``src/UnwindRegistersSave.S``
 
-  Write an assembly function to save all your target registers on the memory.
+  Write an assembly function to save all your target registers to memory.
diff --git a/llvm/docs/Extensions.rst b/llvm/docs/Extensions.rst
index d8fb87b6998a..6b1d9fe08e91 100644
--- a/llvm/docs/Extensions.rst
+++ b/llvm/docs/Extensions.rst
@@ -410,8 +410,8 @@ two years old). Each function entry starts with a version byte which specifies
 the encoding version to use. This is followed by a feature byte which specifies
 the features specific to this particular entry. The function base address is
 stored as a full address. Other addresses in the entry (block begin and end
-addresses and callsite addresses) are stored in a running-offset fashion, as
-offsets relative to prior addresses.
+addresses and callsite end addresses) are stored in a running-offset fashion,
+as offsets relative to prior addresses.
 
 The following versioning schemes are currently supported (newer versions support
 features of the older versions).
@@ -438,8 +438,8 @@ Example:
    .byte     1                            # BB_1 ID
    .uleb128  .LBB0_1-.LBB_END0_0          # BB_1 offset relative to the end of last block (BB_0).
    .byte     2                            # number of callsites in this block
-   .uleb128  .LBB0_1_CS0-.LBB0_1          # offset of callsite relative to the previous offset (.LBB0_1)
-   .uleb128  .LBB0_1_CS1-.LBB0_1_CS0      # offset of callsite relative to the previous offset (.LBB0_1_CS0)
+   .uleb128  .LBB0_1_CS0-.LBB0_1          # offset of callsite end relative to the previous offset (.LBB0_1)
+   .uleb128  .LBB0_1_CS1-.LBB0_1_CS0      # offset of callsite end relative to the previous offset (.LBB0_1_CS0)
    .uleb128  .LBB_END0_1-.LBB0_1_CS1      # BB_1 size offset (Offset of the block end relative to the previous offset).
    .byte     y                            # BB_1 metadata
 
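Note on the running-offset encoding described above: each stored value is an offset from the address decoded just before it, so a reader recovers absolute addresses by accumulating the offsets. The following is a minimal sketch of that accumulation only, not LLVM's decoder; the function name and the sample values are hypothetical.

```python
def decode_running_offsets(start_address, offsets):
    """Turn running offsets into absolute addresses.

    Each offset is relative to the previously decoded address; the first one
    is relative to start_address. Illustrative only; this is not an LLVM API.
    """
    addresses = []
    current = start_address
    for offset in offsets:
        current += offset
        addresses.append(current)
    return addresses


# Hypothetical values: a block starting 0x10 past the previous address, two
# callsite ends 0x4 and 0x8 further on, and the block end 0x2 after that.
example = decode_running_offsets(0x1000, [0x10, 0x4, 0x8, 0x2])
```

Because every entry depends on the one before it, the offsets have to be decoded in order; the benefit is that each offset stays small and encodes compactly as ULEB128.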
diff --git a/llvm/docs/GettingStartedTutorials.rst b/llvm/docs/GettingStartedTutorials.rst
index 55060343ba36..61253e39c34d 100644
--- a/llvm/docs/GettingStartedTutorials.rst
+++ b/llvm/docs/GettingStartedTutorials.rst
@@ -11,6 +11,7 @@ For those new to the LLVM system.
    GettingStarted
    GettingStartedVS
    ProgrammersManual
+   DebuggingLLVM
    tutorial/index
    MyFirstTypoFix
 
@@ -27,6 +28,9 @@ For those new to the LLVM system.
   Introduction to the general layout of the LLVM sourcebase, important classes
   and APIs, and some tips & tricks.
 
+:doc:`DebuggingLLVM`
+  Provides information about how to debug LLVM.
+
 :doc:`Frontend/PerformanceTips`
    A collection of tips for frontend authors on how to generate IR
    which LLVM is able to effectively optimize.
diff --git a/llvm/docs/GitHub.rst b/llvm/docs/GitHub.rst
index f43792be6f93..eae693b1d06c 100644
--- a/llvm/docs/GitHub.rst
+++ b/llvm/docs/GitHub.rst
@@ -449,9 +449,9 @@ interface.
 
 Here is an example of making a PR using git and the GitHub web interface:
 
-First follow the instructions to [fork the repository](https://docs.github.com/en/get-started/quickstart/fork-a-repo?tool=webui#forking-a-repository).
+First follow the instructions to `fork the repository <https://docs.github.com/en/get-started/quickstart/fork-a-repo?tool=webui#forking-a-repository>`_.
 
-Next follow the instructions to [clone your forked repository](https://docs.github.com/en/get-started/quickstart/fork-a-repo?tool=webui#cloning-your-forked-repository).
+Next follow the instructions to `clone your forked repository <https://docs.github.com/en/get-started/quickstart/fork-a-repo?tool=webui#cloning-your-forked-repository>`_.
 
 Once you've cloned your forked repository,
 
diff --git a/llvm/docs/GlobalISel/GenericOpcode.rst b/llvm/docs/GlobalISel/GenericOpcode.rst
index eefd76de9c33..b05532746673 100644
--- a/llvm/docs/GlobalISel/GenericOpcode.rst
+++ b/llvm/docs/GlobalISel/GenericOpcode.rst
@@ -1150,6 +1150,15 @@ An alignment value of `0` or `1` means no specific alignment.
 
   %8:_(p0) = G_DYN_STACKALLOC %7(s64), 32
 
+G_FREEZE
+^^^^^^^^
+
+G_FREEZE is used to stop propagation of undef and poison values.
+
+.. code-block:: none
+
+  %1:_(s32) = G_FREEZE %0(s32)
+
 Optimization Hints
 ------------------
 
diff --git a/llvm/docs/InstCombineContributorGuide.md b/llvm/docs/InstCombineContributorGuide.md
index cee0a7ce446a..12567fc36f1d 100644
--- a/llvm/docs/InstCombineContributorGuide.md
+++ b/llvm/docs/InstCombineContributorGuide.md
@@ -96,22 +96,9 @@ may be omitted.
 If the transform involves commutative operations, add tests with commuted
 (swapped) operands.
 
-Make sure that the operand order stays intact in the CHECK lines of your
-pre-commited tests. You should not see something like this:
-
-```llvm
-; CHECK-NEXT: [[OR:%.*]] = or i8 [[X]], [[Y]]
-; ...
-%or = or i8 %y, %x
-```
-
-If this happens, you may need to change one of the operands to have higher
-complexity (include the "thwart" comment in that case):
-
-```llvm
-%y2 = mul i8 %y, %y ; thwart complexity-based canonicalization
-%or = or i8 %y, %x
-```
+As an exception, it is not necessary to test commutation if one of the operands
+is a constant: in this case, the constant operand is always canonicalized to
+the right.
 
 ### Add vector tests
 
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 75f1c9b180be..45ae2327323d 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -415,6 +415,8 @@ added in the future:
 
     - On RISC-V the callee preserves x5-x31 except x6, x7 and x28 registers.
 
+    - On LoongArch the callee preserves r4-r31 except r12-r15 and r20-r21 registers.
+
     The idea behind this convention is to support calls to runtime functions
     that have a hot path and a cold path. The hot path is usually a small piece
     of code that doesn't use many registers. The cold path might need to call out to
@@ -665,7 +667,7 @@ representation; that is, the integral representation may be target dependent or
 unstable (not backed by a fixed integer).
 
 ``inttoptr`` and ``ptrtoint`` instructions have the same semantics as for
-integral (i.e. normal) pointers in that they convert integers to and from
+integral (i.e., normal) pointers in that they convert integers to and from
 corresponding pointer types, but there are additional implications to be
 aware of.  Because the bit-representation of a non-integral pointer may
 not be stable, two identical casts of the same operand may or may not
@@ -729,7 +731,7 @@ optimizations based on the 'constantness' are valid for the translation
 units that do not include the definition.
 
 As SSA values, global variables define pointer values that are in scope
-for (i.e. they dominate) all basic blocks in the program. Global variables
+for (i.e., they dominate) all basic blocks in the program. Global variables
 always define a pointer to their "content" type because they describe a
 region of memory, and all :ref:`allocated object<allocatedobjects>` in LLVM are
 accessed through pointers.
@@ -932,7 +934,7 @@ would be used implicitly.
 
 The first basic block in a function is special in two ways: it is
 immediately executed on entrance to the function, and it is not allowed
-to have predecessor basic blocks (i.e. there can not be any branches to
+to have predecessor basic blocks (i.e., there cannot be any branches to
 the entry block of a function). Because the block can have no
 predecessors, it also cannot have any :ref:`PHI nodes <i_phi>`.
 
@@ -1485,7 +1487,7 @@ Currently, only the following parameter attributes are defined:
     a pointer is exactly one of ``dereferenceable(<n>)`` or ``null``,
     and in other address spaces ``dereferenceable_or_null(<n>)``
     implies that a pointer is at least one of ``dereferenceable(<n>)``
-    or ``null`` (i.e. it may be both ``null`` and
+    or ``null`` (i.e., it may be both ``null`` and
     ``dereferenceable(<n>)``). This attribute may only be applied to
     pointer typed parameters.
 
@@ -2346,7 +2348,7 @@ For example:
        fully changed via an atomic compare-and-swap instruction.
        While the first requirement can be satisfied by inserting large
        enough NOP, LLVM can and will try to re-purpose an existing
-       instruction (i.e. one that would have to be emitted anyway) as
+       instruction (i.e., one that would have to be emitted anyway) as
        the patchable instruction larger than a short jump.
 
        ``"prologue-short-redirect"`` is currently only supported on
@@ -3259,7 +3261,7 @@ the preceding ``:`` should also be omitted and ``<pref>`` will be equal to
 
 Unless explicitly stated otherwise, every alignment specification is provided in
 bits and must be in the range [1,2^16). The value must be a power of two times
-the width of a byte (i.e. ``align = 8 * 2^N``).
+the width of a byte (i.e., ``align = 8 * 2^N``).
 
 When constructing the data layout for a given target, LLVM starts with a
 default set of specifications which are then (possibly) overridden by
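Note on the alignment rule quoted in this hunk: an alignment is given in bits, lies in the range [1, 2^16), and is a power of two times the byte width. A minimal sketch of that check follows, assuming an 8-bit byte as in the ``align = 8 * 2^N`` formula; the helper name is hypothetical and this is not how LLVM itself validates data layouts.

```python
def is_valid_alignment_bits(align_bits):
    """Check the stated rule: in [1, 2^16) and equal to 8 * 2^N for some N >= 0."""
    if not 1 <= align_bits < 2**16:
        return False
    if align_bits % 8 != 0:
        return False
    align_bytes = align_bits // 8
    # A positive integer is a power of two when it has exactly one bit set.
    return (align_bytes & (align_bytes - 1)) == 0
```

Under this reading, 8, 64, and 32768 pass, while 1, 24, and 65536 do not. The "unless explicitly stated otherwise" qualifier in the text means individual specifications may relax the rule.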
@@ -3271,7 +3273,6 @@ specifications are given in this list:
 -  ``p[n]:64:64:64`` - Other address spaces are assumed to be the
    same as the default address space.
 -  ``S0`` - natural stack alignment is unspecified
--  ``i1:8:8`` - i1 is 8-bit (byte) aligned
 -  ``i8:8:8`` - i8 is 8-bit (byte) aligned as mandated
 -  ``i16:16:16`` - i16 is 16-bit aligned
 -  ``i32:32:32`` - i32 is 32-bit aligned
@@ -4478,7 +4479,7 @@ the type size is smaller than the type's store size.
       < vscale x <# elements> x <elementtype> > ; Scalable vector
 
 The number of elements is a constant integer value larger than 0;
-elementtype may be any integer, floating-point, pointer type, or a sized  
+elementtype may be any integer, floating-point, pointer type, or a sized
 target extension type that has the ``CanBeVectorElement`` property. Vectors
 of size zero are not allowed. For scalable vectors, the total number of
 elements is a constant multiple (called vscale) of the specified number
@@ -4679,7 +4680,7 @@ Simple Constants
     '``s0x8000``' gives -32768.
 
     Note that hexadecimal integers are sign extended from the number
-    of active bits, i.e. the bit width minus the number of leading
+    of active bits, i.e., the bit width minus the number of leading
     zeros. So '``s0x0001``' of type '``i16``' will be -1, not 1.
 **Floating-point constants**
     Floating-point constants use standard decimal notation (e.g.
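Note on the sign-extension rule for ``s0x`` constants quoted in this hunk: the value is sign extended from its highest set bit, not from the width of the type. A minimal model of that rule as stated in the text follows; the function name is hypothetical and this is not LLVM's parser.

```python
def signed_hex_value(hex_digits):
    """Interpret hex digits as sign extended from the number of active bits."""
    value = int(hex_digits, 16)
    active_bits = value.bit_length()  # bit width minus leading zeros
    if active_bits == 0:
        return 0
    # The highest active bit is set by definition, so it acts as the sign bit.
    return value - (1 << active_bits)
```

With this model, ``signed_hex_value("8000")`` matches the documented -32768, and ``signed_hex_value("0001")`` matches the documented -1.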
@@ -5551,9 +5552,9 @@ AArch64:
 
 - ``z``: An immediate integer 0. Outputs ``WZR`` or ``XZR``, as appropriate.
 - ``I``: An immediate integer valid for an ``ADD`` or ``SUB`` instruction,
-  i.e. 0 to 4095 with optional shift by 12.
+  i.e., 0 to 4095 with optional shift by 12.
 - ``J``: An immediate integer that, when negated, is valid for an ``ADD`` or
-  ``SUB`` instruction, i.e. -1 to -4095 with optional left shift by 12.
+  ``SUB`` instruction, i.e., -1 to -4095 with optional left shift by 12.
 - ``K``: An immediate integer that is valid for the 'bitmask immediate 32' of a
   logical instruction like ``AND``, ``EOR``, or ``ORR`` with a 32-bit register.
 - ``L``: An immediate integer that is valid for the 'bitmask immediate 64' of a
@@ -6890,7 +6891,7 @@ label identifier. The ``file:`` field is the :ref:`DIFile` the label is
 present in. The ``line:`` and ``column:`` field are the source line and column
 within the file where the label is declared.
 
-Furthermore, a label can be marked as artificial, i.e. compiler-generated,
+Furthermore, a label can be marked as artificial, i.e., compiler-generated,
 using ``isArtificial:``. Such artificial labels are generated, e.g., by
 the ``CoroSplit`` pass. In addition, the ``CoroSplit`` pass also uses the
 ``coroSuspendIdx:`` field to identify the coroutine suspend points.
@@ -7804,7 +7805,7 @@ identification metadata.
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 This metadata defines which attributes extracted loops with no cyclic
-dependencies will have (i.e. can be vectorized). See
+dependencies will have (i.e., can be vectorized). See
 :ref:`Transformation Metadata <transformation-metadata>` for details.
 
 '``llvm.loop.distribute.followup_sequential``' Metadata
@@ -7832,7 +7833,7 @@ loop distribution pass. See
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 If a loop was successfully processed by the loop distribution pass,
-this metadata is added (i.e. has been distributed).  See
+this metadata is added (i.e., has been distributed).  See
 :ref:`Transformation Metadata <transformation-metadata>` for details.
 
 '``llvm.licm.disable``' Metadata
@@ -8349,6 +8350,44 @@ spaces. The interpretation of the address space values is target specific.
 The behavior is undefined if the runtime memory address does
 resolve to an object defined in one of the indicated address spaces.
 
+'``mmra``' Metadata
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The ``mmra`` metadata represents target-defined properties on instructions that
+can be used to selectively relax constraints placed by the memory model.
+
+Refer to :doc:`MemoryModelRelaxationAnnotations` for more information on how this metadata
+affects the memory model of a given target.
+
+It is attached to memory instructions such as:
+:ref:`atomicrmw <i_atomicrmw>`, :ref:`cmpxchg <i_cmpxchg>`, :ref:`load <i_load>`,
+:ref:`store <i_store>`, :ref:`fence <i_fence>` and
+:ref:`call <i_call>` instructions that read or write memory.
+
+The metadata is structured as pairs of strings: a prefix and a suffix that together form an MMRA "tag".
+The ``!mmra`` operand can either point to a pair of metadata strings, or a tuple containing
+multiple pairs of metadata strings.
+
+Example:
+
+.. code-block:: llvm
+
+    ; Simple pair of strings used directly:
+    %rmw.valid = atomicrmw and ptr %ptr, i64 %value seq_cst, !mmra !0
+
+    ; Using multiple pairs of strings using a metadata tuple:
+    %rmw.valid = atomicrmw and ptr %ptr, i64 %value seq_cst, !mmra !2
+
+    !0 = !{!"amdgpu-synchronize-as", !"global"}
+    !1 = !{!"amdgpu-synchronize-as", !"private"}
+    !2 = !{!0, !1}
+
+'``nofree``' Metadata
+^^^^^^^^^^^^^^^^^^^^^
+
+The ``nofree`` metadata indicates that the memory pointed to by the pointer will
+not be freed after the attached instruction.
+
 
 Module Flags Metadata
 =====================
@@ -8684,7 +8723,7 @@ input file).
 Eventually, the summary will be parsed into a ModuleSummaryIndex object under
 the same conditions where summary index is currently built from bitcode.
 Specifically, tools that test the Thin Link portion of a ThinLTO compile
-(i.e. llvm-lto and llvm-lto2), or when parsing a combined index
+(i.e., llvm-lto and llvm-lto2), or when parsing a combined index
 for a distributed ThinLTO backend via clang's "``-fthinlto-index=<>``" flag
 (this part is not yet implemented, use llvm-as to create a bitcode object
 before feeding into thin link tools for now).
@@ -9111,7 +9150,7 @@ The '``llvm.global_ctors``' Global Variable
 The ``@llvm.global_ctors`` array contains a list of constructor
 functions, priorities, and an associated global or function.
 The functions referenced by this array will be called in ascending order
-of priority (i.e. lowest first) when the module is loaded. The order of
+of priority (i.e., lowest first) when the module is loaded. The order of
 functions with the same priority is not defined.
 
 If the third field is non-null, and points to a global variable
@@ -9132,7 +9171,7 @@ The '``llvm.global_dtors``' Global Variable
 The ``@llvm.global_dtors`` array contains a list of destructor
 functions, priorities, and an associated global or function.
 The functions referenced by this array will be called in descending
-order of priority (i.e. highest first) when the module is unloaded. The
+order of priority (i.e., highest first) when the module is unloaded. The
 order of functions with the same priority is not defined.
 
 If the third field is non-null, and points to a global variable
@@ -11177,7 +11216,7 @@ Arguments:
 
 The argument to the ``load`` instruction specifies the memory address from which
 to load. The type specified must be a :ref:`first class <t_firstclass>` type of
-known size (i.e. not containing an :ref:`opaque structural type <t_opaque>`). If
+known size (i.e., not containing an :ref:`opaque structural type <t_opaque>`). If
 the ``load`` is marked as ``volatile``, then the optimizer is not allowed to
 modify the number or order of execution of this ``load`` with other
 :ref:`volatile operations <volatile>`.
@@ -11320,7 +11359,7 @@ pointer to the :ref:`first class <t_firstclass>` type of the ``<value>``
 operand. If the ``store`` is marked as ``volatile``, then the optimizer is not
 allowed to modify the number or order of execution of this ``store`` with other
 :ref:`volatile operations <volatile>`.  Only values of :ref:`first class
-<t_firstclass>` types of known size (i.e. not containing an :ref:`opaque
+<t_firstclass>` types of known size (i.e., not containing an :ref:`opaque
 structural type <t_opaque>`) can be stored.
 
 If the ``store`` is marked as ``atomic``, it takes an extra :ref:`ordering
@@ -12461,7 +12500,7 @@ Syntax:
 
 ::
 
-      <result> = inttoptr <ty> <value> to <ty2>[, !dereferenceable !<deref_bytes_node>][, !dereferenceable_or_null !<deref_bytes_node>]             ; yields ty2
+      <result> = inttoptr <ty> <value> to <ty2>[, !dereferenceable !<deref_bytes_node>][, !dereferenceable_or_null !<deref_bytes_node>][, !nofree !<empty_node>]            ; yields ty2
 
 Overview:
 """""""""
@@ -12486,6 +12525,11 @@ metadata name ``<deref_bytes_node>`` corresponding to a metadata node with one
 ``i64`` entry.
 See ``dereferenceable_or_null`` metadata.
 
+The optional ``!nofree`` metadata must reference a single metadata name
+``<empty_node>`` corresponding to a metadata node with no entries.
+The existence of the ``!nofree`` metadata on the instruction tells the optimizer
+that the memory pointed to by the pointer will not be freed after this point.
+
 Semantics:
 """"""""""
 
@@ -12609,7 +12653,7 @@ If the source is :ref:`poison <poisonvalues>`, the result is
 If the source is not :ref:`poison <poisonvalues>`, and both source and
 destination are :ref:`integral pointers <nointptrtype>`, and the
 result pointer is dereferenceable, the cast is assumed to be
-reversible (i.e. casting the result back to the original address space
+reversible (i.e., casting the result back to the original address space
 should yield the original bit pattern).
 
 Which address space casts are supported depends on the target. Unsupported
@@ -12869,7 +12913,7 @@ the value arguments to the PHI node. Only labels may be used as the
 label arguments.
 
 There must be no non-phi instructions between the start of a basic block
-and the PHI instructions: i.e. PHI instructions must be first in a basic
+and the PHI instructions: i.e., PHI instructions must be first in a basic
 block.
 
 For the purposes of the SSA form, the use of each incoming value is
@@ -14587,6 +14631,8 @@ Semantics:
       The return value type of :ref:`llvm.get.dynamic.area.offset <int_get_dynamic_area_offset>`
       must match the target's :ref:`alloca address space <alloca_addrspace>` type.
 
+.. _int_prefetch:
+
 '``llvm.prefetch``' Intrinsic
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
@@ -14875,7 +14921,7 @@ Semantics:
 
 This is lowered by contextual profiling. In contextual profiling, functions get,
 from compiler-rt, a pointer to a context object. The context object consists of
-a buffer LLVM can use to perform counter increments (i.e. the lowering of
+a buffer LLVM can use to perform counter increments (i.e., the lowering of
 ``llvm.instrprof.increment[.step]``). The address range following the counter
 buffer, ``<num-counters>`` x ``sizeof(ptr)`` - sized, is expected to contain
 pointers to contexts of functions called from this function ("subcontexts").
@@ -17826,7 +17872,7 @@ Syntax:
 """""""
 
 This is an overloaded intrinsic function. You can use bswap on any
-integer type that is an even number of bytes (i.e. BitWidth % 16 == 0).
+integer type that is an even number of bytes (i.e., BitWidth % 16 == 0).
 
 ::
 
@@ -20701,7 +20747,7 @@ Arguments:
 """"""""""
 
 The first argument is the search vector, the second argument the vector of
-elements we are searching for (i.e. for which we consider a match successful),
+elements we are searching for (i.e., for which we consider a match successful),
 and the third argument is a mask that controls which elements of the first
 argument are active. The first two arguments must be vectors of matching
 integer element types. The first and third arguments and the result type must
@@ -21078,6 +21124,53 @@ Example:
       %c = call i8 @llvm.fptosi.sat.i8.f32(float 999.0)              ; yields i8:  127
       %d = call i8 @llvm.fptosi.sat.i8.f32(float 0xFFF8000000000000) ; yields i8:    0
 
+Floating-Point Conversion Intrinsics
+------------------------------------
+
+This class of intrinsics is designed for floating-point conversions that do
+not fall into other categories, for example, conversions with a specified
+rounding mode or mini-float conversions.
+
+'``llvm.fptrunc.round``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+::
+
+      declare <ty2>
+      @llvm.fptrunc.round(<type> <value>, metadata <rounding mode>)
+
+Overview:
+"""""""""
+
+The '``llvm.fptrunc.round``' intrinsic truncates
+:ref:`floating-point <t_floating>` ``value`` to type ``ty2``
+with a specified rounding mode.
+
+Arguments:
+""""""""""
+
+The '``llvm.fptrunc.round``' intrinsic takes a :ref:`floating-point
+<t_floating>` value to cast and a :ref:`floating-point <t_floating>` type
+to cast it to. The type of the value must be larger in size than the result type.
+
+The second argument specifies the rounding mode as described in the constrained
+intrinsics section.
+For this intrinsic, the "round.dynamic" mode is not supported.
+
+Semantics:
+""""""""""
+
+The '``llvm.fptrunc.round``' intrinsic casts a ``value`` from a larger
+:ref:`floating-point <t_floating>` type to a smaller :ref:`floating-point
+<t_floating>` type.
+This intrinsic is assumed to execute in the default :ref:`floating-point
+environment <floatenv>` *except* for the rounding mode.
+This intrinsic is not supported on all targets. Some targets may not support
+all rounding modes.
+
 Convergence Intrinsics
 ----------------------
 
@@ -23031,7 +23124,7 @@ Semantics:
 The '``llvm.vp.reduce.add``' intrinsic performs the integer ``ADD`` reduction
 (:ref:`llvm.vector.reduce.add <int_vector_reduce_add>`) of the vector argument
 ``val`` on each enabled lane, adding it to the scalar ``start_value``. Disabled
-lanes are treated as containing the neutral value ``0`` (i.e. having no effect
+lanes are treated as containing the neutral value ``0`` (i.e., having no effect
 on the reduction operation). If the vector length is zero, the result is equal
 to ``start_value``.
 
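Note on the neutral-value wording in this hunk: a disabled lane contributes the neutral element of the operation, so it cannot change the result. A minimal reference model of the ``llvm.vp.reduce.add`` description follows, assuming that lanes at or beyond the explicit vector length are disabled and that the integer add wraps at the element width. The function and parameter names are hypothetical.

```python
def vp_reduce_add(start_value, values, mask, vector_length, bit_width=32):
    """Model of a masked integer ADD reduction with a start value."""
    total = start_value
    for lane in range(vector_length):
        # Disabled lanes are treated as the neutral value 0.
        lane_value = values[lane] if mask[lane] else 0
        total += lane_value
    # Wrap to the element width, returning the unsigned bit pattern.
    return total % (1 << bit_width)
```

With a vector length of zero the loop body never runs and the function returns ``start_value`` (modulo the wrap), which matches the documented behavior. The other reductions follow the same shape with a different operator and neutral value (``1`` for ``mul``, all-ones for ``and``).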
@@ -23089,7 +23182,7 @@ The '``llvm.vp.reduce.fadd``' intrinsic performs the floating-point ``ADD``
 reduction (:ref:`llvm.vector.reduce.fadd <int_vector_reduce_fadd>`) of the
 vector argument ``val`` on each enabled lane, adding it to the scalar
 ``start_value``. Disabled lanes are treated as containing the neutral value
-``-0.0`` (i.e. having no effect on the reduction operation). If no lanes are
+``-0.0`` (i.e., having no effect on the reduction operation). If no lanes are
 enabled, the resulting value will be equal to ``start_value``.
 
 To ignore the start value, the neutral value can be used.
@@ -23147,7 +23240,7 @@ Semantics:
 The '``llvm.vp.reduce.mul``' intrinsic performs the integer ``MUL`` reduction
 (:ref:`llvm.vector.reduce.mul <int_vector_reduce_mul>`) of the vector argument ``val``
 on each enabled lane, multiplying it by the scalar ``start_value``. Disabled
-lanes are treated as containing the neutral value ``1`` (i.e. having no effect
+lanes are treated as containing the neutral value ``1`` (i.e., having no effect
 on the reduction operation). If the vector length is zero, the result is the
 start value.
 
@@ -23205,7 +23298,7 @@ The '``llvm.vp.reduce.fmul``' intrinsic performs the floating-point ``MUL``
 reduction (:ref:`llvm.vector.reduce.fmul <int_vector_reduce_fmul>`) of the
 vector argument ``val`` on each enabled lane, multiplying it by the scalar
 `start_value``. Disabled lanes are treated as containing the neutral value
-``1.0`` (i.e. having no effect on the reduction operation). If no lanes are
+``1.0`` (i.e., having no effect on the reduction operation). If no lanes are
 enabled, the resulting value will be equal to the starting value.
 
 To ignore the start value, the neutral value can be used.
@@ -23264,7 +23357,7 @@ The '``llvm.vp.reduce.and``' intrinsic performs the integer ``AND`` reduction
 (:ref:`llvm.vector.reduce.and <int_vector_reduce_and>`) of the vector argument
 ``val`` on each enabled lane, performing an '``and``' of that with with the
 scalar ``start_value``. Disabled lanes are treated as containing the neutral
-value ``UINT_MAX``, or ``-1`` (i.e. having no effect on the reduction
+value ``UINT_MAX``, or ``-1`` (i.e., having no effect on the reduction
 operation). If the vector length is zero, the result is the start value.
 
 To ignore the start value, the neutral value can be used.
@@ -23321,7 +23414,7 @@ The '``llvm.vp.reduce.or``' intrinsic performs the integer ``OR`` reduction
 (:ref:`llvm.vector.reduce.or <int_vector_reduce_or>`) of the vector argument
 ``val`` on each enabled lane, performing an '``or``' of that with the scalar
 ``start_value``. Disabled lanes are treated as containing the neutral value
-``0`` (i.e. having no effect on the reduction operation). If the vector length
+``0`` (i.e., having no effect on the reduction operation). If the vector length
 is zero, the result is the start value.
 
 To ignore the start value, the neutral value can be used.
@@ -23377,7 +23470,7 @@ The '``llvm.vp.reduce.xor``' intrinsic performs the integer ``XOR`` reduction
 (:ref:`llvm.vector.reduce.xor <int_vector_reduce_xor>`) of the vector argument
 ``val`` on each enabled lane, performing an '``xor``' of that with the scalar
 ``start_value``. Disabled lanes are treated as containing the neutral value
-``0`` (i.e. having no effect on the reduction operation). If the vector length
+``0`` (i.e., having no effect on the reduction operation). If the vector length
 is zero, the result is the start value.
 
 To ignore the start value, the neutral value can be used.
@@ -23434,7 +23527,7 @@ The '``llvm.vp.reduce.smax``' intrinsic performs the signed-integer ``MAX``
 reduction (:ref:`llvm.vector.reduce.smax <int_vector_reduce_smax>`) of the
 vector argument ``val`` on each enabled lane, and taking the maximum of that and
 the scalar ``start_value``. Disabled lanes are treated as containing the
-neutral value ``INT_MIN`` (i.e. having no effect on the reduction operation).
+neutral value ``INT_MIN`` (i.e., having no effect on the reduction operation).
 If the vector length is zero, the result is the start value.
 
 To ignore the start value, the neutral value can be used.
@@ -23491,7 +23584,7 @@ The '``llvm.vp.reduce.smin``' intrinsic performs the signed-integer ``MIN``
 reduction (:ref:`llvm.vector.reduce.smin <int_vector_reduce_smin>`) of the
 vector argument ``val`` on each enabled lane, and taking the minimum of that and
 the scalar ``start_value``. Disabled lanes are treated as containing the
-neutral value ``INT_MAX`` (i.e. having no effect on the reduction operation).
+neutral value ``INT_MAX`` (i.e., having no effect on the reduction operation).
 If the vector length is zero, the result is the start value.
 
 To ignore the start value, the neutral value can be used.
@@ -23548,7 +23641,7 @@ The '``llvm.vp.reduce.umax``' intrinsic performs the unsigned-integer ``MAX``
 reduction (:ref:`llvm.vector.reduce.umax <int_vector_reduce_umax>`) of the
 vector argument ``val`` on each enabled lane, and taking the maximum of that and
 the scalar ``start_value``. Disabled lanes are treated as containing the
-neutral value ``0`` (i.e. having no effect on the reduction operation). If the
+neutral value ``0`` (i.e., having no effect on the reduction operation). If the
 vector length is zero, the result is the start value.
 
 To ignore the start value, the neutral value can be used.
@@ -23605,7 +23698,7 @@ The '``llvm.vp.reduce.umin``' intrinsic performs the unsigned-integer ``MIN``
 reduction (:ref:`llvm.vector.reduce.umin <int_vector_reduce_umin>`) of the
 vector argument ``val`` on each enabled lane, taking the minimum of that and the
 scalar ``start_value``. Disabled lanes are treated as containing the neutral
-value ``UINT_MAX``, or ``-1`` (i.e. having no effect on the reduction
+value ``UINT_MAX``, or ``-1`` (i.e., having no effect on the reduction
 operation). If the vector length is zero, the result is the start value.
 
 To ignore the start value, the neutral value can be used.
@@ -23663,7 +23756,7 @@ The '``llvm.vp.reduce.fmax``' intrinsic performs the floating-point ``MAX``
 reduction (:ref:`llvm.vector.reduce.fmax <int_vector_reduce_fmax>`) of the
 vector argument ``val`` on each enabled lane, taking the maximum of that and the
 scalar ``start_value``. Disabled lanes are treated as containing the neutral
-value (i.e. having no effect on the reduction operation). If the vector length
+value (i.e., having no effect on the reduction operation). If the vector length
 is zero, the result is the start value.
 
 The neutral value is dependent on the :ref:`fast-math flags <fastmath>`. If no
@@ -23730,7 +23823,7 @@ The '``llvm.vp.reduce.fmin``' intrinsic performs the floating-point ``MIN``
 reduction (:ref:`llvm.vector.reduce.fmin <int_vector_reduce_fmin>`) of the
 vector argument ``val`` on each enabled lane, taking the minimum of that and the
 scalar ``start_value``. Disabled lanes are treated as containing the neutral
-value (i.e. having no effect on the reduction operation). If the vector length
+value (i.e., having no effect on the reduction operation). If the vector length
 is zero, the result is the start value.
 
 The neutral value is dependent on the :ref:`fast-math flags <fastmath>`. If no
@@ -23797,7 +23890,7 @@ The '``llvm.vp.reduce.fmaximum``' intrinsic performs the floating-point ``MAX``
 reduction (:ref:`llvm.vector.reduce.fmaximum <int_vector_reduce_fmaximum>`) of
 the vector argument ``val`` on each enabled lane, taking the maximum of that and
 the scalar ``start_value``. Disabled lanes are treated as containing the
-neutral value (i.e. having no effect on the reduction operation). If the vector
+neutral value (i.e., having no effect on the reduction operation). If the vector
 length is zero, the result is the start value.
 
 The neutral value is dependent on the :ref:`fast-math flags <fastmath>`. If no
@@ -23867,7 +23960,7 @@ The '``llvm.vp.reduce.fminimum``' intrinsic performs the floating-point ``MIN``
 reduction (:ref:`llvm.vector.reduce.fminimum <int_vector_reduce_fminimum>`) of
 the vector argument ``val`` on each enabled lane, taking the minimum of that and
 the scalar ``start_value``. Disabled lanes are treated as containing the neutral
-value (i.e. having no effect on the reduction operation). If the vector length
+value (i.e., having no effect on the reduction operation). If the vector length
 is zero, the result is the start value.
 
 The neutral value is dependent on the :ref:`fast-math flags <fastmath>`. If no
@@ -23942,8 +24035,7 @@ indexed by ``i``,  and ``%base``, ``%n`` are the two arguments to
 ``llvm.get.active.lane.mask.*``, ``%icmp`` is an integer compare and ``ult``
 the unsigned less-than comparison operator.  Overflow cannot occur in
 ``(%base + i)`` and its comparison against ``%n`` as it is performed in integer
-numbers and not in machine numbers.  If ``%n`` is ``0``, then the result is a
-poison value. The above is equivalent to:
+numbers and not in machine numbers.  The above is equivalent to:
 
 ::
 
@@ -23974,6 +24066,130 @@ Examples:
       %wide.masked.load = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* %3, i32 4, <4 x i1> %active.lane.mask, <4 x i32> poison)
 
 
+.. _int_loop_dependence_war_mask:
+
+'``llvm.loop.dependence.war.mask.*``' Intrinsics
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+This is an overloaded intrinsic.
+
+::
+
+      declare <4 x i1> @llvm.loop.dependence.war.mask.v4i1(ptr %ptrA, ptr %ptrB, i64 immarg %elementSize)
+      declare <8 x i1> @llvm.loop.dependence.war.mask.v8i1(ptr %ptrA, ptr %ptrB, i64 immarg %elementSize)
+      declare <16 x i1> @llvm.loop.dependence.war.mask.v16i1(ptr %ptrA, ptr %ptrB, i64 immarg %elementSize)
+      declare <vscale x 16 x i1> @llvm.loop.dependence.war.mask.nxv16i1(ptr %ptrA, ptr %ptrB, i64 immarg %elementSize)
+
+
+Overview:
+"""""""""
+
+Given a vector load from ``%ptrA`` followed by a vector store to ``%ptrB``, this
+intrinsic generates a mask where an active lane indicates that the
+write-after-read sequence can be performed safely for that lane, without the
+danger of a write-after-read hazard occurring.
+
+A write-after-read hazard occurs when a write-after-read sequence for a given
+lane in a vector ends up being executed as a read-after-write sequence due to
+the aliasing of pointers.
+
+Arguments:
+""""""""""
+
+The first two arguments are pointers and the last argument is an immediate.
+The result is a vector with the i1 element type.
+
+Semantics:
+""""""""""
+
+``%elementSize`` is the size of the accessed elements in bytes.
+The intrinsic returns ``poison`` if the distance between ``%ptrA`` and ``%ptrB``
+is smaller than ``VF * %elementSize`` and either ``%ptrA + VF * %elementSize``
+or ``%ptrB + VF * %elementSize`` wraps.
+An element of the result mask is active when loading from ``%ptrA`` then storing to
+``%ptrB`` is safe and doesn't result in a write-after-read hazard, meaning that:
+
+* (ptrB - ptrA) <= 0 (guarantees that all lanes are loaded before any stores), or
+* (ptrB - ptrA) >= elementSize * lane (guarantees that this lane is loaded
+  before the store to the same address)
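+
+As an illustration only, the two conditions above can be transcribed into a
+scalar reference model. The sketch below is not LLVM code: the function name
+``warMaskModel`` is hypothetical, pointers are modeled as plain 64-bit
+addresses, ``VF`` is assumed to be the number of lanes in the result mask, and
+the ``poison`` (wrapping) case is not modeled.
+
+.. code-block:: c++
+
+      #include <cstdint>
+      #include <vector>
+
+      // Hypothetical reference model of the WAR mask (an assumption for
+      // illustration): a lane is active if either listed condition holds.
+      std::vector<bool> warMaskModel(uint64_t ptrA, uint64_t ptrB,
+                                     uint64_t elementSize, unsigned VF) {
+        // Signed byte distance (ptrB - ptrA).
+        int64_t diff = static_cast<int64_t>(ptrB - ptrA);
+        std::vector<bool> mask(VF);
+        for (unsigned lane = 0; lane < VF; ++lane)
+          mask[lane] = diff <= 0 ||
+                       diff >= static_cast<int64_t>(elementSize * lane);
+        return mask;
+      }
+
+Under this model, a 4-lane mask with 4-byte elements and ``%ptrB`` located 8
+bytes after ``%ptrA`` has its first three lanes active and its last lane
+inactive.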
+
+Examples:
+"""""""""
+
+.. code-block:: llvm
+
+      %loop.dependence.mask = call <4 x i1> @llvm.loop.dependence.war.mask.v4i1(ptr %ptrA, ptr %ptrB, i64 4)
+      %vecA = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(ptr %ptrA, i32 4, <4 x i1> %loop.dependence.mask, <4 x i32> poison)
+      [...]
+      call void @llvm.masked.store.v4i32.p0v4i32(<4 x i32> %vecA, ptr %ptrB, i32 4, <4 x i1> %loop.dependence.mask)
+
+.. _int_loop_dependence_raw_mask:
+
+'``llvm.loop.dependence.raw.mask.*``' Intrinsics
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+This is an overloaded intrinsic.
+
+::
+
+      declare <4 x i1> @llvm.loop.dependence.raw.mask.v4i1(ptr %ptrA, ptr %ptrB, i64 immarg %elementSize)
+      declare <8 x i1> @llvm.loop.dependence.raw.mask.v8i1(ptr %ptrA, ptr %ptrB, i64 immarg %elementSize)
+      declare <16 x i1> @llvm.loop.dependence.raw.mask.v16i1(ptr %ptrA, ptr %ptrB, i64 immarg %elementSize)
+      declare <vscale x 16 x i1> @llvm.loop.dependence.raw.mask.nxv16i1(ptr %ptrA, ptr %ptrB, i64 immarg %elementSize)
+
+
+Overview:
+"""""""""
+
+Given a vector store to ``%ptrA`` followed by a vector load from ``%ptrB``, this
+intrinsic generates a mask where an active lane indicates that the
+read-after-write sequence can be performed safely for that lane, without a
+read-after-write hazard or a store-to-load forwarding hazard being introduced.
+
+A read-after-write hazard occurs when a read-after-write sequence for a given
+lane in a vector ends up being executed as a write-after-read sequence due to
+the aliasing of pointers.
+
+A store-to-load forwarding hazard occurs when a vector store writes to an
+address that partially overlaps with the address of a subsequent vector load,
+meaning that the vector load can't be performed until the vector store is
+complete.
+
+Arguments:
+""""""""""
+
+The first two arguments are pointers and the last argument is an immediate.
+The result is a vector with the i1 element type.
+
+Semantics:
+""""""""""
+
+``%elementSize`` is the size of the accessed elements in bytes.
+The intrinsic returns ``poison`` if the distance between ``%ptrA`` and ``%ptrB``
+is smaller than ``VF * %elementSize`` and either ``%ptrA + VF * %elementSize``
+or ``%ptrB + VF * %elementSize`` wraps.
+An element of the result mask is active when storing to ``%ptrA`` then loading from
+``%ptrB`` is safe and doesn't result in aliasing, meaning that:
+
+* abs(ptrB - ptrA) >= elementSize * lane (guarantees that the store of this lane
+  occurs before loading from this address), or
+* ptrA == ptrB (doesn't introduce any new hazards that weren't in the scalar
+  code)
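+
+As an illustration only, the two conditions above can be transcribed into a
+scalar reference model. The sketch below is not LLVM code: the function name
+``rawMaskModel`` is hypothetical, pointers are modeled as plain 64-bit
+addresses, ``VF`` is assumed to be the number of lanes in the result mask, and
+the ``poison`` (wrapping) case is not modeled.
+
+.. code-block:: c++
+
+      #include <cstdint>
+      #include <vector>
+
+      // Hypothetical reference model of the RAW mask (an assumption for
+      // illustration): a lane is active if either listed condition holds.
+      std::vector<bool> rawMaskModel(uint64_t ptrA, uint64_t ptrB,
+                                     uint64_t elementSize, unsigned VF) {
+        // Absolute byte distance between the two pointers.
+        uint64_t dist = ptrA > ptrB ? ptrA - ptrB : ptrB - ptrA;
+        std::vector<bool> mask(VF);
+        for (unsigned lane = 0; lane < VF; ++lane)
+          mask[lane] = ptrA == ptrB || dist >= elementSize * lane;
+        return mask;
+      }
+
+Because the distance is taken as an absolute value, this model gives the same
+mask regardless of which pointer is larger, and identical pointers yield an
+all-active mask.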
+
+Examples:
+"""""""""
+
+.. code-block:: llvm
+
+      %loop.dependence.mask = call <4 x i1> @llvm.loop.dependence.raw.mask.v4i1(ptr %ptrA, ptr %ptrB, i64 4)
+      call void @llvm.masked.store.v4i32.p0v4i32(<4 x i32> %vecA, ptr %ptrA, i32 4, <4 x i1> %loop.dependence.mask)
+      [...]
+      %vecB = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(ptr %ptrB, i32 4, <4 x i1> %loop.dependence.mask, <4 x i32> poison)
+
 .. _int_experimental_vp_splice:
 
 '``llvm.experimental.vp.splice``' Intrinsic
@@ -25976,7 +26192,7 @@ Semantics:
 The '``llvm.vp.cttz.elts``' intrinsic counts the trailing (least
 significant / lowest-numbered) zero elements in the first argument on each
 enabled lane. If the first argument is all zero and the second argument is true,
-the result is poison. Otherwise, it returns the explicit vector length (i.e. the
+the result is poison. Otherwise, it returns the explicit vector length (i.e., the
 fourth argument).
 
 .. _int_vp_sadd_sat:
@@ -29555,9 +29771,15 @@ Arguments:
 """"""""""
 
 The ``llvm.objectsize`` intrinsic takes four arguments. The first argument is a
-pointer to or into the ``object``. The second argument determines whether
-``llvm.objectsize`` returns 0 (if true) or -1 (if false) when the object size is
-unknown. The third argument controls how ``llvm.objectsize`` acts when ``null``
+pointer to or into the ``object``.
+
+The second argument determines whether ``llvm.objectsize`` returns the minimum
+(if true) or maximum (if false) object size. The minimum size may be any size
+smaller than or equal to the actual object size (including 0 if unknown). The
+maximum size may be any size greater than or equal to the actual object size
+(including -1 if unknown).
+
+The third argument controls how ``llvm.objectsize`` acts when ``null``
 in address space 0 is used as its pointer argument. If it's ``false``,
 ``llvm.objectsize`` reports 0 bytes available when given ``null``. Otherwise, if
 the ``null`` is in a non-zero address space or if ``true`` is given for the
@@ -30000,7 +30222,7 @@ In words, ``@llvm.experimental.guard`` executes the attached
 ``"deopt"`` continuation if (but **not** only if) its first argument
 is ``false``.  Since the optimizer is allowed to replace the ``undef``
 with an arbitrary value, it can optimize guard to fail "spuriously",
-i.e. without the original condition being false (hence the "not only
+i.e., without the original condition being false (hence the "not only
 if"); and this allows for "check widening" type optimizations.
 
 ``@llvm.experimental.guard`` cannot be invoked.
@@ -31121,42 +31343,3 @@ Semantics:
 The '``llvm.preserve.struct.access.index``' intrinsic produces the same result
 as a getelementptr with base ``base`` and access operands ``{0, gep_index}``.
 
-'``llvm.fptrunc.round``' Intrinsic
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Syntax:
-"""""""
-
-::
-
-      declare <ty2>
-      @llvm.fptrunc.round(<type> <value>, metadata <rounding mode>)
-
-Overview:
-"""""""""
-
-The '``llvm.fptrunc.round``' intrinsic truncates
-:ref:`floating-point <t_floating>` ``value`` to type ``ty2``
-with a specified rounding mode.
-
-Arguments:
-""""""""""
-
-The '``llvm.fptrunc.round``' intrinsic takes a :ref:`floating-point
-<t_floating>` value to cast and a :ref:`floating-point <t_floating>` type
-to cast it to. This argument must be larger in size than the result.
-
-The second argument specifies the rounding mode as described in the constrained
-intrinsics section.
-For this intrinsic, the "round.dynamic" mode is not supported.
-
-Semantics:
-""""""""""
-
-The '``llvm.fptrunc.round``' intrinsic casts a ``value`` from a larger
-:ref:`floating-point <t_floating>` type to a smaller :ref:`floating-point
-<t_floating>` type.
-This intrinsic is assumed to execute in the default :ref:`floating-point
-environment <floatenv>` *except* for the rounding mode.
-This intrinsic is not supported on all targets. Some targets may not support
-all rounding modes.
diff --git a/llvm/docs/NVPTXUsage.rst b/llvm/docs/NVPTXUsage.rst
index 629bf2ea5afb..4c8c605edfdd 100644
--- a/llvm/docs/NVPTXUsage.rst
+++ b/llvm/docs/NVPTXUsage.rst
@@ -57,6 +57,19 @@ not.
 
 When compiled, the PTX kernel functions are callable by host-side code.
 
+
+Parameter Attributes
+--------------------
+
+``"nvvm.grid_constant"``
+    This attribute may be attached to a ``byval`` parameter of a kernel function
+    to indicate that the parameter should be lowered as a direct reference to
+    the grid-constant memory of the parameter, as opposed to a copy of the
+    parameter in local memory. Writing to a grid-constant parameter is
+    undefined behavior. Unlike a normal ``byval`` parameter, the address of a
+    grid-constant parameter is not unique to a given function invocation but
+    instead is shared by all kernels in the grid.
+
 .. _nvptx_fnattrs:
 
 Function Attributes
@@ -2289,9 +2302,9 @@ The Kernel
   ; Intrinsic to read X component of thread ID
   declare i32 @llvm.nvvm.read.ptx.sreg.tid.x() readnone nounwind
 
-  define void @kernel(ptr addrspace(1) %A,
-                      ptr addrspace(1) %B,
-                      ptr addrspace(1) %C) {
+  define ptx_kernel void @kernel(ptr addrspace(1) %A,
+                                 ptr addrspace(1) %B,
+                                 ptr addrspace(1) %C) {
   entry:
     ; What is my ID?
     %id = tail call i32 @llvm.nvvm.read.ptx.sreg.tid.x() readnone nounwind
@@ -2314,9 +2327,6 @@ The Kernel
     ret void
   }
 
-  !nvvm.annotations = !{!0}
-  !0 = !{ptr @kernel, !"kernel", i32 1}
-
 
 We can use the LLVM ``llc`` tool to directly run the NVPTX code generator:
 
@@ -2442,34 +2452,6 @@ and non-generic address spaces.
 See :ref:`address_spaces` and :ref:`nvptx_intrinsics` for more information.
 
 
-Kernel Metadata
-^^^^^^^^^^^^^^^
-
-In PTX, a function can be either a `kernel` function (callable from the host
-program), or a `device` function (callable only from GPU code). You can think
-of `kernel` functions as entry-points in the GPU program. To mark an LLVM IR
-function as a `kernel` function, we make use of special LLVM metadata. The
-NVPTX back-end will look for a named metadata node called
-``nvvm.annotations``. This named metadata must contain a list of metadata that
-describe the IR. For our purposes, we need to declare a metadata node that
-assigns the "kernel" attribute to the LLVM IR function that should be emitted
-as a PTX `kernel` function. These metadata nodes take the form:
-
-.. code-block:: text
-
-  !{<function ref>, metadata !"kernel", i32 1}
-
-For the previous example, we have:
-
-.. code-block:: llvm
-
-  !nvvm.annotations = !{!0}
-  !0 = !{ptr @kernel, !"kernel", i32 1}
-
-Here, we have a single metadata declaration in ``nvvm.annotations``. This
-metadata annotates our ``@kernel`` function with the ``kernel`` attribute.
-
-
 Running the Kernel
 ------------------
 
@@ -2669,9 +2651,9 @@ Libdevice provides an ``__nv_powf`` function that we will use.
   ; libdevice function
   declare float @__nv_powf(float, float)
 
-  define void @kernel(ptr addrspace(1) %A,
-                      ptr addrspace(1) %B,
-                      ptr addrspace(1) %C) {
+  define ptx_kernel void @kernel(ptr addrspace(1) %A,
+                                 ptr addrspace(1) %B,
+                                 ptr addrspace(1) %C) {
   entry:
     ; What is my ID?
     %id = tail call i32 @llvm.nvvm.read.ptx.sreg.tid.x() readnone nounwind
@@ -2694,9 +2676,6 @@ Libdevice provides an ``__nv_powf`` function that we will use.
     ret void
   }
 
-  !nvvm.annotations = !{!0}
-  !0 = !{ptr @kernel, !"kernel", i32 1}
-
 
 To compile this kernel, we perform the following steps:
 
diff --git a/llvm/docs/ProgrammersManual.rst b/llvm/docs/ProgrammersManual.rst
index 1e1e5b3e55b0..0fa91bcfe2d1 100644
--- a/llvm/docs/ProgrammersManual.rst
+++ b/llvm/docs/ProgrammersManual.rst
@@ -1143,8 +1143,8 @@ be passed by value.
 
 .. _DEBUG:
 
-The ``LLVM_DEBUG()`` macro and ``-debug`` option
-------------------------------------------------
+The ``LDBG`` and ``LLVM_DEBUG()`` macros and ``-debug`` option
+--------------------------------------------------------------
 
 Often when working on your pass you will put a bunch of debugging printouts and
 other code into your pass.  After you get it working, you want to remove it, but
@@ -1154,36 +1154,65 @@ Naturally, because of this, you don't want to delete the debug printouts, but
 you don't want them to always be noisy.  A standard compromise is to comment
 them out, allowing you to enable them if you need them in the future.
 
-The ``llvm/Support/Debug.h`` (`doxygen
-<https://llvm.org/doxygen/Debug_8h_source.html>`__) file provides a macro named
-``LLVM_DEBUG()`` that is a much nicer solution to this problem.  Basically, you can
-put arbitrary code into the argument of the ``LLVM_DEBUG`` macro, and it is only
-executed if '``opt``' (or any other tool) is run with the '``-debug``' command
-line argument:
+The ``llvm/Support/DebugLog.h`` file provides a macro named ``LDBG`` that is a
+more convenient way to add debug output to your code. The macro yields a
+``raw_ostream`` to which the debug output is written.
+
+.. code-block:: c++
+
+  LDBG() << "I am here!";
+
+The output is only printed if debug output is enabled.
+``LDBG`` also accepts a ``level`` argument to control the verbosity of the output.
 
 .. code-block:: c++
 
-  LLVM_DEBUG(dbgs() << "I am here!\n");
+  LDBG(2) << "I am here!";
+
+A ``DEBUG_TYPE`` macro should be defined in the file before using ``LDBG()``.
+The file name and line number are automatically added to the output, as well as
+a terminating newline.
 
-Then you can run your pass like this:
+The debug output can be enabled by passing the ``-debug`` command line argument.
 
 .. code-block:: none
 
   $ opt < a.bc > /dev/null -mypass
   <no output>
   $ opt < a.bc > /dev/null -mypass -debug
-  I am here!
+  [my-pass:2] MyPass.cpp:123 I am here!
+
+While ``LDBG()`` is useful to add debug output to your code, there are cases
+where you may need to guard a block of code with a debug check. The
+``llvm/Support/Debug.h`` (`doxygen
+<https://llvm.org/doxygen/Debug_8h_source.html>`__) file provides a macro named
+``LLVM_DEBUG()`` that offers a solution to this problem.  You can put arbitrary
+code into the argument of the ``LLVM_DEBUG`` macro, and it is only executed if
+'``opt``' (or any other tool) is run with the '``-debug``' command
+line argument.
+
+.. code-block:: c++
+
+  LLVM_DEBUG({
+    llvm::ErrorOr<std::unique_ptr<llvm::MemoryBuffer>> logBuffer =
+        llvm::MemoryBuffer::getFile(logFile->first);
+    if (logBuffer && !(*logBuffer)->getBuffer().empty()) {
+      LDBG() << "Output:\n" << (*logBuffer)->getBuffer();
+    }
+  });
 
-Using the ``LLVM_DEBUG()`` macro instead of a home-brewed solution allows you to not
-have to create "yet another" command-line option for the debug output for your
-pass.  Note that ``LLVM_DEBUG()`` macros are disabled for non-asserts builds, so they
-do not cause a performance impact at all (for the same reason, they should also
-not contain side-effects!).
 
-One additional nice thing about the ``LLVM_DEBUG()`` macro is that you can enable or
-disable it directly in gdb.  Just use "``set DebugFlag=0``" or "``set
-DebugFlag=1``" from the gdb if the program is running.  If the program hasn't
-been started yet, you can always just run it with ``-debug``.
+Using these macros instead of a home-brewed solution allows you to not have to
+create "yet another" command-line option for the debug output for your pass.
+Note that ``LDBG()`` and ``LLVM_DEBUG()`` macros are disabled for non-asserts
+builds, so they do not cause a performance impact at all (for the same reason,
+they should also not contain side-effects!).
+
+One additional nice thing about the ``LDBG()`` and ``LLVM_DEBUG()`` macros is
+that you can enable or disable them directly in gdb.  Just use
+"``set DebugFlag=0``" or "``set DebugFlag=1``" from gdb if the program is
+running.  If the program hasn't been started yet, you can always just run it
+with ``-debug``.
 
 .. _DEBUG_TYPE:
 
@@ -1199,52 +1228,74 @@ follows:
 .. code-block:: c++
 
   #define DEBUG_TYPE "foo"
-  LLVM_DEBUG(dbgs() << "'foo' debug type\n");
-  #undef  DEBUG_TYPE
-  #define DEBUG_TYPE "bar"
-  LLVM_DEBUG(dbgs() << "'bar' debug type\n");
-  #undef  DEBUG_TYPE
+  LDBG(2) << "Hello,";
+  // DEBUG_TYPE can be overridden locally, here with "bar"
+  LDBG("bar", 3) << "World!";
+
 
-Then you can run your pass like this:
+A more fine-grained control can be achieved by passing the ``-debug-only``
+command line argument:
 
 .. code-block:: none
 
-  $ opt < a.bc > /dev/null -mypass
-  <no output>
-  $ opt < a.bc > /dev/null -mypass -debug
-  'foo' debug type
-  'bar' debug type
   $ opt < a.bc > /dev/null -mypass -debug-only=foo
-  'foo' debug type
-  $ opt < a.bc > /dev/null -mypass -debug-only=bar
-  'bar' debug type
+  [foo:2] MyPass.cpp:123 Hello,
   $ opt < a.bc > /dev/null -mypass -debug-only=foo,bar
-  'foo' debug type
-  'bar' debug type
-
-Of course, in practice, you should only set ``DEBUG_TYPE`` at the top of a file,
-to specify the debug type for the entire module. Be careful that you only do
-this after including ``Debug.h`` and not around any #include of headers. Also, you
-should use names more meaningful than "foo" and "bar", because there is no
-system in place to ensure that names do not conflict. If two different modules
-use the same string, they will all be turned on when the name is specified.
+  [foo:2] MyPass.cpp:123 Hello,
+  [bar:3] MyPass.cpp:124 World!
+  $ opt < a.bc > /dev/null -mypass -debug-only=bar
+  [bar:3] MyPass.cpp:124 World!
+
+The ``-debug-only`` argument is a comma-separated list of debug types and levels.
+The level is an optional integer setting the maximum debug level to enable:
+
+.. code-block:: none
+
+  $ opt < a.bc > /dev/null -mypass -debug-only=foo:2,bar:2
+  [foo:2] MyPass.cpp:123 Hello,
+  $ opt < a.bc > /dev/null -mypass -debug-only=foo:1,bar:3
+  [bar:3] MyPass.cpp:124 World!
+
+Instead of opting in specific debug types, the ``-debug-only`` option also
+works to filter out debug output for specific debug types, by omitting the
+level (or setting it to 0):
+
+.. code-block:: none
+
+  $ opt < a.bc > /dev/null -mypass -debug-only=foo:
+  [bar:3] MyPass.cpp:124 World!
+  $ opt < a.bc > /dev/null -mypass -debug-only=bar:0,foo:
+
+
+In practice, you should only set ``DEBUG_TYPE`` at the top of a file, to
+specify the debug type for the entire module. Be careful that you only do
+this after you're done including headers (in particular ``Debug.h``/``DebugLog.h``).
+Also, you should use names more meaningful than "foo" and "bar", because there
+is no system in place to ensure that names do not conflict. If two different
+modules use the same string, they will all be turned on when the name is specified.
 This allows, for example, all debug information for instruction scheduling to be
 enabled with ``-debug-only=InstrSched``, even if the source lives in multiple
 files. The name must not include a comma (,) as that is used to separate the
 arguments of the ``-debug-only`` option.
 
-For performance reasons, -debug-only is not available in optimized build
-(``--enable-optimized``) of LLVM.
+For performance reasons, ``-debug-only`` is not available in non-asserts builds
+of LLVM.
 
-The ``DEBUG_WITH_TYPE`` macro is also available for situations where you would
-like to set ``DEBUG_TYPE``, but only for one specific ``DEBUG`` statement.  It
-takes an additional first parameter, which is the type to use.  For example, the
-preceding example could be written as:
+The ``DEBUG_WITH_TYPE`` macro is an alternative to the ``LLVM_DEBUG()`` macro
+for situations where you would like to set ``DEBUG_TYPE``, but only for one
+specific ``LLVM_DEBUG`` statement.  It takes an additional first parameter,
+which is the type to use. The example from the previous section could be
+written as:
 
 .. code-block:: c++
 
-  DEBUG_WITH_TYPE("foo", dbgs() << "'foo' debug type\n");
-  DEBUG_WITH_TYPE("bar", dbgs() << "'bar' debug type\n");
+  DEBUG_WITH_TYPE("special-type", {
+    llvm::ErrorOr<std::unique_ptr<llvm::MemoryBuffer>> logBuffer =
+        llvm::MemoryBuffer::getFile(logFile->first);
+    if (logBuffer && !(*logBuffer)->getBuffer().empty()) {
+      LDBG("special-type") << "Output:\n" << (*logBuffer)->getBuffer();
+    }
+  });
 
 .. _Statistic:
 
@@ -2641,16 +2692,7 @@ with an ``assert``).
 Debugging
 =========
 
-A handful of `GDB pretty printers
-<https://sourceware.org/gdb/onlinedocs/gdb/Pretty-Printing.html>`__ are
-provided for some of the core LLVM libraries. To use them, execute the
-following (or add it to your ``~/.gdbinit``)::
-
-  source /path/to/llvm/src/utils/gdb-scripts/prettyprinters.py
-
-It also might be handy to enable the `print pretty
-<http://ftp.gnu.org/old-gnu/Manuals/gdb/html_node/gdb_57.html>`__ option to
-avoid data structures being printed as a big block of text.
+See :doc:`Debugging LLVM <DebuggingLLVM>`.
 
 .. _common:
 
diff --git a/llvm/docs/QualGroup.rst b/llvm/docs/QualGroup.rst
index 63520cf401d3..61ef4fd2d4b0 100644
--- a/llvm/docs/QualGroup.rst
+++ b/llvm/docs/QualGroup.rst
@@ -1,3 +1,8 @@
+.. CHANGE TRACKER for reference
+.. Purpose: Fixed document location and added Current Topics & Backlog
+.. Author: Carlos Andres Ramirez
+.. Last updated: 2025-09-08 by Carlos Ramirez
+
 ========================
 LLVM Qualification Group
 ========================
@@ -48,15 +53,40 @@ Participation is open to anyone interested. There are several ways to get involv
 
 We welcome contributors from diverse backgrounds, organizations, and experience levels.
 
-Meeting Minutes
-===============
+Current Topics & Backlog
+========================
+
+Our working group is actively engaged in discussions about the project's
+direction and tackling technical challenges. You can find our current 
+discussions, challenges, and the project backlog in the following 
+document.
+
+`Backlog document <https://docs.google.com/document/d/10YZZ72ba09Ck_OiJaP9C4-7DeUiveaIKTE3IkaSKjzA/edit?usp=sharing>`_
+
+This document serves as our central hub for all ongoing topics and will
+be updated regularly to reflect our progress. We welcome your 
+contributions and feedback.
+
+Meeting Materials
+=================
+
+Agendas, meeting notes, and presentation slides for the LLVM Qualification Working Group sync-ups
+are shared to ensure transparency and continuity.
+
+Upcoming and past meeting agendas and meeting minutes are published in a dedicated thread
+on the LLVM Discourse forum: `Meeting Agendas and Minutes <https://discourse.llvm.org/t/llvm-qualification-wg-sync-ups-meeting-minutes/87148>`_
 
-Meeting notes for the LLVM Qualification Working Group are published on the 
-LLVM Discourse forum. These notes provide a summary of topics discussed, 
-decisions made, and next steps. 
+Slides used to support discussions during sync-up meetings are stored in LLVM's GitHub repository.
 
-You can access all minutes here:
-https://discourse.llvm.org/t/llvm-qualification-wg-sync-ups-meeting-minutes/87148
+Available slides:
+
+* `September 2025 <qual-wg/slides/202509_llvm_qual_wg.pdf>`_
+* `August 2025 <qual-wg/slides/202508_llvm_qual_wg.pdf>`_
+* `July 2025 <qual-wg/slides/202507_llvm_qual_wg.pdf>`_
+* (add future entries here)
+
+A future patch will migrate these slide files to the ``llvm-www`` repository once
+a suitable hosting location is confirmed with the community.
 
 Contributors
 ============
@@ -65,23 +95,12 @@ The LLVM Qualification Working Group is a collaborative effort involving partici
 from across the LLVM ecosystem. These include community members and industry contributors
 with experience in compiler development, tool qualification, and functional safety.
 
-While contributor names are recorded in the `Meeting Minutes`_ for those who attend 
+While contributor names are recorded in the meeting minutes for those who attend 
 sync-up calls, we also recognize contributions made asynchronously via Discord, GitHub, 
 and other discussion channels.
 
 All forms of constructive participation are valued and acknowledged.
 
-Presentation Slides
-===================
-
-Slides used to support discussions during sync-up meetings are stored in the
- `qual-wg/slides/` directory of the LLVM repository.
-
- Available slides:
-
-* :download:`July 2025 <qual-wg/slides/202507_llvm_qual_wg.pdf>`
-* (add future entries here)
-
 Code of Conduct
 ===============
 
diff --git a/llvm/docs/RISCV/RISCVVectorExtension.rst b/llvm/docs/RISCV/RISCVVectorExtension.rst
index 525b986f98df..6f64ddb4f329 100644
--- a/llvm/docs/RISCV/RISCVVectorExtension.rst
+++ b/llvm/docs/RISCV/RISCVVectorExtension.rst
@@ -298,7 +298,7 @@ Register allocation is split between vector and scalar registers, with vector al
 
 There are four register classes for vectors:
 
-- ``VR`` for vector registers (``v0``, ``v1,``, ..., ``v32``). Used when :math:`\text{LMUL} \leq 1` and mask registers.
+- ``VR`` for vector registers (``v0``, ``v1``, ..., ``v31``). Used when :math:`\text{LMUL} \leq 1` and for mask registers.
 - ``VRM2`` for vector groups of length 2 i.e., :math:`\text{LMUL}=2` (``v0m2``, ``v2m2``, ..., ``v30m2``)
 - ``VRM4`` for vector groups of length 4 i.e., :math:`\text{LMUL}=4` (``v0m4``, ``v4m4``, ..., ``v28m4``)
 - ``VRM8`` for vector groups of length 8 i.e., :math:`\text{LMUL}=8` (``v0m8``, ``v8m8``, ..., ``v24m8``)
diff --git a/llvm/docs/RISCVUsage.rst b/llvm/docs/RISCVUsage.rst
index f9f3e39727a5..d6c7b46485cc 100644
--- a/llvm/docs/RISCVUsage.rst
+++ b/llvm/docs/RISCVUsage.rst
@@ -231,6 +231,7 @@ on support follow.
      ``Zve64x``        Supported
      ``Zve64f``        Supported
      ``Zve64d``        Supported
+     ``Zvfbfa``        Assembly Support
      ``Zvfbfmin``      Supported
      ``Zvfbfwma``      Supported
      ``Zvfh``          Supported
diff --git a/llvm/docs/ReleaseNotes.md b/llvm/docs/ReleaseNotes.md
index 3b90c964ac53..16174553ba7f 100644
--- a/llvm/docs/ReleaseNotes.md
+++ b/llvm/docs/ReleaseNotes.md
@@ -75,6 +75,10 @@ Changes to TableGen
 Changes to Interprocedural Optimizations
 ----------------------------------------
 
+* Added `-enable-machine-outliner={optimistic-pgo,conservative-pgo}` to read
+  profile data to guide the machine outliner
+  ([#154437](https://github.com/llvm/llvm-project/pull/154437)).
+
 Changes to Vectorizers
 ----------------------------------------
 
@@ -119,6 +123,7 @@ Changes to the RISC-V Backend
   and data using mapping symbols such as `$x` and `$d`. Switching architectures
   using `$x` with an architecture string suffix is not yet supported.
 * Ssctr and Smctr extensions are no longer experimental.
+* Add support for Zvfbfa (Additional BF16 vector compute support).
 
 Changes to the WebAssembly Backend
 ----------------------------------
@@ -150,6 +155,9 @@ Changes to the Debug Info
 Changes to the LLVM tools
 ---------------------------------
 
+* `llvm-readelf` now dumps all hex format values in lower-case mode.
+* Some code paths for supporting Python 2.7 in `llvm-lit` have been removed.
+
 Changes to LLDB
 ---------------------------------
 
diff --git a/llvm/docs/qual-wg/slides/202508_llvm_qual_wg.pdf b/llvm/docs/qual-wg/slides/202508_llvm_qual_wg.pdf
new file mode 100644
index 000000000000..356442d5f3b1
--- /dev/null
+++ b/llvm/docs/qual-wg/slides/202508_llvm_qual_wg.pdf
diff --git a/llvm/docs/qual-wg/slides/202509_llvm_qual_wg.pdf b/llvm/docs/qual-wg/slides/202509_llvm_qual_wg.pdf
new file mode 100644
index 000000000000..2804e03398aa
--- /dev/null
+++ b/llvm/docs/qual-wg/slides/202509_llvm_qual_wg.pdf