.. _topology:

CPU Topology & Thread Configuration
-----------------------------------------------------------------------------------

TooManyCooks can use the `hwloc <https://www.open-mpi.org/projects/hwloc/>`_ library to query the CPU topology and use it to automatically configure executor threads.
This functionality is enabled by defining :literal_ref:`TMC_USE_HWLOC<tmc_use_hwloc>` and linking against libhwloc.
When hwloc is enabled:

* You can query the system topology using ``tmc::topology::query()``. This includes information about cache groups, physical cores, SMT levels, and CPU kinds (P-cores vs E-cores).
* You can call ``.add_partition()`` to restrict executor threads to specific cores, groups, or NUMA nodes.
* You can call a second overload of ``.set_thread_init_hook()``, which receives detailed information about the thread, group, and CPU kind that an executor thread runs on.
* :literal_ref:`tmc::ex_cpu<ex_cpu_hwloc>` also gains some additional capabilities, which you can read about :ref:`here<ex_cpu_hwloc>`.

Querying CPU Topology
-----------------------------------------------------------------------------------

Use ``tmc::topology::query()`` to get a snapshot of the system's CPU topology. For a more complete listing of the available information, try running the `hwloc_topo example <https://github.com/tzcnt/tmc-examples/blob/main/examples/hwloc/topo.cpp>`_.

.. code-block:: cpp

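  // Requires TMC_USE_HWLOC to be defined (e.g. on the compiler command line)
  // and the program to be linked against libhwloc.
  #define TMC_IMPL
  #include "tmc/all_headers.hpp"

  #include <iostream>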

  int main() {
    tmc::topology::cpu_topology topo = tmc::topology::query();
    
    std::cout << "Logical processors: " << topo.pu_count() << std::endl;
    std::cout << "Physical cores: " << topo.core_count() << std::endl;
    std::cout << "Core groups: " << topo.group_count() << std::endl;
    std::cout << "NUMA nodes: " << topo.numa_count() << std::endl;
    std::cout << "Hybrid architecture: " << (topo.is_hybrid() ? "yes" : "no") << std::endl;
  }

.. _cpu_kinds:

CPU Kinds
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

On hybrid architectures (such as Intel Alder Lake and newer, or Apple M-series), cores are classified by their ``cpu_kind``:

.. list-table::
   :widths: 25 75
   :header-rows: 1

   * - cpu_kind
     - Description
   * - ``PERFORMANCE``
     - P-cores, or regular cores on non-hybrid systems
   * - ``EFFICIENCY1``
     - E-cores, Compact Cores, or Dense Cores
   * - ``EFFICIENCY2``
     - Low Power E-Cores (e.g. Intel Meteor Lake LP E-cores)
   * - ``ALL``
     - Matches all CPU kinds (convenience value for filtering)

``cpu_kind`` is a flags bitmap, so you can OR together multiple values when constructing a filter:

.. code-block:: cpp

  using tmc::topology::cpu_kind;
  
  // Match both P-cores and E-cores
  size_t kinds = cpu_kind::PERFORMANCE | cpu_kind::EFFICIENCY1;
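
To make the flag semantics concrete, the sketch below models a kind mask in plain C++. The enumerator names mirror the table above, but the bit positions and the ``kind_allowed`` helper are assumptions for illustration only; they are not TMC's definitions. A core passes the filter when its own kind bit is present in the mask.

.. code-block:: cpp

  #include <cstddef>

  namespace sketch {

  // Hypothetical flag values; TMC's actual cpu_kind constants may differ.
  enum kind_flag : std::size_t {
    PERFORMANCE = std::size_t{1} << 0,
    EFFICIENCY1 = std::size_t{1} << 1,
    EFFICIENCY2 = std::size_t{1} << 2,
    ALL = PERFORMANCE | EFFICIENCY1 | EFFICIENCY2,
  };

  // A core of the given kind is allowed if its bit is set in the mask.
  constexpr bool kind_allowed(std::size_t mask, kind_flag kind) {
    return (mask & kind) != 0;
  }

  }  // namespace sketch

With ``PERFORMANCE | EFFICIENCY1``, P-cores and E-cores are accepted, while low-power E-cores (``EFFICIENCY2``) are not.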

.. _core_groups:

Core Groups
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

TMC groups cores together based on shared cache and CPU kind.
The topology query exposes a ``core_group`` data structure with the following info:

* ``numa_index`` - Index of the NUMA node this group belongs to
* ``index`` - Unique index among all groups on the machine
* ``core_indexes`` - Indexes of cores in this group (global across all groups)
* ``cpu_kind`` - The CPU kind of all cores in this group
* ``smt_level`` - SMT/hyperthreading level (1 if no SMT; typically 2 for P-cores and for cores on non-hybrid x86 CPUs)

Groups are sorted so that Performance cores come before Efficiency cores. On multi-NUMA systems with multiple CPU kinds, NUMA node is the major sort key, so this ordering applies within each NUMA node.
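
The ordering rule amounts to a two-key sort. The sketch below uses a hypothetical ``group_info`` struct (not TMC's ``core_group``) with an assumed numeric ``kind_rank`` in which 0 stands for P-cores; the ``id`` tie-breaker is likewise an addition to keep the output deterministic.

.. code-block:: cpp

  #include <algorithm>
  #include <cstddef>
  #include <tuple>
  #include <vector>

  namespace sketch {

  // Hypothetical stand-in for a core group; this is NOT TMC's core_group.
  struct group_info {
    std::size_t numa_index;
    std::size_t kind_rank;  // assumption: 0 = P-cores, 1 = E-cores, 2 = LP E-cores
    std::size_t id;         // tie-breaker added only for a deterministic order
  };

  // Orders groups by NUMA node first (major key), then by CPU kind (minor key),
  // so P-core groups come before E-core groups within each NUMA node.
  inline void sort_groups(std::vector<group_info>& groups) {
    std::sort(groups.begin(), groups.end(),
              [](const group_info& a, const group_info& b) {
                return std::tie(a.numa_index, a.kind_rank, a.id) <
                       std::tie(b.numa_index, b.kind_rank, b.id);
              });
  }

  }  // namespace sketch

For example, groups given as (NUMA 1, E), (NUMA 0, E), (NUMA 1, P), (NUMA 0, P) come out as (NUMA 0, P), (NUMA 0, E), (NUMA 1, P), (NUMA 1, E).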

.. _topology_filter:

Topology Filter
-----------------------------------------------------------------------------------

``topology_filter`` is passed to an executor's ``add_partition()`` function to specify which physical cores that executor may use.
If multiple ``set_*()`` operations are applied to the same filter, their criteria are intersected: only cores that match all of the criteria will be used.

.. code-block:: cpp

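  tmc::topology::topology_filter filter;

  // Each set_*() call below narrows the filter further. Applied together,
  // only cores that satisfy every criterion remain eligible.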
  // Use only P-cores
  filter.set_cpu_kinds(tmc::topology::cpu_kind::PERFORMANCE);
  
  // Use only specific NUMA nodes
  filter.set_numa_indexes({0, 1});
  
  // Use only specific core groups
  filter.set_group_indexes({0, 2, 4});
  
  // Use only specific cores
  filter.set_core_indexes({0, 1, 2, 3});
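
The intersection semantics can be modeled in a few lines of standard C++. The ``core_desc`` and ``filter_model`` types below are hypothetical stand-ins, not TMC's ``topology_filter``; they assume that a criterion you never set matches every core, and that each criterion you do set rejects the cores that fail it.

.. code-block:: cpp

  #include <cstddef>
  #include <optional>
  #include <set>

  namespace sketch {

  // Hypothetical description of one physical core; not a TMC type.
  struct core_desc {
    std::size_t core_index;
    std::size_t group_index;
    std::size_t numa_index;
    std::size_t kind_bit;  // a single kind flag, e.g. 1 = P-core, 2 = E-core
  };

  // Hypothetical filter model: unset criteria match everything, and every
  // criterion that is set must also match (logical AND).
  struct filter_model {
    std::optional<std::size_t> kind_mask;
    std::optional<std::set<std::size_t>> numa_indexes;
    std::optional<std::set<std::size_t>> group_indexes;
    std::optional<std::set<std::size_t>> core_indexes;

    bool matches(const core_desc& c) const {
      if (kind_mask && (*kind_mask & c.kind_bit) == 0) return false;
      if (numa_indexes && numa_indexes->count(c.numa_index) == 0) return false;
      if (group_indexes && group_indexes->count(c.group_index) == 0) return false;
      if (core_indexes && core_indexes->count(c.core_index) == 0) return false;
      return true;
    }
  };

  }  // namespace sketch

For example, a model with a P-core kind mask and NUMA index 0 accepts a P-core on node 0 but rejects both an E-core on node 0 and a P-core on node 1, while an empty model accepts every core.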

.. _pin_thread:

Pinning External Threads
-----------------------------------------------------------------------------------

Use ``tmc::topology::pin_thread()`` to pin a non-executor thread to specific hardware resources:

.. code-block:: cpp

  tmc::topology::topology_filter numa_filter;
  numa_filter.set_numa_indexes({0});

  auto external_thread = std::thread([&numa_filter]() {
    // This thread will only run on NUMA node 0.
    tmc::topology::pin_thread(numa_filter);
    // ... do work on the external thread ...
  });

  // This executor will also only run on NUMA node 0.
  // This avoids cross-NUMA memory latency when the executor and the external thread share data.
  tmc::ex_cpu ex;
  ex.add_partition(numa_filter).init();

  // Join before numa_filter and external_thread go out of scope.
  external_thread.join();

Apple platforms do not allow thread pinning. On those platforms, ``pin_thread()`` instead sets the thread's QoS class based on the CPU kind of the allowed resources.
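
At the OS level on Linux, the set of CPUs a thread may run on is its affinity mask, which glibc lets you narrow with ``pthread_setaffinity_np()``. The Linux-only sketch below pins the calling thread to the first CPU in its current mask. It is a standalone illustration of that OS mechanism, not TMC's implementation of ``pin_thread()``: it knows nothing about NUMA nodes, core groups, or CPU kinds, and the helper names are hypothetical.

.. code-block:: cpp

  #ifndef _GNU_SOURCE
  #define _GNU_SOURCE  // for pthread_{get,set}affinity_np and the CPU_* macros
  #endif
  #include <pthread.h>
  #include <sched.h>

  // Hypothetical helper (Linux/glibc only; not part of TMC).
  // Pins the calling thread to the first CPU in its current affinity mask.
  // Returns that CPU's index, or -1 on failure.
  inline int pin_self_to_first_allowed_cpu() {
    cpu_set_t allowed;
    CPU_ZERO(&allowed);
    if (pthread_getaffinity_np(pthread_self(), sizeof(allowed), &allowed) != 0) {
      return -1;
    }
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
      if (!CPU_ISSET(cpu, &allowed)) {
        continue;
      }
      cpu_set_t target;
      CPU_ZERO(&target);
      CPU_SET(cpu, &target);
      return pthread_setaffinity_np(pthread_self(), sizeof(target), &target) == 0
                 ? cpu
                 : -1;
    }
    return -1;
  }

  // Hypothetical helper: how many CPUs the calling thread may run on, or -1.
  inline int allowed_cpu_count() {
    cpu_set_t allowed;
    CPU_ZERO(&allowed);
    if (pthread_getaffinity_np(pthread_self(), sizeof(allowed), &allowed) != 0) {
      return -1;
    }
    return CPU_COUNT(&allowed);
  }

Calling ``pin_self_to_first_allowed_cpu()`` from inside a ``std::thread`` narrows only that thread's mask; the thread that created it keeps its original affinity.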

API Reference
-----------------------------------------------------------------------------------

Topology Types
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. doxygenstruct:: tmc::topology::cpu_kind
   :members:

.. doxygenstruct:: tmc::topology::core_group
   :members:

.. doxygenstruct:: tmc::topology::cpu_topology
   :members:

.. doxygenclass:: tmc::topology::topology_filter
   :members:

Topology Functions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. doxygenfunction:: tmc::topology::query

.. doxygenfunction:: tmc::topology::pin_thread