
The “pause” Instruction: A Conjecture on the Generalized Micro-architecture for Hyper-Threading Technology

2021-05-15 - Computer Architecture

Recently, I was thinking about virtual machine scheduling algorithms. My mind soon went to intercepting pause loops, since a pause loop is an indicator of a spin-lock. Intercepting it and dispatching the vCPU off the logical core keeps that core from stalling, so that host threads or other vCPUs can be dispatched onto it. In this regard, I remembered that the pause instruction in x86 can hint to the processor that the program is executing a spin-wait loop. Both the Intel 64 and AMD64 architecture manuals state that this instruction can reduce the power consumption induced by the spin loop.
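To make the hint concrete, here is a minimal spin-lock sketch in C11 that places the pause hint inside the acquire loop. It assumes a compiler that provides `<immintrin.h>` and `_mm_pause()` on x86 (GCC, Clang, and MSVC do); it is an illustration, not the pattern the manuals prescribe verbatim.

```c
/* Minimal spin-lock sketch with the x86 "pause" hint in the wait loop.
 * Assumes C11 atomics and an x86 compiler providing _mm_pause(). */
#include <stdatomic.h>
#include <immintrin.h>

static atomic_flag lock_flag = ATOMIC_FLAG_INIT;

static void spin_lock(void) {
    /* Try to grab the lock; on failure, spin in a pause loop. */
    while (atomic_flag_test_and_set_explicit(&lock_flag, memory_order_acquire)) {
        _mm_pause();   /* hint: this thread is in a spin-wait loop */
    }
}

static void spin_unlock(void) {
    atomic_flag_clear_explicit(&lock_flag, memory_order_release);
}
```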

However, I conjecture there is more to it than that. Because modern processors have dynamic schedulers, functional units can be utilized to the maximum. In certain circumstances, though, some functional units are rarely invoked, for example when a thread never executes the instructions that require them. Hyper-Threading Technology was therefore introduced to maximize utilization: it divides each physical core into multiple logical cores. As such, I conjecture that each logical core has its own pipeline and dynamic scheduler, while the physical core has a master scheduler. All, or at least most, of the functional units are owned by the physical core, and the dynamic scheduler within a logical core requests functional units from the master scheduler of the physical core at the issue stage.

As a possible corollary, the pause instruction, being a hint that the thread is spinning on a lock, could ask the scheduler either to reduce the clock rate of the current logical core or to stall it for a number of cycles. I think both are possible, and they could even work together. Either way, the spin-lock code on one logical core would occupy fewer functional units than the code on the other logical core(s), so the master scheduler could allocate more functional units to the other logical core(s). From this we may conclude that Hyper-Threading Technology could be quite an improvement in performance around spin-locks, by leaving the least computing resources to the spinning thread.
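One way to probe this conjecture empirically (not something measured here) is to pin a spin-waiting thread and a compute thread onto sibling logical cores and compare the compute thread's throughput with and without the pause hint in the spinner. The sketch below assumes Linux with pthreads, and it assumes that logical CPUs 0 and 1 are SMT siblings of the same physical core, which must be verified on the actual machine.

```c
/* Sketch: compare a worker thread's throughput while a sibling logical core
 * spins with or without "pause".
 * Assumptions: Linux + pthreads; logical CPUs 0 and 1 are SMT siblings
 * (check /sys/devices/system/cpu/cpuN/topology/thread_siblings_list). */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>
#include <immintrin.h>

static atomic_int stop = 0;
static int use_pause = 0;

static void pin_to_cpu(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

/* Spinner: models a thread waiting on a spin-lock that never becomes free. */
static void *spinner(void *arg) {
    (void)arg;
    pin_to_cpu(0);                 /* assumed SMT sibling #0 */
    while (!atomic_load(&stop)) {
        if (use_pause)
            _mm_pause();
    }
    return NULL;
}

/* Worker: counts how much arithmetic it completes in a fixed time window. */
static void *worker(void *arg) {
    pin_to_cpu(1);                 /* assumed SMT sibling #1 */
    volatile double x = 1.000001;
    unsigned long iters = 0;
    while (!atomic_load(&stop)) {
        x *= 1.000001;
        iters++;
    }
    *(unsigned long *)arg = iters;
    return NULL;
}

int main(int argc, char **argv) {
    use_pause = (argc > 1);        /* any argument: spinner uses pause */
    unsigned long iters = 0;
    pthread_t s, w;
    pthread_create(&s, NULL, spinner, NULL);
    pthread_create(&w, NULL, worker, &iters);
    sleep(2);                      /* measurement window */
    atomic_store(&stop, 1);
    pthread_join(s, NULL);
    pthread_join(w, NULL);
    printf("pause=%d worker iterations=%lu\n", use_pause, iters);
    return 0;
}
```

If the conjecture holds, the worker should complete noticeably more iterations when the spinner executes pause than when it spins without it.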

To summarize, Hyper-Threading Technology is a hardware approach to maximizing thread-level parallelism (TLP) without increasing the number of physical cores, and also an approach to reducing the waste of functional units inside a physical core. Some may claim that enabling Hyper-Threading Technology decreases the performance of a single-threaded program. This is not necessarily true: the OS can purposefully schedule “complementary threads” onto the logical cores of the same physical core. “Complementary threads” are threads whose code micro-architecturally requires quite different types of functional units, for example a single-threaded compute program and a thread waiting on a spin-lock. In that case, there is little or even no performance loss for the single-threaded program.
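Co-scheduling complementary threads requires knowing which logical cores share a physical core. On Linux this topology can be read from sysfs; the short sketch below (my addition, assuming the usual Linux sysfs layout) prints the SMT sibling set of each logical CPU, which a scheduler or a pinning tool could use to pick the two slots of one physical core.

```c
/* Print the SMT sibling list of each online logical CPU on Linux.
 * Assumes the usual sysfs layout:
 *   /sys/devices/system/cpu/cpuN/topology/thread_siblings_list */
#include <stdio.h>

int main(void) {
    for (int cpu = 0; ; cpu++) {
        char path[128], line[64];
        snprintf(path, sizeof(path),
                 "/sys/devices/system/cpu/cpu%d/topology/thread_siblings_list",
                 cpu);
        FILE *f = fopen(path, "r");
        if (!f)
            break;                       /* no more logical CPUs */
        if (fgets(line, sizeof(line), f))
            printf("cpu%d siblings: %s", cpu, line);  /* line keeps its '\n' */
        fclose(f);
    }
    return 0;
}
```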
