Professor Dr. Lucian Vintan,
Computer Science Department, “Lucian Blaga” University of Sibiu, Romania
Thursday, 18 February 2010
Room 2045N, MM-Hörsaal
on the topic:
Improving processor architecture remains an important challenge for exploiting fine-grain parallelism. The fetch bottleneck and the issue (data-flow) bottleneck represent two fundamental limitations in this regard. Accurate dynamic branch prediction, dynamic instruction reuse, and instruction value prediction could be efficient solutions to these limitations.
We discovered some branches that are very difficult to predict, exhibiting a “random” dynamic behavior; we call them unbiased branches. Despite our efforts and progress in understanding their behavior, accurately predicting unbiased branches remains an open problem. Since the overall performance of modern processors is seriously affected by misprediction recovery, these difficult branches in particular are a source of significant performance penalties.
Our statistics show that about 28% of branches depend on critical Load instructions (those missing in the L1 D-cache). Moreover, 5.61% of branches are unbiased and also depend on critical Loads. Similarly, about 21% of branches depend on MUL/DIV instructions, whereas 3.76% are unbiased and depend on MUL/DIV instructions. These dependences lead to high-penalty mispredictions, which become serious performance obstacles and cause significant performance degradation by executing instructions from wrong paths. Therefore, the negative impact of (unbiased) branches on overall performance should be substantially attenuated by anticipating the results of these long-latency instructions. On the other hand, hiding instructions’ long latencies is an important challenge in itself.
We developed a superscalar architecture that selectively anticipates the values produced by certain high-latency instructions. We focused on implementing a dynamic instruction reuse scheme for MUL/DIV instructions and a last value predictor (LVP) for critical Load instructions. An LVP is an architectural enhancement that speculates on the result of a Load instruction in order to speed up the execution of subsequent instructions. Our selective LVP activates the predictor only when a miss occurs in the first level of the D-cache. The improved superscalar architecture achieves significant IPC speedups as well as an improvement in the energy-delay product (EDP). We also quantified the impact of our selective instruction reuse and value prediction techniques in a Simultaneous Multithreaded (SMT) architecture with per-thread reuse buffers and load value prediction tables. Our simulation results showed average IPC speedups (SPEC2000) between 5.95% and 16.51%, and EDP gains from 10.44% to 25.94%.
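The two mechanisms above can be sketched as follows. This is an illustrative model only, not the simulated design: the table size, indexing scheme, and the 2-bit saturating confidence counter are our assumptions.

```python
class SelectiveLVP:
    """Selective last value predictor: consulted only for loads that
    miss in the L1 D-cache (the 'critical' loads)."""

    def __init__(self, entries=1024):
        self.entries = entries
        self.table = {}  # pc-index -> (last_value, confidence)

    def _index(self, pc):
        return pc % self.entries  # direct-mapped by PC (assumption)

    def predict(self, pc, l1_hit):
        """Return a predicted value only for L1-missing loads with
        sufficient confidence; otherwise None (no speculation)."""
        if l1_hit:
            return None  # selective: fast hits are not worth speculating on
        value, conf = self.table.get(self._index(pc), (None, 0))
        return value if conf >= 2 else None

    def update(self, pc, actual_value):
        """Train on the load's actual result once it completes."""
        idx = self._index(pc)
        value, conf = self.table.get(idx, (None, 0))
        if value == actual_value:
            conf = min(conf + 1, 3)       # saturating confidence counter
        else:
            value, conf = actual_value, 0  # mispredict: reset confidence
        self.table[idx] = (value, conf)


class ReuseBuffer:
    """Dynamic instruction reuse for long-latency MUL/DIV: if the same
    opcode/operand tuple was executed before, reuse the stored result
    instead of re-executing the multi-cycle operation."""

    def __init__(self):
        self.buf = {}

    def lookup(self, opcode, a, b):
        return self.buf.get((opcode, a, b))  # None -> must execute

    def insert(self, opcode, a, b, result):
        self.buf[(opcode, a, b)] = result
```

In a pipeline model, a predicted load value would let dependent instructions (including unbiased branches) issue speculatively, with a recovery mechanism on misprediction; the reuse buffer, by contrast, is non-speculative, since an identical operand tuple guarantees an identical result.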
We then performed a design space exploration of a selective load value prediction scheme suitable for energy-aware SMT architectures. We analyzed the effectiveness of the selective predictor in terms of overall energy reduction and performance improvement. We have shown that a selective LVP can reduce the overall number of accesses to, and the energy consumption of, the on-chip memory compared with a non-selective LVP scheme. It also creates room for reducing the data-cache size while preserving performance, thus lowering system cost. The experimental results were gathered with a state-of-the-art SMT simulator running the SPEC2000 benchmark suite.
All are cordially invited.