CS7290 Advanced Microarchitecture
Fall 2017
Reading Papers
Please install the web localizer to access the papers
Modeling
Power
Front-end and Branch Predictors
- [TRACE1] Eric Rotenberg, Steve Bennett, Jim Smith. Trace Cache: a Low Latency Approach to High Bandwidth Instruction Fetching. April, 1996
- [BR1] Marius Evers, Sanjay J. Patel, Robert S. Chappell, and Yale N. Patt. 1998. An analysis of correlation and predictability: what makes two-level branch predictors work. In Proceedings of the 25th annual international symposium on Computer architecture (ISCA '98).
- Jimenez, D.A.; Lin, C., "Dynamic branch prediction with perceptrons," The Seventh International Symposium on High-Performance Computer Architecture (HPCA), pp. 197-206, 2001
- [RDI] RDIP: Return-address-stack Directed Instruction Prefetching (MICRO13)
- [PIF] Proactive Instruction Fetch (MICRO11)
Scheduler
- [LLS] Eric Borch, Srilatha Manne, Joel Emer, and Eric Tune. 2002. Loose Loops Sink Chips. In Proceedings of the 8th International Symposium on High-Performance Computer Architecture (HPCA '02). IEEE Computer Society, Washington, DC, USA, 299-.
- [MAC] Ilhyun Kim and Mikko H. Lipasti. 2003. Macro-op Scheduling: Relaxing Scheduling Loop Constraints. In Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture (MICRO 36). IEEE Computer Society, Washington, DC, USA, 277-.
- [CYL] D. Ernst, A. Hamel and T. Austin, "Cyclone: a broadcast-free dynamic instruction scheduler with selective replay," Computer Architecture, 2003. Proceedings. 30th Annual International Symposium on, 2003, pp. 253-262.
Cache Optimizations
- [UCP] Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches. Moinuddin K. Qureshi and Yale N. Patt. MICRO'06.
- [AIP] Moinuddin K. Qureshi, Aamer Jaleel, Yale N. Patt, Simon C. Steely, and Joel Emer. 2007. Adaptive insertion policies for high performance caching. In Proceedings of the 34th annual international symposium on Computer architecture (ISCA '07).
- Aamer Jaleel, Kevin B. Theobald, Simon C. Steely, Jr., and Joel Emer. 2010. High performance cache replacement using re-reference interval prediction (RRIP). In Proceedings of the 37th annual international symposium on Computer architecture (ISCA '10)
- [TAP] Jaekyu Lee; Hyesoon Kim, "TAP: A TLP-aware cache management policy for a CPU-GPU heterogeneous architecture," High Performance Computer Architecture (HPCA), 2012
- [ACC] Alaa R. Alameldeen and David A. Wood. 2004. Adaptive Cache Compression for High-Performance Processors. In Proceedings of the 31st annual international symposium on Computer architecture (ISCA '04)
- [COM] Alaa R. Alameldeen and David A. Wood. 2004. Adaptive Cache Compression for High-Performance Processors. In Proceedings of the 31st annual international symposium on Computer architecture (ISCA '04)
- [COM2] Magnus Ekman and Per Stenstrom. 2005. A Robust Main-Memory Compression Scheme. In Proceedings of the 32nd annual international symposium on Computer Architecture (ISCA '05)
Coherence
TLB
- [TLB1] Gokul B. Kandiraju and Anand Sivasubramaniam. 2002. Going the distance for TLB prefetching: an application-driven study. In Proceedings of the 29th annual international symposium on Computer architecture (ISCA '02). IEEE Computer Society, Washington,
- [TLB2] Abhishek Bhattacharjee and Margaret Martonosi. 2010. Inter-core cooperative TLB for chip multiprocessors. In Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems (ASPLOS XV).
- [TLB3] Shekhar Srikantaiah and Mahmut Kandemir. 2010. Synergistic TLBs for High Performance Address Translation in Chip Multiprocessors. In Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO '43)
- [TLB4] Abhishek Bhattacharjee, Daniel Lustig, and Margaret Martonosi. 2011. Shared last-level TLBs for chip multiprocessors. In Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture (HPCA '11).
- [COLT] Binh Pham, Viswanathan Vaidyanathan, Aamer Jaleel, and Abhishek Bhattacharjee. 2012. CoLT: Coalesced Large-Reach TLBs. In Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-45)
- [EVBS] Arkaprava Basu, Jayneel Gandhi, Jichuan Chang, Mark D. Hill, and Michael M. Swift. 2013. Efficient virtual memory for big memory servers. In Proceedings of the 40th Annual International Symposium on Computer Architecture (ISCA '13).
Non-conventional architecture
- [DFA1] Dennis and Misunas, “A Preliminary Architecture for a Basic Data Flow Processor,” ISCA 1974.
- [DFA2] Arvind and Nikhil, “Executing a Program on the MIT Tagged-Token Dataflow Architecture,” IEEE TC 1990
- [ATU] Micron Automata Processor
GPU architectures
- [BGPU] “Performance analysis and tuning for GPGPUs.” Synthesis Lectures on Computer Architecture, Morgan & Claypool