CS4290/CS6290 High Performance Computer Architecture
Fall 2010
Reading List
Basic Superscalar architecture
- [MA1] James E. Smith and Gurindar S. Sohi, "The Microarchitecture of Superscalar Processors", in Proceedings of the IEEE, December 1995
- [MA2] Subbarao Palacharla, Norman P. Jouppi, J. E. Smith, "Complexity-effective superscalar processors", ISCA-24 (1997)
- [INT] James E. Smith and Andrew R. Pleszkun, "Implementing Precise Interrupts in Pipelined Processors", IEEE Transactions on Computers, vol. 37, no. 5, May 1988
- The Microarchitecture of the Pentium 4 Processor, external link: http://www.intel.com/technology/itj/q12001/pdf/art_2.pdf
Instruction Scheduling
Branch predictor and predication
- [BP1] Two Level Branch Predictor paper by Tse-Yu Yeh and Yale Patt in ISCA-19, 1992.
- [BP2] Combining Branch Predictor by Scott McFarling, 1993.
- J. R. Allen, Ken Kennedy, Carrie Porterfield, Joe Warren, "Conversion of control dependence to data dependence," Proceedings of the 10th ACM SIGACT-SIGPLAN symposium on Principles of programming languages, 1983
- Hyesoon Kim, Onur Mutlu, Jared Stark, and Yale N. Patt,
"Wish Branches: Combining Conditional Branching and Predication for Adaptive Predicated Execution"
Proceedings of the 38th International Symposium on Microarchitecture (MICRO), pages 43-54, Barcelona, Spain, November 2005
Cache and Memory
- [CAC1] Non-blocking cache paper by David Kroft, ISCA-08, 1981, external link: http://users.ece.gatech.edu/%7Eleehs/ECE6100/papers/nonblocking-cache.pdf
- [CAC2] Norman P. Jouppi, "Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers", ISCA-17, 1990, external link: http://ieeexplore.ieee.org/iel4/289/3676/00134547.pdf?isnumber=3676&prod=CNF&arnumber=134547&arSt=364&ared=373&arAuthor=Jouppi%2C+N.P.
- [SC] "Shared Memory Consistency Models: A Tutorial", Adve and Gharachorloo, external link: http://www.hpl.hp.com/techreports/Compaq-DEC/WRL-95-7.pdf
- S. Palacharla, R. E. Kessler, "Evaluating stream buffers as a secondary cache replacement", April 1994 ACM SIGARCH Computer Architecture News, Volume 22 Issue 2
- DirectRambus
- [MEMSCH1] Rixner et al., "Memory access scheduling", ISCA 2000
- [MEMSCH2] Mutlu and Moscibroda, "Parallelism-aware batch scheduling: enhancing both performance and fairness of shared DRAM systems," ISCA 2008
- [PREF] Steven P. Vanderwiel, David J. Lilja, "Data prefetch mechanisms," ACM Computing Surveys, Vol. 32 Issue 2 (June 2000)
- Doug Joseph, Dirk Grunwald, "Prefetching using Markov predictors," June 1997 ISCA '97
- T.-F. Chen and J.-L. Baer, "A performance study of software and hardware data
prefetching schemes," in ISCA-21, pages 223-232, 1994.
- "A stateless, content-directed data prefetching mechanism," Robert Cooksey, Stephan Jourdan, Dirk Grunwald, ASPLOS 2002.
- Santhosh Srinath, Onur Mutlu, Hyesoon Kim, and Yale N. Patt,
"Feedback Directed Prefetching: Improving the Performance and Bandwidth-Efficiency of Hardware Prefetchers"
Proceedings of the 13th International Symposium on High-Performance Computer Architecture (HPCA), pages 63-74, Phoenix, AZ,
GPU architecture
- Programming Graphics Hardware
- [TES] NVIDIA Tesla: A Unified Graphics and Computing Architecture, Lindholm, E.; Nickolls, J.; Oberman, S.; Montrym, J.;
Micro, IEEE, Volume 28, Issue 2, March-April 2008 Page(s):39 - 55,
- Sunpyo Hong, Hyesoon Kim, "An Analytical Model for a GPU Architecture with Memory-level and Thread-level Parallelism Awareness", Proceedings of the 36th International Symposium on Computer Architecture (ISCA), Austin, TX, June 2009.
Intel Larrabee
- [LRB] Larrabee: a many-core x86 architecture for visual computing, Larry Seiler, Doug Carmean, Eric Sprangle, Tom Forsyth, Michael Abrash, Pradeep Dubey, Stephen Junkins, Adam Lake, Jeremy Sugerman, Robert Cavin, Roger Espasa, Ed Grochowski, Toni Juan, Pat Hanrahan, August 2008, SIGGRAPH '08: SIGGRAPH 2008 papers
Memory schedulers
- Onur Mutlu and Thomas Moscibroda,"Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems" Proceedings of the 35th International Symposium on Computer Architecture (ISCA), pages 63-74, Beijing, China, June 2008.
- C. Isci and M. Martonosi, "Runtime power monitoring in high-end processors: Methodology and
empirical data," in MICRO 36: Proceedings of the 36th annual IEEE/ACM International Symposium on
Microarchitecture, (Washington, DC, USA), p. 93, IEEE Computer Society, 2003.
- K. Skadron, M. Stan, W. Huang, S. Velusamy, K. Sankaranarayanan, and D. Tarjan,
"Temperature-aware microarchitecture," in Proceedings of the 30th Annual
International Symposium on Computer Architecture (ISCA), June 2003.
- Y. Zhang, D. Parikh, K. Sankaranarayanan, K. Skadron, and M. Stan,
"Hotleakage: A temperature-aware model of subthreshold and gate leakage for
architects," tech. rep., University of Virginia, 2003.
Heterogeneous architectures
- R. Kumar, K.I. Farkas, N.P. Jouppi, P. Ranganathan, and D.M. Tullsen, "Single-ISA Heterogeneous Multi-core Architectures: The Potential for Processor Power Reduction," in Proceedings of the 36th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-36), Dec. 2003
- Boris Grot, Joel Hestness, Steve Keckler, and Onur Mutlu,
"Express Cube Topologies for On-Chip Interconnects"
Proceedings of the 15th International Symposium on High-Performance Computer Architecture (HPCA), pages 163-174, Raleigh, NC,
Cell and Power 5
- [CELL1] Introduction to the Cell Multiprocessor
- [CELL2] Synergistic Processing in Cell's Multicore Architecture, IEEE Micro, Vol. 26, No. 2, March-April 2006, pp. 10-24.
- Ronald N. Kalla, Balaram Sinharoy, and Joel M. Tendler, "IBM Power5 Chip: A Dual-Core Multithreaded Processor", IEEE Micro, 24(2), 2004