CO460 - High Performance Computing, Jan-May, 2017

Welcome to the CO460 - High Performance Computing course page.

Course Syllabus

  • Instruction Level Parallelism: Pipelining, Hazards, Compiler techniques for ILP, Branch prediction, Static and Dynamic Scheduling, Speculation, Limits of ILP.
  • Multicore Memory Hierarchy: Cache tradeoffs, Basic and Advanced optimizations, Virtual Memory, DRAM optimizations.
  • Multiprocessors: Symmetric and Distributed architectures, Cache coherence protocols (Snoopy and Directory based), ISA support for Synchronization, Memory Consistency Models.
  • Interconnection Networks: Architectures, Topologies, Performance, Routing, Flow control, Future of NoCs.
  • VLSI: Transistor Theory, Moore's Law, and the dependence of delay, power, energy and temperature in integrated circuits.

Reference Materials

Reference Books/Textbooks:

  • [HP5e] John Hennessy and David Patterson. Computer Architecture: A Quantitative Approach, 5th ed. Morgan Kaufmann.
  • John P. Shen and Mikko H. Lipasti. Modern Processor Design: Fundamentals of Superscalar Processors. Tata McGraw Hill.
  • William J. Dally and Brian Towles. Principles and Practices of Interconnection Networks. Morgan Kaufmann, 2004.
  • [SLoCA] Mark Hill and Margaret Martonosi (eds.). Synthesis Lectures on Computer Architecture. Morgan & Claypool, 2006-2016.
  • Important publications in Computer Architecture.

Course Evaluation

Course components: Qtorials - 20%, Programming assignments - 35%, Midsem and Endsem examinations - 45%.

Assignments/Lab Work

Submit your inputs, code and screenshots together in a single archive. Email to

Course Schedule

Week Type Topic / Readings
1 Lecture Class Zero.
Tutorial Tutorial 0 - Questions
2 Lecture Technology Trends - Moore's Law, Power trends.
Reading 1. Chapter 1, HP5e.
2. Wang et al., Orion: A Power-Performance Simulator for Interconnection Networks, MICRO-35, 2002.
3. Hamerly, Perelman, Lau, Calder and Sherwood, Using Machine Learning to Guide Architecture Simulation, Journal of Machine Learning Research, 2006.
4. Wunderlich, Wenisch, Falsafi and Hoe, SMARTS: Accelerating Microarchitecture Simulation via Rigorous Statistical Sampling, ISCA, 2003.
3,4 Lecture Pipelining - Data dependences. Exceptions.
Reading 1. Chapter 3 and Appendices A and C, HP5e.
2. Smith and Sohi, The Microarchitecture of Superscalar Processors, Proc. of the IEEE, 1995.
3. Moshovos and Sohi, Microarchitectural Innovations: Boosting Microprocessor Performance Beyond Semiconductor Technology Scaling, Proc. of the IEEE, 2001.
4. Smith and Pleszkun, Implementing Precise Interrupts in Pipelined Processors, IEEE Trans. on Computers, 1988.
Tutorial Tutorial 1 - Questions
4 Tutorial Tutorial 2 - Questions
5 Lecture Pipelining - Control dependences.
Reading 1. Chapter 3 and Appendix C. HP5e.
2. S. McFarling. Combining Branch Predictors, Tech. Note TN-36, DEC WRL, 1993.
3. T.-Y. Yeh and Y. N. Patt, Alternative Implementations of Two-Level Adaptive Branch Prediction, ISCA, 1992.
Tutorial Tutorial 3 - Questions
6,7 Lecture Dynamic Scheduling
8 Midsem Exam, Sept 9, 3:30 PM.
9 Tutorial Tutorial 4 - Questions
10 Tutorial Tutorial 5 - Branch Predictors. Questions.
11 Lecture Memory Hierarchy - Caches.
Reading Appendix B, HP5e.
12 Tutorial Tutorial 6 - Caches. Questions.
13 Lecture Memory Hierarchy - Virtual Memory.
Reading Appendix B, HP5e.
Lecture Memory Hierarchy - Cache-Aware Programming.
14 Lecture Multiprocessors - SMP, Distributed Multiprocessors, Programming models, Snooping and Directory Coherence Protocols.
15 Lecture Multiprocessors - Implementation of Locks.
Reading 1. Sections 5.1 - 5.4. HP5e.
2. Sorin, Hill and Wood, A Primer on Memory Consistency and Cache Coherence, SLoCA #12. Chapters 1, 2, 6, 7 and 8.