CO460 - High Performance Computing, Jan-May, 2017

Welcome to the CO460 - High Performance Computing course page.

Course Syllabus

  • Instruction Level Parallelism: Pipelining, Hazards, Compiler techniques for ILP, Branch prediction, Static and Dynamic Scheduling, Speculation, Limits of ILP.
  • Multicore Memory Hierarchy: Cache tradeoffs, Basic and Advanced optimizations, Virtual Memory, DRAM optimizations.
  • Multiprocessors: Symmetric and Distributed architectures, Cache coherence protocols - Snoopy and Directory based, ISA support for Synchronization, Memory Consistency Models.
  • Interconnection Networks: Architectures, Topologies, Performance, Routing, Flow control, Future of NoCs.
  • VLSI: Transistor Theory. Moore's Law. Delay, Power, Energy, Temperature dependence in integrated circuits.

Reference Materials

Reference Books/Textbooks:

  • [HP5e] John Hennessy and David Patterson. Computer Architecture - A Quantitative Approach. 5th ed. Morgan Kaufmann.
  • John P. Shen and Mikko H. Lipasti. Modern Processor Design - Fundamentals of Superscalar Processors. Tata McGraw Hill.
  • William J Dally and Brian Towles. Principles and Practices of Interconnection Networks. Morgan Kaufmann. 2004.
  • [SLoCA] Mark Hill/Margaret Martonosi (eds.). Synthesis Lectures on Computer Architecture, Morgan and Claypool, 2006 -- 2016.
  • Important publications in Computer Architecture.

Course Evaluation

Course components: Qtorials - 20%, Programming assignments - 35%, Midsem and Endsem examinations - 45%.

Assignments/Lab Work

Submit the input, code, and screenshots together in a single archive, by email to co460.nitk@gmail.com.

Course Schedule

Week Type Topic
1 Lecture Class Zero.
Tutorial Tutorial 0 - Questions
2 Lecture Technology Trends - Moore's Law, Power trends.
Reading 1. Chapter 1, HP5e.
2. Wang et al., Orion: A Power-Performance Simulator for Interconnection Networks, MICRO-35, 2002.
3. Hamerly, Perelman, Lau, Calder, Sherwood, Using Machine Learning to Guide Architecture Simulation, J. of Machine Learning Research, 2006
4. Wunderlich, Wenisch, Falsafi, and Hoe, SMARTS: Accelerating Microarchitecture Simulation via Rigorous Statistical Sampling, ISCA, 2002.
3,4 Lecture Pipelining - Data dependences. Exceptions.
Reading 1. Chapter 3, Appendices A and C, HP5e.
2. Smith and Sohi, The Microarchitecture of Superscalar Processors, Proc. of the IEEE, 1995.
3. Moshovos and Sohi, Microarchitectural Innovations: Boosting Microprocessor Performance Beyond Semiconductor Technology Scaling, Proc. of the IEEE, 2001.
4. Smith and Pleszkun, Implementing Precise Interrupts in Pipelined Processors, IEEE Trans. on Computers, 1988.
Tutorial Tutorial 1 - Questions
4 Tutorial Tutorial 2 - Questions
5 Lecture Pipelining - Control dependences.
Reading 1. Chapter 3 and Appendix C, HP5e.
2. S. McFarling, Combining Branch Predictors, Tech. Note TN-36, DEC WRL, 1993.
3. T.Y. Yeh and Y.N. Patt, Alternative Implementations of Two-Level Adaptive Branch Prediction, ISCA, 1992.
Tutorial Tutorial 3 - Questions
6,7 Lecture Dynamic Scheduling
8 Midsem Exam Sept 9, 3:30 PM
9 Tutorial Tutorial 4 - Questions
10 Tutorial Tutorial 5 - Branch Predictors. Questions.
11 Lecture Memory Hierarchy - Caches.
Reading Appendix B, HP5e.
12 Tutorial Tutorial 6 - Caches. Questions.
13 Lecture Memory Hierarchy - Virtual Memory.
Reading Appendix B, HP5e.
Lecture Memory Hierarchy - Cache Aware Programming.
14 Lecture Multiprocessors - SMP, Distributed Multiprocessors, Programming models, Snooping and Directory Coherence Protocols
15 Lecture Multiprocessors - Implementation of Locks.
Reading 1. Sections 5.1 - 5.4. HP5e.
2. Sorin, Hill and Wood, A Primer on Memory Consistency and Cache Coherence, SLoCA#12. Chapters 1, 2, 6, 7 and 8.
