50 Cards in this Set

  • Front
  • Back

What drives CPU speed?

- A faster clock (shorter cycle time, usually through increased transistor density)
- Fewer cycles per instruction (lower CPI)

What is a control hazard?

When the processor must stall (can't fetch next instruction) while waiting for the result of a branch instruction.

What is the CPI due to pipelining?

Ideally, 1.

Why is branch misprediction a problem? (4)

- Branch instructions are 15-25% of all instructions.
- Object-oriented programming has more indirect branches, which are harder for the compiler to predict.
- Processors with deeper pipelines suffer larger penalties.
- Superscalar processors must flush and refetch more instructions.

In general, how does branch prediction work?

Uses history to predict the future. More specifically, a global predictor uses a global history register of recent branch outcomes to index into a table of prediction counters.

What is the CPI due to branch prediction?

Ideally, 1 (down from >1 in non-ideal case of pipelining).

What is a superscalar architecture?

A superscalar architecture executes multiple instructions at the same time on a single processor core (instruction-level parallelism) by using the multiple functional units on the core.

What is the CPI due to a superscalar architecture?

<1

What is out-of-order execution?

Re-ordering instructions that have no control/data dependences between them to reduce stalls from long-latency instructions.

What is the CPI due to out-of-order execution?

<<1

What are some challenges of out-of-order execution? (2)

- Detecting dependencies is hard.
- You might introduce new hazards.

What is deep pipelining?

Simply a deeper pipeline (more stages). Since each stage does less work, its circuitry is simpler and the clock speed can be increased, decreasing the cycle time (per stage).

What is Simultaneous Multithreading (SMT) (a.k.a. Hyperthreading)?

SMT is when a single core executes multiple instructions from multiple threads at the same time.

What is the CPI due to SMT?

<<<1

What are 3 advantages and 1 disadvantage of 64-bit vs. 32-bit architectures?

Advantages:
- Larger address space for large databases, etc.
- Register overflow is nearly impossible.
- Better performance for high-precision arithmetic.
Disadvantage:
- Code size increases (instruction size increases).

Why measure a program? (3)
Why measure a computer? (1)

Program:
- To optimize a program, find the bottleneck.
- To compare implementations.
- To find a bug (why so slow?).

Computer:
- To compare which is better/faster.

What are 3 bad ways of comparing processors and why?

- Clock frequency: IPC for each could be very different.
- CPI or IPC: Depends on the instruction set used and the compiler.
- FLOPS: Depends on the instruction set used and only matters if you use floating point heavily.

How do you measure a processor?

Use wall-clock time:
time = IC * CPI * ClockPeriod

What is Amdahl's Law?

The expected improvement after optimizing only a portion of a program. Simply put,
Speed-Up = Old-Time / New-Time = 1/(1 - f + f/s)
f = fraction of original execution time spent in the portion being optimized
s = speed-up of that portion due to the optimization

How is Amdahl's Law useful?

It allows you to optimize the common case. However, note that the common case may change after one round of optimization.

What are 3 tools for profiling?

- Software timers (C library/OS-level timers)
- Hardware timers and performance counters
- Instrumentation (gprof/gcov)

What information does /usr/bin/time give?

The wall-clock time (real), the time spent in user mode (user), and the time spent in kernel mode (sys).

Note that the user time is the sum of time taken on all cores (so it could be larger than wall-clock).

What information does the x86 rdtsc instruction give? How can Linux programs use it?

rdtsc (read time stamp counter) returns the current value of the core's cycle count.
It can be accessed via the get_tsc() function.

Note, however, that successive calls to get_tsc() may read different cores' counters entirely, because the program may start on one core and be scheduled onto another later.

Give 2 examples of software timers.

- /usr/bin/time
- C library time (sys/times.h)

What are some disadvantages of hardware performance counters?

- Require OS support.
- Can overflow.
- Must be sampled carefully.

What is one (new) tool to easily access hardware performance counters?

perf

What are some disadvantages of instrumentation?

The observer effect - You can't measure the system without disturbing it. Instrumentation code can slow down execution.

How does gprof work?

- Adds code to all function calls to track the caller and the number of times called; uses this to build a call graph with call counts.
- Installs a timer interrupt that fires roughly every 10 ms; on each interrupt, it samples the current user frame to determine the running function, and uses these samples to approximate how long each function ran.

What are some limitations of gprof?

- Periodic interrupts make the timing measurements relatively inaccurate by nature.
- gprof only measures time the program is running, so blocking calls where the program waits for I/O won't be measured.
- It uses the user frame to determine the currently running function and its parent; because of this, it can't work with 3rd-party binaries (note that "-pg" will use "-pg" C libraries).

How does gcov work?

- Adds code to every basic block to count the number of times each basic block is entered (and thus each line is executed).
- Can be used to determine important loops.
- Can be used to determine if code is executed at all.

What are some limitations of gcov?

- It provides no timing information.
- For complex conditions, it provides no information on which sub-expression evaluated to true.

How does valgrind work?

- It is a virtual machine that does just-in-time (JIT) compilation.
- Adds instrumentation dynamically.
- Uses this instrumentation to determine memory issues like leaks or overruns and where in the code they occurred.

What are some disadvantages of valgrind?

- Emulation makes the profiled program 4-5x slower.
- (Kirk-special) Doesn't detect stack overruns.

What are the goals of a compiler? (5)

- Correctness
- Support for debugging
- Compiling for performance
- Fast compilation
- Small code size

What is a basic block?

A group of consecutive instructions with a single entry point and a single exit point.

What are the 3 requirements of performance optimization?

- Preserve correctness.
- Improve average performance.
- Be "worth the effort."

What is a good formula for execution time and what does it imply about the focus of optimizations?

Execution time = #instructions * CPI * time/cycle

Thus, focus on
- Decreasing CPI (avoid hazards, improve cache/memory behaviour)
- Decreasing #instructions (use special/new instructions)

What is the role of an optimizing compiler?

To provide an efficient mapping of the program to the machine.
- Eliminate minor inefficiencies
- Code selection and ordering
- Register allocation

What are 3 limitations of optimizing compilers?

- They are constrained in an effort to maintain program behaviour.
- Most analysis is only within procedures (inter-procedural analysis is too expensive).
- Most analysis is based on static information (run-time inputs are difficult for the compiler to predict).

When in doubt, the compiler must be conservative.

What is the role of a programmer with an optimizing compiler?

- Don't smash code into oblivion.
- Select the best algorithm.
- Write readable, maintainable code.
- Eliminate optimization blockers.
- Focus on inner loops.

What are 6 machine independent compiler optimizations?

- Constant propagation
- Constant folding
- Common subexpression elimination (CSE)
- Dead code elimination (DCE)
- Loop-invariant code motion (LICM)
- Function inlining

What are 3 machine dependent compiler optimizations?

- Instruction scheduling
- Loop unrolling
- Parallel unrolling

What are the 3 types of cache misses?

- Cold (compulsory) miss - Miss because the block is not yet in the cache.
- Conflict miss - Miss because two blocks map to the same cache slot.
- Capacity miss - The number of active blocks is larger than the cache.

Why do caches work?

- Temporal locality
- Spatial locality

What 5 things are necessary for prefetching to be effective?

- There is spare memory bandwidth for prefetching.
- Prefetches are accurate.
- Prefetches are timely.
- Prefetched data doesn't replace in-use data.
- Speed gains must outweigh the prefetch cost.

Are conflict misses a concern nowadays?

No, because L1 & L2 caches have high associativity.

What is your main concern when working with caches?

Keep the working set within on-chip cache capacity (focus on L1/L2 as necessary).

What is your main concern when working with virtual memory?

- Keep working set within main memory.
- May also want to keep # working set pages < # TLB entries

What is your main concern when working with prefetching?

- Try to arrange your data structures and access patterns to favour sequential/strided access.
- If that's not possible (or you're ambitious), try compiler- or manually-inserted prefetch instructions.

Why worry about alignment?

Hardware is partitioned into blocks. If data isn't aligned, a request may span multiple blocks, wasting time.