54 Cards in this Set
- Front
- Back
Organization & Architecture |
-Same architecture, different organization -Architecture stays the same across a family of products. |
|
ENIAC computers |
-Decimal, not binary -Vacuum tubes -Programmed manually by switches - no stored programs. |
|
IAS computer |
-Designed by John von Neumann -Stored programs |
|
Transistors |
-Replaced vacuum tubes -Smaller, cheaper -Solid-state device -Made from silicon. |
|
Stored programs |
-Stored in main memory -Components: ALU, Control Unit, Input & Output equipment. |
|
Moore's Law |
-The number of transistors doubles roughly every 18-24 months, but the pace has slowed down. |
|
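Note: a quick worked calculation of the doubling claim, in Python (the 18-month period and the ~2,300-transistor starting point of the 4004 are illustrative assumptions, not from the card):

    # transistor count after `months`, assuming doubling every 18 months (illustrative only)
    def transistor_count(initial, months, doubling_period=18):
        return initial * 2 ** (months / doubling_period)

    print(round(transistor_count(2300, 120)))  # roughly 234,000 after 10 years, from ~2,300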
4 - bit processors |
4004 4040 |
|
8 - bit processors |
8008 8080 8085 |
|
16 - bit processors |
8086 8088 80186 80286 |
|
32 - bit processors |
80386, 80486, Pentium (80586), Pentium Pro. |
|
IA - 64 chip |
Not an extension of IA-32 (a new 64-bit architecture, Itanium) |
|
AMD64 |
-64-bit extension of IA-32 -Intel's equivalent names: IA-32e, EM64T, Intel 64 |
|
Strategies to improve CPU Performance |
1. Increase hardware speed -Shrink logic gates -Increase clock rate
2. Increase cache size and speed
3. Change processor organization and architecture -Increase effective execution speed with some form of parallelism |
|
Problems with strategy 1 (increasing speed of processor) |
-Power -RC delay -Memory latency |
|
Problems with strategy 2 |
-Benefits from cache are reaching their limit (already 3 levels). |
|
Problems with strategy 3 |
-The internal organization of modern processors is already complex and squeezes out a great deal of parallelism. -Further significant increases are likely to be relatively modest. |
|
Multiple Cores |
-Doubling the number of cores almost doubles performance, e.g., 1 core = 1 MIPS, 2 cores = 2 MIPS, 3 cores = 3 MIPS -It is better to use two simpler processors on the chip rather than one more complex processor |
|
GPU |
Graphics Processing Unit -Designed to perform parallel operations on graphics data. -When used for general computation, this is known as General-Purpose Computing on GPU (GPGPU). |
|
APU |
-CPU + GPU = APU -Accelerated Processing Unit -Uses the same bus. -Lower cost and power consumption. |
|
Limitations of MIPS |
-MIPS cannot compare different architectures -MIPS rate is higher with RISC even when the same work is done -Not useful across different applications
|
|
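Note: a small worked example of the "higher MIPS rate, same work" point, in Python (the standard MIPS formula and the made-up instruction counts are assumptions for illustration):

    # MIPS = instruction count / (execution time in seconds * 10**6)
    def mips(instruction_count, exec_time_s):
        return instruction_count / (exec_time_s * 1e6)

    # Two hypothetical machines finishing the same program in the same 2 seconds:
    print(mips(400_000_000, 2.0))  # RISC-style: many simple instructions -> 200.0 MIPS
    print(mips(100_000_000, 2.0))  # CISC-style: fewer complex instructions -> 50.0 MIPS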
SPEC |
-Standard Performance Evaluation Corporation. -Measures system speed and system throughput. -Two modes: Base (compiled with default flags) and Aggressive (optimized flags for the target system)
|
|
SPEC Speed metric |
-Results are reported as the ratio of reference time to system run time: r_i = Tref_i / Tsut_i
Tref_i: execution time of benchmark i on the reference machine. Tsut_i: execution time of benchmark i on the system under test.
-Overall performance is calculated as the geometric mean of the ratios over all 12 integer benchmarks. |
|
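Note: a minimal sketch of the speed-metric calculation, in Python (the benchmark times are made up; only the Tref/Tsut ratio and the geometric mean come from the card):

    from math import prod

    def spec_speed(ref_times, sut_times):
        # ratio_i = Tref_i / Tsut_i for each benchmark i
        ratios = [tref / tsut for tref, tsut in zip(ref_times, sut_times)]
        # overall metric = geometric mean of the ratios
        return prod(ratios) ** (1 / len(ratios))

    print(spec_speed([9770, 8050, 6250], [500, 410, 320]))  # illustrative times in seconds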
Instruction Cycle |
Fetch and execute. |
|
Fetch |
1. Processor fetches the instruction from the memory location pointed to by the PC -The Program Counter (PC) holds the address of the next instruction to fetch
2. The fetched instruction is loaded into the Instruction Register (IR)
3. Processor increments the PC, unless told otherwise |
|
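Note: a toy fetch-execute loop mirroring the steps above, in Python (the three-opcode ISA and memory layout are invented for illustration):

    # memory maps addresses to instructions (tuples) or data (plain integers)
    memory = {0: ("LOAD", 10), 1: ("ADD", 11), 2: ("HALT", None), 10: 3, 11: 4}
    pc, acc = 0, 0                    # Program Counter and an accumulator register

    while True:
        ir = memory[pc]               # 1. fetch the instruction the PC points to into the IR
        pc += 1                       # 3. increment the PC, unless an instruction changes it
        op, operand = ir              # execute: interpret the instruction held in the IR
        if op == "LOAD":
            acc = memory[operand]
        elif op == "ADD":
            acc += memory[operand]
        elif op == "HALT":
            break

    print(acc)                        # 3 + 4 = 7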
Execute |
Processor interprets instruction in IR and performs required actions. |
|
Interrupts Processing in Detail |
1. Suspend execution of current program 2. Save context (the address of next instruction and any other data) 3. Set PC to starting addr. of interrupt handler routine 4. Process interrupt (i.e., execute int. handler routine) 5. Restore context and continue interrupted program at the point of interruption
|
|
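Note: the same five steps as a sketch in Python (the handler table, register dictionary, and stack layout are invented; a real CPU does this in hardware/microcode):

    def on_interrupt(int_number, pc, registers, stack, handlers):
        stack.append((pc, dict(registers)))   # 2. save context (return address + register state)
        handlers[int_number](registers)       # 3./4. jump to and run the interrupt handler routine
        saved_pc, saved_regs = stack.pop()    # 5. restore context...
        registers.update(saved_regs)
        return saved_pc                       # ...and resume the program at the saved address

    handlers = {1: lambda regs: print("timer interrupt handled")}
    print(on_interrupt(1, pc=100, registers={"r0": 42}, stack=[], handlers=handlers))  # 100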
Different interrupts |
1. Sequential - Disable interrupts 2. Nested - Define priorities |
|
Sequential - Disable Interrupts |
Processor will ignore further interrupts whilst processing one interrupt. Interrupts remain pending and are checked after first interrupt has been processed.
Does not account for relative priorities or time-critical needs.
|
|
Nested - Define Priorities |
Each type of interrupt is assigned a priority
Low priority interrupts can be interrupted by higher priority interrupts When higher priority interrupt has been processed, processor returns to previous interrupt
|
|
Memory to CPU |
Read Write Data Address |
|
Input/Output to CPU |
Internal data External data |
|
CPU Connection |
Instructions Data Interrupts |
|
Bus Groups |
Data Address Control |
|
Data bus line |
-Carries data -Width is a key determinant of performance |
|
Address bus |
-Identifies the source or destination of data. -Bus width determines the maximum memory capacity of the system. |
|
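Note: a one-line check of the capacity claim, in Python (assuming byte-addressable memory; the widths are just examples):

    # maximum memory capacity = 2 ** (address bus width) addressable locations
    for width in (16, 32, 64):
        print(f"{width}-bit address bus -> {2 ** width} bytes")  # 32 bits -> 4,294,967,296 (4 GiB)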
Control bus |
-Carries command and timing information -Memory read/write, I/O read/write -Bus request/grant -Interrupt request/ACK -Clock, reset |
|
Point-to-Point Interconnect |
-Traditional bus = shared bus -A shared bus performs worse with multicore chips |
|
QPI |
-QuickPath Interconnect -Multiple direct connections -Layered protocol architecture -Packetized data transfer. |
|
PCI |
Peripheral Component Interconnect
PCI-Express (PCI-E or PCIe) -Not the same as PCI-X. |
|
Access methods |
1. Sequential 2. Direct 3. Random 4. Associative |
|
Cache mapping algorithms |
1. Direct mapped -Only one place it can go in the cache
2. Fully associative -It can go anywhere in the cache
3. Set associative -A memory block is mapped to a set -Within a set, it can be placed anywhere -If the number of elements in a set is k, it is called k-way |
|
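Note: a sketch of how a memory block is mapped to a set, in Python (the block size and set count are assumed values; only the mapping idea comes from the card):

    BLOCK_SIZE = 64      # bytes per block (assumed)
    NUM_SETS = 128       # number of sets in the cache (assumed)

    def map_address(addr):
        """Which set a memory block maps to in a set-associative cache.
        With k ways per set the block may sit in any of that set's k slots;
        k = 1 is direct mapped (one place), a single big set is fully associative (anywhere)."""
        block = addr // BLOCK_SIZE
        set_index = block % NUM_SETS   # the set this block must go to
        tag = block // NUM_SETS        # identifies which block occupies a slot
        return set_index, tag

    print(map_address(0x1F2C0))        # (75, 15) for this geometry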
Replacement Algorithms Associative & Set Associative |
-Least Recently Used (LRU): best hit ratio, but costly to implement
-First In First Out (FIFO): replace the block that has been in the cache longest
-Least Frequently Used (LFU): replace the block that has had the fewest hits
-Random: not intuitive, but shown to perform reasonably well; must be implemented in hardware for speed; used for larger associativity |
|
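Note: a minimal LRU sketch for a single k-way set, in Python using OrderedDict (the set size and tags are illustrative):

    from collections import OrderedDict

    class LRUSet:
        """One k-way set that evicts the least recently used block when full."""
        def __init__(self, k):
            self.k = k
            self.blocks = OrderedDict()            # tag -> block data, oldest access first

        def access(self, tag):
            if tag in self.blocks:                 # hit: mark the block most recently used
                self.blocks.move_to_end(tag)
                return "hit"
            if len(self.blocks) >= self.k:         # miss in a full set: evict the LRU block
                self.blocks.popitem(last=False)
            self.blocks[tag] = "data"
            return "miss"

    s = LRUSet(k=2)
    print([s.access(t) for t in (1, 2, 1, 3, 2)])  # ['miss', 'miss', 'hit', 'miss', 'miss']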
Write through |
- Simplest solution
- All writes go to main memory as well as cache
- Disadvantages: -Lots of traffic: all write traffic must go to memory -Slow: the cache provides little benefit for writes |
|
Write back |
-Updates initially made in cache only (no memory write)
-Disadvantages: -A discrepancy exists between cache and memory for some duration -Requires more complex circuitry and can become a bottleneck |
|
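Note: a side-by-side sketch of the two write policies, in Python (the dictionaries stand in for the cache and main memory):

    cache, memory, dirty = {}, {}, set()

    def write_through(addr, value):
        cache[addr] = value
        memory[addr] = value              # every write also goes straight to main memory

    def write_back(addr, value):
        cache[addr] = value               # update the cache only...
        dirty.add(addr)                   # ...and mark the block dirty

    def evict(addr):
        if addr in dirty:                 # a dirty block is written back only on eviction
            memory[addr] = cache[addr]
            dirty.discard(addr)
        cache.pop(addr, None)

    write_back(0x40, 7)
    print(memory.get(0x40))               # None: memory is stale until the block is evicted
    evict(0x40)
    print(memory.get(0x40))               # 7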
Cache coherence solutions |
-Directory protocols - Snoopy protocols |
|
Directory protocol |
Bad: -Central bottleneck and communication overhead. Good: -Effective in large-scale systems that involve multiple buses or some other complex interconnection scheme |
|
Snoopy Protocols |
-Distribute the responsibility among all of the cache controllers in a multiprocessor -Suited to bus-based multiprocessor because the shared bus provides a simple means for broadcasting and snooping |
|
Unified Cache |
-Advantages of a unified cache: -Automatic balancing: if instructions or data need more of the cache, there is no hard boundary
-Simpler design and implementation |
|
Split cache |
Advantages of a split cache: -Pipeline: Instruction Fetch → Decode → Operand Fetch → Execution -No contention between instruction fetch and operand fetch in pipelining |
|
Dynamic RAM (DRAM) |
-Essentially an analog device
-Simpler construction, less expensive
-Used for main memory -Needs refreshing |
|
Static RAM (SRAM) |
- Faster; used for cache - Digital device - No refreshing - More complex / more expensive |
|
DRAM Types - I |
- Conventional DRAM - FPM (Fast page mode) DRAM - EDO (Extended data-out) DRAM (1995) |
|
DRAM Types - II |
- SDRAM (Synchronous) (1997) -Burst Mode |
|
DRAM Types - III |
- SDRAM (more) -DDR (Double Data Rate) SDRAM - DDR2 / DDR3 / DDR4 (x2, x4, x8 speed) |