54 Cards in this Set

Organization & Architecture

- Same architecture, different organization


- Architecture is the same for a family of products.

ENIAC computers

- Decimal, not binary


- Vacuum tubes


- Programmed manually by switches


- No stored programs

IAS computer

- Designed by John von Neumann


- Stored programs

Transistors

- Replaced vacuum tubes


- Smaller and cheaper


- Solid-state device


- Made from silicon

Stored programs

- Stored in main memory


- Components:

    ALU

    Control unit

    Input & output equipment


Moore's Law

- The number of transistors doubles every 18-20 months, but the pace has slowed down (see the sketch below)
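A quick arithmetic sketch of the doubling claim. The starting count (the Intel 4004's roughly 2,300 transistors, 1971) and the 18-month cadence are illustrative assumptions; the gap against real chips shows the slowdown the card mentions.

```python
# Illustrative only: project transistor counts under an idealized
# 18-month doubling cadence, starting from the Intel 4004's
# ~2,300 transistors (1971).
def transistors_after(years, start=2_300, months_per_doubling=18):
    doublings = years * 12 / months_per_doubling
    return start * 2 ** doublings

print(f"{transistors_after(50):,.0f}")   # ~2.5e13
# Real 2020s chips hold ~1e10-1e11 transistors: the slowdown the card notes.
```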

4-bit processors

4004


4040

8-bit processors

8008


8080


8085

16-bit processors

8086


8088


80186


80286

32-bit processors

80386


80486


80586


Pentium Pro.

IA-64 chip

- Not an extension of IA-32

AMD64

- 64-bit extension of IA-32


- Intel's names for it:

    IA-32e


    EM64T


    Intel 64

Strategies to improve CPU performance

1. Increase hardware speed

    - Shrink logic gates

    - Increase clock rate



2. Increase size and speed of cache



3. Change processor organization and architecture

    - Increase effective speed of execution with some kind of parallelism



Problems with strategy 1
(increasing processor speed)

- Power

- RC delay

- Memory latency



Problems with strategy 2

-Benefits from cache are reaching their limit (already three levels).

Problems with strategy 3

-The internal organization of modern processors is complex and already squeezes out a great deal of parallelism.



-Further significant increases are likely to be relatively modest.

Multiple Cores

- Doubling the number of processors almost doubles performance

    E.g., 1 core = 1 MIPS

    2 cores = 2 MIPS

    3 cores = 3 MIPS


- It is better to use two simpler processors on the chip than one more complex processor

GPU

Graphics Processing Unit

- Designed to perform parallel operations on graphics data


- Using it for general computation is known as General-Purpose computing on GPU (GPGPU)

APU

- CPU + GPU = APU


- Accelerated Processing Unit


- Both use the same bus


- Lower cost and power consumption

Limitations of MIPS

- MIPS cannot compare different architectures


- The MIPS rate is higher with RISC, even when both machines do the same work (see the sketch below)


- Not useful across different applications
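A minimal sketch of that point, with made-up instruction counts: a RISC machine typically executes more (simpler) instructions for the same program, so it posts a higher MIPS rate even when the wall-clock time is identical.

```python
# Made-up numbers: the same program, same 2.0 s runtime on both machines.
def mips(instruction_count, exec_time_s):
    return instruction_count / (exec_time_s * 1e6)

cisc = mips(instruction_count=100e6, exec_time_s=2.0)   # 50 MIPS
risc = mips(instruction_count=250e6, exec_time_s=2.0)   # 125 MIPS
print(cisc, risc)   # RISC posts 2.5x the MIPS for identical work
```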


SPEC

- Standard Performance Evaluation Corporation


- They measure:

    System speed

    System throughput


- Two modes:

    Base: compiled with default flags

    Aggressive: optimized flags for the target system


SPEC Speed metric

- Results are reported as the ratio of reference time to system run time:

    r_i = Tref_i / Tsut_i

    Tref_i: execution time for benchmark i on the reference machine

    Tsut_i: execution time for benchmark i on the system under test


- Overall performance is calculated by averaging the ratios for all 12 integer benchmarks, using the geometric mean (see the sketch below)
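A hedged sketch of the metric with invented timings: compute each ratio r_i = Tref_i / Tsut_i, then take the geometric mean of the ratios.

```python
import math

# (Tref_i, Tsut_i) pairs in seconds for three hypothetical benchmarks
times = [(9650, 500), (9120, 290), (10490, 724)]

ratios = [t_ref / t_sut for t_ref, t_sut in times]     # r_i = Tref_i / Tsut_i
speed_metric = math.prod(ratios) ** (1 / len(ratios))  # geometric mean
print(f"{speed_metric:.1f}")                           # ~20.6
```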

Instruction Cycle


Fetch and execute.

Fetch

1. Processor fetches instruction from memory location pointed to by PC


•Program Counter (PC) holds address of next instruction to fetch



2. Fetched instruction is loaded into Instruction Register (IR)



3. Processor increments PC, unless told otherwise

Execute

Processor interprets instruction in IR and performs required actions.
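A toy fetch-execute loop tying the two cards together; the machine, its opcodes (LOAD, ADD, HALT), and the memory layout are all invented for illustration.

```python
# Toy accumulator machine: opcodes 0x10 (LOAD), 0x20 (ADD), 0x00 (HALT)
# are invented for this sketch.
memory = [0x10, 5, 0x20, 7, 0x00]   # LOAD 5; ADD 7; HALT
pc, acc = 0, 0

while True:
    ir = memory[pc]                 # fetch: IR <- memory[PC]
    pc += 1                         # increment PC, unless told otherwise
    if ir == 0x10:                  # execute: interpret instruction in IR
        acc = memory[pc]; pc += 1
    elif ir == 0x20:
        acc += memory[pc]; pc += 1
    else:                           # 0x00: HALT
        break

print(acc)                          # 12
```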

Interrupts Processing in Detail

1. Suspend execution of current program



2. Save context (the address of next instruction and any other data)



3. Set PC to starting address of interrupt handler routine



4. Process interrupt (i.e., execute int. handler routine)



5. Restore context and continue interrupted program at the point of interruption


Handling multiple interrupts

1. Sequential - Disable Interrupts



2. Nested - Define Priorities

Sequential - Disable Interrupts

Processor will ignore further interrupts whilst processing one interrupt.


Interrupts remain pending and are checked after first interrupt has been processed.



Drawback: no account of relative priority or time-critical needs.


Nested - Define Priorities

Each type of interrupt is assigned a priority



Low priority interrupts can be interrupted by higher priority interrupts



When higher priority interrupt has been processed, processor returns to previous interrupt
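A rough sketch of priority-based nesting; the priority levels and handler names are invented. A handler runs only if its priority exceeds the level of the work currently executing; it can itself be preempted by a higher priority, and anything else stays pending.

```python
# Invented priority scheme for illustration.
current_level = 0
pending = []

def raise_interrupt(priority, handler):
    global current_level
    if priority <= current_level:
        pending.append((priority, handler))   # masked: remains pending
        return
    saved = current_level                     # save context
    current_level = priority                  # block lower priorities
    handler()                                 # may itself be preempted
    current_level = saved                     # restore context, resume

def disk_handler():
    print("disk handler starts")
    raise_interrupt(3, lambda: print("  power-fail handler preempts"))
    raise_interrupt(1, lambda: print("printer handler"))  # 1 <= 2: pending
    print("disk handler resumes and finishes")

raise_interrupt(2, disk_handler)
print(len(pending))   # 1 -- the low-priority printer interrupt
```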


Memory to CPU

Read


Write


Data


Address

Input/Output to CPU

Internal data


External data

CPU Connection

Instruction


Data


Interrupts

Bus Groups

Data


Address


Control

Data bus line

-Carries data


-Width is a key determinant of performance

Address bus

-Identifies the source or destination of data


-Bus width determines the maximum memory capacity of the system (see the arithmetic below)
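The arithmetic behind the card: n address lines can name 2^n distinct locations, which caps memory capacity (byte-addressable memory assumed).

```python
# n address lines can name 2**n locations (byte-addressable memory assumed)
for width in (16, 32):
    print(f"{width}-bit address bus: {2**width:,} bytes")
# -> 65,536 bytes (64 KiB) and 4,294,967,296 bytes (4 GiB)
```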

Control bus

-Carries command and timing information


-Memory read/write, I/O read/write


-Bus request/grant


-Interrupt request/ACK


-Clock, reset

Point-to-Point Interconnect

- Traditional bus = shared bus


- Shared buses perform worse with multicore chips


QPI

-QuickPath Interconnect


Multiple direct connections.


Layered protocol architecture.


Packetized data transfer.

PCI

Peripheral Component Interconnect



PCI Express (PCI-E or PCIe)


- Not the same as PCI-X

Access methods

1. Sequential


2. Direct


3. Random


4. Associative

Cache mapping algorithms

1. Direct mapped

    - Only one possible place in the cache



2. Fully associative

    - Can go anywhere in the cache



3. Set associative

    - A memory block is mapped to a set

    - Within the set, it can go anywhere

    - If the number of elements in a set is k, it is called k-way

(see the mapping sketch below)
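A small sketch of where a memory block may be placed under each scheme; the cache size (8 lines) and associativity (k = 2) are made-up parameters.

```python
# Made-up parameters: an 8-line cache with 2-way set associativity.
NUM_LINES, K = 8, 2

def direct_mapped(block):
    return block % NUM_LINES              # exactly one candidate line

def fully_associative(block):
    return range(NUM_LINES)               # any line is a candidate

def set_associative(block):
    s = block % (NUM_LINES // K)          # the block's set...
    return range(s * K, s * K + K)        # ...any line within it

print(direct_mapped(13))                  # 5
print(list(fully_associative(13)))        # [0, 1, 2, 3, 4, 5, 6, 7]
print(list(set_associative(13)))          # [2, 3]
```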

Replacement Algorithms

(for associative & set-associative caches; must be implemented in hardware for speed)

- Least Recently Used (LRU)

    Best hit ratio, but too costly for large associativity (see the sketch below)


- First In First Out (FIFO)

    Replace the block that has been in the cache longest


- Least Frequently Used (LFU)

    Replace the block that has had the fewest hits


- Random

    Not intuitive, but shown to perform only slightly worse; used for larger associativity
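A software model of the LRU policy using Python's OrderedDict; as the card says, a real cache does this in hardware, so this only illustrates the policy.

```python
from collections import OrderedDict

# Software model of LRU; a real cache implements this in hardware.
class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()         # oldest entry first

    def access(self, block):
        if block in self.lines:
            self.lines.move_to_end(block)  # hit: now most recently used
            return "hit"
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False) # evict least recently used
        self.lines[block] = True
        return "miss"

c = LRUCache(2)
print([c.access(b) for b in (1, 2, 1, 3, 2)])
# ['miss', 'miss', 'hit', 'miss', 'miss'] -- loading 3 evicts block 2
```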

Write through

- Simplest solution


- All writes go to main memory as well as the cache


- Disadvantages

    Lots of traffic: all write traffic must go to memory

    Slow: the cache is basically useless for writing
Write back

- Updates are initially made in the cache only (no memory write)


- Disadvantages

    A discrepancy exists between cache and memory for some duration

    Requires more complex circuitry; potential bottleneck

(contrast with write-through in the sketch below)
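A toy contrast of the two write policies; the counter stands in for memory-bus traffic. Write-through pays one memory write per store, while write-back defers until eviction (eviction itself is omitted from this sketch).

```python
# Toy contrast of the two policies; counters stand in for bus traffic.
class Cache:
    def __init__(self, write_back):
        self.write_back = write_back
        self.data, self.dirty, self.mem_writes = {}, set(), 0

    def write(self, addr, value):
        self.data[addr] = value    # update the cache line
        if self.write_back:
            self.dirty.add(addr)   # defer memory update to eviction
        else:
            self.mem_writes += 1   # write-through: memory on every store

wt, wb = Cache(write_back=False), Cache(write_back=True)
for _ in range(100):
    wt.write(0x40, 1)
    wb.write(0x40, 1)

print(wt.mem_writes, wb.mem_writes)   # 100 0
```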

Cache coherence solutions

-Directory protocols


- Snoopy protocols

Directory protocols

Bad:

- Central bottleneck and communication overhead


Good:

- Effective in large-scale systems that involve multiple buses or some other complex interconnection scheme

Snoopy Protocols

-Distribute the responsibility among all of the cache controllers in a multiprocessor



-Suited to bus-based multiprocessor because the shared bus provides a simple means for broadcasting and snooping

Unified Cache

Advantages of unified cache:

- Automatic balancing: if instructions or data use more cache, there is no hard boundary


- Simpler design and implementation
Split cache

Advantages of split cache:

- Supports pipelining: Instruction Fetch -> Decode -> Operand Fetch -> Execute


- No contention between instruction fetch and operand fetch in pipelining

Dynamic RAM (DRAM)

- Essentially an analog device


- Simpler construction, less expensive


- Used for main memory


- Needs refreshing

Static RAM (SRAM)

- Faster; used for cache


- Digital device


- No refreshing needed


- More complex, more expensive

DRAM Types - I

- Conventional DRAM



- FPM (Fast page mode) DRAM



- EDO (Extended data-out) DRAM (1995)

DRAM Types - II

- SDRAM (Synchronous) (1997)



-Burst Mode

DRAM Types - III

- SDRAM (more)



-DDR (Double Data Rate) SDRAM



- DDR2 / DDR3 / DDR4 (x2, x4, x8 speed)