213 Cards in this Set

  • Front
  • Back
Dies per wafer =
π × (Wafer diameter / 2)^2 / Die area − π × Wafer diameter / (2 × Die area)^(1/2)
(The second term approximates the partial dies lost along the wafer's edge.)
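As a sanity check, the formula evaluates directly (the wafer and die sizes below are hypothetical):

```python
import math

def dies_per_wafer(wafer_diameter_cm: float, die_area_cm2: float) -> float:
    # Gross dies on the wafer minus an approximation of the partial
    # dies lost along the wafer's circular edge.
    gross = math.pi * (wafer_diameter_cm / 2) ** 2 / die_area_cm2
    edge_loss = math.pi * wafer_diameter_cm / math.sqrt(2 * die_area_cm2)
    return gross - edge_loss

# Hypothetical 30 cm wafer with 1.5 cm^2 dies:
print(int(dies_per_wafer(30, 1.5)))  # → 416
```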
What are the parts of an ISA?
set of instructions (arg fields, assembly syntax, machine encoding), named storage locations (reg&mem), addressing modes (naming locations), types and sizes of operands, control flow instructions, memory-mapped i/o interface
Stack code for

D = A – (B+C)
Push B
Push C
Add
Push A
Sub
Pop D
Accumulator code for

D = A – (B+C)
Load B
Add C
Store X
Load A
Sub X
Store D
Reg-Reg code for

D = A – (B+C)
Load R1, B
Load R2, C
Add R3, R1, R2
Load R4, A
Sub R5, R4, R3
Store R5, D
Reg-Mem code for

D = A – (B+C)
Load R1, B
Add R2, R1, C
Sub R3, A, R2
Store R3, D
Big endian

little endian
Big endian: most significant byte is stored at the lowest address.

Little endian: least significant byte is stored at the lowest address; byte-by-byte memory dumps of strings appear backwards.
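The byte-order difference is easy to see with Python's struct module (the value is arbitrary):

```python
import struct

value = 0x01020304
big = struct.pack(">I", value)     # most significant byte at the lowest address
little = struct.pack("<I", value)  # least significant byte at the lowest address
print(big.hex())     # → 01020304
print(little.hex())  # → 04030201
```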
Calling Conventions:

Caller saves

Callee Saves

MIPS
before the call, the caller saves registers that will be needed later, even if the callee might not use them

inside the call, called procedure saves regs it will overwrite, more efficient if many small procedures

some registers caller-saves, some callee saves for optimal performance
Local vs global optimizations
local is within a basic block, global is across branches
Pipelining
multiple instructions are overlapped. Takes advantage of parallelism. (ILP)
5 Basic Pipestages
fetch (get inst from mem), decode (figure out what to do), read operands, execute, write back
What is pipeline Efficiency
Speedup / num stages
Throughput
Efficiency/Clock period
Structural Hazard
Not enough functional units
Data Hazard
Results of earlier instructions not yet available
Control hazards
decisions from branches are not yet available so we don't know which instruction to execute
Precise exception handling
Exceptions are maintained in correct program order i.e., when an exception occurs all instructions before the exception-causing instruction will be allowed to complete and all the ones behind (including this one) will be killed from the pipeline.

10x slower, easier for integer than FP, useful for debug
Imprecise exception handling
When exceptions are not handled in program order e.g., out-of-order exceptions.

Only correct on most common cases, statistical guarantee of correctness through testing
What 4 properties of the ISA DESIGN can create problems? State the problems with each property.
1) Variable instruction length & runtimes: introduces delays, complicates hazard-detection & precise exceptions

2) Sophisticated addressing modes: post-auto increment complicates hazard-detection, restarting, introduces WAR/WAW hazards

3) Multiple-indirect modes complicate pipeline control and timing

4) Self-modifying code: could overwrite an instruction in the pipe
5 Steps of scoreboarding
1) Fetch instruction from cache
2) Issue inst to exec (when no struct or WAW hazards),
3) Read ops (when no RAW hazards)
4) Execute, notify scoreboard on completion
5) Write (when no WAR hazards)
4 Limiting factors of scoreboarding
1) available parallelism among instrs (crossing basic block can help),
2) number of scoreboard inst table entries,
3) number and types of FUs,
4) presence of name dependences (WAR/WAW hazards)
Pipeline CPI =
ideal CPI + hazard stalls
Forwarding helps with
data hazard stalls
Delayed branches & branch scheduling helps with
control hazard stalls
Dynamic scoreboarding helps with
RAW stalls
Register renaming helps with
WAR & WAW stalls
Branch Prediction helps with
control hazard stalls
Issuing multiple instructions per cycle decreases
ideal CPI
Hardware speculation helps with
data and control hazard stalls
Dynamic memory disambiguation helps with
data hazard mem stalls
Loop unrolling helps with
control hazard stalls
Compiler scheduling helps with
data hazard stalls
Compiler dependence analysis decreases what two things?
ideal CPI

data hazard stalls
Software pipelining & trace scheduling decrease what two things?
ideal CPI, data hazard stalls
Hardware support for compiler speculation decreases what two things?
ideal CPI, data hazard stalls
3 Unrolling considerations
1) decreased returns with each unroll,
2) growth in code size,
3) register pressure
Branch-prediction buffer (branch history table (BHT))
low-order n bits of branch address used to index a table of branch history data. May have collisions between distant branches.
Problems with BHT in RISC. What can fix it?
In fetch, we don’t know whether the instruction is a branch, and we don’t know its target yet; by the time we know this (in ID), we already know whether it’s taken, so there is no time savings. Branch-target buffers (BTBs) fix this.
How can we reduce misprediction frequency?
increase buffer size, use different prediction scheme (correlated)
How to implement a correlated predictor?
An (m,n) correlated predictor keeps the outcomes of the last m branches in an m-bit shift register, which selects among 2^m tables of n-bit predictors indexed by the branch address.
How can we get perfect prediction?

What are the disadvantages?
Take both paths (parallel speculative execution).

Large penalty in area, energy, and clock speed.
5 Advantages of Tomasulo
1) Hardware can handle memory dependences the compiler can’t see, e.g. SW R6, 100(R1); LW R7, 36(R2) may or may not conflict;

2)gets rid of WAW and WAR through register renaming. (Doesn’t get rid of RAW)

3) distributed: hazard det. & inst issue is done per exec unit (scoreboarding goes through central unit)

4) Data results go straight to where they are needed.

5)Loads/stores have their own exec units.
4 Types of Tomasulo Unit Components

What do they each do?
1) Reservation stations (RSs dynamic renaming registers),

2) issue logic (redirects instrs outputs to reservation station slots, results direct to RSs),

3) Distributed hazard detection (handled separately by each FU),

4) load & store buffers (queue memory access requests).
3 Key concepts of tomasulo
1) dynamic scheduling,
2) register renaming,
3) dynamic memory disambiguation
3 Steps in tomasulo
1)Issue: get inst from queue. If RS slot open, send inst, else stall. Send operands to RS if available, else note the names in the RS. Rename the registers.
2)Execute: while operands are not available, monitor CDB. When operands are in RS, start executing.
3)Write results: when result avail & CDB free, write to CDB, then registers, & RS/store slots.
3 Tomasulo Drawbacks
1) complex (lots of hardware), less important as transistors/die increases (imp for low-power processors and chip multiprocessors – CMPs)

2) difficult to perform associative access to many RS entries at high speed

3) CDB can be a limiting factor (multi CDBs possible, but adds overhead in RS write ports)
4 Cases when Tomasulo is Most Useful
1) One needs to run binaries for earlier pipeline implementations

2) Code is difficult to schedule statically – many dependences through memory

3) Not enough programmer-visible regs to do static reg renaming

4) There are many FUs available, and WAR and WAW makes scoreboarding bad
Speculation
Improves ILP by overcoming control dependence: fetch, issue, and execute as if predictions were always correct, under a data-flow execution model with the ability to undo effects. Registers and memory are updated only at commit. Dynamic scheduling alone only fetches and issues this way; speculation also executes. Deals with scheduling across different combinations of basic blocks.
What is ROB?

What is it used for?

What info does it store?
Reorder Buffer

Passes results from instructions that are currently speculated, extends RSs.

Stores: inst type, dest, value, ready. FIFO as issued. Allows undo. Handles exceptions on commit.
4 Steps in Speculative Tomasulo Algorithm:
1. Issue (dispatch): if RS and ROB slot free, issue inst and send ops & ROB no for dest
2. Exec: when both operands ready, exec. If not ready, watch CDB. Checks RAW.
3. Write result: write to CDB to all waiting FUs & ROB; mark RS available
4. Commit: when instr at head of ROB & result present, update register with result (or store to mem) and remove instr from ROB. Mispredicted branches flush ROB.
What does speculation do to memory hazards?
1) avoids WAW and WAR hazards b/c updating occurs in-order

2) RAW hazards are maintained by not allowing load to initiate second step of exec if any active ROB store entry has dest that matches the value of the address field of the load. Maintaining prog order for comp of effective address with respect to all earlier stores
Value prediction
predicts val of load that changes infrequently, only good if value does not change often.
Fine grain thread switching

Advantage:

Disadvantage:
alternate thread per inst. Round-robin, skipping stalled threads.

Can hide stalls.

Slows down exec of individual threads.
Coarse grain thread switching

Advantage:

Disadvantages(2):
alternate when a thread is stalled (L2 cache miss). Advantages, doesn’t need very fast switching.

Doesn’t slow indiv thread.

Disadvantages: losses of shorter stalls, when stall occurs must empty pipe, new thread must fill pipe.
SMT: simultaneous multithreading

Requires:
Large set of virtual regs hold reg sets of threads. Reg renaming gives unique identifiers for multiple threads. Out-of-order completion allows threads to utilize max HW.

1) Large reg file needed.

2) Keeping separate PC and ROB for each thread.

3) Uses fine grained, but can use a preferred thread approach.
What is SISD?

What kind of processors use this?
single instruction stream single data stream – uniprocessors
What is SIMD?

What is its purpose?

What kind of processors use this?
Single instruction stream, multiple data streams – exploits data-level parallelism by applying the same operation to multiple pieces of data at once. A single control processor, with a single instruction memory, dispatches instructions to the data processors. Particularly good for graphics.
What is MISD?
Multiple Instruction Single Data

No commercial ones to date
What is MIMD?

What kind of processors use this?
Multiple Instruction Multiple Data

thread level parallelism. Each processor fetches its own instructions and operates on its own data. Flexible and generally applicable.
What is a Commodity cluster?

What is it used for?
Uses third-party processors and networking.

Web-servers and other applications that require a lot of TLP on separate processes.
What is a Custom cluster?

What is it used for?
Specialized programmer created node designs and/or networking code.

For scientific applications and other problems requiring a lot of power for a single problem.
What is SMP?

When does it become less efficient?

What kind of memory access does it use?
symmetric shared memory multiprocessor: share a single centralized memory, large caches, possibly with several banks. Uses multiple point to point connections or a switch.

Becomes less efficient as the number of processors increases.

Employs UMA (uniform memory access), all processors have the same memory latency
What is a distributed memory multiprocessor?

What kind of memory access does it use?
Multiprocessor with individual nodes containing processor, memory, I/O, and an interconnection interface. Nodes could contain multiple processors. Much more scalable since most memory accesses are local. Reduces the latency for access to the local memory. Data communication between processors can become complex. Uses DSM and NUMA.
What is DSM?
distributed shared memory. Any mem ref can be made by any processor to any location assuming they have the proper access rights. They have a shared address space.
What is NUMA?
Non Uniform Memory Access. Access times depend on the location of a word in memory.
What are the hurdles of parallel programming?
limited parallelism and high cost of communications
Directory based CCP
the sharing status of a block is kept in the shared directory.
Coherence defines
behavior of reads and writes to the same memory location
Consistency defines
behavior of reads and writes with respect to accesses to other memory locations
Cache coherence protocols (CCP)
hardware implementation that allows migration and replication
What is Snooping?

What does it require?
Every cache that has a copy of the data also has a copy of the sharing status, no centralized state is kept.

Requires a broadcast mechanism via bus or switch. Cache controllers snoop the bus to see if they have a copy of the requested block. Uses a pre-existing physical connection (bus). Broadcasting makes it simple, but limits scalability.
What is write invalidate protocol?

What does it require to serialize writes?
Ensures exclusive access before writing to a location. Most common protocol.

Requires serialized invalidate access to the bus to serialize writes. If a processor has a dirty copy of the block on snooping invalidate on the bus, the value is supplied in response and causes mem access to be aborted. Existing cache tags and valid bit are used for snooping, add shared state bit.

Absence of centralized structure is both main advantage and prevents scalability.
What is write update (broadcast) protocol?

What is its disadvantage?
Sends writes to all cache lines containing the block. Requires lots of bandwidth.
What is a true sharing miss?
It arises from the communication of data through the cache coherence mechanism. The word being read is invalid.
What is a false sharing miss?
Caused by the invalidation-based coherence mechanism's single valid bit per cache block: another word in the block was written, not the word being read.
What is directory based cache coherency protocol?
Single location for each block’s information. When shared, one directory has a vector for each word indicating which other processors have a copy of the word.
Cache Coherency Protocol States:
Shared
Uncached
Modified
One or more processors have the block cached and the val in mem is up to date

No processor has a copy of the block cache

One processor has a copy of the cache block, and it has written the block, so the value in memory is out of date. The processor is called the owner.
Nodes in directory based cache coherency protocol
Local
Home
Remote
Node where a request originates

Node where the memory location and the directory entry of an address reside. Address space is statically distributed, so the directory for a particular address is always the same.

Node that has a copy of the block in cache.
What are Load linked/store conditional?

How are they used?
Assembly commands

Used together, if the value is changed after LL but before SC, the SC fails. Used for locking. Can insert reg-reg instructions between and check if they were done atomically.
Assembly code for a LL/SC spin lock:
lockit: LL R2, 0(R1)
BNEZ R2, lockit ;not avail, spin
DADDUI R2,R0,#1 ;locked value
SC R2,0(R1) ;store
BEQZ R2,lockit ;branch if fails
What is a Data race free program?
A fully synchronized program
What does sequential consistency require?
Requires that the result of any execution be the same as if the memory accesses within a processor act as if they are executed in order and the accesses among different processors were arbitrarily interleaved. This could be done by requiring that the processor delay all memory accesses until the completion of any invalidations caused by that access. We could also delay the next memory access until the previous one has completed.
Total store ordering (Processor consistency) relaxes what?
Relaxes W->R consistency but maintains write consistency.
Weak ordering (release consistency) relaxes what?
Relaxes R->W and R->R consistency.
What is Strict ordering?
A read to a memory location returns the most recent write
What is relative speedup?
Comparison of the same program
What is true speedup?
Comparisons of the best available version of the program for each platform
What is atomic exchange?
Interchanging a value in register for a value in memory
How does test and set work?
Tests the value and sets it if the value passes a test for atomic operation
How does fetch and increment work?
Returns a value of a mem location and increments it.
What is cache?

What principle supports the use of cache?
Originally meant the highest or first level of memory hierarchy once the address leaves the processor. Now applied whenever buffering is employed to reuse commonly occurring items. May be SRAM or fast DRAM.

Works on the Principle of Locality (temporal and spatial).
What is a cache block?
A fixed size collection of data in the cache
What is the principle of temporal locality?
Words used now are likely to be needed again in the near future.
What is the principle of spatial locality?
When a memory location is accessed, it is likely that other nearby memory locations will be accessed soon.
Formula for Cache size
block size * num sets * set associativity
Block size =
2^num offset bits
Num sets =
2^num index bits
#tag bits
#mem address bits – num index bits – num offset bits
LRU Bits =
set size! orderings; e.g. 4-way set associative has 4! = 24 orderings, requiring ceil(log2 24) = 5 bits per set
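The four geometry formulas above combine into one helper; the 64 KB, 4-way, 64-byte-block, 32-bit-address configuration below is a made-up example:

```python
import math

def cache_geometry(cache_bytes, block_bytes, assoc, addr_bits):
    num_sets = cache_bytes // (block_bytes * assoc)     # cache size / (block * assoc)
    offset_bits = int(math.log2(block_bytes))           # block size = 2^offset bits
    index_bits = int(math.log2(num_sets))               # num sets = 2^index bits
    tag_bits = addr_bits - index_bits - offset_bits
    lru_bits = math.ceil(math.log2(math.factorial(assoc)))  # per-set LRU state
    return num_sets, offset_bits, index_bits, tag_bits, lru_bits

print(cache_geometry(64 * 1024, 64, 4, 32))  # → (256, 6, 8, 18, 5)
```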
What is cache Latency?
time to retrieve the first word of the block
What is cache bandwidth?
Determines time to retrieve a block after the first word is found
What does in-order-execution do?
blocks/stalls all instructions until data is available
How does out-of-order-execution work?
instruction using the result of a cache miss must wait, but other instructions can continue
What is virtual memory?
cache objects residing on disk, broken into fixed sized blocks called pages.
What is a page fault?
When a processor references an item in a page that is not in cache or main memory. The entire page is moved from disk to main memory.
CPU execution time =
(CPU clock cycles + Memory stall cycles)*Clock cycle time
Memory stalls =
IC * Mem accesses per instruction * miss rate * miss penalty
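Both formulas above can be checked with a short calculation (all the workload numbers below are hypothetical):

```python
def cpu_time(ic, base_cpi, accesses_per_inst, miss_rate, miss_penalty, clock_s):
    # CPU execution time = (CPU clock cycles + memory stall cycles) * clock cycle time
    cpu_cycles = ic * base_cpi
    stall_cycles = ic * accesses_per_inst * miss_rate * miss_penalty
    return (cpu_cycles + stall_cycles) * clock_s

# 1M instructions, CPI 1.0, 1.5 accesses/inst, 2% miss rate,
# 100-cycle miss penalty, 1 GHz clock: 1M + 3M = 4M cycles total.
t = cpu_time(1_000_000, 1.0, 1.5, 0.02, 100, 1e-9)
print(round(t * 1e3, 2), "ms")  # → 4.0 ms
```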
Where is a block placed in direct mapped cache?
block has only one place it can be in the cache, block address mod num blocks in cache
Where is a block placed in fully associative cache?
block can be anywhere in the cache
Where is a block placed in set associative cache?
block can only be in a restrictive set of locations. Block is mapped onto a set, then the block can be anywhere in that set.
What is Direct Mapped cache?
1-way set associative. All blocks have a specific location using block address MOD number frames (sets) in cache. No LRU replacement can be used, every block just maps to a specific frame. Faster to find a block (faster clock cycle), but more likely to have a miss because of temporal locality.
What is fully associative cache?

What is its advantage?

What is its disadvantage?
There is only one set spanning all frames. To find a block we must search the whole cache for a matching tag, but a block can be placed in any empty frame or the LRU frame.

Less likely to have a miss

Slower to find a block (slower clock cycle)
What is a cache set?
a group of blocks in the cache, usually chosen by bit selection.
What is the formula for bit selection?
(Block address) MOD (Number of sets in cache)
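With a power-of-two number of sets, the MOD is just the low-order bits of the block address (a minimal sketch):

```python
def set_index(block_address: int, num_sets: int) -> int:
    # Bit selection: (block address) MOD (number of sets in cache).
    return block_address % num_sets

# With 16 sets, the index is simply the low-order 4 bits:
assert set_index(0b1101_0110, 16) == 0b0110
print(set_index(90, 16))  # → 10
```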
What is the Block offset?
Used to select the desired data from the block. Should not be used in the comparison, since the entire block is either present or not.
What is the tag field used for?
compared against for a hit
What is the index field used for?
Selects the set. Not used in comparison since it would be redundant since the index was used to select the set in the first place.
What is LRU block replacement?
Record when each block is used, replace the one that was used longest ago. Can be expensive.
What is FIFO block replacement?
Just replace the oldest block. Cheaper than LRU.
What is a write through cache?
Data is written to both the block in the cache and the lower level memory at the same time. Easier to implement than write back, cache is always clean. Simplifies data coherency.
What is a write back cache?
Data written to cache only, it is written to main memory when it is replaced. Writes occur at the speed of cache memory. Multiple writes require only one lower level write. Uses less mem bandwidth, good for multiprocessors. Also uses less power.
What does the dirty bit indicate on a block?
Bit indicating whether the block has been modified while in the cache. If it is clean, there’s no reason to write it back on a miss, since it already exists in the lower memory.
What is a write stall?
processor waiting for writes to complete during write through
What is a write buffer?
Reduces write stall delays, allows overlapping of processor execution with memory updating
What is write allocate?
Block is allocated on a write miss. Write misses act like read misses.
What is no-write allocate
Write misses do not affect the cache. Block is modified only in lower level memory. Blocks stay out of cache until the processor tries to read the block.
What is the advantage of separating instruction and data caches?

What is the disadvantage?
Adv: can optimize for each, double the bandwidth.

Disadv: more complex design, can’t dynamically adjust space
What is a compulsory miss?
A miss that will occur even if you had an unlimited cache. Such as the very first access to a block.
What is a capacity miss?
If all the blocks needed during execution of a program cannot be kept in cache, there will be capacity misses. This will result in blocks being discarded and later retrieved.
What is a conflict miss?
If the cache is not fully associative, blocks will be discarded because another block will be put into the same set.
What 4 cache optimizations reduce the miss rate?
Larger block size (compulsory)
Bigger cache (capacity)
Higher associativity (conflict)
Compiler optimizations
What 4 cache optimizations reduce miss penalty?
Multilevel caches
Critical word first
Merging write buffers
Giving priority to read misses over writes
What 4 cache optimizations reduce hit time?
Small and simple caches

Way prediction

Trace caches

Avoid address translation during indexing of cache
What 3 cache optimizations increase cache bandwidth?
Pipelined, multibanked, and nonblocking caches
What can hardware prefetching and compiler prefetching do for caches?
Reduce miss penalty or miss rate via parallelism
What is way prediction in caches?
use prediction bits in the cache to predict the next block to be accessed (85% accuracy)
What is hit under miss in caches?
continue to supply cache hits during a miss.
What is sequential interleaving in caches?
A method for deciding where to put blocks in multibanked caches. Divide the cache into banks, locations being address modulo number banks. That way if we have a request for sequential memory blocks, 90..94, we can serve up as many blocks as we have banks at once.
What is critical word first in caches?
Request the missed word first from memory and send it to the processor as soon as it arrives. Processor continues execution while filling the rest of the words in the block.
What is early restart in caches?
Fetch words in normal order, but as soon as the requested word arrives, send it to the processor so the proc can continue.
What is write merging in caches?
If a write buffer already contains data for an address that is being written, combine that data with the entry.
Describe the following compiler cache optimizations:
Loop interchange
Loop fusion
Blocking
Make the loop access words sequentially, not skip around

Take advantage of row major or column major order

Divide the accesses into blocks of size B
What is SRAM?

How many transistors per bit?

What happens in standby mode?

What does it emphasize?
static ram; don’t need to refresh so the access time is close to the cycle time.


Six transistors per bit to prevent info from being disturbed when read.

Needs only minimal power to maintain charge in standby mode.

Emphasizes speed. 8-16 times faster than DRAM and 8-16 times as expensive.
What is DRAM?

How many transistors per bit?

What does it emphasize?

How do you refresh the bits?

How much of the total time is used refreshing the bits?
dynamic ram; requires data be written back after being read. Requires cycle time be greater than access time so that address lines are stable between accesses. Also requires a refresh.

Use a single transistor to store a bit. Reading the bit destroys the info, so it must be restored. To prevent loss when not being read, the bit must be periodically refreshed by reading the row.

Emphasizes cost per bit and capacity. 4-8 times the capacity of SRAM. Organized as a rectangular matrix, strobe 1 is RAS, strobe 2 is CAS.

Reading the row refreshes all bits in that row. Number of steps in a refresh is the square root of the capacity (rows). Should be less than 5% of total time.
What is a DIMM?
dual inline memory module; contains 4-16 DRAMs organized to be 8 bytes wide in desktops.
What is fast page mode?
Repeatedly accessing the row buffer without another row access time.
What does Synchronous DRAM add?
Added clock signal to interface so that repeated transfers would not bear synchronization overhead.
What is DDR?
Double data rate. Transfers data on the rising and falling edge of the clock signal. Doubles peak data rate. Activates multiple banks internally.
PC2100 =
DDR266 =
133 MHz × 2 × 8 bytes = 2128 MB/sec, marketed as 2100

133 MHz DDR chip (transferring on both the rising and falling edge)
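The naming arithmetic is easy to verify (the DDR266/PC2100 figures come from the card above):

```python
clock_mhz = 133           # DDR266 base clock
transfers_per_cycle = 2   # DDR: data on both rising and falling clock edges
bus_width_bytes = 8       # 64-bit DIMM data path
peak_mb_s = clock_mhz * transfers_per_cycle * bus_width_bytes
print(peak_mb_s)  # → 2128, marketed as PC2100
```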
What is a Virtual Machine?

Is it safer than a full OS? Why?
An efficient, isolated duplicate of the real machine in complete control of system resources.

Safer because it is a smaller code base so that there are less bugs and higher security.
What is protection via virtual memory?

Is it safe?

What 4 things does it require from the OS?
Protects processes from each other.

Not safe enough because OS may have bugs.

1: two modes – user and OS processes (kernel or supervisor).
2: provide processor state that are read only for user processes.
3: provide mechanisms for processor to change from user to kernel access.
4: provide mechanisms for limiting mem access without swapping the process to disk on context switch. Usually done by adding protection to each page of VMem.
What is the process space?
a programs living space. The program itself plus any state needed.
What is a Translation lookaside buffer (TLB)?
Allows avoiding address translation during indexing of cache to reduce hit time. Translates virtual address to physical address to access memory.
What is a System virtual machine?

What are the 2 advantages?
VMs running ISAs that match the hardware.

Benefits include managing software (old OSes or Beta OSes), managing hardware (sharing hardware resources).
What is a VM Monitor (VMM)?

What service does it perform?
Software that supports VMs.

It presents a software interface to guest software, isolates the state of guests from each other, and protects itself from guest software. Behave as if it were running on native hardware (except performance).
What is virtualizable hardware?
Hardware that allows VMs to execute directly on hardware.
What is real/machine memory?

What are used to map virtual mem to real mem?

what maps real memory to physical memory?
The intermediate level between virtual mem and physical mem.

The guest OS's page tables

The VMM's page tables
What is a shadow page table?
It's a page table used by VMM to map directly from guest virtual address space to physical address space, skipping the intermediate real memory.
What is Paravirtualization?
Allowing small modifications to the guest OS to simplify virtualization. Eg. A guest OS could assume a real memory as large as its virtual memory so that no mem management is required by the guest OS.
Cache index =
2^(index bits) = cache size / (block size × set associativity); e.g. 512 sets = 2^9 → 9 index bits
What are the three disk metrics?
Bits per inch (BPI), tracks per inch (TPI), and areal density (bits per square inch) = BPI × TPI
What is the Reliability of N Disks?
Reliability of one disk / N (without redundancy, an array of N disks is N times less reliable than one disk)
How much power does a disk take?
Diameter^4.6 × RPM^2.8 × number of platters
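Both disk formulas can be applied numerically (the drive parameters below are hypothetical):

```python
def array_reliability(single_disk_reliability: float, n_disks: int) -> float:
    # Without redundancy, reliability of N disks = reliability of 1 disk / N.
    return single_disk_reliability / n_disks

def relative_disk_power(diameter_in: float, rpm: float, platters: int) -> float:
    # Power scales as Diameter^4.6 * RPM^2.8 * number of platters.
    return diameter_in ** 4.6 * rpm ** 2.8 * platters

print(array_reliability(1.0, 4))  # → 0.25
# Halving RPM (other factors equal) cuts power by 2^2.8, roughly 7x:
ratio = relative_disk_power(3.5, 15000, 4) / relative_disk_power(3.5, 7500, 4)
print(round(ratio, 2))  # → 6.96
```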
What is RAID?

Why is it more dependable?
Redundant array of inexpensive disks.

Can be more dependable because MTTF is in years and MTTR is in hours. Unless more than one disk fails within the MTTR, it should be fine.
What is RAID 0?
no redundancy, data may be striped across the disks
What is RAID 1?
mirroring or shadowing, two copies of every piece of data. May optimize read by reading parts from each disk, but may take longer for writes because both disks must have all data at all times.
What is RAID 2?
memory-style error correcting code in disks.
What is RAID 3?
High level interfaces figure out which disk failed. When a failure occurs, you “subtract” the good data from the good blocks, and what remains is the missing data (parity). Data is spread across all disks. Single parity disk.
What is RAID 4?
Allow each disk to perform independent small reads. Small writes are slower than small reads, but reads have low overhead as RAID 3. Single parity disk.
What is RAID 5?
Distributes parity info across all disks in the array, removing the bottleneck of raid 4 needing to read/write the same check disk.
What is RAID 6?
Uses two blocks per stripe of data and row diagonal parity to recover from more than 1 failure at a time. Recover along the diagonal, then the data recovered can be used to recover horizontally.
What is a storage failure?
When actual behavior deviates from specified behavior
What is a fault?
The cause of an error.
What is a latent error?
The error caused when the fault occurs (but not yet encountered)
What is an effective error?
When a latent error is activated
What is error latency?
The time between when an error occurs (latent error) and when it is activated (effective error)
What is a hardware fault?
A device failure (eg hit by an alpha particle)
What is a design fault?
Faults in software (occurs more than in hardware)
What is an operation fault?
Fault occurring from a mistake made by maintenance personnel
What is an environmental fault?
Fire, flood, earthquake, sabotage, etc
What is a transient fault?
Exists for a limited time and does not recur
What is an intermittent fault?
System oscillates between faulty and fault free
What is a permanent fault?
Fault that is not corrected over time
What is the storage response time?
queue + device service time
What does Linux emphasize in storage?

What does Solaris emphasize?

What does Windows emphasize?
performance over data availability (auto 1 hr recovery)

data availability over performance (auto 10 min recovery)

Favors neither (manual 23 min recovery)
What is Little's law?
mean number of tasks = arrival rate * mean response time
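Applying the law is a one-liner (the queue numbers below are hypothetical):

```python
def mean_tasks_in_system(arrival_rate_per_s: float, mean_response_s: float) -> float:
    # Little's law: mean number of tasks = arrival rate * mean response time.
    return arrival_rate_per_s * mean_response_s

# Hypothetical disk queue: 50 I/Os per second, 20 ms mean response time.
print(mean_tasks_in_system(50, 0.020))  # → 1.0
```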
branch folding
For unconditional branches, when it's basically just a jump. Instead of evaluating the branch, just execute the target. This only works if all of the target instructions can be stored in the BTB.

When the branch-target buffer signals a hit and indicates that the branch is unconditional, the pipeline can simply substitute the instruction from the branch-target buffer in place of the instruction that is returned from the cache (which is the unconditional branch).

The point of this is to achieve a 0-cycle branch.
branch target buffer
Buffer that contains the calculated targets of branches for if they are going to be taken. If the prediction is that the branch is taken, use the target (stored) PC to start fetching the next instructions.
Compulsory miss:
First access of data in a block
Capacity miss:
Kicked out because the cache is full
Conflict miss:
Due to set associativity, kick out because something else has taken the spot
Coherency miss:
Miss because of coherency mechanism (invalidated)
False sharing miss:
The block was set invalid because another processor wrote a different word in the shared block, so this word's line was invalidated as well.
True sharing miss:
Coherency miss where another Processor has written the line
TTS lock using exch
try: li R2,#1
lockit: lw R3,0(R1) ;load var
bnez R3,lockit ;not free=>spin
exch R2,0(R1) ;atomic exchange
bnez R2,try ;already locked?
LLSC Atomic Swap
try: mov R3,R4 ; mov exchange value
ll R2,0(R1) ; load linked
sc R3,0(R1) ; store conditional
beqz R3,try ; branch store fails (R3 = 0), no store
mov R4,R2 ; put load value in R4
LLSC Fetch and Increment
try: ll R2,0(R1) ; load linked
addi R2,R2,#1 ; increment (OK if reg–reg)
sc R2,0(R1) ; store conditional
beqz R2,try ; branch store fails (R2 = 0)
Advantages and Disadvantages of the following ISAs
Answer:
2-bit Predictor Table
2-bit Predictor Table
Draw the 2-bit predictor diagram
2-bit predictor diagram
Pieces of FP Tomasulo Diagram:
1) Load buffers
2) FP Operation Queue
3) FP Registers
4) Store Buffers
5) Reservation Stations
6) FP Adders
7) FP Multipliers
8) CDB (Common Data Bus)
Tomasulo Diagram
kernel tests
small, key pieces of real applications; (better because they are real progs)
toy programs
100-line programs from beginning programming assignments, such as quicksort
synthetic benchmarks
fake programs invented to try to match the profile and behavior of real applications, such as Dhrystone
Normalized arithmetic mean
the average of the execution times, divide by a particular one to normalize
What is the Geometric mean formula?

2 reasons it is a good measure?

Why is it a bad measure?
Mean = (sample[1] × sample[2] × … × sample[n])^(1/n)

consistent no matter which machine is the Base

alleviates the problems from outliers

not related to actual execution time; rewards easy enhancements equally (reducing 2 s to 1 s counts the same as reducing 200 s to 100 s)
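The consistency property (same answer no matter which machine is the base) is easy to demonstrate (the ratios below are hypothetical):

```python
def geometric_mean(samples):
    product = 1.0
    for s in samples:
        product *= s
    return product ** (1 / len(samples))

ratios_vs_a = [2.0, 8.0]                    # runtimes normalized to machine A
ratios_vs_b = [1 / r for r in ratios_vs_a]  # same data normalized to machine B
print(geometric_mean(ratios_vs_a))  # → 4.0
print(geometric_mean(ratios_vs_b))  # → 0.25, exactly 1/4.0
```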
Normalized geometric mean
the geometric mean of the programs normalized to a base machine
Speedup
Amdahl's law: 1 / (fraction enhanced / speedup enhanced + (1 − fraction enhanced))

General: original time / new time

Pipeline: (num stages × num instructions) / (num stages + (num instructions − 1))
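All three speedup formulas can be exercised in a few lines (the workload fractions below are hypothetical):

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    # Amdahl's law, in the form given on the card above.
    return 1 / (fraction_enhanced / speedup_enhanced + (1 - fraction_enhanced))

def pipeline_speedup(num_stages: int, num_instructions: int) -> float:
    return (num_stages * num_instructions) / (num_stages + num_instructions - 1)

print(round(amdahl_speedup(0.5, 10), 3))    # → 1.818; half the work sped up 10x
print(round(pipeline_speedup(5, 1000), 2))  # → 4.98; approaches the 5-stage ideal
```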
CPU time =
Instruction count * CC time * CPI
MIPS
Inst count / (Exec time × 10^6) = Clock rate / (CPI × 10^6)

Not an accurate performance measure: it varies with the instruction mix and cannot compare different instruction sets.
Cost of die
Cost of wafer / (Dies per wafer × Die yield)
Cost of integrated circuit
(Cost of die + Cost of testing die + Cost of packaging and final test) / Final test yield
Die yield =
Wafer yield × (1 + (Defects per unit area × Die area) / a)^(−a)

a ≈ 4
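The yield and cost formulas chain together naturally (the defect density, wafer cost, and die count below are hypothetical):

```python
def die_yield(wafer_yield: float, defects_per_cm2: float, die_area_cm2: float,
              a: float = 4.0) -> float:
    # Die yield = wafer yield * (1 + (defects per unit area * die area) / a)^(-a)
    return wafer_yield * (1 + defects_per_cm2 * die_area_cm2 / a) ** -a

def cost_of_die(wafer_cost: float, dies_per_wafer: int, yield_: float) -> float:
    # Cost of die = cost of wafer / (dies per wafer * die yield)
    return wafer_cost / (dies_per_wafer * yield_)

y = die_yield(1.0, 0.4, 1.5)
print(round(y, 3))                          # → 0.572
print(round(cost_of_die(5000, 400, y), 2))  # → 21.86
```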