60 Cards in this Set

  • Front
  • Back

SISD

von Neumann architecture

MISD

Introduced fault tolerance

SIMD

Early parallelism


Vector processing


Takes advantage of data parallelism


Synchronous


Arch. for GPUs



Speedup

SU(p) = T1/Tp


Measures how much we're speeding up program execution with parallelization

Linear speedup

SU(p) = p


Very unlikely, 'perfect' scalability

Efficiency

Ef(p) = SU(p)/p


Measures how well we're utilizing hardware resources

How do we represent a computation?

Directed Acyclic Graph (DAG)

Amdahl's Law

Speedup will never be perfect



There will always be part of a program that must be serial



Maximum expected improvement based on the # of processors



T1 = W/r



Tp = (fW/r) + ((1 - f)W)/(pr)
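
A minimal sketch in plain C (hypothetical serial fraction of 10%) of how these expressions reduce to the usual Amdahl bound SU(p) = 1 / (f + (1 - f)/p), with efficiency alongside:

    #include <stdio.h>

    /* Amdahl's Law: speedup with serial fraction f on p processors.
       SU(p) = T1/Tp = 1 / (f + (1 - f)/p)  (the W/r terms cancel). */
    double amdahl_speedup(double f, int p) {
        return 1.0 / (f + (1.0 - f) / p);
    }

    int main(void) {
        double f = 0.10;   /* hypothetical: 10% of the work is serial */
        for (int p = 1; p <= 64; p *= 2)
            printf("p=%2d  SU=%.2f  Ef=%.2f\n",
                   p, amdahl_speedup(f, p), amdahl_speedup(f, p) / p);
        return 0;
    }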


True Error

True Error = True Value - Approximate Value

Relative True Error

Relative True Error = True Error / True Value

Approximate Error

Approximate Error = Present Approximation - Previous Approximation

Relative Approximate Error

Relative Approximate Error = Approximate Error / Present Approximation
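
A small worked sketch in C of the four error definitions above, using hypothetical values (22/7 as the present approximation of pi, 3.0 as the previous one):

    #include <stdio.h>

    int main(void) {
        double true_value  = 3.14159265358979;   /* "true" value */
        double prev_approx = 3.0;                /* previous approximation */
        double approx      = 22.0 / 7.0;         /* present approximation */

        double true_err       = true_value - approx;
        double rel_true_err   = true_err / true_value;
        double approx_err     = approx - prev_approx;
        double rel_approx_err = approx_err / approx;

        printf("true error             = %g\n", true_err);
        printf("relative true error    = %g\n", rel_true_err);
        printf("approximate error      = %g\n", approx_err);
        printf("relative approx. error = %g\n", rel_approx_err);
        return 0;
    }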


Round Off Error

Caused by the computer's ability to compute with only a certain (finite) precision

Truncation Error

Caused by the programmer approximating a number or truncating a result

Foster's Methodology

Process to turn a serial program into a parallel one



Partitioning - Divide serial program into small tasks


Communication - Identify communication needs between tasks


Aggregation - Combine tasks where it makes sense, minimize communication


Mapping - Map tasks to processes/threads and share the load



Iterative process until we find the best design

Gustafson-Barsis Law

Serial time stays fixed, parallel time scales inversely with n



Scaled Speedup: SU(p) = p - s(p - 1), where s is the serial fraction of the run time




MPI_Init

Tells MPI to do the necessary setup; called before any other MPI function



Should only be called once

MPI_Finalize

Cleans up resources allocated for the program, called after all MPI functions are called



Should only be called once

MPI_COMM_WORLD

MPI communicator, collection of processes that can send messages to each other

MPI_Comm_size

MPI_Comm_size(MPI_COMM_WORLD, &num_of_processes)



Used to get the number of processes our program is running

MPI_Comm_rank

MPI_Comm_rank(MPI_COMM_WORLD, &my_rank)



Gets the rank of the process and sticks it in my_rank
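
A minimal MPI skeleton (C) tying MPI_Init, MPI_Comm_size, MPI_Comm_rank, and MPI_Finalize together; the printed message is just an illustration:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[]) {
        int comm_sz, my_rank;

        MPI_Init(&argc, &argv);                    /* set up MPI (once) */
        MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);   /* number of processes */
        MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);   /* this process's rank */

        printf("Hello from rank %d of %d\n", my_rank, comm_sz);

        MPI_Finalize();                            /* clean up (once) */
        return 0;
    }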

MPI_Send

MPI_Send(message_buffer, message_size, message_type, destination, tag, communicator)

MPI_Recv

MPI_Recv(message_buffer, message_size, message_type, source, tag, communicator, status)

MPI_Send & MPI_Recv message matching

Tags & communicators must match on MPI_Send/MPI_Recv



MPI_Recv always blocks until a matching message is received
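
A minimal point-to-point sketch (C): rank 1 sends one double to rank 0; the tag and communicator must match, and MPI_Recv blocks until the matching message arrives:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[]) {
        int my_rank;
        double x = 0.0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

        if (my_rank == 1) {
            x = 42.0;
            MPI_Send(&x, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
        } else if (my_rank == 0) {
            MPI_Recv(&x, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank 0 received %f\n", x);
        }

        MPI_Finalize();
        return 0;
    }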

What does OpenMPI stand for?

Open Message Passing Interface

Scalable Parallel System

Efficiency can be kept constant as the number of processors increases

Strongly Scalable System

Keep efficiency fixed, add processors/threads without increasing the problem size

Weakly Scalable System

Keep efficiency fixed, increase problem size as we increase the number of processors/threads

IEEE Floating Point Standard

32 bits



Bit 31 is the sign bit


Bits 30-23 are the exponent; the exponent affects range


Bits 22-0 are the mantissa; the mantissa affects precision
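
A short C sketch that pulls the three fields out of a 32-bit float's bit pattern, assuming the single-precision layout described above (the value -6.25f is just a hypothetical number to inspect):

    #include <stdio.h>
    #include <string.h>
    #include <stdint.h>

    int main(void) {
        float f = -6.25f;                /* hypothetical value to inspect */
        uint32_t bits;
        memcpy(&bits, &f, sizeof bits);  /* reinterpret the float's bits */

        uint32_t sign     = bits >> 31;            /* bit 31 */
        uint32_t exponent = (bits >> 23) & 0xFF;   /* bits 30-23 */
        uint32_t mantissa = bits & 0x7FFFFF;       /* bits 22-0 */

        printf("sign=%u exponent=%u (unbiased %d) mantissa=0x%06X\n",
               (unsigned)sign, (unsigned)exponent,
               (int)exponent - 127, (unsigned)mantissa);
        return 0;
    }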

Given an OpenMPI program running 4 processes, draw a tree-structured summation diagram.

0   1   2   3
|  /    |  /
0       2
 \     /
  \   /
    0

(1 sends to 0 and 3 sends to 2; then 2 sends its partial sum to 0, which holds the total)



Point to Point Communications

MPI_Send & MPI_Recv


Matched on tags and communicators

Collective Communication

MPI_Reduce, MPI_Allreduce, MPI_Bcast, MPI_Scatter, MPI_Gather, MPI_Allgather, MPI_Barrier



All processes must call the same collective function, or processes will crash or hang



No tags, matched on communicators + order in which they're called




Arguments passed must be "compatible", or else more hanging and crashing
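
A small collective sketch (C): every rank contributes a local value, MPI_Reduce sums them onto rank 0, and MPI_Bcast then shares the total with everyone (MPI_Allreduce would do both in one call); the local values are hypothetical:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[]) {
        int my_rank;
        double local, total = 0.0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

        local = my_rank + 1.0;   /* hypothetical local result */

        /* every process must make the same collective calls, in the same order */
        MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        MPI_Bcast(&total, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);

        printf("rank %d sees total = %f\n", my_rank, total);

        MPI_Finalize();
        return 0;
    }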


Denormalized

An exponent field of all zeros marks a denormalized value



All-zero or all-one exponent fields denote special values:


Zero


denormalized


infinity


NaN

Round Off Error

Determined by number of bits in mantissa

Loss of Precision

When adding two floating point numbers, the operands must be shifted to align the radix points



The number with the smaller exponent is denormalized to match the larger exponent

Add these two numbers, floating point style, show loss of precision:


1.250 x 10^2


2.000 x 10^-2

1.250|0 x 10^2


0.000|2 x 10^2



The digits past the "|" fall outside the 4-digit mantissa, so 125.0 + 0.02 = 125.0
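
The same effect in C (an analogous illustration with different numbers, not the exact decimal example above): in single precision, adding a value that falls below the larger operand's last mantissa bit leaves it unchanged:

    #include <stdio.h>

    int main(void) {
        float big   = 1.0e8f;   /* float carries only ~7 significant decimal digits */
        float small = 1.0f;     /* falls below the last mantissa bit of 'big' */
        float sum   = big + small;

        printf("big + small == big ? %s\n", (sum == big) ? "yes" : "no");
        return 0;
    }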

Machine Epsilon

Defined as a measure of floating point accuracy


Found as the difference between 1 and the next number that can be represented




32 bits for single precision

Value = (-1)^(sign bit) * (1.mantissa) * 2^(exponent - 127)
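
A common sketch (C) for finding machine epsilon by halving until 1 + eps is no longer distinguishable from 1; FLT_EPSILON from <float.h> gives the standard single-precision value for comparison:

    #include <stdio.h>
    #include <float.h>

    int main(void) {
        float eps = 1.0f;
        /* keep halving while 1 + eps/2 is still distinguishable from 1 */
        while ((float)(1.0f + eps / 2.0f) > 1.0f)
            eps /= 2.0f;

        printf("computed epsilon = %g\n", eps);        /* ~1.19e-07 */
        printf("FLT_EPSILON      = %g\n", FLT_EPSILON);
        return 0;
    }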

Positive Overflow

Trying to represent a number larger than the maximum positive representable value

Negative overflow

Trying to represent a negative number whose magnitude exceeds that of the most negative representable value

Positive Underflow

Trying to represent a number smaller than the minimum positive representable value

Negative Underflow

Trying to represent a negative number whose magnitude is smaller than that of the smallest (closest to zero) representable negative value

Subtract these two numbers, floating point style, show loss of precision:


125.68


125.31

1.256 x 10^2


1.253 x 10^2



0.003 x 10^2



0.3 (the exact answer is 0.37; the trailing digits were lost when the operands were stored with only 4 digits)

Block Partitioning

Assign blocks of consecutive components to each process

Cyclic Partitioning

Assign components in a round robin fashion

Block-Cyclic Partitioning

Use cyclic distribution of blocks of components

Partition the following array in a block manner, cyclic manner, and block-cyclic manner across 2 processors. For block-cyclic, use a block size of 2



array = [21, 22, 23, 24]


Block partitioning


0 | 21, 22


1 | 23, 24



Cyclic Partitioning


0 | 21, 23


1 | 22, 24



Block-Cyclic Partitioning


0 | 21, 22


1 | 23, 24
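
One way to express the three distributions as owner formulas (a C sketch; it assumes n is divisible by p, and by p*b for block-cyclic):

    #include <stdio.h>

    /* Which process owns element i under each distribution? */
    int block_owner(int i, int n, int p)        { return i / (n / p); }
    int cyclic_owner(int i, int p)              { return i % p; }
    int block_cyclic_owner(int i, int b, int p) { return (i / b) % p; }

    int main(void) {
        int a[] = {21, 22, 23, 24};
        int n = 4, p = 2, b = 2;
        for (int i = 0; i < n; i++)
            printf("%d: block->P%d  cyclic->P%d  block-cyclic->P%d\n",
                   a[i], block_owner(i, n, p), cyclic_owner(i, p),
                   block_cyclic_owner(i, b, p));
        return 0;
    }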

MPI_Scatter

Used to distribute data to processes from an array or vector



Divides data into pieces



First piece goes to process 0



Performs a BLOCK DISTRIBUTION

MPI_Gather

Collects array data from different processes



Receives from a block distribution

MPI_Allgather

Both gathers and distributes data (butterfly structure)
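
A sketch (C) of a block distribution with MPI_Scatter and the matching MPI_Gather; the global vector length n = 8 is hypothetical and assumed divisible by the number of processes:

    #include <stdio.h>
    #include <stdlib.h>
    #include <mpi.h>

    int main(int argc, char *argv[]) {
        int comm_sz, my_rank;
        const int n = 8;                 /* hypothetical global vector length */

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);
        MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

        int local_n = n / comm_sz;       /* assumes n divisible by comm_sz */
        double *vec = NULL, *local = malloc(local_n * sizeof(double));

        if (my_rank == 0) {              /* only rank 0 holds the full vector */
            vec = malloc(n * sizeof(double));
            for (int i = 0; i < n; i++) vec[i] = i;
        }

        /* block distribution: first piece to rank 0, next to rank 1, ... */
        MPI_Scatter(vec, local_n, MPI_DOUBLE, local, local_n, MPI_DOUBLE,
                    0, MPI_COMM_WORLD);

        for (int i = 0; i < local_n; i++) local[i] *= 2.0;   /* local work */

        /* collect the blocks back onto rank 0, in rank order */
        MPI_Gather(local, local_n, MPI_DOUBLE, vec, local_n, MPI_DOUBLE,
                   0, MPI_COMM_WORLD);

        if (my_rank == 0) {
            for (int i = 0; i < n; i++) printf("%g ", vec[i]);
            printf("\n");
            free(vec);
        }
        free(local);
        MPI_Finalize();
        return 0;
    }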



Async Communication

Non-blocking


All MPI communication calls have an asynchronous counterpart (e.g., MPI_Isend / MPI_Irecv)
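
A small non-blocking sketch (C) using MPI_Isend / MPI_Irecv, the asynchronous counterparts of MPI_Send / MPI_Recv, with MPI_Wait to complete the requests:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[]) {
        int my_rank;
        double out, in = 0.0;
        MPI_Request req;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
        out = my_rank;

        if (my_rank == 0) {
            MPI_Isend(&out, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, &req);
            /* ... could overlap computation here ... */
            MPI_Wait(&req, MPI_STATUS_IGNORE);
        } else if (my_rank == 1) {
            MPI_Irecv(&in, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &req);
            /* ... could overlap computation here ... */
            MPI_Wait(&req, MPI_STATUS_IGNORE);
            printf("rank 1 received %f\n", in);
        }

        MPI_Finalize();
        return 0;
    }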




Performance evaluation

Elapsed Time (wall-clock time), not CPU or system time



We want idle process time included if it's part of the parallel overhead



No I/O

How do we get timing information with OpenMPI?

MPI_Wtime - returns elapsed time in seconds
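
The usual timing pattern (C sketch): a barrier so all processes start together, MPI_Wtime before and after the timed region, and a reduction to report the slowest process's elapsed time:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[]) {
        int my_rank;
        double start, finish, local_elapsed, elapsed;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

        MPI_Barrier(MPI_COMM_WORLD);   /* start everyone together */
        start = MPI_Wtime();

        /* ... parallel work being timed (no I/O) ... */

        finish = MPI_Wtime();
        local_elapsed = finish - start;

        /* the run is only as fast as its slowest process */
        MPI_Reduce(&local_elapsed, &elapsed, 1, MPI_DOUBLE, MPI_MAX,
                   0, MPI_COMM_WORLD);
        if (my_rank == 0)
            printf("Elapsed time = %e seconds\n", elapsed);

        MPI_Finalize();
        return 0;
    }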


Give good and bad examples of data distribution.

BAD - loop 10000 times calling MPI_Send with one element at a time



GOOD - call MPI_Send once with all 10000 elements

What is BLAS?

BLAS stands for Basic Linear Algebra Subprograms


It is a standardized library of linear algebra routines


Give the different BLAS level standards

Level 1 = vector to vector routines


Level 2 = Matrix to vector routines


Level 3 = Matrix to Matrix routines
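
A sketch of one representative call per level through the CBLAS C interface (cblas_ddot, cblas_dgemv, cblas_dgemm are the standard C bindings; the exact header and link flags depend on the BLAS implementation installed):

    #include <cblas.h>

    void blas_levels_example(int n,
                             const double *x, const double *y,   /* vectors */
                             const double *A, const double *B,   /* n x n matrices */
                             double *v, double *C) {
        /* Level 1: vector-vector, e.g. dot product x . y */
        double d = cblas_ddot(n, x, 1, y, 1);
        (void)d;

        /* Level 2: matrix-vector, v = 1.0*A*x + 0.0*v */
        cblas_dgemv(CblasRowMajor, CblasNoTrans, n, n,
                    1.0, A, n, x, 1, 0.0, v, 1);

        /* Level 3: matrix-matrix, C = 1.0*A*B + 0.0*C */
        cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, n, n, n,
                    1.0, A, n, B, n, 0.0, C, n);
    }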

What is Odd Even Transposition Sort? List out what odd and even phases look like with an array.

Variant of bubble sort, parallelized



Even phase:


(a[0], a[1]), (a[2], a[3])



Odd phase:


(a[1], a[2]), (a[3], a[4])



Assign a block of keys to each process


Sort keys


Have two processes exchange all keys



We might hang in deadlock if partner processes both block in sends; pairing the exchange safely (e.g., with MPI_Sendrecv) avoids this
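
A serial version of the odd and even phases (C sketch); in the parallel version each compare-swap becomes a key exchange between neighboring processes:

    #include <stdio.h>

    static void swap(int *a, int *b) { int t = *a; *a = *b; *b = t; }

    /* Serial odd-even transposition sort: n phases of pairwise compare-swaps. */
    void odd_even_sort(int a[], int n) {
        for (int phase = 0; phase < n; phase++) {
            if (phase % 2 == 0) {            /* even phase: (a[0],a[1]), (a[2],a[3]), ... */
                for (int i = 1; i < n; i += 2)
                    if (a[i - 1] > a[i]) swap(&a[i - 1], &a[i]);
            } else {                         /* odd phase: (a[1],a[2]), (a[3],a[4]), ... */
                for (int i = 1; i < n - 1; i += 2)
                    if (a[i] > a[i + 1]) swap(&a[i], &a[i + 1]);
            }
        }
    }

    int main(void) {
        int a[] = {5, 9, 4, 3, 7, 1};
        int n = sizeof a / sizeof a[0];
        odd_even_sort(a, n);
        for (int i = 0; i < n; i++) printf("%d ", a[i]);
        printf("\n");
        return 0;
    }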






IJK forms

Inner product model:


ijk and jik



Middle product model:


ikj and jki



Outer product model:


kij and kji
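
The inner product (ijk) form as a baseline C loop nest; the other orderings permute the three loops while keeping the same update C[i][j] += A[i][k] * B[k][j]:

    /* Inner product (ijk) form of C = A * B for n x n matrices.
       The innermost loop computes a dot product of row i of A and column j of B. */
    void matmul_ijk(int n, const double *A, const double *B, double *C) {
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) {
                double sum = 0.0;
                for (int k = 0; k < n; k++)
                    sum += A[i * n + k] * B[k * n + j];
                C[i * n + j] = sum;
            }
    }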



SPMD

Single program multiple data


A single program contains all the code for all tasks


Can implement both task and data parallelism
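
A tiny SPMD sketch (C with MPI): every rank runs the same program, and branching on the rank gives different ranks different tasks over different data; the printed roles are illustrative:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[]) {
        int my_rank, comm_sz;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
        MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);

        if (my_rank == 0)
            printf("rank 0: coordinating (task parallelism)\n");
        else
            printf("rank %d of %d: working on its slice of the data"
                   " (data parallelism)\n", my_rank, comm_sz);

        MPI_Finalize();
        return 0;
    }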

Load balancing

Each process and thread gets the same amount of work


No execution unit should be idle



Minimize synchronization & communication overhead