
Database Management Systems

Unit 4 – Parallel DBMS

George Klington.A 1
What is a Parallel Database?

A database system that seeks to improve performance by parallelizing its operations, such as loading data, building indexes, and evaluating queries.

Parallel Database Management Systems

 Why parallel DBMS?

 Architecture

 Parallelism

– Intraquery parallelism
– Interquery parallelism
– Intraoperation parallelism
– Interoperation parallelism

Performance of a database system

 Throughput: the number of tasks finished in a given time interval
 Response time: the amount of time to finish a single task from the time it is submitted

Can we do better given more resources (CPU, disk, …)?

Parallel DBMS: exploring parallelism
 Divide a big problem into many smaller ones to be solved in parallel
→ improved performance

[Figure: parallel DBMS (processors P with memories M and databases DB, joined by an interconnection network) vs. traditional DBMS (query in, answer out, a single DBMS over one DB)]

Why parallel DBMS?

 Improve performance:
Parallel DBMSs almost died out 15 years ago; interest has been renewed because of
– Extremely large databases -- data collected from the Web
– Decision support queries -- costly on large data
– Hardware that has become cheap
– The set-oriented nature of relational DBs, which lends itself to parallelization
 Improve reliability and availability: when one processor goes down, the others can keep working

Shared Memory Architecture

[Figure: processors connected through an interconnection network to a globally shared memory and to data storage]

Shared Disk Architecture

[Figure: processors, each with its own memory, connected through an interconnection network to shared data storage]

Shared Nothing Architecture

[Figure: an interconnection network linking processors, each with its own memory and data storage]

Best architecture for large systems
Provides linear speed up
Provides linear scale up
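
The slide does not define speed up or scale up. A minimal sketch, assuming the usual definitions (speed up: the same job on a larger system; scale up: a proportionally larger job on a proportionally larger system). The function names and timings are hypothetical, chosen only for illustration.

```python
def speed_up(small_system_time: float, large_system_time: float) -> float:
    """Elapsed time of a job on the small system divided by the elapsed
    time of the SAME job on the larger system."""
    return small_system_time / large_system_time


def scale_up(small_job_small_system_time: float,
             large_job_large_system_time: float) -> float:
    """Elapsed time of a job on the small system divided by the elapsed
    time of an N-times larger job on an N-times larger system."""
    return small_job_small_system_time / large_job_large_system_time


n = 4              # hypothetical: the large system has 4x the processors
base_time = 120.0  # hypothetical: seconds on the small system

su = speed_up(base_time, 30.0)    # same job, 4x the hardware
sc = scale_up(base_time, 120.0)   # 4x the job, 4x the hardware

is_linear_speed_up = (su == n)    # linear speed up: N times faster
is_linear_scale_up = (sc == 1.0)  # linear scale up: elapsed time unchanged
```

Under these assumed definitions, linear speed up means a ratio equal to N and linear scale up means a ratio of 1.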

Parallelism

 Pipeline parallelism: the output of operation A is consumed by another operation B before A has produced its entire output
 Data-partitioned parallelism: many machines perform the same operation on different pieces of data
– Intraquery, interquery, intraoperation, interoperation
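
The pipelined dataflow can be sketched with Python generators. This is an illustration only: generators interleave A and B on a single thread, so the sketch shows B consuming rows before A has finished, not true concurrent execution. The operator names `scan` and `select_even` are hypothetical.

```python
log = []  # records the order in which the two operators touch each row


def scan(rows):
    """Operation A: produce rows one at a time."""
    for row in rows:
        log.append(f"A produced {row}")
        yield row


def select_even(rows):
    """Operation B: consume each row as soon as A yields it."""
    for row in rows:
        log.append(f"B consumed {row}")
        if row % 2 == 0:
            yield row


result = list(select_even(scan([1, 2, 3, 4])))
```

The log alternates between A and B, which is the point: B starts working on the first row while A still has rows left to produce.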

Data Partitioning…

Round-robin partitioning
• Efficient for queries that access the entire set of data

Range partitioning
• Can lead to data skew

Hash partitioning
• Data is evenly distributed
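
The three schemes can be sketched as follows. The function names, the integer keys, and the range boundaries are assumptions made for illustration; a real system would use its own hash function and choose boundaries from the data distribution.

```python
from bisect import bisect_right


def round_robin_partition(rows, n):
    """Send the i-th row to partition i mod n."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def range_partition(rows, boundaries):
    """Send a row to the partition whose key range contains it.
    `boundaries` is a sorted list of split points."""
    parts = [[] for _ in range(len(boundaries) + 1)]
    for row in rows:
        parts[bisect_right(boundaries, row)].append(row)
    return parts


def hash_partition(rows, n):
    """Send a row to partition hash(key) mod n."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row) % n].append(row)
    return parts


keys = [1, 2, 3, 4, 5, 6, 7, 8, 50, 90]  # hypothetical key values

rr_sizes = [len(p) for p in round_robin_partition(keys, 3)]
range_sizes = [len(p) for p in range_partition(keys, [10, 60])]
hash_sizes = [len(p) for p in hash_partition(keys, 3)]
```

With these sample keys, the badly chosen range boundaries put eight of the ten rows in one partition (data skew), while round-robin and hash partitioning spread them almost evenly.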

Interquery vs. Intraquery parallelism

 Interquery: different queries or transactions execute in parallel
 Intraquery: a single query executes in parallel on multiple processors
– Interoperation: different operations in the query's operator tree execute in parallel
– Intraoperation: parallelize the same operation on different subsets of the same relations
• Parallel sorting
• Parallel join
• Selection, projection, aggregation
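
As one example of intraoperation parallelism, a parallel join can be sketched by hash-partitioning both relations on the join key and joining matching partitions independently. This is a sketch under stated assumptions: threads stand in for processors, tuples carry the join key in position 0, and the names `hash_partition`, `local_join`, and `parallel_join` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def hash_partition(rows, n):
    """Partition tuples on the join key (position 0)."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[0]) % n].append(row)
    return parts


def local_join(r_part, s_part):
    """Hash join of one pair of partitions on position 0."""
    index = {}
    for r in r_part:
        index.setdefault(r[0], []).append(r)
    return [r + s[1:] for s in s_part for r in index.get(s[0], [])]


def parallel_join(r, s, n=3):
    """Join each pair of matching partitions on its own worker."""
    r_parts = hash_partition(r, n)
    s_parts = hash_partition(s, n)
    with ThreadPoolExecutor(max_workers=n) as pool:
        local_results = list(pool.map(local_join, r_parts, s_parts))
    # Merge the local results; sorting only makes the output order stable.
    return sorted(t for part in local_results for t in part)


R = [(1, "a"), (2, "b"), (3, "c")]
S = [(1, "x"), (3, "y"), (3, "z"), (4, "w")]
joined = parallel_join(R, S)
```

Because both relations are partitioned with the same function, tuples with equal keys always land in the same partition pair, so no worker needs data held by another.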

Intraoperation parallelism -- loading/projection

π_A(R), where R is partitioned across n processors

 Read tuples of R at all processors involved, in parallel
 Conduct projection on tuples
 Merge local results
– Duplicate elimination: via sorting
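
The steps above can be sketched as follows, assuming rows are dictionaries and that threads stand in for the n processors. The names `project` and `parallel_project` and the sample data are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def project(partition, attrs):
    """Local step: keep only the requested attributes of each tuple."""
    return [tuple(row[a] for a in attrs) for row in partition]


def parallel_project(partitions, attrs):
    # Read and project every partition in parallel.
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        local = list(pool.map(lambda p: project(p, attrs), partitions))
    # Merge the local results, then eliminate duplicates via sorting:
    # after the sort, equal tuples are adjacent.
    merged = sorted(t for part in local for t in part)
    distinct = []
    for t in merged:
        if not distinct or distinct[-1] != t:
            distinct.append(t)
    return distinct


partitions = [
    [{"A": 1, "B": "x"}, {"A": 2, "B": "y"}],
    [{"A": 2, "B": "z"}, {"A": 3, "B": "w"}],
    [{"A": 1, "B": "v"}],
]
projected = parallel_project(partitions, ["A"])
```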

Interoperation parallelism

Execute different operations in a single query in parallel

Consider R1 ⋈ R2 ⋈ R3 ⋈ R4
 Pipelined:
– temp1 ← R1 ⋈ R2
– temp2 ← R3 ⋈ temp1
– result ← R4 ⋈ temp2
 Independent:
– temp1 ← R1 ⋈ R2
– temp2 ← R3 ⋈ R4
– result ← temp1 ⋈ temp2 -- pipelining
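
The independent plan can be sketched as follows: the first two joins share no input, so they run on separate workers, and the final join combines their outputs. This is a sketch under stated assumptions: threads stand in for processors, relations are lists of dictionaries, `join` is a hypothetical nested-loop natural join, and the schemas R1(a, b), R2(b, c), R3(c, d), R4(d, e) are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def join(left, right):
    """Natural join on whatever attribute names the two inputs share."""
    out = []
    for l in left:
        for r in right:
            shared = l.keys() & r.keys()
            if all(l[k] == r[k] for k in shared):
                out.append({**l, **r})
    return out


R1 = [{"a": 1, "b": 10}, {"a": 2, "b": 20}]
R2 = [{"b": 10, "c": 100}, {"b": 20, "c": 200}]
R3 = [{"c": 100, "d": 7}, {"c": 300, "d": 8}]
R4 = [{"d": 7, "e": "x"}, {"d": 8, "e": "y"}]

# Independent: temp1 and temp2 are computed at the same time.
with ThreadPoolExecutor(max_workers=2) as pool:
    f1 = pool.submit(join, R1, R2)
    f2 = pool.submit(join, R3, R4)
    temp1, temp2 = f1.result(), f2.result()
independent_result = join(temp1, temp2)

# Same query in the pipelined join order, run sequentially here
# only to check that both plans give the same answer.
pipelined_result = join(R4, join(R3, join(R1, R2)))
```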

Thank You !!!
