









## Taking Advantage of Locality

- Memory hierarchy
- Store everything on disk
- Copy recently accessed (and nearby) items from disk to smaller DRAM memory
   Main memory
- Copy more recently accessed (and nearby) items from DRAM to smaller SRAM memory
  - Cache memory attached to CPU

<u>×</u> M<























| Ca                                                                            | che   | E> | am      | ole                                     |               |  |  |  |  |  |  |
|-------------------------------------------------------------------------------|-------|----|---------|-----------------------------------------|---------------|--|--|--|--|--|--|
| <ul><li>8-blocks, 1 word/block, direct mapped</li><li>Initial state</li></ul> |       |    |         |                                         |               |  |  |  |  |  |  |
|                                                                               | Index | V  | Tag     | Data                                    |               |  |  |  |  |  |  |
|                                                                               | 000   | N  | Ť       |                                         |               |  |  |  |  |  |  |
|                                                                               | 001   | N  |         |                                         |               |  |  |  |  |  |  |
|                                                                               | 010   | N  |         |                                         |               |  |  |  |  |  |  |
|                                                                               | 011   | N  |         |                                         |               |  |  |  |  |  |  |
|                                                                               | 100   | Ν  |         |                                         |               |  |  |  |  |  |  |
|                                                                               | 101   | N  |         |                                         |               |  |  |  |  |  |  |
|                                                                               | 110   | N  |         |                                         |               |  |  |  |  |  |  |
|                                                                               | 111   | N  |         |                                         |               |  |  |  |  |  |  |
| ⊼ M<                                                                          |       |    | Chapter | 5 — Large and Fast: Exploiting Memory H | ierarchy — 12 |  |  |  |  |  |  |



| Ca | che                                                             | E>     | camp        | le    |                |                  |
|----|-----------------------------------------------------------------|--------|-------------|-------|----------------|------------------|
|    | Word                                                            | addr   | Binary ad   | dr    | Hit/miss       | Cache block      |
|    | 22                                                              |        | 10 110      |       | Miss           | 110              |
|    | Index         V           000         N           001         N |        | Tag         | Dat   | a              |                  |
|    | 010<br>011                                                      | N<br>N |             |       |                |                  |
|    | 100                                                             | N      |             |       |                |                  |
|    | 101<br>110                                                      | N<br>Y | 10          | Me    | m[10110]       |                  |
|    | 111                                                             | Ν      |             |       |                |                  |
| M< |                                                                 |        | Chapter 5 - | – Lar | ge and Fast: E | xploiting Memory |

| Са   | che    | E>   | amp       | ble    | •              |                      |          |
|------|--------|------|-----------|--------|----------------|----------------------|----------|
|      | Word a | ıddr | Binary a  | ddr    | Hit/miss       | Cache block          |          |
|      | 26     | 26   |           | 11 010 |                | 010                  |          |
|      | Index  | V    | Tag       | Dat    | a              |                      |          |
|      | 000    | N    |           |        |                |                      |          |
|      | 001    | N    |           |        |                |                      |          |
|      | 010    | Y    | 11        | Me     | m[11010]       |                      |          |
|      | 011    | Ν    |           |        |                |                      |          |
|      | 100    | Ν    |           |        |                |                      |          |
|      | 101    | Ν    |           |        |                |                      |          |
|      | 110    | Y    | 10        | Me     | m[10110]       |                      |          |
|      | 111    | Ν    |           |        |                |                      |          |
| K MK |        |      | Chapter 5 | — Lar  | ge and Fast: E | xploiting Memory Hie | rarchy — |



| Cac | che     | Ex  | amp         | le    |                |                  |  |  |  |
|-----|---------|-----|-------------|-------|----------------|------------------|--|--|--|
| Г   | Word a  | ldr | Binary ad   | dr    | Hit/miss       | Cache block      |  |  |  |
| F   | 22      | 101 | 10 110      | ui    | Hit            | 110              |  |  |  |
| F   | 26      |     | 11 010      |       | Hit            | 010              |  |  |  |
| -   |         |     |             |       |                |                  |  |  |  |
| 1   | Index V |     | Tag         | Dat   | ta             |                  |  |  |  |
| (   | 000     | N   |             |       |                |                  |  |  |  |
| (   | 001     | Ν   |             |       |                |                  |  |  |  |
| (   | 010     | Y   | 11          | Mer   | n[11010]       |                  |  |  |  |
| (   | 011     | Ν   |             |       |                |                  |  |  |  |
| ŀ   | 100     | Ν   |             |       |                |                  |  |  |  |
| •   | 101     | Ν   |             |       |                |                  |  |  |  |
| ·   | 110     | Y   | 10          | Mer   | n[10110]       |                  |  |  |  |
|     | 111     | Ν   |             |       |                |                  |  |  |  |
| 1<  |         |     | Chapter 5 - | – Lar | ge and Fast: E | xploiting Memory |  |  |  |



| Ca | che     | E>   | amp       | le    | ļ              |                  |  |  |  |
|----|---------|------|-----------|-------|----------------|------------------|--|--|--|
|    |         |      | 1         |       |                |                  |  |  |  |
|    | Word a  | lddr | Binary ad | ddr   | Hit/miss       | Cache block      |  |  |  |
|    | 16      |      | 10 000    | )     | Miss           | 000              |  |  |  |
|    | 3       | 3    |           |       | Miss           | 011              |  |  |  |
|    | 16      |      | 10 000    | )     | Hit            | 000              |  |  |  |
|    | Index V |      | Tag       | Dat   | ita            |                  |  |  |  |
|    | 000     | Y    | 10        | Me    | m[10000]       |                  |  |  |  |
|    | 001     | N    |           |       |                |                  |  |  |  |
|    | 010     | Y    | 11        | Me    | Mem[11010]     |                  |  |  |  |
|    | 011     | Y    | 00        | Me    | Mem[00011]     |                  |  |  |  |
|    | 100     | N    |           |       |                |                  |  |  |  |
|    | 101     | N    |           |       |                |                  |  |  |  |
|    | 110     | Y    | 10        | Me    | m[10110]       |                  |  |  |  |
|    | 111     | N    |           |       |                |                  |  |  |  |
| M< |         |      | Chapter 5 | — Lar | ge and Fast: E | xploiting Memory |  |  |  |

| <br> |
|------|
|      |
|      |
|      |
|      |
|      |
|      |
|      |
|      |
|      |
|      |

| Ca | che    | E>  | amp       | le    | !              |                  |  |  |  |  |
|----|--------|-----|-----------|-------|----------------|------------------|--|--|--|--|
|    | Word a | ddr | Binary ad | dr    | Hit/miss       | Cache block      |  |  |  |  |
|    | 18     |     | 10 010    | )     | Miss           | 010              |  |  |  |  |
|    | Index  | V   | Tag       | Dat   | ata            |                  |  |  |  |  |
|    | 000    |     |           |       |                |                  |  |  |  |  |
|    |        | ·   | 10        | we    | Mem[10000]     |                  |  |  |  |  |
|    | 001    | N   |           |       |                |                  |  |  |  |  |
|    | 010    | Y   | 10        | Me    | Mem[10010]     |                  |  |  |  |  |
|    | 011    | Y   | 00        | Me    | m[00011]       |                  |  |  |  |  |
|    | 100    | N   |           |       |                |                  |  |  |  |  |
|    | 101    | N   |           |       |                |                  |  |  |  |  |
|    | 110    | Y   | 10        | Me    | m[10110]       |                  |  |  |  |  |
|    | 111    | N   |           |       |                |                  |  |  |  |  |
| M< |        |     | Chapter 5 | – Lar | ge and Fast: E | xploiting Memory |  |  |  |  |





















|                                              | Word<br>Address                                              | Cache<br>Line index                      | Hit/<br>Miss                                         | n: Conflict Misses                                                                                                                        |
|----------------------------------------------|--------------------------------------------------------------|------------------------------------------|------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------|
| Pgm at<br>Pgm at<br>1024,<br>data at<br>37:  | 1024<br>37<br>1025<br>38<br>1026<br>39<br>1024<br>37         | 0<br>37<br>1<br>38<br>2<br>39<br>0<br>37 | HIT<br>HIT<br>HIT<br>HIT<br>HIT<br>HIT               | Assume:<br>1024-line DM cache<br>Block size = 1 word<br>Consider looping code, in<br>steady state<br>Assume WORD, not BYTE,<br>addressing |
| op B:<br>'gm at<br>1024,<br>data at<br>2048: | 1024<br>2048<br>1025<br>2049<br>1026<br>2050<br>1024<br>2048 | 0<br>0<br>1<br>2<br>2<br>0<br>0          | MISS<br>MISS<br>MISS<br>MISS<br>MISS<br>MISS<br>MISS | Inflexible mapping (each<br>address can only be in one<br>cache location) → Conflict<br>misses!                                           |





















| Sto      | opping point                                                 |
|----------|--------------------------------------------------------------|
|          |                                                              |
|          |                                                              |
|          |                                                              |
|          |                                                              |
|          |                                                              |
| <u> </u> | Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 33 |











|                                                                                                                                                                                  | A | sso     | ciati | vity     | Exa         | amp         | le              |                   |  |  |
|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---|---------|-------|----------|-------------|-------------|-----------------|-------------------|--|--|
| <ul> <li>Compare 4-block caches</li> <li>Direct mapped, 2-way set associative, fully associative</li> <li>Block access sequence: 0, 8, 0, 6, 8</li> <li>Direct mapped</li> </ul> |   |         |       |          |             |             |                 |                   |  |  |
|                                                                                                                                                                                  |   | Block   | Cache | Hit/miss |             | Cache conte | nt after access | 3                 |  |  |
|                                                                                                                                                                                  |   | address | index |          | 0           | 1           | 2               | 3                 |  |  |
|                                                                                                                                                                                  |   | 0       | 0     | miss     | Mem[0]      |             |                 |                   |  |  |
|                                                                                                                                                                                  |   | 8       | 0     | miss     | Mem[8]      |             |                 |                   |  |  |
|                                                                                                                                                                                  |   | 0       | 0     | miss     | Mem[0]      |             |                 |                   |  |  |
|                                                                                                                                                                                  |   | 6       | 2     | miss     | Mem[0]      |             | Mem[6]          |                   |  |  |
|                                                                                                                                                                                  |   | 8       | 0     | miss     | Mem[8]      |             | Mem[6]          |                   |  |  |
| Ž                                                                                                                                                                                | M | <       |       | Chapter  | 5 — Large a | nd Fast: Ex | ploiting Memo   | ory Hierarchy — : |  |  |



















































|                                                                                                                                                                                                         | Encoding SEC |              |      |    |     |       |       |     |       |       |        |       |        |     |      |                |  |
|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------|--------------|------|----|-----|-------|-------|-----|-------|-------|--------|-------|--------|-----|------|----------------|--|
| <ul> <li>To calculate Hamming code:</li> <li>Number bits from 1 on the left</li> <li>All bit positions that are a power 2 are parity bits</li> <li>Each parity bit checks certain data bits:</li> </ul> |              |              |      |    |     |       |       |     |       |       |        |       |        |     |      |                |  |
|                                                                                                                                                                                                         |              | Bit positi   | ion  | 1  | 2   | 3     | 4     | 5   | 6     | 7     | 8      | 9     | 10     | 11  | 12   |                |  |
|                                                                                                                                                                                                         |              | Encoded date | bits | p1 | p2  | d1    | p4    | d2  | d3    | d4    | p8     | d5    | d6     | d7  | d8   |                |  |
|                                                                                                                                                                                                         |              |              | p1   | Х  |     | х     |       | х   |       | х     |        | х     |        | х   |      |                |  |
|                                                                                                                                                                                                         |              | Parity       | p2   |    | Х   | Х     |       |     | Х     | Х     |        |       | х      | Х   |      |                |  |
|                                                                                                                                                                                                         |              | coverate     | p4   |    |     |       | Х     | Х   | Х     | Х     |        |       |        |     | Х    |                |  |
|                                                                                                                                                                                                         |              |              | p8   |    |     |       |       |     |       |       | Х      | Х     | х      | Х   | Х    |                |  |
| Ň                                                                                                                                                                                                       | M<           |              |      |    | Cha | apter | · 5 — | Lar | ge ar | nd Fa | ast: E | Explo | oiting | Mer | nory | Hierarchy — 55 |  |















- User and System modes
- Privileged instructions only available in system mode
  - Trap to system if executed in user mode
- All physical resources only accessible using privileged instructions
  - Including page tables, interrupt controls, I/O registers

- Renaissance of virtualization support
  - Current ISAs (e.g., x86) adapting

**≤** Μ<





































## **Memory Protection**

- Different tasks can share parts of their virtual address spaces
  - But need to protect against errant access
  - Requires OS assistance
- Hardware support for OS protection
  - Privileged supervisor mode (aka kernel mode)
  - Privileged instructions

M<

- Page tables and other state information only accessible in supervisor mode
- System call exception (e.g., syscall in MIPS)





|                                   | 1                                             |                 |
|-----------------------------------|-----------------------------------------------|-----------------|
| Associativity                     | Location method                               | Tag comparisons |
| Direct mapped                     | Index                                         | 1               |
| n-way set<br>associative          | Set index, then search entries within the set | n               |
| Fully associative                 | Search all entries                            | #entries        |
|                                   | Full lookup table                             | 0               |
| Virtual memory<br>Full table look | arisons to reduce cost                        | vity feasible   |









| Design change          | Effect on miss rate         | Negative performance effect                                                                             |
|------------------------|-----------------------------|---------------------------------------------------------------------------------------------------------|
| Increase cache size    | Decrease capacity<br>misses | May increase access time                                                                                |
| Increase associativity | Decrease conflict<br>misses | May increase access time                                                                                |
| Increase block size    | Decrease compulsory misses  | Increases miss<br>penalty. For very large<br>block size, may<br>increase miss rate<br>due to pollution. |

















|    | Са           | che Cohe                                            | rence              | Prob               | em                  |
|----|--------------|-----------------------------------------------------|--------------------|--------------------|---------------------|
|    | ad           | ippose two CPU<br>dress space<br>Write-through cach |                    | are a phys         | sical               |
|    | Time<br>step | Event                                               | CPU A's<br>cache   | CPU B's<br>cache   | Memory              |
|    | 0            |                                                     |                    |                    | 0                   |
|    | 1            | CPU A reads X                                       | 0                  |                    | 0                   |
|    | 2            | CPU B reads X                                       | 0                  | 0                  | 0                   |
|    | 3            | CPU A writes 1 to X                                 | 1                  | 0                  | 1                   |
|    |              | -                                                   |                    |                    |                     |
| M< | M<           | Chapte                                              | er 5 — Large and F | Fast: Exploiting M | emory Hierarchy — 8 |

















| Characteristic   | ARM Cortex-A8                                                                                                     | Intel Core i7                                                                                                                              |
|------------------|-------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------|
| Virtual address  | 32 bits                                                                                                           | 48 bits                                                                                                                                    |
| Physical address | 32 bits                                                                                                           | 44 bits                                                                                                                                    |
| Page size        | Variable: 4, 16, 64 KiB, 1, 16 MiB                                                                                | Variable: 4 KiB, 2/4 MiB                                                                                                                   |
| TLB organization | 1 TLB for instructions and 1 TLB<br>for data                                                                      | 1 TLB for instructions and 1 TLB for<br>data per core                                                                                      |
|                  | Both TLBs are fully associative,<br>with 32 entries, round robin<br>replacement<br>TLB misses handled in hardware | Both L1 TLBs are four-way set<br>associative, LRU replacement<br>L1 I-TLB has 128 entries for small<br>pages, 7 per thread for large pages |
|                  |                                                                                                                   | L1 D-TLB has 64 entries for small<br>pages, 32 for large pages<br>The L2 TLB is four-way set associative<br>LRU replacement                |
|                  |                                                                                                                   | The L2 TLB has 512 entries                                                                                                                 |
|                  |                                                                                                                   | TLB misses handled in hardware                                                                                                             |

















