Memory Mapping
The mapping from the virtual address space to physical address space is complicated and goes through several intermediate address spaces.
The bits in an address can be split into bits that address the page, the cache line, the vector line, the word and the byte. By 'vector line' we mean 'number of lanes * word size'.
|--- Page ---|--- Cache line --|--- Vector line ---|--- Word --- |--- Byte ---|
Virtual Address -> Alpha Physical Address
The standard mapping done by the TLB. Only changes the page bits.
Alpha Physical Address -> Beta Physical Address
This is a permutation of the bytes in a vector line. It depends on the 'element-width' configuration of the page, which is a configuration variable stored in a page table. It modifies the 'word' and 'byte' address bits. It will be discussed more later.
Beta Physical Address -> Gamma Physical Address
This is a permutation of the order of the words in a vector line. The purpose of this is to arrange the words in the grid so that words close in the vector line are also close in the 2-dimensional mesh. It modifies the 'word' address bits.
Gamma Physical Address -> (DRAM instance, DRAM address)
Maps the address space into the address space of the individual DRAMs. It extracts some of the 'word' bits to define the DRAM instance, and the rest are used to define the DRAM address.
Caching
The cache is distributed across the lanes. Each word of memory from a DRAM is bound to a specific lane, which is responsible for caching it. All requests for that word of memory must pass through that lane. This makes memory accesses fast and efficient when the accesses are aligned to the vector line width, and very inefficient when they are not.
Alpha <-> Beta Mapping
In the RISC-V vector extension the elements are arranged in a vector consecutively. This means that if we have 64-bit words and 16-bit elements and a 4 word vector we would have.
|---Word 1--|---Word 2--|---Word 3--|---Word 4--|
e0 e1 e2 e3 e4 e5 e6 e7 e8 e9 eA eB eC eD eE eF
And if we have another vector with elements of size 32-bit using LMUL=2 it would be arranged as
|---Word 1--|---Word 2--|---Word 3--|---Word 4--|
f0 f1 f2 f3 f4 f5 f6 f7
f8 f9 fA fB fC fD fE fF
This is a problem if we want to add these two vectors. Adding e0 with f0 is no problem, adding e1 and f1 is ok, but when we try to add e2 with f2 we see that they are in different lanes, and so we would have to do lane-to-lane communication to support a simple vector add.
We can't change how the elements are laid out in the address space since that is defined in the riscv vector extension, however we can change how they are physically laid out in the memory. Each page is assigned an 'element-width' property, which determines how the address space maps to the bytes of the physical page.
For example if the page has an 'element-width' of 16-bits then the bytes are arranged as
|----Word 1-------------|----Word 2-------------|----Word 3-------------|----Word 4-------------|
address 00 01 08 09 10 11 18 19 02 03 0A 0B 12 13 1A 1B 04 05 0C 0D 14 15 1C 1D 06 07 0E 0F 16 17 1E 1F
Such that when a elements of width 16 (matching the page element-width) are placed in the memory they are arranged as:
Page layout with element width 16 bits
|----Word 1-------------|----Word 2-------------|----Word 3-------------|----Word 4-------------|
address 00 01 08 09 10 11 18 19 02 03 0A 0B 12 13 1A 1B 04 05 0C 0D 14 15 1C 1D 06 07 0E 0F 16 17 1E 1F
element e0 e4 e8 eC e1 e5 e9 eD e2 e6 eA eE e3 e7 eB eF
Page layout with element width 32 bits
|----Word 1-------------|----Word 2-------------|----Word 3-------------|----Word 4-------------|
address 00 01 02 03 10 11 12 13 04 05 06 07 14 15 16 17 08 09 0A 0B 18 19 1A 1B 0C 0D 0E 0F 1C 1D 1E 1F
element f0 f4 f1 f5 f2 f6 f3 f7
f8 fC f9 fD fA fE fB fF
We can see that if the if a vector of elements of width 16 is placed in a page with element-width 16, and a vector with elements of width 32 is placed in a page with element-width 32 then the elements contained in each physical word are the same.
Beta <-> Gamma Mapping
The lanes, when considered in address space order, are laid out in a Moore curve on the 2D mesh. The words of the address space are interleaved across the lanes following this pattern. This means that two words close in the address space will also be close physically on the mesh.
Gamma <-> DRAM Mapping
Each lane group has a dedicated memlet which communicates with a dedicated DRAM. The words of a DRAM cache line are interleaved across the lanes of that lane group.