In general terms, cache is a store which are fast accessible and where you keep things which are frequently accessed. It can be a website, browser, or processor cache. In this post, we are focusing on memory caches.
A small and fast memory which stores some block of data being used by process. The block of cached data can be something which is recently used or something which is close to what was recently accessed. There are different techniques how data is cached and moved out like least recently used, round robin, least frequently used etc. With advent of changes in processor technologies and memory chips, level of caches has gone up from 1 to 2 to 3. Multiple levels can be there between processor and memory
Why do we need to cache?
There is a huge difference between processing speed of processors and memories which are usually DRAM. So, if it takes more time to fetch instruction or data from memory, overall throughput of machine goes down. That’s why we need faster SRAM memories which have considerably less latency in fetching instruction or data. These SRAM sit between processor and main memory to speed things up. SRAM chips are faster since they do not have to refresh their contents.
Then, why not have main memory equally fast? It is because SRAM are very expensive and power consuming, so their size beyond a point is not affordable.
Programming with caches
Caches are transparent to programmers, it means that programmer does not have to worry about caching strategy while coding however, sometimes, keeping this in mind while programming may give tremendous performance benefits. Programmers are kept oblivious to cache to shield them from the micro instruction of CPU which usually change. Also, cache implementations vary even in same processor, so it will be difficult to write general purpose code. Size of cache limits the scalability of program.
Size and latency of various storages
Operations on data
Whenever a process asks for a data, it is first checked in cache. If data is already cached and present, it is returned. This is called as a Hit. If data is not present, main memory is accessed and data is brought returned to process and put into cache also. This is called as a Miss.
What’s the data cached?
By caching data, operating systems want to minimize delay to fetch next data or instruction. Cache mechanisms use principle of locality to bring in data which may be accessed next, based on currently accessed data, for faster access. There are two locality principles:
Temporal locality: data which is used recently may be used again in near future.
Spatial locality: data near to current accessed data may be accessed in near future.
Data consistency with caches
With cache in between memory and process, there are two copies of data. How to maintain consistency of data in this case? What happens, when process accesses data, gets it from cache and want to modify it? There are three ways to do so.
- Data is written on to cache and main memory with write through approach. Both are in consistent state.
- Data is accessed, modified in main memory but not updated in cache. This is write around approach.
- With copy back method, when data is just written back to cache but main memory is not updated.
Cache mapping techniques
As mentioned earlier, size of cache is much smaller than main memory and during writes by process, data has to be put back to main memory. So we need some kind of mapping between two.
Before understand that a line is nothing but a block of data which is accessed together and mapped to a block of main memory. Each line has a tag (usually high order bits in cache line address) which identifies which memory address it is mapped to.There are three mapping techniques : Direct mapping, fully associative and set associative mapping
Consider a cache of 128 blocks and 16 words each. In this scheme, each address j in memory is mapped to j modulo 128 line.How can we map memory of 64 KB divided into 4096 bytes blocks? Block number 0,128,256….3096 will map to same block number in cache,
Number of bits required for addressing 64 KB will be 16 bits. To represent 16 words in cache line, we need 4 bits. To address 128 cache lines, 7 bits are used. Rest 5 bits are used to mark as tag.
How to identify that memory block is present in cache? Let’s take an example: accessed address is 43029 ( 10101 0000001 0101). Here, tag is 5, block is 1 and word 5. If block 1 has tag 5, then it’s a hit else miss.
Drawback is that more than one memory address may be mapped to same cache line. For example, in example if address with block 0 and 128 are accessed together, there will be miss every time and hence decreasing the performance.
A block of memory can be anywhere in cache as opposed to fixed positions based memory address bits in direct mapping. Each line has tag. When processor accesses memory location, it checks all tags and determines if it’s cache hit. This slows down memory access (checking all tags for each memory access), however, since all tags are checked in parallel using special circuitry, it’s not an expensive operation.The last log(size of line) bits defines word number to be accessed from cache.
To map 64KB memory, 16 words each, in 128 block memory of 16 words each, 16 bits address are required, last 4 bits (log(16)) give word’s address while most significant 12 bits give tag.
Set associative map
To avoid checking all tags, there is another mapping technique called as set associative mapping. Set associative mapping is mixture of direct map and associative map. Set contains multiple lines, with number ranging from 2 to 16 lines/set. Specific bits in address identify which set stores given memory address, . When processor accesses memory location, it first looks at set bits, goes to that set and checks all tags in it.
Translation look-aside buffers
Translation of virtual or logical address to physical address takes a significant processing time in virtual memory implementation. To understand what is virtual memory and it’s implementation, please refer to post on virtual memory and paging.
Each time process accesses a page, it refers page table which leads to delay. Solution for this is having a cache for page table too. It is known as translation look-aside buffer, commonly known as TLB.
Process goes to TLB first when it accesses page. If physical address of page is available in TLB, no need to access page table and do all translations, page is accessed using physical address. Whenever, a page is moves out of main memory, then its reference from TLB is removed.
Please share if something is wrong or missing. If you want to contribute on algorithms and me, please refer publishing on algorithmandme.