#/* Author      : Rick van der Zwet
# * S-number    : 0433373
# * Version     : $Id: README.txt 438 2007-12-28 02:53:47Z rick $
# * Copyright   : FreeBSD Licence
# * Description : Assignment documentation
# */

= Index =
* Preface
* Methodology
* Design decisions
* Directory structure
* Programs structure
* Definitions
* Configurations
** Memory
** Cache
* Assumptions
* BUS2 Output
* Usage / Running
* Conclusion
* Recommendations

= Preface =
This document presents the results of assignment 3 (internally also called
assignment 4) of Computer Architecture. The main purpose of the assignment
was to determine the bank conflicts and memory bandwidth for certain
configurations. Cache simulation is done using dinero. To calculate the
results, several C programs had to be written.

= Methodology =
Only the BUS2 (-o2) output of dinero is needed; nothing else is used. As perl,
awk, grep and others are not allowed, a small custom C program called
grep-bus2 was written. The shell script calculate.sh determines the correct
cache option variables and calls compu with the proper memory configuration.
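
The filtering step described under Methodology can be sketched as follows.
This is not the actual grep-bus2 source, only a minimal illustration. It
assumes that a BUS2 record is a single line with whitespace-separated fields
in the order listed under BUS2 Output; the names bus2_record and parse_bus2
are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record type; field names follow the BUS2 Output section. */
struct bus2_record {
    char access;                      /* r, w, p or s */
    unsigned size;                    /* transfer size in bytes */
    unsigned long address;            /* byte address, parsed from hex */
    unsigned long reference_count;    /* demand references since last transfer */
    unsigned long instruction_count;  /* instruction fetches since last transfer */
};

/*
 * Returns 1 and fills *out when line is a complete BUS2 record, else 0.
 * Assumption: fields are separated by whitespace on a single line.
 */
int parse_bus2(const char *line, struct bus2_record *out)
{
    assert(line != NULL && out != NULL);

    /* Skip every dinero output line that is not a bus record. */
    if (strncmp(line, "BUS2", 4) != 0)
        return 0;

    return sscanf(line, "BUS2 %c %u %lx %lu %lu",
                  &out->access, &out->size, &out->address,
                  &out->reference_count, &out->instruction_count) == 5;
}
```

A filter in the style of grep-bus2 would call parse_bus2() on every line read
from the dinero output (for example with fgets) and pass only the accepted
records on to the memory simulation.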

= Design decisions =
* As we are 'debugging' a standard program, I assume every memory access
  fetches 32 bits (4 bytes)
* The 'no cache' configuration is simulated in dinero with a 1-word cache,
  which might distort the results when the same memory cell is accessed
  repeatedly (highly unlikely, however)
* The shell script allows the benchmark file to be set

= Directory structure =
Makefile             = GNU Make config file
compu.c              = Calculates bandwidth and bank conflicts
common.[ch]          = Common functions
memory_std.[ch]      = Standard memory
memory_bank.[ch]     = Bank memory
memory_dram.[ch]     = DRAM memory
calculate.sh         = Shell script to generate results of combinations
develop.sh           = Very simple script to keep running while coding to
                       ensure continuous feedback
data                 = Directory of the traces
data/lisp.002.din    = Provided input
data/spic.002.din    = Provided input
docs                 = Some additional documentation
docs/dineroIII.txt   = Man page
src/dineroIII.tar.gz = dineroIII source

= Programs structure =
compu.c contains the logic for choosing the correct memory module. All memory
implementations are defined in memory_*.[ch]. To avoid duplicate code, an
'interface' common.c is defined, which contains the common functions (mainly
output).

= Definitions =
The bandwidth of a cache/memory system is the total number of bytes sent
between CPU and cache divided by the number of cycles. We assume here that
memory activity is the bottleneck for the entire system, in other words: that
the cache is continuously busy processing requests.

A bank conflict is defined as follows:
For normal memory, a bank conflict is a request that is sent to a bank that is
still busy handling the previous request (either in the access phase or in the
bus transfer phase). The first memory access incurs no conflict.
In page-mode DRAM we do not use multiple banks, so the above definition is not
so useful. In that case we interpret the following as a bank conflict: a
request for a page (= column) where the previous request was not for the same
page.
The first access here also does not incur a conflict.

= Configurations =
== Memory ==
1) standard memory with a random access time of 8 clock cycles
2) 4-bank word-interleaved memory with a random access time of 8 clock cycles
3) 8-bank word-interleaved memory with a random access time of 8 clock cycles
4) page-mode DRAM with a page size of 64 words, a random access time of
   8 clock cycles and a 'next access time' of 3 clock cycles
5) page-mode DRAM with a page size of 1024 words, a random access time of
   8 clock cycles and a 'next access time' of 2 clock cycles

== Cache ==
a) no cache, a write buffer of 1 word deep
b) a 64 KB, unified, direct-mapped, write-through, no write-allocate cache
   with 4 word blocks and a 1 word write buffer
c) a 64 KB, unified, direct-mapped, write-back, write-allocate cache with
   4 word blocks and a 1 word write buffer

= Assumptions =
== Given ==
* All addresses in this assignment are word-aligned and all data accesses to
  the cache are 1 word.
* Per clock cycle, 1 request for 1 word can be handled, but requests should
  remain in order.
* The time for submitting the requests does not have to be taken into
  consideration
* If a memory access is requested to the same bank or row of a busy memory
  part, it is implied to be to a different address
* When accessing memory, the same 'checking' cycle is used as the initial
  memory call cycle
* DRAM memory also has a RAS time of 1; all memory types have a bus, so there
  is no need to distinguish between them
* Calculating the byte transfer between memory and cache is trivial:
  number of lines * bytes per line
* Both reads and writes are treated the same; no optimizations are made, to
  ensure simplicity
* The code is not built to be optimized, but to be clear instead, which
  results in 'dumb' loops
* With DRAM the worst-case scenario is used, meaning a call to a locked
  memory address or a different page will block for the RAS time

= BUS2 Output =
BUS2 access size address reference_count instruction_count
* BUS2 are four literal characters to start a bus record
* access is the access type (r for a bus-read, w for a bus-write, p for a
  bus-prefetch, s for snoop activity (output style 3 only))
* size is the transfer size in bytes
* address is a hexadecimal byte-address between 0 and ffffffff inclusively
* reference_count is the number of demand references since the last bus
  transfer (i.e. cache misses)
* instruction_count is the number of demand instruction fetches since the
  last bus transfer

= Usage / Running =
# Alter calculate.sh to specify the right dinero binary path
# Build binaries
$ make
# Call calculate.sh with proper argument for datafile
$ sh calculate.sh
# Result will be posted to stderr and to the files res-xx.txt
# Rerun will simply overwrite all the old data

= Conclusion =
It really does depend on which type of cache and memory is used. Many
assumptions and generalisations were made. For the components to work well
with each other, the system needs good knowledge of the underlying hardware
implementation.
It seems that the traces lisp and spic have been optimized for the use of the
class c cache and word-type memory. It is also pretty clear that without any
cache the system will not perform at all and will just suffer from the really
slow memory.

= Recommendations =
* As the input is text and only simple calculations are being used, allowing
  an interpreted language like perl, python or similar would be pretty handy
* /usr/local/edu/data does not exist; this should be ~csca/edu/data
* The 'large files' are tar gzipped 100k, 300k ;-)
* Please specify which version of dinero to use and where the program needs
  to run
* dineroIII does not exist online anymore and has been replaced by dineroIV;
  compiling dineroIII also fails on more modern systems. Quick and dirty fix
  patch: http://rickvanderzwet.nl/svn/data/liacs/ca/opdr3/src/
* The submit instructions do not include an email address to submit to
* Translate the 'Het gaat om' part of the assignment
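
For illustration, the bank-conflict definition for normal (interleaved) memory
given under Definitions can be sketched in C as below. This is not the code
from compu.c or memory_bank.c. It assumes 4-byte words, bank selection by word
address modulo the number of banks, in-order requests with at most one request
issued per cycle, and a request that simply waits until a busy bank is free.
The names bank_stats, simulate_banks, WORD_BYTES and MAX_BANKS are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define WORD_BYTES 4   /* assumption: every access is one 32-bit word */
#define MAX_BANKS  64  /* hypothetical upper bound for this sketch */

struct bank_stats {
    unsigned long conflicts;  /* requests that found their bank busy */
    unsigned long cycles;     /* cycle in which the last access completes */
};

/*
 * Replays n word-aligned byte addresses against nbanks word-interleaved
 * banks. Each access keeps its bank busy for access_time cycles.
 */
struct bank_stats simulate_banks(const unsigned long *addr, size_t n,
                                 unsigned nbanks, unsigned access_time)
{
    unsigned long busy_until[MAX_BANKS] = {0};
    unsigned long now = 0;
    struct bank_stats s = {0, 0};
    size_t i;

    assert(nbanks > 0 && nbanks <= MAX_BANKS);

    for (i = 0; i < n; i++) {
        unsigned bank = (unsigned)((addr[i] / WORD_BYTES) % nbanks);

        /* Bank still handling an earlier request: count a conflict, wait. */
        if (busy_until[bank] > now) {
            s.conflicts++;
            now = busy_until[bank];
        }
        busy_until[bank] = now + access_time;
        if (busy_until[bank] > s.cycles)
            s.cycles = busy_until[bank];

        now++;  /* at most one request is issued per clock cycle */
    }
    return s;
}
```

With nbanks set to 1 the same function models the standard memory, and the
bandwidth from the Definitions section then follows as n * WORD_BYTES divided
by the resulting cycles value.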