#/* Author : Rick van der Zwet
# * S-number : 0433373
# * Version : $Id: README.txt 438 2007-12-28 02:53:47Z rick $
# * Copyright : FreeBSD Licence
# * Description : Assignment documentation
# */

= Index =
* Preface
* Methodology
* Design decisions
* Directory structure
* Programs structure
* Definitions
* Configurations
** Memory
** Cache
* Assumptions
* BUS2 Output
* Usage / Running
* Conclusion
* Recommendations


= Preface =
This document describes the results of assignment 3 (internally also
called assignment 4) of Computer Architecture. The main purpose of the
assignment was to determine the number of bank conflicts and the memory
bandwidth for a set of cache and memory configurations. Cache simulation
is done with dinero. To calculate the results, several C programs had to
be written.

= Methodology =
Only the BUS2 records (output style -o2) of the dinero output are
needed; everything else is ignored. As perl, awk, grep and similar tools
are not allowed, a small custom C program called grep-bus2 was written
to extract them. The shell script calculate.sh determines the correct
cache options and calls compu with the proper memory configuration.

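The filtering step could look roughly like the sketch below. It is not
the actual grep-bus2 source; the function names and the 256-byte line
buffer are assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if the line starts with the literal record tag "BUS2", else 0. */
int is_bus2_line(const char *line)
{
    assert(line != NULL);
    return strncmp(line, "BUS2", 4) == 0;
}

/* Copies only BUS2 records from in to out; returns the number copied. */
long filter_bus2(FILE *in, FILE *out)
{
    char line[256];   /* assumed upper bound on the length of one record */
    long kept = 0;

    assert(in != NULL && out != NULL);
    while (fgets(line, sizeof line, in) != NULL) {
        if (is_bus2_line(line)) {
            fputs(line, out);
            kept++;
        }
    }
    return kept;
}
```

Calling filter_bus2(stdin, stdout) from a main function would turn this
into a small stand-alone filter that can sit between dinero and compu.
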
= Design decisions =
* As we are in effect 'debugging' a standard program, I assume every
  memory access fetches 32 bits (4 bytes)
* The 'no cache' configuration is simulated in dinero with a cache of
  1 word, which might distort the results when the same memory cell is
  accessed multiple times (highly unlikely, however)
* The shell script allows the benchmark file to be specified


= Directory structure =
Makefile              = GNU Make config file
compu.c               = Calculates bandwidth and bank conflicts
common.[ch]           = Common functions
memory_std.[ch]       = Standard memory
memory_bank.[ch]      = Bank memory
memory_dram.[ch]      = DRAM memory
calculate.sh          = Shell script to generate results of combinations
develop.sh            = Very simple script to keep running while coding,
                        to ensure continuous feedback
data                  = Directory of the traces
data/lisp.002.din     = Provided input
data/spic.002.din     = Provided input
docs                  = Some additional documentation
docs/dineroIII.txt    = man page
src/dineroIII.tar.gz  = dineroIII source

= Programs structure =
compu.c contains the logic for choosing the correct memory module to
use. All memory implementations are defined in memory_<type>.[ch]. To
avoid duplicate code, an 'interface' common.c is defined, which holds
the common functions, mainly for output.


= Definitions =
The bandwidth of a cache/memory system is the total number of bytes sent
between CPU and cache divided by the number of cycles. We assume here
that memory activity is the bottleneck for the entire system, in other
words: that the cache is continuously busy processing requests.

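As a small worked example of this definition, the sketch below divides
bytes by cycles. The function name is hypothetical and not taken from
compu.c.

```c
#include <assert.h>

/* Bandwidth in bytes per cycle: bytes moved divided by cycles spent. */
double bandwidth(unsigned long total_bytes, unsigned long total_cycles)
{
    assert(total_cycles > 0);   /* at least one cycle must have elapsed */
    return (double)total_bytes / (double)total_cycles;
}
```

For instance, 1000 word accesses of 4 bytes each that together take 8000
cycles give bandwidth(4000, 8000), which is 0.5 bytes per cycle.
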
A bank conflict is defined as follows:

For normal memory, a bank conflict is a request that is sent to a bank
that is still busy handling the previous request (either in the access
phase or in the bus transfer phase). The first memory access incurs no
conflict.

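One way to count such conflicts for word-interleaved memory is sketched
below. This is an illustration and not the code in memory_bank.c. It
assumes 4-byte words, an access time of 8 cycles, at most one request
issued per cycle, and a request that stalls until its bank is free. All
names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define WORD_BYTES   4   /* 32-bit words */
#define ACCESS_TIME  8   /* random access time in clock cycles */
#define MAX_BANKS    8

/* Word-interleaved mapping: consecutive words go to consecutive banks. */
unsigned bank_of(unsigned long addr, unsigned banks)
{
    assert(banks > 0);
    return (unsigned)((addr / WORD_BYTES) % banks);
}

/*
 * Counts bank conflicts for n byte addresses issued in order, at most one
 * per cycle. On return *cycles holds the cycle at which the last access ends.
 */
unsigned long count_bank_conflicts(const unsigned long *addr, size_t n,
                                   unsigned banks, unsigned long *cycles)
{
    unsigned long busy_until[MAX_BANKS] = {0};
    unsigned long now = 0, conflicts = 0, end = 0;
    size_t i;

    assert(banks > 0 && banks <= MAX_BANKS);
    for (i = 0; i < n; i++) {
        unsigned b = bank_of(addr[i], banks);
        if (busy_until[b] > now) {   /* bank still handling an earlier request */
            conflicts++;
            now = busy_until[b];     /* stall until the bank is free */
        }
        busy_until[b] = now + ACCESS_TIME;
        if (busy_until[b] > end)
            end = busy_until[b];
        now++;                       /* next request is issued one cycle later */
    }
    if (cycles != NULL)
        *cycles = end;
    return conflicts;
}
```

With 4 banks, the addresses 0, 4, 8 and 12 map to four different banks
and cause no conflict, while 0 followed by 16 hits bank 0 twice and
counts as one conflict.
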
In page-mode DRAM we do not use multiple banks, so the above definition
is not so useful. In that case we interpret the following as a bank
conflict: a request for a page (=column) where the previous request was
not for the same page. The first access here also does not incur a
conflict.

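The page-mode rule can be sketched as follows. This is a simplified
model and not the code in memory_dram.c. It assumes 4-byte words,
charges the random access time for the first access and for every page
switch, and charges the 'next access time' for a hit on the open page.
All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define WORD_BYTES 4   /* 32-bit words */

struct dram_stats {
    unsigned long conflicts;  /* accesses that had to switch pages */
    unsigned long cycles;     /* total cycles spent on all accesses */
};

struct dram_stats dram_run(const unsigned long *addr, size_t n,
                           unsigned long page_words,
                           unsigned random_time, unsigned next_time)
{
    struct dram_stats s = {0, 0};
    unsigned long open_page = 0;
    size_t i;

    assert(page_words > 0);
    for (i = 0; i < n; i++) {
        unsigned long page = (addr[i] / WORD_BYTES) / page_words;
        if (i == 0) {
            s.cycles += random_time;   /* first access: full cost, no conflict */
        } else if (page == open_page) {
            s.cycles += next_time;     /* hit on the page that is already open */
        } else {
            s.conflicts++;             /* page switch counts as a conflict */
            s.cycles += random_time;
        }
        open_page = page;
    }
    return s;
}
```

With a page size of 64 words (256 bytes), the addresses 0, 4, 256 and
260 touch page 0 twice and then page 1 twice, so only the third access
is counted as a conflict.
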
= Configurations =

== Memory ==
1) standard memory with random access time of 8 clock cycles
2) 4-bank word-interleaved memory with random access time of 8 clock
   cycles
3) 8-bank word-interleaved memory with random access time of 8 clock
   cycles
4) page-mode DRAM with a page-size of 64 words, a random access time of
   8 clock cycles and a 'next access time' of 3 clock cycles.
5) page-mode DRAM with a page-size of 1024 words, a random access time
   of 8 clock cycles and a 'next access time' of 2 clock cycles.


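The five memory configurations above can be captured in a small lookup
table, for example as sketched below. The struct layout and all names
are assumptions made for illustration and are not taken from compu.c.

```c
#include <assert.h>
#include <stddef.h>

enum mem_kind { MEM_STD, MEM_BANK, MEM_DRAM };

struct mem_config {
    enum mem_kind kind;
    unsigned banks;        /* 1 for non-interleaved memory */
    unsigned page_words;   /* 0 when page mode is not used */
    unsigned random_time;  /* random access time in clock cycles */
    unsigned next_time;    /* 'next access time', 0 when not applicable */
};

/* Entries follow the numbering 1) to 5) of the list above. */
static const struct mem_config mem_configs[] = {
    { MEM_STD,  1,    0, 8, 0 },
    { MEM_BANK, 4,    0, 8, 0 },
    { MEM_BANK, 8,    0, 8, 0 },
    { MEM_DRAM, 1,   64, 8, 3 },
    { MEM_DRAM, 1, 1024, 8, 2 },
};

/* Returns configuration 1..5, or NULL for any other number. */
const struct mem_config *mem_config_get(int number)
{
    size_t count = sizeof mem_configs / sizeof mem_configs[0];
    if (number < 1 || (size_t)number > count)
        return NULL;
    return &mem_configs[number - 1];
}

/* Cost of one access; page_hit only matters for page-mode DRAM. */
unsigned mem_access_time(const struct mem_config *c, int page_hit)
{
    assert(c != NULL);
    if (c->kind == MEM_DRAM && page_hit)
        return c->next_time;
    return c->random_time;
}
```

Keeping the parameters in one table means a driver can select a
configuration by number instead of hard-coding the timings in each
memory module.
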
== Cache ==
a) no cache, a write buffer of 1 word deep
b) a 64 KB, unified, direct-mapped, write-through, no write-allocate
   cache with 4 word blocks and a 1 word write buffer
c) a 64 KB, unified, direct-mapped, write-back, write-allocate cache
   with 4 word blocks and a 1 word write buffer

= Assumptions =
== Given ==
* All addresses in this assignment are word-aligned and all
  data-accesses to the cache are 1 word.
* Per clock cycle, 1 request for 1 word can be handled, but requests
  should remain in order.
* The time for submitting the requests does not have to be taken into
  consideration
* If a memory access is requested to the same bank or row of a busy
  memory part, it is assumed to be to a different address
* When accessing memory, the 'checking' cycle is also used as the
  initial cycle of the memory call
* DRAM memory also has a RAS time of 1; all memory types have a bus, so
  there is no need to distinguish between them
* Calculating the number of bytes transferred between memory and cache
  is trivial: number of lines * bytes per line
* Reads and writes are treated the same; no optimizations are made, to
  ensure simplicity
* The code is not built to be optimized but to be clear, which results
  in 'dumb' loops
* For DRAM the worst-case scenario is used, meaning a call to a locked
  memory address or a different page will block for the RAS time


= BUS2 Output =
BUS2 <type> <size> <address> <reference_count> <instruction_count>
* BUS2 are four literal characters that start a bus record
* type is the access type: r for a bus-read, w for a bus-write, p for a
  bus-prefetch, s for snoop activity (output style 3 only)
* size is the transfer size in bytes
* address is a hexadecimal byte-address between 0 and ffffffff
  inclusive
* reference_count is the number of demand references since the
  last bus transfer (i.e. cache misses)
* instruction_count is the number of demand instruction fetches
  since the last bus transfer

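A record in this layout could be parsed with sscanf, as in the sketch
below. The struct and function names are hypothetical. The sketch
assumes the address is printed in hexadecimal and both counters in
decimal.

```c
#include <assert.h>
#include <stdio.h>

struct bus2_record {
    char type;                  /* 'r', 'w', 'p' or 's' */
    unsigned size;              /* transfer size in bytes */
    unsigned long address;      /* byte address, parsed from hexadecimal */
    unsigned long ref_count;    /* demand references since last transfer */
    unsigned long instr_count;  /* instruction fetches since last transfer */
};

/* Parses one BUS2 line; returns 1 on success, 0 if the line does not match. */
int parse_bus2(const char *line, struct bus2_record *rec)
{
    assert(line != NULL && rec != NULL);
    return sscanf(line, "BUS2 %c %u %lx %lu %lu",
                  &rec->type, &rec->size, &rec->address,
                  &rec->ref_count, &rec->instr_count) == 5;
}
```

A line that does not start with BUS2, or that misses one of the five
fields, makes parse_bus2 return 0, so a caller can skip it.
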
= Usage / Running =
# Alter calculate.sh to specify the right dinero binary path
# Build binaries
$ make
# Call calculate.sh with the proper argument for the datafile
$ sh calculate.sh <path-to-datafile>
# Results will be written to stderr and to the files res-xx.txt
# A rerun will simply overwrite all the old data


= Conclusion =
Which type of cache and memory to use really depends on the situation.
Many assumptions and generalisations were made. For the components to
work well with each other, the system needs good knowledge of the
underlying hardware implementation. It seems that the traces lisp and
spic have been optimized for cache configuration c and word-interleaved
memory. It is also pretty clear that without any cache the system will
not perform at all and will just suffer from the really slow memory.


= Recommendations =
* As the input is text and only simple calculations are being used,
  allowing an interpreted language like perl, python or similar would be
  pretty handy
* /usr/local/edu/data does not exist; this should be ~csca/edu/data
* The 'large files' are tar gzipped 100k, 300k ;-)
* Please specify which version of dinero to use and where the program
  needs to run
* dineroIII is no longer available online and has been replaced by
  dineroIV; compiling dineroIII also fails on more modern systems. Quick
  and dirty fix patch:
  http://rickvanderzwet.nl/svn/data/liacs/ca/opdr3/src/
* The submit instructions do not include an email address to submit to
* Translate the 'Het gaat om' part of the assignment