1 | #/* Author : Rick van der Zwet
2 | # * S-number : 0433373
3 | # * Version : $Id: README.txt 438 2007-12-28 02:53:47Z rick $
4 | # * Copyright : FreeBSD Licence
5 | # * Description : Assignment documentation
6 | # */
7 |
8 | = Index =
9 | * Preface
10 | * Methodology
11 | * Design decisions
12 | * Directory structure
13 | * Programs structure
14 | * Definitions
15 | * Configurations
16 | ** Memory
17 | ** Cache
18 | * BUS2 Output
19 | * Usage / Running
20 | * Conclusion
21 | * Recommendations
22 |
23 |
24 | = Preface =
25 | This document describes the results of assignment 3 (internally also
26 | called assignment 4) of Computer Architecture. The main purpose of the
27 | assignment was to determine the number of bank conflicts and the memory
28 | bandwidth for certain configurations. Cache simulation is done using
29 | dinero. To calculate the results, several C programs had to be
30 | written.
31 |
32 | = Methodology =
33 | The only part of the dinero output that is needed is the BUS2 (-o2)
34 | output; nothing else is used. As perl, awk, grep and others are not allowed, a
35 | small custom C program called grep-bus2 was written. The shell script
36 | calculate.sh determines the correct cache options and calls
37 | compu with the proper memory configuration.
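
The filtering step described above could look roughly like the sketch below. It is not the actual grep-bus2 source: the function name filter_bus2 is hypothetical, and it assumes that every bus record starts with the literal characters BUS2 in the first column.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy every line of `in` that starts with "BUS2"
 * to `out` and return the number of lines copied. */
static long filter_bus2(FILE *in, FILE *out)
{
    char chunk[256];
    long matched = 0;
    int at_line_start = 1; /* the next chunk begins a new line */
    int copying = 0;       /* the current line is a BUS2 record */

    assert(in != NULL && out != NULL);
    while (fgets(chunk, sizeof chunk, in) != NULL) {
        if (at_line_start) {
            copying = (strncmp(chunk, "BUS2", 4) == 0);
            if (copying)
                matched++;
        }
        if (copying)
            fputs(chunk, out);
        /* A chunk without '\n' means the line continues in the next read. */
        at_line_start = (strchr(chunk, '\n') != NULL);
    }
    return matched;
}
```

A main() that calls filter_bus2(stdin, stdout) would turn this into a filter that can sit in a shell pipeline behind dinero.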
38 |
39 | = Design decisions =
40 | * As we are 'debugging' a standard program, I assume every call
41 | to the memory fetches 32 bits (4 bytes)
42 | * 'No cache' is simulated in dinero using a 1-word cache, which might
43 | distort the results when the same memory cell is accessed multiple times
44 | (highly unlikely, however)
45 | * The shell script takes the benchmark file as an argument
46 |
47 |
48 | = Directory structure =
49 | Makefile = GNU Make config file
50 | compu.c = calculate bandwidth and bank conflicts
51 | common.[ch] = Common functions
52 | memory_std.[ch] = Standard memory
53 | memory_bank.[ch] = Bank memory
54 | memory_dram.[ch] = DRAM memory
55 | calculate.sh = Shell script to generate results of combinations
56 | develop.sh = Very simple script to keep running while coding to
57 | ensure continuous feedback
58 | data = directory of the traces
59 | data/lisp.002.din = provided input
60 | data/spic.002.din = provided input
61 | docs = Some additional documentation
62 | docs/dineroIII.txt = man page
63 | src/dineroIII.tar.gz = dineroIII source
64 |
65 | = Programs structure =
66 | compu.c contains the logic for choosing the correct memory module to use.
67 | All memory implementations are defined in memory_<type>.[ch]. To avoid duplicate
68 | code, an 'interface' common.c is defined, which contains the common
69 | functions, mainly for output.
70 |
71 |
72 | = Definitions =
73 | The bandwidth of a cache/memory system is the total number of bytes sent
74 | between CPU and cache divided by the number of cycles. We assume here
75 | that memory activity is the bottleneck for the entire system; in other
76 | words, that the cache is continuously busy processing requests.
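
As a minimal sketch of this definition, the bandwidth is a single division. The function name bandwidth and the example numbers are illustrative only and are not taken from compu.c.

```c
#include <assert.h>

/* Bandwidth in bytes per cycle: total bytes transferred divided by the
 * number of cycles the transfers took. */
static double bandwidth(unsigned long bytes, unsigned long cycles)
{
    assert(cycles > 0); /* a run without cycles has no defined bandwidth */
    return (double)bytes / (double)cycles;
}
```

For example, 1000 one-word (4-byte) transfers in 8000 cycles give 4000 / 8000 = 0.5 bytes per cycle.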
77 |
78 | A bank conflict is defined as follows:
79 |
80 | For normal memory, a bank conflict is a request that is sent to a bank
81 | that is still busy handling the previous request (either in the access
82 | phase or in the bus transfer phase). The first memory access incurs no
83 | conflict.
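
The definition for normal memory can be sketched as a small simulation. This is not the code from memory_bank.c; all names are hypothetical. It assumes that one request is issued per cycle, that the bank is the word index modulo the number of banks, that a request to a busy bank waits until that bank is free, and that the bus transfer is folded into the access time.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BANKS 64
#define WORD_BYTES 4

typedef struct {
    unsigned long conflicts; /* requests that hit a busy bank */
    unsigned long cycles;    /* cycle at which the last request completes */
} bank_result;

static bank_result simulate_banks(const unsigned long *addrs, size_t n,
                                  unsigned banks, unsigned access_time)
{
    unsigned long busy_until[MAX_BANKS] = {0};
    unsigned long now = 0; /* cycle in which the next request is issued */
    bank_result res = {0, 0};
    size_t i;

    assert(banks > 0 && banks <= MAX_BANKS);
    for (i = 0; i < n; i++) {
        unsigned bank = (unsigned)((addrs[i] / WORD_BYTES) % banks);

        if (busy_until[bank] > now) {
            res.conflicts++;
            now = busy_until[bank]; /* stall until the bank is free */
        }
        busy_until[bank] = now + access_time;
        if (busy_until[bank] > res.cycles)
            res.cycles = busy_until[bank];
        now++; /* requests stay in order, at most one per cycle */
    }
    return res;
}
```

With four banks and an access time of 8, the addresses 0, 4, 8, 12, 0 produce one conflict: the fifth request returns to bank 0 in cycle 4 while that bank is busy until cycle 8, so the run ends after 16 cycles. Calling it with banks set to 1 models the standard memory under the same assumptions.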
84 |
85 | In page-mode DRAM we do not use multiple banks, so the above definition
86 | is not so useful. In that case we interpret the following as a bank conflict:
87 | a request for a page (=column) where the previous request was not for
88 | the same page. The first access here also does not incur a conflict.
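
The page-mode definition can be sketched in the same way. Again this is not the code from memory_dram.c and the names are hypothetical. It assumes that the first access and every page change cost the random access time, that an access to the same page as the previous one costs the 'next access time', and it leaves any separate RAS time out of the model.

```c
#include <assert.h>
#include <stddef.h>

#define WORD_BYTES 4

typedef struct {
    unsigned long conflicts; /* requests to a different page than the previous one */
    unsigned long cycles;    /* total access time */
} dram_result;

static dram_result simulate_dram(const unsigned long *addrs, size_t n,
                                 unsigned long page_words,
                                 unsigned random_time, unsigned next_time)
{
    dram_result res = {0, 0};
    unsigned long prev_page = 0;
    size_t i;

    assert(page_words > 0);
    for (i = 0; i < n; i++) {
        unsigned long page = addrs[i] / (WORD_BYTES * page_words);

        if (i == 0) {
            res.cycles += random_time; /* first access: no conflict */
        } else if (page != prev_page) {
            res.conflicts++;
            res.cycles += random_time; /* page change */
        } else {
            res.cycles += next_time;   /* same page as previous request */
        }
        prev_page = page;
    }
    return res;
}
```

With a page size of 64 words (256 bytes), a random access time of 8 and a next access time of 3, the addresses 0, 4, 256, 260, 0 give two conflicts and 8 + 3 + 8 + 3 + 8 = 30 cycles.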
89 |
90 | = Configurations =
91 |
92 | == Memory ==
93 | 1) standard memory with random access time of 8 clock cycles
94 | 2) 4-bank word-interleaved memory with random access time of 8 clock
95 | cycles
96 | 3) 8-bank word-interleaved memory with random access time of 8 clock
97 | cycles
98 | 4) page-mode DRAM with a page-size of 64 words, a random access time of
99 | 8 clock cycles and a 'next access time' of 3 clock cycles.
100 | 5) page-mode DRAM with a page-size of 1024 words, a random access time
101 | of 8 clock cycles and a 'next access time' of 2 clock cycles.
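
The five memory configurations can be written down as a small table that a program such as compu could select from by number. This is only a sketch: the type and field names are hypothetical and do not come from the actual sources. Only the numeric values are taken from the list above.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { MEM_STANDARD, MEM_BANK, MEM_DRAM } mem_kind;

typedef struct {
    mem_kind kind;
    unsigned banks;           /* 1 for standard memory and DRAM */
    unsigned long page_words; /* 0 when the memory is not page-mode */
    unsigned random_time;     /* random access time in clock cycles */
    unsigned next_time;       /* 'next access time', 0 if not page-mode */
} mem_config;

static const mem_config MEM_CONFIGS[] = {
    { MEM_STANDARD, 1,    0, 8, 0 }, /* 1) standard memory */
    { MEM_BANK,     4,    0, 8, 0 }, /* 2) 4-bank word-interleaved */
    { MEM_BANK,     8,    0, 8, 0 }, /* 3) 8-bank word-interleaved */
    { MEM_DRAM,     1,   64, 8, 3 }, /* 4) page-mode DRAM, 64-word pages */
    { MEM_DRAM,     1, 1024, 8, 2 }, /* 5) page-mode DRAM, 1024-word pages */
};

/* Return the configuration with the given number (1 to 5), or NULL. */
static const mem_config *mem_config_lookup(int number)
{
    const int count = (int)(sizeof MEM_CONFIGS / sizeof MEM_CONFIGS[0]);

    assert(count == 5); /* the list above has exactly five entries */
    if (number < 1 || number > count)
        return NULL;
    return &MEM_CONFIGS[number - 1];
}
```

Keeping the parameters in one table means that each memory model only needs to read its own fields, for example page_words and next_time for the DRAM configurations.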
102 |
103 |
104 | == Cache ==
105 | a) no cache, a write buffer of 1 word deep
106 | b) a 64 KB, unified, direct-mapped, write-through, no write-allocate
107 | cache with 4 word blocks and a 1 word write buffer
108 | c) a 64 KB, unified, direct-mapped, write-back, write-allocate cache
109 | with 4 word blocks and a 1 word write buffer
110 |
111 | = Assumptions =
112 | == Given ==
113 | * All addresses in this assignment are word-aligned and all
114 | data accesses to the cache are 1 word.
115 | * Per clock cycle, 1 request for 1 word can be handled, but requests
116 | must remain in order.
117 | * The time for submitting the requests does not have to be taken into
118 | consideration
119 | * If a memory access is requested to the same bank or row of a busy
120 | memory part, it is assumed to be to a different address
121 | * When accessing memory, the cycle in which the memory is checked is also
122 | used as the initial cycle of the memory call
123 | * DRAM memory also has a RAS time of 1; all memory has a bus
124 | and there is no need to distinguish between them
125 | * Calculating the byte transfer between memory and cache is trivial:
126 | number of lines * bytes per line
127 | * Both reads and writes are treated the same; no optimizations are made,
128 | to ensure simplicity
129 | * Code is not built to be optimized, but to be clear instead, which
130 | results in 'dumb' loops
131 | * With DRAM the worst-case scenario is used, meaning a call to a locked
132 | memory address or different page will block for the RAS time
133 |
134 |
135 | = BUS2 Output =
136 | BUS2 <type> <size> <address> <reference_count> <instruction_count>
137 | * BUS2 are four literal characters that start a bus record
138 | * type is the access type (r for a bus-read, w for a bus-write, p for a
139 | bus-prefetch, s for snoop activity (output style 3 only))
140 | * size is the transfer size in bytes
141 | * address is a hexadecimal byte-address between 0 and ffffffff
142 | inclusive
143 | * reference_count is the number of demand references since the
144 | last bus transfer (i.e. cache misses)
145 | * instruction_count is the number of demand instruction fetches
146 | since the last bus transfer
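
Going by the field order above, a BUS2 record can be parsed with a single sscanf call. The sketch below is not taken from compu.c: the struct and function names are hypothetical, and it assumes the fields are separated by whitespace exactly as listed.

```c
#include <assert.h>
#include <stdio.h>

typedef struct {
    char type;                       /* r, w, p or s */
    unsigned size;                   /* transfer size in bytes */
    unsigned long address;           /* byte address, parsed as hexadecimal */
    unsigned long reference_count;
    unsigned long instruction_count;
} bus2_record;

/* Parse one BUS2 line into `rec`. Returns 1 on success, 0 otherwise. */
static int parse_bus2(const char *line, bus2_record *rec)
{
    assert(line != NULL && rec != NULL);
    return sscanf(line, "BUS2 %c %u %lx %lu %lu",
                  &rec->type, &rec->size, &rec->address,
                  &rec->reference_count, &rec->instruction_count) == 5;
}
```

Summing the size field over all parsed records gives the total number of bytes transferred, which is the numerator of the bandwidth calculation.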
147 |
148 | = Usage / Running =
149 | # Alter calculate.sh to specify the right dinero binary path
150 | # Build binaries
151 | $ make
152 | # Call calculate.sh with the proper argument for the datafile
153 | $ sh calculate.sh <path-to-datafile>
154 | # Results are written to stderr and to the files res-xx.txt
155 | # A rerun will simply overwrite all old data
156 |
157 |
158 | = Conclusion =
159 | The results really depend on which type of cache and memory is used. A lot of
160 | assumptions and many generalisations were made. In order
161 | to have the systems work better with each other, the system needs to
162 | have good knowledge of the underlying implementation in hardware. It
163 | seems that the traces lisp and spic have been optimized for the
164 | use of the class c cache and word-type memory. It is also pretty clear that
165 | without any cache the system will not perform at all and will just
166 | suffer from the really slow memory.
167 |
168 |
169 | = Recommendations =
170 | * As the input is text and only simple calculations are being used, allowing
171 | an interpreted language like perl, python or similar would be pretty handy
172 | * /usr/local/edu/data does not exist; this should be ~csca/edu/data
173 | * The 'large files' are tar gzipped 100k, 300k ;-)
174 | * Please specify which version of dinero to use and where the program needs to
175 | run
176 | * dineroIII is no longer available online and has been replaced by dineroIV;
177 | compiling dineroIII also fails on more modern systems. Quick and dirty
178 | fix patch: http://rickvanderzwet.nl/svn/data/liacs/ca/opdr3/src/
179 | * The submit instructions do not include an email address to submit to
180 | * Translate the 'Het gaat om' part of the assignment