DINEROIII()                                                        DINEROIII()

NAME
       dineroIII - cache simulator, version III

SYNOPSIS
       dineroIII -b block_size -u unified_cache_size -i instruction_cache_size
                 -d data_cache_size [ other_options ]

DESCRIPTION
       dineroIII is a trace-driven cache simulator that supports sub-block
       placement. Simulation results are determined by the input trace and
       the cache parameters. A trace is a finite sequence of memory
       references, usually obtained by the interpretive execution of a
       program or set of programs. Trace input is read by the simulator in
       din format (described later). Cache parameters, e.g. block size and
       associativity, are set with command line options (also described
       later). dineroIII uses the priority stack method of memory hierarchy
       simulation to increase flexibility and improve simulator performance
       in highly associative caches. One can simulate either a unified cache
       (also called a mixed cache, in which data and instructions are cached
       together) or separate instruction and data caches. This version of
       dineroIII does not permit the simultaneous simulation of multiple
       alternative caches.

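       The priority stack idea can be illustrated for the simplest case, a
       fully associative cache with LRU replacement. Blocks are kept on a
       stack ordered from most to least recently used, and a reference hits
       in an LRU cache of C blocks exactly when its block lies within the top
       C stack entries, so one pass can yield miss counts for several cache
       sizes (although, as noted above, this version of dineroIII simulates
       one cache configuration per run). The Python sketch below is a generic
       illustration of the technique under those assumptions (fully
       associative, LRU, no sub-blocks); it is not dineroIII's source, and
       the function name lru_stack_misses is hypothetical.

```python
# Generic LRU stack-distance sketch (assumption: fully associative cache,
# LRU replacement, no sub-blocks). Not dineroIII's implementation.
def lru_stack_misses(block_refs, cache_sizes):
    """Count misses for several fully associative LRU cache sizes
    (measured in blocks) in a single pass over a block-address trace."""
    stack = []  # priority stack: index 0 is the most recently used block
    misses = {size: 0 for size in cache_sizes}
    for block in block_refs:
        if block in stack:
            depth = stack.index(block)  # 0-based stack distance
            stack.pop(depth)
        else:
            depth = None  # first reference: a miss for every cache size
        for size in cache_sizes:
            # An LRU cache of `size` blocks holds exactly the top `size`
            # stack entries, so the reference hits iff depth < size.
            if depth is None or depth >= size:
                misses[size] += 1
        stack.insert(0, block)  # the referenced block moves to the top
    return misses


print(lru_stack_misses([0, 1, 2, 0, 1, 2], [2, 3]))  # {2: 6, 3: 3}
```

       The cyclic three-block pattern misses on every reference in a
       two-block LRU cache but only on the three cold misses in a three-block
       cache, and both counts come from the same pass.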
       dineroIII differs from most other cache simulators because it supports
       sub-block placement (also known as sector placement), in which address
       tags are still associated with cache blocks but data is transferred to
       and from the cache in smaller sub-blocks. This organization is
       especially useful for on-chip microprocessor caches, which have to load
       data on cache misses over a limited number of pins. In traditional
       cache design, this constraint leads to small blocks. Unfortunately, a
       cache with small blocks devotes much more on-chip RAM to address tags
       than does one with large blocks. Sub-block placement allows a cache to
       have small sub-blocks for fast data transfer and large blocks to
       associate with address tags for efficient use of on-chip RAM.

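       As a rough illustration of the tag-storage argument, the sketch below
       counts tag and valid bits for two hypothetical 16K-byte direct-mapped
       caches with 32-bit byte addresses. Both transfer 16 bytes per miss,
       but the sub-blocked design keeps a quarter as many tags. The function
       tag_storage is hypothetical; it assumes power-of-two sizes and ignores
       dirty and replacement-state bits.

```python
# Hypothetical helper. Assumptions: power-of-two sizes, direct-mapped
# placement, and only tag + valid bits counted (no dirty/replacement bits).
def tag_storage(cache_size, block_size, subblock_size=0, address_bits=32):
    """Return (number_of_tags, tag_plus_valid_bits) for a direct-mapped cache.

    One valid bit is kept per transfer unit: the whole block, or each
    sub-block when sub-block placement is used.
    """
    num_blocks = cache_size // block_size
    offset_bits = block_size.bit_length() - 1  # log2(block_size)
    index_bits = num_blocks.bit_length() - 1   # log2(num_blocks)
    tag_bits = address_bits - index_bits - offset_bits
    valid_bits = block_size // subblock_size if subblock_size else 1
    return num_blocks, num_blocks * (tag_bits + valid_bits)


# 16-byte blocks, no sub-blocks: 1024 tags of 18 bits plus 1 valid bit each.
print(tag_storage(16 * 1024, 16))      # (1024, 19456)
# 64-byte blocks, 16-byte sub-blocks: 256 tags of 18 bits plus 4 valid bits.
print(tag_storage(16 * 1024, 64, 16))  # (256, 5632)
```

       Under these assumptions the sub-blocked cache needs 5632 rather than
       19456 bits of tag and valid storage.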
       Trace-driven simulation is frequently used to evaluate memory
       hierarchy performance. These simulations are repeatable and allow
       cache design parameters to be varied so that effects can be isolated.
       They are cheaper than hardware monitoring and do not require access
       to, or even the existence of, the machine being studied. Simulation
       results can be obtained in many situations where analytic model
       solutions are intractable without questionable simplifying
       assumptions. Further, there is currently no generally accepted model
       of program behavior, let alone one that is suitable for cache
       evaluation; workloads in trace-driven simulation are represented by
       samples of real workloads and contain complex embedded correlations
       that synthetic workloads often lack. Lastly, a trace-driven simulation
       is guaranteed to be representative of at least one program in
       execution.

       dineroIII reads trace input in din format from stdin. A din record is
       a two-tuple, label address. Each line of the trace file must contain
       one din record. The rest of the line is ignored so that comments can
       be included in the trace file.

       The label gives the access type of a reference:

              0      read data.
              1      write data.
              2      instruction fetch.
              3      escape record (treated as unknown access type).
              4      escape record (causes cache flush).

       The address is a hexadecimal byte address between 0 and ffffffff,
       inclusive.

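       A minimal Python sketch of a din reader follows. It assumes fields are
       separated by whitespace and checks the label and address ranges given
       above; read_din and DIN_LABELS are hypothetical names, and skipping
       blank lines is an assumption rather than documented behavior.

```python
import io

# Access types of din labels, as listed above.
DIN_LABELS = {
    0: "read data",
    1: "write data",
    2: "instruction fetch",
    3: "escape record (unknown access type)",
    4: "escape record (cache flush)",
}


def read_din(stream):
    """Yield (label, address) pairs from din-format lines.

    Assumption: fields are whitespace-separated. Anything after the
    address is ignored, so it may hold comments.
    """
    for lineno, line in enumerate(stream, start=1):
        fields = line.split()
        if not fields:
            continue  # assumption: tolerate blank lines
        if len(fields) < 2:
            raise ValueError(f"line {lineno}: expected 'label address'")
        label, address = int(fields[0]), int(fields[1], 16)
        if label not in DIN_LABELS:
            raise ValueError(f"line {lineno}: unknown label {label}")
        if not 0 <= address <= 0xFFFFFFFF:
            raise ValueError(f"line {lineno}: address out of range")
        yield label, address


trace = io.StringIO("2 400100  fetch\n0 7ffc10 load of x\n1 7ffc14\n")
print([(label, hex(addr)) for label, addr in read_din(trace)])
# [(2, '0x400100'), (0, '0x7ffc10'), (1, '0x7ffc14')]
```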
       Cache parameters are set by command line options. Parameters
       block_size and either unified_cache_size or both data_cache_size and
       instruction_cache_size must be specified. Other parameters are
       optional. The suffixes K, M and G multiply numbers by 1024, 1024^2 and
       1024^3, respectively.

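       A minimal sketch of the suffix rule, assuming an argument is a decimal
       integer followed by an optional upper-case K, M or G (whether
       dineroIII accepts other forms, such as lower-case suffixes, is not
       stated here). parse_size is a hypothetical name.

```python
# Multipliers for the size suffixes described above.
# Assumption: only upper-case suffixes on decimal integers are accepted.
SIZE_SUFFIXES = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text):
    """Convert a size argument such as '16K', '1M' or '16384' to bytes."""
    text = text.strip()
    if text and text[-1] in SIZE_SUFFIXES:
        return int(text[:-1]) * SIZE_SUFFIXES[text[-1]]
    return int(text)


for arg in ("16384", "16K", "1M"):
    print(arg, parse_size(arg))
# 16384 16384
# 16K 16384
# 1M 1048576
```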
       The following command line options are available:

       -b block_size
              sets the cache block size in bytes. Must be explicitly set
              (e.g. -b16).

       -u unified_cache_size
              sets the unified cache size in bytes (e.g. -u16K). A unified
              cache, also called a mixed cache, caches both data and
              instructions. If unified_cache_size is positive, both
              instruction_cache_size and data_cache_size must be zero. If it
              is zero, implying that separate instruction and data caches will
              be simulated, both instruction_cache_size and data_cache_size
              must be set to positive values. Defaults to 0.

       -i instruction_cache_size
              sets the instruction cache size in bytes (e.g. -i16384).
              Defaults to 0, indicating a unified cache simulation. If
              positive, data_cache_size must be positive as well.

       -d data_cache_size
              sets the data cache size in bytes (e.g. -d1M). Defaults to 0,
              indicating a unified cache simulation. If positive,
              instruction_cache_size must be positive as well.

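       The rules for -b, -u, -i and -d can be summarized as a small
       validation function. This is a hypothetical sketch of the constraints
       stated above (cache_organization is not part of dineroIII), assuming
       all sizes have already been converted to non-negative byte counts.

```python
# Hypothetical helper restating the -b/-u/-i/-d rules.
# Assumption: sizes are already non-negative byte counts.
def cache_organization(block_size, unified=0, instruction=0, data=0):
    """Return 'unified' or 'split' for a set of -b/-u/-i/-d values,
    raising ValueError when the combination is not allowed."""
    if block_size <= 0:
        raise ValueError("-b block_size must be set explicitly")
    if unified > 0:
        if instruction != 0 or data != 0:
            raise ValueError("with -u, both -i and -d must be zero")
        return "unified"
    if instruction > 0 and data > 0:
        return "split"
    raise ValueError("set -u, or set both -i and -d, to a positive size")


print(cache_organization(16, unified=16 * 1024))                 # unified
print(cache_organization(16, instruction=16384, data=1024 ** 2))  # split
```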
       -S subblock_size
              sets the cache sub-block size in bytes. Defaults to 0,
              indicating that sub-block placement is not used (i.e. -S0).

       -a associativity
              sets the cache associativity. A direct-mapped cache has
              associativity 1. A two-way set-associative cache has
              associativity 2. A fully associative cache has associativity
              equal to its number of blocks, e.g. data_cache_size/block_size
              for the data cache. Defaults to direct-mapped placement
              (i.e. -a1).

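       Associativity determines how an address is split into a tag and a set
       index: a cache of cache_size bytes has
       cache_size / (block_size * associativity) sets, so a fully associative
       cache has a single set. The sketch below assumes the conventional
       mapping in which the block number modulo the number of sets selects
       the set; the page does not spell out dineroIII's internal indexing,
       and map_address is a hypothetical name.

```python
# Hypothetical helper. Assumption: conventional modulo set indexing.
def map_address(address, cache_size, block_size, assoc, subblock_size=0):
    """Split a byte address into (tag, set_index, subblock_index)."""
    num_sets = cache_size // (block_size * assoc)
    block_number = address // block_size
    set_index = block_number % num_sets   # low-order block-number bits
    tag = block_number // num_sets        # remaining high-order bits
    offset = address % block_size
    subblock_index = offset // subblock_size if subblock_size else 0
    return tag, set_index, subblock_index


addr = 0x12355
print(map_address(addr, 16384, 32, 1))     # direct-mapped: (4, 282, 0)
print(map_address(addr, 16384, 32, 2, 8))  # two-way, 8-byte sub-blocks: (9, 26, 2)
print(map_address(addr, 16384, 32, 512))   # fully associative: (2330, 0, 0)
```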
       -r replacement_policy
              sets the cache replacement policy. Valid replacement policies
              are l (LRU), f (FIFO) and r (RANDOM). Defaults to LRU
              (i.e. -rl).

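       The three policies differ only in which block of a full set is evicted
       and whether hits update the ordering. The class below is a generic
       sketch operating on block numbers (address / block_size), not
       dineroIII's priority-stack implementation; SetAssociativeCache is a
       hypothetical name, and drawing RANDOM victims from Python's seeded
       random.Random is an assumption made so runs are repeatable.

```python
import random


class SetAssociativeCache:
    """Tag store of a set-associative cache with LRU ('l'), FIFO ('f') or
    RANDOM ('r') replacement, using the -r policy letters.

    Generic sketch, not dineroIII's code (assumption: seeded PRNG for RANDOM).
    """

    def __init__(self, num_sets, assoc, policy="l", seed=0):
        if policy not in ("l", "f", "r"):
            raise ValueError("policy must be 'l', 'f' or 'r'")
        self.num_sets, self.assoc, self.policy = num_sets, assoc, policy
        # Each set lists resident block numbers, newest first: most recently
        # used for LRU, most recently loaded for FIFO.
        self.sets = [[] for _ in range(num_sets)]
        self.rng = random.Random(seed)  # seeded so runs are repeatable
        self.misses = 0

    def access(self, block_number):
        """Reference one block number; return True on a hit."""
        ways = self.sets[block_number % self.num_sets]
        if block_number in ways:
            if self.policy == "l":
                ways.remove(block_number)  # move to the most recently used slot
                ways.insert(0, block_number)
            return True  # FIFO and RANDOM do not reorder on hits
        self.misses += 1
        if len(ways) == self.assoc:
            if self.policy == "r":
                ways.pop(self.rng.randrange(self.assoc))
            else:
                ways.pop()  # LRU: least recently used; FIFO: oldest load
        ways.insert(0, block_number)
        return False


for policy in ("l", "f"):
    cache = SetAssociativeCache(num_sets=1, assoc=2, policy=policy)
    for block in (0, 1, 0, 2, 0):
        cache.access(block)
    print(policy, cache.misses)
# l 3
# f 4
```

       On the trace 0 1 0 2 0 in a single two-way set, LRU keeps the
       re-referenced block 0 and misses three times, while FIFO evicts it as
       the oldest load and misses four times.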
       -f fetch_policy
              sets the cache fetch policy. Demand-fetch (d), which fetches
              only the blocks needed to service a cache reference, is the most
              common fetch policy. All other fetch policies are methods of
              prefetching. Prefetching is never done after writes. The
              prefetch target is determined by the -p option and by whether
              sub-block placement is enabled.

              d      demand-fetch, which never prefetches.
              a      always-prefetch, which prefetches after every demand
                     reference.
              m      miss-prefetch, which prefetches after every demand miss.
              t      tagged-prefetch, which prefetches after the first demand
                     miss to a (sub)-block.

              The next two fetch policies work only with sub-block placement.

              l      load-forward-prefetch works like always-prefetch within
                     a block, but it will not attempt to prefetch sub-blocks
                     in other blocks.
              S      sub-block-prefetch works like always-prefetch within a
                     block, except near the end of a block, where the
                     prefetch target wraps around within the current block.

              Defaults to demand-fetch (i.e. -fd).

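       The page does not give the exact prefetch-target arithmetic, so the
       sketch below assumes targets are aligned to (sub)-block boundaries,
       that load-forward-prefetch simply drops targets beyond the current
       block, and that sub-block-prefetch wraps them to the start of that
       block. It computes only the target, not when a, m or t trigger a
       prefetch, and prefetch_target is a hypothetical name.

```python
# Hypothetical helper. Assumptions: targets aligned to (sub)-block
# boundaries; 'l' drops targets outside the block; 'S' wraps within it.
def prefetch_target(address, policy, block_size, subblock_size=0, distance=1):
    """Return the byte address of the (sub)-block a prefetch would target,
    or None if the policy issues no prefetch for this reference.

    Only the target is computed; when a/m/t actually trigger a prefetch
    (every reference, every miss, first miss) is not modelled here.
    """
    if policy not in ("d", "a", "m", "t", "l", "S"):
        raise ValueError(f"unknown fetch policy {policy!r}")
    if policy == "d":
        return None  # demand-fetch never prefetches
    if policy in ("l", "S") and not subblock_size:
        raise ValueError("load-forward and sub-block prefetch need -S")
    unit = subblock_size or block_size  # -p counts sub-blocks if enabled
    target = (address // unit + distance) * unit
    block_base = address // block_size * block_size
    if policy == "l" and target >= block_base + block_size:
        return None  # load-forward never prefetches into another block
    if policy == "S":
        # Wrap around within the current block.
        target = block_base + (target - block_base) % block_size
    return target


# 32-byte blocks with 8-byte sub-blocks; a reference to the last sub-block.
for policy in ("a", "l", "S"):
    target = prefetch_target(0x11C, policy, 32, 8)
    print(policy, None if target is None else hex(target))
# a 0x120
# l None
# S 0x100
```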
       -p prefetch_distance
              sets the prefetch distance in sub-blocks if sub-block placement
              is enabled, or in blocks if it is not. A prefetch_distance of 1
              means that the next sequential (sub)-block is the potential
              target of a prefetch. Defaults to 1 (i.e. -p1).

       -P abort_prefetch_percent
              sets the percentage of prefetches that are aborted. This can be
              used to examine the effects of data references blocking prefetch
              references from reaching a shared cache. Defaults to no
              prefetches aborted (i.e. -P0).

       -w write_policy
              selects one of the two cache write policies. Write-through (w)
              updates main memory on all writes. Copy-back (c) updates main
              memory only when a dirty block is replaced or the cache is
              flushed. Defaults to copy-back (i.e. -wc).

       -A write_allocation_policy
              selects whether a (sub)-block is loaded on a write reference.
              Write-allocate (w) causes (sub)-blocks to be loaded on all
              references that miss. Non-write-allocate (n) causes (sub)-blocks
              to be loaded only on non-write references that miss. Defaults to
              write-allocate (i.e. -Aw).

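       The combined effect of -w and -A on memory traffic can be sketched
       with a direct-mapped cache of whole blocks (no sub-blocks). The
       function memory_traffic is hypothetical, counts only block fetches and
       memory writes, and assumes no final flush, so dirty blocks left in the
       cache at the end are not counted.

```python
# Hypothetical helper. Assumptions: direct-mapped, whole blocks only,
# no final flush of dirty blocks.
def memory_traffic(refs, num_blocks, write_policy="c", allocate="w"):
    """Count (block fetches, memory writes) for a direct-mapped cache.

    refs holds (label, block_number) pairs with label 0 = read, 1 = write.
    write_policy: 'w' write-through or 'c' copy-back (as for -w).
    allocate: 'w' write-allocate or 'n' non-write-allocate (as for -A).
    """
    resident = [None] * num_blocks  # block number held by each cache line
    dirty = [False] * num_blocks
    fetches = writes = 0
    for label, block in refs:
        line = block % num_blocks
        present = resident[line] == block
        if not present and (label != 1 or allocate == "w"):
            if dirty[line]:
                writes += 1  # copy back the dirty block being replaced
            resident[line], dirty[line] = block, False
            fetches += 1
            present = True
        if label == 1:
            if write_policy == "w":
                writes += 1  # write-through: every write reaches memory
            elif present:
                dirty[line] = True  # copy-back: defer until replacement
            else:
                writes += 1  # copy-back write miss without allocation
    return fetches, writes


# Blocks 0 and 4 collide in line 0 of a four-line cache.
trace = [(1, 0), (1, 0), (0, 4), (1, 0)]
for policies in (("c", "w"), ("w", "w"), ("c", "n")):
    print(policies, memory_traffic(trace, 4, *policies))
# ('c', 'w') (3, 1)
# ('w', 'w') (3, 3)
# ('c', 'n') (1, 3)
```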
       -D debug_flag
              used by the implementor to debug the simulator. A debug_flag of
              0 disables debugging; 1 prints the priority stacks after every
              reference; and 2 prints the priority stacks and performance
              metrics after every reference. Debugging information may also
              help the user understand the precise meaning of all cache
              parameter settings. Defaults to no debugging (i.e. -D0).

       -o output_style
              sets the output style. Terse-output (0) prints results only at
              the end of the simulation run. Verbose-output (1) prints results
              at half-million reference increments and at the end of the
              simulation run. Bus-output (2) prints an output record for every
              memory bus transfer. Bus_and_snoop-output (3) prints an output
              record for every memory bus transfer and for every clean
              sub-block that is replaced. Defaults to terse-output (i.e. -o0).
              For bus-output, each bus record is a six-tuple:

              BUS2   four literal characters that start a bus record
              access the access type: r for a bus-read, w for a bus-write,
                     p for a bus-prefetch, or s for snoop activity (output
                     style 3 only)
              size   the transfer size in bytes
              address
                     a hexadecimal byte address between 0 and ffffffff,
                     inclusive
              reference_count
                     the number of demand references since the last bus
                     transfer
              instruction_count
                     the number of demand instruction fetches since the last
                     bus transfer

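       A consumer of bus-output might parse each record as below. This sketch
       assumes whitespace-separated fields with the size and counts in
       decimal; BusRecord, parse_bus_record and the sample record are
       hypothetical, not actual dineroIII output.

```python
from typing import NamedTuple


class BusRecord(NamedTuple):
    access: str             # 'r' read, 'w' write, 'p' prefetch, 's' snoop
    size: int               # transfer size in bytes
    address: int            # byte address
    reference_count: int    # demand references since the last bus transfer
    instruction_count: int  # demand instruction fetches since then


def parse_bus_record(line):
    """Parse one bus-output record (output styles 2 and 3).

    Assumption: whitespace-separated fields; size and counts in decimal.
    """
    fields = line.split()
    if len(fields) != 6 or fields[0] != "BUS2":
        raise ValueError(f"not a bus record: {line!r}")
    if fields[1] not in ("r", "w", "p", "s"):
        raise ValueError(f"unknown bus access type {fields[1]!r}")
    return BusRecord(fields[1], int(fields[2]), int(fields[3], 16),
                     int(fields[4]), int(fields[5]))


record = parse_bus_record("BUS2 r 16 7ffc10 12 7")  # made-up sample record
print(record.access, record.size, hex(record.address), record.reference_count)
# r 16 0x7ffc10 12
```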
       -Z skip_count
              sets the number of trace references to be skipped before
              beginning cache simulation. Defaults to none (i.e. -Z0).

       -z maximum_count
              sets the maximum number of trace references to be processed
              after skipping the trace references specified by skip_count.
              Note that references generated by the simulator rather than
              read from the trace (e.g. prefetch references) are not included
              in this count. Defaults to 10 million (i.e. -z10000000).

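       The interaction of -Z and -z amounts to slicing the trace, as in this
       sketch; select_trace_references is a hypothetical name, and records
       can be any iterable of trace references.

```python
from itertools import islice


# Hypothetical helper mirroring the -Z/-z description above.
def select_trace_references(records, skip_count=0, maximum_count=10_000_000):
    """Skip the first skip_count trace records, then yield at most
    maximum_count of the rest.

    Prefetches generated later by a simulator are not drawn from this
    iterator, so they never count toward maximum_count.
    """
    return islice(records, skip_count, skip_count + maximum_count)


print(list(select_trace_references(range(10), skip_count=3, maximum_count=4)))
# [3, 4, 5, 6]
```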
       -Q flush_count
              sets the number of references between cache flushes. Can be
              used to crudely simulate multiprogramming. Defaults to no
              flushing (i.e. -Q0).

FILES
       doc.h contains additional programmer documentation.

SEE ALSO
       Mark D. Hill and Alan Jay Smith, Experimental Evaluation of On-Chip
       Microprocessor Cache Memories, Proc. Eleventh International Symposium
       on Computer Architecture, June 1984, Ann Arbor, MI.

       Alan Jay Smith, Cache Memories, Computing Surveys, 14-3, September
       1982.

BUGS
       Not all combinations of options have been thoroughly tested.

AUTHOR
       Mark D. Hill
       Computer Sciences Dept.
       1210 West Dayton St.
       Univ. of Wisconsin
       Madison, WI 53706

       markhill@cs.wisc.edu

4th Berkeley Distribution                                          DINEROIII()