DINEROIII()                                                        DINEROIII()

NAME
       dineroIII - cache simulator, version III

SYNOPSIS
       dineroIII -b block_size -u unified_cache_size -i instruction_cache_size
                 -d data_cache_size [ other_options ]

DESCRIPTION
       dineroIII is a trace-driven cache simulator that supports sub-block
       placement. Simulation results are determined by the input trace and
       the cache parameters. A trace is a finite sequence of memory
       references, usually obtained by the interpretive execution of a
       program or set of programs. The simulator reads trace input in din
       format (described below). Cache parameters, e.g. block size and
       associativity, are set with command line options (also described
       below). dineroIII uses the priority stack method of memory hierarchy
       simulation to increase flexibility and improve simulator performance
       in highly associative caches. One can simulate either a unified cache
       (mixed: data and instructions cached together) or separate
       instruction and data caches. This version of dineroIII does not
       permit the simultaneous simulation of multiple alternative caches.
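
       The priority stack idea can be illustrated with a minimal Python
       sketch, assuming LRU replacement. It keeps one recency-ordered stack
       of block numbers per set and counts the stack depth at which each
       reference is found. By the LRU inclusion property, a reference hits
       in an A-way cache with the same number of sets exactly when its depth
       is less than A, so one pass over the trace serves every
       associativity. The names lru_stack_distances and
       hits_for_associativity are hypothetical; this is not dineroIII's own
       code.

```python
from collections import Counter


def lru_stack_distances(addresses, block_size, num_sets):
    """Count references by their depth in a per-set LRU priority stack.

    Depth 0 means the block was the most recently used one in its set;
    the key None marks a first reference to a block (a cold miss).
    """
    stacks = [[] for _ in range(num_sets)]  # one priority stack per set
    depths = Counter()
    for address in addresses:
        block = address // block_size
        stack = stacks[block % num_sets]
        if block in stack:
            depth = stack.index(block)
            stack.pop(depth)
        else:
            depth = None
        stack.insert(0, block)  # the referenced block moves to the top
        depths[depth] += 1
    return depths


def hits_for_associativity(depths, assoc):
    """Hits in an assoc-way LRU cache that has the same number of sets."""
    return sum(n for d, n in depths.items() if d is not None and d < assoc)


trace = [0x00, 0x10, 0x20, 0x00, 0x10, 0x30, 0x00]
depths = lru_stack_distances(trace, block_size=16, num_sets=1)
for assoc in (1, 2, 4):
    print(assoc, hits_for_associativity(depths, assoc))  # 1 0, 2 0, 4 3
```

       The per-set lists are searched linearly to keep the sketch short; it
       makes no claim about how dineroIII organizes its stacks internally.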

       dineroIII differs from most other cache simulators because it
       supports sub-block placement (also known as sector placement), in
       which address tags are still associated with cache blocks but data is
       transferred to and from the cache in smaller sub-blocks. This
       organization is especially useful for on-chip microprocessor caches,
       which have to load data on cache misses over a limited number of
       pins. In traditional cache design, this constraint leads to small
       blocks. Unfortunately, a cache with small blocks devotes much more
       on-chip RAM to address tags than does one with large blocks. Sub-block
       placement allows a cache to have small sub-blocks for fast data
       transfer and large blocks to associate with address tags for
       efficient use of on-chip RAM.
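
       The tag-storage argument can be made concrete with a little
       arithmetic. This sketch assumes 32-bit byte addresses, a
       direct-mapped 4K-byte cache, and one valid bit per sub-block with no
       other state bits; tag_overhead_bits is a hypothetical helper. Both
       designs move 8 bytes per miss, but the sub-blocked one keeps a
       quarter as many tags.

```python
def tag_overhead_bits(cache_size, block_size, subblock_size, addr_bits=32):
    """Tag plus valid-bit storage, in bits, for a direct-mapped cache.

    Sizes are in bytes and assumed to be powers of two.
    """
    num_blocks = cache_size // block_size
    offset_bits = (block_size - 1).bit_length()  # log2(block_size)
    index_bits = (num_blocks - 1).bit_length()   # log2(number of blocks)
    tag_bits = addr_bits - index_bits - offset_bits
    valid_bits = block_size // subblock_size     # one valid bit per sub-block
    return num_blocks * (tag_bits + valid_bits)


# 4K-byte cache; 8 bytes are moved per miss in both designs
print(tag_overhead_bits(4096, 8, 8))   # 8-byte blocks, 512 tags -> 10752
print(tag_overhead_bits(4096, 32, 8))  # 32-byte blocks, 8-byte sub-blocks -> 3072
```

       Against 32768 bits of data, tag and valid storage drops from about a
       third to under a tenth in this assumed configuration.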

       Trace-driven simulation is frequently used to evaluate memory
       hierarchy performance. These simulations are repeatable and allow
       cache design parameters to be varied so that effects can be isolated.
       They are cheaper than hardware monitoring and do not require access
       to, or even the existence of, the machine being studied. Simulation
       results can be obtained in many situations where analytic model
       solutions are intractable without questionable simplifying
       assumptions. Further, there does not currently exist any generally
       accepted model for program behavior, let alone one that is suitable
       for cache evaluation; workloads in trace-driven simulation are
       represented by samples of real workloads and contain complex embedded
       correlations that synthetic workloads often lack. Lastly, a
       trace-driven simulation is guaranteed to be representative of at
       least one program in execution.

       dineroIII reads trace input in din format from stdin. A din record is
       a two-tuple, label address. Each line of the trace file must contain
       one din record. The rest of the line is ignored, so comments can be
       included in the trace file.

       The label gives the access type of a reference:

              0      read data.
              1      write data.
              2      instruction fetch.
              3      escape record (treated as unknown access type).
              4      escape record (causes cache flush).

       The address is a hexadecimal byte address between 0 and ffffffff,
       inclusive.
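
       A din reader is straightforward to sketch. This Python version
       assumes whitespace-separated fields, skips lines with fewer than two
       fields, and rejects labels outside 0-4; dineroIII's own input
       checking may differ. parse_din and LABELS are hypothetical names.

```python
LABELS = {
    0: "read data",
    1: "write data",
    2: "instruction fetch",
    3: "escape record (unknown access type)",
    4: "escape record (cache flush)",
}


def parse_din(lines):
    """Yield (label, address) pairs from din-format trace lines.

    Only the first two fields are used; anything after them on the line
    is treated as a comment and ignored.
    """
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # assumption: blank or short lines are skipped
        label = int(fields[0])
        address = int(fields[1], 16)  # hexadecimal byte address
        if label not in LABELS:
            raise ValueError(f"unknown din label: {label}")
        if not 0 <= address <= 0xFFFFFFFF:
            raise ValueError(f"address out of range: {fields[1]}")
        yield label, address


trace = ["2 1000 fetch of the first instruction",
         "0 7ffc10 load from the stack",
         "1 7ffc14"]
print(list(parse_din(trace)))  # [(2, 4096), (0, 8387600), (1, 8387604)]
```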

       Cache parameters are set by command line options. The parameter
       block_size and either unified_cache_size or both data_cache_size and
       instruction_cache_size must be specified. Other parameters are
       optional. The suffixes K, M and G multiply numbers by 1024, 1024^2
       and 1024^3, respectively.
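
       The sizing rules above can be expressed as a short check. parse_size
       and check_sizes are hypothetical helpers that enforce only what this
       page states; the simulator's actual option parser may accept other
       forms or report errors differently.

```python
SUFFIXES = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text):
    """Convert a size such as '16K', '1M' or '16384' to a byte count."""
    multiplier = SUFFIXES.get(text[-1:], 1)
    digits = text[:-1] if text[-1:] in SUFFIXES else text
    return int(digits) * multiplier


def check_sizes(block_size, unified=0, instruction=0, data=0):
    """Apply the required-parameter rules for -b, -u, -i and -d."""
    if block_size <= 0:
        raise ValueError("-b block_size must be set")
    if unified > 0:
        if instruction or data:
            raise ValueError("-i and -d must be 0 when -u is positive")
    elif instruction <= 0 or data <= 0:
        raise ValueError("separate caches need positive -i and -d")
    return "unified" if unified > 0 else "split"


print(parse_size("16K"), parse_size("1M"), parse_size("16384"))
print(check_sizes(16, unified=parse_size("16K")))                           # unified
print(check_sizes(16, instruction=parse_size("8K"), data=parse_size("8K")))  # split
```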

       The following command line options are available:

       -b block_size
              sets the cache block size in bytes. Must be explicitly set
              (e.g. -b16).

       -u unified_cache_size
              sets the unified cache size in bytes (e.g. -u16K). A unified
              cache, also called a mixed cache, caches both data and
              instructions. If unified_cache_size is positive, both
              instruction_cache_size and data_cache_size must be zero. If it
              is zero, implying that separate instruction and data caches
              will be simulated, both instruction_cache_size and
              data_cache_size must be set to positive values. Defaults to 0.

       -i instruction_cache_size
              sets the instruction cache size in bytes (e.g. -i16384).
              Defaults to 0, indicating a unified cache simulation. If
              positive, data_cache_size must be positive as well.

       -d data_cache_size
              sets the data cache size in bytes (e.g. -d1M). Defaults to 0,
              indicating a unified cache simulation. If positive,
              instruction_cache_size must be positive as well.

       -S subblock_size
              sets the cache sub-block size in bytes. Defaults to 0,
              indicating that sub-block placement is not being used (i.e.
              -S0).

       -a associativity
              sets the cache associativity. A direct-mapped cache has
              associativity 1. A two-way set-associative cache has
              associativity 2. A fully associative cache has associativity
              equal to its number of blocks, i.e. the cache size divided by
              block_size (e.g. data_cache_size/block_size for the data
              cache). Defaults to direct-mapped placement (i.e. -a1).

       -r replacement_policy
              sets the cache replacement policy. Valid replacement policies
              are l (LRU), f (FIFO), and r (RANDOM). Defaults to LRU (i.e.
              -rl).

       -f fetch_policy
              sets the cache fetch policy. Demand-fetch (d), which fetches
              blocks that are needed to service a cache reference, is the
              most common fetch policy. All other fetch policies are methods
              of prefetching. Prefetching is never done after writes. The
              prefetch target is determined by the -p option and by whether
              sub-block placement is enabled.

              d      demand-fetch, which never prefetches.
              a      always-prefetch, which prefetches after every demand
                     reference.
              m      miss-prefetch, which prefetches after every demand
                     miss.
              t      tagged-prefetch, which prefetches after the first
                     demand miss to a (sub)-block.

              The next two prefetch options work only with sub-block
              placement.

              l      load-forward-prefetch (sub-block placement only) works
                     like always-prefetch within a block, but it will not
                     attempt to prefetch sub-blocks in other blocks.
              S      sub-block-prefetch (sub-block placement only) works
                     like always-prefetch within a block, except near the
                     end of a block, where sub-block prefetch references
                     wrap around within the current block.

              Defaults to demand-fetch (i.e. -fd).

       -p prefetch_distance
              sets the prefetch distance in sub-blocks if sub-block
              placement is enabled, or in blocks if it is not. A
              prefetch_distance of 1 means that the next sequential
              (sub)-block is the potential target of a prefetch. Defaults
              to 1 (i.e. -p1).

       -P abort_prefetch_percent
              sets the percentage of prefetches that are aborted. This can
              be used to examine the effects of data references blocking
              prefetch references from reaching a shared cache. Defaults to
              no prefetches aborted (i.e. -P0).

       -w write_policy
              selects one of two cache write policies. Write-through (w)
              updates main memory on all writes. Copy-back (c) updates main
              memory only when a dirty block is replaced or the cache is
              flushed. Defaults to copy-back (i.e. -wc).

       -A write_allocation_policy
              selects whether a (sub)-block is loaded on a write reference.
              Write-allocate (w) causes (sub)-blocks to be loaded on all
              references that miss. Non-write-allocate (n) causes
              (sub)-blocks to be loaded only on non-write references that
              miss. Defaults to write-allocate (i.e. -Aw).

       -D debug_flag
              used by the implementor to debug the simulator. A debug_flag
              of 0 disables debugging; 1 prints the priority stacks after
              every reference; and 2 prints the priority stacks and
              performance metrics after every reference. Debugging
              information may help the user understand the precise meaning
              of all cache parameter settings. Defaults to no-debug (i.e.
              -D0).

       -o output_style
              sets the output style. Terse-output (0) prints results only at
              the end of the simulation run. Verbose-output (1) prints
              results at half-million reference increments and at the end of
              the simulation run. Bus-output (2) prints an output record for
              every memory bus transfer. Bus_and_snoop-output (3) prints an
              output record for every memory bus transfer and for every
              clean sub-block that is replaced. Defaults to terse-output
              (i.e. -o0). For bus-output, each bus record is a six-tuple:

              BUS2   four literal characters that start a bus record
              access the access type (r for a bus-read, w for a bus-write,
                     p for a bus-prefetch, or s for snoop activity, which
                     occurs with output style 3 only)
              size   the transfer size in bytes
              address
                     a hexadecimal byte address between 0 and ffffffff,
                     inclusive
              reference_count
                     the number of demand references since the last bus
                     transfer
              instruction_count
                     the number of demand instruction fetches since the last
                     bus transfer

       -Z skip_count
              sets the number of trace references to be skipped before
              beginning cache simulation. Defaults to none (i.e. -Z0).

       -z maximum_count
              sets the maximum number of trace references to be processed
              after skipping the trace references specified by skip_count.
              Note that references generated by the simulator rather than
              read from the trace (e.g. prefetch references) are not
              included in this count. Defaults to 10 million (i.e.
              -z10000000).

       -Q flush_count
              sets the number of references between cache flushes. It can be
              used to crudely simulate multiprogramming. Defaults to no
              flushing (i.e. -Q0).
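
       To show how -b, -S and -a interact, the sketch below splits a byte
       address into tag, set index, sub-block number and offset. It assumes
       power-of-two sizes and conventional bit-selection indexing, where the
       low-order bits of the block number choose the set; this page does not
       specify dineroIII's internal mapping, and split_address is a
       hypothetical name.

```python
def split_address(address, cache_size, block_size, assoc, subblock_size=0):
    """Split a byte address into (tag, set_index, subblock, offset).

    subblock_size=0 mirrors -S0 (no sub-block placement): the block is the
    transfer unit, so the sub-block number is always 0. Sizes are in bytes
    and assumed to be powers of two.
    """
    num_sets = cache_size // (block_size * assoc)
    if num_sets < 1:
        raise ValueError("associativity exceeds the number of blocks")
    unit = subblock_size or block_size         # bytes moved per transfer
    block_number = address // block_size
    set_index = block_number % num_sets        # bit-selection indexing (assumed)
    tag = block_number // num_sets
    offset_in_block = address % block_size
    return tag, set_index, offset_in_block // unit, offset_in_block % unit


# -u16K -b32 -S8 -a2: 256 sets of two 32-byte blocks, 8-byte sub-blocks
print(split_address(0x12355, 16 * 1024, 32, 2, 8))  # (9, 26, 2, 5)
# Fully associative (-a512 = 16K/32): one set, so the tag is the block number
print(split_address(0x12355, 16 * 1024, 32, 512))   # (2330, 0, 0, 21)
```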

FILES
       doc.h contains additional programmer documentation.

SEE ALSO
       Mark D. Hill and Alan Jay Smith, Experimental Evaluation of On-Chip
       Microprocessor Cache Memories, Proc. Eleventh International Symposium
       on Computer Architecture, June 1984, Ann Arbor, MI.

       Alan Jay Smith, Cache Memories, Computing Surveys, 14-3, September
       1982.

BUGS
       Not all combinations of options have been thoroughly tested.

AUTHOR
       Mark D. Hill
       Computer Sciences Dept.
       1210 West Dayton St.
       Univ. of Wisconsin
       Madison, WI 53706

       markhill@cs.wisc.edu

4th Berkeley Distribution                                          DINEROIII()