| 1 | DINEROIII() DINEROIII()
|
---|
| 4 |
|
---|
| 5 | NAME
|
---|
| 6 | dineroIII - cache simulator, version III
|
---|
| 7 |
|
---|
| 8 | SYNOPSIS
|
---|
| 9 | dineroIII -b block_size -u unified_cache_size -i instruction_cache_size
|
---|
| 10 | -d data_cache_size [ other_options ]
|
---|
| 11 |
|
---|
| 12 | DESCRIPTION
|
---|
| 13 | dineroIII is a trace-driven cache simulator that supports sub-block
|
---|
| 14 | placement. Simulation results are determined by the input trace and
|
---|
| 15 | the cache parameters. A trace is a finite sequence of memory refer-
|
---|
| 16 | ences usually obtained by the interpretive execution of a program or
|
---|
| 17 | set of programs. Trace input is read by the simulator in din format
|
---|
| 18 | (described later). Cache parameters, e.g. block size and associativ-
|
---|
| 19 | ity, are set with command line options (also described later).
|
---|
| 20 | dineroIII uses the priority stack method of memory hierarchy simulation
|
---|
| 21 | to increase flexibility and improve simulator performance in highly
|
---|
| 22 | associative caches. One can simulate either a unified cache (mixed,
|
---|
| 23 | data and instructions cached together) or separate instruction and data
|
---|
| 24 | caches. This version of dineroIII does not permit the simultaneous
|
---|
| 25 | simulation of multiple alternative caches.
|
---|
| 26 |
|
---|
| 27 | dineroIII differs from most other cache simulators because it supports
|
---|
| 28 | sub-block placement (also known as sector placement) in which address
|
---|
| 29 | tags are still associated with cache blocks but data is transferred to
|
---|
| 30 | and from the cache in smaller sub-blocks. This organization is espe-
|
---|
| 31 | cially useful for on-chip microprocessor caches which have to load data
|
---|
| 32 | on cache misses over a limited number of pins. In traditional cache
|
---|
| 33 | design, this constraint leads to small blocks. Unfortunately, a cache
|
---|
| 34 | with small blocks devotes much more on-chip RAM to address tags than
|
---|
| 35 | does one with large blocks. Sub-block placement allows a cache to have
|
---|
| 36 | small sub-blocks for fast data transfer and large blocks to associate
|
---|
| 37 | with address tags for efficient use of on-chip RAM.
|
---|
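The tag-overhead argument above can be checked with a little arithmetic. The following is a minimal sketch, not part of dineroIII: the function name, the 32-bit address width, and the power-of-two sizes are assumptions made for illustration, and valid and dirty bits are ignored.

```python
def tag_bits_total(cache_size, block_size, address_bits=32, associativity=1):
    """Total address-tag bits for a conventional cache (hypothetical helper).

    Assumes cache_size, block_size and associativity are powers of two and
    that one tag is stored per block. Valid/dirty bits are not counted.
    """
    num_blocks = cache_size // block_size
    num_sets = num_blocks // associativity
    offset_bits = block_size.bit_length() - 1   # log2(block_size)
    index_bits = num_sets.bit_length() - 1      # log2(num_sets)
    tag_bits = address_bits - index_bits - offset_bits
    return num_blocks * tag_bits


small = tag_bits_total(16 * 1024, 4)    # 4-byte blocks
large = tag_bits_total(16 * 1024, 32)   # 32-byte blocks
print(small, large, small // large)
```

Under these assumptions a 16K direct-mapped cache needs eight times as many tag bits with 4-byte blocks as with 32-byte blocks, which is the cost that sub-block placement tries to avoid.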
| 38 |
|
---|
| 39 | Trace-driven simulation is frequently used to evaluate memory
|
---|
| 40 | hierarchy performance. These simulations are repeatable and allow cache
|
---|
| 41 | design parameters to be varied so that effects can be isolated. They
|
---|
| 42 | are cheaper than hardware monitoring and do not require access to or
|
---|
| 43 | the existence of the machine being studied. Simulation results can be
|
---|
| 44 | obtained in many situations where analytic model solutions are
|
---|
| 45 | intractable without questionable simplifying assumptions. Further,
|
---|
| 46 | there does not currently exist any generally accepted model for program
|
---|
| 47 | behavior, let alone one that is suitable for cache evaluation; work-
|
---|
| 48 | loads in trace-driven simulation are represented by samples of real
|
---|
| 49 | workloads and contain complex embedded correlations that synthetic
|
---|
| 50 | workloads often lack. Lastly, a trace-driven simulation is guaranteed
|
---|
| 51 | to be representative of at least one program in execution.
|
---|
| 52 |
|
---|
| 53 | dineroIII reads trace input in din format from stdin. A din record is
|
---|
| 54 | a two-tuple label address. Each line of the trace file must contain one
|
---|
| 55 | _d_i_n record. The rest of the line is ignored so that comments can be
|
---|
| 56 | included in the trace file.
|
---|
| 57 |
|
---|
| 58 | The label gives the access type of a reference.
|
---|
| 59 |
|
---|
| 60 | 0 read data.
|
---|
| 61 | 1 write data.
|
---|
| 62 | 2 instruction fetch.
|
---|
| 63 | 3 escape record (treated as unknown access type).
|
---|
| 64 | 4 escape record (causes cache flush).
|
---|
| 65 |
|
---|
| 66 | The address is a hexadecimal byte-address between 0 and ffffffff
|
---|
| 67 | inclusive.
|
---|
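The din format described above is simple enough to read with a few lines of code. The sketch below is an illustration only, not the simulator's own parser: the names DinRecord and parse_din are hypothetical, and skipping lines with fewer than two fields is an assumption (the text does not say how dineroIII treats them).

```python
from typing import Iterable, Iterator, NamedTuple

# Access types listed in the label table above.
LABELS = {
    0: "read data",
    1: "write data",
    2: "instruction fetch",
    3: "escape record (unknown access type)",
    4: "escape record (cache flush)",
}


class DinRecord(NamedTuple):
    label: int
    address: int


def parse_din(lines: Iterable[str]) -> Iterator[DinRecord]:
    """Yield one DinRecord per trace line; text after the address is ignored."""
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # assumption: silently skip blank or short lines
        label = int(fields[0])
        address = int(fields[1], 16)  # addresses are hexadecimal
        if label not in LABELS:
            raise ValueError(f"unknown label {label}")
        if not 0 <= address <= 0xFFFFFFFF:
            raise ValueError(f"address out of range: {fields[1]}")
        yield DinRecord(label, address)


trace = ["2 400 first instruction fetch", "0 7fff0 load", "1 7fff4"]
for record in parse_din(trace):
    print(LABELS[record.label], hex(record.address))
```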
| 68 |
|
---|
| 69 | Cache parameters are set by command line options. Parameters
|
---|
| 70 | block_size and either unified_cache_size or both data_cache_size and
|
---|
| 71 | instruction_cache_size must be specified. Other parameters are
|
---|
| 72 | optional. The suffixes K, M and G multiply numbers by 1024, 1024^2 and
|
---|
| 73 | 1024^3, respectively.
|
---|
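As an illustration of the suffix rule, the following sketch converts a size argument to bytes. The function name parse_size is hypothetical, and accepting only upper-case suffixes is an assumption based on the letters listed above.

```python
# Multipliers for the K, M and G suffixes described above.
_MULTIPLIERS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text: str) -> int:
    """Convert a size such as '16K' or '16384' to a number of bytes."""
    text = text.strip()
    if text and text[-1] in _MULTIPLIERS:
        return int(text[:-1]) * _MULTIPLIERS[text[-1]]
    return int(text)


print(parse_size("16K"), parse_size("1M"), parse_size("16384"))
```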
| 74 |
|
---|
| 75 | The following command line options are available:
|
---|
| 76 |
|
---|
| 77 | -b block_size
|
---|
| 78 | sets the cache block size in bytes. Must be explicitly set
|
---|
| 79 | (e.g. -b16).
|
---|
| 80 |
|
---|
| 81 | -u unified_cache_size
|
---|
| 82 | sets the unified cache size in bytes (e.g., -u16K). A unified
|
---|
| 83 | cache, also called a mixed cache, caches both data and
|
---|
| 84 | instructions. If unified_cache_size is positive, both
|
---|
| 85 | instruction_cache_size and data_cache_size must be zero. If zero,
|
---|
| 86 | implying that separate instruction and data caches will be simulated,
|
---|
| 87 | both instruction_cache_size and data_cache_size must be set to
|
---|
| 88 | positive values. Defaults to 0.
|
---|
| 89 |
|
---|
| 90 | -i instruction_cache_size
|
---|
| 91 | sets the instruction cache size in bytes (e.g. -i16384).
|
---|
| 92 | Defaults to 0, indicating a unified cache simulation. If
|
---|
| 93 | positive, the data_cache_size must be positive as well.
|
---|
| 94 |
|
---|
| 95 | -d data_cache_size
|
---|
| 96 | sets the data cache size in bytes (e.g. -d1M). Defaults to 0
|
---|
| 97 | indicating a unified cache simulation. If positive, the
|
---|
| 98 | instruction_cache_size must be positive as well.
|
---|
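The constraints among the -u, -i and -d sizes can be summarized as a small check. This is a sketch of the rules as stated above, not the simulator's own argument handling; the function name cache_mode and its return strings are hypothetical.

```python
def cache_mode(unified: int = 0, instruction: int = 0, data: int = 0) -> str:
    """Return 'unified' or 'split' for a valid size combination, else raise."""
    if unified > 0:
        # A positive unified size requires both split sizes to be zero.
        if instruction != 0 or data != 0:
            raise ValueError("-u cannot be combined with -i or -d")
        return "unified"
    # With no unified cache, both split caches must be given.
    if instruction > 0 and data > 0:
        return "split"
    raise ValueError("specify either -u, or both -i and -d")


print(cache_mode(unified=16 * 1024))
print(cache_mode(instruction=8 * 1024, data=8 * 1024))
```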
| 99 |
|
---|
| 100 | -S subblock_size
|
---|
| 101 | sets the cache sub-block size in bytes. Defaults to 0 indicat-
|
---|
| 102 | ing that sub-block placement is not being used (i.e. -S0).
|
---|
| 103 |
|
---|
| 104 | -a associativity
|
---|
| 105 | sets the cache associativity. A direct-mapped cache has asso-
|
---|
| 106 | ciativity 1. A two-way set-associative cache has associativity
|
---|
| 107 | 2. A fully associative cache has associativity
|
---|
| 108 | data_cache_size/block_size. Defaults to direct-mapped placement
|
---|
| 109 | (i.e. -a1).
|
---|
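To make the relationship between cache size, block size and associativity concrete, the sketch below splits an address into tag, set index and block offset. It uses the conventional modulo mapping, which is an assumption here; the text does not describe how dineroIII computes these internally, and the function name split_address is hypothetical.

```python
def split_address(address: int, cache_size: int, block_size: int,
                  associativity: int) -> tuple:
    """Return (tag, set_index, offset) under conventional modulo set mapping."""
    num_sets = cache_size // (block_size * associativity)
    offset = address % block_size
    block_number = address // block_size
    set_index = block_number % num_sets
    tag = block_number // num_sets
    return tag, set_index, offset


# Direct-mapped 16K cache with 16-byte blocks: 1024 sets of one block each.
print(split_address(0x12345, 16 * 1024, 16, 1))
# Fully associative: associativity equals cache_size / block_size, one set.
print(split_address(0x12345, 16 * 1024, 16, 1024))
```

With a fully associative cache there is a single set, so the whole block number serves as the tag.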
| 110 |
|
---|
| 111 | -r replacement_policy
|
---|
| 112 | sets the cache replacement policy. Valid replacement policies
|
---|
| 113 | are l (LRU), f (FIFO), and r (RANDOM). Defaults to LRU (i.e.
|
---|
| 114 | -rl).
|
---|
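LRU replacement within one set can be pictured as a stack ordered by recency. The sketch below is a generic illustration of that idea and is not claimed to match dineroIII's priority stack code; the function name simulate_lru_set is hypothetical.

```python
def simulate_lru_set(block_refs, associativity):
    """Simulate one cache set under LRU; return (misses, final_stack).

    The stack holds block numbers, most recently used first.
    """
    stack = []
    misses = 0
    for block in block_refs:
        if block in stack:
            stack.remove(block)          # hit: block will move to the top
        else:
            misses += 1
            if len(stack) == associativity:
                stack.pop()              # evict the least recently used block
        stack.insert(0, block)
    return misses, stack


print(simulate_lru_set([1, 2, 3, 1, 4, 2], associativity=3))
```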
| 115 |
|
---|
| 116 | -f fetch_policy
|
---|
| 117 | sets the cache fetch policy. Demand-fetch (d), which fetches
|
---|
| 118 | blocks that are needed to service a cache reference, is the most
|
---|
| 119 | common fetch policy. All other fetch policies are methods of
|
---|
| 120 | prefetching. Prefetching is never done after writes. The
|
---|
| 121 | prefetch target is determined by the -p option and whether sub-
|
---|
| 122 | block placement is enabled.
|
---|
| 123 |
|
---|
| 124 | d demand-fetch which never prefetches.
|
---|
| 125 | a always-prefetch which prefetches after every demand ref-
|
---|
| 126 | erence.
|
---|
| 127 | m miss-prefetch which prefetches after every demand miss.
|
---|
| 128 | t tagged-prefetch which prefetches after the first demand
|
---|
| 129 | miss to a (sub)-block. The next two prefetch options work only
|
---|
| 130 | with sub-block placement.
|
---|
| 131 | l load-forward-prefetch (sub-block placement only) works
|
---|
| 132 | like always-prefetch within a block, but it will not attempt to
|
---|
| 133 | prefetch sub-blocks in other blocks.
|
---|
| 134 | S sub-block-prefetch (sub-block placement only) works like
|
---|
| 135 | always-prefetch within a block, except that a prefetch issued near
|
---|
| 136 | the end of a block wraps around to the start of the current block
|
---|
| 137 | instead of crossing into the next block.
|
---|
| 138 |
|
---|
| 139 | Defaults to demand-fetch (i.e. -fd).
|
---|
| 140 |
|
---|
| 141 | -p prefetch_distance
|
---|
| 142 | sets the prefetch distance in sub-blocks if sub-block placement
|
---|
| 143 | is enabled or in blocks if it is not. A prefetch_distance of 1
|
---|
| 144 | means that the next sequential (sub)-block is the potential tar-
|
---|
| 145 | get of a prefetch. Defaults to 1 (i.e. -p1).
|
---|
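The prefetch target can be sketched as simple address arithmetic. The two functions below are illustrations under stated assumptions, not dineroIII's implementation: the names are hypothetical, sizes are assumed to be powers of two, and the wrapped variant reflects one reading of the sub-block-prefetch wrap-around described for the -f option.

```python
def prefetch_target(address: int, unit_size: int, distance: int = 1) -> int:
    """Start address of the (sub)-block `distance` units after `address`.

    unit_size is the sub-block size when sub-block placement is enabled,
    otherwise the block size.
    """
    return (address // unit_size + distance) * unit_size


def wrapped_subblock_target(address: int, block_size: int,
                            subblock_size: int, distance: int = 1) -> int:
    """Like prefetch_target, but wraps around within the current block."""
    block_base = address - address % block_size
    subblocks_per_block = block_size // subblock_size
    index = (address % block_size) // subblock_size
    return block_base + ((index + distance) % subblocks_per_block) * subblock_size


print(hex(prefetch_target(0x1004, 16)))
print(hex(wrapped_subblock_target(0x103C, 64, 16)))
```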
| 146 |
|
---|
| 147 | -P abort_prefetch_percent
|
---|
| 148 | sets the percentage of prefetches that are aborted. This can be
|
---|
| 149 | used to examine the effects of data references blocking prefetch
|
---|
| 150 | references from reaching a shared cache. Defaults to no
|
---|
| 151 | prefetches aborted (i.e. -P0).
|
---|
| 152 |
|
---|
| 153 | -w write_policy
|
---|
| 154 | selects one of the two cache write policies. Write-through (w)
|
---|
| 155 | updates main memory on all writes. Copy-back (c) updates main
|
---|
| 156 | memory only when a dirty block is replaced or the cache is
|
---|
| 157 | flushed. Defaults to copy-back (i.e. -wc).
|
---|
| 158 |
|
---|
| 159 | -A write_allocation_policy
|
---|
| 160 | selects whether a (sub)-block is loaded on a write reference.
|
---|
| 161 | Write-allocate (w) causes (sub)-blocks to be loaded on all
|
---|
| 162 | references that miss. Non-write-allocate (n) causes (sub)-blocks
|
---|
| 163 | to be loaded only on non-write references that miss. Defaults
|
---|
| 164 | to write-allocate (i.e. -Aw).
|
---|
| 165 |
|
---|
| 166 | -D debug_flag
|
---|
| 167 | used by the implementor to debug the simulator. A debug_flag of 0
|
---|
| 168 | disables debugging; 1 prints the priority stacks after every
|
---|
| 169 | reference; and 2 prints the priority stacks and performance metrics
|
---|
| 170 | after every reference. Debugging information may be useful to
|
---|
| 171 | the user to understand the precise meaning of all cache parame-
|
---|
| 172 | ter settings. Defaults to no-debug (i.e. -D0).
|
---|
| 173 |
|
---|
| 174 | -o output_style
|
---|
| 175 | sets the output style. Terse-output (0) prints results only at
|
---|
| 176 | the end of the simulation run. Verbose-output (1) prints
|
---|
| 177 | results at half-million reference increments and at the end of
|
---|
| 178 | the simulation run. Bus-output (2) prints an output record for
|
---|
| 179 | every memory bus transfer. Bus_and_snoop-output (3) prints an
|
---|
| 180 | output record for every memory bus transfer and clean sub-block
|
---|
| 181 | that is replaced. Defaults to terse-output (i.e. -o0). For
|
---|
| 182 | bus-output, each bus record is a six-tuple:
|
---|
| 183 |
|
---|
| 184 | BUS2 are four literal characters that start a bus record
|
---|
| 185 | access is the access type (r for a bus-read, w for a bus-write,
|
---|
| 186 | p for a bus-prefetch, s for snoop activity (output style 3
|
---|
| 187 | only)).
|
---|
| 188 | size is the transfer size in bytes
|
---|
| 189 | address is a hexadecimal byte-address between 0 and ffffffff
|
---|
| 190 | inclusive
|
---|
| 191 | reference_count is the number of demand references since the
|
---|
| 192 | last bus transfer
|
---|
| 193 | instruction_count is the number of demand instruction fetches
|
---|
| 194 | since the last bus transfer
|
---|
| 195 |
|
---|
| 196 | -Z skip_count
|
---|
| 197 | sets the number of trace references to be skipped before begin-
|
---|
| 198 | ning cache simulation. Defaults to none (i.e. -Z0).
|
---|
| 199 |
|
---|
| 200 | -z maximum_count
|
---|
| 201 | sets the maximum number of trace references to be processed
|
---|
| 202 | after skipping the trace references specified by skip_count.
|
---|
| 203 | Note that references generated by the simulator rather than read from the
|
---|
| 204 | trace (e.g. prefetch references) are not included in this count.
|
---|
| 205 | Defaults to 10 million (i.e. -z10000000).
|
---|
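Taken together, -Z and -z select a window of the trace. The following sketch shows that windowing over any iterable of trace references; the function name trace_window is hypothetical, and only references read from the trace are counted, as the note above describes.

```python
from itertools import islice


def trace_window(records, skip_count=0, maximum_count=10_000_000):
    """Skip the first skip_count records, then yield at most maximum_count."""
    return islice(records, skip_count, skip_count + maximum_count)


# Equivalent in spirit to -Z3 -z4 on a ten-reference trace.
print(list(trace_window(range(10), skip_count=3, maximum_count=4)))
```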
| 206 |
|
---|
| 207 | -Q flush_count
|
---|
| 208 | sets the number of references between cache flushes. Can be
|
---|
| 209 | used to crudely simulate multiprogramming. Defaults to no
|
---|
| 210 | flushing (i.e. -Q0).
|
---|
| 211 |
|
---|
| 212 | FILES
|
---|
| 213 | doc.h contains additional programmer documentation.
|
---|
| 214 |
|
---|
| 215 | SEE ALSO
|
---|
| 216 | Mark D. Hill and Alan Jay Smith, Experimental Evaluation of On-Chip
|
---|
| 217 | Microprocessor Cache Memories, Proc. Eleventh International Symposium
|
---|
| 218 | on Computer Architecture, June 1984, Ann Arbor, MI.
|
---|
| 219 |
|
---|
| 220 | Alan Jay Smith, Cache Memories, Computing Surveys, 14-3, September
|
---|
| 221 | 1982.
|
---|
| 222 |
|
---|
| 223 | BUGS
|
---|
| 224 | Not all combinations of options have been thoroughly tested.
|
---|
| 225 |
|
---|
| 226 | AUTHOR
|
---|
| 227 | Mark D. Hill
|
---|
| 228 | Computer Sciences Dept.
|
---|
| 229 | 1210 West Dayton St.
|
---|
| 230 | Univ. of Wisconsin
|
---|
| 231 | Madison, WI 53706
|
---|
| 232 |
|
---|
| 233 | markhill@cs.wisc.edu
|
---|
| 238 |
|
---|
| 239 | 4th Berkeley Distribution DINEROIII()
|
---|