source: liacs/ca/opdr3/docs/dineroIII.txt@ 43

Last change on this file since 43 was 2, checked in by Rick van der Zwet, 15 years ago

Initial import of data of old repository ('data') worth keeping (e.g. tracking
means of URL access statistics)

DINEROIII()                                                      DINEROIII()



NAME
       dineroIII - cache simulator, version III

SYNOPSIS
       dineroIII -b block_size -u unified_cache_size -i instruction_cache_size
       -d data_cache_size [ other_options ]

DESCRIPTION
       dineroIII is a trace-driven cache simulator that supports sub-block
       placement.  Simulation results are determined by the input trace and
       the cache parameters.  A trace is a finite sequence of memory
       references, usually obtained by the interpretive execution of a
       program or set of programs.  Trace input is read by the simulator in
       din format (described later).  Cache parameters, e.g. block size and
       associativity, are set with command line options (also described
       later).  dineroIII uses the priority stack method of memory hierarchy
       simulation to increase flexibility and improve simulator performance
       in highly associative caches.  One can simulate either a unified
       cache (mixed, data and instructions cached together) or separate
       instruction and data caches.  This version of dineroIII does not
       permit the simultaneous simulation of multiple alternative caches.

       dineroIII differs from most other cache simulators because it
       supports sub-block placement (also known as sector placement), in
       which address tags are still associated with cache blocks but data is
       transferred to and from the cache in smaller sub-blocks.  This
       organization is especially useful for on-chip microprocessor caches,
       which have to load data on cache misses over a limited number of
       pins.  In traditional cache design, this constraint leads to small
       blocks.  Unfortunately, a cache with small blocks devotes much more
       on-chip RAM to address tags than does one with large blocks.
       Sub-block placement allows a cache to have small sub-blocks for fast
       data transfer and large blocks to associate with address tags for
       efficient use of on-chip RAM.
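
       The tag-RAM saving can be seen with a back-of-the-envelope
       calculation.  The sizes below are illustrative, not taken from this
       manual: a 16 KB cache with 16-byte blocks needs 1024 address tags,
       while the same cache with 64-byte blocks (each holding four 16-byte
       sub-blocks) needs only 256 tags for the same transfer granularity.

```shell
# Illustrative only: address-tag counts for a hypothetical 16 KB cache.
cache_size=16384
echo $((cache_size / 16))   # 16-byte blocks: 1024 tags
echo $((cache_size / 64))   # 64-byte blocks, 16-byte sub-blocks: 256 tags
```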

       Trace-driven simulation is frequently used to evaluate memory
       hierarchy performance.  These simulations are repeatable and allow
       cache design parameters to be varied so that effects can be isolated.
       They are cheaper than hardware monitoring and do not require access
       to or the existence of the machine being studied.  Simulation results
       can be obtained in many situations where analytic model solutions are
       intractable without questionable simplifying assumptions.  Further,
       there does not currently exist any generally accepted model for
       program behavior, let alone one that is suitable for cache
       evaluation; workloads in trace-driven simulation are represented by
       samples of real workloads and contain complex embedded correlations
       that synthetic workloads often lack.  Lastly, a trace-driven
       simulation is guaranteed to be representative of at least one program
       in execution.
       dineroIII reads trace input in din format from stdin.  A din record
       is a two-tuple: label address.  Each line of the trace file must
       contain one din record.  The rest of the line is ignored so that
       comments can be included in the trace file.

       The label gives the access type of a reference.

       0    read data.
       1    write data.
       2    instruction fetch.
       3    escape record (treated as unknown access type).
       4    escape record (causes cache flush).

       The address is a hexadecimal byte-address between 0 and ffffffff
       inclusive.
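
       As a sketch, a small hand-written trace in din format might look like
       this (the addresses and comments are made up for illustration):

```shell
# Write a five-record din trace; everything after the address is a comment.
cat > trace.din <<'EOF'
2 1000 instruction fetch at 0x1000
0 2008 data read at 0x2008
2 1004 next instruction fetch
1 2008 data write back to 0x2008
4 0    flush the cache
EOF
```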
       Cache parameters are set by command line options.  The parameters
       block_size and either unified_cache_size or both data_cache_size and
       instruction_cache_size must be specified.  Other parameters are
       optional.  The suffixes K, M and G multiply numbers by 1024, 1024^2
       and 1024^3, respectively.
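
       For example, typical invocations might look like the following.
       This is a hedged sketch: the trace file name is invented, and each
       simulator line is guarded so it only runs if a dineroIII binary is
       actually installed.

```shell
# The K suffix: -u16K means 16 * 1024 = 16384 bytes.
echo $((16 * 1024))
# 16 KB unified cache with 16-byte blocks, trace read from stdin:
command -v dineroIII >/dev/null && dineroIII -b16 -u16K < trace.din || true
# Split caches: 8 KB instruction, 8 KB data, 2-way set-associative:
command -v dineroIII >/dev/null && dineroIII -b16 -i8K -d8K -a2 < trace.din || true
```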
       The following command line options are available:

       -b block_size
              sets the cache block size in bytes.  Must be explicitly set
              (e.g. -b16).

       -u unified_cache_size
              sets the unified cache size in bytes (e.g. -u16K).  A unified
              cache, also called a mixed cache, caches both data and
              instructions.  If unified_cache_size is positive, both
              instruction_cache_size and data_cache_size must be zero.  If
              it is zero, separate instruction and data caches are
              simulated, and both instruction_cache_size and
              data_cache_size must be set to positive values.  Defaults
              to 0.
       -i instruction_cache_size
              sets the instruction cache size in bytes (e.g. -i16384).
              Defaults to 0, indicating a unified cache simulation.  If
              positive, data_cache_size must be positive as well.

       -d data_cache_size
              sets the data cache size in bytes (e.g. -d1M).  Defaults to
              0, indicating a unified cache simulation.  If positive,
              instruction_cache_size must be positive as well.

       -S subblock_size
              sets the cache sub-block size in bytes.  Defaults to 0,
              indicating that sub-block placement is not being used (i.e.
              -S0).
       -a associativity
              sets the cache associativity.  A direct-mapped cache has
              associativity 1.  A two-way set-associative cache has
              associativity 2.  A fully associative cache has associativity
              data_cache_size/block_size.  Defaults to direct-mapped
              placement (i.e. -a1).
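
       The number of cache sets follows from these parameters as
       sets = cache_size / (block_size * associativity).  A quick sanity
       check with made-up numbers (a 16 KB cache with 16-byte blocks):

```shell
# Illustrative set counts; the formula is standard cache geometry,
# not a dineroIII output.
cache_size=16384
block_size=16
echo $((cache_size / (block_size * 1)))   # direct-mapped (-a1): 1024 sets
echo $((cache_size / (block_size * 2)))   # 2-way (-a2): 512 sets
```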
       -r replacement_policy
              sets the cache replacement policy.  Valid replacement
              policies are l (LRU), f (FIFO), and r (RANDOM).  Defaults to
              LRU (i.e. -rl).
       -f fetch_policy
              sets the cache fetch policy.  Demand-fetch (d), which fetches
              blocks that are needed to service a cache reference, is the
              most common fetch policy.  All other fetch policies are
              methods of prefetching.  Prefetching is never done after
              writes.  The prefetch target is determined by the -p option
              and whether sub-block placement is enabled.

              d    demand-fetch, which never prefetches.
              a    always-prefetch, which prefetches after every demand
                   reference.
              m    miss-prefetch, which prefetches after every demand miss.
              t    tagged-prefetch, which prefetches after the first demand
                   miss to a (sub)-block.
              l    load-forward-prefetch (sub-block placement only), which
                   works like always-prefetch within a block but will not
                   attempt to prefetch sub-blocks in other blocks.
              S    sub-block-prefetch (sub-block placement only), which
                   works like always-prefetch within a block except near
                   the end of a block, at which point prefetch references
                   wrap around within the current block.

              The last two fetch policies (l and S) work only with
              sub-block placement.  Defaults to demand-fetch (i.e. -fd).
       -p prefetch_distance
              sets the prefetch distance, in sub-blocks if sub-block
              placement is enabled or in blocks if it is not.  A
              prefetch_distance of 1 means that the next sequential
              (sub)-block is the potential target of a prefetch.  Defaults
              to 1 (i.e. -p1).

       -P abort_prefetch_percent
              sets the percentage of prefetches that are aborted.  This can
              be used to examine the effects of data references blocking
              prefetch references from reaching a shared cache.  Defaults
              to no prefetches aborted (i.e. -P0).
       -w write_policy
              selects one of two cache write policies.  Write-through (w)
              updates main memory on all writes.  Copy-back (c) updates
              main memory only when a dirty block is replaced or the cache
              is flushed.  Defaults to copy-back (i.e. -wc).
       -A write_allocation_policy
              selects whether a (sub)-block is loaded on a write reference.
              Write-allocate (w) causes (sub)-blocks to be loaded on all
              references that miss.  Non-write-allocate (n) causes
              (sub)-blocks to be loaded only on non-write references that
              miss.  Defaults to write-allocate (i.e. -Aw).

       -D debug_flag
              used by the implementor to debug the simulator.  A debug_flag
              of 0 disables debugging; 1 prints the priority stacks after
              every reference; and 2 prints the priority stacks and
              performance metrics after every reference.  Debugging
              information may be useful to the user to understand the
              precise meaning of all cache parameter settings.  Defaults to
              no-debug (i.e. -D0).
       -o output_style
              sets the output style.  Terse-output (0) prints results only
              at the end of the simulation run.  Verbose-output (1) prints
              results at half-million reference increments and at the end
              of the simulation run.  Bus-output (2) prints an output
              record for every memory bus transfer.  Bus_and_snoop-output
              (3) prints an output record for every memory bus transfer and
              every clean sub-block that is replaced.  Defaults to
              terse-output (i.e. -o0).  For bus-output, each bus record is
              a six-tuple:

              BUS2 is four literal characters that start a bus record
              access is the access type (r for a bus-read, w for a
              bus-write, p for a bus-prefetch, s for snoop activity (output
              style 3 only))
              size is the transfer size in bytes
              address is a hexadecimal byte-address between 0 and ffffffff
              inclusive
              reference_count is the number of demand references since the
              last bus transfer
              instruction_count is the number of demand instruction fetches
              since the last bus transfer
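
       Put together, a bus record might look like the following made-up
       line (field values invented for illustration): a 16-byte bus read at
       address a010, with 23 demand references, 15 of them instruction
       fetches, since the previous transfer.

```shell
# Hypothetical -o2 bus record, fields in the order listed above:
echo "BUS2 r 16 0000a010 23 15"
```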
       -Z skip_count
              sets the number of trace references to be skipped before
              beginning cache simulation.  Defaults to none (i.e. -Z0).

       -z maximum_count
              sets the maximum number of trace references to be processed
              after skipping the trace references specified by skip_count.
              Note: references generated by the simulator rather than read
              from the trace (e.g. prefetch references) are not included in
              this count.  Defaults to 10 million (i.e. -z10000000).

       -Q flush_count
              sets the number of references between cache flushes.  Can be
              used to crudely simulate multiprogramming.  Defaults to no
              flushing (i.e. -Q0).
FILES
       doc.h contains additional programmer documentation.

SEE ALSO
       Mark D. Hill and Alan Jay Smith, Experimental Evaluation of On-Chip
       Microprocessor Cache Memories, Proc. Eleventh International
       Symposium on Computer Architecture, June 1984, Ann Arbor, MI.

       Alan Jay Smith, Cache Memories, Computing Surveys, 14-3, September
       1982.
BUGS
       Not all combinations of options have been thoroughly tested.

AUTHOR
       Mark D. Hill
       Computer Sciences Dept.
       1210 West Dayton St.
       Univ. of Wisconsin
       Madison, WI 53706

       markhill@cs.wisc.edu


4th Berkeley Distribution                                        DINEROIII()