#/* Author      : Rick van der Zwet
# * S-number    : 0433373
# * Version     : $Id: README.txt 438 2007-12-28 02:53:47Z rick $
# * Copyright   : FreeBSD Licence
# * Description : Assignment documentation
# */

= Index =
* Preface
* Methodology
* Design decisions
* Directory structure
* Programs structure
* Definitions
* Configurations
** Memory
** Cache
* Assumptions
** Given
* BUS2 Output
* Usage / Running
* Conclusion
* Recommendations


= Preface =
This document presents the results of assignment 3 (internally also
called assignment 4) of Computer Architecture. The main purpose of the
assignment was to determine the number of bank conflicts and the memory
bandwidth in certain configurations. Cache simulation is done using
dinero. To calculate the results, several C programs had to be written.

= Methodology =
The only part of the dinero output that is needed is the BUS2 (-o2)
output; nothing else is required. As perl, awk, grep and others are not
allowed, a small custom C program called grep-bus2 was written. The shell
script calculate.sh determines the correct cache option variables and
calls compu with the matching memory configuration.
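
The filtering step can be sketched as follows. This is a minimal
illustration and not the actual grep-bus2 source: the function names
is_bus2_line and filter_bus2 are hypothetical, and it assumes that every
bus record starts with the literal text "BUS2 " at the beginning of a line
and that lines are shorter than the buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 when the line starts with the literal record tag "BUS2 ". */
int is_bus2_line(const char *line)
{
    assert(line != NULL);
    return strncmp(line, "BUS2 ", 5) == 0;
}

/* Copies every BUS2 record from in to out and returns how many were copied.
 * Assumption: no input line is longer than 255 characters. */
long filter_bus2(FILE *in, FILE *out)
{
    char line[256];
    long count = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        if (is_bus2_line(line)) {
            fputs(line, out);
            count++;
        }
    }
    return count;
}
```

Calling filter_bus2(stdin, stdout) from a main function would give a
filter that can be placed in a pipeline behind dinero.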

= Design decisions =
* As we are working on 'debugging' a standard program, I assume every call
  to the memory fetches 32 bits (4 bytes)
* 'No cache' is simulated in dinero using a 1-word cache, which might
  distort the results when the same memory cell is accessed multiple times
  (highly unlikely, however)
* The shell script allows the benchmark (trace) file to be set


= Directory structure =
Makefile              = GNU Make config file
compu.c               = calculate bandwidth and bank conflicts
common.[ch]           = Common functions
memory_std.[ch]       = Standard memory
memory_bank.[ch]      = Bank memory
memory_dram.[ch]      = DRAM memory
calculate.sh          = Shell script to generate results of combinations
develop.sh            = Very simple script to keep running while coding to
                        ensure continuous feedback
data                  = directory of the traces
data/lisp.002.din     = provided input
data/spic.002.din     = provided input
docs                  = Some additional documentation
docs/dineroIII.txt    = man page
src/dineroIII.tar.gz  = dineroIII source

= Programs structure =
compu.c contains the logic for choosing the correct memory module to use.
All memory implementations are defined in memory_<type>.[ch]. To avoid
duplicate code, an 'interface' common.c is defined, which contains the
common functions, mainly for output.


= Definitions =
The bandwidth of a cache/memory-system is the total number of bytes sent
between CPU and cache divided by the number of cycles. We assume here
that memory activity is the bottleneck for the entire system, in other
words: that the cache is continuously busy processing requests.
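
As a small worked illustration of this definition, the division can be
written as a C function. The name bandwidth is hypothetical and not taken
from compu.c; the result is expressed in bytes per cycle.

```c
#include <assert.h>

/* Bandwidth in bytes per cycle: total bytes transferred divided by the
 * total number of cycles. The cycle count must be non-zero. */
double bandwidth(unsigned long bytes, unsigned long cycles)
{
    assert(cycles > 0);
    return (double)bytes / (double)cycles;
}
```

For example, 1000 one-word requests (4000 bytes) handled in 8000 cycles
give bandwidth(4000, 8000), which is 0.5 bytes per cycle.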

A bank conflict is defined as follows:

For normal memory, a bank conflict is a request that is sent to a bank
that is still busy handling the previous request (either in the access
phase or in the bus transfer phase). The first memory access incurs no
conflict.
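
A minimal sketch of how such conflicts could be counted for
word-interleaved memory is shown here. It is a simplified model and not
the code in memory_bank.c: the names simulate_banks, bank_result and
MAX_BANKS are hypothetical, the bank is assumed to be selected by the word
address modulo the number of banks, one request is issued per cycle, and a
request that finds its bank busy (for any earlier request) waits until
that bank is free.

```c
#include <assert.h>
#include <stddef.h>

#define WORD_BYTES 4   /* one word is 32 bits, as assumed in this document */
#define MAX_BANKS  64  /* arbitrary upper limit for this sketch */

struct bank_result {
    unsigned long conflicts;  /* requests that found their bank busy */
    unsigned long cycles;     /* cycle in which the last bank becomes idle */
};

/* Replays n byte addresses against `banks` interleaved banks that each
 * stay busy for `access_time` cycles per request. */
struct bank_result simulate_banks(const unsigned long *addr, size_t n,
                                  unsigned banks, unsigned access_time)
{
    unsigned long free_at[MAX_BANKS] = {0};  /* cycle each bank is free */
    unsigned long now = 0;                   /* current issue cycle */
    struct bank_result res = {0, 0};
    size_t i;
    unsigned b;

    assert(banks > 0 && banks <= MAX_BANKS);

    for (i = 0; i < n; i++) {
        b = (unsigned)((addr[i] / WORD_BYTES) % banks);
        if (free_at[b] > now) {      /* bank still busy: conflict */
            res.conflicts++;
            now = free_at[b];        /* requests stay in order, so wait */
        }
        free_at[b] = now + access_time;
        now++;                       /* next request one cycle later */
    }
    for (b = 0; b < banks; b++) {
        if (free_at[b] > res.cycles)
            res.cycles = free_at[b];
    }
    return res;
}
```

With banks set to 1 the same function models the standard memory, in
which every request after the first one waits for the previous request.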

In page-mode DRAM we do not use multiple banks, so the above definition
is not so useful. In that case, interpret the following as a bank
conflict: a request for a page (= column) when the previous request was
not for the same page. The first access here also does not incur a
conflict.
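
The page-mode definition can be sketched in the same way. This is again a
simplified model and not the code in memory_dram.c: the names
simulate_dram and dram_result are hypothetical, and the sketch only
charges the random access time for the first access and for every page
change, and the 'next access time' otherwise. Other timing details are
left out.

```c
#include <assert.h>
#include <stddef.h>

#define WORD_BYTES 4   /* one word is 32 bits, as assumed in this document */

struct dram_result {
    unsigned long conflicts;  /* requests for a page other than the last one */
    unsigned long cycles;     /* summed access times */
};

/* Replays n byte addresses against page-mode DRAM with pages of
 * `page_words` words. */
struct dram_result simulate_dram(const unsigned long *addr, size_t n,
                                 unsigned long page_words,
                                 unsigned random_time, unsigned next_time)
{
    struct dram_result res = {0, 0};
    unsigned long open_page = 0;  /* page of the previous request */
    unsigned long page;
    size_t i;

    assert(page_words > 0);

    for (i = 0; i < n; i++) {
        page = (addr[i] / WORD_BYTES) / page_words;
        if (i == 0) {
            res.cycles += random_time;   /* first access: no conflict */
        } else if (page != open_page) {
            res.conflicts++;             /* page change */
            res.cycles += random_time;
        } else {
            res.cycles += next_time;     /* same page as before */
        }
        open_page = page;
    }
    return res;
}
```

For a page size of 64 words, the byte addresses 0, 4, 256, 260 and 0 fall
in pages 0, 0, 1, 1 and 0, which gives two conflicts in this model.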

= Configurations =

== Memory ==
1)  standard memory with random access time of 8 clock cycles
2)  4-bank word-interleaved memory with random access time of 8 clock
    cycles
3)  8-bank word-interleaved memory with random access time of 8 clock
    cycles
4)  page-mode DRAM with a page-size of 64 words, a random access time of
    8 clock cycles and a 'next access time' of 3 clock cycles.
5)  page-mode DRAM with a page-size of 1024 words, a random access time
    of 8 clock cycles and a 'next access time' of 2 clock cycles.


== Cache ==
a)  no cache, a write buffer of 1 word deep
b)  a 64 KB, unified, direct-mapped, write-through, no write-allocate
    cache with 4 word blocks and a 1 word write buffer
c)  a 64 KB, unified, direct-mapped, write-back, write-allocate cache
    with 4 word blocks and a 1 word write buffer

= Assumptions =
== Given ==
*   All addresses in this assignment are word-aligned and all
    data-accesses to the cache are 1 word.
*   Per clock cycle, 1 request for 1 word can be handled, but requests
    should remain in order.
*   The time for submitting the requests does not have to be taken into
    consideration
*   If a memory access is requested to the same bank or row of a busy
    memory part, it is assumed to be to a different address
*   When accessing memory, the same 'checking' cycle will be used as
    initial memory call cycle
*   DRAM memory also has a RAS time of 1; all memory types have a bus,
    so there is no need to distinguish between them
*   Calculating the byte transfer between memory and cache is trivial:
    number of lines * bytes per line
*   Both reads and writes are treated the same; no optimizations are
    made, to ensure simplicity
*   Code is not built to be optimized, but to be clear instead, which
    will result in 'dumb' loops
*   With DRAM the worst-case scenario is used, meaning a call to a locked
    memory address or a different page will block for the RAS time


= BUS2 Output =
BUS2 <type> <size> <address> <reference_count> <instruction_count>
*   BUS2 are four literal characters to start a bus record
*   type is the access type (r for a bus-read, w for a bus-write, p for a
    bus-prefetch, s for snoop activity (output style 3 only))
*   size is the transfer size in bytes
*   address is a hexadecimal byte-address between 0 and ffffffff
    inclusively
*   reference_count is the number of demand references since the
    last bus transfer (i.e. cache misses)
*   instruction_count is the number of demand instruction fetches
    since the last bus transfer
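
A record in this layout can be read with a single sscanf call, as in the
sketch below. The struct and function names (bus2_record, parse_bus2) are
hypothetical and not taken from the assignment code, and the sketch
assumes the fields are separated by whitespace in the order listed above.

```c
#include <assert.h>
#include <stdio.h>

struct bus2_record {
    char type;                        /* r, w, p or s */
    unsigned long size;               /* transfer size in bytes */
    unsigned long address;            /* byte address (hexadecimal in input) */
    unsigned long reference_count;    /* demand references since last transfer */
    unsigned long instruction_count;  /* instruction fetches since last transfer */
};

/* Parses one BUS2 line into rec. Returns 1 on success and 0 when the
 * line does not match the expected layout. */
int parse_bus2(const char *line, struct bus2_record *rec)
{
    assert(line != NULL && rec != NULL);
    return sscanf(line, "BUS2 %c %lu %lx %lu %lu",
                  &rec->type, &rec->size, &rec->address,
                  &rec->reference_count, &rec->instruction_count) == 5;
}
```

The %lx conversion reads the address as hexadecimal, with or without a
leading 0x.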

= Usage / Running =
# Alter calculate.sh to specify the right dinero binary path
# Build binaries
$ make
# Call calculate.sh with the path to the data file as argument
$ sh calculate.sh <path-to-datafile>
# Results are written to stderr and to the files res-xx.txt
# A rerun will simply overwrite all the old data


= Conclusion =
  The results really do depend on which type of cache and memory is used.
  A lot of assumptions and many generalisations were made. In order to
  have the systems work better with each other, the system needs good
  knowledge of the underlying implementation in hardware. It seems that
  the lisp and spic traces have been optimized for the use of the class c
  cache and word-interleaved memory. It is also pretty clear that without
  any cache the system will not perform at all and will just suffer from
  the really slow memory.


= Recommendations =
*   As the input is text and only simple calculations are being used, allowing
    an interpreted language like perl, python or similar would be pretty handy
*   /usr/local/edu/data does not exist; this should be ~csca/edu/data
*   The 'large files' are tar gzipped 100k, 300k ;-)
*   Please specify which version of dinero to use and where the program needs
    to run
*   dineroIII does not exist online anymore and has been replaced by dineroIV;
    compiling dineroIII also fails on more modern systems. Quick and dirty
    fix patch: http://rickvanderzwet.nl/svn/data/liacs/ca/opdr3/src/
*   The submission instructions do not include an email address to submit to
*   Translate the 'Het gaat om' part of the assignment