Student: Rick van der Zwet - 0433373

> Question 1: Estimate the space needed to store 10^10 float numbers in
> the following cases. Aim to minimize the storage cost while preserving
> the full accuracy of the numbers.

I take a float to be single precision (called float in the C language).
It occupies 32 bits (4 bytes) and has a significand precision of 24
bits (about 7 decimal digits).
source: http://en.wikipedia.org/wiki/IEEE_754-1985
To store integers up to 10^10 a 64-bit integer (also called long) is
needed, because 10^10 exceeds the 32-bit unsigned maximum of 2^32 - 1.
source: http://en.wikipedia.org/wiki/Integer_%28computer_science%29

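The two size assumptions above can be checked with a short sketch. The
names FLOAT_BYTES, MAX_INDEX and bits_needed are illustrative only and do
not come from the assignment.

```python
import struct

# Size in bytes of an IEEE 754 single-precision value ("float" in C).
FLOAT_BYTES = struct.calcsize("<f")

# Largest index we need to store, and the unsigned 32/64-bit maxima.
MAX_INDEX = 10**10
UINT32_MAX = 2**32 - 1
UINT64_MAX = 2**64 - 1


def bits_needed(n: int) -> int:
    """Number of bits required to represent the non-negative integer n."""
    return n.bit_length()


print("bytes per float:", FLOAT_BYTES)
print("bits needed for 10^10:", bits_needed(MAX_INDEX))
print("fits in 32 bits:", MAX_INDEX <= UINT32_MAX)
print("fits in 64 bits:", MAX_INDEX <= UINT64_MAX)
```

Since 10^10 needs more than 32 bits, the next common integer width, 64
bits, is the natural choice.
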
> a) We store the float numbers together with their integer index as
> characters in a flat CSV file.

Each line will look like: <index>,<float><newline>. The <index> takes
at most 11 characters, and the <float> takes 7 digits plus 1 for the
decimal point, so 8. The comma and the newline add 2 more characters
per line. One line therefore contains at most 11 + 8 + 2 = 21
characters, giving 21 * 10^10 characters in total. If you use the line
number as the index, you only need (8 + 1) * 10^10 = 9 * 10^10
characters. At 1 byte per character (plain ASCII) that is 9 * 10^10
bytes; assuming 4 bytes per character it is 36 * 10^10 bytes.

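The CSV arithmetic can be written out as a small sketch. The character
counts are the ones used in the answer above; the function name and the
bytes_per_char parameter are assumptions made for illustration.

```python
N = 10**10           # number of values to store
INDEX_CHARS = 11     # digits in the largest index, 10^10
FLOAT_CHARS = 7 + 1  # 7 digits plus the decimal point
COMMA = 1
NEWLINE = 1


def csv_bytes(with_index: bool, bytes_per_char: int = 1) -> int:
    """Upper bound on the file size; bytes_per_char=1 assumes plain ASCII."""
    if with_index:
        per_line = INDEX_CHARS + COMMA + FLOAT_CHARS + NEWLINE
    else:
        # The line number acts as the index, so only the float and newline remain.
        per_line = FLOAT_CHARS + NEWLINE
    return per_line * N * bytes_per_char


print(csv_bytes(with_index=True))                     # explicit index column
print(csv_bytes(with_index=False))                    # line number as index
print(csv_bytes(with_index=False, bytes_per_char=4))  # 4 bytes per character
```
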
> b) We store the numbers together with their index in a MySQL database.
> Make your choice between FLOAT, DOUBLE types for the numbers and
> SIGNED/UNSIGNED TINYINT indices.

UNSIGNED is the best choice, as we are not dealing with negative
numbers. The index is set by default in a MySQL table, as it is used to
refer to the data. A FLOAT takes up 4 bytes, a DOUBLE 8 bytes. So in
total it takes up roughly (8 + 1) * 10^10 bytes: 8 for a DOUBLE plus 1
for a TINYINT index. You could enlarge your FLOAT to gain more
precision, or make it general enough that you only start to lose
precision after 53 bits.
source: http://dev.mysql.com/doc/refman/5.0/en/numeric-types.html
source: http://dev.mysql.com/doc/refman/5.0/en/storage-requirements.html

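A rough sketch of the column arithmetic follows. The byte sizes are the
ones listed in the MySQL storage requirements page cited above. The
function names are hypothetical, and row and index overhead are ignored,
which is an assumption. Note that an UNSIGNED TINYINT only reaches 255,
so the helper also reports which integer type could actually hold an
index as large as 10^10.

```python
# Column sizes in bytes, as listed in the MySQL storage requirements.
INT_BYTES = {"TINYINT": 1, "SMALLINT": 2, "MEDIUMINT": 3, "INT": 4, "BIGINT": 8}
REAL_BYTES = {"FLOAT": 4, "DOUBLE": 8}

ROWS = 10**10


def payload_bytes(int_type: str, real_type: str, rows: int = ROWS) -> int:
    """Column payload only; per-row and index overhead are ignored (assumption)."""
    return (INT_BYTES[int_type] + REAL_BYTES[real_type]) * rows


def smallest_unsigned_type(max_value: int) -> str:
    """Narrowest UNSIGNED integer type whose range includes max_value."""
    for name, size in sorted(INT_BYTES.items(), key=lambda kv: kv[1]):
        if max_value <= 2 ** (8 * size) - 1:
            return name
    raise ValueError("value does not fit in a BIGINT")


print(payload_bytes("TINYINT", "DOUBLE"))  # the (8 + 1) * 10^10 estimate
print(payload_bytes("TINYINT", "FLOAT"))   # same index, 4-byte FLOAT
print(smallest_unsigned_type(ROWS))        # type needed for an index up to 10^10
```
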
> c) We store the numbers in a binary format of your own together with
> indices.

Storing all numbers in the simplest format possible, the float numbers
one after another with a delimiter in between, would be the most
efficient.

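As a minimal sketch of such a format, the snippet below packs 32-bit
floats back to back with Python's struct module. It is a variation on the
description above rather than the exact scheme: because every record has
a fixed width of 4 bytes, the sketch leaves out the delimiter and uses the
record position as the index. The names RECORD, pack_floats and
read_float are illustrative.

```python
import struct

# One little-endian IEEE 754 single-precision float per record.
RECORD = struct.Struct("<f")


def pack_floats(values):
    """Concatenate the values as fixed-width 4-byte records."""
    return b"".join(RECORD.pack(v) for v in values)


def read_float(blob, index):
    """Read the record at position index; the position is the implicit index."""
    return RECORD.unpack_from(blob, index * RECORD.size)[0]


blob = pack_floats([1.5, -2.25, 0.125])
print(len(blob))              # bytes used for three values
print(read_float(blob, 1))    # second value, looked up by position
print(RECORD.size * 10**10)   # bytes for 10^10 values in this layout
```
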
> Question 2: Given the following cases, make rough estimates of the
> time necessary to compute 10^10 association tests.

Results in days and years are rounded.
Case | Time per test (s) | Total (s) | Total (days) | Total (years)
-----|-------------------|-----------|--------------|--------------
a)   | 1                 | 10^10     | 115741       | 317
b)   | 0.1               | 10^9      | 11574        | 32
c)   | 0.01              | 10^8      | 1157         | 3
d)   | 0.001             | 10^7      | 116          | 0

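The table can be reproduced with a few lines of arithmetic. The sketch
expresses each case as a number of tests per second (1, 10, 100, 1000) to
keep the totals exact, and it assumes a 365-day year. The function name
estimate is illustrative.

```python
TESTS = 10**10
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_YEAR = 365  # assumption: leap days are ignored


def estimate(tests_per_second: int):
    """Return (total seconds, rounded days, rounded years) for all tests."""
    total_seconds = TESTS // tests_per_second
    days = total_seconds / SECONDS_PER_DAY
    years = days / DAYS_PER_YEAR
    return total_seconds, round(days), round(years)


for case, rate in zip("abcd", (1, 10, 100, 1000)):
    seconds, days, years = estimate(rate)
    print(f"{case}) {seconds} s, {days} days, {years} years")
```
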
> Question 3: Create a user-space environment for the free scientific
> computation environment R.

The attached shell script will create the environment. The comments in
it show what is going on at each step. I had a great deal of help from
the README.txt and INSTALL.txt files that came with the software
packages.