Student: Rick van der Zwet - 0433373

> Question 1: Estimate the space to store 10^10 float numbers, using the
> following cases, taking care to minimize the storage costs, while
> preserving the full number accuracy.

I take a float to be single precision (called float in the C language).
This occupies 32 bits (4 bytes) and has a significand precision of
24 bits (about 7 decimal digits).
source: http://en.wikipedia.org/wiki/IEEE_754-1985
To store integer indices up to 10^10, a 32-bit integer is too small
(2^32 is about 4.3 * 10^9), so a 64-bit integer (often called long) is
needed.
source: http://en.wikipedia.org/wiki/Integer_%28computer_science%29

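The sizes above can be checked with a small sketch. It assumes indices
run from 0 to 10^10 - 1 and that only the usual integer widths (8, 16,
32, 64 bits) are available; the variable names are only illustrative.

```python
import struct

N = 10**10  # number of values to store (from the question)

# Size in bytes of one IEEE 754 single-precision value.
float_bytes = struct.calcsize("<f")

# Bits needed to represent the largest index, N - 1 (assumed 0-based).
index_bits = (N - 1).bit_length()

# Smallest common integer width (assumed set) that can hold such an index.
index_width = next(w for w in (8, 16, 32, 64) if w >= index_bits)

print(float_bytes, index_bits, index_width)
```

The index itself needs a little more than 32 bits, which is why the
next common width, 64 bits, is used.
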
> a) We store the float numbers together with their integer index as
> characters in a flat CSV file.

Each line will look like: <index>,<float><newline>. The maximum number
of characters in <index> is 11, and <float> takes up 7 digits plus 1
extra for the decimal point, so 8. Each line has 2 extra characters for
the delimiters (comma and newline). So one line contains at most
11 + 8 + 2 = 21 characters, which gives 21 * 10^10 characters in total.
If you use the line number as the index, you only need
(8 + 1) * 10^10 = 9 * 10^10 characters. At 1 byte per character (ASCII)
that is roughly 9 * 10^10 bytes; at 4 bytes per character (a fixed-width
32-bit encoding) it is 36 * 10^10 bytes.

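The CSV estimate can be reproduced with a few lines of arithmetic. This
is only a sketch: the character counts are the ones used in the text,
and the default of 1 byte per character (plain ASCII) is an assumption;
pass bytes_per_char=4 to get the 4-bytes-per-character figure instead.

```python
N = 10**10          # number of lines in the file

INDEX_CHARS = 11    # digits in the largest index, 10^10
FLOAT_CHARS = 8     # 7 digits plus 1 extra character, as in the text
DELIMS = 2          # comma and newline

def csv_bytes(chars_per_line, n=N, bytes_per_char=1):
    """Total file size in bytes for n lines of a fixed maximum length."""
    return chars_per_line * n * bytes_per_char

# With an explicit index column: <index>,<float><newline>
with_index = csv_bytes(INDEX_CHARS + FLOAT_CHARS + DELIMS)

# Line number used as index: <float><newline>
without_index = csv_bytes(FLOAT_CHARS + 1)

print(with_index, without_index)
```
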
> b) We store the numbers together with their index in a MySQL database.
> Make your choice between FLOAT, DOUBLE types for the numbers and
> SIGNED/UNSIGNED TINYINT indices.

UNSIGNED is the best choice, as we are not dealing with negative
numbers. The index is part of a MySQL table by default, as it is used to
refer to the data. A FLOAT takes up 4 bytes, a DOUBLE 8 bytes. So in
total it takes up roughly (8 + 1) * 10^10 bytes. You could enlarge your
FLOAT to gain more precision, or declare it generically as FLOAT(p), in
which case you only start to lose precision beyond 53 bits.
source: http://dev.mysql.com/doc/refman/5.0/en/numeric-types.html
source: http://dev.mysql.com/doc/refman/5.0/en/storage-requirements.html

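To compare the column choices, the raw per-row sizes can be multiplied
out as below. This is a rough sketch: the byte sizes are the documented
MySQL column sizes, but row overhead and the storage of the index
structure itself are ignored, and the helper names are made up for
illustration.

```python
# Documented storage size in bytes of the relevant MySQL column types.
STORAGE_BYTES = {"TINYINT": 1, "INT": 4, "BIGINT": 8, "FLOAT": 4, "DOUBLE": 8}

N = 10**10  # number of rows

def table_bytes(index_type, value_type, n=N):
    """Raw column data only; ignores row overhead and index structures."""
    return (STORAGE_BYTES[index_type] + STORAGE_BYTES[value_type]) * n

def max_unsigned(index_type):
    """Largest value an UNSIGNED column of this type can hold."""
    return 2 ** (8 * STORAGE_BYTES[index_type]) - 1

for idx in ("TINYINT", "INT", "BIGINT"):
    for val in ("FLOAT", "DOUBLE"):
        fits = max_unsigned(idx) >= N - 1
        print(idx, val, table_bytes(idx, val), "index fits" if fits else "index too small")
```

The loop also shows which index types can actually number 10^10 rows,
which matters when weighing the smaller types against BIGINT.
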
> c) We store the numbers in a binary format of your own together with
> indices.

Storing all numbers in the simplest possible format, the float numbers
one after another with a delimiter in between, would be the most
efficient.

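One possible layout, sketched here as an assumption rather than as the
format meant above: write the values as consecutive 4-byte
little-endian floats. Because every value has the same width, the index
is implied by the byte offset, and no delimiter is strictly needed. The
function names are hypothetical.

```python
import struct

FLOAT_SIZE = struct.calcsize("<f")  # 4 bytes per single-precision value

def pack_floats(values):
    """Pack values as consecutive little-endian 32-bit floats."""
    return struct.pack(f"<{len(values)}f", *values)

def read_float(buf, index):
    """Read one value; its position in the buffer serves as the index."""
    return struct.unpack_from("<f", buf, index * FLOAT_SIZE)[0]

data = pack_floats([0.5, 1.25, -3.0])
print(len(data), read_float(data, 1))
```

With this layout, 10^10 values take 4 * 10^10 bytes.
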
> Question 2: Given the following cases, make some rough estimations of
> the time necessary to compute 10^10 association tests.

Results in days and years are rounded.
Case | Time per test (s) | Total (s) | Total (days) | Total (years)
-----|-------------------|-----------|--------------|--------------
a)   | 1                 | 10^10     | 115,741      | 317
b)   | 0.1               | 10^9      | 11,574       | 32
c)   | 0.01              | 10^8      | 1,157        | 3
d)   | 0.001             | 10^7      | 116          | 0.3

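The table can be recomputed with the sketch below. It assumes a year of
365 days and that the tests run strictly one after another; the function
name is only illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_YEAR = 365  # assumption: leap years are ignored

def total_time(seconds_per_test, n_tests=10**10):
    """Return total running time as (seconds, days, years)."""
    total_s = seconds_per_test * n_tests
    days = total_s / SECONDS_PER_DAY
    years = days / DAYS_PER_YEAR
    return total_s, days, years

for case, t in zip("abcd", (1, 0.1, 0.01, 0.001)):
    total_s, days, years = total_time(t)
    print(f"{case}) {t:>6} s/test  {total_s:.0e} s  {days:>9,.0f} days  {years:>6.1f} years")
```
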
> Question 3: Create a user-space environment for the free scientific
> computation environment R.

The attached shell script will create the environment. The comments in
it show what is going on at each step. The README.txt and INSTALL.txt
that came with the software packages were a great help.