Last change on this file since 405 was 2, checked in by Rick van der Zwet, 15 years ago

Initial import of data of old repository ('data') worth keeping (e.g. tracking
means of URL access statistics)

Student: Rick van der Zwet - 0433373

> Question 1: Estimate the space to store 10^10 float numbers, using the
> following cases, taking into account the need to minimize the storage
> costs, while preserving the full number accuracy.

I take a float to be single precision (called float in the C language).
It occupies 32 bits (4 bytes) and has a significand precision of 24 bits
(about 7 decimal digits).
source: http://en.wikipedia.org/wiki/IEEE_754-1985
To store integers up to 10^10, a 64-bit integer (also called long) is
needed, because 10^10 exceeds the unsigned 32-bit maximum of 2^32 - 1
(about 4.3 * 10^9).
source: http://en.wikipedia.org/wiki/Integer_%28computer_science%29

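The two sizes above can be checked with a few lines of Python. This is
only a sketch: the helper name index_bytes is made up for illustration,
and it assumes indices run from 0 to count - 1.

```python
import struct

N = 10**10  # number of values, taken from the question

# Size in bytes of an IEEE 754 single-precision value in standard layout.
FLOAT_BYTES = struct.calcsize("<f")

# Largest value that fits in an unsigned 32-bit integer.
UINT32_MAX = 2**32 - 1


def index_bytes(count):
    """Return 4 if indices 0 .. count - 1 fit in 32 bits, otherwise 8."""
    return 4 if count - 1 <= UINT32_MAX else 8


print("bytes per float:", FLOAT_BYTES)
print("bytes per index:", index_bytes(N))
```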
> a) We store the float numbers together with their integer index as
> characters in a flat CSV file.

Each line will look like: <index>,<float><newline>. The <index> takes at
most 11 characters. The <float> takes 7 digits plus 1 extra for the
decimal separator, so 8. Every line needs 2 more characters for the
delimiters (the comma and the newline). So 1 line will contain
11 + 8 + 2 = 21 characters, in total 21 * 10^10. If you choose to use
the line number as index number you will only need (8 + 1) * 10^10
characters. At 1 byte per character (ASCII) that is 9 * 10^10 bytes; at
4 bytes per character (for example UTF-32) it is roughly 36 * 10^10
bytes.

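The CSV estimate can be reproduced as follows. This is a rough sketch:
the character counts are the ones assumed in the answer, the function
name csv_chars_per_line is hypothetical, and the bytes-per-character
values (1 for ASCII, 4 for a fixed 4-byte encoding) are assumptions
about the file encoding.

```python
N = 10**10  # number of lines, one value per line


def csv_chars_per_line(index_chars, float_chars, delimiters):
    """Characters on one CSV line: index + float + delimiters."""
    return index_chars + float_chars + delimiters


# <index>,<float><newline>: 11 + 8 + 2 characters.
with_index = csv_chars_per_line(11, 8, 2)
# <float><newline>: the line number acts as the index.
without_index = csv_chars_per_line(0, 8, 1)

for label, chars in (("explicit index", with_index),
                     ("line number as index", without_index)):
    for bytes_per_char in (1, 4):  # assumed encodings
        total = chars * N * bytes_per_char
        print(f"{label}, {bytes_per_char} byte(s) per char: {total:.2e} bytes")
```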
> b) We store the numbers together with their index in a MySQL database.
> Make your choice between FLOAT, DOUBLE types for the numbers and
> SIGNED/UNSIGNED TINYINT indices.

UNSIGNED is the best choice, as we are not dealing with negative
numbers. The index is set by default in a MySQL table, as it is used to
refer to the data. A FLOAT takes up 4 bytes, a DOUBLE 8 bytes and a
TINYINT 1 byte. So with a DOUBLE it takes up roughly (8 + 1) * 10^10
bytes in total. You could enlarge your FLOAT to a DOUBLE to gain more
precision, so that you only start to lose precision beyond 53 bits of
significand.
source: http://dev.mysql.com/doc/refman/5.0/en/numeric-types.html
source: http://dev.mysql.com/doc/refman/5.0/en/storage-requirements.html

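A small sketch of the per-row arithmetic, using the column sizes from
the MySQL storage requirements page. It counts raw column data only and
ignores row and index overhead. The BIGINT line is an addition for
comparison and not part of the answer: an UNSIGNED TINYINT only holds 0
to 255, so a wider type would be needed to number 10^10 rows uniquely.
The function name table_bytes is hypothetical.

```python
N = 10**10  # number of rows

# Storage per value in bytes, per the MySQL storage requirements page.
STORAGE_BYTES = {"TINYINT": 1, "BIGINT": 8, "FLOAT": 4, "DOUBLE": 8}


def table_bytes(value_type, index_type, rows=N):
    """Raw column data only; row and index overhead are ignored."""
    return (STORAGE_BYTES[value_type] + STORAGE_BYTES[index_type]) * rows


for value_type in ("FLOAT", "DOUBLE"):
    for index_type in ("TINYINT", "BIGINT"):
        total = table_bytes(value_type, index_type)
        print(f"{value_type} + {index_type}: {total:.1e} bytes")
```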
> c) We store the numbers in a binary format of your own together with
> indices.

Storing all numbers in the simplest possible format, with the float
numbers placed one after another and a delimiter in between, would be
the most efficient.

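One possible variant of such a format, sketched in Python. It assumes
fixed 4-byte little-endian records, which is not stated in the answer.
With fixed-width records no delimiter is needed and the index is implied
by the byte offset, so the file would take about 4 * 10^10 bytes. The
function names write_floats and read_float are made up for this example.

```python
import struct

# One record: a 4-byte little-endian IEEE 754 single-precision float.
RECORD = struct.Struct("<f")


def write_floats(path, values):
    """Write the values back to back, one fixed-size record each."""
    with open(path, "wb") as f:
        for value in values:
            f.write(RECORD.pack(value))


def read_float(path, index):
    """Read the value at position index by seeking to index * record size."""
    with open(path, "rb") as f:
        f.seek(index * RECORD.size)
        return RECORD.unpack(f.read(RECORD.size))[0]


# Estimated file size for 10^10 values in this layout.
print(RECORD.size * 10**10, "bytes")
```

For example, after write_floats("numbers.bin", [1.5, 2.25]) the call
read_float("numbers.bin", 1) returns the second value without reading
the first.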
> Question 2: Given the following cases, make some rough estimations of
> the time necessary to compute 10^10 association tests.

Results in days and years are rounded.
Case | Seconds per test | Total (sec) | Total (days) | Total (years)
-----|------------------|-------------|--------------|--------------
a)   | 1                | 10^10       | 115741       | 317
b)   | 0.1              | 10^9        | 11574        | 32
c)   | 0.01             | 10^8        | 1157         | 3
d)   | 0.001            | 10^7        | 116          | 0

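The conversion from seconds to days and years can be sketched like this.
It assumes 86400 seconds per day and 365 days per year; the function
name total_time is hypothetical.

```python
N_TESTS = 10**10
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_YEAR = 365  # assumption: leap years are ignored


def total_time(seconds_per_test, n_tests=N_TESTS):
    """Return the total run time as (seconds, days, years)."""
    seconds = seconds_per_test * n_tests
    days = seconds / SECONDS_PER_DAY
    years = days / DAYS_PER_YEAR
    return seconds, days, years


for case, per_test in (("a)", 1), ("b)", 0.1), ("c)", 0.01), ("d)", 0.001)):
    seconds, days, years = total_time(per_test)
    print(f"{case} {seconds:.0e} s, {round(days)} days, {round(years)} years")
```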
> Question 3: Create a user-space environment for the free scientific
> computation environment R.

The attached shell script will create the environment. The comments in
it show what is going on at each step. I had great help from the
README.txt and INSTALL.txt which came with the software packages.