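Student: Rick van der Zwet - 0433373

> Question 1: Estimate the space to store 10^10 float numbers, using the
> following cases, taking into account that storage costs should be minimized
> while preserving the full number accuracy.

I take "float" to mean a single-precision floating-point number (called float in the C language). It occupies 32 bits (4 bytes) and has a significand precision of 24 bits (about 7 decimal digits).
source: http://en.wikipedia.org/wiki/IEEE_754-1985

To store integer indices up to 10^10, a 64-bit integer is needed, because a 32-bit unsigned integer only reaches 2^32 - 1 = 4,294,967,295 (in C this is long long, or long on most 64-bit Unix systems).
source: http://en.wikipedia.org/wiki/Integer_%28computer_science%29

> a) We store the float numbers together with their integer index as
> characters in a flat CSV file. Each line will look like: <index>,<float>.

The maximum number of characters for <index> is 11 (10^10 has 11 digits); <float> takes up 7 digits plus 1 for the decimal point. Every line also needs 2 delimiter characters: the comma and the newline. So one line contains 11 + 8 + 2 = 21 characters, and the whole file 21 * 10^10 characters. Digits, commas and newlines are plain ASCII, 1 byte per character, so this is about 2.1 * 10^11 bytes (210 GB).

If you use the line number as the index, each line only needs 8 + 1 = 9 characters, so the total is 9 * 10^10 characters, roughly 9 * 10^10 bytes (90 GB).

Note that 7 significant digits is an approximation: writing an arbitrary float so that it reads back exactly can require up to 9 significant digits, plus a sign and an exponent for very large or very small values.

> b) We store the numbers together with their index in a MySQL database.
> Make your choice between FLOAT, DOUBLE types for the numbers and
> SIGNED/UNSIGNED TINYINT indices.

UNSIGNED is the better choice, as the index is never negative. The index is stored in the MySQL table because it is used to refer to the data. Note, however, that an UNSIGNED TINYINT (1 byte) only holds 0 to 255, and even an UNSIGNED INT (4 bytes, maximum 4,294,967,295) is too small for 10^10 rows, so the index needs a BIGINT UNSIGNED (8 bytes). A FLOAT takes up 4 bytes and a DOUBLE 8 bytes; since the numbers are single precision, FLOAT already preserves them fully. Roughly, the data then takes up (4 + 8) * 10^10 = 1.2 * 10^11 bytes, before MySQL's per-row and index overhead.

MySQL also accepts FLOAT(p): a precision p of 0 to 24 gives a 4-byte FLOAT, and 25 to 53 gives an 8-byte DOUBLE. p counts bits, so the most precision available is 53 bits (about 15 decimal digits), not 53 digits.
source: http://dev.mysql.com/doc/refman/5.0/en/numeric-types.html
source: http://dev.mysql.com/doc/refman/5.0/en/storage-requirements.html

> c) We store the numbers in a binary format of your own together with
> indices.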
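A minimal sketch of such a binary format, assuming a layout of raw little-endian IEEE 754 single-precision floats (the Python language choice and the names write_floats, read_float and storage_bytes are illustrative, not part of the assignment). The index is not stored; it is recovered from the byte offset.

```python
import io
import struct

# Hypothetical record layout: one little-endian IEEE 754 single-precision
# float ("<f") per number, 4 bytes each, with no index and no delimiter.
RECORD = struct.Struct("<f")


def write_floats(fh, values):
    """Write each value to the binary file object fh as a raw 4-byte float."""
    for v in values:
        fh.write(RECORD.pack(v))


def read_float(fh, index):
    """Return the float stored at position index (byte offset = 4 * index)."""
    fh.seek(index * RECORD.size)
    return RECORD.unpack(fh.read(RECORD.size))[0]


def storage_bytes(count):
    """Total size in bytes of count numbers stored in this layout."""
    return count * RECORD.size


if __name__ == "__main__":
    buf = io.BytesIO()  # in-memory stand-in for a file opened in "w+b" mode
    write_floats(buf, [0.5, -2.25, 3.0])
    print(read_float(buf, 1))
    print(storage_bytes(10**10))
```

Because every record has the same width, looking up value i is a single seek, and 10^10 numbers occupy exactly 4 * 10^10 bytes.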
Storing all numbers in the simplest possible format is the most efficient: raw 4-byte floats written one after another. Because every value has the same fixed width, no delimiter is needed and the index is implicit in the position (value i starts at byte offset 4 * i), so 10^10 numbers take 4 * 10^10 bytes (40 GB).

> Question 2: Given the following cases, make some rough estimations of the
> time necessary to compute 10^10 association tests.

Results in days and years are rounded (1 day = 86,400 s, 1 year = 365 days).

Case | Time per test (s) | Total (s) | Total (days) | Total (years)
-----|-------------------|-----------|--------------|--------------
a)   | 1                 | 10^10     | 115,741      | 317
b)   | 0.1               | 10^9      | 11,574       | 32
c)   | 0.01              | 10^8      | 1,157        | 3
d)   | 0.001             | 10^7      | 116          | 0.3

> Question 3: Create a user-space environment for the free scientific
> computation environment R.

The attached shell script creates the environment; its comments explain what each part does. The README.txt and INSTALL.txt files that came with the software packages were a great help.
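The day and year columns of the Question 2 table can be checked with a few lines of arithmetic. This Python sketch assumes the tests run one after another on a single machine and uses 365-day years; the helper total_time is hypothetical. Times are given in whole milliseconds so that the total in seconds stays an exact integer.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 s
DAYS_PER_YEAR = 365             # assumption: leap days ignored


def total_time(tests, ms_per_test):
    """Sequential run time for `tests` tests taking `ms_per_test` ms each.

    Returns (seconds, days, years).
    """
    seconds = tests * ms_per_test // 1000
    days = seconds / SECONDS_PER_DAY
    return seconds, days, days / DAYS_PER_YEAR


if __name__ == "__main__":
    cases = {"a": 1000, "b": 100, "c": 10, "d": 1}  # 1, 0.1, 0.01, 0.001 s per test
    for case, ms in cases.items():
        seconds, days, years = total_time(10**10, ms)
        print(f"{case}) {seconds:.0e} s  {days:,.0f} days  {years:.1f} years")
```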