source: liacs/dbdm/dbdm_1/0433373_rickvanderzwet.txt@ 235

Last change on this file since 235 was 2, checked in by Rick van der Zwet, 15 years ago

Initial import of data of old repository ('data') worth keeping (e.g. tracking
means of URL access statistics)

Papers read:
 - J. Gray, The Next Database Revolution. SIGMOD 2004, pp. 13-18, June 2004.
 - Hans-Peter Kriegel, et al. Future trends in data mining. Data Mining and
   Knowledge Discovery [1384-5810] 2007 vol:15 iss:1 pg:87

The discussion of current database limits [Gray, 2004] and of the current
limits of data-mining technology [Kriegel, 2007] does not focus solely on the
technological barriers of current datasets with regard to memory, latency and
storage, but also on how to process this data efficiently. [Gray, 2004] talks
about the use of interfaces to connect databases to the clients, e.g. providing
direct interfaces to the clients by means of SOAP calls. The use of distributed
databases, however, has not been mentioned. But first things first:

= Data-mining & Usability =

Data-mining nowadays focuses on subset solutions, with little attempt to
generalize the effort for mass use. The tools and methods provided and used are
merely the building blocks for algorithms that target such subset solutions,
working on well-known datasets or on a lot of sanitized and known meta-data.

With the ever increasing amount of data gathered and stored, generating
human-understandable results (if any results at all) becomes harder and harder.
The underlying technique for generating results often cannot be explained by
logical human reasoning. This makes the results hard to justify or even
explain, leaving potentially good algorithms and strategies unused.

The (near) future should show us whether we are capable of extracting results
that add value to our understanding of the process instead of merely showing
heuristics, allowing us to reason further about what is going on inside a
process.


= Memory based databases with a file based backend =

Reducing and eliminating the latency of access to database objects on specific
media has always been a major focus in the design of algorithms for database
query automation. Recent technological inventions and improvements have led to
developments that allow us to run any average small-size database fully in
memory. This reduces access to every object within the database to an equal
level, making latency decisions in algorithms obsolete and clearing the path
for a new type of algorithm design that focuses on spanning the whole dataset
as fast as possible.

Together with a full-memory database comes the task of designing the database
in such a way that it can be mirrored on persistent media, for obvious reasons
(power failure, transport, backup, revisions). Instead of taking the
traditional block-level disk access approach, new disks come with the ability
to do clever queuing and latency-reducing actions on file-based objects. The
future will show whether block-based access (memory database) with file-based
storage will be one of the possibilities, and how best to cope with large
database sets.
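
As an illustration of a memory database with a file-based mirror, here is a
minimal sketch using SQLite through Python's standard sqlite3 module. SQLite is
chosen only for convenience and is not mentioned in either paper; the table
name, the sample rows and the helper names (mirror_to_disk, load_from_disk,
demo) are assumptions made up for this example.

```python
import os
import sqlite3
import tempfile


def mirror_to_disk(mem_db: sqlite3.Connection, path: str) -> None:
    """Copy the whole in-memory database to a file on persistent media."""
    disk_db = sqlite3.connect(path)
    try:
        mem_db.backup(disk_db)  # copies every page of the source database
    finally:
        disk_db.close()


def load_from_disk(path: str) -> sqlite3.Connection:
    """Rebuild a full-memory database from its on-disk mirror."""
    disk_db = sqlite3.connect(path)
    mem_db = sqlite3.connect(":memory:")
    try:
        disk_db.backup(mem_db)
    finally:
        disk_db.close()
    return mem_db


def demo() -> int:
    """Fill a memory database, mirror it, reload it and count the rows."""
    mem_db = sqlite3.connect(":memory:")
    mem_db.execute("CREATE TABLE reading (sensor TEXT, value REAL)")
    mem_db.executemany(
        "INSERT INTO reading VALUES (?, ?)",
        [("a", 1.0), ("b", 2.5), ("a", 4.0)],  # hypothetical sample data
    )
    mem_db.commit()

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "mirror.db")
        mirror_to_disk(mem_db, path)
        mem_db.close()  # stands in for losing the memory copy (power failure)
        restored = load_from_disk(path)
        (count,) = restored.execute("SELECT COUNT(*) FROM reading").fetchone()
        restored.close()
    return count
```

All queries run against the memory copy, so every object is reached at the same
cost; the file only serves the "obvious reasons" listed above. A real system
would mirror incrementally rather than copying the whole database each time.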

= Distributed databases =

One area not covered by [Kriegel, 2007] and [Gray, 2004] is the development of
datasets of several petabytes (like the genome databases) that need to be
accessed by many concurrent clients throughout the world, so that link-layer
latencies come into the picture.

Finding ways of making these datasets available to all clients at an
acceptable/uniform access time will become of major importance in the future.
Datasets are growing rapidly due to the development of new sensors and
image/video based storage, and more of those datasets have a heavily shared
nature, as more research and business will be gathering and sharing data from
multiple (geographical) locations while still needing centralized query
interfaces.
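
One possible shape of such a centralized query interface is sketched below,
purely as an illustration. Each Site object is a hypothetical stand-in for a
database at one location; the site names, the sample rows and the function
central_average are invented for this example. The idea is that every site
answers the query locally and only a small partial result crosses the
high-latency link, after which the central interface merges the partials.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Site:
    """Hypothetical stand-in for a database at one geographical location."""
    name: str
    rows: list  # (sample_id, length) tuples stored locally at this site

    def count_and_sum(self, min_length: int) -> tuple:
        """Answer the query locally and return only a small partial result."""
        lengths = [length for _, length in self.rows if length >= min_length]
        return len(lengths), sum(lengths)


def central_average(sites: list, min_length: int):
    """Fan the query out to all sites in parallel and merge the partials."""
    with ThreadPoolExecutor(max_workers=max(1, len(sites))) as pool:
        partials = list(pool.map(lambda s: s.count_and_sum(min_length), sites))
    total_count = sum(count for count, _ in partials)
    total_sum = sum(subtotal for _, subtotal in partials)
    # None signals that no row at any site matched the query.
    return total_sum / total_count if total_count else None


# Hypothetical sample data spread over three locations.
SITES = [
    Site("leiden", [("s1", 100), ("s2", 300)]),
    Site("tokyo", [("s3", 200)]),
    Site("boston", [("s4", 50), ("s5", 400)]),
]
```

Because the sites are queried concurrently, the response time of the central
interface is bounded by the slowest site rather than by the sum of all link
latencies, which is one way to move towards an acceptable access time for all
clients.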