0433373_rickvanderzwet.txt@ 422

Last change on this file since 422 was 2, checked in by Rick van der Zwet, 15 years ago
Initial import of data of old repository ('data') worth keeping (e.g. tracking means of URL access statistics)

Papers read:
- J. Gray. The Next Database Revolution. SIGMOD 2004, pp. 13-18, June 2004.
- H.-P. Kriegel et al. Future trends in data mining. Data Mining and
  Knowledge Discovery (ISSN 1384-5810), vol. 15, no. 1, p. 87, 2007.

The current limits of database technology [Gray, 2004] and of data-mining
technology [Kriegel, 2007] are not solely technological barriers of current
datasets with regard to memory, latency and storage. They also concern how to
process this data efficiently. [Gray, 2004] discusses interfaces that connect
databases to their clients, e.g. direct interfaces offered through SOAP calls.
The use of distributed databases, however, has gone unmentioned. But first
things first:

= Data-mining & Usability =

Data mining nowadays focuses on subset solutions, with little attempt to
generalize the effort for mass use. The tools and methods provided and used are
merely building blocks for algorithms that target such subset solutions, which
depend on a well-known dataset or on a lot of sanitized and known meta-data.

With the ever-increasing amount of data gathered and stored, generating
human-understandable results (if any result at all) becomes harder and harder.
The underlying technique for generating results often cannot be explained by
logical human reasoning. This makes the results hard to justify or even
explain, leaving potentially good algorithms and strategies unused.

The (near) future should show whether we are capable of extracting results
that add value to understanding the process instead of merely showing
heuristics, allowing us to reason further about what is going on inside a
process.

= Memory based databases with a file based backend =

Reducing and eliminating the latency of access to database objects on
specific media has always been a major focus in the design of algorithms for
database query automation. Recent technological inventions and improvements
have led to developments that allow us to run any average small-sized database
fully in memory. This reduces access to every object within the database to an
equal level, making the latency decisions in algorithms obsolete and clearing
the path for a new type of algorithm design that focuses on spanning the whole
dataset as fast as possible.

Together with a full-memory database comes the task of designing the database
in such a way that it can be mirrored on persistent media, for obvious reasons
(power failure, transport, backup, revisions). Instead of taking the
traditional block-level disk access approach, new disks come with the ability
to do clever queuing and latency-reducing actions on file-based objects. The
future will show whether block-based access (memory database) with file-based
storage will be one of the possibilities, and how best to cope with large
database sets.

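The idea of a memory-resident database that is mirrored to persistent media
can be sketched with SQLite from the Python standard library. This is only a
minimal illustration and is not taken from [Gray, 2004] or [Kriegel, 2007]:
the table name `readings`, the helper function names and the snapshot file
name are assumptions made for the example.

```python
import os
import sqlite3
import tempfile


def build_memory_db():
    """Create a small example database that lives entirely in memory."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table, chosen only for this illustration.
    conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, value REAL)")
    conn.executemany(
        "INSERT INTO readings (value) VALUES (?)",
        [(float(i),) for i in range(1000)],
    )
    conn.commit()
    return conn


def mirror_to_file(mem_conn, path):
    """Copy the in-memory database to a file on persistent media."""
    disk_conn = sqlite3.connect(path)
    try:
        mem_conn.backup(disk_conn)  # sqlite3.Connection.backup, Python 3.7+
    finally:
        disk_conn.close()


def count_rows(path):
    """Reopen the snapshot from disk and count the rows it contains."""
    conn = sqlite3.connect(path)
    try:
        return conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
    finally:
        conn.close()


def demo():
    """Build, mirror and re-read the database.

    Returns (sum computed in memory, number of rows found on disk).
    """
    mem = build_memory_db()
    # A full scan over the whole dataset, served from memory only.
    total = mem.execute("SELECT SUM(value) FROM readings").fetchone()[0]
    with tempfile.TemporaryDirectory() as tmp:
        snapshot = os.path.join(tmp, "snapshot.db")
        mirror_to_file(mem, snapshot)
        rows = count_rows(snapshot)
    mem.close()
    return total, rows
```

Queries run against the in-memory copy, while the file snapshot only serves
the persistence purposes listed above (power failure, transport, backup,
revisions). How often to mirror is left open here.
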
= Distributed databases =

One area not covered by [Kriegel, 2007] and [Gray, 2004] is the development of
datasets of several petabytes (like the genome databases) that need to be
accessed by many concurrent clients throughout the world, so that link-layer
latencies come into the picture.

Finding ways to make these datasets available to all clients at an
acceptable/uniform access time will be of major importance in the future.
Datasets are growing rapidly due to the development of new sensors and
image/video-based storage, and more of them have a heavily shared nature:
research and business will increasingly gather and share data from multiple
(geographical) locations, while still needing centralized query interfaces.

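A centralized query interface over geographically spread data can be sketched
as a scatter-gather call: the query is sent to every site in parallel and the
partial answers are merged. The sketch below is an assumption-laden toy, not a
design from either paper: the site names, the latency figures and the records
are invented, and latency is modelled as a plain number instead of real
network traffic.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sites: name -> (modelled query latency in ms, local records).
SITES = {
    "leiden": (5, [3, 9, 27]),
    "tokyo": (120, [4, 16]),
    "boston": (45, [5, 25, 125]),
}


def query_site(name, threshold):
    """Pretend to run 'value > threshold' at one site.

    Returns (latency_ms, matching_rows); no real network access takes place.
    """
    latency_ms, records = SITES[name]
    return latency_ms, [r for r in records if r > threshold]


def federated_query(threshold):
    """Fan the query out to every site in parallel and merge the answers."""
    with ThreadPoolExecutor(max_workers=len(SITES)) as pool:
        results = list(pool.map(lambda n: query_site(n, threshold), SITES))
    rows = sorted(r for _, site_rows in results for r in site_rows)
    # With a parallel fan-out the client waits for the slowest site;
    # visiting the sites one after another would cost the sum instead.
    parallel_ms = max(lat for lat, _ in results)
    sequential_ms = sum(lat for lat, _ in results)
    return rows, parallel_ms, sequential_ms
```

Even in this toy model the access time is dictated by the most distant site,
which is why an acceptable/uniform access time for all clients is hard to
reach without replication or caching closer to the clients.
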
