source: liacs/dbdm/dbdm_5/report.tex@ 71

Last change on this file since 71 was 71, checked in by Rick van der Zwet, 15 years ago

LaTeX framework

File size: 4.2 KB
RevLine 
[71]1\documentclass{report}
2
3\title{Databases and Data Mining --- Assignment 5}
4\author{Rick van der Zwet}
5
6
7\begin{document}
8\newcommand{\question}[2]{\begin{quotation}\noindent{\Large``}#1. #2{\Large''}\end{quotation}}
9\maketitle
10
11\question{1a}{Propose an algorithm, in pseudo-code, for the following: The automatic
12generation of a concept hierarchy for numerical data based on the equal-width
13partitioning rule.}
14
15\question{1b}{Propose an algorithm, in pseudo-code, for the following: The automatic
16generation of a concept hierarchy for numerical data based on the equal-frequency
17partitioning rule.}
18
19\question{2}{Suppose that a data warehouse consists of four dimensions,
20 date, spectator, location, and game, and two measures, count, and charge, where
21 charge is the fare that a spectator pays when watching a game on a given date.
22 Spectators may be students, adults, or seniors, with each category having its own
23 charge rate.}
24\question{2a}{Draw a star schema diagram for the data warehouse.}
25\question{2b}{Starting with the base cuboid [date, spectator, location, game], what specific
26 OLAP operations should one perform in order to list the total charge paid by
27 student spectators at GM\_Place 2004?}
28\question{2c}{Bitmap indexing is useful in data warehousing. Taking this cube as an example,
29 briefly discuss advantages and problems of using a bitmap index structure.}
30
31\question{3}{A data cube, C, has n dimensions, and each dimension
32 has exactly p distinct values in the base cuboid. Assume that there are no concept
33 hierarchies associated with the dimensions. What is the maximum number of cells
34 possible (including both base cells and aggregate cells) in the data cube, C?}
35
36\question{4}{The Apriori algorithm uses prior knowledge of subset support properties.}
37\question{4a}{Prove that all nonempty subsets of a frequent itemset must also be frequent.}
38\question{4b}{Given frequent itemset l and subset s of l, prove that the confidence of the rule “s’
39 => (l-s’)” cannot be more than the confidence of the rule “s => (l – s)” where s’
40 is a subset of s.}
41\question{5}{An optimization in frequent item set mining is mining closed patterns, or mining max
42 patterns instead. Describe the main differences of mining closed patterns and mining
43 max patterns.}
44
45\question{6}{The price of each item in a store is nonnegative. For
46each of the following cases, identify the kinds of constraint they represent (e.g.
47antimonotonic, monotonic, succinct) and briefly discuss how to mine such association
48rules efficiently:}
49
50\question{6a}{Containing one free item and other items the sum of whose prices is at least \$190.}
51\question{6b}{Where the average price of all the items is between \$120 and \$520.}
52
53\question{7}{Suppose a city has installed hundreds of surveillance cameras at strategic locations in
54busy streets. All the captured video (each stream being 640 pixels x 480 pixels, 24-
55bits (RGB) per pixel, 25fps) is streamed to a central observation post in real time.
56Describe an efficient system that would allow real time data mining and continuously
57querying these video streams for abnormal events, such as explosions, people in
58panic, etc.. Also discuss the computational costs and the memory requirements of
59your system.}
60\question{8}{A flight data warehouse for a travel agent consists of six
61dimensions: traveler, departure (city), departure\_time, arrival (city), arrival\_time,
62and flight; and two measures count, and avg\_fare, where avg\_fare stores the concrete
63fare at the lowest level but the average fare at the other levels. Suppose the cube is
64fully materialized. Starting with the base cuboid [traveler, departure, departure\_time,
65arrival, arrival\_time, flight], what specific OLAP operations (e.g roll-up flight to
66airline, etc.) should one perform in order to list the average fare per month for each
67business traveler who flies American Airlines (AA) from Los Angeles (LA) in the
68year 2007?}
69\question{9}{In graph mining what would be the advantage of the described apriori-based approach
70over the pattern growth based approach (see lecture slides) and vice versa.}
71
72\bibliography{report}
73
74\end{document}
Note: See TracBrowser for help on using the repository browser.