1 | \documentclass{report}
|
---|
2 |
|
---|
3 | \title{Databases and Data Mining --- Assignment 5}
|
---|
4 | \author{Rick van der Zwet}
|
---|
5 |
|
---|
6 |
|
---|
7 | \begin{document}
|
---|
8 | \newcommand{\question}[2]{\begin{quotation}\noindent{\Large``}#1. #2{\Large''}\end{quotation}}
|
---|
9 | \maketitle
|
---|
10 |
|
---|
11 | \question{1a}{Propose an algorithm, in pseudo-code, for the following: The automatic
|
---|
12 | generation of a concept hierarchy for numerical data based on the equal-width
|
---|
13 | partitioning rule.}
|
---|
14 |
|
---|
15 | \question{1b}{Propose an algorithm, in pseudo-code, for the following: The automatic
|
---|
16 | generation of a concept hierarchy for numerical data based on the equal-frequency
|
---|
17 | partitioning rule.}
|
---|
18 |
|
---|
19 | \question{2}{Suppose that a data warehouse consists of four dimensions,
|
---|
20 | date, spectator, location, and game, and two measures, count, and charge, where
|
---|
21 | charge is the fare that a spectator pays when watching a game on a given date.
|
---|
22 | Spectators may be students, adults, or seniors, with each category having its own
|
---|
23 | charge rate.}
|
---|
24 | \question{2a}{Draw a star schema diagram for the data warehouse.}
|
---|
25 | \question{2b}{Starting with the base cuboid [date, spectator, location, game], what specific
|
---|
26 | OLAP operations should one perform in order to list the total charge paid by
|
---|
27 | student spectators at GM\_Place 2004?}
|
---|
28 | \question{2c}{Bitmap indexing is useful in data warehousing. Taking this cube as an example,
|
---|
29 | briefly discuss advantages and problems of using a bitmap index structure.}
|
---|
30 |
|
---|
31 | \question{3}{A data cube, C, has n dimensions, and each dimension
|
---|
32 | has exactly p distinct values in the base cuboid. Assume that there are no concept
|
---|
33 | hierarchies associated with the dimensions. What is the maximum number of cells
|
---|
34 | possible (including both base cells and aggregate cells) in the data cube, C?}
|
---|
35 |
|
---|
36 | \question{4}{The Apriori algorithm uses prior knowledge of subset support properties.}
|
---|
37 | \question{4a}{Prove that all nonempty subsets of a frequent itemset must also be frequent.}
|
---|
38 | \question{4b}{Given frequent itemset l and subset s of l, prove that the confidence of the rule âsâ
|
---|
39 | => (l-sâ)â cannot be more than the confidence of the rule âs => (l â s)â where sâ
|
---|
40 | is a subset of s.}
|
---|
41 | \question{5}{An optimization in frequent item set mining is mining closed patterns, or mining max
|
---|
42 | patterns instead. Describe the main differences of mining closed patterns and mining
|
---|
43 | max patterns.}
|
---|
44 |
|
---|
45 | \question{6}{The price of each item in a store is nonnegative. For
|
---|
46 | each of the following cases, identify the kinds of constraint they represent (e.g.
|
---|
47 | antimonotonic, monotonic, succinct) and briefly discuss how to mine such association
|
---|
48 | rules efficiently:}
|
---|
49 |
|
---|
50 | \question{6a}{Containing one free item and other items the sum of whose prices is at least \$190.}
|
---|
51 | \question{6b}{Where the average price of all the items is between \$120 and \$520.}
|
---|
52 |
|
---|
53 | \question{7}{Suppose a city has installed hundreds of surveillance cameras at strategic locations in
|
---|
54 | busy streets. All the captured video (each stream being 640 pixels x 480 pixels, 24-
|
---|
55 | bits (RGB) per pixel, 25fps) is streamed to a central observation post in real time.
|
---|
56 | Describe an efficient system that would allow real time data mining and continuously
|
---|
57 | querying these video streams for abnormal events, such as explosions, people in
|
---|
58 | panic, etc.. Also discuss the computational costs and the memory requirements of
|
---|
59 | your system.}
|
---|
60 | \question{8}{A flight data warehouse for a travel agent consists of six
|
---|
61 | dimensions: traveler, departure (city), departure\_time, arrival (city), arrival\_time,
|
---|
62 | and flight; and two measures count, and avg\_fare, where avg\_fare stores the concrete
|
---|
63 | fare at the lowest level but the average fare at the other levels. Suppose the cube is
|
---|
64 | fully materialized. Starting with the base cuboid [traveler, departure, departure\_time,
|
---|
65 | arrival, arrival\_time, flight], what specific OLAP operations (e.g roll-up flight to
|
---|
66 | airline, etc.) should one perform in order to list the average fare per month for each
|
---|
67 | business traveler who flies American Airlines (AA) from Los Angeles (LA) in the
|
---|
68 | year 2007?}
|
---|
69 | \question{9}{In graph mining what would be the advantage of the described apriori-based approach
|
---|
70 | over the pattern growth based approach (see lecture slides) and vice versa.}
|
---|
71 |
|
---|
72 | \bibliography{report}
|
---|
73 |
|
---|
74 | \end{document}
|
---|