- Timestamp:
- Jan 31, 2010, 10:08:58 PM (15 years ago)
- Location:
- liacs/dbdm
- Files:
-
- 12 added
- 1 edited
Legend:
- Unmodified
- Added
- Removed
-
liacs/dbdm/dbdm_5/report.tex
r72 r73 1 1 \documentclass{report} 2 \usepackage{graphicx} 2 3 3 4 \title{Databases and Data Mining --- Assignment 5} … … 6 7 7 8 \begin{document} 8 \newcommand{\question}[2]{\begin{quotation}\noindent{\Large``}#1. #2{\Large''}\end{quotation}}9 \newcommand{\question}[2]{\begin{quotation}\noindent{\Large``}#1. \textit{#2}{\Large''}\end{quotation}} 9 10 \maketitle 10 11 … … 19 20 interval_width = (max - min) / num_intervals 20 21 21 output = [] # Output array, where the values of interval k is stored in value output[k] 22 output = [] # Output array, where the values of interval k 23 # is stored in value output[k] 24 22 25 for value in input: 23 26 interval = value / interval_width # Find it's correct bin 24 27 output[interval].append(value) # Put the value inside the bin 25 28 endfor 26 27 29 \end{verbatim} 28 30 … … 38 40 sorted_input = sorted(input) # Sort items on value from small to large 39 41 40 output = [] # Output array, where the values of interval k is stored in value output[k] 42 output = [] # Output array, where the values of interval k 43 # is stored in value output[k] 41 44 42 45 interval = 0 … … 60 63 charge rate.} 61 64 \question{2a}{Draw a star schema diagram for the data warehouse.} 65 See figure~\ref{fig:star}. 62 66 % http://it.toolbox.com/blogs/enterprise-solutions/star-schema-modelling-data-warehouse-20803 67 \begin{figure}[htp] 68 \centering 69 \includegraphics[scale=0.6]{star-diagram.eps} 70 \caption{Star schema diagram 2a data warehose} 71 \label{fig:star} 72 \end{figure} 63 73 64 74 \question{2b}{Starting with the base cuboid [date, spectator, location, game], … … 125 135 \question{6}{The price of each item in a store is nonnegative. For 126 136 each of the following cases, identify the kinds of constraint they represent (e.g. 127 anti monotonic, monotonic, succinct) and briefly discuss how to mine such association137 anti-monotonic, monotonic, succinct) and briefly discuss how to mine such association 128 138 rules efficiently:} 129 130 139 % www.cs.sfu.ca/CC/741/jpei/slides/ConstrainedFrequentPatterns.pdf 131 132 140 \question{6a}{Containing one free item and other items the sum of whose prices is at least \$190.} 141 This is \emph{monotonic}, as adding items will never lower the sum of prices. 142 Mining this will be bestly done by first finding the free items and next order 143 the other items based on price, starting with the most expensive ones first. 144 The soon you get to \$190 can al it's supersets to matching as well, next you 145 remove the last item and try to match up again. Continue this till you removed 146 the most expensive item and tried to match up again. Next \emph{JOIN} with all 147 the free items and the list is complete. 148 133 149 \question{6b}{Where the average price of all the items is between \$120 and \$520.} 150 This is convertible \emph{anti-monotonic} and convertible \emph{monotonic} if 151 you look at one contrain. \emph{anti-monotonic} If itemset $S$ violates the 152 sub-constraint $avg(S.Price) \le \$520$. so does every itemset with $S$ in 153 prefix item value descending order. \emph{monotonic} If itemset $S$ constraint 154 $avg(S.Price) \ge \$120$ so does every itemset having S as prefix item values 155 descending order. 156 157 Satifing both conditions how-ever does make it neither \emph{anti-monotonic} 158 nor \emph{monotonic}. A fast way to generate this set is to use the algoritm 159 used in 6a but modify the number constraint on-the-fly. Like average of 3 items 160 between a centrain range (120 -- 520) It the same of checking whether the sum 161 of 3 items is between ($3*120$ -- $3*520$). 134 162 135 163 \question{7}{Suppose a city has installed hundreds of surveillance cameras at strategic locations in … … 141 169 your system.} 142 170 143 No compression, pre-processing, abnomality detection using heuristics, 144 un-compressed storage for direct retrival. compressed storage for long term 145 storage. 171 Every camera generates $640 * 480 * 24 bits * 25fps = 180.000 kbit/sec \approx 172 21.9 MByte/sec$. I would first use a compression algoritm to send-the-data over 173 the wire by sending the changes only and not the full frame every time. Secondly 174 applying abnomality detections using heuristics. Keeping the detected 175 abnormalities available un-compressed and uncut and save all other data in a 176 much lower resolution (1fps), with a compression ratio of 4:1 this would make 177 storage stream of approx 19GB/camera/day. Processing power required depends on 178 the algoritms used, but a 500Mhz CPU/camera should fit the bill when simple 179 algoritms are used. 146 180 147 181 \question{8}{A flight data warehouse for a travel agent consists of six … … 155 189 year 2007?} 156 190 191 Note bit tricky as we use the departure\_time to determine the year and 192 arrival\_time to determine the month of the flight. When these values are not 193 set to the same day we might run into inconsistencies. 194 195 196 \begin{verbatim} 197 * Roll-up departure\_time to 'Year' 198 * Roll-up flight to 'Airline' 199 * Dice departure = 'LA' and departure_time = '2007' and flight = 'American Airlines' 200 * Roll-up traveler to 'Consumer/Business' 201 * Roll-up arrival_time to 'Month' 202 * Slice traveler = 'Business' 203 \end{verbatim} 157 204 158 205 \question{9}{In graph mining what would be the advantage of the described apriori-based approach 159 206 over the pattern growth based approach (see lecture slides) and vice versa.} 207 The advantage comes in the simplicity of the algotitm when it get to 208 computations needed and it's setup, but this does not make it effient. 209 210 On the other hand apriori uses the costly process of candidate generation and 211 testing, making it more 'heavy' then pattern grow based. and also avoid costly 212 database scans. 160 213 161 214 % http://en.wikipedia.org/wiki/Association_rule_learning
Note:
See TracChangeset
for help on using the changeset viewer.