Changeset 73 for liacs


Ignore:
Timestamp:
Jan 31, 2010, 10:08:58 PM (15 years ago)
Author:
Rick van der Zwet
Message:

Final result of hard work

Location:
liacs/dbdm
Files:
12 added
1 edited

Legend:

Unmodified
Added
Removed
  • liacs/dbdm/dbdm_5/report.tex

    r72 r73  
    11\documentclass{report}
     2\usepackage{graphicx}
    23
    34\title{Databases and Data Mining --- Assignment 5}
     
    67
    78\begin{document}
    8 \newcommand{\question}[2]{\begin{quotation}\noindent{\Large``}#1. #2{\Large''}\end{quotation}}
     9\newcommand{\question}[2]{\begin{quotation}\noindent{\Large``}#1. \textit{#2}{\Large''}\end{quotation}}
    910\maketitle
    1011
     
    1920interval_width = (max - min) / num_intervals
    2021
    21 output = [] # Output array, where the values of interval k is stored in value output[k]
     22output = [] # Output array, where the values of interval k
     23            # is stored in value output[k]
     24
    2225for value in input:
    2326  interval = value / interval_width # Find it's correct bin
    2427  output[interval].append(value)    # Put the value inside the bin
    2528endfor
    26 
    2729\end{verbatim}
    2830
     
    3840sorted_input = sorted(input) # Sort items on value from small to large
    3941
    40 output = [] # Output array, where the values of interval k is stored in value output[k]
     42output = [] # Output array, where the values of interval k
     43            # is stored in value output[k]
    4144
    4245interval = 0
     
    6063    charge rate.}
    6164\question{2a}{Draw a star schema diagram for the data warehouse.}
     65See figure~\ref{fig:star}.
    6266% http://it.toolbox.com/blogs/enterprise-solutions/star-schema-modelling-data-warehouse-20803
     67\begin{figure}[htp]
     68\centering
     69\includegraphics[scale=0.6]{star-diagram.eps}
     70\caption{Star schema diagram 2a data warehose}
     71\label{fig:star}
     72\end{figure}
    6373
    6474\question{2b}{Starting with the base cuboid [date, spectator, location, game],
     
    125135\question{6}{The price of each item in a store is nonnegative. For
    126136each of the following cases, identify the kinds of constraint they represent (e.g.
    127 antimonotonic, monotonic, succinct) and briefly discuss how to mine such association
     137anti-monotonic, monotonic, succinct) and briefly discuss how to mine such association
    128138rules efficiently:}
    129 
    130139% www.cs.sfu.ca/CC/741/jpei/slides/ConstrainedFrequentPatterns.pdf
    131 
    132140\question{6a}{Containing one free item and other items the sum of whose prices is at least \$190.}
     141This is \emph{monotonic}, as adding items will never lower the sum of prices.
     142Mining this will be bestly done by first finding the free items and next order
     143the other items based on price, starting with the most expensive ones first.
     144The soon you get to \$190 can al it's supersets to matching as well, next you
     145remove the last item and try to match up again. Continue this till you removed
     146the most expensive item and tried to match up again. Next \emph{JOIN} with all
     147the free items and the list is complete.
     148
    133149\question{6b}{Where the average price of all the items is between \$120 and \$520.}
     150This is convertible \emph{anti-monotonic} and convertible \emph{monotonic} if
     151you look at one contrain.  \emph{anti-monotonic} If itemset $S$ violates the
     152sub-constraint $avg(S.Price) \le \$520$. so does every itemset with $S$ in
     153prefix item value descending order.  \emph{monotonic} If itemset $S$ constraint
     154$avg(S.Price) \ge \$120$ so does every itemset having S as prefix item values
     155descending order.
     156
     157Satifing both conditions how-ever does make it neither \emph{anti-monotonic}
     158nor \emph{monotonic}. A fast way to generate this set is to use the algoritm
     159used in 6a but modify the number constraint on-the-fly. Like average of 3 items
     160between a centrain range (120 -- 520) It the same of checking whether the sum
     161of 3 items is between ($3*120$ -- $3*520$).
    134162
    135163\question{7}{Suppose a city has installed hundreds of surveillance cameras at strategic locations in
     
    141169your system.}
    142170
    143 No compression, pre-processing, abnomality detection using heuristics,
    144 un-compressed storage for direct retrival. compressed storage for long term
    145 storage.
     171Every camera generates $640 *  480 * 24 bits * 25fps = 180.000 kbit/sec \approx
     17221.9 MByte/sec$. I would first use a compression algoritm to send-the-data over
     173the wire by sending the changes only and not the full frame every time. Secondly
     174applying abnomality detections using heuristics. Keeping the detected
     175abnormalities available un-compressed and uncut and save all other data in a
     176much lower resolution (1fps), with a compression ratio of 4:1 this would make
     177storage stream of approx 19GB/camera/day. Processing power required depends on
     178the algoritms used, but a 500Mhz CPU/camera should fit the bill when simple
     179algoritms are used.
    146180
    147181\question{8}{A flight data warehouse for a travel agent consists of six
     
    155189year 2007?}
    156190
     191Note bit tricky as we use the departure\_time to determine the year and
     192arrival\_time to determine the month of the flight. When these values are not
     193set to the same day we might run into inconsistencies.
     194
     195
     196\begin{verbatim}
     197* Roll-up departure\_time to 'Year'
     198* Roll-up flight to 'Airline'
     199* Dice departure = 'LA' and departure_time = '2007' and flight = 'American Airlines'
     200* Roll-up traveler to 'Consumer/Business'
     201* Roll-up arrival_time to 'Month'
     202* Slice traveler = 'Business'
     203\end{verbatim}
    157204
    158205\question{9}{In graph mining what would be the advantage of the described apriori-based approach
    159206over the pattern growth based approach (see lecture slides) and vice versa.}
     207The advantage comes in the simplicity of the algotitm when it get to
     208computations needed and it's setup, but this does not make it effient.
     209
     210On the other hand apriori uses the costly process of candidate generation and
     211testing, making it more 'heavy' then pattern grow based. and also avoid costly
     212database scans.
    160213
    161214% http://en.wikipedia.org/wiki/Association_rule_learning
Note: See TracChangeset for help on using the changeset viewer.