Changeset 73

Timestamp:

Jan 31, 2010, 10:08:58 PM

Author:

Rick van der Zwet

Message:

Final result of hard work

Location:

liacs/dbdm

Files:

: 12 added
: 1 edited

dbdm_5/report.pdf (added)
dbdm_5/report.tex (modified) (8 diffs)
dbdm_5/star-diagram.eps (added)
slides (added)
slides/01_dbdm2009_introduction.pdf (added)
slides/02_dbdm2009_databases.pdf (added)
slides/04_dbdm2009_Data Warehouses_OLAP.pdf (added)
slides/05_dbdm2009_Data_Cubes_Computation.pdf (added)
slides/06_dbdm2009_Data Mining.pdf (added)
slides/07_dbdm2009_Data Mining.pdf (added)
slides/08_dbdm2009_Mining_Data_Streams.pdf (added)
slides/09_dbdm2009_Mining_Biological_Data.pdf (added)
slides/09_dbdm2009_Mining_Biological_Data_new.pdf (added)

liacs/dbdm/dbdm_5/report.tex

-              r72
+              r73
 \documentclass{report}
+\usepackage{graphicx}
 \title{Databases and Data Mining --- Assignment 5}
 …
 \begin{document}
-\newcommand{\question}[2]{\begin{quotation}\noindent{\Large``}#1. #2{\Large''}\end{quotation}}
+\newcommand{\question}[2]{\begin{quotation}\noindent{\Large``}#1. \textit{#2}{\Large''}\end{quotation}}
 \maketitle
 …
 interval_width = (max - min) / num_intervals
-output = [] # Output array, where the values of interval k is stored in value output[k]
+output = [] # Output array; the values of interval k
+            # are stored in output[k]
 for value in input:
   interval = floor((value - min) / interval_width) # Find its correct bin
   output[interval].append(value)                   # Put the value inside the bin
 endfor
 \end{verbatim}
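
The equal-width binning pseudocode above can be sketched as runnable Python. This is a minimal sketch, not the report's own code: the function name `equal_width_bins`, the pre-allocated list of bins, and the clamping of the maximum value into the last bin are assumptions added for illustration.

```python
def equal_width_bins(values, num_intervals):
    """Partition values into num_intervals bins of equal width (illustrative sketch)."""
    lo, hi = min(values), max(values)
    interval_width = (hi - lo) / num_intervals
    # One list per interval; the values of interval k end up in output[k].
    output = [[] for _ in range(num_intervals)]
    for value in values:
        if interval_width == 0:
            k = 0  # all values are identical, so everything lands in the first bin
        else:
            # Offset by the minimum, then clamp so that value == hi falls in the last bin.
            k = min(int((value - lo) / interval_width), num_intervals - 1)
        output[k].append(value)
    return output


bins = equal_width_bins([4, 8, 15, 16, 23, 42], 3)
```

With equal-width bins a single outlier (42 here) stretches the range, so most values crowd into the first bin.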
 …
 sorted_input = sorted(input) # Sort items on value from small to large
-output = [] # Output array, where the values of interval k is stored in value output[k]
+output = [] # Output array; the values of interval k
+            # are stored in output[k]
 interval = 0
 …
     charge rate.}
 \question{2a}{Draw a star schema diagram for the data warehouse.}
+See figure~\ref{fig:star}.
 % http://it.toolbox.com/blogs/enterprise-solutions/star-schema-modelling-data-warehouse-20803
+\begin{figure}[htp]
+\centering
+\includegraphics[scale=0.6]{star-diagram.eps}
+\caption{Star schema diagram for the data warehouse of question 2a}
+\label{fig:star}
+\end{figure}
 \question{2b}{Starting with the base cuboid [date, spectator, location, game],
 …
 \question{6}{The price of each item in a store is nonnegative. For
 each of the following cases, identify the kinds of constraint they represent (e.g.
-antimonotonic, monotonic, succinct) and briefly discuss how to mine such association
+anti-monotonic, monotonic, succinct) and briefly discuss how to mine such association
 rules efficiently:}
 % www.cs.sfu.ca/CC/741/jpei/slides/ConstrainedFrequentPatterns.pdf
 \question{6a}{Containing one free item and other items the sum of whose prices is at least \$190.}
+This is \emph{monotonic}, as adding items will never lower the sum of prices.
+It is best mined by first finding the free items and then ordering the other
+items by price, most expensive first. As soon as the sum reaches \$190, all
+supersets match as well; next, remove the last item and try to match again.
+Continue until the most expensive item has been removed and matching has been
+tried once more. Finally, \emph{JOIN} the result with all the free items and
+the list is complete.
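
The monotonicity argument for 6a can be sketched in Python. This is a simplified brute-force illustration, not the report's exact procedure: the function name `minimal_sets_reaching` and the example items and prices are hypothetical. It finds only the minimal itemsets whose prices sum to the threshold; every superset of such a set satisfies the constraint automatically and is skipped without checking.

```python
from itertools import combinations


def minimal_sets_reaching(prices, threshold):
    """Return the minimal itemsets whose total price is at least threshold."""
    # Consider expensive items first, as suggested in the answer.
    items = sorted(prices, key=prices.get, reverse=True)
    minimal = []
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            if any(found <= candidate for found in minimal):
                continue  # superset of a known match: satisfies by monotonicity
            if sum(prices[item] for item in candidate) >= threshold:
                minimal.append(candidate)
    return minimal


# Hypothetical data: priced items and free items are kept apart.
paid = {"tv": 150, "radio": 60, "cable": 40}
free = ["pen"]

minimal = minimal_sets_reaching(paid, 190)
# Join every minimal match with exactly one free item.
joined = [found | {gift} for found in minimal for gift in free]
```

The full answer set consists of all supersets of the minimal matches, each joined with one free item.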
 \question{6b}{Where the average price of all the items is between \$120 and \$520.}
+Taken one sub-constraint at a time, this is convertible \emph{anti-monotonic}
+and convertible \emph{monotonic}. \emph{Anti-monotonic}: if itemset $S$
+violates the sub-constraint $avg(S.Price) \le \$520$, so does every itemset
+having $S$ as a prefix in item-value ascending order. \emph{Monotonic}: if
+itemset $S$ satisfies the sub-constraint $avg(S.Price) \ge \$120$, so does
+every itemset having $S$ as a prefix in item-value ascending order.
+Satisfying both conditions at once, however, makes the constraint neither
+\emph{anti-monotonic} nor \emph{monotonic}. A fast way to generate this set is
+to use the algorithm from 6a but adapt the numeric constraint on the fly:
+checking whether the average of 3 items lies in the range 120--520 is the same
+as checking whether their sum lies between $3 \cdot 120$ and $3 \cdot 520$.
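
The equivalence between the average check and the scaled sum check can be illustrated with a short Python sketch. The function names `avg_in_range` and `sum_in_scaled_range` and the sample price lists are assumptions made for this example.

```python
def avg_in_range(prices, low=120, high=520):
    """Check the constraint directly on the average price."""
    return low <= sum(prices) / len(prices) <= high


def sum_in_scaled_range(prices, low=120, high=520):
    """Check the same constraint on the sum, with bounds scaled by the item count."""
    k = len(prices)
    return k * low <= sum(prices) <= k * high


# Hypothetical itemsets of three prices each.
samples = [[100, 120, 140], [100, 100, 150], [500, 520, 560], [120, 300, 520]]
results = [(avg_in_range(s), sum_in_scaled_range(s)) for s in samples]
```

Working with the sum lets a miner reuse sum-based pruning for a fixed itemset size, at the cost of rescaling the bounds whenever the size changes.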
 \question{7}{Suppose a city has installed hundreds of surveillance cameras at strategic locations in
 …
 your system.}
+No compression, pre-processing, abnormality detection using heuristics,
+uncompressed storage for direct retrieval, compressed storage for long-term
+storage.
+Every camera generates $640 \times 480 \times 24\,\mathrm{bit} \times
+25\,\mathrm{fps} \approx 184\,320\,\mathrm{kbit/s} \approx 23\,\mathrm{MByte/s}$
+of raw video (about 0.9 MByte per frame). I would first use a compression
+algorithm to send the data over the wire, transmitting only the changes and
+not the full frame every time. Secondly, I would apply abnormality detection
+using heuristics, keeping the detected abnormalities available uncompressed
+and uncut and saving all other data at a much lower frame rate (1 fps). With a
+compression ratio of 4:1 this yields a storage stream of approximately 19 GB
+per camera per day. The processing power required depends on the algorithms
+used, but a 500 MHz CPU per camera should fit the bill when simple algorithms
+are used.
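
The bandwidth and storage figures can be checked with a few lines of Python. This is a back-of-the-envelope sketch: the variable names are illustrative, decimal units (1 MByte = 10^6 bytes, 1 GB = 10^9 bytes) are assumed, and the archive is assumed to hold uncompressed-size frames at 1 fps reduced by the 4:1 ratio.

```python
WIDTH, HEIGHT = 640, 480      # pixels per frame
BITS_PER_PIXEL = 24
FPS = 25
SECONDS_PER_DAY = 86_400

bits_per_frame = WIDTH * HEIGHT * BITS_PER_PIXEL
bytes_per_frame = bits_per_frame // 8                 # about 0.92 MByte per frame

raw_bits_per_second = bits_per_frame * FPS
raw_mbyte_per_second = raw_bits_per_second / 8 / 1e6  # live stream before compression

# Long-term archive: 1 frame per second, compressed 4:1.
ARCHIVE_FPS = 1
COMPRESSION_RATIO = 4
archive_bytes_per_day = bytes_per_frame * ARCHIVE_FPS * SECONDS_PER_DAY / COMPRESSION_RATIO
archive_gb_per_day = archive_bytes_per_day / 1e9

print(f"raw stream: {raw_bits_per_second / 1000:.0f} kbit/s = {raw_mbyte_per_second:.2f} MByte/s")
print(f"archive:    {archive_gb_per_day:.1f} GB per camera per day")
```

A single raw frame is roughly 0.9 MByte, so the 25 fps live stream is about 23 MByte/s, while the 1 fps archive at 4:1 comes to roughly 19 to 20 GB per camera per day.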
 \question{8}{A flight data warehouse for a travel agent consists of six
 …
 year 2007?}
+Note: this is a bit tricky, as we use departure\_time to determine the year
+and arrival\_time to determine the month of the flight. When these two values
+do not fall on the same day we might run into inconsistencies.
+\begin{verbatim}
+* Roll-up departure_time to 'Year'
+* Roll-up flight to 'Airline'
+* Dice departure = 'LA' and departure_time = '2007' and flight = 'American Airlines'
+* Roll-up traveler to 'Consumer/Business'
+* Roll-up arrival_time to 'Month'
+* Slice traveler = 'Business'
+\end{verbatim}
 \question{9}{In graph mining what would be the advantage of the described apriori-based approach
 over the pattern growth based approach (see lecture slides) and vice versa.}
+The advantage of the apriori-based approach lies in the simplicity of the
+algorithm, both in the computations needed and in its setup, but this does not
+make it efficient. On the other hand, apriori relies on the costly process of
+candidate generation and testing, making it `heavier' than the
+pattern-growth-based approach, which also avoids costly database scans.
 % http://en.wikipedia.org/wiki/Association_rule_learning
