Details

Type: Publication
Code: TR-2007-21
Title: Exploiting Duality in Summarization with Deterministic Guarantees
Authors: Παναγιώτης Καρράς, Δημήτρης Σαχαρίδης, Νίκος Μαμουλής
Year: 2007
Keywords: wavelets
Abstract: Summarization is an important task in data mining. A major challenge in recent years has been the efficient construction of fixed-space synopses that provide a deterministic quality guarantee, often expressed in terms of a maximum-error metric. Histograms and several hierarchical techniques have been proposed for this problem. However, their time and/or space complexities remain impractically high and depend not only on the data set size $n$ but also on the space budget $B$. These handicaps stem from the requirement to tabulate all allocations of synopsis space to different regions of the data. In this paper, we develop an alternative methodology that dispels these deficiencies through a fruitful application of the solution to the dual problem: given a maximum allowed error, determine the minimum-space synopsis that achieves it. Compared to the state of the art, our histogram construction algorithm reduces time complexity by (at least) a factor of $B(\log n)^2/\log\varepsilon^*$, and our hierarchical synopsis algorithm reduces complexity by (at least) a factor of $(\log B)^2/(\log\varepsilon^* + \log n)$ in time and $B(1 - \log B/\log n)$ in space, where $\varepsilon^*$ is the optimal error. These complexity advantages provide a space efficiency and a scalability that previous approaches lacked. We verify the practical benefits of our approach experimentally.
Category: Data Streams
Published in: 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '07), San Jose, California, USA, August 12-15, 2007

