How do we find the dominant groups of customers, by age, sex, and location, that were responsible for at least 85% of the sales of iPad, MacBook, and iPhone? To answer questions of this type, we introduce a novel data mining task: mining multidimensional distinct patterns (DPs). Given a multidimensional data set in which each tuple carries some attribute values and a transaction, multidimensional DPs are itemsets whose absolute support ratio in a group-by on the attributes, measured against the rest of the data set, passes a given threshold. A baseline algorithm uses BUC as the cubing algorithm and passes two distinct sets of transactions associated with the tuples of each cell to a pattern mining algorithm called DPMiner. An advanced algorithm applies several effective pruning techniques that eliminate redundant DPMiner processing and reduce the runtime. An empirical study comparing the baseline and advanced algorithms demonstrates that the advanced algorithm is significantly faster.
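To make the task definition concrete, the following is a minimal brute-force sketch. It is not BUC or DPMiner. It rests on several assumptions that the abstract does not confirm: the "absolute support ratio" is taken to be the support count of an itemset inside a group-by cell divided by its support count in the rest of the data set (infinite when the latter is zero), `*` marks an aggregated attribute, and the function names, the `max_len` cap, and the toy rows are all invented for illustration.

```python
from itertools import combinations, product

ALL = "*"  # wildcard: the attribute is aggregated away in this group-by cell


def support(itemset, transactions):
    """Absolute support: number of transactions containing every item."""
    return sum(1 for t in transactions if itemset <= t)


def cells(rows, n_attrs):
    """Enumerate every group-by cell that matches at least one tuple."""
    seen = set()
    for attrs, _ in rows:
        for mask in product((True, False), repeat=n_attrs):
            seen.add(tuple(a if keep else ALL for a, keep in zip(attrs, mask)))
    return seen


def matches(attrs, cell):
    """True if a tuple's attribute values fall inside the cell."""
    return all(c == ALL or c == a for a, c in zip(attrs, cell))


def distinct_patterns(rows, n_attrs, min_ratio, max_len=3):
    """Brute-force search (assumed ratio definition); returns {(cell, itemset): ratio}."""
    items = sorted({i for _, t in rows for i in t})
    result = {}
    for cell in cells(rows, n_attrs):
        inside = [t for a, t in rows if matches(a, cell)]
        outside = [t for a, t in rows if not matches(a, cell)]
        if not outside:
            continue  # a cell covering every tuple has no "rest" to compare against
        for k in range(1, max_len + 1):
            for combo in combinations(items, k):
                s = frozenset(combo)
                s_in, s_out = support(s, inside), support(s, outside)
                if s_in == 0:
                    continue  # itemset never occurs in this cell
                ratio = float("inf") if s_out == 0 else s_in / s_out
                if ratio >= min_ratio:
                    result[(cell, s)] = ratio
    return result


# Toy data invented for illustration: ((age, sex, location), transaction).
rows = [
    (("20-29", "F", "Vancouver"), frozenset({"iPad", "iPhone"})),
    (("20-29", "F", "Vancouver"), frozenset({"iPad", "MacBook", "iPhone"})),
    (("20-29", "M", "Vancouver"), frozenset({"iPad", "iPhone"})),
    (("30-39", "M", "Toronto"), frozenset({"MacBook"})),
    (("30-39", "F", "Toronto"), frozenset({"iPhone"})),
]

dps = distinct_patterns(rows, n_attrs=3, min_ratio=2.0)
```

The sketch enumerates every cell and every itemset up to `max_len`, so its cost grows exponentially with the number of attributes and items. That cost is what a cubing algorithm combined with pruning is meant to avoid.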
Copyright is held by the author.
The author granted permission for the file to be printed and for the text to be copied and pasted.
Thesis advisor: Pei, Jian