2010

Authors

  • Justin Terry

Current database applications handle a large and growing amount of multidimensional data that is not well supported. Modern database applications increasingly turn to multidimensional data for a more sophisticated representation of, and reasoning about, their universe. The amount of data collected, processed and stored continues to grow as new technologies emerge. Even relational data can be seen as multidimensional, since each attribute corresponds to a dimension. Despite these large and growing data sets, multidimensional data is not well supported above low dimensionality, and no index available in current commercial RDBMSs scales well beyond low dimensionality. A major problem with multidimensional data is that it has no true order. Space-filling curves are an elegant solution to organizing multidimensional space. In this work we adapt and further develop this concept to create a curve that connects all regions in space, where the regions can be of various hyper-cubic-like sizes, not just points. This allows an efficient transformation of interval queries into regions of physically clustered data. We therefore present the Variable Granularity Curve (VG-Curve), a symmetrical index for multidimensional points, spatially extended objects and general database relations. We show that our method is applicable to a variety of multidimensional database applications. The method can be constructed immediately within current commercial database systems and thus inherits industrial-strength concurrency and recovery services. The main contributions of this study are summarized as follows:

  • We propose a novel twist on indexing that allows improved support for multidimensional point data. Our method ensures an efficient clustering of multidimensional data that allows for primary-index-style storage, and it uses two-stage filtering that efficiently prunes the search space. In an extensive empirical study on up to 18 dimensions we show that the VG-Curve represents a significant improvement over the currently available access methods for managing multidimensional point data, and that it scales well with an increasing number of dimensions (Terry, Stantic & Sattar 2008). In addition, we investigate the applicability of the proposed method to managing spatially extended data, identifying its suitability for indexing interval and point attributes together.

  • We show that the VG-Curve can be effectively and efficiently applied to the emerging technology of Radio Frequency Identification (RFID), which generates huge amounts of spatio-temporal data. We show empirically, on data sets of three, five and nine dimensions, that our method is an efficient replacement for the best of the currently available off-the-shelf index methods, i.e., the set of compound indices representing all combinations of dimensions.
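
To make concrete the idea that a space-filling curve gives multidimensional data a linear order, here is a minimal sketch of the well-known Z-order (Morton) curve. It is not the VG-Curve, whose construction is not given in this abstract, and the function name `z_order_key` and its parameters are illustrative assumptions. The sketch interleaves the bits of each coordinate into a single integer key, so that an aligned hyper-cubic block of points maps to one contiguous key interval that an ordinary one-dimensional index could cluster.

```python
from itertools import product


def z_order_key(point, bits):
    """Interleave the low `bits` bits of each coordinate into one integer.

    Hypothetical helper for illustration only: this is the classic Z-order
    (Morton) curve, not the VG-Curve described in the abstract.
    """
    dims = len(point)
    key = 0
    for bit in range(bits):
        for d, coord in enumerate(point):
            # Bit `bit` of dimension `d` lands at position bit * dims + d.
            key |= ((coord >> bit) & 1) << (bit * dims + d)
    return key


# Order the 16 points of a 4 x 4 grid along the curve.
grid = list(product(range(4), repeat=2))
ordered = sorted(grid, key=lambda p: z_order_key(p, bits=2))

# An aligned 2 x 2 block (x in 2..3, y in 0..1) occupies a contiguous key range.
block_keys = sorted(z_order_key((x, y), bits=2) for x in (2, 3) for y in (0, 1))
```

In this sketch `block_keys` is the contiguous range 4 to 7, so a query over that block becomes a single interval scan on the key. Queries that do not align with such blocks break into several intervals, which is the kind of cost that a curve over regions of varying size is meant to reduce.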