2010

xml has become a standard for representing, storing and exchanging data, so eectively and eciently retrieving data from xml data sources becomes increasingly important. In this thesis, we mainly study two types of xml queries: xml structured query and xml keyword search. For xml structured query, we focus on twig pattern matching, which lies in the center of most xml query languages (e.g., XPath, XQuery). Existing twig pattern matching algorithms can be classied into two-phase algorithms and one-phase algorithms. We rst propose two novel one-phase holistic twig matching algorithms, TwigMix and TwigFast, which combine the ecient selection of useful elements (introduced in TwigStack) with the simple lists for storing nal solutions (introduced in TwigList). TwigMix simply introduces the element selection function getNext of TwigStack into TwigList to avoid manipulation of useless elements in the stack and lists. TwigFast further improves this by introducing some pointers in the lists to completely avoid the use of stacks. On the other hand, previous twig pattern matching algorithms may incur other redundant computation, so we propose two approaches, namely re-test checking and forward-to-end, which can reduce the redundant computation and can be easily applied to both holistic one-phase and two-phase algorithms. Improving the eectiveness of xml keyword search remains an open problem. In this thesis, we rst present XKMis, which divides the nodes into meaningful and self-containing information segments, called minimal information segments (MISs), and return MIS-subtrees which consist of MISs that are logically connected by the keywords. The MIS-subtrees are closer to what the user wants. XReal [1] utilizes the statistics of underlying data to resolve keyword ambiguity problems.