In my article, I present a new tree mining algorithm, DRYADEPARENT, based on the hooking principle first introduced in DRYADE. I demonstrate that the branching factor and depth of the frequent patterns to be found are key factors in the complexity of tree mining algorithms, even though they have often been overlooked in previous algorithms. I show that DRYADEPARENT outperforms the current fastest algorithm, CMTreeMiner, by orders of magnitude on data sets where the frequent tree patterns have a high branching factor. DRYADE relies on a more general tree inclusion definition, suited to mining highly heterogeneous collections of tree data; DRYADEPARENT follows the same principles as DRYADE but uses a standard inclusion definition. Overall, DRYADEPARENT is considerably faster than CMTreeMiner.

In my article, I focus on tree mining, that is, finding frequent tree-shaped patterns in a database of tree-shaped data. Tree mining has many practical applications in computer networks, bioinformatics, and XML document database mining, and has therefore received a lot of attention from the research community in recent years. Most well-known algorithms use the same generate-and-test principle that made frequent itemset algorithms successful. Their main adaptation to the tree case is the design of efficient candidate tree enumeration algorithms, which avoid generating redundant candidates and enable efficient pruning. However, the search space of tree candidates is huge, particularly when the frequent trees to be found have both a high depth and a high branching factor.
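To make the role of branching factor and depth concrete, one can count how many distinct subtrees a single frequent tree implies. The sketch below assumes that patterns are connected (induced) subtrees and that every node carries a distinct label, so each subtree is a distinct pattern; since support is anti-monotone, every subtree of a frequent tree is itself frequent, and a generate-and-test miner has to enumerate all of them. The count follows the recurrence f(v) = ∏ over children c of (1 + f(c)) for subtrees topped at v. This is a generic counting argument, not part of DRYADEPARENT or CMTreeMiner, and the names `Node`, `rooted_count`, `subtree_count`, and `complete_tree` are hypothetical.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Node:
    """A labeled node of a rooted tree (hypothetical helper type)."""
    label: str
    children: list = field(default_factory=list)


def rooted_count(node: Node) -> int:
    """Number of connected subtrees whose topmost node is `node`.

    For each child, a subtree either stops above it (1 way) or continues
    into one of the child's own rooted subtrees, hence the product.
    """
    total = 1
    for child in node.children:
        total *= 1 + rooted_count(child)
    return total


def subtree_count(node: Node) -> int:
    """Number of connected subtrees anywhere in the tree rooted at `node`."""
    return rooted_count(node) + sum(subtree_count(c) for c in node.children)


def complete_tree(branching: int, depth: int) -> Node:
    """Complete tree with the given branching factor and depth; all labels distinct."""
    counter = itertools.count()

    def build(level: int) -> Node:
        node = Node(f"n{next(counter)}")
        if level < depth:
            node.children = [build(level + 1) for _ in range(branching)]
        return node

    return build(0)


if __name__ == "__main__":
    for branching, depth in [(1, 10), (10, 1), (20, 1), (3, 2), (3, 3)]:
        tree = complete_tree(branching, depth)
        print(f"branching={branching:2d} depth={depth:2d} -> {subtree_count(tree):,} subtrees")
```

Under these assumptions, a path of 11 nodes (branching 1, depth 10) yields only 66 subtrees, a root with 20 children already yields 1,048,596, and a complete tree with branching 3 and depth 3, only 40 nodes, yields 389,019,286. Depth alone grows the count quadratically; branching makes it exponential, and combining both compounds the effect.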
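The generate-and-test principle for trees can be illustrated with a minimal level-wise miner. This is a sketch under stated assumptions, not DRYADEPARENT (its hooking principle is not detailed here) and not CMTreeMiner: trees are rooted and unordered, siblings carry distinct labels, inclusion is induced (parent-child edges preserved, extra data nodes allowed), and support is the number of database trees containing the pattern. Redundant candidates are avoided by a canonical encoding, nested `frozenset`s, so the same pattern reached from different parents compares equal; pruning comes from only extending patterns that were found frequent. All names (`T`, `matches_at`, `occurs_in`, `extensions`, `mine`) are hypothetical.

```python
def T(label, *children):
    """Build a tree as (label, frozenset(children)); siblings must have distinct labels."""
    return (label, frozenset(children))


def nodes(tree):
    """Yield every node of a tree in preorder."""
    yield tree
    for child in tree[1]:
        yield from nodes(child)


def matches_at(pattern, node):
    """Induced, unordered match of `pattern` with its root mapped onto `node`.

    Because siblings have distinct labels, each pattern child can only map
    to the data child carrying the same label, so no search is needed.
    """
    p_label, p_children = pattern
    n_label, n_children = node
    if p_label != n_label:
        return False
    by_label = {child[0]: child for child in n_children}
    return all(pc[0] in by_label and matches_at(pc, by_label[pc[0]]) for pc in p_children)


def occurs_in(pattern, tree):
    """True if `pattern` matches at some node of `tree`."""
    return any(matches_at(pattern, node) for node in nodes(tree))


def extensions(pattern, labels):
    """All patterns obtained by attaching one new leaf somewhere in `pattern`."""
    label, children = pattern
    used = {child[0] for child in children}
    for new_label in labels:  # new leaf directly under this node
        if new_label not in used:
            yield (label, children | {T(new_label)})
    for child in children:  # or deeper, under one of the children
        for grown in extensions(child, labels):
            yield (label, (children - {child}) | {grown})


def mine(db, minsup):
    """Level-wise generate-and-test: grow frequent k-node patterns into (k+1)-node candidates."""
    def support(pattern):
        return sum(occurs_in(pattern, tree) for tree in db)

    labels = {node[0] for tree in db for node in nodes(tree)}
    level = {T(label) for label in labels if support(T(label)) >= minsup}
    frequent_labels = sorted(p[0] for p in level)
    result = set(level)
    while level:
        # Generate: the set comprehension drops duplicate candidates thanks
        # to the canonical encoding. Test: keep only frequent ones.
        candidates = {c for p in level for c in extensions(p, frequent_labels)}
        level = {c for c in candidates if support(c) >= minsup}
        result |= level
    return result


if __name__ == "__main__":
    db = [T("a", T("b"), T("c")), T("a", T("b", T("d"))), T("a", T("c"))]
    for pattern in sorted(mine(db, minsup=2), key=repr):
        print(pattern)
```

On this three-tree database with `minsup=2`, the sketch keeps five patterns: the single nodes a, b, c and the two-node patterns a(b) and a(c). Every level re-tests each candidate against the whole database, so the number of candidates, which grows with the branching factor and depth of the frequent trees, drives the running time.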