Data Mining in a Variety of Applications
We have been working on mining association rules in large databases for the last two to
three years. These rules "discover" new associations from large sets of records, such as
transaction data from supermarkets (also known as "market basket data"). The rules are of
the type "if a customer buys coke, he or she is more than likely to buy potato chips." A
more general form of such a rule says that "if a customer buys soft drinks, they are likely
to buy snack foods," and so on. We published this work at VLDB 1995, where we demonstrated
efficient partition-based algorithms for mining association rules that outperformed
previously published algorithms. Our work on negative rules will appear at the 1998 IEEE
International Conference on Data Engineering. There is hardly any previous work on
discovering negative rules, largely because the space of possible negative rules is vast.
Negative rules state negative conclusions, such as "if someone buys mineral water, they are
likely not to buy donuts."
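To make the two kinds of rules concrete, here is a minimal Python sketch of the standard support and confidence measures, applied to one positive and one negative rule. The baskets and the function names (`support`, `confidence`, `negative_confidence`) are hypothetical illustrations; this is the generic textbook formulation, not the method used in the 1998 negative-rules paper.

```python
"""Sketch: support and confidence for positive and negative association rules.

Generic textbook measures, shown for illustration; the baskets are hypothetical.
"""
from typing import FrozenSet, List

Basket = FrozenSet[str]


def support(baskets: List[Basket], itemset: FrozenSet[str]) -> float:
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(1 for b in baskets if itemset <= b) / len(baskets)


def confidence(baskets: List[Basket], antecedent: FrozenSet[str],
               consequent: FrozenSet[str]) -> float:
    """Confidence of antecedent -> consequent, i.e. sup(A and C) / sup(A)."""
    base = support(baskets, antecedent)
    if base == 0:
        raise ValueError("antecedent never occurs, so confidence is undefined")
    return support(baskets, antecedent | consequent) / base


def negative_confidence(baskets: List[Basket], antecedent: FrozenSet[str],
                        consequent: FrozenSet[str]) -> float:
    """Confidence of antecedent -> NOT consequent: the share of baskets holding
    the antecedent that do not hold the full consequent, i.e. 1 - conf(A -> C)."""
    return 1.0 - confidence(baskets, antecedent, consequent)


# Hypothetical market basket data.
baskets = [frozenset(b) for b in (
    {"coke", "chips"},
    {"coke", "chips", "salsa"},
    {"coke", "bread"},
    {"mineral_water", "bread"},
    {"mineral_water", "fruit"},
    {"donuts", "coffee"},
)]

print(confidence(baskets, frozenset({"coke"}), frozenset({"chips"})))
print(negative_confidence(baskets, frozenset({"mineral_water"}), frozenset({"donuts"})))
```

The negative rule reaches full confidence here simply because no basket contains both items. In sparse market basket data most pairs of items rarely appear together, so many negative rules score high confidence trivially; this is one way to see why the space of candidate negative rules is so large and why confidence alone is not enough to single out the interesting ones.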
We have been extending our
work on association rules in several directions:
- Extending to different types of domains and data types. In this effort, we have started
work in the medical domain. We have already shown that cardiac images (SPECT images) can be
preprocessed and subjected to the discovery of association rules; the resulting rules reveal
some interesting relationships among known conditions related to heart disease. We are also
interested in processing experimental simulation data and analyzing simulation results to
discover association rules among events and the agents that lead to those events. Other
areas are the mining of time-series data and of image data. The image data work leads to
interesting new research, which is described in the next item.
- We are currently investigating techniques to discover knowledge in image databases. We
rely on the output of an image understanding system in the form of feature vectors, and we
use data mining algorithms to find associations among the objects identified in each image.
Once the associations are obtained, we can use them to identify images that are similar, or
that are related in terms of given properties a user may be interested in. Our current work
makes no assumptions about the image content, but in the future we expect to use domain
knowledge to speed up the process.
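The image-mining steps described in the list above can be sketched in Python under simplifying assumptions: each image is treated as a transaction of object labels (we assume the feature vectors from the image understanding system have already been mapped to discrete labels), frequently co-occurring object pairs stand in for the associations, and two images count as related when they share such associations. The labels, the `min_support` threshold, and the ranking rule are hypothetical choices made for illustration only.

```python
"""Sketch: mining object associations across images and finding related images.

Assumptions (hypothetical, for illustration only): feature vectors have already
been reduced to sets of object labels, and "related" means sharing at least one
frequent object pair.
"""
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

# Hypothetical detector output: image id -> labels of the objects found in it.
detected: Dict[str, Set[str]] = {
    "img1": {"sky", "water", "boat"},
    "img2": {"sky", "water", "boat", "bird"},
    "img3": {"sky", "water", "beach"},
    "img4": {"road", "car", "building"},
    "img5": {"road", "car", "tree"},
}


def frequent_pairs(images: Dict[str, Set[str]], min_support: float) -> Set[FrozenSet[str]]:
    """Object pairs that co-occur in at least `min_support` of all images."""
    counts: Counter = Counter()
    for objects in images.values():
        # Each image is one "transaction"; count every pair of objects in it.
        for pair in combinations(sorted(objects), 2):
            counts[frozenset(pair)] += 1
    threshold = min_support * len(images)
    return {pair for pair, count in counts.items() if count >= threshold}


def related_images(images: Dict[str, Set[str]], query: str,
                   pairs: Set[FrozenSet[str]]) -> List[str]:
    """Images sharing frequent pairs with `query`, most shared pairs first."""
    query_pairs = [p for p in pairs if p <= images[query]]
    scores: Dict[str, int] = {}
    for image_id, objects in images.items():
        if image_id == query:
            continue
        shared = sum(1 for p in query_pairs if p <= objects)
        if shared:
            scores[image_id] = shared
    # Sort by descending score, breaking ties by image id for determinism.
    return sorted(scores, key=lambda image_id: (-scores[image_id], image_id))


pairs = frequent_pairs(detected, min_support=0.4)
print(related_images(detected, "img1", pairs))
```

With `min_support=0.4`, the frequent pairs are {sky, water}, {boat, sky}, {boat, water}, and {car, road}; querying with `img1` ranks `img2`, which shares all three of its associations, ahead of `img3`, which shares one. Domain knowledge of the kind mentioned above could, for example, restrict which labels are paired, but that is not modeled here.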
FUTURE WORK:
We plan to enhance the partitioning algorithms with incremental computation, so that new
data can be incorporated as it arrives in the database. We are also interested in mining
time-series data in financial applications. Another possibility is a parallel server
implementation of the partitioning algorithms we have already developed.
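The future-work items build on partitioning, so a simplified Python sketch of the two-phase partitioning idea may help. In phase 1 each partition is mined independently for locally frequent itemsets; in phase 2 one more pass over the whole database counts the global support of their union. This is an illustration under stated assumptions, not the authors' implementation: it counts itemsets with plain subset tests instead of the tid-lists used in the VLDB 1995 paper, and the data and function names (`local_frequent`, `partition_mine`) are hypothetical.

```python
"""Simplified sketch of the two-phase partitioning idea for frequent itemsets.

Illustrative only: it counts itemsets by subset tests rather than with the
tid-lists used in the VLDB 1995 paper, and the data below is hypothetical.
"""
import math
from typing import Dict, FrozenSet, List, Set

Transaction = FrozenSet[str]


def local_frequent(transactions: List[Transaction], min_sup: float) -> Set[FrozenSet[str]]:
    """All itemsets with support >= min_sup in `transactions`, found level by level."""
    threshold = min_sup * len(transactions)
    level = [frozenset([item]) for item in {i for t in transactions for i in t}]
    result: Set[FrozenSet[str]] = set()
    k = 1
    while level:
        frequent = [c for c in level if sum(1 for t in transactions if c <= t) >= threshold]
        result.update(frequent)
        k += 1
        # Join: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        level = [c for c in candidates if all(c - {x} in result for x in c)]
    return result


def partition_mine(transactions: List[Transaction], min_sup: float,
                   num_partitions: int) -> Dict[FrozenSet[str], float]:
    """Globally frequent itemsets mapped to their support."""
    size = math.ceil(len(transactions) / num_partitions)
    partitions = [transactions[i:i + size] for i in range(0, len(transactions), size)]

    # Phase 1: any globally frequent itemset is locally frequent in at least
    # one partition, so the union of local results is a complete candidate set.
    candidates: Set[FrozenSet[str]] = set()
    for part in partitions:
        candidates |= local_frequent(part, min_sup)

    # Phase 2: one more pass over the full database to count global support.
    counts = {c: 0 for c in candidates}
    for t in transactions:
        for c in candidates:
            if c <= t:
                counts[c] += 1
    n = len(transactions)
    return {c: counts[c] / n for c in candidates if counts[c] >= min_sup * n}


db = [frozenset(t) for t in (
    {"coke", "chips"},
    {"coke", "chips", "salsa"},
    {"coke", "bread"},
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"coke", "chips", "milk"},
)]

frequent = partition_mine(db, min_sup=0.5, num_partitions=2)
for itemset, sup in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), round(sup, 2))
```

On this toy data the second partition proposes {milk, bread} as a locally frequent candidate, but phase 2 discards it because it appears in only two of the six transactions. The same pigeonhole argument suggests why partitioning may suit the incremental extension mentioned above: a batch of new transactions can be mined as one more partition to obtain additional candidates, although the counts of those new candidates over the older data still have to be gathered. Because phase 1 treats partitions independently, it also lends itself to parallel processing.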
PUBLICATIONS:
- A. Savasere, E. Omiecinski, and S. Navathe. "An Efficient Algorithm for Mining
Association Rules in Large Databases," In Proceedings of the Very Large Data Base
Conference, September 1995.
- A. Savasere, E. Omiecinski, and S. Navathe. "Mining for Strong Negative Associations in
a Large Database of Customer Transactions," In Proceedings of the 14th IEEE International
Conference on Data Engineering, Orlando, FL, February 1998 (forthcoming).
Interested researchers and
industry collaborators should contact Profs. Navathe (sham@cc.gatech.edu) or Omiecinski
(edwardo@cc.gatech.edu).