Data Mining: Course Syllabus
Course Name | Data Mining |
Instructor | Prof. Juanying Xie | Course Type | Elective Course |
Prerequisite Courses | Statistics, Machine Learning, Pattern Recognition, Database, Artificial Intelligence | Discipline | Computer Science |
Learning Method | Mentoring, discussion, and programming |
Semester | 1st semester | Hours | 40 | Credit | 2 |
1. Objective & Requirement
Data mining is a multidisciplinary subject that draws on computer science, mathematics, and statistics. It is widely used in many fields, including bioinformatics, biomedical sciences, biochemistry, biogeography, financial data analysis, and medical data analysis, as well as in fields related to computer science such as the very popular research areas of big data analysis and large-scale search over cyberspace.
Data mining provides many techniques for uncovering hidden patterns and previously unknown, potentially useful knowledge in vast amounts of data. This course explores the concepts and techniques of knowledge discovery and data mining. It gives broad coverage of the related methods, from the classic topics of clustering and classification, to database methods (e.g., association rules, data cubes), to more recent and advanced topics (e.g., SVD/PCA, wavelets, support vector machines). The objective of this course is to establish the fundamental concepts of data mining and to teach students how to analyze data using these methods.
The course will be taught in English. All graduate students, both master's and PhD, are welcome to take it as an elective if their major is related to computer science, or if they are interested in data mining techniques and plan to apply them to data in their own research fields. The prerequisite courses are Statistics, Machine Learning, Pattern Recognition, Database, and Artificial Intelligence.
2. Topics and Specific Contents
The course explores the concepts and techniques of knowledge discovery and data mining through the following core topics.
1. Introduction
1.1. Why Data Mining?
1.2. What Is Data Mining?
1.3. What Kinds of Data Can Be Mined?
1.4. What Kinds of Patterns Can Be Mined?
1.5. Which Technologies Are Used?
1.6. Which Kinds of Applications Are Targeted?
1.7. Major Issues in Data Mining
1.8. Summary
2. Getting to Know Your Data
2.1. Data Objects and Attribute Types
2.2. Basic Statistical Descriptions of Data
2.3. Data Visualization
2.4. Measuring Data Similarity and Dissimilarity
2.5. Summary
3. Data Preprocessing
3.1. Data Preprocessing: An Overview
3.2. Data Cleaning
3.3. Data Integration
3.4. Data Reduction
3.5. Data Transformation and Data Discretization
3.6. Summary
4. Data Warehousing and Online Analytical Processing
4.1. Data Warehouse: Basic Concepts
4.2. Data Warehouse Modeling: Data Cube and OLAP
4.3. Data Warehouse Design and Usage
4.4. Data Warehouse Implementation
4.5. Data Generalization by Attribute-Oriented Induction
4.6. Summary
5. Data Cube Technology
5.1. Data Cube Computation: Preliminary Concepts
5.2. Data Cube Computation Methods
5.3. Processing Advanced Kinds of Queries by Exploring Cube Technology
5.4. Multidimensional Data Analysis in Cube Space
5.5. Summary
6. Mining Frequent Patterns, Associations, and Correlations
6.1. Basic Concepts
6.2. Frequent Itemset Mining Methods
6.3. Which Patterns Are Interesting?—Pattern Evaluation Methods
6.4. Summary
7. Advanced Pattern Mining
7.1. Pattern Mining: A Road Map
7.2. Pattern Mining in Multilevel, Multidimensional Space
7.3. Constraint-Based Frequent Pattern Mining
7.4. Mining High-Dimensional Data and Colossal Patterns
7.5. Mining Compressed or Approximate Patterns
7.6. Pattern Exploration and Application
7.7. Summary
8. Classification: Basic Concepts
8.1. Basic Concepts
8.2. Decision Tree Induction
8.3. Bayes Classification Methods
8.4. Rule-Based Classification
8.5. Model Evaluation and Selection
8.6. Techniques to Improve Classification Accuracy
8.7. Summary
9. Classification: Advanced Methods
9.1. Bayesian Belief Networks
9.2. Classification by Backpropagation
9.3. Support Vector Machines
9.4. Classification Using Frequent Patterns
9.5. Lazy Learners (or Learning from Your Neighbors)
9.6. Other Classification Methods
9.7. Additional Topics Regarding Classification
9.8. Summary
10. Cluster Analysis
10.1. Cluster Analysis
10.2. Partitioning Methods
10.3. Hierarchical Methods
10.4. Density-Based Methods
10.5. Grid-Based Methods
10.6. Evaluation of Clustering
10.7. Summary
11. Advanced Cluster Analysis
11.1. Probabilistic Model-Based Clustering
11.2. Clustering High-Dimensional Data
11.3. Clustering Graph and Network Data
11.4. Clustering with Constraints
11.5. Summary
12. Outlier Detection
12.1. Outliers and Outlier Analysis
12.2. Outlier Detection Methods
12.3. Statistical Approaches
12.4. Proximity-Based Approaches
12.5. Clustering-Based Approaches
12.6. Classification-Based Approaches
12.7. Mining Contextual and Collective Outliers
12.8. Outlier Detection in High-Dimensional Data
12.9. Summary
13. Data Mining Trends and Research Frontiers
13.1. Mining Complex Data Types
13.2. Other Methodologies of Data Mining
13.3. Data Mining Applications
13.4. Data Mining and Society
13.5. Data Mining Trends
13.6. Summary
3. Textbook
Jiawei Han, Micheline Kamber, & Jian Pei. Data Mining: Concepts and Techniques (3rd edition). Morgan Kaufmann Publishers, 2012.
4. References
Pang-Ning Tan, Michael Steinbach, & Vipin Kumar. Introduction to Data Mining. Pearson Education, Inc., 2006.
David J. Hand, Heikki Mannila, & Padhraic Smyth. Principles of Data Mining (Adaptive Computation and Machine Learning). MIT Press, 2001.
Mohammed J. Zaki & Wagner Meira Jr. Data Mining and Analysis: Fundamental Concepts and Algorithms. Cambridge University Press, 2014.
Jiawei Han & Micheline Kamber. Data Mining: Concepts and Techniques (2nd edition). Morgan Kaufmann Publishers, 2006.
5. Course Evaluation (Tentative)
Assignments 30%
Course Project 40%
Exam 30%