Course Outline
Module 1: Predictive Modeling Basics
- Data Preparation
- Data Cleansing
- Integrating Data from Multiple Sources
- Common Issues
Module 2: Linear Regression
- Predictive vs. Explanatory Modeling Using Regression
- Overfitting vs. Underfitting
- Splitting Data into Training/Validation Subsets
- Multicollinearity
- Feature Subset Selection Models
Module 3: Classification Models
- K-Nearest Neighbor
- Distance Function
- Similarity Functions
- Combination Function
- Choosing K
- Advantages/Disadvantages
Module 4: Segmentation Modeling / Cluster Analysis
- Clustering
- Clustering vs. Classification
- K-Means Clustering
- Cluster Interpretation
- Hierarchical Clustering
- Segmentation
Module 5: Spreadsheet Models / Optimization
- Linear Optimization Models
- Maximizing Profit / Minimizing Cost
Module 6: Data Analysis Using R
- Introduction to R
- Data Analysis Using R
- Reading Data
- Data Types in R
- Clustering in R
- Regression in R
Exercises and Software:
- Each module includes many in-class, hands-on exercises that students can work through on their own or with the instructor's guidance.
- Class materials, including lecture notes and exercises, will be provided to students.
- Students are required to have Microsoft Excel (and R for Level 2) installed on their laptops.