Syllabus for EM 623 - Data Science and Knowledge Discovery

PURPOSE:

This syllabus provides the details and guidance students need to complete EM 623.

TEXT:

  1. Lecture Notes and Handouts

  2. KNIME Essentials, Gábor Bakos, October 16, 2013 [Available on Canvas as pdf]

Additional and recommended texts:

  1. Focus on data mining theory and algorithms. Discovering Knowledge in Data: An Introduction to Data Mining, Daniel T. Larose, John Wiley, 2004

  2. Focus on Applications. Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die, Eric Siegel, Wiley, February 2013

  3. Focus on Rattle. Data Mining with Rattle and R: The Art of Excavating Data for Knowledge Discovery, Graham Williams, Springer, 2011

  4. Focus on text mining using Python. Natural Language Processing with Python, Steven Bird, O'Reilly Media, 2009

  5. Focus on network analysis. Social Network Analysis for Startups, M. Tsvetovat & A. Kouznetsov, O'Reilly Media, 2011

COURSE DESCRIPTION:

The digital tools we use every day generate data from everything we do at an unprecedented rate: every day, 2.5 quintillion (10^18) bytes of data are created, and 90% of the data in the world today was created within the past two years.

Data can be structured (generated by business applications) or unstructured (generated by the web, often as text). Data piles up quickly, and compound annual data growth both threatens to bury today's application infrastructure and offers a great opportunity to gain insights into customers, processes and markets.

Getting usable information from such a vast amount of data may require more than intuition. The intuition we use to make judgments is an excellent guide some of the time but gives a distorted view at other times. Creating views, extracting trends, defining patterns and identifying clusters are all things we need to do to actually manage large amounts of data.

This mining process requires a combination of tools, the ability to represent knowledge, and domain-specific expertise. Successful applications have been reported in areas such as credit rating, fraud detection, database marketing, customer relationship management, stock market investment and security. The field of data mining evolved from the disciplines of statistics and artificial intelligence.


This course examines methods that have emerged from both fields and have proven valuable for recognizing patterns and making predictions, taking an applications perspective. We will survey applications and tools, and provide opportunities for hands-on experimentation with data mining algorithms using software tools and cases. The final goal of the course is to provide students with a "data toolbox" they can use in their own work. This toolbox contains methods and tools that students will apply themselves during the course to real-world applications.

COURSE OBJECTIVES:

The course aims to:

  • Provide the student with a way to understand the potential value of data for application purposes, and with the ability to manage and prepare data to extract context-specific value

  • Help the student understand how analytical techniques, text mining and network analysis can enhance decision-making by converting data into information and insights

  • Provide insight into how to choose and use the most effective data mining techniques and tools based on the problem at hand

  • Provide the student with a software toolkit to apply models and techniques to real decision problems

  • Develop and modify data mining prototypes using R for data mining, Gephi for network analysis and Python for text mining.

  • Have students work on engineering management applications such as marketing/entrepreneurship, indirect product testing and organizational analysis.

Overall, the course will provide students with an application-oriented "data toolbox" of methods, techniques and tools to apply to the data-intensive problems they may encounter in their future professional activities.

COURSE OUTCOMES:

In addition to the course objectives above, through this course students will develop:

Knowledge

  o Ability to understand, analyze, plan and select the proper implementation strategy to extract actionable information from data

Attitude

  o Ability to approach structured and unstructured, potentially vast, amounts of data with a pragmatic and solution-oriented attitude

Skills

  o Basic ability to use some of the most popular tools and techniques for data/text mining and network analysis

GRADING:

  • Homework 35%

  • Midterm Exam 25%

  • Final Exam 40%
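As an illustration of how the weights above combine into a course grade, here is a minimal Python sketch. It assumes each component is scored on a 0 to 100 scale and that homework is reported as a single averaged score; neither detail is stated in this syllabus, and the helper name `weighted_grade` is hypothetical.

```python
# Minimal sketch: combine component scores using the syllabus weights.
# Assumption: every component is scored on a 0-100 scale (not stated in the syllabus).
# Integer percentage weights keep the arithmetic exact.
WEIGHTS = {"homework": 35, "midterm": 25, "final": 40}  # percentages, sum to 100


def weighted_grade(homework: float, midterm: float, final: float) -> float:
    """Return the weighted course grade on a 0-100 scale (hypothetical helper)."""
    scores = {"homework": homework, "midterm": midterm, "final": final}
    for name, score in scores.items():
        if not 0 <= score <= 100:
            raise ValueError(f"{name} score must be between 0 and 100, got {score}")
    # Weighted sum of percentages, then divide by 100 to get back to a 0-100 scale.
    total = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    return total / 100


if __name__ == "__main__":
    # 35*80 + 25*70 + 40*90 = 2800 + 1750 + 3600 = 8150 -> 81.5
    print(weighted_grade(homework=80, midterm=70, final=90))  # 81.5
```

Because the final exam carries the largest weight (40%), a change in its score moves the overall grade more than the same change in homework or midterm scores.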


The midterm exam will be held in class and will last 3 hours. Students will work on a data mining case using one or more of the software tools presented in class. Students may use notes and books.

The final exam will be a project that students prepare individually at home and submit on Canvas. A selection of projects will be presented in class.

Homework will be done using one or more of the software tools used in class.

If any part of an assignment is neither original nor taken from a cited source, it will be considered cheating/plagiarism. Cheating of any kind will result in a grade of zero for the assignment.

MODULES:

Week  Topics Covered
1     Machine Learning and Data Mining: introduction, life cycle and case studies
2     Assessing the value of data: understanding, cleaning and transforming
3     Data management: generalized tools and techniques (Excel and DBMS)
4     Data mining specific tools: introduction to R with the Rattle GUI and to Knime
5     Supervised and unsupervised learning: theory and examples
6     Clustering and association analysis using k-means and basket analysis: R/Rattle and Knime applications
7     Decision trees: definitions, algorithms, applications, optimizations and implementation using R/Rattle and Knime
8     Midterm exam discussion
9     Neural networks for classification and prediction: applications and examples using R/Rattle and Knime
10    Network analysis to mine complex data: introduction, applications and examples using Gephi
11    Network analysis implementations using Gephi
12    Text mining: introduction, applications and techniques
13    More on text mining and course recap
14    Final exam discussion

SOFTWARE:

We will use:

  • Knime and R for most examples. R will be used with an IDE (RStudio) and a GUI for data mining (Rattle)

  • Knime and Wordij for text mining

  • Gephi for Network Analysis.

All the software is open source and runs on all the major platforms (Mac, Windows and Linux). The latest versions are recommended.


A virtual machine with all the tools will also be provided.

Students are required to install all the software on their computers before the start of week 3. Contact the instructor for help.

