Syllabus for EM 623 - Data Science and Knowledge Discovery
PURPOSE: This syllabus provides the student with the information and guidance necessary to complete EM 623.
TEXT:
Additional and recommended texts:
COURSE DESCRIPTION: The digital tools we use every day create data from everything we do at an unprecedented rate: every day, 2.5 quintillion (2.5 × 10^18) bytes of data are created, and 90% of the data in the world today was created within the past two years. Data can be structured (generated by business applications) or unstructured (generated by the web, often as text). Data piles up quickly, and compound annual data growth both threatens to bury today’s application infrastructure and provides a great opportunity to gain insights into customers, processes, and markets.
Getting usable information from such a vast amount of data may require more than intuition. The intuition we use to make judgments is an excellent guide some of the time, but gives a distorted view at other times. Creating views, extracting trends, defining patterns, and identifying clusters are all things we need to do to actually manage large data. This mining process requires a combination of tools, the ability to represent knowledge, and domain-specific expertise. A number of successful applications have been reported in areas such as credit rating, fraud detection, database marketing, customer relationship management, stock market investments, and security.
The field of data mining has evolved from the disciplines of statistics and artificial intelligence. This course will examine methods that have emerged from both fields and proven to be of value in recognizing patterns and making predictions from an applications perspective. We will survey applications and tools, and also provide opportunities for hands-on experimentation with data mining algorithms using software tools and cases. The final goal of the course is to provide students with a “data toolbox” they can use in their activities. This “toolbox” contains methods and tools that students will use themselves during the course for real-world applications.
COURSE OBJECTIVES: The course aims to:
The Midterm Exam will be held in class and will last 3 hours. Students will work on a data mining case using one or more of the software tools presented in class. Students can use notes and books.
The Final Exam will be a project that students prepare individually at home and submit on Canvas. A selection of projects will be presented in class.
Homework will be done using one or more of the software tools used in class. If part of an assignment is not original and not from a cited source, the case will be considered “cheating”/plagiarism. Cheating of any kind will result in a zero grade for the assignment.
MODULES:
SOFTWARE: We will use:
A virtual machine with all the tools will also be provided. Students are required to install all the software on their computers before the start of week 3. Contact the instructor for help.