DATA 601: Introduction to Data Science
The goal of this class is to give students an introduction to, and hands-on experience with, all phases of the data science process using real data and modern tools. Topics covered include data formats, loading, and cleaning; data storage in relational and non-relational stores; data governance; data analysis with supervised and unsupervised learning in R and similar tools, together with sound evaluation methods; data visualization; and scaling up with cluster computing, MapReduce, Hadoop, and Spark.
Prerequisite: Enrollment in the Data Science program. Other students may be admitted with instructor permission.
DATA 602: Introduction to Data Analysis and Machine Learning
This course provides a broad introduction to the practical side of machine learning and data analysis. It examines the end-to-end processing pipeline: extracting and identifying useful features that best represent the data, applying a few of the most important machine learning algorithms, and evaluating their performance for modeling data. Topics covered include decision trees, logistic regression, linear discriminant analysis, linear and non-linear regression, basis functions, support vector machines, neural networks, Bayesian networks, bias/variance theory, ensemble methods, clustering, evaluation methodologies, and experiment design.
Prerequisite: DATA 601: Introduction to Data Science and enrollment in the Data Science program. Non-Data Science students may be admitted with instructor permission.
DATA 603: Platforms for Big Data Processing
The goal of this course is to introduce methods, technologies, and computing platforms for performing data analysis at scale. Topics include the theory and techniques for data acquisition, cleansing, aggregation, management of large heterogeneous data collections, processing, and information and knowledge extraction. Students are introduced to MapReduce, streaming, and external memory algorithms and their implementations using Hadoop and its ecosystem (HBase, Hive, Pig, and Spark). Students will gain practical experience in analyzing large existing databases.
Prerequisite: Enrollment in the Data Science program and DATA 601. Other students may be admitted with the program director’s permission.
DATA 604: Data Management
This course introduces students to the data management, storage, and manipulation tools common in data science. Students will get an overview of relational database management systems and various NoSQL database technologies, and apply them to real scenarios. Topics include ER and relational data models, storage and concurrency preliminaries, relational databases and SQL queries, NoSQL databases, and data governance.
Prerequisite: Enrollment in the Data Science program. Other students may be admitted with instructor permission. Corequisite: DATA 601: Introduction to Data Science.