Active Outline
General Information
- Course ID (CB01A and CB01B)
- CISD344F
- Course Title (CB02)
- Introduction to Big Data and Analytics
- Course Credit Status
- Non-Credit
- Effective Term
- Fall 2024
- Course Description
- Introduction to the big-data deluge, management of structured and unstructured data, and design of large-scale database systems. Concepts covered include MapReduce parallel processing algorithms, real-time analytics, classification and predictive analytics, attributes of big data, and related issues. Introduction to large-scale file systems, their operations, and parallel processing algorithms.
- Faculty Requirements
- Discipline 1
- [Computer Information Systems (Computer network installation, microcomputer technology, computer applications)]
- Discipline 3
- [Computer Science]
- FSA
- [FHDA FSA - CIS]
- Course Family
- Not Applicable
Course Justification
This is a noncredit enhanced course that is part of the certificate of completion in Database Development Practitioner. It introduces learners to languages for accessing extremely large storage systems in order to create and manage databases. Holding certification in a specific database software program benefits those in IT careers, including database architects, database administrators, and database designers.
Foothill Equivalency
- Does the course have a Foothill equivalent?
- No
- Foothill Course ID
Formerly Statement
Course Development Options
- Basic Skill Status (CB08)
- Course is not a basic skills course.
- Grade Options
- Letter Grade
- Pass/No Pass
- Repeat Limit
- 99
Transferability & Gen. Ed. Options
- Transferability
- Not transferable
Units and Hours
Summary
- Minimum Credit Units
- 0.0
- Maximum Credit Units
- 0.0
Weekly Student Hours
Type | In Class | Out of Class |
---|---|---|
Lecture Hours | 4.0 | 8.0 |
Laboratory Hours | 0.0 | 0.0 |
Course Student Hours
- Course Duration (Weeks)
- 12.0
- Hours per unit divisor
- 36.0
Course In-Class (Contact) Hours
- Lecture
- 48.0
- Laboratory
- 0.0
- Total
- 48.0
Course Out-of-Class Hours
- Lecture
- 96.0
- Laboratory
- 0.0
- NA
- 0.0
- Total
- 96.0
Prerequisite(s)
Corequisite(s)
Advisory(ies)
ESL D272. and ESL D273., or ESL D472. and ESL D473., or eligibility for EWRT D001A or EWRT D01AH or ESL D005.
Limitation(s) on Enrollment
Entrance Skill(s)
General Course Statement(s)
NONCREDIT: (This is a noncredit enhanced, CTE course.)
Methods of Instruction
Lecture and visual aids
Discussion of assigned reading
Discussion and problem solving performed in class
Collaborative learning and small group exercises
Collaborative projects
Homework and extended projects
Assignments
- Readings from Text.
- Documenting, coding, testing, and debugging six to ten programs, guided by a clearly documented design; half are completed in the computer lab and half as homework.
Methods of Evaluation
- One or two midterm examinations requiring some programming and explanation of concepts, and demonstrating mastery of large-scale database system principles.
- A final examination requiring explanation of concepts and demonstrating mastery of large-scale database system principles.
- Evaluation of programming assignments based on correctness, documentation, code quality, and test plan execution.
Essential Student Materials/Essential College Facilities
Essential Student Materials:
- None
Essential College Facilities:
- None
Examples of Primary Texts and References
Author | Title | Publisher | Date/Edition | ISBN |
---|---|---|---|---|
Danette McGilvray | Executing Data Quality Projects: Ten Steps to Quality Data and Trusted Information (TM) | Academic Press | June 4, 2021 - 2nd edition | ISBN-13 : 978-0128180150 |
Examples of Supporting Texts and References
None.
Learning Outcomes and Objectives
Course Objectives
- Explore big-data technologies as a means of solving key business analytical problems.
- Interpret and analyze techniques for setting up patterns for data analysis.
- Compare and contrast the data and relation algorithms.
- Examine data pre-processing and visualization techniques for enabling data analytic scenarios.
- Articulate the characteristics of regression, forecasting and classification techniques for predictive analytics.
- Interpret and analyze the architecture of database clustering technologies.
CSLOs
- Design, implement, and debug a large-scale database system using technologies such as Hadoop or Cassandra.
- Perform data analysis using a large-scale database system, given a set of user requirements.
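The course description and the first CSLO refer to MapReduce, the programming model that Hadoop implements. The Python sketch below is a single-process illustration of the map, shuffle, and reduce phases for counting words; it is not Hadoop's API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle step: group emitted values by key, as the framework does between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents: list[str]) -> dict[str, int]:
    # On a cluster each document could be mapped on a different node;
    # here the three phases simply run one after another in one process.
    mapped = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data big ideas", "data analytics with big data"]
    print(word_count(docs))
    # {'big': 3, 'data': 3, 'ideas': 1, 'analytics': 1, 'with': 1}
```

Because each map call depends only on its own record and each reduce call only on one key's values, both phases can be spread across machines; only the shuffle requires moving data between them.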
Outline
- Explore big-data technologies as a means of solving key business analytical problems.
- Data analytics, data mining, and knowledge discovery.
- Competitor intelligence and big data.
- Business case studies: Electronic Health Records (EHR), U.S. Department of Transportation.
- Interpret and analyze techniques for setting up patterns for data analysis.
- RDBMS relational modeling
- NoSQL database modeling
- Data warehouse modeling, data mining, and online analytical processing (OLAP).
- Compare and contrast the data and relation algorithms.
- Auto-Associator
- Component Analysis
- Diagrams
- Multidimensional Scaling
- Histograms
- Examine data pre-processing and visualization techniques for enabling data analytic scenarios.
- Error Type and Error Handling
- Filtering
- Data Transformation
- Data Merging
- Linear correlation, and correlation versus causality.
- Chi-square test for independence
- Articulate the characteristics of regression, forecasting and classification techniques for predictive analytics.
- Linear regression, linear regression with nonlinear substitution and robust regression.
- Cross validation and feature selection.
- Finite state machines, recurrent models and autoregressive models.
- Classification criteria, naive Bayes classifier, and linear discriminant analysis.
- Support vector machines, nearest neighbor classifier and learning vector quantization.
- Decision Trees
- Interpret and analyze the architecture of database clustering technologies.
- Hadoop
- Oracle RAC
- MySQL Clusters
- Windows Clustering
- Cassandra
- TrackVia, nCluster from Teradata.
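The chi-square test for independence listed under data pre-processing compares the observed counts in a contingency table with the counts expected if the two variables were independent. A minimal Python sketch follows. It assumes a table with no all-zero rows or columns, and the function name and sample counts are hypothetical illustrations.

```python
# Chi-square critical value for 1 degree of freedom at alpha = 0.05.
CRITICAL_VALUE_DF1_ALPHA_05 = 3.841


def chi_square_independence(table: list[list[float]]) -> tuple[float, int]:
    """Return (chi-square statistic, degrees of freedom) for an r x c contingency table.

    Assumes every row total and column total is non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, degrees_of_freedom


if __name__ == "__main__":
    # Hypothetical counts: rows = customer segment A/B, columns = renewed / did not renew.
    observed = [[30, 10], [20, 40]]
    stat, dof = chi_square_independence(observed)
    print(f"chi-square = {stat:.2f}, dof = {dof}")  # chi-square = 16.67, dof = 1
    if dof == 1 and stat > CRITICAL_VALUE_DF1_ALPHA_05:
        print("Reject independence at the 5% level")
```

With one degree of freedom, a statistic above about 3.841 rejects independence at the 5% significance level. An exact p-value would require the chi-square distribution function, which a statistics library would typically supply.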
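The predictive-analytics topics pair linear regression with cross validation. The sketch below assumes a single numeric feature fitted by ordinary least squares. It also assumes `k` is no larger than the number of points and that each training split contains at least two distinct x values. The helpers `fit_line` and `k_fold_mse` are hypothetical names, and folds are assigned round-robin by index rather than shuffled.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)  # must be non-zero: xs cannot all be equal
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def k_fold_mse(xs: list[float], ys: list[float], k: int) -> float:
    """Average held-out mean squared error over k folds (point i belongs to fold i % k)."""
    fold_errors = []
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        test = [i for i in range(len(xs)) if i % k == fold]
        intercept, slope = fit_line([xs[i] for i in train], [ys[i] for i in train])
        # Score the model only on points it did not see during fitting.
        squared = [(ys[i] - (intercept + slope * xs[i])) ** 2 for i in test]
        fold_errors.append(sum(squared) / len(squared))
    return sum(fold_errors) / k


if __name__ == "__main__":
    # Hypothetical data: advertising spend (x) vs. sales (y).
    xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    ys = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]
    intercept, slope = fit_line(xs, ys)
    print(f"y = {intercept:.2f} + {slope:.2f} x")
    print(f"3-fold cross-validated MSE: {k_fold_mse(xs, ys, 3):.3f}")
```

The cross-validated error estimates how the fitted line performs on unseen data. Comparing this figure across candidate feature sets is one way to approach the feature selection topic listed alongside it.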