Active Outline

General Information


Course ID (CB01A and CB01B)
CISD344F
Course Title (CB02)
Introduction to Big Data and Analytics
Course Credit Status
Non-Credit
Effective Term
Fall 2024
Course Description
Introduction to the big-data deluge, the management of unstructured and structured data, and the design of large-scale database systems. Concepts covered include MapReduce parallel processing algorithms, real-time analytics, classification, predictive analytics, the attributes of big data, and related issues. Also introduces large-scale file systems and their operations.
Faculty Requirements
Discipline 1
[Computer Information Systems (Computer network installation, microcomputer technology, computer applications)]
Discipline 3
[Computer Science]
FSA
[FHDA FSA - CIS]
Course Family
Not Applicable

Course Justification


This is a noncredit enhanced course that belongs on the certificate of completion in Database Development Practitioner. It introduces learners to the language used to access extremely large storage systems for creating and managing a database. Holding certification for a specific database software program is beneficial for those with careers in IT, including Database Architects, Database Administrators, and Database Designers.

Foothill Equivalency


Does the course have a Foothill equivalent?
No
Foothill Course ID

Course Philosophy


Formerly Statement


Course Development Options


Basic Skill Status (CB08)
Course is not a basic skills course.
Grade Options
  • Letter Grade
  • Pass/No Pass
Repeat Limit
99

Transferability & Gen. Ed. Options


Transferability
Not transferable

Units and Hours


Summary

Minimum Credit Units
0.0
Maximum Credit Units
0.0

Weekly Student Hours

Type                In Class    Out of Class
Lecture Hours       4.0         8.0
Laboratory Hours    0.0         0.0

Course Student Hours

Course Duration (Weeks)
12.0
Hours per unit divisor
36.0
Course In-Class (Contact) Hours
Lecture
48.0
Laboratory
0.0
Total
48.0
Course Out-of-Class Hours
Lecture
96.0
Laboratory
0.0
NA
0.0
Total
96.0

Prerequisite(s)


Corequisite(s)


Advisory(ies)


ESL D272. and ESL D273., or ESL D472. and ESL D473., or eligibility for EWRT D001A or EWRT D01AH or ESL D005.

Limitation(s) on Enrollment


Entrance Skill(s)


General Course Statement(s)


NONCREDIT: (This is a noncredit enhanced, CTE course.)

Methods of Instruction


Lecture and visual aids

Discussion of assigned reading

Discussion and problem solving performed in class

Collaborative learning and small group exercises

Collaborative projects

Homework and extended projects

Assignments


  1. Readings from Text.
  2. Documenting, coding, testing, and debugging six to ten programs, with guidance provided and a clearly documented design; half are completed in the computer lab and half as homework.

Methods of Evaluation


  1. One or two midterm examinations requiring some programming and clarification of concepts, and demonstrating mastery of large-scale database system principles.
  2. A final examination requiring clarification of concepts and demonstrating mastery of large-scale database system principles.
  3. Evaluation of programming assignments, based on correctness, documentation, code quality, and test plan execution.

Essential Student Materials/Essential College Facilities


Essential Student Materials: 
  • None
Essential College Facilities:
  • None

Examples of Primary Texts and References


Author: Danette McGilvray
Title: Executing Data Quality Projects: Ten Steps to Quality Data and Trusted Information (TM)
Publisher: Academic Press
Date/Edition: June 4, 2021, 2nd edition
ISBN-13: 978-0128180150

Examples of Supporting Texts and References


None.

Learning Outcomes and Objectives


Course Objectives

  • Explore big-data technologies as a means of solving key business analytical problems.
  • Interpret and analyze techniques for setting up patterns for data analysis.
  • Compare and contrast the data and relation algorithms.
  • Examine data pre-processing and visualization techniques for enabling data analytic scenarios.
  • Articulate the characteristics of regression, forecasting and classification techniques for predictive analytics.
  • Interpret and analyze architecture of database clustering technologies.

CSLOs

  • Design, implement, and debug a large-scale database system using a technology such as Hadoop or Cassandra.

  • Perform data analysis using a large-scale database system, given a set of user requirements.

Outline


  1. Explore big-data technologies as a means of solving key business analytical problems.
    1. Data analytics, Data mining and knowledge discovery.
    2. Competitive intelligence and big data.
    3. Business case studies: Electronic Health Records (EHR), US Dept of Transportation.
  2. Interpret and analyze techniques for setting up patterns for data analysis.
    1. RDBMS Relational Modeling
    2. NoSQL Database Modeling
    3. Data warehouse modeling, data mining and online analytical processing.
  3. Compare and contrast the data and relation algorithms.
    1. Auto-Associator
    2. Component Analysis
    3. Diagrams
    4. Multidimensional Scaling
    5. Histograms
  4. Examine data pre-processing and visualization techniques for enabling data analytic scenarios.
    1. Error Type and Error Handling
    2. Filtering
    3. Data Transformation
    4. Data Merging
    5. Linear correlation; correlation and causality.
    6. Chi-square test for independence
  5. Articulate the characteristics of regression, forecasting and classification techniques for predictive analytics.
    1. Linear regression, linear regression with nonlinear substitution and robust regression.
    2. Cross validation and feature selection.
    3. Finite state machines, recurrent models and autoregressive models.
    4. Classification criteria, naive Bayes classifier and linear discriminant analysis.
    5. Support vector machines, nearest neighbor classifier and learning vector quantization.
    6. Decision Trees
  6. Interpret and analyze architecture of database clustering technologies.
    1. Hadoop
    2. Oracle RAC
    3. MySQL Clusters
    4. Windows Clustering
    5. Cassandra
    6. TrackVia, nCluster from Teradata.
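
Illustrative example for the map-reduce parallel processing topic named in the course description and covered under Hadoop: a minimal sketch, assuming a single-process Python simulation of the map, shuffle, and reduce steps applied to a word count. The function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample input are hypothetical; this does not use the Hadoop API, which distributes the same steps across a cluster.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values under their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of strings."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    # Hypothetical sample input; each string stands in for one input split.
    sample = ["big data big analytics", "data analytics"]
    print(word_count(sample))
```

In a real cluster each call to map_phase could run on a different node, since no document depends on another; only the shuffle step needs to bring matching keys together before reduction.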