Subject: Information and Big Data Security (CYB 304)
Big Data refers to datasets that are too large or too complex for conventional computers or traditional databases to handle easily. It is not just about size; it is also about how fast the data arrives, how many different types it includes, and how useful it can be.
Think of it as data that is too big, too fast, or too varied for regular tools.
The 5 Vs of Big Data
1. Volume: The sheer size of the data. For example, a platform like Facebook handles an enormous number of posts and photos every day.
2. Velocity: The speed at which data is created and shared. For example, the WhatsApp messages sent every second worldwide.
3. Variety: The different forms data can take. For example, school records (structured tables), tweets (semi-structured), and YouTube videos (unstructured).
4. Veracity: How trustworthy the data is. For example, a rumor spreading on social media vs. verified news from the BBC.
5. Value: The usefulness of the data. For example, Netflix uses viewing history to recommend movies you’ll enjoy.
Big Data Techniques
Big data techniques are the methods or approaches we use to handle, process, and make sense of very large and complex datasets. Think of them as the “skills” or “strategies” that help us deal with data that is too big or too fast for normal tools.
1. Data Collection: Gathering data from multiple sources. For example, a university collecting student grades, attendance, and online portal activity.
2. Data Cleaning: Fixing errors, removing duplicates, and filling in missing values. For example, making sure the same student isn’t entered twice in the database.
3. Data Integration: Combining data from different systems. For example, linking exam scores from the school portal with attendance records from the classroom system.
4. Data Analysis: Using statistics or algorithms to find patterns. For example, analyzing past exam results to see which subjects students struggle with most.
5. Data Visualization: Presenting data in charts, graphs, or dashboards, using tools such as Power BI or Tableau. For example, showing student performance trends over time in a bar chart.
6. Predictive Modeling: Using past data to predict future outcomes. For example, predicting which students might need extra help based on their past performance.
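To make a few of these techniques concrete, here is a minimal sketch of cleaning, integration, and analysis on a tiny, made-up set of student records. It uses only the Python standard library so it runs anywhere. All names, IDs, scores, and cutoffs are hypothetical, rows with a missing score are dropped rather than filled in to keep the example short, and the last step is a simple rule, not a trained model, standing in for predictive modeling.

```python
from statistics import mean

# Hypothetical raw records from two systems (all names and values are made up).
portal_scores = [
    {"student_id": "S001", "name": "Ada Obi", "subject": "Math", "score": 42},
    {"student_id": "S001", "name": "Ada Obi", "subject": "Math", "score": 42},  # duplicate entry
    {"student_id": "S002", "name": "Ben Musa", "subject": "Math", "score": 75},
    {"student_id": "S002", "name": "Ben Musa", "subject": "Physics", "score": None},  # missing value
    {"student_id": "S003", "name": "Chi Eze", "subject": "Physics", "score": 38},
    {"student_id": "S001", "name": "Ada Obi", "subject": "Physics", "score": 55},
]
attendance = {"S001": 0.60, "S002": 0.95, "S003": 0.50}  # fraction of classes attended


def clean(records):
    """Data cleaning: drop exact duplicates and rows with a missing score."""
    seen = set()
    cleaned = []
    for row in records:
        key = (row["student_id"], row["subject"], row["score"])
        if row["score"] is None or key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned


def integrate(records, attendance_by_id):
    """Data integration: attach each student's attendance rate to their score rows."""
    return [
        {**row, "attendance": attendance_by_id[row["student_id"]]}
        for row in records
    ]


def average_by_subject(records):
    """Data analysis: mean score per subject."""
    by_subject = {}
    for row in records:
        by_subject.setdefault(row["subject"], []).append(row["score"])
    return {subject: mean(scores) for subject, scores in by_subject.items()}


def flag_at_risk(records, score_cutoff=50, attendance_cutoff=0.7):
    """Rule-based stand-in for predictive modeling (assumed cutoffs, not a trained model)."""
    return sorted({
        row["student_id"]
        for row in records
        if row["score"] < score_cutoff and row["attendance"] < attendance_cutoff
    })


cleaned = clean(portal_scores)
combined = integrate(cleaned, attendance)
subject_means = average_by_subject(combined)
at_risk = flag_at_risk(combined)

print(subject_means)
print(at_risk)
```

On a real dataset the same steps would more likely be done with a library such as pandas, and the rule in flag_at_risk would be replaced by a model fitted to past results.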
Big Data Tools and Technologies
1. Hadoop: Think of Hadoop as a giant filing cabinet spread across many computers. Instead of storing millions of student records on one machine, Hadoop spreads them out so they can be processed in parallel and are not lost if a single machine fails.
2. Spark: Spark is like a super-fast calculator that works in memory. If Hadoop is the filing cabinet, Spark is the brain that quickly analyzes the files. For example, banks use Spark to detect fraud in real time while transactions are happening.
3. NoSQL Databases: Traditional databases (like MySQL) store data in tables, but social media posts, videos, and chat logs don’t fit neatly into tables. NoSQL databases (like MongoDB or Cassandra) are flexible “notebooks” that can store unstructured and semi-structured data.
4. MySQL + Python: For structured data (like student grades or sales records), MySQL is still very useful. Combined with Python libraries like pandas, it lets students query, clean, and analyze datasets, which makes it a good fit for classroom labs.
5. Cloud Platforms: Services like AWS, Azure, or Google Cloud are like renting a huge warehouse online. Instead of buying expensive servers, organizations rent storage space and computing power in the cloud to store and analyze big data.
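As a small illustration of the structured-data workflow in item 4, the sketch below runs a SQL query from Python. It assumes an in-memory SQLite database (from Python's standard library) as a stand-in for a MySQL server, so no installation is needed; the table name, columns, and values are hypothetical. With a real MySQL server, the connection step would differ, but the SQL itself would look much the same.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")

# Hypothetical table of student grades.
conn.execute("CREATE TABLE grades (student_id TEXT, subject TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO grades VALUES (?, ?, ?)",
    [
        ("S001", "Math", 42),
        ("S002", "Math", 75),
        ("S001", "Physics", 55),
        ("S003", "Physics", 38),
    ],
)

# Average score per subject, lowest first, to see where students struggle most.
rows = conn.execute(
    "SELECT subject, AVG(score) AS avg_score "
    "FROM grades GROUP BY subject ORDER BY avg_score"
).fetchall()
conn.close()

for subject, avg_score in rows:
    print(subject, avg_score)
```

The query result is a plain list of tuples; in a lab setting it could be loaded into a pandas DataFrame for further cleaning and analysis.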