Our objective is to understand the network.
Although a PCAP isn’t available, we can use Python instead.
Learning Objectives #
- Gain an introductory understanding of data science and its application in Cybersecurity.
- Familiarize ourselves with Python.
- Utilize popular Python libraries like Pandas and Matplotlib to analyze data.
Overview #
The session focuses on Data Science and provides a concise overview of the topic. Here are the key takeaways:
- The essence of data science lies in interpreting data to address questions.
- Responsibilities of a Data Scientist include:
  - Data Collection: Gathering raw data, such as a list of recent transactions.
  - Data Processing: Converting raw data into a standardized format for analysis (time-consuming!).
  - Data Mining: Clustering/Classification. Identifying relationships, patterns, and correlations.
  - Analysis: Exploratory/Confirmatory. Exploring data to provide answers and future projections. For instance, an e-commerce store can use data science to understand which products are trending.
  - Communication: Visualization. Crucial for presenting answers clearly. Visualizations can include charts, tables, maps, etc.
- Data Science in Cybersecurity involves:
  - Analyzing log events for intelligent insights (Anomaly detection).
  - Utilizing SIEMs to gather and correlate large datasets, offering a comprehensive understanding of an organization’s landscape.
  - Tracking and understanding emerging threats.
  - Analyzing historical events to anticipate future threat landscapes.
We’re using Jupyter Notebooks, open-source documents that combine code, text, and terminal functionality in one place.
Python: Pandas #
Pandas is a Python library for manipulating, processing, and structuring data. By convention it is imported with an alias: import pandas as pd.
Pandas: Series #
A series resembles a single column in a table. It stores key-value pairs, where the key is the index number and the value is the stored data. To create a series:
Create a list: transportation = ['Train', 'Plane', 'Car']
Assign the series to a new variable: transportation_series = pd.Series(transportation)
Print the series: print(transportation_series)
The dtype: line at the end of the printed output denotes the type of data stored in the series, such as strings or integers.
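The steps above can be put together as a short sketch. The list and variable names come from the steps themselves. The last two lines are an added illustration of how the index (key) and the value relate.

```python
import pandas as pd

# Step 1: create a plain Python list.
transportation = ['Train', 'Plane', 'Car']

# Step 2: wrap the list in a Series; pandas assigns the index 0, 1, 2.
transportation_series = pd.Series(transportation)

# Step 3: print the series (index on the left, values on the right,
# followed by the dtype line).
print(transportation_series)

# Added illustration: look up a value by its index number (the "key").
second_item = transportation_series.loc[1]
print(second_item)  # Plane
```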
Pandas: DataFrame #
DataFrames expand on a series: a DataFrame is a grouping of series, akin to a spreadsheet or database. Think of it as a table with rows and columns, which allows for efficient data organization and analysis.
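As a minimal sketch of building a DataFrame, assume a small, hypothetical data set of names, ages, and countries. The values and column names below are invented for illustration and are not taken from the session.

```python
import pandas as pd

# Hypothetical rows, each in the order [name, age, country].
data = [
    ['Ben', 24, 'United Kingdom'],
    ['Jacob', 32, 'United States of America'],
    ['Alice', 19, 'Germany'],
]

# Each inner list becomes one row; `columns` names the columns.
df = pd.DataFrame(data, columns=['Name', 'Age', 'Country of Residence'])

print(df)
```

Each column of the resulting table (for example `df['Age']`) is itself a series, which is why a DataFrame can be thought of as a group of series sharing one index.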
Let’s say we only want to return a specific row; we can use the DataFrame’s loc with that row’s index number.
Because the default index starts at 0, index number 1 refers to row #2!
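A minimal sketch of row selection with loc, assuming a hypothetical three-row DataFrame that keeps the default zero-based index (the data is invented for illustration):

```python
import pandas as pd

# Hypothetical data; the default index is 0, 1, 2.
df = pd.DataFrame(
    [['Ben', 24], ['Jacob', 32], ['Alice', 19]],
    columns=['Name', 'Age'],
)

# loc selects by index label; label 1 is the second row.
second_row = df.loc[1]

print(second_row)
```

Note that loc looks rows up by label rather than by position. The two only coincide here because the index was left at its default.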
Grouping #
Grouping is a Pandas operation (groupby) that splits rows into categories based on the values in one or more columns, so that each group can be summarized and compared with the others.
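As a minimal sketch of grouping, assume a hypothetical table of prizes won per department. The column names and numbers are invented for illustration.

```python
import pandas as pd

# Hypothetical data: one row per award entry.
df = pd.DataFrame({
    'Department': ['Tech', 'Marketing', 'Tech', 'Sales', 'Marketing'],
    'Prizes Won': [2, 1, 4, 5, 6],
})

# Split the rows by department, then sum the prizes within each group.
prizes_by_department = df.groupby('Department')['Prizes Won'].sum()

print(prizes_by_department)
```

The result is a series indexed by department, so the totals can be compared directly or passed on to a plotting library such as Matplotlib.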