
THM: Advent of Cyber 2023 - Day 02 - O Data, All Ye Faithful

TryHackMe Log-Analysis Python Python-Pandas Python-Matplotlib
Author: eplots.io
Systemcoordinator, Dabble in Cybersecurity, Self-hosting Hobbyist.
Advent of Cyber 2023 - This article is part of a series.
Part 2: This Article
The second day of Advent of Cyber ’23 presents a Log Analysis task.
Our objective is to understand the network.
Although no pcap is accessible, we’ll use Python instead.

Learning Objectives

  • Gain an introductory understanding of data science and its application in Cybersecurity.
  • Familiarize ourselves with Python.
  • Utilize popular Python libraries like Pandas and Matplotlib to analyze data.

Overview

The session focuses on Data Science and provides a concise overview of the topic. Here are the key takeaways:

  • The essence of data science lies in interpreting data to address questions.
  • Responsibilities of a Data Scientist include:
    • Data Collection: Gathering raw data, such as a list of recent transactions.
    • Data Processing: Converting raw data into a standardized format for analysis (time-consuming!).
    • Data Mining: Clustering/Classification. Identifying relationships, patterns, and correlations.
    • Analysis: Exploratory/Confirmatory. Exploring data to provide answers and future projections. For instance, an e-commerce store can leverage data science to understand trending products.
    • Communication: Visualization. Crucial for presenting answers clearly; visualizations can include charts, tables, maps, etc.
  • Data Science in Cybersecurity involves:
    • Analyzing log events for intelligent insights (Anomaly detection).
    • Utilizing SIEMs to gather and correlate large datasets, offering a comprehensive understanding of an organization’s landscape.
    • Tracking and understanding emerging threats.
    • Analyzing historical events to anticipate future threat landscapes.

We’re using Jupyter Notebooks, an open-source tool for building documents that combine code, text, and terminal functionality.

Python: Pandas

Pandas is a Python library for manipulating, processing, and structuring data. It’s convenient to import it with an alias: import pandas as pd.

Pandas: Series

A series resembles a single column in a table. It uses key-value pairs, where the key is the index and the value is the stored data. To create a series:

Create a list: transportation = ['Train', 'Plane', 'Car']
Assign the series to a new variable: transportation_series = pd.Series(transportation)
Print the series: print(transportation_series)

The dtype: line at the end of the printed output denotes the type of the values within the series, such as strings or integers.
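
The three steps above can be put together as a short runnable sketch. The second series, numbers_series, is not part of the original walkthrough; it is added here only to show how the dtype changes with the kind of values stored. The exact dtype name printed for strings can differ between Pandas versions, so no expected output is listed.

```python
import pandas as pd

# Step 1: a plain Python list
transportation = ['Train', 'Plane', 'Car']

# Step 2: wrap the list in a Series; Pandas assigns the index 0, 1, 2
transportation_series = pd.Series(transportation)

# Step 3: print the Series; the last line of the output shows its dtype
print(transportation_series)

# Assumption for illustration: a second Series holding integers
numbers_series = pd.Series([10, 20, 30])
print(numbers_series.dtype)

# Values are looked up by their index key
second_item = transportation_series[1]
print(second_item)
```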

Pandas: DataFrame

DataFrames expand on series, serving as a grouping of series akin to a spreadsheet or database table. Think of one as a table with rows and columns.

Passing tabular data and a list of column names to pd.DataFrame creates a DataFrame with rows and columns, allowing for efficient data organization and analysis.
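
As a minimal sketch of building a DataFrame, the example below uses a list of rows and a list of column names. The sample people, ages, and countries are hypothetical illustration data and not necessarily what the room uses.

```python
import pandas as pd

# Hypothetical sample data: each inner list is one row (name, age, country)
data = [
    ['Ben', 24, 'United Kingdom'],
    ['Jacob', 32, 'United States of America'],
    ['Alice', 19, 'Germany'],
]

# The columns argument labels each position in the rows
df = pd.DataFrame(data, columns=['Name', 'Age', 'Country of Residence'])

print(df)

# Each column of a DataFrame is itself a Series
ages = df['Age']
print(type(ages))
```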

Let’s say we only want to return a specific row; we can use Pandas’ loc with the row’s index label.

Because the default index starts at 0, loc[1] refers to row #2!
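
A small sketch of selecting one row with loc, assuming a DataFrame with the default zero-based index. The sample data is hypothetical and only there to make the lookup visible.

```python
import pandas as pd

# Hypothetical sample data with the default index 0, 1, 2
df = pd.DataFrame(
    [
        ['Ben', 24, 'United Kingdom'],
        ['Jacob', 32, 'United States of America'],
        ['Alice', 19, 'Germany'],
    ],
    columns=['Name', 'Age', 'Country of Residence'],
)

# loc selects by index label; label 1 is the second row
second_row = df.loc[1]
print(second_row)

# The result is a Series keyed by column name
print(second_row['Name'])
```

Note that loc works on index labels, not positions. With the default index the two happen to line up, which is why label 1 lands on the second row.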

Grouping

Grouping is a Pandas operation that splits rows into categories based on a column’s values, so that each group can be counted, aggregated, and compared.
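
To make grouping concrete for log analysis, here is a hedged sketch using groupby on a tiny, made-up traffic log. The column names Source and Protocol and all of the IP addresses are assumptions chosen for illustration, not the room's actual dataset.

```python
import pandas as pd

# Hypothetical traffic log: one row per packet
logs = pd.DataFrame({
    'Source': ['10.0.0.1', '10.0.0.2', '10.0.0.1', '10.0.0.3', '10.0.0.1'],
    'Protocol': ['TCP', 'ICMP', 'TCP', 'DNS', 'ICMP'],
})

# Group rows by source address and count the packets in each group
packets_per_source = logs.groupby('Source').size()
print(packets_per_source)

# The index label with the highest count is the busiest source
top_source = packets_per_source.idxmax()
print(top_source)

# value_counts is a shortcut for counting occurrences within a single column
protocol_counts = logs['Protocol'].value_counts()
print(protocol_counts)
```

The same pattern answers typical log questions such as "which address sent the most traffic?" or "which protocol appears most often?" once the real data is loaded into a DataFrame.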
