
THM: Advent of Cyber 2023 - Day 16 - Can't CAPTCHA this Machine!

·1006 words·5 mins
TryHackMe Machine-Learning Captcha Python
Author
eplots
Systemcoordinator, Dabble in Cybersecurity, Self-hosting Hobbyist.
Advent of Cyber 2023 - This article is part of a series.
Part 16: This Article
The sixteenth day of AoC23 contains another Machine Learning task.
We’re going to build a custom brute force script that uses ML to solve a CAPTCHA challenge!

Learning Objectives
#

  • Complex neural network structures
  • How do convolutional neural networks function?
  • Using neural networks for optical character recognition
  • Integrating neural networks into red team tooling

Convolutional Neural Networks
#

On previous days we looked at neural network structures. In today’s room we look at a structure called a convolutional neural network (CNN).

In previous rooms we worked with a concept called shit-in, shit-out (more politely: garbage in, garbage out) to explain how important inputs and data are. A CNN has the ability to extract features that can then be used to train a neural network.

We can divide our CNN into three main components:

  • Feature extraction
  • Fully connected layers
  • Classification

The last two (Fully connected layers and Classification) have been covered in previous rooms, so today we’ll focus on the first one, Feature extraction.

Feature Extraction
#

CNNs are often used to classify images. We’re going to crack a CAPTCHA with numbers.

Image Representation
#

For a computer, the easiest way to perceive an image is as a 2D array of pixels. Imagine a picture with a white background and some red numbers.

A computer can easily convert the image into such an array by reading the pixel values. It starts in the top left corner of the image and moves to the right. When it hits the end of the first row, it moves down one row and starts again at the far left. Every row and column is captured, and each pixel value is saved in a grid that records which colors are present.

Two popular formats for pixel values:

  • RGB: Pixel is represented by three numbers from 0 to 255: the red, green and blue components that together make up the pixel’s color.
  • Greyscale: Pixel is represented by a single number from 0 to 255. 0 means the pixel is fully black, and 255 means the pixel is fully white.
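
To make the two formats concrete, here is a minimal sketch in plain Python. The tiny 2x3 image and the channel-averaging greyscale conversion are my own assumptions for illustration; real image libraries usually use a weighted average instead.

```python
# A tiny 2x3 "image": each pixel is an (R, G, B) tuple with values from 0 to 255.
WHITE = (255, 255, 255)
RED = (255, 0, 0)

rgb_image = [
    [WHITE, RED, WHITE],
    [WHITE, RED, WHITE],
]


def to_greyscale(pixel):
    """Average the three channels into one 0-255 value (a simple, assumed scheme)."""
    r, g, b = pixel
    return (r + g + b) // 3


# Walk the image row by row, left to right, like the scan described above.
grey_image = [[to_greyscale(pixel) for pixel in row] for row in rgb_image]

for row in grey_image:
    print(row)  # each row prints as [255, 85, 255]
```

White stays at 255, while pure red drops to 85 because only one of its three channels is lit.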

Convolution
#

During the convolution step of the CNN’s feature extraction, we want to reduce the size of the input. We could train a neural network to consider every pixel in the image, but that would be incredibly slow (an image often has several thousand pixels). Instead, we perform convolution to “summarize” the image. To do this, we move a kernel matrix across the entire image, calculating the summary as we go. The kernel matrix is a smaller 2D array that marks which part of the image we are currently summarizing. This kernel slides across the height and width of the image to create a summary image.

Here the kernel matrix is a 3*3 section. At each position we calculate the summary by multiplying each pixel under the kernel with the corresponding value in the kernel and adding up the products.
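
The sliding-kernel idea can be sketched in a few lines of plain Python. The 5x5 greyscale image (a single bright vertical stroke) and the edge-detecting kernel values are assumptions I picked for illustration; in a real CNN the kernel values are learned during training.

```python
def convolve(image, kernel):
    """Slide `kernel` over `image` (no padding, step of 1) and return the summary grid."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    summary = []
    for top in range(out_h):
        row = []
        for left in range(out_w):
            # Multiply each pixel under the kernel by the matching kernel value and sum.
            total = 0
            for i in range(k):
                for j in range(k):
                    total += image[top + i][left + j] * kernel[i][j]
            row.append(total)
        summary.append(row)
    return summary


# Assumed example input: a bright vertical stroke on a black background.
image = [
    [0, 0, 255, 0, 0],
    [0, 0, 255, 0, 0],
    [0, 0, 255, 0, 0],
    [0, 0, 255, 0, 0],
    [0, 0, 255, 0, 0],
]

# Assumed 3x3 kernel that reacts to dark-to-bright changes from left to right.
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = convolve(image, kernel)
for row in feature_map:
    print(row)  # each row prints as [765, 0, -765]
```

The 5x5 input shrinks to a 3x3 summary, and the large positive and negative values mark the left and right edges of the stroke.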

Pooling
#

The second step performed in the CNN feature extraction process is pooling. The pooling step aims to further summarise the data using a statistical method, for example the maximum or the average. For each kernel-sized window of the data, we create a single summary value based on that statistical method.
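
As a rough sketch of pooling, the snippet below uses the maximum as the statistical method (max pooling) with non-overlapping 2x2 windows. Both choices and the sample grid are assumptions for illustration; other methods such as the average work the same way.

```python
def max_pool(grid, size=2):
    """Split `grid` into non-overlapping size x size windows and keep each window's maximum."""
    pooled = []
    for top in range(0, len(grid) - size + 1, size):
        row = []
        for left in range(0, len(grid[0]) - size + 1, size):
            window = [grid[top + i][left + j] for i in range(size) for j in range(size)]
            row.append(max(window))
        pooled.append(row)
    return pooled


# Assumed 4x4 summary grid, e.g. the output of a convolution step.
feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 8],
    [1, 1, 7, 5],
]

pooled = max_pool(feature_map)
print(pooled)  # [[6, 2], [2, 9]]
```

Sixteen values are reduced to four, while the strongest response in each region survives.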

ML
#

The next steps are very similar to the basic neural network structure that’s been covered in earlier rooms, and therefore won’t be written about here again.

Time to crack some CAPTCHA
#

We’re going to use the Attention OCR for our CNN model.

In order to crack CAPTCHAs, we will have to go through the following steps:

  1. Gather CAPTCHAs so we can create labelled data
  2. Label the CAPTCHAs to use in a supervised learning model
  3. Train our CAPTCHA-cracking CNN
  4. Verify and test our CAPTCHA-cracking CNN
  5. Export and host the trained model so we can feed it CAPTCHAs to solve
  6. Create and execute a brute force script that will receive the CAPTCHA, pass it on to be solved, and then run the brute force attack

Steps 1–4 are quite taxing, so they have already been completed for you. We’ll do a quick recap of what these steps involve before moving on to hosting the model and cracking some CAPTCHAs!

We’re going to use a docker container to make this work.

docker run -d -v /tmp/data:/tempdir/ aocr/full

This starts a container that has TensorFlow and AOCR already installed. Now we need to connect to the container.

First we need to find the container’s ID: docker ps

Then run the following command: docker exec -it <CONTAINER_ID> /bin/bash

Gathering Training Data
#

Visiting the page (http://hqadmin.thm:8000) from a VPN-connected VM shows a sign-in form with a CAPTCHA. If we curl the page, we get a base64-encoded version of the CAPTCHA image.

We can write a script to download this image and then prompt us to provide the answer for the CAPTCHA to store in a training dataset. This has already been done for us, but here on my forgejo is the script.
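
For readers who just want the gist, here is a minimal sketch of the decoding part. It assumes the CAPTCHA is embedded in the page as a PNG data URI inside an img tag, which I have not verified here; the regular expression and the sample HTML are hypothetical stand-ins, not the real page.

```python
import base64
import re

# Assumption: the image is embedded as src="data:image/png;base64,...." in the HTML.
DATA_URI = re.compile(r'src="data:image/png;base64,([A-Za-z0-9+/=]+)"')


def extract_captcha(html):
    """Return the decoded bytes of the first embedded CAPTCHA image, or None."""
    match = DATA_URI.search(html)
    if match is None:
        return None
    return base64.b64decode(match.group(1))


# Stand-in for the page body; a real run would fetch the sign-in page instead.
fake_png = b"\x89PNG fake image bytes"
sample_html = '<img src="data:image/png;base64,%s">' % base64.b64encode(fake_png).decode()

image_bytes = extract_captcha(sample_html)
print(image_bytes == fake_png)  # True
```

A labelling script would then write these bytes to a file, show the image, and ask for the answer.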

Creating the Training Dataset
#

Next we need to create the training dataset in a format that AOCR can use. We need a simple text file that lists the path for each CAPTCHA and the correct answer. This has already been done for us, but here on my forgejo is the script that was used.
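
A minimal sketch of how such a label file could be generated, assuming each image is saved under its answer as the file name (for example dataset/46990.png). The helper names are hypothetical and this is not the script linked above.

```python
from pathlib import PurePosixPath


def label_line(image_path):
    """Turn 'dataset/46990.png' into the label line 'dataset/46990.png 46990'."""
    path = PurePosixPath(image_path)
    # Assumption: the file name without its extension is the CAPTCHA answer.
    return f"{path} {path.stem}"


def build_labels(image_paths):
    """Build the full text of the label file, one image per line."""
    return "\n".join(label_line(p) for p in sorted(image_paths)) + "\n"


paths = ["dataset/68348.png", "dataset/46990.png"]
labels = build_labels(paths)
print(labels, end="")
```

Writing `labels` to a text file gives the path-and-answer layout described above.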

The file labels/training.txt then looks like this:

...[snip]...
dataset/46990.png 46990
dataset/68348.png 68348
dataset/57264.png 57264
dataset/66094.png 66094
dataset/10427.png 10427
dataset/21211.png 21211
dataset/32226.png 32226
dataset/85324.png 85324
dataset/37328.png 37328
dataset/78365.png 78365
dataset/56638.png 56638
...[snip]...

It has to be transformed into a TensorFlow record using aocr dataset ./labels/training.txt ./training.tfrecords.

To start training: cd labels && aocr train training.tfrecords

Let it run for some steps and you have a (pretty basic) trained CNN.

To test the CNN, run aocr test testing.tfrecords.

Let it run for some steps.

The next steps are overkill for this blog-post, but I recommend doing the room for yourself to fully grasp this concept!

Starting from today, I won’t write down the questions/solutions since I think it’s overkill. If I write a cool script or something, I will put it up on the post but nothing else…

And here on my forgejo is the final brute force script!
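
The overall shape of such a script can be sketched as below. This is not the linked script: the three callables, the result strings ("ok", "bad_captcha", "bad_password") and the retry limit are all assumptions made for illustration, and the stubs replace the real web requests and the hosted model so the example runs on its own.

```python
def brute_force(passwords, fetch_captcha, solve_captcha, try_login, max_retries=5):
    """Return the first password that logs in, or None if the list is exhausted."""
    for password in passwords:
        for _ in range(max_retries):
            guess = solve_captcha(fetch_captcha())
            result = try_login(password, guess)
            if result == "ok":
                return password
            if result == "bad_password":
                break  # the CAPTCHA was accepted, so move on to the next password
            # Otherwise the model misread the CAPTCHA: retry the same password.
    return None


# Stubs standing in for the page request, the hosted model and the login request.
def fake_fetch():
    return b"image-bytes"


def fake_solve(image):
    return "12345"


def fake_login(password, captcha):
    if captcha != "12345":
        return "bad_captcha"
    return "ok" if password == "hunter2" else "bad_password"


found = brute_force(["123456", "hunter2", "letmein"], fake_fetch, fake_solve, fake_login)
print(found)  # hunter2
```

Retrying on a misread CAPTCHA matters because the model is not perfect, and without it a correct password could be skipped.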
