Assignment 4 Due Tuesday Apr 22, 11:00am ET
Fine-tuning a GPT2 with One Tiny Cloud Thread
FAQs
- Heads-up: Pretraining/fine-tuning on a single CPU core can be time-consuming (a single run may take 40 minutes to 2 hours), so start early.
- For this assignment, the recommended EC2 instance type is m5.large with a volume size of 100GB.
- Part 2 and Part 3 do NOT reuse the model trained in Part 1. Instead, follow the instructions to download a base GPT2 model using the HuggingFace Transformers API.
Overview
This assignment is for you to gain hands-on experience in LLM training/fine-tuning on AWS EC2. You will just need a single EC2 instance with a 100GB SSD volume. You will then develop two training applications using HuggingFace’s Transformers library:
- a “pretrain” program,
- and a “fine-tune” program,
to train/fine-tune a small, 124M-parameter GPT2 model.
You will also deploy your fine-tuned model using a Dockerized server and chat with your own custom GPT2 from a local computer, or anywhere on the Internet. You will also prepare a short report detailing your observations and experience, with links to a few demo videos that you will record at the end of this assignment.
This will be even more fun! 😎
Learning outcomes
The learning outcomes of this assignment are to:
- Write simple training applications using the Transformers library.
- Fine-tune a small language model with different fine-tuning datasets.
- Deploy a Docker container for model serving and inference.
- Deploy a minimal ChatGPT-like client to enable client-server communication.
Part 0: Environment setup
Like the previous assignments, you will complete this assignment on AWS. By now you should be familiar with setting things up in AWS via Academy. See Assignment 0 for how to use AWS Academy to access AWS cloud resources, and Assignment 1 for how to change network settings to enable ALL TCP traffic.
In this assignment, you only need to create one EC2 VM instance of type m5.large. (You can use t3.large for this assignment; it's just that m5.large provides slightly better and more steady computing power than t3.large.) You can name it llm or anything you like.
I don’t recommend reusing your existing EC2 instances from A3, as those instances come with smaller, 50GB GP2 SSD storage. Fine-tuning is both storage-intensive and compute-intensive, so it’s best to configure your VM using a 100GB GP2 SSD volume to ensure you have enough storage space.
Change network settings by adding a new ALL TCP security group
You need to configure the Network settings of your EC2 instance. You can reuse the existing security group that you created for A3.
For a new EC2 instance, always run apt update before installing any dependencies:
$ sudo apt update
$ sudo apt install -y python3-pip
Install the following Python dependencies including Jupyter Notebook:
$ pip3 install notebook
$ pip3 install torch numpy transformers datasets tiktoken wandb tqdm 'accelerate>=0.26.0' ipywidgets
$ source .profile
You also need to install Docker. First, set up Docker's apt repository:
$ sudo apt install ca-certificates curl
$ sudo install -m 0755 -d /etc/apt/keyrings
$ sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
$ sudo chmod a+r /etc/apt/keyrings/docker.asc
# Add the repository to Apt sources:
$ echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
$(. /etc/os-release && echo "${UBUNTU_CODENAME:-$VERSION_CODENAME}") stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
$ sudo apt update
Next, install the Docker packages:
$ sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
To verify that the Docker installation is successful, run:
$ sudo docker --version
$ sudo docker run hello-world
Note you need sudo permission to run docker commands.
Finally, download all code skeleton files and shell scripts at this link for Part 1-3.
Part 1: Pretrain a GPT2
Your job in this part is to implement a training process to “pretrain” a GPT2 with a very basic training dataset provided in the code skeleton file a4_part1_gpt2_pretrain_skeleton.ipynb.
Task 1: Your first task for this part is to write code for training setup and initialization.
Task 2: Your second task is to update the learning rate manually for each param_group in optimizer.param_groups.
Task 3: Your third task is to evaluate the model, record the best validation loss, and save a checkpoint file at eval_interval.
Task 4: Your fourth task is to perform the forward and backward pass, calculate the average loss value, and prefetch the next batch of training samples.
Task 5: Your last task is to run the optimization process. (A rough sketch of the loop structure for Tasks 2-5 follows.)
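These tasks map onto a fairly standard PyTorch training loop. As a point of reference only, here is a minimal, self-contained toy loop: a tiny linear model on random data stands in for GPT2, and names such as get_batch, get_lr, eval_interval, and max_iters are assumptions mirroring common training-loop skeletons, not necessarily the names used in the provided skeleton. Follow the skeleton's TODOs for the real structure.

```python
# Toy illustration of the loop structure the Part 1 tasks ask for (assumed names).
import math
import torch

model = torch.nn.Linear(16, 16)                       # stand-in for GPT2
optimizer = torch.optim.AdamW(model.parameters(), lr=6e-4)
max_iters, eval_interval, best_val_loss = 200, 50, float('inf')

def get_batch(split):
    x = torch.randn(8, 16)
    return x, x                                        # inputs and targets

def get_lr(it, warmup=20, max_lr=6e-4, min_lr=6e-5):
    # Assumed warmup-then-cosine-decay schedule; the skeleton may use a different one.
    if it < warmup:
        return max_lr * (it + 1) / warmup
    ratio = (it - warmup) / max(1, max_iters - warmup)
    return min_lr + 0.5 * (1 + math.cos(math.pi * ratio)) * (max_lr - min_lr)

@torch.no_grad()
def estimate_loss():
    X, Y = get_batch('val')
    return torch.nn.functional.mse_loss(model(X), Y).item()

X, Y = get_batch('train')
for iter_num in range(max_iters):
    # Task 2: update the learning rate manually for each param_group.
    lr = get_lr(iter_num)
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr

    # Task 3: evaluate at eval_interval, track the best validation loss, checkpoint.
    if iter_num % eval_interval == 0:
        val_loss = estimate_loss()
        if val_loss < best_val_loss:
            best_val_loss = val_loss
            torch.save({'model': model.state_dict(), 'iter_num': iter_num,
                        'best_val_loss': best_val_loss}, 'ckpt.pt')

    # Task 4: forward and backward pass, then prefetch the next batch.
    loss = torch.nn.functional.mse_loss(model(X), Y)
    loss.backward()
    X, Y = get_batch('train')

    # Task 5: run the optimization step.
    optimizer.step()
    optimizer.zero_grad(set_to_none=True)
```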
Each training iteration runs for 1-1.4 seconds on average, so running 2000 iterations will require around 30-40 minutes to complete. You can change max_iters to shorten or extend the training. With 1500-2000 iterations, the model should be able to answer basic questions about GPT based on your predefined training dataset.
Document the training time and prompt test results in your report. See Deliverables for the report requirements.
Part 2: Fine-tune a GPT2 using a Q&A dataset
Your job for Part 2 is to implement a training process in a4_part2_finetune_skeleton.ipynb to “fine-tune” a base GPT2 with a dataset called alpaca-1k. This dataset contains 1,000 Q&A entries. The base GPT2 can barely handle any of these Q&As. Our goal for Part 2 is to train a fine-tuned GPT2 so that it has some limited ability to handle basic or paraphrased questions similar to those in alpaca-1k. In reality, you'll find that, even after a few hours of fine-tuning, your custom GPT2 may still struggle with many questions, but it should perform much better than the base GPT2. If you observe such improvement, you've successfully achieved the goal for Part 2.
According to the Scaling Laws, “language modeling performance improves smoothly as we increase the model size, dataset size, and amount of compute for training.” So this seems like the best we can achieve with just half a megabyte of training data and a single CPU core (though that's a bit of a stretch 😅).
You don’t have to download the dataset as the Transformers library will download it automatically for you.
There are three tasks for Part 2:
Task 1: In this task you will fill in the training cell by adding the following code (a rough sketch follows the step list):
- Step 1: Set the seed for reproducibility
- Step 2: Load the model and tokenizer using the Transformers library API
- Step 3: Prepare the dataset (dolo650/alpaca-1k)
- Step 4: Set up the trainer
- Step 5: Start the fine-tuning process
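For reference, here is a minimal sketch of what the five steps could look like with the Transformers Trainer API. The prompt template, dataset column names, hyperparameters, and output directory below are assumptions, not the skeleton's exact values; follow the skeleton's TODOs and configuration cell.

```python
# Minimal sketch of the five fine-tuning steps (values and column names are assumptions).
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling, set_seed)
from datasets import load_dataset

set_seed(42)                                          # Step 1: reproducibility

model_name = "gpt2"                                   # Step 2: 124M-parameter base GPT2
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token             # GPT2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

def prepare_dataset():                                # Step 3: Q&A dataset
    ds = load_dataset("dolo650/alpaca-1k", split="train")
    def tokenize_function(example):
        # Assumed alpaca-style columns; check the dataset's actual schema.
        text = f"### Question:\n{example['instruction']}\n### Answer:\n{example['output']}"
        return tokenizer(text, truncation=True, max_length=512)
    return ds.map(tokenize_function, remove_columns=ds.column_names)

args = TrainingArguments(                             # Step 4: trainer setup
    output_dir="gpt2-alpaca-1k",
    max_steps=100,
    per_device_train_batch_size=4,
    learning_rate=5e-5,
    logging_steps=10,
    report_to="none",                                 # avoid a wandb login prompt
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=prepare_dataset(),
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)

trainer.train()                                       # Step 5: fine-tune
trainer.save_model(args.output_dir)
tokenizer.save_pretrained(args.output_dir)
```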
Your default fine-tuning process will run for 100 steps (set via max_steps). With 1,000 Q&A entries, it will take around 1.5 to 2 hours to complete 100 steps.
A commodity GPU would train much faster than this 🚀. Try it on a free NVIDIA T4 GPU in a Colab notebook and see how much speedup you achieve, though using a Colab GPU is not required for this assignment.
Task 2: Your second task is to reduce the training dataset by explicitly limiting the fine-tuning range to the first N Q&A entries. This can be set in prepare_dataset(). This task is divided into two subtasks (see the snippet after the list):
- Task 2.1: Configure N in prepare_dataset()
- Task 2.2: Modify output_dir in the configurations cell so that your next fine-tuning run will not overwrite the existing fine-tuned model from previous runs.
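For reference, one common way to keep only the first N entries with the datasets library is shown below; exactly where this goes depends on how prepare_dataset() is organized in the skeleton, and the value of N here is just a placeholder.

```python
# Restrict fine-tuning to the first N Q&A entries (placeholder N; vary it for your report).
from datasets import load_dataset

N = 200
ds = load_dataset("dolo650/alpaca-1k", split="train").select(range(N))
```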
Document the training time and the testing results obtained by varying N in your report.
Task 3: Your third task is to implement a prompt testing program to prompt the fine-tuned model. You will need to add code to a4_part2_test_finetuned_skeleton.ipynb (a sketch follows the list):
- Task 3.1: Load a model and a tokenizer
- Task 3.2: Write a test prompt that paraphrases a sample in your fine-tuning dataset and tokenize the input
- Task 3.3: Generate output tokens and decode them
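As a reference, a minimal version of such a prompt test might look like the following. The model directory and prompt text are assumptions; point them at your own fine-tuned output.

```python
# Minimal prompt-testing sketch (model directory and prompt are assumptions).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = "gpt2-alpaca-1k"                          # Task 3.1: load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(model_dir)

# Task 3.2: a prompt that paraphrases a sample from the fine-tuning dataset.
prompt = "### Question:\nName three random colors.\n### Answer:\n"
inputs = tokenizer(prompt, return_tensors="pt")

# Task 3.3: generate output tokens and decode them.
outputs = model.generate(**inputs, max_new_tokens=100,
                         pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```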
Report the training times and prompt test results in your report. See Deliverables for the report requirements.
Part 3: Fine-tune a GPT2 using your own dataset
The most fun part: you will fine-tune a base GPT2 model using a dataset of your choice. You can pick any dataset you like. There is only one requirement:
The dataset you choose should not be too big. If you use a large dataset, you need to write code to explicitly select a subset of samples to keep the fine-tuning time reasonable.
NOTE: Part 3 is open-ended, so you can do any type of customization in any part of your code.
Duplicate your .ipynb file from Part 2 and name it a4_part3_finetune_<dataset_name>.ipynb.
Part 3 includes three required tasks and one optional task (for extra credit 🌟):
Task 1: Customize the tokenize_function in prepare_dataset(), as different datasets have different formats (a hypothetical example follows).
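For instance, and purely hypothetically: if your chosen dataset exposed question and answer columns instead of alpaca's instruction/output columns, the tokenize_function might be adapted roughly like this.

```python
# Hypothetical adaptation of tokenize_function for a dataset with 'question'/'answer' columns.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

def tokenize_function(example):
    text = f"### Question:\n{example['question']}\n### Answer:\n{example['answer']}"
    return tokenizer(text, truncation=True, max_length=512)
```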
Task 2: Run the fine-tuning process and document your training time and prompt test results in the report.
Task 3: In this task, you will launch a containerized text generation inference server that hosts your fine-tuned model and interact with it through a Chatbot web interface.
Containerize a Python server for text generation inference. The Docker container image comes from ghcr.io/huggingface/text-generation-inference. A shell script named launch_gpt2.sh is provided to help you provision and launch a text-generation-inference Docker container on your EC2 VM instance. You will need to change line 3 of the script to the path of your own model directory:
model_path=$PWD/<YOUR_FINE_TUNED_MODEL_DIR> # Path to your model directory
Before running the Docker container, you need to grant execution permission to launch_gpt2.sh:
$ chmod +x launch_gpt2.sh
To deploy and run the Docker container:
$ ./launch_gpt2.sh
If this is the first time you run the script, it will first download the Docker container image from the GitHub Container Registry (ghcr.io) and then start the server; it takes several minutes to fully download the container image. You will see a Docker pull log while it's downloading the container:
ubuntu@ip-170-30-11-11:~/a4$ ./launch_gpt2.sh
Unable to find image 'ghcr.io/huggingface/text-generation-inference:3.1.0' locally
3.1.0: Pulling from huggingface/text-generation-inference
aece8493d397: Pull complete
45f7ea5367fe: Pull complete
3d97a47c3c73: Pull complete
... # Omitted a bunch of Pull completes in the middle...
8a050b7ef1d1: Pull complete
7975bedcb14f: Pull complete
3c346d5d6340: Pull complete
Digest: sha256:da2982809d778a3e81e2d459c4aa516c50f6add2d7dcbf1180fc973fd6ecf379
Status: Downloaded newer image for ghcr.io/huggingface/text-generation-inference:3.1.0
After the container is downloaded, it takes about a minute to fully start the server. Wait until you see a Connected message like the following:
2025-03-28T02:16:12.905714Z INFO text_generation_router::server: router/src/server.rs:2423: Connected
Verify that the text generation inference server container image has been downloaded by running:
ubuntu@ip-170-30-11-11:~/a4$ sudo docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
ghcr.io/huggingface/text-generation-inference 3.1.0 6724f1647f1b 8 weeks ago 12.2GB
A prompt.sh script is provided for communication with the server. Again, make sure to give it execution permission by running chmod +x prompt.sh.
To run the script:
$ ./prompt.sh "<YOUR_PROMPT_TEXT>" <MAX_OUTPUT_TOKEN_LEN>
Replace <YOUR_PROMPT_TEXT> and <MAX_OUTPUT_TOKEN_LEN> with your prompt text and the maximum output token length. An example script command looks like the following:
$ ./prompt.sh "Provide a list of three random colors." 100
If you can see a returned output that looks like the output tokens generated by your model, congratulations 🎉 – your minimalist chatbot service is up and running!
NOTE: If you run prompt.sh from a local computer, you will need to replace localhost with the public IPv4 DNS address of your EC2 instance.
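The script presumably wraps an HTTP request to the text-generation-inference server's /generate endpoint, so you can also talk to the server directly. Below is a rough Python equivalent; the port (8080) is an assumption, so check launch_gpt2.sh for the port it actually publishes.

```python
# Rough equivalent of a prompt.sh call against the TGI /generate endpoint.
# The host/port are assumptions: check launch_gpt2.sh, and replace localhost with your
# EC2 public IPv4 DNS address when calling from a local machine.
import requests  # pip3 install requests if it is not already available

resp = requests.post(
    "http://localhost:8080/generate",
    json={"inputs": "Provide a list of three random colors.",
          "parameters": {"max_new_tokens": 100}},
    timeout=120,
)
print(resp.json()["generated_text"])
```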
Having a good interface frontend is part of a good data system design. Now that you have a Dockerized backend server running, next, you will deploy a simple frontend for your own model.
To do so, download the HTML file at this link to a local computer (NOT your EC2 instance). This HTML file is your minimal ChatGPT-style client, which connects to your Dockerized fine-tuned model running on your EC2 instance.
NOTE: Replace <EC2_public_IPv4_DNS_address> in line 81 with your actual EC2 public IPv4 DNS address.
Open it in a browser, and you can now chat with your Dockerized chatbot.
What you need to do for Task 3:
Record a short video demo that captures the following process:
- Launching your Dockerized backend model server on EC2,
- Opening your minimal ChatGPT-style client webpage in your browser,
- Having a few rounds of conversation with your chatbot.
Task 4 (extra credit 🌟): This optional task encourages you to do something interesting! In this task, you will do some vibe coding to design / create / renovate a better client interface. This interface could be a complete redesign based on the HTML that I provided, or an enhancement with additional functionality, such as configuring the output token sequence length or temperature, or uploading a JSON file as a prompt. You're also welcome to use more advanced frontend tools like React, Node.js, and Tailwind.
Again, record a short video demo that captures the following process:
- Launching your Dockerized backend model server on EC2,
- Opening your advanced client webpage in your browser,
- Having a few rounds of interactions with your chatbot.
Deliverables
Finally, you should submit a tar.gz file to Canvas, following the naming convention LastName_FirstName_ComputingID_A4.tar.gz. The submitted file should include:
- a .ipynb file containing the code of all tasks for Part 1: pretraining a GPT2 (filename: LastName_FirstName_ComputingID_A4_Part1.ipynb),
- a .ipynb file containing the code of Tasks 1-2 for Part 2: fine-tuning a GPT2 using the dolo650/alpaca-1k dataset (filename: LastName_FirstName_ComputingID_A4_Part2_FineTuning.ipynb),
- a .ipynb file containing the code of Task 3 for Part 2: prompt testing your fine-tuned model (filename: LastName_FirstName_ComputingID_A4_Part2_Testing.ipynb),
- two .ipynb files for Part 3: one for the custom dataset fine-tuning code (filename: LastName_FirstName_ComputingID_A4_Part3_FineTuning_DatasetName.ipynb) and the other for the testing code (filename: LastName_FirstName_ComputingID_A4_Part3_Testing_DatasetName.ipynb),
- a short PDF report including major training results and URL(s) of your video recording(s) (filename: LastName_FirstName_ComputingID_A4_report.pdf),
- (extra credit) a folder containing your implementation of the enhanced chatbot client interface.
The skeleton we supplied is well documented, with clearly marked TODOs indicating where you should add your own code. Your code should be well commented as well. Each comment should include the Part number and Task number, along with a brief description of the implementation. Comments are worth some percentage of your grade for the assignment; the GTA will be looking at your code as well as the comments (comments serve as a README for a particular task).
In addition to including comments, make sure to keep all cell outputs saved in your submitted .ipynb files for grading.
For the PDF report, in addition to the training time and prompt testing results for Part 1, Part 2, and Part 3, you should also include the link to a viewable video recording for Part 3 (and optionally, a separate link to a video recording for Part 3 Task 4). You can use Zoom to make the video and share the Zoom recording via URL. Alternatively, you can upload your video files to Google Drive and share the viewable Google Drive video file via URL. If you use Google Drive, make sure to test the link in incognito mode to ensure the video is playable. DO NOT include any video files in your submitted tar.gz file.
You're more than welcome to share anything that you've learned from this assignment: observations and insights you gained, new skills you picked up, or something you would have liked to try but did not get a chance to, just to name a few.