{ "cells": [ { "cell_type": "markdown", "id": "11bb4456-5dc2-440c-9320-4af43e032aeb", "metadata": {}, "source": [ "# Hands-on Example LLM (2)" ] }, { "cell_type": "markdown", "id": "47a74c0f-6038-4871-9038-df59a4be02a1", "metadata": {}, "source": [ "### 2. Fine-tuning - Adapting to legal texts\n", "##### --- Legal questions for a fine-tuned model (local LLM)\n", "\n", "In this section we fine-tune the model `dbmdz/german-gpt2` and ask it the same two legal questions about the AI Act as in the baseline notebook.\n", "\n", "The goal is for the fine-tuned model (llm-2) to now give better-grounded and correct answers.\n" ] }, { "cell_type": "code", "execution_count": 1, "id": "8f0f45fe-361d-4965-a821-e69419fced67", "metadata": { "execution": { "iopub.execute_input": "2026-03-24T18:13:06.354720Z", "iopub.status.busy": "2026-03-24T18:13:06.354568Z", "iopub.status.idle": "2026-03-24T18:13:06.359355Z", "shell.execute_reply": "2026-03-24T18:13:06.357651Z", "shell.execute_reply.started": "2026-03-24T18:13:06.354702Z" } }, "outputs": [], "source": [ "# if not yet installed\n", "\n", "import sys\n", "# !{sys.executable} -m pip install transformers datasets\n", "# !{sys.executable} -m pip install 'accelerate>=1.10.0'" ] }, { "cell_type": "code", "execution_count": 2, "id": "7e552b09-89c0-4168-8e6c-c59babe09ea5", "metadata": { "execution": { "iopub.execute_input": "2026-03-24T18:13:06.360112Z", "iopub.status.busy": "2026-03-24T18:13:06.359959Z", "iopub.status.idle": "2026-03-24T18:13:10.070226Z", "shell.execute_reply": "2026-03-24T18:13:10.069715Z", "shell.execute_reply.started": "2026-03-24T18:13:06.360096Z" } }, "outputs": [], "source": [ "import torch\n", "from transformers import AutoTokenizer, AutoModelForCausalLM, Trainer, TrainingArguments, DataCollatorForLanguageModeling\n", "from datasets import Dataset" ] }, { "cell_type": "code", "execution_count": 3, "id": "6b807925-978d-4d97-8765-ad2beaf99097", "metadata": { "execution": { 
"iopub.execute_input": "2026-03-24T18:13:10.070728Z", "iopub.status.busy": "2026-03-24T18:13:10.070543Z", "iopub.status.idle": "2026-03-24T18:13:12.602873Z", "shell.execute_reply": "2026-03-24T18:13:12.602349Z", "shell.execute_reply.started": "2026-03-24T18:13:10.070719Z" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Warning: You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads.\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "00edbe7a58d546d586443954281f376b", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Loading weights: 0%| | 0/148 [00:00\n", " [6/6 00:02, Epoch 3/3]\n", "
| Step | Training Loss |
| --- | --- |
| 1 | 2.830693 |
| 2 | 3.512209 |
| 3 | 2.700200 |
| 4 | 3.442378 |
| 5 | 2.657775 |
| 6 | 3.438793 |

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "0a35267676b748888838431fc72a33fc", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Writing model shards: 0%| | 0/1 [00:00