{ "cells": [ { "cell_type": "markdown", "id": "c2fce892-a05e-493e-8ba7-659bd9495e00", "metadata": {}, "source": [ "# Case Study: Predicting Real Estate Prices\n", "\n", "## Goal of This Case Study\n", "\n", "* Apply the methods learned so far to predict real estate prices.\n", "* Use a real-world dataset for modeling.\n", "* Implement the workflow in Python with `scikit-learn`.\n", "\n", "## Implementation Steps\n", "\n", "1. Load and understand the data\n", "\n", "   * Use an open dataset (e.g., the California Housing dataset or a Kaggle house price dataset).\n", "   * Examine the data distribution, correlations, and possible outliers.\n", "\n", "2. Data preparation\n", "\n", "   * Encode categorical features (one-hot encoding).\n", "   * Normalize and scale numerical features.\n", "   * Split the data into training and test sets.\n", "\n", "3. Model training with linear regression\n", "\n", "   * Train a linear regression model with scikit-learn.\n", "   * Use metrics to assess model quality (e.g., MSE, R²).\n", "\n", "4. 
Model evaluation and interpretation\n", "\n", "   * Evaluate model performance on the test set.\n", "   * Interpret the most influential features.\n", "\n", "## Code Example" ] }, { "cell_type": "code", "execution_count": 1, "id": "d7f69c6d-4db0-484d-9c68-c3ec4fce16b9", "metadata": { "execution": { "iopub.execute_input": "2026-03-24T15:32:57.740201Z", "iopub.status.busy": "2026-03-24T15:32:57.740095Z", "iopub.status.idle": "2026-03-24T15:32:57.742815Z", "shell.execute_reply": "2026-03-24T15:32:57.742048Z", "shell.execute_reply.started": "2026-03-24T15:32:57.740192Z" } }, "outputs": [], "source": [ "import sys\n", "# !{sys.executable} -m pip install scikit-learn\n" ] }, { "cell_type": "markdown", "id": "9d22c329-4a99-4a23-af24-6e4b2eafec89", "metadata": {}, "source": [ "### Imports" ] }, { "cell_type": "code", "execution_count": 2, "id": "b9dc4ae8-b7b7-48c2-b4bb-8d8e28cd3f82", "metadata": { "execution": { "iopub.execute_input": "2026-03-24T15:32:57.743235Z", "iopub.status.busy": "2026-03-24T15:32:57.743164Z", "iopub.status.idle": "2026-03-24T15:32:58.836858Z", "shell.execute_reply": "2026-03-24T15:32:58.836398Z", "shell.execute_reply.started": "2026-03-24T15:32:57.743226Z" } }, "outputs": [], "source": [ "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.preprocessing import StandardScaler\n", "from sklearn.linear_model import LinearRegression\n", "from sklearn.metrics import mean_squared_error, r2_score" ] }, { "cell_type": "markdown", "id": "7346190a-b639-455f-95ef-420470149e57", "metadata": {}, "source": [ "### Load the example dataset (California Housing dataset)" ] }, { "cell_type": "code", "execution_count": 3, "id": "950468c1-4b1f-4f1d-987f-8c09ed9f18c9", "metadata": { "execution": { "iopub.execute_input": "2026-03-24T15:32:58.837419Z", "iopub.status.busy": "2026-03-24T15:32:58.837277Z", "iopub.status.idle": "2026-03-24T15:32:58.884204Z", "shell.execute_reply": 
"2026-03-24T15:32:58.883553Z", "shell.execute_reply.started": "2026-03-24T15:32:58.837410Z" } }, "outputs": [ { "data": { "text/html": [ "
| | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude | PRICE |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 8.3252 | 41.0 | 6.984127 | 1.023810 | 322.0 | 2.555556 | 37.88 | -122.23 | 4.526 |
| 1 | 8.3014 | 21.0 | 6.238137 | 0.971880 | 2401.0 | 2.109842 | 37.86 | -122.22 | 3.585 |
| 2 | 7.2574 | 52.0 | 8.288136 | 1.073446 | 496.0 | 2.802260 | 37.85 | -122.24 | 3.521 |
| 3 | 5.6431 | 52.0 | 5.817352 | 1.073059 | 558.0 | 2.547945 | 37.85 | -122.25 | 3.413 |
| 4 | 3.8462 | 52.0 | 6.281853 | 1.081081 | 565.0 | 2.181467 | 37.85 | -122.25 | 3.422 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 20635 | 1.5603 | 25.0 | 5.045455 | 1.133333 | 845.0 | 2.560606 | 39.48 | -121.09 | 0.781 |
| 20636 | 2.5568 | 18.0 | 6.114035 | 1.315789 | 356.0 | 3.122807 | 39.49 | -121.21 | 0.771 |
| 20637 | 1.7000 | 17.0 | 5.205543 | 1.120092 | 1007.0 | 2.325635 | 39.43 | -121.22 | 0.923 |
| 20638 | 1.8672 | 18.0 | 5.329513 | 1.171920 | 741.0 | 2.123209 | 39.43 | -121.32 | 0.847 |
| 20639 | 2.3886 | 16.0 | 5.254717 | 1.162264 | 1387.0 | 2.616981 | 39.37 | -121.24 | 0.894 |

20640 rows × 9 columns
\n", "LinearRegression()