{
"cells": [
{
"cell_type": "markdown",
"id": "f417dc67-8099-4e04-bfdb-3cd8602a72c4",
"metadata": {},
"source": [
"# Praxisbeispiel - Telko Vertragskundenabgang-Vorhersage - Daten simulieren/kreieren\n",
"\n",
"Da der Original-Datensatz von kaggle schon recht weit vorbereitet ist als Feature-Matrix, möchten wir unseren eigenen dazu passenden Datensatz erstellen, um zu sehen, wie Feature Engineering auf dynamischen Datensätzen aussieht. \n",
"\n",
"Hierfür nehmen wir uns das Beispiel des Kunden-Services. \n",
"\n",
"Wiefolgt sind die Schritte für die Kreierung eines solchen simulierten Datensatzes: \n",
"\n",
"1. **Original-Datensatz** laden\n",
"2. **Stichtag und Vertragsbeginn** definieren:\n",
" Setze den Stichtag auf den 31.12.2024. Für jeden Kunden berechnen wir einen simulierten Vertragsstart, indem wir den Stichtag um die Anzahl der Monate (aus der Spalte tenure) zurückrechnen. \n",
"\n",
"3. **Anzahl der Tickets pro Kunde** simulieren:\n",
"Wir simulieren bzw bestimmen für jeden Kunden zufällig (z. B. basierend auf einer Poisson-Verteilung oder einer zufälligen Zahl), wie viele Tickets (also Interaktions-Threads) im Betrachtungszeitraum generiert werden.\n",
"\n",
"4.\t**Ticket-Interaktionen** erzeugen:\n",
"Für jedes Ticket erstellen wir mehrere Interaktionen (z. B. eine erste Anfrage und dann ein bis zwei Follow-ups).\n",
"Dabei werden folgende Spalten befüllt:\n",
"\n",
"\t•\t**ticket_id**: Eindeutige Kennung (zum Beispiel als Kombination aus Customer-ID und einem Zähler).\n",
"\n",
" •\t**customer_id**: Übernimmt den Wert aus dem Telco-Datensatz.\n",
"\t\n",
" •\t**time_request**: Zufälliger Zeitpunkt der Anfrage zwischen Vertragsstart und Stichtag.\n",
"\t\n",
" •\t**time_reply**: Ein Zeitpunkt, der einige Stunden oder Tage nach time_request liegt.\n",
"\t\n",
" •\t**channel**: Zufällig ausgewählter Kommunikationskanal (z. B. “Email”, “Hotline”, “On-site”, “Chat”, “Social Media”).\n",
"\t\n",
" •\t**request und reply**: Simulierte Texte (z. B. “Problem mit Rechnung” bzw. “Problem gelöst”).\n",
"\t\n",
" •\t**solved**: Boolean, der anzeigt, ob das Problem mit der Antwort gelöst wurde (z. B. basierend auf Stichwörtern im Antworttext).\n",
"\t\n",
" •\t**original_request**: True, wenn es sich um die erste Interaktion eines Tickets handelt, sonst False.\n",
"\t\n",
" •\t**original_ticket_id**: Bei der ersten Interaktion bleibt dieser Wert None, ansonsten verweist er auf die ursprüngliche Ticket-ID.\n",
"\n",
"6.\t**Zusätzliche Spalten** (optional:\n",
"Optional können wir weitere Informationen wie beispielsweise den bearbeitenden Agenten, Ticket-Priorität oder Notizen hinzufügen."
]
},
{
"cell_type": "markdown",
"id": "00f525a4-b2bf-4ee8-9d0f-17f952de0eed",
"metadata": {},
"source": [
"## Datensatz \n",
"\n",
"Original Datensatz:\n",
"\n",
"https://www.kaggle.com/datasets/blastchar/telco-customer-churn "
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "8ee77e59-bc5e-49ce-9b96-a735aa3822e3",
"metadata": {
"execution": {
"iopub.execute_input": "2026-03-24T18:07:32.952667Z",
"iopub.status.busy": "2026-03-24T18:07:32.952583Z",
"iopub.status.idle": "2026-03-24T18:07:32.956544Z",
"shell.execute_reply": "2026-03-24T18:07:32.955992Z",
"shell.execute_reply.started": "2026-03-24T18:07:32.952656Z"
}
},
"outputs": [],
"source": [
"import sys\n",
"# !{sys.executable} -m pip install kagglehub\n",
"# !{sys.executable} -m pip install plotly\n",
"# !{sys.executable} -m pip install missingno"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "3e492ddf-3d00-42f1-ba7e-8781f8ba1752",
"metadata": {
"execution": {
"iopub.execute_input": "2026-03-24T18:07:32.957053Z",
"iopub.status.busy": "2026-03-24T18:07:32.956984Z",
"iopub.status.idle": "2026-03-24T18:07:33.099751Z",
"shell.execute_reply": "2026-03-24T18:07:33.099259Z",
"shell.execute_reply.started": "2026-03-24T18:07:32.957045Z"
}
},
"outputs": [],
"source": [
"import kagglehub\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "bd1c7c90-db0a-4de3-84df-bafbdc2010e7",
"metadata": {
"execution": {
"iopub.execute_input": "2026-03-24T18:07:33.100182Z",
"iopub.status.busy": "2026-03-24T18:07:33.100037Z",
"iopub.status.idle": "2026-03-24T18:07:33.392406Z",
"shell.execute_reply": "2026-03-24T18:07:33.391701Z",
"shell.execute_reply.started": "2026-03-24T18:07:33.100170Z"
}
},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import random\n",
"from datetime import datetime, timedelta\n",
"from dateutil.relativedelta import relativedelta\n",
"\n",
"\n",
"pd.set_option('display.max_columns', None)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "b58bd575-3ead-42fc-89fd-7f2fa8f86e09",
"metadata": {
"execution": {
"iopub.execute_input": "2026-03-24T18:07:33.392878Z",
"iopub.status.busy": "2026-03-24T18:07:33.392733Z",
"iopub.status.idle": "2026-03-24T18:07:34.306496Z",
"shell.execute_reply": "2026-03-24T18:07:34.305474Z",
"shell.execute_reply.started": "2026-03-24T18:07:33.392869Z"
}
},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"import plotly.express as px\n",
"import plotly.graph_objects as go\n",
"from plotly.subplots import make_subplots\n",
"import warnings"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "db7881ec-b4ee-4b1d-8be4-6defa5ba0d44",
"metadata": {
"execution": {
"iopub.execute_input": "2026-03-24T18:07:34.307001Z",
"iopub.status.busy": "2026-03-24T18:07:34.306856Z",
"iopub.status.idle": "2026-03-24T18:07:34.583561Z",
"shell.execute_reply": "2026-03-24T18:07:34.583069Z",
"shell.execute_reply.started": "2026-03-24T18:07:34.306992Z"
},
"scrolled": true
},
"outputs": [],
"source": [
"from sklearn.preprocessing import StandardScaler, LabelEncoder\n",
"\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"from sklearn.tree import DecisionTreeClassifier\n",
"from sklearn.ensemble import RandomForestClassifier\n",
"from sklearn.naive_bayes import GaussianNB\n",
"from sklearn.neighbors import KNeighborsClassifier\n",
"from sklearn.svm import SVC\n",
"from sklearn.neural_network import MLPClassifier\n",
"from sklearn.ensemble import AdaBoostClassifier\n",
"from sklearn.ensemble import GradientBoostingClassifier\n",
"from sklearn.ensemble import ExtraTreesClassifier\n",
"from sklearn.linear_model import LogisticRegression\n",
"\n",
"from sklearn import metrics\n",
"from sklearn.metrics import roc_curve\n",
"from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix, classification_report\n",
"\n",
"# from xgboost import XGBClassifier\n",
"# from catboost import CatBoostClassifier"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "342a585f-0c81-4945-ac31-e3cb18f0e5df",
"metadata": {
"execution": {
"iopub.execute_input": "2026-03-24T18:07:34.584093Z",
"iopub.status.busy": "2026-03-24T18:07:34.583920Z",
"iopub.status.idle": "2026-03-24T18:07:34.974155Z",
"shell.execute_reply": "2026-03-24T18:07:34.973336Z",
"shell.execute_reply.started": "2026-03-24T18:07:34.584084Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Path to files: /Users/veit/.cache/kagglehub/datasets/blastchar/telco-customer-churn/versions/1\n"
]
}
],
"source": [
"# Download data from Kaggle\n",
"path = kagglehub.dataset_download(\"blastchar/telco-customer-churn\")\n",
"\n",
"print(\"Path to files:\", path)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "c290009f-1838-445a-b214-eb8177f33f84",
"metadata": {
"execution": {
"iopub.execute_input": "2026-03-24T18:07:34.974999Z",
"iopub.status.busy": "2026-03-24T18:07:34.974827Z",
"iopub.status.idle": "2026-03-24T18:07:34.978519Z",
"shell.execute_reply": "2026-03-24T18:07:34.977099Z",
"shell.execute_reply.started": "2026-03-24T18:07:34.974982Z"
},
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"~/.cache/kagglehub/datasets/blastchar/telco-customer-churn/versions/1/WA_Fn-UseC_-Telco-Customer-Churn.csv\n"
]
}
],
"source": [
"path_file = \"~/.cache/kagglehub/datasets/blastchar/telco-customer-churn/versions/1/WA_Fn-UseC_-Telco-Customer-Churn.csv\"\n",
"print(path_file)"
]
},
{
"cell_type": "markdown",
"id": "6ca0ade6-ace5-4ef1-9cc9-0f40bd4e1c45",
"metadata": {},
"source": [
"## Datensatz \n",
"\n",
"Lade den Original-Datensatz"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "ee8380c5-4ae6-4e3c-babc-bd8c77ee0cb4",
"metadata": {
"execution": {
"iopub.execute_input": "2026-03-24T18:07:34.979623Z",
"iopub.status.busy": "2026-03-24T18:07:34.979363Z",
"iopub.status.idle": "2026-03-24T18:07:35.018025Z",
"shell.execute_reply": "2026-03-24T18:07:35.017471Z",
"shell.execute_reply.started": "2026-03-24T18:07:34.979598Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" customerID | \n",
" gender | \n",
" SeniorCitizen | \n",
" Partner | \n",
" Dependents | \n",
" tenure | \n",
" PhoneService | \n",
" MultipleLines | \n",
" InternetService | \n",
" OnlineSecurity | \n",
" OnlineBackup | \n",
" DeviceProtection | \n",
" TechSupport | \n",
" StreamingTV | \n",
" StreamingMovies | \n",
" Contract | \n",
" PaperlessBilling | \n",
" PaymentMethod | \n",
" MonthlyCharges | \n",
" TotalCharges | \n",
" Churn | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" 7590-VHVEG | \n",
" Female | \n",
" 0 | \n",
" Yes | \n",
" No | \n",
" 1 | \n",
" No | \n",
" No phone service | \n",
" DSL | \n",
" No | \n",
" Yes | \n",
" No | \n",
" No | \n",
" No | \n",
" No | \n",
" Month-to-month | \n",
" Yes | \n",
" Electronic check | \n",
" 29.85 | \n",
" 29.85 | \n",
" No | \n",
"
\n",
" \n",
" | 1 | \n",
" 5575-GNVDE | \n",
" Male | \n",
" 0 | \n",
" No | \n",
" No | \n",
" 34 | \n",
" Yes | \n",
" No | \n",
" DSL | \n",
" Yes | \n",
" No | \n",
" Yes | \n",
" No | \n",
" No | \n",
" No | \n",
" One year | \n",
" No | \n",
" Mailed check | \n",
" 56.95 | \n",
" 1889.5 | \n",
" No | \n",
"
\n",
" \n",
" | 2 | \n",
" 3668-QPYBK | \n",
" Male | \n",
" 0 | \n",
" No | \n",
" No | \n",
" 2 | \n",
" Yes | \n",
" No | \n",
" DSL | \n",
" Yes | \n",
" Yes | \n",
" No | \n",
" No | \n",
" No | \n",
" No | \n",
" Month-to-month | \n",
" Yes | \n",
" Mailed check | \n",
" 53.85 | \n",
" 108.15 | \n",
" Yes | \n",
"
\n",
" \n",
" | 3 | \n",
" 7795-CFOCW | \n",
" Male | \n",
" 0 | \n",
" No | \n",
" No | \n",
" 45 | \n",
" No | \n",
" No phone service | \n",
" DSL | \n",
" Yes | \n",
" No | \n",
" Yes | \n",
" Yes | \n",
" No | \n",
" No | \n",
" One year | \n",
" No | \n",
" Bank transfer (automatic) | \n",
" 42.30 | \n",
" 1840.75 | \n",
" No | \n",
"
\n",
" \n",
" | 4 | \n",
" 9237-HQITU | \n",
" Female | \n",
" 0 | \n",
" No | \n",
" No | \n",
" 2 | \n",
" Yes | \n",
" No | \n",
" Fiber optic | \n",
" No | \n",
" No | \n",
" No | \n",
" No | \n",
" No | \n",
" No | \n",
" Month-to-month | \n",
" Yes | \n",
" Electronic check | \n",
" 70.70 | \n",
" 151.65 | \n",
" Yes | \n",
"
\n",
" \n",
" | ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" | 7038 | \n",
" 6840-RESVB | \n",
" Male | \n",
" 0 | \n",
" Yes | \n",
" Yes | \n",
" 24 | \n",
" Yes | \n",
" Yes | \n",
" DSL | \n",
" Yes | \n",
" No | \n",
" Yes | \n",
" Yes | \n",
" Yes | \n",
" Yes | \n",
" One year | \n",
" Yes | \n",
" Mailed check | \n",
" 84.80 | \n",
" 1990.5 | \n",
" No | \n",
"
\n",
" \n",
" | 7039 | \n",
" 2234-XADUH | \n",
" Female | \n",
" 0 | \n",
" Yes | \n",
" Yes | \n",
" 72 | \n",
" Yes | \n",
" Yes | \n",
" Fiber optic | \n",
" No | \n",
" Yes | \n",
" Yes | \n",
" No | \n",
" Yes | \n",
" Yes | \n",
" One year | \n",
" Yes | \n",
" Credit card (automatic) | \n",
" 103.20 | \n",
" 7362.9 | \n",
" No | \n",
"
\n",
" \n",
" | 7040 | \n",
" 4801-JZAZL | \n",
" Female | \n",
" 0 | \n",
" Yes | \n",
" Yes | \n",
" 11 | \n",
" No | \n",
" No phone service | \n",
" DSL | \n",
" Yes | \n",
" No | \n",
" No | \n",
" No | \n",
" No | \n",
" No | \n",
" Month-to-month | \n",
" Yes | \n",
" Electronic check | \n",
" 29.60 | \n",
" 346.45 | \n",
" No | \n",
"
\n",
" \n",
" | 7041 | \n",
" 8361-LTMKD | \n",
" Male | \n",
" 1 | \n",
" Yes | \n",
" No | \n",
" 4 | \n",
" Yes | \n",
" Yes | \n",
" Fiber optic | \n",
" No | \n",
" No | \n",
" No | \n",
" No | \n",
" No | \n",
" No | \n",
" Month-to-month | \n",
" Yes | \n",
" Mailed check | \n",
" 74.40 | \n",
" 306.6 | \n",
" Yes | \n",
"
\n",
" \n",
" | 7042 | \n",
" 3186-AJIEK | \n",
" Male | \n",
" 0 | \n",
" No | \n",
" No | \n",
" 66 | \n",
" Yes | \n",
" No | \n",
" Fiber optic | \n",
" Yes | \n",
" No | \n",
" Yes | \n",
" Yes | \n",
" Yes | \n",
" Yes | \n",
" Two year | \n",
" Yes | \n",
" Bank transfer (automatic) | \n",
" 105.65 | \n",
" 6844.5 | \n",
" No | \n",
"
\n",
" \n",
"
\n",
"
7043 rows × 21 columns
\n",
"
"
],
"text/plain": [
" customerID gender SeniorCitizen Partner Dependents tenure \\\n",
"0 7590-VHVEG Female 0 Yes No 1 \n",
"1 5575-GNVDE Male 0 No No 34 \n",
"2 3668-QPYBK Male 0 No No 2 \n",
"3 7795-CFOCW Male 0 No No 45 \n",
"4 9237-HQITU Female 0 No No 2 \n",
"... ... ... ... ... ... ... \n",
"7038 6840-RESVB Male 0 Yes Yes 24 \n",
"7039 2234-XADUH Female 0 Yes Yes 72 \n",
"7040 4801-JZAZL Female 0 Yes Yes 11 \n",
"7041 8361-LTMKD Male 1 Yes No 4 \n",
"7042 3186-AJIEK Male 0 No No 66 \n",
"\n",
" PhoneService MultipleLines InternetService OnlineSecurity \\\n",
"0 No No phone service DSL No \n",
"1 Yes No DSL Yes \n",
"2 Yes No DSL Yes \n",
"3 No No phone service DSL Yes \n",
"4 Yes No Fiber optic No \n",
"... ... ... ... ... \n",
"7038 Yes Yes DSL Yes \n",
"7039 Yes Yes Fiber optic No \n",
"7040 No No phone service DSL Yes \n",
"7041 Yes Yes Fiber optic No \n",
"7042 Yes No Fiber optic Yes \n",
"\n",
" OnlineBackup DeviceProtection TechSupport StreamingTV StreamingMovies \\\n",
"0 Yes No No No No \n",
"1 No Yes No No No \n",
"2 Yes No No No No \n",
"3 No Yes Yes No No \n",
"4 No No No No No \n",
"... ... ... ... ... ... \n",
"7038 No Yes Yes Yes Yes \n",
"7039 Yes Yes No Yes Yes \n",
"7040 No No No No No \n",
"7041 No No No No No \n",
"7042 No Yes Yes Yes Yes \n",
"\n",
" Contract PaperlessBilling PaymentMethod \\\n",
"0 Month-to-month Yes Electronic check \n",
"1 One year No Mailed check \n",
"2 Month-to-month Yes Mailed check \n",
"3 One year No Bank transfer (automatic) \n",
"4 Month-to-month Yes Electronic check \n",
"... ... ... ... \n",
"7038 One year Yes Mailed check \n",
"7039 One year Yes Credit card (automatic) \n",
"7040 Month-to-month Yes Electronic check \n",
"7041 Month-to-month Yes Mailed check \n",
"7042 Two year Yes Bank transfer (automatic) \n",
"\n",
" MonthlyCharges TotalCharges Churn \n",
"0 29.85 29.85 No \n",
"1 56.95 1889.5 No \n",
"2 53.85 108.15 Yes \n",
"3 42.30 1840.75 No \n",
"4 70.70 151.65 Yes \n",
"... ... ... ... \n",
"7038 84.80 1990.5 No \n",
"7039 103.20 7362.9 No \n",
"7040 29.60 346.45 No \n",
"7041 74.40 306.6 Yes \n",
"7042 105.65 6844.5 No \n",
"\n",
"[7043 rows x 21 columns]"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pd.read_csv(path_file)\n",
"df"
]
},
{
"cell_type": "markdown",
"id": "82bd0ec8-f6ee-49cc-9f01-110dbcf8bef8",
"metadata": {},
"source": [
"Der Datensatz enthält Informationen über:\n",
"\n",
"- ob der Kunde innerhalb des letzten Monats gekündigt hat - Spalte \"Churn\"\n",
"\n",
"- Dienste, für die sich jeder Kunde angemeldet hat - \"PhoneService\", \"MultipleLines\", \"InternetService\", \"OnlineSecurity\", \"OnlineBackup\", \"DeviceProtection\", \"TechSupport\" sowie \"StreamingTV\" und \"StreamingMovies\". \n",
"\n",
"- Informationen zum Kundenkonto - \"tenure\" wie lange der Kunde bereits Kunde ist, \"Contract\" (Vertrag), \"PaperlessBilling\" (papierlose Abrechnung), \"PaymentMethod\" (Zahlungsmethode), \"MonthlyCharges\" (monatliche Gebühren) und \"TotalCharges\" (Gesamtgebühren).\n",
"\n",
"- Demografische Informationen über Kunden - \"gender\" (Geschlecht), \"SeniorCitizen\" (Alter) und ob sie \"Partner\" (Partner) und \"Dependents\" (Familienangehörige) haben."
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "3591c412-b42d-40a0-ade8-2c9ae2564919",
"metadata": {
"execution": {
"iopub.execute_input": "2026-03-24T18:07:35.018396Z",
"iopub.status.busy": "2026-03-24T18:07:35.018309Z",
"iopub.status.idle": "2026-03-24T18:07:35.020176Z",
"shell.execute_reply": "2026-03-24T18:07:35.019802Z",
"shell.execute_reply.started": "2026-03-24T18:07:35.018387Z"
}
},
"outputs": [],
"source": [
"# Für Reproduzierbarkeit definieren wir den selben Seed für den Randomisierung \n",
"\n",
"np.random.seed(42)\n",
"random.seed(42)"
]
},
{
"cell_type": "markdown",
"id": "abb21a06-cbc3-4615-a6a4-081bfd3232c2",
"metadata": {},
"source": [
"### Stichtag \n",
"\n",
"Für unsere Simulation definieren wir einen Stichtag - das ist der Tag, an dem der Datensatz gezogen wird. Das ist auch die Referenz für das Startdatum der Tenure eines Kunden, somit dürften keine Datenpunkte vor dem berechneten Start-datum oder nach diesem Stichtag kreiert werden. \n",
"\n",
"Der Stichtag in diesem Beispiel-Code ist der 2024-12-31. "
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "74ad941f-d4f3-4644-b9e5-421b669573b8",
"metadata": {
"execution": {
"iopub.execute_input": "2026-03-24T18:07:35.020490Z",
"iopub.status.busy": "2026-03-24T18:07:35.020432Z",
"iopub.status.idle": "2026-03-24T18:07:35.024156Z",
"shell.execute_reply": "2026-03-24T18:07:35.023800Z",
"shell.execute_reply.started": "2026-03-24T18:07:35.020484Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"Timestamp('2024-12-31 00:00:00')"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# 2. Stichtag definieren\n",
"stichtag = pd.to_datetime(\"2024-12-31\")\n",
"stichtag"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "e38d6d92-ec80-4044-8bf9-2701d00bc601",
"metadata": {
"execution": {
"iopub.execute_input": "2026-03-24T18:07:35.024495Z",
"iopub.status.busy": "2026-03-24T18:07:35.024427Z",
"iopub.status.idle": "2026-03-24T18:07:35.078846Z",
"shell.execute_reply": "2026-03-24T18:07:35.078414Z",
"shell.execute_reply.started": "2026-03-24T18:07:35.024488Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" customerID | \n",
" tenure | \n",
" contract_start | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" 7590-VHVEG | \n",
" 1 | \n",
" 2024-11-30 | \n",
"
\n",
" \n",
" | 1 | \n",
" 5575-GNVDE | \n",
" 34 | \n",
" 2022-02-28 | \n",
"
\n",
" \n",
" | 2 | \n",
" 3668-QPYBK | \n",
" 2 | \n",
" 2024-10-31 | \n",
"
\n",
" \n",
" | 3 | \n",
" 7795-CFOCW | \n",
" 45 | \n",
" 2021-03-31 | \n",
"
\n",
" \n",
" | 4 | \n",
" 9237-HQITU | \n",
" 2 | \n",
" 2024-10-31 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" customerID tenure contract_start\n",
"0 7590-VHVEG 1 2024-11-30\n",
"1 5575-GNVDE 34 2022-02-28\n",
"2 3668-QPYBK 2 2024-10-31\n",
"3 7795-CFOCW 45 2021-03-31\n",
"4 9237-HQITU 2 2024-10-31"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Berechne den Vertragsstart als: stichtag - tenure in Monaten\n",
"df[\"contract_start\"] = df[\"tenure\"].apply(lambda t: stichtag - relativedelta(months=int(t)))\n",
"df[[\"customerID\", \"tenure\", \"contract_start\"]].head()"
]
},
{
"cell_type": "markdown",
"id": "fe754f01-2a94-4068-a092-40c30b2e4f49",
"metadata": {},
"source": [
"## Funktion zur Berechnung des Reply-Zeitpunkts\n",
"\n",
"Hier definieren wir eine Funktion get_reply_time, die für einen gegebenen Anfrage-Zeitpunkt (request_time) den Reply-Zeitpunkt simuliert.\n",
"\n",
"Die Bedingungen sind:\n",
"\n",
"- Ist die Anfrage am Stichtag, erfolgt die Antwort zwangsläufig am selben Tag.\n",
"- Sonst wird zufällig entschieden, ob die Antwort am selben Tag oder am nächsten Tag erfolgt.\n",
"- Falls der nächste Tag den Stichtag überschreiten würde, wird ebenfalls der gleiche Tag gewählt."
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "6ce5b666-3c56-438d-b57e-5f9de5c925e7",
"metadata": {
"execution": {
"iopub.execute_input": "2026-03-24T18:07:35.079396Z",
"iopub.status.busy": "2026-03-24T18:07:35.079308Z",
"iopub.status.idle": "2026-03-24T18:07:35.083128Z",
"shell.execute_reply": "2026-03-24T18:07:35.082580Z",
"shell.execute_reply.started": "2026-03-24T18:07:35.079384Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Anfragezeit: 2024-12-30 15:00:00\n",
"Antwortzeit: 2024-12-30 16:13:39\n"
]
}
],
"source": [
"def get_reply_time(request_time, stichtag):\n",
" \"\"\"\n",
" Berechnet einen Reply-Zeitpunkt, der entweder am selben Tag\n",
" oder am Tag nach der Anfrage liegt, unter Berücksichtigung des Stichtags.\n",
" \"\"\"\n",
" # Wenn die Anfrage bereits am Stichtag erfolgt, muss die Antwort am selben Tag erfolgen.\n",
" if request_time.date() >= stichtag.date():\n",
" reply_day_offset = 0\n",
" else:\n",
" reply_day_offset = random.choice([0, 1])\n",
" \n",
" if reply_day_offset == 0:\n",
" # Antwort am selben Tag: Wähle eine zufällige Verzögerung, sodass die Antwort\n",
" # zwischen 1 Stunde nach der Anfrage und bis zum Tagesende erfolgt.\n",
" end_of_day = request_time.replace(hour=23, minute=59, second=59)\n",
" max_delay_seconds = int((end_of_day - request_time).total_seconds())\n",
" # Falls genügend Zeit vorhanden ist, mindestens 1 Stunde Verzögerung, sonst minimal\n",
" min_delay = 3600 if max_delay_seconds >= 3600 else 60\n",
" delay_seconds = random.randint(min_delay, max_delay_seconds) if max_delay_seconds >= min_delay else 0\n",
" reply_time = request_time + timedelta(seconds=delay_seconds)\n",
" else:\n",
" # Antwort am nächsten Tag: Stelle sicher, dass der nächste Tag den Stichtag nicht überschreitet.\n",
" next_day = request_time + timedelta(days=1)\n",
" if next_day > stichtag:\n",
" # Fallback: Antworte am selben Tag\n",
" return get_reply_time(request_time, stichtag)\n",
" # Wähle eine zufällige Uhrzeit am nächsten Tag\n",
" random_hour = random.randint(0, 23)\n",
" random_minute = random.randint(0, 59)\n",
" random_second = random.randint(0, 59)\n",
" reply_time = datetime.combine(next_day.date(), datetime.min.time()) + timedelta(hours=random_hour, minutes=random_minute, seconds=random_second)\n",
" \n",
" return reply_time\n",
"\n",
"# Teste die Funktion einmal:\n",
"test_request = datetime(2024, 12, 30, 15, 0, 0)\n",
"print(\"Anfragezeit:\", test_request)\n",
"print(\"Antwortzeit:\", get_reply_time(test_request, stichtag))"
]
},
{
"cell_type": "markdown",
"id": "0b4df1b4-b8b8-4ed3-8528-a4414189d1b1",
"metadata": {},
"source": [
"### Vor-test: Simuliere Interaktionen für einen einzelnen Kunden (Beispiel anhand des ersten Kunden)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "38bf9fd8-d574-4041-8114-ad1c14ff3b90",
"metadata": {
"execution": {
"iopub.execute_input": "2026-03-24T18:07:35.083544Z",
"iopub.status.busy": "2026-03-24T18:07:35.083462Z",
"iopub.status.idle": "2026-03-24T18:07:35.086881Z",
"shell.execute_reply": "2026-03-24T18:07:35.086505Z",
"shell.execute_reply.started": "2026-03-24T18:07:35.083536Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"customerID 7590-VHVEG\n",
"gender Female\n",
"SeniorCitizen 0\n",
"Partner Yes\n",
"Dependents No\n",
"tenure 1\n",
"PhoneService No\n",
"MultipleLines No phone service\n",
"InternetService DSL\n",
"OnlineSecurity No\n",
"OnlineBackup Yes\n",
"DeviceProtection No\n",
"TechSupport No\n",
"StreamingTV No\n",
"StreamingMovies No\n",
"Contract Month-to-month\n",
"PaperlessBilling Yes\n",
"PaymentMethod Electronic check\n",
"MonthlyCharges 29.85\n",
"TotalCharges 29.85\n",
"Churn No\n",
"contract_start 2024-11-30 00:00:00\n",
"Name: 0, dtype: object"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Wähle den ersten Kunden aus dem Datensatz\n",
"customer_row = df.iloc[0]\n",
"customer_id = customer_row[\"customerID\"]\n",
"contract_start = customer_row[\"contract_start\"]\n",
"customer_row"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "c540e732-d145-4263-a74d-c0c4332bbf98",
"metadata": {
"execution": {
"iopub.execute_input": "2026-03-24T18:07:35.087309Z",
"iopub.status.busy": "2026-03-24T18:07:35.087242Z",
"iopub.status.idle": "2026-03-24T18:07:35.089525Z",
"shell.execute_reply": "2026-03-24T18:07:35.088942Z",
"shell.execute_reply.started": "2026-03-24T18:07:35.087302Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Anzahl der Tickets für den Kunden: 4\n"
]
}
],
"source": [
"# Bestimme zufällig die Anzahl der Tickets (z.B. mittels einer Poisson-Verteilung)\n",
"num_tickets = np.random.poisson(lam=2)\n",
"print(\"Anzahl der Tickets für den Kunden:\", num_tickets)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "fa7516f6-4659-4c1a-a293-f35f38d68996",
"metadata": {
"execution": {
"iopub.execute_input": "2026-03-24T18:07:35.090046Z",
"iopub.status.busy": "2026-03-24T18:07:35.089973Z",
"iopub.status.idle": "2026-03-24T18:07:35.091866Z",
"shell.execute_reply": "2026-03-24T18:07:35.091323Z",
"shell.execute_reply.started": "2026-03-24T18:07:35.090038Z"
}
},
"outputs": [],
"source": [
"# Wir definieren die verschiedenen Kanäle, die für den Kunden zur Verfügung steht, den Kundenservice zu erreichen\n",
"\n",
"list_channels = [\"Email\", \"Hotline\", \"On-site\", \"Chat\", \"Social Media\"]"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "03225bee-4676-4e20-ab1a-c98b0ff4d76c",
"metadata": {
"execution": {
"iopub.execute_input": "2026-03-24T18:07:35.092376Z",
"iopub.status.busy": "2026-03-24T18:07:35.092286Z",
"iopub.status.idle": "2026-03-24T18:07:35.094180Z",
"shell.execute_reply": "2026-03-24T18:07:35.093738Z",
"shell.execute_reply.started": "2026-03-24T18:07:35.092370Z"
}
},
"outputs": [],
"source": [
"# Der Einfachheitshalber definieren wir für diese Simulation die möglichen Optionen für Request-Text und Reply-Text vor: \n",
"\n",
"# Liste an request-Optionen (die häufigsten Request-Gründe)\n",
"list_request = [\"Problem mit Rechnung\", \"Internet funktioniert nicht\", \"Frage zum Vertrag\", \"Beschwerde über Kundenservice\", \"Technische Störung\"]\n",
"\n",
"# Liste an request-Optionen (die häufigsten Request-Gründe)\n",
"list_reply = [\"Problem gelöst\", \"Weitere Informationen benötigt\", \"Ticket geschlossen\", \"Anfrage wird bearbeitet\"]"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "5fb84bb8-6485-44b6-930c-c09039de518f",
"metadata": {
"execution": {
"iopub.execute_input": "2026-03-24T18:07:35.094511Z",
"iopub.status.busy": "2026-03-24T18:07:35.094434Z",
"iopub.status.idle": "2026-03-24T18:07:35.096172Z",
"shell.execute_reply": "2026-03-24T18:07:35.095739Z",
"shell.execute_reply.started": "2026-03-24T18:07:35.094503Z"
}
},
"outputs": [],
"source": [
"# Wir definieren die möglichen Zustände eines Tickets\n",
"list_zustand = [\"gelöst\", \"geschlossen\"]"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "df14e240-f187-4657-94bb-0da546426202",
"metadata": {
"execution": {
"iopub.execute_input": "2026-03-24T18:07:35.096515Z",
"iopub.status.busy": "2026-03-24T18:07:35.096450Z",
"iopub.status.idle": "2026-03-24T18:07:35.098172Z",
"shell.execute_reply": "2026-03-24T18:07:35.097825Z",
"shell.execute_reply.started": "2026-03-24T18:07:35.096509Z"
}
},
"outputs": [],
"source": [
"# Initiere eine Liste, um die Interaktionen dieses Kunden zu speichern, sowie ticket-counter\n",
"interactions = []\n",
"ticket_counter = 0"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "bf21e04e-1d18-4cee-bcf9-65e7a6634ebc",
"metadata": {
"execution": {
"iopub.execute_input": "2026-03-24T18:07:35.098551Z",
"iopub.status.busy": "2026-03-24T18:07:35.098474Z",
"iopub.status.idle": "2026-03-24T18:07:35.102433Z",
"shell.execute_reply": "2026-03-24T18:07:35.101974Z",
"shell.execute_reply.started": "2026-03-24T18:07:35.098544Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'ticket_id': '7590-VHVEG_T1', 'customer_id': '7590-VHVEG', 'time_request': Timestamp('2024-12-17 00:00:00'), 'time_reply': Timestamp('2024-12-17 06:04:49'), 'channel': 'Email', 'request': 'Technische Störung', 'reply': 'Problem gelöst', 'solved': True, 'original_request': True, 'original_ticket_id': None}\n",
"{'ticket_id': '7590-VHVEG_T2', 'customer_id': '7590-VHVEG', 'time_request': Timestamp('2024-12-27 00:00:00'), 'time_reply': Timestamp('2024-12-27 04:24:40'), 'channel': 'Hotline', 'request': 'Internet funktioniert nicht', 'reply': 'Problem gelöst', 'solved': True, 'original_request': True, 'original_ticket_id': None}\n",
"{'ticket_id': '7590-VHVEG_T3', 'customer_id': '7590-VHVEG', 'time_request': Timestamp('2024-12-12 00:00:00'), 'time_reply': datetime.datetime(2024, 12, 13, 7, 28, 37), 'channel': 'On-site', 'request': 'Problem mit Rechnung', 'reply': 'Weitere Informationen benötigt', 'solved': False, 'original_request': True, 'original_ticket_id': None}\n",
"{'ticket_id': '7590-VHVEG_T3', 'customer_id': '7590-VHVEG', 'time_request': datetime.datetime(2024, 12, 14, 7, 28, 37), 'time_reply': datetime.datetime(2024, 12, 15, 8, 9, 13), 'channel': 'On-site', 'request': 'Problem mit Rechnung', 'reply': 'Problem gelöst', 'solved': True, 'original_request': False, 'original_ticket_id': '7590-VHVEG_T3'}\n",
"{'ticket_id': '7590-VHVEG_T3', 'customer_id': '7590-VHVEG', 'time_request': datetime.datetime(2024, 12, 16, 8, 9, 13), 'time_reply': datetime.datetime(2024, 12, 16, 15, 41, 19), 'channel': 'On-site', 'request': 'Technische Störung', 'reply': 'Ticket geschlossen', 'solved': True, 'original_request': False, 'original_ticket_id': '7590-VHVEG_T3'}\n",
"{'ticket_id': '7590-VHVEG_T4', 'customer_id': '7590-VHVEG', 'time_request': Timestamp('2024-12-02 00:00:00'), 'time_reply': datetime.datetime(2024, 12, 3, 17, 7, 59), 'channel': 'Chat', 'request': 'Problem mit Rechnung', 'reply': 'Ticket geschlossen', 'solved': True, 'original_request': True, 'original_ticket_id': None}\n",
"{'ticket_id': '7590-VHVEG_T4', 'customer_id': '7590-VHVEG', 'time_request': datetime.datetime(2024, 12, 4, 17, 7, 59), 'time_reply': datetime.datetime(2024, 12, 4, 18, 45, 58), 'channel': 'Email', 'request': 'Internet funktioniert nicht', 'reply': 'Ticket geschlossen', 'solved': True, 'original_request': False, 'original_ticket_id': '7590-VHVEG_T4'}\n",
"{'ticket_id': '7590-VHVEG_T4', 'customer_id': '7590-VHVEG', 'time_request': datetime.datetime(2024, 12, 4, 18, 45, 58), 'time_reply': datetime.datetime(2024, 12, 4, 23, 42, 34), 'channel': 'Email', 'request': 'Beschwerde über Kundenservice', 'reply': 'Ticket geschlossen', 'solved': True, 'original_request': False, 'original_ticket_id': '7590-VHVEG_T4'}\n"
]
}
],
"source": [
"# Für diesen ersten Testkunden erstellen wir nun für die definierte Anzahl an Tickets die Inhalte:\n",
"\n",
"for t in range(num_tickets):\n",
" ticket_counter += 1\n",
" ticket_id = f\"{customer_id}_T{ticket_counter}\"\n",
" \n",
" # Zufälliger Startzeitpunkt des Tickets zwischen Vertragsstart und Stichtag\n",
" total_days = (stichtag - contract_start).days\n",
" ticket_start_date = contract_start + timedelta(days=random.randint(0, total_days))\n",
" \n",
" # Setze den initialen \"last_reply_time\" auf ticket_start_date, wird im Laufe des Prozesses geupdated\n",
" last_reply_time = ticket_start_date \n",
" \n",
" # Bestimme zufällig, wie viele Interaktionen (1 bis 3) dieses Ticket hat\n",
" num_interactions = random.randint(1, 3)\n",
" \n",
" for i in range(num_interactions):\n",
" is_original = (i == 0)\n",
" if is_original:\n",
" request_time = ticket_start_date\n",
" else:\n",
" # Für Folgeinteraktionen: Anfragezeit ist einige Tage nach der letzten Antwort,\n",
" # aber achte darauf, dass wir nicht über den Stichtag hinausgehen.\n",
" max_days = (stichtag - last_reply_time).days\n",
" \n",
" # Wenn max_days 0 ist, muss es am selben Tag bleiben\n",
" additional_days = random.randint(0, 1) if max_days >= 1 else 0\n",
" request_time = last_reply_time + timedelta(days=additional_days)\n",
" \n",
" # Berechne den Reply-Zeitpunkt mit der neuen Funktion\n",
" time_reply = get_reply_time(request_time, stichtag)\n",
" last_reply_time = time_reply # Update für die nächste Iteration\n",
" \n",
" channel = random.choice(list_channels)\n",
" request_text = random.choice(list_request)\n",
" reply_text = random.choice(list_reply)\n",
" solved = any(keyword in reply_text.lower() for keyword in list_zustand)\n",
" \n",
" interaction = {\n",
" \"ticket_id\": ticket_id,\n",
" \"customer_id\": customer_id,\n",
" \"time_request\": request_time,\n",
" \"time_reply\": time_reply,\n",
" \"channel\": channel,\n",
" \"request\": request_text,\n",
" \"reply\": reply_text,\n",
" \"solved\": solved,\n",
" \"original_request\": is_original,\n",
" \"original_ticket_id\": None if is_original else ticket_id\n",
" }\n",
" print(interaction)\n",
" interactions.append(interaction)"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "91fdda9b-cdc7-45c9-86f2-0a9270600dfe",
"metadata": {
"execution": {
"iopub.execute_input": "2026-03-24T18:07:35.102808Z",
"iopub.status.busy": "2026-03-24T18:07:35.102738Z",
"iopub.status.idle": "2026-03-24T18:07:35.109776Z",
"shell.execute_reply": "2026-03-24T18:07:35.109246Z",
"shell.execute_reply.started": "2026-03-24T18:07:35.102802Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" ticket_id | \n",
" customer_id | \n",
" time_request | \n",
" time_reply | \n",
" channel | \n",
" request | \n",
" reply | \n",
" solved | \n",
" original_request | \n",
" original_ticket_id | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" 7590-VHVEG_T1 | \n",
" 7590-VHVEG | \n",
" 2024-12-17 00:00:00 | \n",
" 2024-12-17 06:04:49 | \n",
" Email | \n",
" Technische Störung | \n",
" Problem gelöst | \n",
" True | \n",
" True | \n",
" NaN | \n",
"
\n",
" \n",
" | 1 | \n",
" 7590-VHVEG_T2 | \n",
" 7590-VHVEG | \n",
" 2024-12-27 00:00:00 | \n",
" 2024-12-27 04:24:40 | \n",
" Hotline | \n",
" Internet funktioniert nicht | \n",
" Problem gelöst | \n",
" True | \n",
" True | \n",
" NaN | \n",
"
\n",
" \n",
" | 2 | \n",
" 7590-VHVEG_T3 | \n",
" 7590-VHVEG | \n",
" 2024-12-12 00:00:00 | \n",
" 2024-12-13 07:28:37 | \n",
" On-site | \n",
" Problem mit Rechnung | \n",
" Weitere Informationen benötigt | \n",
" False | \n",
" True | \n",
" NaN | \n",
"
\n",
" \n",
" | 3 | \n",
" 7590-VHVEG_T3 | \n",
" 7590-VHVEG | \n",
" 2024-12-14 07:28:37 | \n",
" 2024-12-15 08:09:13 | \n",
" On-site | \n",
" Problem mit Rechnung | \n",
" Problem gelöst | \n",
" True | \n",
" False | \n",
" 7590-VHVEG_T3 | \n",
"
\n",
" \n",
" | 4 | \n",
" 7590-VHVEG_T3 | \n",
" 7590-VHVEG | \n",
" 2024-12-16 08:09:13 | \n",
" 2024-12-16 15:41:19 | \n",
" On-site | \n",
" Technische Störung | \n",
" Ticket geschlossen | \n",
" True | \n",
" False | \n",
" 7590-VHVEG_T3 | \n",
"
\n",
" \n",
" | 5 | \n",
" 7590-VHVEG_T4 | \n",
" 7590-VHVEG | \n",
" 2024-12-02 00:00:00 | \n",
" 2024-12-03 17:07:59 | \n",
" Chat | \n",
" Problem mit Rechnung | \n",
" Ticket geschlossen | \n",
" True | \n",
" True | \n",
" NaN | \n",
"
\n",
" \n",
" | 6 | \n",
" 7590-VHVEG_T4 | \n",
" 7590-VHVEG | \n",
" 2024-12-04 17:07:59 | \n",
" 2024-12-04 18:45:58 | \n",
" Email | \n",
" Internet funktioniert nicht | \n",
" Ticket geschlossen | \n",
" True | \n",
" False | \n",
" 7590-VHVEG_T4 | \n",
"
\n",
" \n",
" | 7 | \n",
" 7590-VHVEG_T4 | \n",
" 7590-VHVEG | \n",
" 2024-12-04 18:45:58 | \n",
" 2024-12-04 23:42:34 | \n",
" Email | \n",
" Beschwerde über Kundenservice | \n",
" Ticket geschlossen | \n",
" True | \n",
" False | \n",
" 7590-VHVEG_T4 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" ticket_id customer_id time_request time_reply channel \\\n",
"0 7590-VHVEG_T1 7590-VHVEG 2024-12-17 00:00:00 2024-12-17 06:04:49 Email \n",
"1 7590-VHVEG_T2 7590-VHVEG 2024-12-27 00:00:00 2024-12-27 04:24:40 Hotline \n",
"2 7590-VHVEG_T3 7590-VHVEG 2024-12-12 00:00:00 2024-12-13 07:28:37 On-site \n",
"3 7590-VHVEG_T3 7590-VHVEG 2024-12-14 07:28:37 2024-12-15 08:09:13 On-site \n",
"4 7590-VHVEG_T3 7590-VHVEG 2024-12-16 08:09:13 2024-12-16 15:41:19 On-site \n",
"5 7590-VHVEG_T4 7590-VHVEG 2024-12-02 00:00:00 2024-12-03 17:07:59 Chat \n",
"6 7590-VHVEG_T4 7590-VHVEG 2024-12-04 17:07:59 2024-12-04 18:45:58 Email \n",
"7 7590-VHVEG_T4 7590-VHVEG 2024-12-04 18:45:58 2024-12-04 23:42:34 Email \n",
"\n",
" request reply solved \\\n",
"0 Technische Störung Problem gelöst True \n",
"1 Internet funktioniert nicht Problem gelöst True \n",
"2 Problem mit Rechnung Weitere Informationen benötigt False \n",
"3 Problem mit Rechnung Problem gelöst True \n",
"4 Technische Störung Ticket geschlossen True \n",
"5 Problem mit Rechnung Ticket geschlossen True \n",
"6 Internet funktioniert nicht Ticket geschlossen True \n",
"7 Beschwerde über Kundenservice Ticket geschlossen True \n",
"\n",
" original_request original_ticket_id \n",
"0 True NaN \n",
"1 True NaN \n",
"2 True NaN \n",
"3 False 7590-VHVEG_T3 \n",
"4 False 7590-VHVEG_T3 \n",
"5 True NaN \n",
"6 False 7590-VHVEG_T4 \n",
"7 False 7590-VHVEG_T4 "
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Erstelle einen DataFrame für diesen Kunden\n",
"interactions_df = pd.DataFrame(interactions)\n",
"interactions_df"
]
},
{
"cell_type": "markdown",
"id": "9b2ff718-0036-448b-a632-c992908d95ff",
"metadata": {},
"source": [
"### Simuliere/Kreiere Interaktionen für alle Kunden"
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "65e737ea-1dc9-40d6-9419-55acca354faa",
"metadata": {
"execution": {
"iopub.execute_input": "2026-03-24T18:07:35.110219Z",
"iopub.status.busy": "2026-03-24T18:07:35.110142Z",
"iopub.status.idle": "2026-03-24T18:07:35.447056Z",
"shell.execute_reply": "2026-03-24T18:07:35.446549Z",
"shell.execute_reply.started": "2026-03-24T18:07:35.110212Z"
},
"scrolled": true
},
"outputs": [],
"source": [
"all_interactions = []\n",
"\n",
"for _, row in df.iterrows():\n",
" customer_id = row[\"customerID\"]\n",
" contract_start = row[\"contract_start\"]\n",
" num_tickets = np.random.poisson(lam=2)\n",
" ticket_counter = 0\n",
" \n",
" for t in range(num_tickets):\n",
" ticket_counter += 1\n",
" ticket_id = f\"{customer_id}_T{ticket_counter}\"\n",
" num_interactions = random.randint(1, 3)\n",
" total_days = (stichtag - contract_start).days\n",
" ticket_start_date = contract_start + timedelta(days=random.randint(0, total_days))\n",
" last_reply_time = ticket_start_date\n",
" \n",
" for i in range(num_interactions):\n",
" is_original = (i == 0)\n",
" if is_original:\n",
" request_time = ticket_start_date\n",
" else:\n",
" max_days = (stichtag - last_reply_time).days\n",
" additional_days = random.randint(0, 1) if max_days >= 1 else 0\n",
" request_time = last_reply_time + timedelta(days=additional_days)\n",
" \n",
" time_reply = get_reply_time(request_time, stichtag)\n",
" last_reply_time = time_reply\n",
" channel = random.choice(list_channels)\n",
" request_text = random.choice(list_request)\n",
" reply_text = random.choice(list_reply)\n",
" \n",
" solved = any(keyword in reply_text.lower() for keyword in list_zustand)\n",
" \n",
" interaction = {\n",
" \"ticket_id\": ticket_id,\n",
" \"customer_id\": customer_id,\n",
" \"time_request\": request_time,\n",
" \"time_reply\": time_reply,\n",
" \"channel\": channel,\n",
" \"request\": request_text,\n",
" \"reply\": reply_text,\n",
" \"solved\": solved,\n",
" \"original_request\": is_original,\n",
" \"original_ticket_id\": None if is_original else ticket_id\n",
" }\n",
" all_interactions.append(interaction)"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "9580a313-ee4e-4d39-abf1-2ac5f4bf7115",
"metadata": {
"execution": {
"iopub.execute_input": "2026-03-24T18:07:35.447581Z",
"iopub.status.busy": "2026-03-24T18:07:35.447470Z",
"iopub.status.idle": "2026-03-24T18:07:35.482039Z",
"shell.execute_reply": "2026-03-24T18:07:35.481426Z",
"shell.execute_reply.started": "2026-03-24T18:07:35.447570Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(28158, 10)\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" ticket_id | \n",
" customer_id | \n",
" time_request | \n",
" time_reply | \n",
" channel | \n",
" request | \n",
" reply | \n",
" solved | \n",
" original_request | \n",
" original_ticket_id | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" 7590-VHVEG_T1 | \n",
" 7590-VHVEG | \n",
" 2024-12-23 00:00:00 | \n",
" 2024-12-23 14:28:40 | \n",
" On-site | \n",
" Internet funktioniert nicht | \n",
" Ticket geschlossen | \n",
" True | \n",
" True | \n",
" NaN | \n",
"
\n",
" \n",
" | 1 | \n",
" 7590-VHVEG_T1 | \n",
" 7590-VHVEG | \n",
" 2024-12-23 14:28:40 | \n",
" 2024-12-23 20:20:22 | \n",
" Hotline | \n",
" Internet funktioniert nicht | \n",
" Anfrage wird bearbeitet | \n",
" False | \n",
" False | \n",
" 7590-VHVEG_T1 | \n",
"
\n",
" \n",
" | 2 | \n",
" 5575-GNVDE_T1 | \n",
" 5575-GNVDE | \n",
" 2023-09-03 00:00:00 | \n",
" 2023-09-03 12:48:24 | \n",
" Email | \n",
" Internet funktioniert nicht | \n",
" Problem gelöst | \n",
" True | \n",
" True | \n",
" NaN | \n",
"
\n",
" \n",
" | 3 | \n",
" 5575-GNVDE_T1 | \n",
" 5575-GNVDE | \n",
" 2023-09-04 12:48:24 | \n",
" 2023-09-05 08:04:13 | \n",
" Social Media | \n",
" Frage zum Vertrag | \n",
" Weitere Informationen benötigt | \n",
" False | \n",
" False | \n",
" 5575-GNVDE_T1 | \n",
"
\n",
" \n",
" | 4 | \n",
" 5575-GNVDE_T2 | \n",
" 5575-GNVDE | \n",
" 2024-12-16 00:00:00 | \n",
" 2024-12-17 20:29:09 | \n",
" On-site | \n",
" Internet funktioniert nicht | \n",
" Weitere Informationen benötigt | \n",
" False | \n",
" True | \n",
" NaN | \n",
"
\n",
" \n",
" | 5 | \n",
" 5575-GNVDE_T2 | \n",
" 5575-GNVDE | \n",
" 2024-12-18 20:29:09 | \n",
" 2024-12-19 18:25:23 | \n",
" Hotline | \n",
" Internet funktioniert nicht | \n",
" Anfrage wird bearbeitet | \n",
" False | \n",
" False | \n",
" 5575-GNVDE_T2 | \n",
"
\n",
" \n",
" | 6 | \n",
" 5575-GNVDE_T2 | \n",
" 5575-GNVDE | \n",
" 2024-12-19 18:25:23 | \n",
" 2024-12-19 20:25:15 | \n",
" Hotline | \n",
" Internet funktioniert nicht | \n",
" Anfrage wird bearbeitet | \n",
" False | \n",
" False | \n",
" 5575-GNVDE_T2 | \n",
"
\n",
" \n",
" | 7 | \n",
" 5575-GNVDE_T3 | \n",
" 5575-GNVDE | \n",
" 2022-07-08 00:00:00 | \n",
" 2022-07-09 12:38:29 | \n",
" Social Media | \n",
" Frage zum Vertrag | \n",
" Problem gelöst | \n",
" True | \n",
" True | \n",
" NaN | \n",
"
\n",
" \n",
" | 8 | \n",
" 5575-GNVDE_T3 | \n",
" 5575-GNVDE | \n",
" 2022-07-09 12:38:29 | \n",
" 2022-07-10 20:21:07 | \n",
" On-site | \n",
" Beschwerde über Kundenservice | \n",
" Weitere Informationen benötigt | \n",
" False | \n",
" False | \n",
" 5575-GNVDE_T3 | \n",
"
\n",
" \n",
" | 9 | \n",
" 5575-GNVDE_T3 | \n",
" 5575-GNVDE | \n",
" 2022-07-11 20:21:07 | \n",
" 2022-07-11 22:33:02 | \n",
" Social Media | \n",
" Internet funktioniert nicht | \n",
" Problem gelöst | \n",
" True | \n",
" False | \n",
" 5575-GNVDE_T3 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" ticket_id customer_id time_request time_reply \\\n",
"0 7590-VHVEG_T1 7590-VHVEG 2024-12-23 00:00:00 2024-12-23 14:28:40 \n",
"1 7590-VHVEG_T1 7590-VHVEG 2024-12-23 14:28:40 2024-12-23 20:20:22 \n",
"2 5575-GNVDE_T1 5575-GNVDE 2023-09-03 00:00:00 2023-09-03 12:48:24 \n",
"3 5575-GNVDE_T1 5575-GNVDE 2023-09-04 12:48:24 2023-09-05 08:04:13 \n",
"4 5575-GNVDE_T2 5575-GNVDE 2024-12-16 00:00:00 2024-12-17 20:29:09 \n",
"5 5575-GNVDE_T2 5575-GNVDE 2024-12-18 20:29:09 2024-12-19 18:25:23 \n",
"6 5575-GNVDE_T2 5575-GNVDE 2024-12-19 18:25:23 2024-12-19 20:25:15 \n",
"7 5575-GNVDE_T3 5575-GNVDE 2022-07-08 00:00:00 2022-07-09 12:38:29 \n",
"8 5575-GNVDE_T3 5575-GNVDE 2022-07-09 12:38:29 2022-07-10 20:21:07 \n",
"9 5575-GNVDE_T3 5575-GNVDE 2022-07-11 20:21:07 2022-07-11 22:33:02 \n",
"\n",
" channel request \\\n",
"0 On-site Internet funktioniert nicht \n",
"1 Hotline Internet funktioniert nicht \n",
"2 Email Internet funktioniert nicht \n",
"3 Social Media Frage zum Vertrag \n",
"4 On-site Internet funktioniert nicht \n",
"5 Hotline Internet funktioniert nicht \n",
"6 Hotline Internet funktioniert nicht \n",
"7 Social Media Frage zum Vertrag \n",
"8 On-site Beschwerde über Kundenservice \n",
"9 Social Media Internet funktioniert nicht \n",
"\n",
" reply solved original_request original_ticket_id \n",
"0 Ticket geschlossen True True NaN \n",
"1 Anfrage wird bearbeitet False False 7590-VHVEG_T1 \n",
"2 Problem gelöst True True NaN \n",
"3 Weitere Informationen benötigt False False 5575-GNVDE_T1 \n",
"4 Weitere Informationen benötigt False True NaN \n",
"5 Anfrage wird bearbeitet False False 5575-GNVDE_T2 \n",
"6 Anfrage wird bearbeitet False False 5575-GNVDE_T2 \n",
"7 Problem gelöst True True NaN \n",
"8 Weitere Informationen benötigt False False 5575-GNVDE_T3 \n",
"9 Problem gelöst True False 5575-GNVDE_T3 "
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Erstelle den DataFrame für die simulierte \"kunden-service\"-Tabelle\n",
"df_kunden_service = pd.DataFrame(all_interactions)\n",
"\n",
"print(df_kunden_service.shape)\n",
"\n",
"df_kunden_service.head(10)"
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "59579043-f37c-4da7-86f4-616ec14ade94",
"metadata": {
"execution": {
"iopub.execute_input": "2026-03-24T18:07:35.482407Z",
"iopub.status.busy": "2026-03-24T18:07:35.482329Z",
"iopub.status.idle": "2026-03-24T18:07:35.558647Z",
"shell.execute_reply": "2026-03-24T18:07:35.558084Z",
"shell.execute_reply.started": "2026-03-24T18:07:35.482399Z"
}
},
"outputs": [],
"source": [
"# Tabelle abspeichern, um auch anderweitig weiternutzen zu können\n",
"df_kunden_service.to_csv(\"kunden_service.csv\", index=False)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.0"
},
"widgets": {
"application/vnd.jupyter.widget-state+json": {
"state": {},
"version_major": 2,
"version_minor": 0
}
}
},
"nbformat": 4,
"nbformat_minor": 5
}