{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 聚类(Clustering)\n", "使用文心百中语义模型获取embedding之后可以直接用于聚类,本notebook主要使用k-means来演示如何使用其进行聚类,其中数据来源为文心一言生成,详情请见[data_generation](../10-Data-Generation.ipynb)。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. K-means 聚类" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/opt/anaconda3/envs/ernie/lib/python3.10/site-packages/sklearn/cluster/_kmeans.py:1416: FutureWarning: The default value of `n_init` will change from 10 to 'auto' in 1.4. Set the value of `n_init` explicitly to suppress the warning\n", " super()._check_params_vs_input(X, default_n_init=10)\n" ] } ], "source": [ "import numpy as np\n", "import pandas as pd\n", "from sklearn.cluster import KMeans\n", "from typing import List\n", "import random\n", "\n", "embedding = np.load('../data/review_embedding.npy')\n", "original_review = pd.read_csv('../data/reviews.csv')\n", "n_clusters = original_review.type.unique().shape[0]\n", "kmeans = KMeans(n_clusters = n_clusters, init='k-means++', random_state=42)\n", "kmeans.fit(embedding)\n", "original_review['Cluster'] = kmeans.labels_" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[20, 20, 20, 20]\n" ] } ], "source": [ "counts = [0 for _ in range(n_clusters)]\n", "for cluster in original_review['Cluster']:\n", " counts[cluster] += 1\n", "print(counts)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "因为我们预先知道整体为4类评论,您也可以使用肘部图来判断推荐的聚类数目,可以看到embedding的表征效果用于聚类十分好。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. 聚类结果可视化" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Text(0.5, 1.0, 'Clusters visualized in 2d using t-SNE')" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAjMAAAGzCAYAAADaCpaHAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/SrBM8AAAACXBIWXMAAA9hAAAPYQGoP6dpAABOyklEQVR4nO3dfVhU1d4+8HtQGUCEAQRmUFARpXwPS6PMTDF8qV+WqXU8pZX2VHrypTKtk2JPptk5WXkqLUutx46aZedUapFK9kKWIppW5KgI6gwoyKuAiuv3x75mYmAGZmBm9t7D/bmuucg9M3vW3pj7nrW+a22NEEKAiIiISKX85G4AERERUUswzBAREZGqMcwQERGRqjHMEBERkaoxzBAREZGqMcwQERGRqjHMEBERkaoxzBAREZGqMcwQERGRqjHMkGJ17doVU6dOlbsZipSWlgaNRiN3M+y2Q47f27p166DRaJCbm9vo65Ry3uqaOnUqunbt6tXPVOJ5IGoJhhnyumPHjuF//ud/EB8fj4CAAISEhODGG2/Ea6+9hqqqKq+04cKFC0hLS0NGRoZXPo98U35+PhYvXoxBgwYhLCwMHTt2xLBhw/D111/L3TTF+vDDD/Hqq6+69J7c3Fw88MAD6N69OwICAqDX6zF06FAsWrTI5nXDhg2DRqPB7bffbncfGo0G//jHP6zbMjIyoNFoHD42btzYrGMk72srdwOodfniiy8wYcIEaLVa3H///ejTpw8uXryI7777Dk899RSOHDmCt99+2+PtuHDhAhYvXgxA+gdQbf7+979j/vz5cjfDrpycHPj5KfN7krvP23/+8x+89NJLGDduHKZMmYLLly/j/fffx8iRI/Hee+/hgQcecNtnuZOcf38+/PBDHD58GLNnz3bq9UajEddddx0CAwPx4IMPomvXrjCZTMjKysJLL71k/f+4rs8//xz79+/HwIEDnfqMxx9/HNddd12D7cnJyU69n+THMENec+LECdxzzz3o0qULdu3aBYPBYH1uxowZMBqN+OKLL2RsYctVVlaiffv2Hv+ctm3bom1bZf7vq9Vq5W6CQ+4+b7fccgvy8vLQsWNH67ZHHnkEAwYMwMKFCxUbZpT896e+FStWoKKiAtnZ2ejSpYvNc4WFhQ1eHxcXh/LycixevBj//e9/nfqMm266CXfffbdb2kvyUObXJ/JJy5cvR0VFBd59912bIGORkJCAWbNmOXy/o3F+e/US+/btQ2pqKjp27IjAwEB069YNDz74IACpuzkyMhIAsHjxYmuXclpamvX9v//+O+6++26Eh4cjICAA1157bYN/GC2f+8033+Cxxx5DVFQUOnfuDAAoLy/H7Nmz0bVrV2i1WkRFRWHkyJHIyspyeHxbtmyx7q++1atXQ6PR4PDhww7PRXp6OoYMGQKdTofg4GAkJibimWeeafQ8AX92tdcdcvv2228xYcIExMXFQavVIjY2FnPmzHFqGLB+zUxj3fh12+LMOQeAI0eOYPjw4QgMDETnzp3xwgsv4MqVK022C7B/3jQaDWbOnIlPP/0Uffr0gVarRe/evbFjx44m99e7d2+bIANIYW7MmDE4deoUysvLbZ6zfEZAQAD69OmDrVu3OtVuSzvr/h21qH++L126hMWLF6NHjx4ICAhAREQEhgwZgvT0dOtrWnoeMjIycO211yIgIADdu3fH6tWrnarDGTZsGL744gucPHnS+negqXqhY8eOoXPnzg2CDABERUU12NahQwfMmTMHn332WaP/v5FvUUc0J5/w2WefIT4+HjfccINHP6ewsBC33norIiMjMX/+fOh0OuTm5uKTTz4BAERGRuKtt97Co48+ijvvvBN33XUXAKBfv34ApIvljTfeiE6dOmH+/Plo3749Nm/ejHHjxuHjjz/GnXfeafN5jz32GCIjI7Fw4UJUVlYCkL6db9myBTNnzkSvXr1QVFSE7777Dr/99huSkpLstnvs2LEIDg7G5s2bcfPNN9s8t2nTJvTu3Rt9+vSx+94jR47gtttuQ79+/fD8889Dq9XCaDTi+++/b9Y5/Oijj3DhwgU8+uijiIiIwE8//YSVK1fi1KlT+Oijj1za1wcffNBg29///ncUFhYiODjY2n5nzrnZbMYtt9yCy5cvW1/39ttvIzAwsFnHafHdd9/hk08+wWOPPYYOHTrg9ddfx/jx45GXl4eIiAiX92c2mxEUFISgoCDrtq+++grjx49Hr169sHTpUhQVFeGBBx6wBmB3SUtLw9KlSzFt2jQMGjQIZWVl2LdvH7KysjBy5MhG3+vMeThw4ABGjRoFg8GAxYsXo7a2Fs8//7z1C0Jjnn32WZSWluLUqVNYsWIFAFj/DjjSpUsXfP3119i1axeGDx/u1DmYNWsWVqxYgbS0NKd6Z8rLy3Hu3LkG2yMiIlgorRaCyAtKS0sFAHHHHXc4/Z4uXbqIKVOmWP+8aNEiYe+v7Nq1awUAceLECSGEEFu3bhUAxM8//+xw32fPnhUAxKJFixo8N2LECNG3b19RXV1t3XblyhVxww03iB49ejT43CFDhojLly/b7CM0NFTMmDHDySP907333iuioqJs9mcymYSfn594/vnnrdvqn4sVK1YIAOLs2bMO913/PFns3r1bABC7d++2brtw4UKD9y9dulRoNBpx8uRJh+0QouHvrb7ly5cLAOL999+3bnP2nM+ePVsAEHv37rVuKywsFKGhoXaPrT577QUg/P39hdFotG47ePCgACBWrlzZ6P7sOXr0qAgICBD33XefzfYBAwYIg8EgSkpKrNu++uorAUB06dKlyf06+vta/3z3799fjB07ttF9teQ83H777SIoKEicPn3auu3o0aOibdu2dv//rG/s2LFOHa/F4cOHRWBgoAAgBgwYIGbNmiU+/fRTUVlZ2eC1N998s+jdu7cQQojFixcLAGL//v1CCCFOnDghAIiXX37Z+nrL331HD5PJ5HQ7SV4cZiKvKCsrAyB1AXuaTqcDIBUBXrp0yaX3FhcXY9euXZg4caL129q5c+dQVFSE1NRUHD16FKdPn7Z5z/Tp09GmTZsGbdi7dy/OnDnj0udPmjQJhYWFNkM+W7ZswZUrVzBp0iSH77Mc83/+8x+nh1waU7eno7KyEufOncMNN9wAIQQOHDjQ7P3u3r0bCxYswN/+9jfcd999AFw759u2bcP111+PQYMGWfcZGRmJyZMnN7tNAJCSkoLu3btb/9yvXz+EhITg+PHjLu3nwoULmDBhAgIDA7Fs2TLrdpPJhOzsbEyZMgWhoaHW7SNHjkSvXr1a1Pb6dDodjhw5gqNHj7r83qbOQ21tLb7++muMGzcOMTEx1tclJCRg9OjRLW+8Hb1790Z2djb++te/Ijc3F6+99hrGjRuH6OhovPPOOw7fN2vWLISFhdktEK5v4cKFSE9Pb/AIDw9356GQBzHMkFeEhIQAQIMaAk+4+eabMX78eCxevBgdO3bEHXfcgbVr16KmpqbJ9xqNRggh8NxzzyEyMtLmYZkGWr/osFu3bg32s3z5chw+fBixsbEYNGgQ0tLSnLowjho1CqGhodi0aZN126ZNmzBgwAD07NnT4fsmTZqEG2+8EdOmTUN0dDTuuecebN68udnBJi8vD1OnTkV4eDiCg4MRGRlpHfoqLS1t1j5PnTplbecrr7xi3e7KOT958iR69OjRYN+JiYnNapNFXFxcg21hYWE4f/680/uora3FPffcg19//RVbtmyxudifPHkSADzS9vqef/55lJSUoGfPnujbty+eeuopHDp0yKn3NnUeCgsLUVVVhYSEhAavs7fNFWaz2eZRtz6rZ8+e+OCDD3Du3DkcOnQIL774Itq2bYuHH37Y4TT40NBQzJ49G//973+bDOB9+/ZFSkpKg4e/v3+Ljom8h2GGvCIkJAQxMTHWAtbmcDR2XVtb2+B1W7ZsQWZmJmbOnInTp0/jwQcfxMCBA1FRUdHoZ1gu/k8++aTdb2rp6ekN/tG2V68xceJEHD9+HCtXrkRMTAxefvll9O7dG9u3b2/087VaLcaNG4etW7fi8uXLOH36NL7//vtGe2UsbdizZw++/vpr3HfffTh06BAmTZqEkSNHWs+Ps+evtrYWI0eOxBdffIGnn34an376KdLT07Fu3Tqbc+SKixcv4u6774ZWq8XmzZttZtI055y7W/2eNQshhNP7mD59Oj7//HOsW7fO6doOd6j/+xs6dCiOHTuG9957D3369MGaNWuQlJSENWvWNLkvd5yH5jIYDDaPuoHeok2bNujbty8WLFhgLZ7esGGDw33OmjULOp3Oqd4ZUjcWAJPX3HbbbXj77beRmZnZrPUbwsLCAAAlJSXWYRXgz2+99V1//fW4/vrrsWTJEnz44YeYPHkyNm7ciGnTpjm8sMfHxwMA2rVrh5SUFJfbWJfBYMBjjz2Gxx57DIWFhUhKSsKSJUua7I6fNGkS1q9fj507d+K3336DEKLJMAMAfn5+GDFiBEaMGIFXXnkFL774Ip599lns3r0bKSkpNuevrvrn75dffsEff/yB9evX4/7777durzsbxlWPP/44srOzsWfPHkRHR9s858o579Kli93hk5ycnGa3zR2eeuoprF27Fq+++iruvffeBs9bZuK0pO1hYWENfncXL16EyWRq8Nrw8HA88MADeOCBB1BRUYGhQ4ciLS0N06ZNc+qzHImKikJAQACMRmOD5+xts8fR/3v1/3717t270f1ce+21AGD3+C0svTNpaWmYMmWKU+0jdWLPDHnNvHnz0L59e0ybNg0FBQUNnj927Bhee+01h++3jOXv2bPHuq2yshLr16+3ed358+cbfJMcMGAAAFiHmiyzTOpfHKKiojBs2DCsXr3a7j+SZ8+eddg+i9ra2gZDMVFRUYiJiXFqqCslJQXh4eHYtGkTNm3ahEGDBtkdyqqruLi4wbb6x2zv/NXW1jZYpNDy7bzuORRCNPq7aczatWuxevVqvPHGGza1LhaunPMxY8bgxx9/xE8//WTzfGPfzj3t5Zdfxj/+8Q8888wzDpcWMBgMGDBgANavX2/zdyM9PR2//vqrU5/TvXt3m98dALz99tsNemaKiops/hwcHIyEhASn/u41pU2bNkhJScGnn35qUw9mNBqb7HW0aN++vd2hyvpDPJblG7799lu7tW/btm0D0PQw3ezZs6HT6fD888871T5SJ/bMkNd0794dH374ISZNmoSrr77aZgXgH374AR999FGj9/S59dZbERcXh4ceeghPPfUU2rRpg/feew+RkZHIy8uzvm79+vV48803ceedd6J79+4oLy/HO++8g5CQEIwZMwaANCzTq1cvbNq0CT179kR4eDj69OmDPn364I033sCQIUPQt29fTJ8+HfHx8SgoKEBmZiZOnTqFgwcPNnqc5eXl6Ny5M+6++270798fwcHB+Prrr/Hzzz/jn//8Z5PnqV27drjrrruwceNGVFZW2iy/7sjzzz+PPXv2YOzYsejSpQsKCwvx5ptvonPnzhgyZAgA6Zvu9ddfjwULFqC4uBjh4eHYuHEjLl++bLOvq666Ct27d8eTTz6J06dPIyQkBB9//LFL9SMW586dw2OPPYZevXpBq9Xi//7v/2yev/POO9G+fXunz/m8efPwwQcfYNSoUZg1a5Z1anaXLl2crgtxp61bt2LevHno0aMHrr766gbHN3LkSGtP1NKlSzF27FgMGTIEDz74IIqLi7Fy5Ur07t27yeFPAJg2bRoeeeQRjB8/HiNHjsTBgwfx5ZdfNljnplevXhg2bBgGDhyI8PBw7Nu3z7pMgDukpaXhq6++wo033ohHH30UtbW1+Ne//oU+ffogOzu7yfcPHDgQmzZtwty5c3HdddchODjY7u0HLF566SXs378fd911l3X5hKysLLz//vsIDw9vciXh0NBQzJo1q9Ghpm+//RbV1dUNtvfr18/6maRwss2jolbrjz/+ENOnTxddu3YV/v7+okOHDuLGG28UK1eutJmaa2+K7/79+8XgwYOFv7+/iIuLE6+88kqDKcdZWVni3nvvFXFxcUKr1YqoqChx2223iX379tns64cffhADBw4U/v7+Daa9Hjt2TNx///1Cr9eLdu3aiU6dOonbbrtNbNmyxfoay+fWnwJeU1MjnnrqKdG/f3/RoUMH0b59e9G/f3/x5ptvOn2O0tPTBQCh0WhEfn5+g+frT63duXOnuOOOO0RMTIzw9/cXMTEx4t577xV//PGHzfuOHTsmUlJShFarFdHR0eKZZ56xflbdqdm//vqrSElJEcHBwaJjx45i+vTp1mm6a9euddgOIWx/b5bpsI4edadSO3POhRDi0KFD4uabbxYBAQGiU6dO4n//93/Fu+++26Kp2fam0Tc1xbzu/hw96p5TIYT4+OOPxdVXXy20Wq3o1auX+OSTT8SUKVOcmqpcW1srnn76adGxY0cRFBQkUlNThdFobNDOF154QQwaNEjodDoRGBgorrrqKrFkyRJx8eJFt52HnTt3imuuuUb4+/uL7t27izVr1ognnnhCBAQENHkcFRUV4i9/+YvQ6XROTUv//vvvxYwZM0SfPn1EaGioaNeunYiLixNTp04Vx44ds3lt3anZdZ0/f946fd+Vqdn2psKTMmmE8EJlFxER+bRx48Y1e0o4UUuxZoaIiFxS/7YWR48exbZt21R501byDeyZISIilxgMBkydOhXx8fE4efIk3nrrLdTU1ODAgQN219Ih8jQWABMRkUtGjRqFf//73zCbzdBqtUhOTsaLL77IIEOyYc8MERERqRprZoiIiEjVGGaIiIhI1VpFzcyVK1dw5swZdOjQweFS2kRERKQsQgiUl5cjJiYGfn6O+19aRZg5c+YMYmNj5W4GERERNUN+fj46d+7s8PlWEWY6dOgAQDoZISEhMreGiIiInFFWVobY2FjrddyRVhFmLENLISEhDDNEREQq01SJCAuAiYiISNUYZoiIiEjVGGaIiIhI1RhmiIiISNUYZoiIiEjVGGaIiIhI1RhmiIiISNUYZoiIiEjVWsWieeTjhACKi4HqaiAgAAgPB3gPLiKiVoNhhtTNZAKysoC8PKCmBtBqgbg4ICkJMBjkbh0REXkBwwypl8kEbN8OlJRIwSUwEKiqAnJygIICYPRoBhoiolaANTOkTkJIPTIlJUBCAhAcDLRpI/1MSJC2Z2VJryMiIp/GMEPqVFwsDS0ZDA3rYzQaaXtenvQ6IiLyaQwzpE7V1VKNTGCg/ecDA6Xnq6u92y4iIvI61syQ97hz1lFAgFTsW1UlDS3VV1UlPR8Q0LI2ExGR4jHMtHbemtbs7llH4eHS+3NypBqZum0WQvq8xETpdURE5NMYZlozb01r9sSsI41GamdBAWA02u7XZALCwqTnud4MEZHPY81Ma2UJGDk5gE4HdO0q/czJkbabTO75HE/OOjIYpCCUmCjtJzdX+pmYCIwaxWnZREStBHtmPEmpK9PWDxiWNlkChtEoPT9mTMvb68qso4gI1/dvMEjtbMl5VurviYiInMIw4ylKXpnW0wGjLmdmHRUUtGzWkUbT/HYq+fdEREROYZjxBKWvTOuNgGGh5FlHSv89ERGRU1gz425qWJm2bsCwp6UBQwigqAg4fVr679hYKTjUP2bLrKO4OO/POlLD74mIiJzCnhl38+YQTnN5clqzvWEbS4+MkmYdqeH3RERETmGYcTdvDuHU5UoRq6emNTsatrHMjIqMlJ4rKJBCTmKifLUpcv2eiIjI7Rhm3E2OGpHmFLFapjVb3tfSgOHMDKmwMOkza2rknzWk5FoeIiJyCcOMu3l7ZdqWFLG6Y1qzhTPDNvn5QHIy0KmT6/t3N64gTETkM1gA7G6WIRydTuqNqKgAamuln5beCXfViLijiNUyrblTJ+mns+2qXzzc1LCNn5/nbvxYt+C4qMi5ol13/56a0wYiInIL9sx4gruHcByRq4j1nXeA5cuBXbukmUpA48M2Z88CzzwDjBgB3Hef+9oBtGydGHf9nrhWDRGRrBhmPMXeEE5YGHD+vPTt3R01I3IUsVZVSUHGaASGDQMyMqRA42jY5uxZ4NlnpXbs3u24rc3hjnViWjrUxrVqiIhkxzDjSXVXprVc9Nz57d3dRazOzIgKDJR6ZIYNA44ftw009WdIVVRIPTLnzknn4T//AYKCmnes9trqrlsyNHcFYW/eFoKIiBximPEGT317d2cRqytDJbGxUoCxF2gswzbffgu8+SZQXg506AA89BBQWCh9jjt6KpSwTowS2kBERCwA9jhPrjTrriLW5txB2xJo4uP/DDT5+dIFPCoKWLNGCjIdOwL//CcwYIB778jtzBCbpwqOldQGIiJimPE4V769N4eliDUxUQpGubnSz8REYNSopntBWhK27AWa778H/t//k2b06PXASy9JP919mwBP35JBLW0gIiIOM3mcN4p0W1LE2tKhkvpDTkOGSNujooAlS6RVf13Zn7OUsE6MEtpARETsmfE4b317b+56Me4YKomNBT74wHbbnDm2QcaV/TnDm+v5KLkNRETEMONxlm/vSrtrtIU7wlZ+fsP1Y1askKZlN2d/zjIY/hxKy80FjhyRpr47O8TmDi0d5iMiohbjMJOneeqmjk1x9saTLR0qyc//c4gpPh54/33g7rsBs1laX6buUJO7h15MJuDAASnA1NZKtT5hYcA113g3RLjzthBEROQyjRC+v+56WVkZQkNDUVpaipCQEPd/gDPBwZurxLr6WY3d7ToszHEPQ/0gY5mevW+fdHE/e/bP2pn27Zven6vH6KjNOh0XqyMi8gHOXr8ZZlrKleDgbG9JS9vTnIu8qwHIXpDp3PnP4/vtN2noyWyWanjmzJGmZ7sjvAkBbNvmuDfJaJR6f7hYHRGRqjHM1OGxMKO03oGWXuSdDVv2gkzbtg3DUEAAsHAhcOqUtHbNN99IAamlioqAzZulc2xv5eOKCul3MnGiZxar80YoJSIip6/frJlpLiUuZd/SadbOLOtfVQUMH94wyDgKdbNmAf/6l1QYO2IEcOhQy+/PJMc9qSx4U0kiIsXhbKbm8vRieM3hjRVpAwOBefOkwGYZWmps0T0/PyAtTfrvefPcc6NJuRara85KyURE5HEMM82lxKXsvXWRnz5d6mGJjXUu1FVVSXfMnj69ZZ9rIcd0d0/eloKIiFqEYaa5lLiUvTcv8pYQ52yoc+dQmxyL1SmxJ46IiAAwzDSfEhfDk+MiL1eo8/ZidUrsiSMiIgAsAG4+uRbDa4rlIm8pUi0okMJEYqJnpovLeX8iby5WVze02ZtBxZtKEhHJhmGmJVwNDt5slzMXeXfMzFFqqHM33lSSiEixuM6MO6hx3RF3r5Ejx5Rlb39mc1dKJiKiZuGieXV4PMyojadW0PVmqJNrwUKuM0NE5DVcNI8ca+nieo44s+ieO8i5YCFvKklEpDgMM62RnCvouoOnwpizvBXaiIjIKZya3Ro1NZ36wgXg0iXg/HnpPkhKG4nkNGkiIqqDPTOtUWMzc4qKgO++A/z9gfR0KfgorSaE06SJiKgO9sy0Ro4W18vPl4pqy8uBPn2Abt2Uee8hJS5YSEREsmGYaa3qr6B74gRw5AjQoYO0PTZWufcekmOlYyIiUiwOM7VmdWfmmEzA5ctAp05SoKnLG0W1rlLqgoVEROR1DDOtnWVmTnU10K4dEBRk/3VKnOHEadJERASGGbJQa1Etp0kTEbV6rJkhCYtqiYhIpTwaZvbs2YPbb78dMTEx0Gg0+PTTT22eF0Jg4cKFMBgMCAwMREpKCo4ePWrzmuLiYkyePBkhISHQ6XR46KGHUFFR4clmt04sqiUiIpXyaJiprKxE//798cYbb9h9fvny5Xj99dexatUq7N27F+3bt0dqaiqq69RlTJ48GUeOHEF6ejo+//xz7NmzBw8//LAnm9161Z/hlJsr/UxM5E0UiYhIsbx2o0mNRoOtW7di3LhxAKRemZiYGDzxxBN48sknAQClpaWIjo7GunXrcM899+C3335Dr1698PPPP+Paa68FAOzYsQNjxozBqVOnEBMT49Rn80aTLlLjXcCJiMjnOHv9lq1m5sSJEzCbzUhJSbFuCw0NxeDBg5GZmQkAyMzMhE6nswYZAEhJSYGfnx/27t3rcN81NTUoKyuzeZALLEW1nTpJPxlkiIhIwWQLM2azGQAQHR1tsz06Otr6nNlsRlRUlM3zbdu2RXh4uPU19ixduhShoaHWR2xsrJtbT0RERErhk7OZFixYgNLSUusjPz9f7iYRERGRh8gWZvR6PQCgoKDAZntBQYH1Ob1ej8LCQpvnL1++jOLiYutr7NFqtQgJCbF5EBERkW+SLcx069YNer0eO3futG4rKyvD3r17kZycDABITk5GSUkJ9u/fb33Nrl27cOXKFQwePNjrbSYiIiLl8egKwBUVFTAajdY/nzhxAtnZ2QgPD0dcXBxmz56NF154AT169EC3bt3w3HPPISYmxjrj6eqrr8aoUaMwffp0rFq1CpcuXcLMmTNxzz33OD2TiYiIiHybR8PMvn37cMstt1j/PHfuXADAlClTsG7dOsybNw+VlZV4+OGHUVJSgiFDhmDHjh0IqLNk/oYNGzBz5kyMGDECfn5+GD9+PF5//XVPNpuIiIhUxGvrzMiJ68wQERGpj+LXmSEiIiJyB4YZIiIiUjWGGSIiIlI1hhkiIiJSNYYZIiIiUjWGGSIiIlI1hhkiIiJSNYYZIiIiUjWGGSIiIlI1hhkiIiJSNYYZIiIiUjWGGSIiIlI1hhkiIiJSNYYZIiIiUjWGGSIiIlI1hhkiIiJSNYYZIiIiUjWGGSIiIlI1hhkiIiJSNYYZIiIiUjWGGSIiIlI1hhkiIiJSNYYZIiIiUjWGGSIiIlI1hhkiIiJSNYYZIiIiUjWGGSIiIlI1hhkiIiJSNYYZIiIiUjWGGSIiIlI1hhkiIiJSNYYZIiIiUjWGGSIiIlI1hhkiIiJSNYYZIiIiUjWGGSIiIlI1hhkiIiJSNYYZIiIiUjWGGSIiIlI1hhkiIiJSNYYZIiIiUjWGGSIiIlI1hhkiIiJSNYYZIiIiUjWGGSIiIlI1hhkiIiJSNYYZIiIiUjWGGSIiIlI1hhkiIiJSNYYZIiIiUjWGGSIiIlI1hhkiIiJSNYYZIiIiUjWGGSIiIlI1hhkiIiJSNYYZIiIiUjWGGSIiIlI1hhkiIiJSNYYZIiIiUjWGGSIiIlI1hhkiIiJSNYYZIiIiUjXZw0xaWho0Go3N46qrrrI+X11djRkzZiAiIgLBwcEYP348CgoKZGwxERERKYnsYQYAevfuDZPJZH1899131ufmzJmDzz77DB999BG++eYbnDlzBnfddZeMrSUiIiIlaSt3AwCgbdu20Ov1DbaXlpbi3XffxYcffojhw4cDANauXYurr74aP/74I66//npvN5WIiIgURhE9M0ePHkVMTAzi4+MxefJk5OXlAQD279+PS5cuISUlxfraq666CnFxccjMzHS4v5qaGpSVldk8iIiIyDfJHmYGDx6MdevWYceOHXjrrbdw4sQJ3HTTTSgvL4fZbIa/vz90Op3Ne6Kjo2E2mx3uc+nSpQgNDbU+YmNjPXwUREREJBfZh5lGjx5t/e9+/fph8ODB6NKlCzZv3ozAwMBm7XPBggWYO3eu9c9lZWUMNERERD5K9p6Z+nQ6HXr27Amj0Qi9Xo+LFy+ipKTE5jUFBQV2a2wstFotQkJCbB5ERETkmxQXZioqKnDs2DEYDAYMHDgQ7dq1w86dO63P5+TkIC8vD8nJyTK2koiIiJRC9mGmJ598Erfffju6dOmCM2fOYNGiRWjTpg3uvfdehIaG4qGHHsLcuXMRHh6OkJAQ/O1vf0NycjJnMhEREREABYSZU6dO4d5770VRUREiIyMxZMgQ/Pjjj4iMjAQArFixAn5+fhg/fjxqamqQmpqKN998U+ZWExERkVJohBBC7kZ4WllZGUJDQ1FaWsr6GSIiIpVw9vqtuJoZIiIiIlcwzBAREZGqMcwQERGRqjHMEBERkaoxzBAREZGqMcwQERGRqjHMEBERkaoxzBAREZGqMcwQERGRqjHMEBERkaoxzBAREZGqMcwQERGRqjHMEBERkaoxzBAREZGqMcwQERGRqjHMEBERkaoxzBAREZGqMcwQERGRqjHMEBERkaoxzBAREZGqMcwQERGRqjHMEBERkaoxzBAREZGqMcwQERGRqjHMEBERkaoxzBAREZGqMcwQERGRqjHMEBERkaoxzBAREZGqMcwQERGRqjHMEBERkaoxzBAREZGqMcwQERGRqjHMEBERkaoxzBAREZGqMcwQERGRqjHMEBERkaoxzBAREZGqMcwQERGRqjHMEBERkaoxzBAREZGqMcwQERGRqjHMEBERkaoxzBAREZGqMcwQERGRqjHMEBERkaoxzBAREZGqMcwQERGRqjHMEBERkaoxzBAREZGqMcwQERGRqjHMEBERkaoxzBAREZGqMcwQERGRqjHMEBERkaoxzBAREZGqMcwQERGRqjHMEBERkaoxzBAREZGqMcwQERGRqjHMEBERkaoxzBAREZGqMcwQERGRqqkmzLzxxhvo2rUrAgICMHjwYPz0009yN4mIiIgUQBVhZtOmTZg7dy4WLVqErKws9O/fH6mpqSgsLJS7aURERCQzVYSZV155BdOnT8cDDzyAXr16YdWqVQgKCsJ7770nd9OIiIhIZooPMxcvXsT+/fuRkpJi3ebn54eUlBRkZmbafU9NTQ3KyspsHkREROSbFB9mzp07h9raWkRHR9tsj46OhtlstvuepUuXIjQ01PqIjY31RlPdTgiBoqIinD59GkVFRRBCyN0kIiK7hACKioDTp6Wf/OeKvKmt3A3whAULFmDu3LnWP5eVlaku0JhMJmRlZSEvLw81NTXQarWIi4tDUlISDAaD3M0jIrIymYCsLCAvD6ipAbRaIC4OSEoC+M8VeYPiw0zHjh3Rpk0bFBQU2GwvKCiAXq+3+x6tVgutVuuN5nmEyWTC9u3bUVJSAoPBgMDAQFRVVSEnJwcFBQUYPXo0Aw0RKYLJBGzfDpSUSMElMBCoqgJycoCCAmD0aAYa8jzFDzP5+/tj4MCB2Llzp3XblStXsHPnTiQnJ8vYMs8QQiArKwslJSVISEhAcHAw2rRpg+DgYCQkJKCkpARZWVkcciIi2Qkh9ciUlAAJCUBwMNCmjfQzIUHanpXFISfyPMX3zADA3LlzMWXKFFx77bUYNGgQXn31VVRWVuKBBx6Qu2luV1xcjLy8PBgMBmg0GpvnNBoNDAYD8vLyUFxcjIiICJlaSUQEFBdLQ0sGA1DvnytoNNL2vDzpdfznyn2EkM5pdTUQEACEhzc8/62NKsLMpEmTcPbsWSxcuBBmsxkDBgzAjh07GhQF+4Lq6mrU1NQgMDDQ7vOBgYEoKChAdXW1l1tGRGSrulqqkXHwzxUCA6WhJv5z5T6sT7JPFWEGAGbOnImZM2fK3QyPCwgIgFarRVVVFYKDgxs8X1VVBa1Wi4CAABlaR0Rq4Y1v7wEB0sW0qkoaWqqvqkp6Xm3/XCm154P1SY6pJsz4EiEEiouLUV1djYCAAISHh1uHlMLDwxEXF4ecnBwkJCTYDDUJIWAymZCYmIjw8HC5mq8ajZ1nIl/mrW/v4eHSfnNypBqZuv97CSG1IzFRep1aKLXno359kuVcW+qTjEbp+TFjlBG8vI1hxsuamnKt0WiQlJSEgoICGI1Gm9lMJpMJYWFhSEpK8upFWY2hgFPbqbXy5rd3jUa6yBcUSBfTup9nMgFhYdLzCv/nwkrJPR+sT2ocw4wXOTvl2mAwYPTo0daLcUFBAbRaLRITE71+MVZjKODUdmqtPPHtvakhF4NBushbejMKCqTejMRE+XszXKH0ng/WJzWOYcZL6k+5tvRsWKZcG41GZGVlYcyYMdZZS2PGjJG1R0SNocDV80zkS9z97d3ZIReDQbrIe6LOxFv1K0rv+fDV+iR3YZjxkuZMudZoNLJNv1ZrKODUdmrN3Pnt3dUhF43G/Rd5b9avKL3nwxfrk9xJ8Yvm+QpnplzX1NQoZsq1K6FASdR2ni14Hy5yh7rf3u1x9tu7EhbDs4SpnBxApwO6dpV+5uRI200m936eu86dp1jqk3Q6acirogKorZV+Go3qq09yN/bMeInaplyrdb0btZ1nQJ11SaRM7vr2LveQixz1K3L1fLgyjOYr9UmewDDjJa5MuVbC7CG5QkFLj11tU9vVWJdEyuWu2UVyD7m0NExVVTluuz2W13t7ZlZzhtE8WZ9kj1LX3KmPYcZLnJ1ybTabFfEtXY5Q4I4eCiVObXdErXVJpGzu+PYud7FpS8LUO+8Ay5cDu3YBsbFNf1Z+PjB8ODBvHjB9uvd6PloyDdwT9UmO2qjENXfs0YhWMDhfVlaG0NBQlJaWIiQkRNa2NHbBBmD3W7rJZIJOp/P6t3RHvQaWUDBq1Ci3taexz2rOsTc3GHmzV6yoqAibN2+GTqez2/tVUVGBkpISTJw4kcXK5LKWfKMWAti2zfGQi9EoXeA9NU25qAjYvFmqD7EXpioqpBAwcaLtRb2qCujXT2pffDyQkdF4oMnPB4YNA44fl47z0CEpVHi6N0Lu8+sMR2HLZJJ+L95ac8fZ6zd7ZrzM0ZRrANi2bZuivqV7a70bT/RQNGdqu7drV9Ral0Tq0JJv73Ivhtfc+pXAQKlHxhJQhg1zHGjqBpkuXYCPP/6zp8nTPR9y1yQ1Relr7tjDMCMDe1Oui4qKFDml2Bvr3TQ2c+rixYsuHXtVVZU1HLgytV2O2hU1FitT69HUcJVeL/WgeKr3omtX4PffgYMHge7dgaAg58JUbKwUYBoLNPn5wE03ASdPAh07AlOmAN9/L233xhCK3DVJTVF62LKHU7MVQslTii2hoFOnToiIiHB7z5CjY//yyy/x+OOPo7Ky0qljz8/PR79+/fDOO++49Pn1e4aCg4PRpk0ba89QSUkJsrKy3D5d2lKXZDKZGuzbUpcUFxenmGJlan0sxaYTJwITJkg/x4yRntu2TRoK+ugj6ee2be6ZLm0ySfvas0caTjKZpKBx8KDUU5CYCIwa1XjgsASa+Pg/A01+vvRc3SATEQEsWgRcc41np33Xp/Rp4M6ErZoaZa02zDCjEHW/pdvjy9/S7R17TU0NPvnkE5hMJjz77LOorKxs9Njz8/MxbNgwGI1GLF++3OF5tEeuNXUsxco6nQ5GoxEVFRWora1FRUUFjEajooqVqfWyDLl06iT9NJs9t/5L/bVl+vcHhgyReoE6dACGDpXClDM9J/YCzQ8/SD8tQeall6QhJm+voWMZRjOZGn6OZRgtLk6+BfBcCVtCSD10p09LP+WqwmWYUQi1fUt35yJv9o5dq9XihRdegF6vR2FhIf71r3+hsrLS7vstQeb48eOIj4/Hrl27HPZw2SNnr5ilLikxMRElJSXIzc1FSUkJEhMT3VpgTeqglAuDI55cTM/Rvjt0kEINAOTmurbP+oHmxhulnx07AmlpQFSU7evrD6F4itIXwHM2bNXUeK6HzlWsmVEINU0pdnehrKNjDwwMxLRp07B69WqYzWbccsstyMjIQGydwe/6Qab+887wdu1K/RlTer1e9vtwkfzUMA3Wk7UUntp3bCzwwQdSkLH4y1+Azp3tvz4wUOp9Mpk8u7aKM1Po5VrjxZkC8E6dgB07lHOHcYYZBVHS3bId8VShrKNjv/7663HHHXdg0qRJOH78OIYNG2YNLO4IMkIICCEQFBQEo9GIfv36wc/Pz+Z5d66po/TVfpWwYGNr1JI1R7zJk4Wrntp3fj5w33222z78EOjRQxpiqu/MGekCfvky0K6dZ0NlYwvgyR1uGwtb11wDHDigrNlODDMKo4S7ZTvi6UXeGjv2jIwMa3AZOnQoXn31VcyePRu5ubnNDjJ1g0VhYSGMRiNOnjyJpKQkxMTEuL1XTOmr/So9aPkqNU2D9eRiep7Yd93p1/HxUg/NffdJf05Lk2pm6g41FRUB33wjDW116vTnDCpPhkp708CVEm4dhS0lznZizYwCeXr2UHN5o1DW0bHHxsZi06ZN0Ov1yM3Nxbhx45Cbmwu9Xo9NmzY1K8hs374dOTk50Ol06N+/P5KTkwEAmZmZOHjwoFtrV+SaMeWs+ueja9eu0Ol0yMnJwfbt22GSYxC8lXDlwiA3Txau1t+3EEBZmRQwSkulHhNX9l0/yGRkADfcIP3s0kXa79NPS8XAtbVAeTnw3XfSe4cMkQKNtwuDAWXc5LOu+gXgGo0yZzsxzJDT5CyUNZlMOHToEO6++26b7XfffTcOHTrk0sXWUbCIjY3F7bffjp49e6Jbt26YMGECxowZ45ZeCSXfhVzpQcvXKfHC4IgnC1fr7nv/fuDbb6Xp2Tt3Alu3SuGkUyfn9m0vyFi+78TGSvu2BJrFi6Uhk9OnAX9/4OabG/YmeDNUqiHcKnFqOcMMOU2u6eOWi+2JEyewZcsWm+e2bNmCEydOuHSxbSxY+Pn5ISEhAZWVldBoNG7rFVPyOkJKDlqtgRIvDI2x1FIkJkq9BLm5zq//4sy+k5Kk/R0/Dly4IA31dO8uhZysrKZnyjQWZCwsgSY+Hjh3Dli/XqoD6dEDiImxv19vhUo1hFslTi1nmCGnyTV9vLi4GNnZ2Xj77bdhNpuh1+vx0ksvQa/Xw2w24+2330Z2drbTF1s5goWS1xFSctBqDeS+MDRnOrijxfRa2okphNSOuDjgzjuBESOktWWGDPkz5DQ2xFJVJd00srEgY1F32vbJk8DDDwN+fvKHSjWEWyVOLWeYUSl3rvPiLLkWeTt+/DhWrFiBwsJC6PV6LFmyBFdffTWWLFliXYdmxYoVOH78uFP7a26waMk5V/I6QkoOWq2BnBcGy2q7zVknxF4tRUtZhlhiYoDQUGm/ISHSvp0ZYgkMlO5+nZDQ9E0mgT8DTUICMH++9FPu3ga5w62zPNlD1xy8a7YKyT3rxJufn5+fj6FDhyI3NxdRUVFYunQpIiMjrc+fPXsWCxYsQGFhIbp27Yo9e/Y0WQwshMC2bduQk5NjMyvL8pzRaERiYqLNrCx3HLM370LuiuacD3I/b0/FVcpdkes6fVoKVV27SkWv9dXWShfNCROkEOVIVZXjYZrGXt/YOQkL+/Mi7en1X5xthxJ4+lw4e/1mmFGZxi6IOp3Oa9N7PbkeiWXfx48fx8SJE62zlqZNm4brrruuwcX2559/xpo1a2A2m52epu1KsHDnOZc7iDbWLiUGrdbGW4ukCSH1wDi6K7XRKH3D9vZ08KIiqXdIp7M/PbuiQrrAT5zouSm/TYVKb4VOudeZUQqGmTp8Jcy0hm/Qlou90WjECy+8gHPnzkGv12Pt2rU4c+aMw4tt3759rQvrJSQk4NChQ03e0sCZYOGJc+5MEJRj8TqlBi1yPyWEBnuUErIchUpv92a1NNzKtYKwOzl7/eaieSriyqyTCKXcl90F9XsHJkyYgE8++QTTpk3DmTNnkJSUhNOnTztcHTkjIwPDhw/HvHnznLo3kzMLFHrinFvW0mnsPMgRKpS8YCO5lydX8m0JZ5bR90Zhqb2F7ORY3NBeO5zV2np2GGZUxJlZJwUFBaqcdWJvdeHRo0dj+PDh8Pf3h9FoxOnTpzF69GicP3/e7sU2NjbWqR6ZupoKFt4+53KvEtzU+SDf4MmVfFtKrwcGDwb27QNOnQLatpXaUfeeRXJQ4qq3jihlBWFv4mwmFfHlWSeOekC0Wq1ND8j58+cbXR3ZUeho7kwkb55zLl5H3qLUGTOW2VV79vw5YykiQpqe7Y6p3y2hhvVfAOWtIOwt7JlREcv0Xkf1G+68IaK3ebIHpCXDNt48574+jEjKoZThnLoaq0fZu1cKVnKGGSX3ZtWlph4kd2LPjIrItc6LN3iqB6Sl9xzy5jnn4nXkTUpaJ0QNvQlK7c2qTy09SO7GnhmVMRgMGD16tLWnwV4hrBp5ogfEXXf59tY5rxvogu189VPzMCIpk6O7Inv7+5AaehOU2Jtlj1p6kNyNYUaFlDzrpLlTii09IAUFBTAajXanX7vaA+LOYRtvnHNfHkYk5WrJjBl3UersqvosvVmWWUIFBVIwkLs4uS5LD5Kj6e0mk9ReX/tnhGFGpZQ466SlU4rd3QPi7jocT59zTwQ6IjVQU2+CUnqzHFFLD5K7McyQW7hrSrE7e0C8NWzjzgXufHUYkagxautNUEJvVmPU0IPkbgwz1GLuqk2xcFcPiDeGbTyxwJ2ShxGJPKG19iZ4ktJ7kNyNYYZaTKlTij09bOPJBe6UOIxI5EmtsTfB05Teg+RODDPUYkpemdhTwzbu7o0iotbXm0DuwzBDLab0KcWeGLZRam8Ukdq1pt4Ech8umkctZqlNMZlMDZbat9SmxMXFyTql2DJs4+g2CK7iAndERMrBMEMt5ssrEzviy/fJIiJSG4YZcgtLbUpiYiJKSkqQm5uLkpISJCYmYtSoUT43pVgNvVFERK0Fa2bIbVrTlGIucEdEpBwaUf9rpQ8qKytDaGgoSktLERISIndzyId4Yp0ZIiKSOHv9Zs8MUQu0pt4oIiKlYpghaiEucEdEJC8WABMREZGqMcwQERGRqjHMEBERkaoxzBAREZGqMcwQERGRqjHMEBERkaoxzBAREZGqMcwQERGRqjHMEBERkaoxzBAREZGqMcwQERGRqjHMEBERkaoxzBAREZGqMcwQERGRqjHMEBERkaoxzBAREZGqyRpmunbtCo1GY/NYtmyZzWsOHTqEm266CQEBAYiNjcXy5ctlai0REREpUVu5G/D8889j+vTp1j936NDB+t9lZWW49dZbkZKSglWrVuGXX37Bgw8+CJ1Oh4cffliO5hIREZHCyB5mOnToAL1eb/e5DRs24OLFi3jvvffg7++P3r17Izs7G6+88grDDBEREQFQQM3MsmXLEBERgWuuuQYvv/wyLl++bH0uMzMTQ4cOhb+/v3VbamoqcnJycP78eYf7rKmpQVlZmc2DiIiIfJOsPTOPP/44kpKSEB4ejh9++AELFiyAyWTCK6+8AgAwm83o1q2bzXuio6Otz4WFhdnd79KlS7F48WLPNp6IiIgUwe09M/Pnz29Q1Fv/8fvvvwMA5s6di2HDhqFfv3545JFH8M9//hMrV65ETU1Ni9qwYMEClJaWWh/5+fnuODQiIiJSILf3zDzxxBOYOnVqo6+Jj4+3u33w4MG4fPkycnNzkZiYCL1ej4KCApvXWP7sqM4GALRaLbRarWsNJyIiIlVye5iJjIxEZGRks96bnZ0NPz8/REVFAQCSk5Px7LPP4tKlS2jXrh0AID09HYmJiQ6HmIiIiKh1ka0AODMzE6+++ioOHjyI48ePY8OGDZgzZw7++te/WoPKX/7yF/j7++Ohhx7CkSNHsGnTJrz22muYO3euXM0mIiIihZGtAFir1WLjxo1IS0tDTU0NunXrhjlz5tgEldDQUHz11VeYMWMGBg4ciI4dO2LhwoWclk1ERERWGiGEkLsRnlZWVobQ0FCUlpYiJCRE7uYQERGRE5y9fsu+zgwRERFRSzDMEBERkaoxzBAREZGqMcwQERGRqjHMEBERkaoxzBAREZGqMcwQERGRqjHMEBERkaoxzBAREZGqMcwQERGRqsl2byYiOQghUFxVjOrL1QhoG4DwwHBoNBq5m0VERC3AMEOthqnchCxTFvJK81BTWwNtGy3iQuOQZEiCoYNB7uYREVEzMcxQq2AqN2G7cTtKqktgCDYgsG0gqi5XIacoBwWVBRidMJqBhohIpVgzQz5PCIEsUxZKqkuQEJaAYP9gtPFrg2D/YCSEJaCkugRZpiy0ghvIExH5JIYZ8nnFVcXIK82DIdjQoD5Go9HAEGxAXmkeiquKZWohERG1BIeZVIqFrM6rvlyNmtoaBLYNtPt8YNtAFNQWoPpytZdbRkRE7sAwo0IsZHVNQNsAaNtoUXW5CsH+wQ2er7pcBW0bLQLaBsjQOiIiaikOM6mMpZA1pygHugAduoZ2hS5Ah5yiHGw3boep3CR3ExUnPDAccaFxMFWYGtTFCCFgqjAhLjQO4YHhMrWQiIhagmFGRVjIaksIgaILRThddhpFF4ocHrdGo0GSIQm6AB2M542ouFiB2iu1qLhYAeN5I8ICwpBkSOIwHRGRSnGYSUVcKWSNCIqQqZXe4epQm6GDAaMTRlvfU1BbAG0bLRIjEmUdnmPtExFRyzHMqAgLWSXNXTPG0MGAMcFjFBMeWPtEROQeDDPNJMc3aiUUssrdk1B/qM3y2ZahNuN5I7JMWRgTPMZuuzQajSJ6rbiIHxGR+zDMNINc36gthaw5RTk2F3Lgz0LWxIhEjxWyKqEnoTlDbXIHsPpaGsiIiMgWw4yL5PxGbSlkLagsgPG80ebzTRUmjxayKqUnwdWhNiUEsPpY+0RE5F6czeQCJcwmshSy9ozoiVNlp5BtzsapslPoGdEToxJGeeQCrYTjtqg71GZP3aE2pU5jdyaQ1dTW+HztExGRuzDMuEBRy+JbcoOm3p89QEnH7eyaMWEBYYoJYPW5EsiIiKhpDDMuUMI3aktvwx/Ff6BzSGcMiB6AziGd8UfxHx7rbVDCcVs4u2bM+erziglg9XERPyIi92KYcYHc36jlGu5pyXFXXbL/Hkeceb1lqC0xIhEl1SXILc1FSXUJEiMSrUNtSgpg9XERPyIi92KYcYHc36jlGu5p7nG/s/8d9FvVD/ml+U59Tn5pPvqt6od39r/T5GsNHQwY02MMJvaeiAm9JmBi74kY02OMtWZI7uDZFGcCGREROYezmVwg52wiwLOL5jU2fbk5x111qQrLf1gOY7ERw9YPQ8aUDMSGxjr8/LySPNy87mbkluZi6XdLMbnvZAT5BzXa5sbWjJF7GrszlLaIHxGRWjHMuEjOZfE9tWieM9OXXT3uwHaB2HX/LgxbPwzHzx9vNNDsO7MPt//7dpgrzIgIjMBD1zyE3bm7W3Q+5Qyerqxro5RF/IiI1EwjWsFdCcvKyhAaGorS0lKEhIS4ZZ/NXYitJQu4CSGw7eg2h70NxvNGJEYkYkwP5xdbc7R+jKnCBF2ArsH6Ma62P7803xpo4sPiGwSafWf2YcyGMTh74Syi2kdhyS1L0N6/vcPPd5W315lR4ro2RERq5ez1m2HGi9xxoWssfIQFhLlUb+GJcGSPo0CTV5KHwe8OhrnCDH17PZYMX4LI9pFu/3xnApg7Vgl2NRgSEVHjnL1+c5jJS9y1gq47h7m8tRJtbGgsMqZk2Aw5fXDnB5j88WSYK8xSj0ydIOPuz29qKMcdIZO3KCAikg/DjBe4+0LnrsJRb96Fu36gufG9GwEAEYERWHKLbZDxxOc74q6QyVsUEBHJh1OzvcATU6otvQ2dQjohIiii0SAjhEDRhSKcLjuNogtF1unV3p6+HBsaiw/u/MBm29QBU9Hev71XPr8+d67bo+R1bYiIfB17ZrzAmz0g9TU2hKIP1nt1+nJeSR4mfzzZZtuGXzYgsF0grjNc5/Xp0+7sTWnpTDOl3dmbiEhNGGa8wFNTqpvizBCKt6Yv159+PXXAVGz4ZQPMFWas3rcauBbo1bGXatftacm6NpwBRUTUMhxm8gI5Vg52dghFH6z3+Eq0lunXlmLfZSOWYaBhIKYlTUNkUCTOXjiLNVlrcPz8ca+uhOvOYbbm3qJAqXf2JiJSE/bMeIEcC7i5MoTiyZVo80rycPu/b8fZC2cbTL++znAdcC2wJmsNzBVmrMlag80TNiM+LN4rwyzuXiXY1ZlmnAFFROQeDDNe4u2Vg10dQvHESrT5pfm4ed3NjU6/7tWxF2ZeNxNrstYgtzQX93x8DzKmZLi1LY7qUTwRMl0JhpwBRUTkHgwzXuTNe/HIVadj3f+lKgx/fzhyS3ObnH7d3r89Nk/YjHs+vgfHzx/H8PeH49AjhxDYzn4Qc0VT9SieCJnOBkM5C8OJiHwJw4yXeetePHLfaDGwXSDm3TAPS79bioeueajJ6deWlYGHvz8c826Y57Yg48waMnLd8FHuwElE5CtYAOyjmluQ6k7TB07H4UcPY4B+gFPFz7GhsTj0yCFMHzi9xZ/t6hoyrqzb4y5yFIYTEfkihhkfZhlC8eRMpaYE+Qe5FKrc0SMDeGahQndTQuAkIvIFHGbycY6GUACg6EKRV4ZVvF38DKinHkWOc0NE5GsYZlqB+nU6cizS5u26FDXVo8hVs0NE5CsYZloZd91YsTm8VfwMyF8A7SpvnhsiIl/DmplWxJ03VlQ61qMQEbUeDDOtiBqKYt1JCQXQRETkeRxmakXUUhTrTqxHISLyfQwzrYiaimLdifUoRES+jcNMrQgXaSMiIl/EMNOKsCiWiIh8EYeZWpmWLNLm6O7T3qSENhARkbIwzLRCzSmKlWOhPSW2gYiIlIdhppVypShWzoX2lNQGIiJSJtbMUKOUsNCeEtpARETKxTBDjVLCQntKaAMRESkXwww1ypmF9mpqazy60J4S2kBERMrFMEONqrvQnj3eWGhPCW0gIiLlYpihRilhoT0ltIGIiJSLYYYapYSF9pTQBiIiUi6PhZklS5bghhtuQFBQEHQ6nd3X5OXlYezYsQgKCkJUVBSeeuopXL582eY1GRkZSEpKglarRUJCAtatW+epJpMDSrj7tBLaQEREyuSxdWYuXryICRMmIDk5Ge+++26D52trazF27Fjo9Xr88MMPMJlMuP/++9GuXTu8+OKLAIATJ05g7NixeOSRR7Bhwwbs3LkT06ZNg8FgQGpqqqeaTnYo4e7TSmgDEREpj0Z4eHGOdevWYfbs2SgpKbHZvn37dtx22204c+YMoqOjAQCrVq3C008/jbNnz8Lf3x9PP/00vvjiCxw+fNj6vnvuuQclJSXYsWOH020oKytDaGgoSktLERIS4pbjIiIiIs9y9votW81MZmYm+vbtaw0yAJCamoqysjIcOXLE+pqUlBSb96WmpiIzM7PRfdfU1KCsrMzmQURERL5JtjBjNpttggwA65/NZnOjrykrK0NVlf1pugCwdOlShIaGWh+xsbFubj0REREphUthZv78+dBoNI0+fv/9d0+11WkLFixAaWmp9ZGfny93k4iIiMhDXCoAfuKJJzB16tRGXxMfH+/UvvR6PX766SebbQUFBdbnLD8t2+q+JiQkBIGB9leDBQCtVgutVutUO4iIiEjdXAozkZGRiIyMdMsHJycnY8mSJSgsLERUVBQAID09HSEhIejVq5f1Ndu2bbN5X3p6OpKTk93SBiIiIlI/j9XM5OXlITs7G3l5eaitrUV2djays7NRUVEBALj11lvRq1cv3HfffTh48CC+/PJL/P3vf8eMGTOsvSqPPPIIjh8/jnnz5uH333/Hm2++ic2bN2POnDmeajYRERGpjMemZk+dOhXr169vsH337t0YNmwYAODkyZN49NFHkZGRgfbt22PKlClYtmwZ2rb9s8MoIyMDc+bMwa+//orOnTvjueeea3Koqz5OzSYiIlIfZ6/fHl9nRgkYZoiIiNRH8evMEBEREbmDx25noCSWzicunkdERKQelut2U4NIrSLMlJeXAwAXzyMiIlKh8vJyhIaGOny+VdTMXLlyBWfOnEGHDh185qaEZWVliI2NRX5+vs/WAfEYfUNrOEagdRwnj9E3qOkYhRAoLy9HTEwM/PwcV8a0ip4ZPz8/dO7cWe5meERISIji/zK2FI/RN7SGYwRax3HyGH2DWo6xsR4ZCxYAExERkaoxzBAREZGqMcyolFarxaJFi3z6HlQ8Rt/QGo4RaB3HyWP0Db54jK2iAJiIiIh8F3tmiIiISNUYZoiIiEjVGGaIiIhI1RhmiIiISNUYZoiIiEjVGGZUJjc3Fw899BC6deuGwMBAdO/eHYsWLcLFixdtXnfo0CHcdNNNCAgIQGxsLJYvXy5Ti5tnyZIluOGGGxAUFASdTmf3NXl5eRg7diyCgoIQFRWFp556CpcvX/ZuQ1vojTfeQNeuXREQEIDBgwfjp59+krtJzbZnzx7cfvvtiImJgUajwaeffmrzvBACCxcuhMFgQGBgIFJSUnD06FF5GttMS5cuxXXXXYcOHTogKioK48aNQ05Ojs1rqqurMWPGDERERCA4OBjjx49HQUGBTC123VtvvYV+/fpZV4dNTk7G9u3brc+r/fjsWbZsGTQaDWbPnm3dpvbjTEtLg0ajsXlcddVV1ufVfnz1McyozO+//44rV65g9erVOHLkCFasWIFVq1bhmWeesb6mrKwMt956K7p06YL9+/fj5ZdfRlpaGt5++20ZW+6aixcvYsKECXj00UftPl9bW4uxY8fi4sWL+OGHH7B+/XqsW7cOCxcu9HJLm2/Tpk2YO3cuFi1ahKysLPTv3x+pqakoLCyUu2nNUllZif79++ONN96w+/zy5cvx+uuvY9WqVdi7dy/at2+P1NRUVFdXe7mlzffNN99gxowZ+PHHH5Geno5Lly7h1ltvRWVlpfU1c+bMwWeffYaPPvoI33zzDc6cOYO77rpLxla7pnPnzli2bBn279+Pffv2Yfjw4bjjjjtw5MgRAOo/vvp+/vlnrF69Gv369bPZ7gvH2bt3b5hMJuvju+++sz7nC8dnQ5DqLV++XHTr1s365zfffFOEhYWJmpoa67ann35aJCYmytG8Flm7dq0IDQ1tsH3btm3Cz89PmM1m67a33npLhISE2By3kg0aNEjMmDHD+ufa2loRExMjli5dKmOr3AOA2Lp1q/XPV65cEXq9Xrz88svWbSUlJUKr1Yp///vfMrTQPQoLCwUA8c033wghpGNq166d+Oijj6yv+e233wQAkZmZKVczWywsLEysWbPG546vvLxc9OjRQ6Snp4ubb75ZzJo1SwjhG7/HRYsWif79+9t9zheOrz72zPiA0tJShIeHW/+cmZmJoUOHwt/f37otNTUVOTk5OH/+vBxNdLvMzEz07dsX0dHR1m2pqakoKyuzfoNUsosXL2L//v1ISUmxbvPz80NKSgoyMzNlbJlnnDhxAmaz2eZ4Q0NDMXjwYFUfb2lpKQBY///bv38/Ll26ZHOcV111FeLi4lR5nLW1tdi4cSMqKyuRnJzsc8c3Y8YMjB071uZ4AN/5PR49ehQxMTGIj4/H5MmTkZeXB8B3jq+uVnHXbF9mNBqxcuVK/OMf/7BuM5vN6Natm83rLBd9s9mMsLAwr7bRE8xms02QAWyPUenOnTuH2tpau8fw+++/y9Qqz7H8Tuwdrxp+X/ZcuXIFs2fPxo033og+ffoAkI7T39+/QZ2X2o7zl19+QXJyMqqrqxEcHIytW7eiV69eyM7O9onjA4CNGzciKysLP//8c4PnfOH3OHjwYKxbtw6JiYkwmUxYvHgxbrrpJhw+fNgnjq8+9swoxPz58xsUa9V/1L/InT59GqNGjcKECRMwffp0mVruvOYcI5FSzZgxA4cPH8bGjRvlborbJSYmIjs7G3v37sWjjz6KKVOm4Ndff5W7WW6Tn5+PWbNmYcOGDQgICJC7OR4xevRoTJgwAf369UNqaiq2bduGkpISbN68We6meQR7ZhTiiSeewNSpUxt9TXx8vPW/z5w5g1tuuQU33HBDg8JevV7foCrd8me9Xu+eBjeDq8fYGL1e32DmjxKO0VkdO3ZEmzZt7P6e1NB+V1mOqaCgAAaDwbq9oKAAAwYMkKlVzTdz5kx8/vnn2LNnDzp37mzdrtfrcfHiRZSUlNh861Xb79Xf3x8JCQkAgIEDB+Lnn3/Ga6+9hkmTJvnE8e3fvx+FhYVISkqybqutrcWePXvwr3/9C19++aVPHGddOp0OPXv2hNFoxMiRI33u+NgzoxCRkZG46qqrGn1YamBOnz6NYcOGYeDAgVi7di38/Gx/jcnJydizZw8uXbpk3Zaeno7ExERZh5hcOcamJCcn45dffrGZ+ZOeno6QkBD06tXLU4fgNv7+/hg4cCB27txp3XblyhXs3LkTycnJMrbMM7p16wa9Xm9zvGVlZdi7d6+qjlcIgZkzZ2Lr1q3YtWtXg+HcgQMHol27djbHmZOTg7y8PFUdZ31XrlxBTU2NzxzfiBEj8MsvvyA7O9v6uPbaazF58mTrf/vCcdZVUVGBY8eOwWAw+Mzv0YbcFcjkmlOnTomEhAQxYsQIcerUKWEymawPi5KSEhEdHS3uu+8+cfjwYbFx40YRFBQkVq9eLWPLXXPy5Elx4MABsXjxYhEcHCwOHDggDhw4IMrLy4UQQly+fFn06dNH3HrrrSI7O1vs2LFDREZGigULFsjccudt3LhRaLVasW7dOvHrr7+Khx9+WOh0OpsZWmpSXl5u/T0BEK+88oo4cOCAOHnypBBCiGXLlgmdTif+85//iEOHDok77rhDdOvWTVRVVcnccuc9+uijIjQ0VGRkZNj8v3fhwgXrax555BERFxcndu3aJfbt2yeSk5NFcnKyjK12zfz588U333wjTpw4IQ4dOiTmz58vNBqN+Oqrr4QQ6j8+R+rOZhJC/cf5xBNPiIyMDHHixAnx/fffi5SUFNGxY0dRWFgohFD/8dXHMKMya9euFQDsPuo6ePCgGDJkiNBqtaJTp05i2bJlMrW4eaZMmWL3GHfv3m19TW5urhg9erQIDAwUHTt2FE888YS4dOmSfI1uhpUrV4q4uDjh7+8vBg0aJH788Ue5m9Rsu3fvtvs7mzJlihBCmp793HPPiejoaKHVasWIESNETk6OvI12kaP/99auXWt9TVVVlXjsscdEWFiYCAoKEnfeeafNlw2le/DBB0WXLl2Ev7+/iIyMFCNGjLAGGSHUf3yO1A8zaj/OSZMmCYPBIPz9/UWnTp3EpEmThNFotD6v9uOrTyOEEF7sCCIiIiJyK9bMEBERkaoxzBAREZGqMcwQERGRqjHMEBERkaoxzBAREZGqMcwQERGRqjHMEBERkaoxzBAREZGqMcwQERGRqjHMEBERkaoxzBAREZGq/X8JNvIa9cK+FwAAAABJRU5ErkJggg==", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from sklearn.manifold import TSNE\n", "import matplotlib.pyplot as plt\n", "\n", "tsne = TSNE(n_components=2, perplexity=15, random_state=42, init=\"random\", learning_rate=200)\n", "vis_dims2 = tsne.fit_transform(embedding)\n", "\n", "x = [x for x, y in vis_dims2]\n", "y = [y for x, y in vis_dims2]\n", "\n", "for category, color in enumerate([\"black\", \"green\", \"red\", \"blue\"]):\n", " xs = np.array(x)[original_review.Cluster == category]\n", " ys = np.array(y)[original_review.Cluster == category]\n", " plt.scatter(xs, ys, color=color, alpha=0.3)\n", " avg_x = xs.mean()\n", " avg_y = ys.mean()\n", "\n", " plt.scatter(avg_x, avg_y, marker=\"x\", color=color, s=100)\n", "plt.title(\"Clusters visualized in 2d using t-SNE\")\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "为了可视化聚类效果,我们使用TSNE降维投影到2d维度,可以发现置信度以及容错率都非常高。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. 聚类命名\n", "在实际文本聚类过程中,我们没有提前获取label,因此可以使用文心大模型来帮助聚类命名以及特征提取。" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "100%|██████████| 4/4 [00:13<00:00, 3.30s/it]\n" ] } ], "source": [ "import erniebot,time\n", "from tqdm import tqdm\n", "from collections import defaultdict\n", "\n", "CLUSTER_TEMPLATE = \"请你根据总结以下文本的特征,为这些文本设置一个主题,主题尽量精简和专业。\\n文本:\\n{TEXTS}\"\n", "\n", "erniebot.api_type = 'aistudio'\n", "erniebot.access_token = ''\n", "\n", "def handle_texts(texts:pd.DataFrame) -> str:\n", " res = ''\n", "\n", " #防止超出模型上限\n", " if texts.shape[0] >= 10:\n", " texts = texts.sample(10)\n", " \n", " for review in texts.reviews:\n", " res += review + '\\n'\n", " return res\n", "\n", "theme = defaultdict(list)\n", "\n", "for cluster in tqdm(original_review.Cluster.unique()):\n", " query = CLUSTER_TEMPLATE.format(TEXTS = handle_texts(original_review[original_review.Cluster == cluster]))\n", " response = erniebot.ChatCompletion.create(\n", " model='ernie-bot-4', \n", " messages=[{'role': 'user', 'content': query}]\n", " )\n", " time.sleep(1.5) # 防止频繁访问\n", " theme[cluster].append(response.get_result()) \n" ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "--------------------------------------------------\n", "Theme2:['上述文本表达的主题是“百度文心一言的评价和体验”。']\n", "['百度文心一言真的很智能,它能准确理解我的问题并给出合适的回答。', '我使用百度文心一言的感受是,它的自然语言处理能力非常强大。', '百度文心一言帮助我解决了许多问题,我对它的表现非常满意。', '我觉得百度文心一言的搜索结果非常精确,对我有很大的帮助。', '百度文心一言的产品界面非常友好,使用起来非常方便。', '我很高兴能使用百度文心一言,它对我的工作有很大的帮助。', '我发现百度文心一言的回答越来越准确,它的学习能力非常强。', '百度文心一言的智能推荐功能非常好用,给我节省了很多时间。', '我觉得百度文心一言的客服团队非常专业,他们总能及时解决我的问题。', '百度文心一言的产品更新非常快,我能看到它在不断进步。', '我对百度文心一言的总体评价是非常好,它是一个非常优秀的AI产品。', '百度文心一言的帮助文档非常详细,对我使用产品有很大的帮助。', '我发现百度文心一言在不同的场景下都能表现得很好,非常智能。', '百度文心一言的语音识别能力非常强,我可以轻松与它进行语音交互。', '我觉得百度文心一言的价格非常合理,性价比非常高。', '百度文心一言的兼容性非常好,我可以在不同的设备上使用它。', '百度文心一言的使用体验非常流畅,我没有遇到任何卡顿的问题。', '我非常欣赏百度文心一言的创新能力,它总能给我带来惊喜。', '我认为百度文心一言是一个非常出色的AI产品,我会向我的朋友推荐它。', '百度文心一言对我的生活和工作产生了很大的影响,我非常感谢它。']\n", "--------------------------------------------------\n", "Theme0:['上述文本表达的主题是“百度地图的使用体验”。']\n", "['百度地图真的很方便,路线规划很准确。', '我喜欢百度地图的实时交通信息,帮助我避开拥堵。', '百度地图的搜索功能很强大,总能找到我想要的位置。', '地图的导航语音提示很清晰,让我开车更安心。', '百度地图的周边搜索功能很好用,能快速找到附近的美食、加油站等。', '我觉得百度地图的界面设计很简洁,操作起来很顺畅。', '有时候地图的定位会有点不准确,希望能改进。', '总体来说,百度地图是一款很好用的地图应用。', '我喜欢用百度地图来定制自己的出行路线。', '百度地图的公交查询功能很方便,能快速找到最佳公交路线。', '地图的步行导航很准确,步行时总能找到正确的方向。', '百度地图的骑行导航也很好用,适合我骑自行车时使用。', '有时候在地下车库,地图的信号不是很好。', '我觉得百度地图的语音交互功能很好用,可以方便地进行语音搜索和导航。', '百度地图的海外地图数据也很全,适合出国旅行时使用。', '我喜欢用百度地图来查找附近的景点和旅游信息。', '地图的夜间模式很舒适,晚上使用时不会刺眼。', '百度地图的客服服务也很好,遇到问题能得到及时解决。', '总体来说,我觉得百度地图是一款非常实用和方便的地图应用。', '我希望百度地图能继续改进,加入更多方便的功能。']\n", "--------------------------------------------------\n", "Theme1:['上述文本表达的主题是“百度网盘的用户体验和功能评价”。']\n", "['百度网盘非常方便,我可以随时随地访问我的文件。', '上传和下载速度很快,非常适合分享大文件。', '我喜欢百度网盘的界面,非常简洁易用。', '百度网盘的安全措施做得很好,我可以放心地保存我的文件。', '我使用百度网盘已经很久了,它一直是我的首选云存储服务。', '百度网盘的客服非常有帮助,我遇到的问题都得到了及时解决。', '我有时会遇到一些上传或下载的问题,但总体来说,百度网盘还是很好用的。', '百度网盘提供了很多有用的功能,比如文件同步和分享。', '我觉得免费用户的存储空间有点小,但付费用户可以获得更多的存储空间。', '百度网盘是一个很好的云存储服务,我会继续使用它。', '百度网盘的文件分类和整理功能非常出色。', '我有时会觉得百度网盘的免费用户限速有些烦人。', '百度网盘的手机APP很好用,方便我在移动设备上访问文件。', '我喜欢百度网盘的在线编辑功能,可以方便地修改我的文档。', '百度网盘的团队协作功能非常有用,方便我和同事一起工作。', '我觉得百度网盘的定价策略很合理,物有所值。', '百度网盘的搜索功能很强大,可以快速找到我需要的文件。', '我有时会遇到一些操作上的小问题,但总体来说,百度网盘还是很好用的。', '百度网盘的自动备份功能非常方便,可以节省我很多时间。', '总的来说,百度网盘是一个很好的云存储服务,我会推荐给我的朋友。']\n", "--------------------------------------------------\n", "Theme3:['上述文本表达的主题是“百度翻译的使用体验和功能评价”。']\n", "['百度翻译真的很方便,可以帮助我快速理解陌生文本的意思。', '我经常使用百度翻译来学习新的单词和短语,它对我的语言学习帮助很大。', '有时候百度翻译的翻译结果有些奇怪,但大体上它还是很准确的。', '我喜欢百度翻译的语音输入功能,这让我在与外国人交流时更加自信。', '百度翻译对于简单的文本翻译做得很好,但处理复杂句子时可能会有些困难。', '我觉得百度翻译的界面很简洁,使用起来很顺畅。', '百度翻译支持多种语言翻译,这对我来说非常有用。', '我发现百度翻译在翻译专有名词时可能会出现误差。', '总体来说,百度翻译是一个很好用的工具,但在某些情况下可能需要更专业的翻译服务。', '我使用百度翻译已经有很长时间了,它的功能变得越来越完善。', '百度翻译对于日常交流来说足够好了,但我不会用它来翻译重要的商业文档。', '我觉得百度翻译的翻译质量还有待提高,有时候结果让人摸不着头脑。', '百度翻译的语音翻译功能很实用,但有时候识别率不高。', '我使用百度翻译来帮助我完成一些简单的翻译任务,它还是挺靠谱的。', '我发现百度翻译在处理俚语和习惯用法时可能会有些困难。', '总体来说,我觉得百度翻译是一个很方便的工具,但在某些方面还有待改进。', '我喜欢使用百度翻译来学习新的语言和文化,它帮助我扩大了视野。', '百度翻译的实时翻译功能让我在旅行中更加轻松地与当地人交流。', '我发现百度翻译在翻译长句子时可能会出现语法错误。', '我觉得百度翻译对于简单的翻译任务来说足够好了,但有时候也需要更专业的翻译服务。']\n" ] } ], "source": [ "for i in theme:\n", " print('-'*50)\n", " v = theme[i]\n", " print(f\"Theme{i}:{theme[i]}\")\n", " print(original_review[original_review.Cluster == i].reviews.to_list())" ] } ], "metadata": { "kernelspec": { "display_name": "ernie", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.13" } }, "nbformat": 4, "nbformat_minor": 2 }