{ "cells": [ { "cell_type": "markdown", "id": "f35a75ba", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "id": "7a8ad074", "metadata": {}, "source": [ "# Regression and Classification with Neural Networks (MLP)" ] }, { "cell_type": "code", "execution_count": 680, "id": "23d79b8b", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.metrics import classification_report,confusion_matrix,ConfusionMatrixDisplay\n", "from sklearn import model_selection as ms\n", "from sklearn.metrics import r2_score\n", "from sklearn.metrics import mean_squared_error\n", "from sklearn import metrics as m\n", "from sklearn.neural_network import MLPRegressor\n", "from sklearn.neural_network import MLPClassifier\n", "from sklearn.preprocessing import StandardScaler\n" ] }, { "cell_type": "markdown", "id": "7058d734", "metadata": {}, "source": [ "# 1. Regression" ] }, { "cell_type": "markdown", "id": "59941cf1", "metadata": {}, "source": [ "### Dataset information" ] }, { "cell_type": "markdown", "id": "c24fba07", "metadata": {}, "source": [ " ### Bike Rents for the Day Dataset\n", " \n", " https://www.kaggle.com/datasets/ayessa/bike-sharing-dataset-regression\n", " \n", " \n", " " ] }, { "cell_type": "markdown", "id": "6a83e681", "metadata": {}, "source": [ "This dataset contains the daily and hourly counts of rental bikes between 2011 and 2012 in the Capital Bikeshare system, together with the corresponding weather and seasonal information.\n", "\n", "\n", "Bike-sharing systems are a new generation of traditional bike rentals in which the whole process, from membership to rental and return, has become automatic. 
Through these systems, a user can easily rent a bike at one location and return it at another. At the time the dataset was published, there were about 500 bike-sharing programs around the world, with more than 500,000 bicycles in total. There is great interest in these systems because of their important role in traffic, environmental, and health issues.\n", "\n", "Apart from the interesting real-world applications of bike-sharing systems, the characteristics of the data they generate make them attractive for research. Unlike other transport services such as buses or the subway, these systems explicitly record the trip duration and the departure and arrival positions. This feature turns a bike-sharing system into a virtual sensor network that can be used to sense mobility in the city. Hence, most important events in the city are expected to be detectable by monitoring these data.\n", "\n", "\n", "The columns (features) are:\n", "\n", "\n", "- instant: record index\n", "- dteday: date\n", "- season: season (1: winter, 2: spring, 3: summer, 4: fall)\n", "- yr: year (0: 2011, 1: 2012)\n", "- mnth: month (1 to 12)\n", "- hr: hour (0 to 23; not present in this daily file)\n", "- holiday: whether the day is a holiday (extracted from [Web Link])\n", "- weekday: day of the week\n", "- workingday: 1 if the day is neither a weekend nor a holiday, otherwise 0\n", "- weathersit: 1: Clear, Few clouds, Partly cloudy; 2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist; 3: Light snow, Light rain + Thunderstorm + Scattered clouds, Light rain + Scattered clouds; 4: Heavy rain + Ice pellets + Thunderstorm + Mist, Snow + Fog\n", "- temp: normalized temperature in Celsius. 
Values are computed as (t - t_min)/(t_max - t_min), with t_min = -8 and t_max = +39 (only in the hourly scale)\n", "- atemp: normalized feeling temperature in Celsius. Values are computed as (t - t_min)/(t_max - t_min), with t_min = -16 and t_max = +50 (only in the hourly scale)\n", "- hum: normalized humidity. Values are divided by 100 (the maximum)\n", "- windspeed: normalized wind speed. Values are divided by 67 (the maximum)\n", "- casual: count of casual users\n", "- registered: count of registered users\n", "- cnt: count of total rental bikes, including both casual and registered users\n" ] }, { "cell_type": "markdown", "id": "6fd4e790", "metadata": {}, "source": [ "### Task\n", "\n", "Predict the number of bikes rented per day from the temperature, the day, and other features." ] }, { "cell_type": "markdown", "id": "e132db38", "metadata": {}, "source": [ "### 1. Exploratory data analysis" ] }, { "cell_type": "markdown", "id": "4a818f11", "metadata": {}, "source": [ "1. Print the number of records in the dataset\n", "2. Print the number of variables in the dataset\n", "3. Print the column names of the dataset\n", "4. Print the **head** of the dataset\n", "5. Print the **tail** of the dataset\n", "6. Print basic **info** about the dataset\n", "7. Print a **describe** of the dataset" ] }, { "cell_type": "code", "execution_count": 78, "id": "0f981d48", "metadata": {}, "outputs": [], "source": [ "data = pd.read_csv(\"./resources/day.csv\")" ] }, { "cell_type": "code", "execution_count": 79, "id": "290328fb", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "


\n", "
" ], "text/plain": [ " instant dteday season yr mnth holiday weekday workingday \\\n", "0 1 2011-01-01 1 0 1 0 6 0 \n", "1 2 2011-01-02 1 0 1 0 0 0 \n", "2 3 2011-01-03 1 0 1 0 1 1 \n", "3 4 2011-01-04 1 0 1 0 2 1 \n", "4 5 2011-01-05 1 0 1 0 3 1 \n", ".. ... ... ... .. ... ... ... ... \n", "726 727 2012-12-27 1 1 12 0 4 1 \n", "727 728 2012-12-28 1 1 12 0 5 1 \n", "728 729 2012-12-29 1 1 12 0 6 0 \n", "729 730 2012-12-30 1 1 12 0 0 0 \n", "730 731 2012-12-31 1 1 12 0 1 1 \n", "\n", " weathersit temp atemp hum windspeed casual registered \\\n", "0 2 0.344167 0.363625 0.805833 0.160446 331 654 \n", "1 2 0.363478 0.353739 0.696087 0.248539 131 670 \n", "2 1 0.196364 0.189405 0.437273 0.248309 120 1229 \n", "3 1 0.200000 0.212122 0.590435 0.160296 108 1454 \n", "4 1 0.226957 0.229270 0.436957 0.186900 82 1518 \n", ".. ... ... ... ... ... ... ... \n", "726 2 0.254167 0.226642 0.652917 0.350133 247 1867 \n", "727 2 0.253333 0.255046 0.590000 0.155471 644 2451 \n", "728 2 0.253333 0.242400 0.752917 0.124383 159 1182 \n", "729 1 0.255833 0.231700 0.483333 0.350754 364 1432 \n", "730 2 0.215833 0.223487 0.577500 0.154846 439 2290 \n", "\n", " cnt \n", "0 985 \n", "1 801 \n", "2 1349 \n", "3 1562 \n", "4 1600 \n", ".. ... 
\n", "726 2114 \n", "727 3095 \n", "728 1341 \n", "729 1796 \n", "730 2729 \n", "\n", "[731 rows x 16 columns]" ] }, "execution_count": 79, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data" ] }, { "cell_type": "code", "execution_count": 80, "id": "e69d3aa7", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of records 731\n" ] } ], "source": [ "print(\"Number of records\", len(data))" ] }, { "cell_type": "code", "execution_count": 81, "id": "8156e300", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of variables 16\n" ] } ], "source": [ "print(\"Number of variables\", data.shape[1])" ] }, { "cell_type": "code", "execution_count": 82, "id": "eee05f8a", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Index(['instant', 'dteday', 'season', 'yr', 'mnth', 'holiday', 'weekday',\n", " 'workingday', 'weathersit', 'temp', 'atemp', 'hum', 'windspeed',\n", " 'casual', 'registered', 'cnt'],\n", " dtype='object')\n" ] } ], "source": [] }, { "cell_type": "code", "execution_count": 83, "id": "443efe52", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " instant dteday season yr mnth holiday weekday workingday \\\n", "0 1 2011-01-01 1 0 1 0 6 0 \n", "1 2 2011-01-02 1 0 1 0 0 0 \n", "2 3 2011-01-03 1 0 1 0 1 1 \n", "3 4 2011-01-04 1 0 1 0 2 1 \n", "4 5 2011-01-05 1 0 1 0 3 1 \n", "\n", " weathersit temp atemp hum windspeed casual registered \\\n", "0 2 0.344167 0.363625 0.805833 0.160446 331 654 \n", "1 2 0.363478 0.353739 0.696087 0.248539 131 670 \n", "2 1 0.196364 0.189405 0.437273 0.248309 120 1229 \n", "3 1 0.200000 0.212122 0.590435 0.160296 108 1454 \n", "4 1 0.226957 0.229270 0.436957 0.186900 82 1518 \n", "\n", " cnt \n", "0 985 \n", "1 801 \n", "2 1349 \n", "3 1562 \n", "4 1600 " ] }, "execution_count": 83, "metadata": {}, "output_type": "execute_result" } ], "source": [] }, { "cell_type": "code", "execution_count": 84, "id": "afc33abf", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " instant dteday season yr mnth holiday weekday workingday \\\n", "726 727 2012-12-27 1 1 12 0 4 1 \n", "727 728 2012-12-28 1 1 12 0 5 1 \n", "728 729 2012-12-29 1 1 12 0 6 0 \n", "729 730 2012-12-30 1 1 12 0 0 0 \n", "730 731 2012-12-31 1 1 12 0 1 1 \n", "\n", " weathersit temp atemp hum windspeed casual registered \\\n", "726 2 0.254167 0.226642 0.652917 0.350133 247 1867 \n", "727 2 0.253333 0.255046 0.590000 0.155471 644 2451 \n", "728 2 0.253333 0.242400 0.752917 0.124383 159 1182 \n", "729 1 0.255833 0.231700 0.483333 0.350754 364 1432 \n", "730 2 0.215833 0.223487 0.577500 0.154846 439 2290 \n", "\n", " cnt \n", "726 2114 \n", "727 3095 \n", "728 1341 \n", "729 1796 \n", "730 2729 " ] }, "execution_count": 84, "metadata": {}, "output_type": "execute_result" } ], "source": [] }, { "cell_type": "code", "execution_count": 85, "id": "81f353dd", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "RangeIndex: 731 entries, 0 to 730\n", "Data columns (total 16 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 instant 731 non-null int64 \n", " 1 dteday 731 non-null object \n", " 2 season 731 non-null int64 \n", " 3 yr 731 non-null int64 \n", " 4 mnth 731 non-null int64 \n", " 5 holiday 731 non-null int64 \n", " 6 weekday 731 non-null int64 \n", " 7 workingday 731 non-null int64 \n", " 8 weathersit 731 non-null int64 \n", " 9 temp 731 non-null float64\n", " 10 atemp 731 non-null float64\n", " 11 hum 731 non-null float64\n", " 12 windspeed 731 non-null float64\n", " 13 casual 731 non-null int64 \n", " 14 registered 731 non-null int64 \n", " 15 cnt 731 non-null int64 \n", "dtypes: float64(4), int64(11), object(1)\n", "memory usage: 91.5+ KB\n" ] } ], "source": [] }, { "cell_type": "code", "execution_count": 86, "id": "c6266239", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " instant season yr mnth holiday weekday \\\n", "count 731.000000 731.000000 731.000000 731.000000 731.000000 731.000000 \n", "mean 366.000000 2.496580 0.500684 6.519836 0.028728 2.997264 \n", "std 211.165812 1.110807 0.500342 3.451913 0.167155 2.004787 \n", "min 1.000000 1.000000 0.000000 1.000000 0.000000 0.000000 \n", "25% 183.500000 2.000000 0.000000 4.000000 0.000000 1.000000 \n", "50% 366.000000 3.000000 1.000000 7.000000 0.000000 3.000000 \n", "75% 548.500000 3.000000 1.000000 10.000000 0.000000 5.000000 \n", "max 731.000000 4.000000 1.000000 12.000000 1.000000 6.000000 \n", "\n", " workingday weathersit temp atemp hum windspeed \\\n", "count 731.000000 731.000000 731.000000 731.000000 731.000000 731.000000 \n", "mean 0.683995 1.395349 0.495385 0.474354 0.627894 0.190486 \n", "std 0.465233 0.544894 0.183051 0.162961 0.142429 0.077498 \n", "min 0.000000 1.000000 0.059130 0.079070 0.000000 0.022392 \n", "25% 0.000000 1.000000 0.337083 0.337842 0.520000 0.134950 \n", "50% 1.000000 1.000000 0.498333 0.486733 0.626667 0.180975 \n", "75% 1.000000 2.000000 0.655417 0.608602 0.730209 0.233214 \n", "max 1.000000 3.000000 0.861667 0.840896 0.972500 0.507463 \n", "\n", " casual registered cnt \n", "count 731.000000 731.000000 731.000000 \n", "mean 848.176471 3656.172367 4504.348837 \n", "std 686.622488 1560.256377 1937.211452 \n", "min 2.000000 20.000000 22.000000 \n", "25% 315.500000 2497.000000 3152.000000 \n", "50% 713.000000 3662.000000 4548.000000 \n", "75% 1096.000000 4776.500000 5956.000000 \n", "max 3410.000000 6946.000000 8714.000000 " ] }, "execution_count": 86, "metadata": {}, "output_type": "execute_result" } ], "source": [] }, { "cell_type": "code", "execution_count": 87, "id": "5db5b000", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " instant dteday season yr mnth holiday weekday workingday \\\n", "0 1 2011-01-01 1 0 1 0 6 0 \n", "1 2 2011-01-02 1 0 1 0 0 0 \n", "2 3 2011-01-03 1 0 1 0 1 1 \n", "\n", " weathersit temp atemp hum windspeed casual registered \\\n", "0 2 0.344167 0.363625 0.805833 0.160446 331 654 \n", "1 2 0.363478 0.353739 0.696087 0.248539 131 670 \n", "2 1 0.196364 0.189405 0.437273 0.248309 120 1229 \n", "\n", " cnt \n", "0 985 \n", "1 801 \n", "2 1349 " ] }, "execution_count": 87, "metadata": {}, "output_type": "execute_result" } ], "source": [] }, { "cell_type": "markdown", "id": "07523317", "metadata": {}, "source": [ "8. Group the data by season (**season**) and month (**mnth**) and sum the number of rented bikes." ] }, { "cell_type": "code", "execution_count": 88, "id": "58c718ce", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "44f8e26f", "metadata": {}, "source": [ "9. Using the result of the previous step (8), make a bar chart showing the distribution of bike rentals per month" ] }, { "cell_type": "code", "execution_count": 89, "id": "90d12645", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 89, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [] }, { "cell_type": "markdown", "id": "f5c2b027", "metadata": {}, "source": [ "10. Make a scatter plot of the atemp and cnt variables" ] }, { "cell_type": "code", "execution_count": 90, "id": "4d1a4f39", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Text(0, 0.5, 'cnt')" ] }, "execution_count": 90, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [] }, { "cell_type": "markdown", "id": "15f84cae", "metadata": {}, "source": [ "11. Drop the **instant** column, since it is just a record index and adds no relevant information" ] }, { "cell_type": "code", "execution_count": 91, "id": "8890b2e4", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "5d66f01d", "metadata": {}, "source": [ "12. Plot a heatmap of the Pearson correlation coefficients by running the following code cell" ] }, { "cell_type": "code", "execution_count": 93, "id": "9b64c706", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 93, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "f, ax = plt.subplots(figsize=(10, 8))\n", "\n", "# numeric_only=True skips non-numeric columns such as dteday (needed from pandas 2.0 on, where it defaults to False)\n", "corr = data.corr(method='pearson', numeric_only=True)\n", "# Hide the diagonal and the upper triangle, which mirrors the lower one\n", "mask = np.triu(np.ones_like(corr, dtype=bool))\n", "sns.heatmap(corr, annot=True, mask=mask, ax=ax)" ] }, { "cell_type": "markdown", "id": "a3bf4fe0", "metadata": {}, "source": [ "### 2. Handling missing values, repairing the dataset, and encoding variables" ] }, { "cell_type": "markdown", "id": "0a1415ed", "metadata": {}, "source": [ "1. Define a vector **Y** with the values of the **cnt** column of the dataset" ] }, { "cell_type": "code", "execution_count": 95, "id": "80d012b1", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "a9787700", "metadata": {}, "source": [ "2. Drop the casual, registered, and cnt columns from the dataset. registered is the number of registered users and casual is the number of casual users. Together they add up to cnt (casual + registered = cnt), which is the variable we want to predict, so keeping them as features would leak the target." ] }, { "cell_type": "code", "execution_count": 99, "id": "d2317c44", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "8af07aeb", "metadata": {}, "source": [ "3. Define a vector X using only the variables whose Pearson correlation coefficient with cnt is greater than 0.2\n", "\n", "Note: see the heatmap" ] }, { "cell_type": "code", "execution_count": 103, "id": "6d0a188c", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "8854e34f", "metadata": {}, "source": [ "### 3. Define the training and validation sets" ] }, { "cell_type": "markdown", "id": "2ba31431", "metadata": {}, "source": [ "1. Create a vector X containing the features\n", "2. Create a vector y containing the target values (cnt)\n", "3. Print the vector X\n", "4. Print the vector y\n", "\n", "5. 
Split the data into 80% train and 20% test\n", "\n", "Hint: use sklearn's train_test_split function https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html\n", "\n", "6. Print the dimensions of the train and test sets\n" ] }, { "cell_type": "code", "execution_count": 104, "id": "e683cb50", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 105, "id": "27ae1add", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Training set dimensions (584, 5)\n", "Test set dimensions (147, 5)\n" ] } ], "source": [ "print(\"Training set dimensions\", )\n", "print(\"Test set dimensions\", )" ] }, { "cell_type": "markdown", "id": "8bb7bb81", "metadata": {}, "source": [ "### 4. Model training" ] }, { "cell_type": "markdown", "id": "13929f8d", "metadata": {}, "source": [ "1. Create an MLPRegressor model using the sklearn library https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPRegressor.html\n", "2. Train the model\n", "3. Use 6 hidden layers with 1000, 1000, 100, 50, 10, and 5 neurons, i.e. hidden_layer_sizes=(1000, 1000, 100, 50, 10, 5)\n", "4. Use max_iter = 500\n", "5. Use early_stopping = True\n", "\n", "Hints:\n", "\n", "- Use the fit function\n", "- Use only the training set (X_train, y_train)" ] }, { "cell_type": "code", "execution_count": 385, "id": "f6e81333", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "64d88e97", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 395, "id": "ca460767", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of iterations needed to train the model 263\n" ] } ], "source": [ "print(\"Number of iterations needed to train the model\",)" ] }, { "cell_type": "markdown", "id": "603c3418", "metadata": {}, "source": [ "### 5. 
Compute the evaluation metrics" ] }, { "cell_type": "markdown", "id": "ee27e6df", "metadata": {}, "source": [ "1. Use the predict() function to create the vector of predictions\n", "\n", "Hint: use the test set (X_test)" ] }, { "cell_type": "code", "execution_count": 389, "id": "1dd4d731", "metadata": {}, "outputs": [], "source": [ "y_predict = " ] }, { "cell_type": "markdown", "id": "60b91a8a", "metadata": {}, "source": [ "**Note:** Run the following cell, which computes several metrics for regression problems." ] }, { "cell_type": "code", "execution_count": 392, "id": "7c576904", "metadata": {}, "outputs": [], "source": [ "# Mean absolute error, in bikes per day\n", "mae_test = m.mean_absolute_error(y_test, y_predict)\n", "# Mean absolute percentage error, as a fraction (0.24 means 24%)\n", "mape_test = np.mean(np.abs((y_test - y_predict) / y_test))\n", "MSE_test = mean_squared_error(y_test, y_predict)\n", "# RMSE = sqrt(MSE); the squared=False argument is deprecated since scikit-learn 1.4 and removed in 1.6\n", "RMSE_test = np.sqrt(MSE_test)\n", "R2_test = r2_score(y_test, y_predict)" ] }, { "cell_type": "code", "execution_count": 393, "id": "618b0213", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "MAE 729.0662072296108\n", "MAPE 0.2415938813321013\n", "MSE 917387.7243668467\n", "RMSE 957.803593836882\n", "R2 0.7596922000879675\n" ] } ], "source": [ "print(\"MAE\",mae_test)\n", "print(\"MAPE\",mape_test)\n", "print(\"MSE\",MSE_test)\n", "print(\"RMSE\",RMSE_test)\n", "print(\"R2\",R2_test)" ] }, { "cell_type": "markdown", "id": "79c46a3c", "metadata": {}, "source": [ "### 6. Conclusions" ] }, { "cell_type": "markdown", "id": "016441ec", "metadata": {}, "source": [ "1. 
Briefly describe the results obtained" ] }, { "cell_type": "markdown", "id": "1e21d313", "metadata": {}, "source": [ "On the test set, the model achieved a mean absolute percentage error (MAPE) of about 24% and an R² of 0.76. This means it explains about 76% of the variance in daily rentals, and its predictions are off by about 730 bikes per day on average (MAE). This is only moderate performance, so it would be worth experimenting with the number of layers, the number of neurons per layer, and other hyperparameters." ] }, { "cell_type": "markdown", "id": "7fa68821", "metadata": {}, "source": [ "2. Make a scatter plot of y_test versus y_predict" ] }, { "cell_type": "code", "execution_count": 396, "id": "ed3c003b", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 396, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [] }, { "cell_type": "markdown", "id": "4c488be0", "metadata": {}, "source": [ "# 2. Classification" ] }, { "cell_type": "markdown", "id": "a89673df", "metadata": {}, "source": [ "### Dataset information\n", "\n", "\n", "#### Mobile Price Classification\n", "\n", "https://www.kaggle.com/datasets/iabhishekofficial/mobile-price-classification\n", "\n", "\n", "Bob has started his own mobile phone company and wants to give tough competition to big companies such as Apple and Samsung. He does not know how to estimate the price of the phones his company makes, and in this competitive market you cannot simply make assumptions. To solve this problem, he collects sales data on mobile phones from several companies. Bob wants to find a relationship between a phone's features (e.g., RAM, internal memory) and its selling price, but he is not very good at machine learning, so he needs your help. In this problem you do not have to predict the actual price, but a price range indicating how high the price is.\n", "\n", "\n", "price_range: the target variable, with values 0 (low cost), 1 (medium cost), 2 (high cost), and 3 (very high cost).\n", "\n", "\n", "Variables\n", "\n", "1. battery_power: total energy a battery can store in one charge, measured in mAh\n", "1. blue: whether the phone has Bluetooth\n", "1. clock_speed: speed at which the microprocessor executes instructions\n", "1. dual_sim: whether the phone supports dual SIM\n", "1. fc: front camera megapixels\n", "1. four_g: whether the phone has 4G\n", "1. int_memory: internal memory in gigabytes\n", "1. m_dep: phone depth in cm\n", "1. mobile_wt: weight of the phone\n", "1. n_cores: number of processor cores\n", "1. pc: primary camera megapixels\n", "1. px_height: pixel resolution height\n", "1. 
px_width: pixel resolution width\n", "1. ram: random access memory in megabytes\n", "1. sc_h: screen height of the phone in cm\n", "1. sc_w: screen width of the phone in cm\n", "1. talk_time: longest time a single battery charge lasts while talking\n", "1. three_g: whether the phone has 3G\n", "1. touch_screen: whether the phone has a touch screen\n", "1. wifi: whether the phone has Wi-Fi\n", "1. price_range: the target variable, with values 0 (low cost), 1 (medium cost), 2 (high cost), and 3 (very high cost)." ] }, { "cell_type": "markdown", "id": "804d09a6", "metadata": {}, "source": [ "### Task\n", "\n", "Predict which of the following classes a phone belongs to:\n", "\n", "- 0 (low cost)\n", "- 1 (medium cost)\n", "- 2 (high cost)\n", "- 3 (very high cost)" ] }, { "cell_type": "markdown", "id": "c4f50282", "metadata": {}, "source": [ "### 1. Exploratory data analysis" ] }, { "cell_type": "markdown", "id": "0effc3d9", "metadata": {}, "source": [ "1. Print the number of records in the dataset\n", "2. Print the number of variables in the dataset\n", "3. Print the column names of the dataset\n", "4. Print the **head** of the dataset\n", "5. Print the **tail** of the dataset\n", "6. Print basic **info** about the dataset\n", "7. Print a **describe** of the dataset\n", "8. Plot the distribution of the class to predict (price_range)\n" ] }, { "cell_type": "code", "execution_count": 400, "id": "21d732cd", "metadata": {}, "outputs": [], "source": [ "data = pd.read_csv(\"resources/train.csv\")" ] }, { "cell_type": "code", "execution_count": 402, "id": "823f3323", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of records 2000\n", "Number of variables 21\n" ] } ], "source": [ "print(\"Number of records\", len(data))\n", "print(\"Number of variables\", data.shape[1])" ] }, { "cell_type": "code", "execution_count": 403, "id": "becfbf35", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
battery_powerblueclock_speeddual_simfcfour_gint_memorym_depmobile_wtn_cores...px_heightpx_widthramsc_hsc_wtalk_timethree_gtouch_screenwifiprice_range
084202.201070.61882...20756254997190011
1102110.5101530.71363...9051988263117371102
256310.5121410.91455...12631716260311291102
361512.5000100.81316...121617862769168111002
4182111.20131440.61412...12081212141182151101
\n", "

5 rows × 21 columns

\n", "
" ], "text/plain": [ " battery_power blue clock_speed dual_sim fc four_g int_memory m_dep \\\n", "0 842 0 2.2 0 1 0 7 0.6 \n", "1 1021 1 0.5 1 0 1 53 0.7 \n", "2 563 1 0.5 1 2 1 41 0.9 \n", "3 615 1 2.5 0 0 0 10 0.8 \n", "4 1821 1 1.2 0 13 1 44 0.6 \n", "\n", " mobile_wt n_cores ... px_height px_width ram sc_h sc_w talk_time \\\n", "0 188 2 ... 20 756 2549 9 7 19 \n", "1 136 3 ... 905 1988 2631 17 3 7 \n", "2 145 5 ... 1263 1716 2603 11 2 9 \n", "3 131 6 ... 1216 1786 2769 16 8 11 \n", "4 141 2 ... 1208 1212 1411 8 2 15 \n", "\n", " three_g touch_screen wifi price_range \n", "0 0 0 1 1 \n", "1 1 1 0 2 \n", "2 1 1 0 2 \n", "3 1 0 0 2 \n", "4 1 1 0 1 \n", "\n", "[5 rows x 21 columns]" ] }, "execution_count": 403, "metadata": {}, "output_type": "execute_result" } ], "source": [] }, { "cell_type": "code", "execution_count": 404, "id": "aa00fa19", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
battery_powerblueclock_speeddual_simfcfour_gint_memorym_depmobile_wtn_cores...px_heightpx_widthramsc_hsc_wtalk_timethree_gtouch_screenwifiprice_range
199579410.510120.81066...12221890668134191100
1996196512.6100390.21874...915196520321110161112
1997191100.9111360.71088...868163230579151103
1998151200.9041460.11455...3366708691810191110
199951012.0151450.91686...483754391919421113
\n", "

5 rows × 21 columns

\n", "
" ], "text/plain": [ " battery_power blue clock_speed dual_sim fc four_g int_memory \\\n", "1995 794 1 0.5 1 0 1 2 \n", "1996 1965 1 2.6 1 0 0 39 \n", "1997 1911 0 0.9 1 1 1 36 \n", "1998 1512 0 0.9 0 4 1 46 \n", "1999 510 1 2.0 1 5 1 45 \n", "\n", " m_dep mobile_wt n_cores ... px_height px_width ram sc_h sc_w \\\n", "1995 0.8 106 6 ... 1222 1890 668 13 4 \n", "1996 0.2 187 4 ... 915 1965 2032 11 10 \n", "1997 0.7 108 8 ... 868 1632 3057 9 1 \n", "1998 0.1 145 5 ... 336 670 869 18 10 \n", "1999 0.9 168 6 ... 483 754 3919 19 4 \n", "\n", " talk_time three_g touch_screen wifi price_range \n", "1995 19 1 1 0 0 \n", "1996 16 1 1 1 2 \n", "1997 5 1 1 0 3 \n", "1998 19 1 1 1 0 \n", "1999 2 1 1 1 3 \n", "\n", "[5 rows x 21 columns]" ] }, "execution_count": 404, "metadata": {}, "output_type": "execute_result" } ], "source": [] }, { "cell_type": "code", "execution_count": 405, "id": "9edfea18", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "RangeIndex: 2000 entries, 0 to 1999\n", "Data columns (total 21 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 battery_power 2000 non-null int64 \n", " 1 blue 2000 non-null int64 \n", " 2 clock_speed 2000 non-null float64\n", " 3 dual_sim 2000 non-null int64 \n", " 4 fc 2000 non-null int64 \n", " 5 four_g 2000 non-null int64 \n", " 6 int_memory 2000 non-null int64 \n", " 7 m_dep 2000 non-null float64\n", " 8 mobile_wt 2000 non-null int64 \n", " 9 n_cores 2000 non-null int64 \n", " 10 pc 2000 non-null int64 \n", " 11 px_height 2000 non-null int64 \n", " 12 px_width 2000 non-null int64 \n", " 13 ram 2000 non-null int64 \n", " 14 sc_h 2000 non-null int64 \n", " 15 sc_w 2000 non-null int64 \n", " 16 talk_time 2000 non-null int64 \n", " 17 three_g 2000 non-null int64 \n", " 18 touch_screen 2000 non-null int64 \n", " 19 wifi 2000 non-null int64 \n", " 20 price_range 2000 non-null int64 \n", "dtypes: float64(2), int64(19)\n", "memory usage: 328.2 KB\n" ] } 
], "source": [] }, { "cell_type": "code", "execution_count": 406, "id": "0fb51ec5", "metadata": {}, "outputs": [ { "data": { "text/plain": [ " battery_power blue clock_speed dual_sim fc \\\n", "count 2000.000000 2000.0000 2000.000000 2000.000000 2000.000000 \n", "mean 1238.518500 0.4950 1.522250 0.509500 4.309500 \n", "std 439.418206 0.5001 0.816004 0.500035 4.341444 \n", "min 501.000000 0.0000 0.500000 0.000000 0.000000 \n", "25% 851.750000 0.0000 0.700000 0.000000 1.000000 \n", "50% 1226.000000 0.0000 1.500000 1.000000 3.000000 \n", "75% 1615.250000 1.0000 2.200000 1.000000 7.000000 \n", "max 1998.000000 1.0000 3.000000 1.000000 19.000000 \n", "\n", " four_g int_memory m_dep mobile_wt n_cores ... \\\n", "count 2000.000000 2000.000000 2000.000000 2000.000000 2000.000000 ... \n", "mean 0.521500 32.046500 0.501750 140.249000 4.520500 ... \n", "std 0.499662 18.145715 0.288416 35.399655 2.287837 ... \n", "min 0.000000 2.000000 0.100000 80.000000 1.000000 ... \n", "25% 0.000000 16.000000 0.200000 109.000000 3.000000 ... \n", "50% 1.000000 32.000000 0.500000 141.000000 4.000000 ... \n", "75% 1.000000 48.000000 0.800000 170.000000 7.000000 ... \n", "max 1.000000 64.000000 1.000000 200.000000 8.000000 ... 
\n", "\n", " px_height px_width ram sc_h sc_w \\\n", "count 2000.000000 2000.000000 2000.000000 2000.000000 2000.000000 \n", "mean 645.108000 1251.515500 2124.213000 12.306500 5.767000 \n", "std 443.780811 432.199447 1084.732044 4.213245 4.356398 \n", "min 0.000000 500.000000 256.000000 5.000000 0.000000 \n", "25% 282.750000 874.750000 1207.500000 9.000000 2.000000 \n", "50% 564.000000 1247.000000 2146.500000 12.000000 5.000000 \n", "75% 947.250000 1633.000000 3064.500000 16.000000 9.000000 \n", "max 1960.000000 1998.000000 3998.000000 19.000000 18.000000 \n", "\n", " talk_time three_g touch_screen wifi price_range \n", "count 2000.000000 2000.000000 2000.000000 2000.000000 2000.000000 \n", "mean 11.011000 0.761500 0.503000 0.507000 1.500000 \n", "std 5.463955 0.426273 0.500116 0.500076 1.118314 \n", "min 2.000000 0.000000 0.000000 0.000000 0.000000 \n", "25% 6.000000 1.000000 0.000000 0.000000 0.750000 \n", "50% 11.000000 1.000000 1.000000 1.000000 1.500000 \n", "75% 16.000000 1.000000 1.000000 1.000000 2.250000 \n", "max 20.000000 1.000000 1.000000 1.000000 3.000000 \n", "\n", "[8 rows x 21 columns]" ] }, "execution_count": 406, "metadata": {}, "output_type": "execute_result" } ], "source": [] }, { "cell_type": "code", "execution_count": 407, "id": "7fba415a", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([1, 2, 3, 0], dtype=int64)" ] }, "execution_count": 407, "metadata": {}, "output_type": "execute_result" } ], "source": [] }, { "cell_type": "code", "execution_count": 408, "id": "4b7e9309", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "C:\\ProgramData\\Anaconda3\\lib\\site-packages\\seaborn\\_decorators.py:36: FutureWarning: Pass the following variable as a keyword arg: x. 
From version 0.12, the only valid positional argument will be `data`, and passing other arguments without an explicit keyword will result in an error or misinterpretation.\n", " warnings.warn(\n" ] }, { "data": { "text/plain": [ "" ] }, "execution_count": 408, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYUAAAEHCAYAAABBW1qbAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/YYfK9AAAACXBIWXMAAAsTAAALEwEAmpwYAAARVElEQVR4nO3df6zddX3H8eeLoqCIE+TCagsrMR2u+ANDqTqMOnHSuc0SJ64kajOYNRv+yhYX2BI3Nc1M3MyIk8XGX/UndqLS8YfadCLxB5YWUSiF0YiDho7Wn4hzuHbv/XG//Xja3tIjvd977o/nI7k53+/nfL7nvDih93W/53vO95uqQpIkgGNGHUCSNH1YCpKkxlKQJDWWgiSpsRQkSc2xow5wNE455ZRatGjRqGNI0oyydevW71fV2ET3zehSWLRoEVu2bBl1DEmaUZL85+Hu8+0jSVJjKUiSGktBktRYCpKkxlKQJDWWgiSp6bUUknwvyW1Jbk2ypRs7OcnGJHd3tycNzL8yyY4kdyW5sM9skqRDTcWewu9U1TlVtbRbvwLYVFWLgU3dOkmWACuBs4HlwNVJ5k1BPklSZxRvH60A1nXL64CLBsavqaqHq+oeYAewbOrjSdLc1fc3mgv4UpIC3l9Va4HTqmoXQFXtSnJqN3cBcNPAtju7sQMkWQ2sBjjjjDOOGODct370qP4DZpOt737tUW1/7zueMUlJZr4z3nbbUT/G+e89fxKSzA5fe+PXjvoxvvKCF05CktnhhTd+5VFv23cpnF9V93e/+DcmufMR5maCsUMuC9cVy1qApUuXetk4SZpEvb59VFX3d7e7gc8x/nbQA0nmA3S3u7vpO4HTBzZfCNzfZz5J0oF6K4UkJyQ5cf8y8FLgdmADsKqbtgq4rlveAKxMclySM4HFwOa+8kmSDtXn20enAZ9Lsv95PllVX0hyM7A+yWXAvcDFAFW1Lcl64A5gL3B5Ve3rMZ8k6SC9lUJVfRd41gTjPwAuOMw2a4A1fWWSJD0yv9EsSWosBUlSYylIkhpLQZLUWAqSpMZSkCQ1loIkqbEUJEmNpSBJaiwFSVJjKUiSGktBktRYCpKkxlKQJDWWgiSpsRQkSY2lIElqLAVJUmMpSJIaS0GS1FgKkqTGUpAkNZaCJKmxFCRJjaUgSWosBUlSYylIkhpLQZLUWAqSpMZSkCQ1loIkqbEUJElN76WQZF6SbyW5vls/OcnGJHd3tycNzL0yyY4kdyW5sO9skqQDTcWewpuB7QPrVwCbqmoxsKlbJ8kSYCVwNrAcuDrJvCnIJ0nq9FoKSRYCvw98YGB4BbCuW14HXDQwfk1VPVxV9wA7gGV95pMkHajvPYV/Av4K+L+BsdOqahdAd3tqN74AuG9g3s5u7ABJVifZkmTLnj17egktSXNVb6WQ5A+A3VW1ddhNJhirQwaq1lbV0qpaOjY2dlQZJUkHOrbHxz4feHmSlwHHA09M8nHggSTzq2pXkvnA7m7+TuD0ge0XAvf3mE+SdJDe9hSq6sqqWlhVixg/gPzvVfVqYAOwqpu2CriuW94ArExyXJIzgcXA5r7ySZIO1eeewuG8C1if5DLgXuBigKralmQ9cAewF7i8qvaNIJ8kzVlTUgpVdQNwQ7f8A+CCw8xbA6yZikySpEP5jW
ZJUmMpSJIaS0GS1FgKkqTGUpAkNZaCJKmxFCRJjaUgSWosBUlSYylIkhpLQZLUWAqSpMZSkCQ1loIkqbEUJEmNpSBJaiwFSVJjKUiSGktBktRYCpKkxlKQJDWWgiSpsRQkSY2lIElqLAVJUmMpSJIaS0GS1FgKkqTGUpAkNZaCJKmxFCRJjaUgSWp6K4UkxyfZnOTbSbYleXs3fnKSjUnu7m5PGtjmyiQ7ktyV5MK+skmSJtbnnsLDwIur6lnAOcDyJM8FrgA2VdViYFO3TpIlwErgbGA5cHWSeT3mkyQdpLdSqHEPdauP6X4KWAGs68bXARd1yyuAa6rq4aq6B9gBLOsrnyTpUL0eU0gyL8mtwG5gY1V9EzitqnYBdLendtMXAPcNbL6zG5MkTZFeS6Gq9lXVOcBCYFmSpz/C9Ez0EIdMSlYn2ZJky549eyYpqSQJpujTR1X1Y+AGxo8VPJBkPkB3u7ubthM4fWCzhcD9EzzW2qpaWlVLx8bG+owtSXNOn58+GkvypG75ccBLgDuBDcCqbtoq4LpueQOwMslxSc4EFgOb+8onSTrUsT0+9nxgXfcJomOA9VV1fZJvAOuTXAbcC1wMUFXbkqwH7gD2ApdX1b4e80mSDjJUKSTZVFUXHGlsUFV9B3j2BOM/ACbcrqrWAGuGySRJmnyPWApJjgceD5zSfcls/8HgJwJP6TmbJGmKHWlP4fXAWxgvgK38shQeBN7XXyxJ0ig8YilU1VXAVUneWFXvnaJMkqQRGeqYQlW9N8lvA4sGt6mqj/aUS5I0AsMeaP4Y8FTgVmD/J4IKsBQkaRYZ9iOpS4ElVXXIN4wlSbPHsF9eux349T6DSJJGb9g9hVOAO5JsZvyU2ABU1ct7SSVJGolhS+Hv+gwhSZoehv300Vf6DiJJGr1hP330U355GuvHMn7BnJ9V1RP7CiZJmnrD7imcOLie5CK8KpokzTqP6tTZVfV54MWTG0WSNGrDvn30ioHVYxj/3oLfWZCkWWbYTx/94cDyXuB7wIpJTyNJGqlhjyn8Sd9BJEmjN9QxhSQLk3wuye4kDyS5NsnCvsNJkqbWsAeaP8z4NZSfAiwA/q0bkyTNIsOWwlhVfbiq9nY/HwHGeswlSRqBYUvh+0lenWRe9/Nq4Ad9BpMkTb1hS+FS4FXAfwG7gFcCHnyWpFlm2I+kvhNYVVU/AkhyMvAPjJeFJGmWGHZP4Zn7CwGgqn4IPLufSJKkURm2FI5JctL+lW5PYdi9DEnSDDHsL/Z/BL6e5DOMn97iVcCa3lJJkkZi2G80fzTJFsZPghfgFVV1R6/JJElTbui3gLoSsAgkaRZ7VKfOliTNTpaCJKmxFCRJjaUgSWosBUlSYylIkpreSiHJ6Um+nGR7km1J3tyNn5xkY5K7u9vBb0pfmWRHkruSXNhXNknSxPrcU9gL/GVV/RbwXODyJEuAK4BNVbUY2NSt0923EjgbWA5cnWRej/kkSQfprRSqaldV3dIt/xTYzvhV21YA67pp64CLuuUVwDVV9XBV3QPsAJb1lU+SdKgpOaaQZBHjZ1X9JnBaVe2C8eIATu2mLQDuG9hsZzd28GOtTrIlyZY9e/b0mluS5preSyHJE4BrgbdU1YOPNHWCsTpkoGptVS2tqqVjY14RVJImU6+lkOQxjBfCJ6rqs93wA0nmd/fPB3Z34zuB0wc2Xwjc32c+SdKB+vz0UYAPAtur6j0Dd20AVnXLq4DrBsZXJjkuyZnAYmBzX/kkSYfq80I55wOvAW5Lcms39tfAu4D1SS4D7gUuBqiqbUnWM34m1r3A5VW1r8d8kqSD9FYKVfVVJj5OAHDBYbZZgxfvkaSR8RvNkqTGUpAkNZaCJKmxFCRJjaUgSWosBUlSYylIkhpLQZLUWAqSpMZSkCQ1loIkqbEUJEmNpSBJaiwFSVJjKUiSGktBktRYCpKkxlKQJDWWgiSpsRQkSY2lIElqLAVJUmMpSJIaS0GS1FgKkqTGUp
AkNZaCJKmxFCRJjaUgSWosBUlSYylIkhpLQZLU9FYKST6UZHeS2wfGTk6yMcnd3e1JA/ddmWRHkruSXNhXLknS4fW5p/ARYPlBY1cAm6pqMbCpWyfJEmAlcHa3zdVJ5vWYTZI0gd5KoapuBH540PAKYF23vA64aGD8mqp6uKruAXYAy/rKJkma2FQfUzitqnYBdLenduMLgPsG5u3sxg6RZHWSLUm27Nmzp9ewkjTXTJcDzZlgrCaaWFVrq2ppVS0dGxvrOZYkzS1TXQoPJJkP0N3u7sZ3AqcPzFsI3D/F2SRpzpvqUtgArOqWVwHXDYyvTHJckjOBxcDmKc4mSXPesX09cJJPAS8CTkmyE/hb4F3A+iSXAfcCFwNU1bYk64E7gL3A5VW1r69skqSJ9VYKVXXJYe664DDz1wBr+sojSTqy6XKgWZI0DVgKkqTGUpAkNZaCJKmxFCRJjaUgSWosBUlSYylIkhpLQZLUWAqSpMZSkCQ1loIkqbEUJEmNpSBJaiwFSVJjKUiSGktBktRYCpKkxlKQJDWWgiSpsRQkSY2lIElqLAVJUmMpSJIaS0GS1FgKkqTGUpAkNZaCJKmxFCRJjaUgSWosBUlSYylIkhpLQZLUTLtSSLI8yV1JdiS5YtR5JGkumValkGQe8D7g94AlwCVJlow2lSTNHdOqFIBlwI6q+m5V/QK4Blgx4kySNGekqkadoUnySmB5Vf1pt/4a4DlV9YaBOauB1d3qWcBdUx70V3cK8P1Rh5hFfD0nl6/n5Jkpr+VvVNXYRHccO9VJjiATjB3QWlW1Flg7NXEmR5ItVbV01DlmC1/PyeXrOXlmw2s53d4+2gmcPrC+ELh/RFkkac6ZbqVwM7A4yZlJHgusBDaMOJMkzRnT6u2jqtqb5A3AF4F5wIeqatuIY02GGfV21wzg6zm5fD0nz4x/LafVgWZJ0mhNt7ePJEkjZClIkhpLoWeetmPyJPlQkt1Jbh91lpkuyelJvpxke5JtSd486kwzWZLjk2xO8u3u9Xz7qDM9Wh5T6FF32o7/AH6X8Y/b3gxcUlV3jDTYDJXkBcBDwEer6umjzjOTJZkPzK+qW5KcCGwFLvL/zUcnSYATquqhJI8Bvgq8uapuGnG0X5l7Cv3ytB2TqKpuBH446hyzQVXtqqpbuuWfAtuBBaNNNXPVuIe61cd0PzPyL25LoV8LgPsG1nfiPzxNM0kWAc8GvjniKDNaknlJbgV2Axuraka+npZCv4542g5plJI8AbgWeEtVPTjqPDNZVe2rqnMYPxPDsiQz8i1OS6FfnrZD01b33ve1wCeq6rOjzjNbVNWPgRuA5aNN8uhYCv3ytB2alroDox8EtlfVe0adZ6ZLMpbkSd3y44CXAHeONNSjZCn0qKr2AvtP27EdWD9LTtsxEkk+BXwDOCvJziSXjTrTDHY+8BrgxUlu7X5eNupQM9h84MtJvsP4H4Mbq+r6EWd6VPxIqiSpcU9BktRYCpKkxlKQJDWWgiSpsRQkSY2lIElqLAVpQJJ3JHnJqHNIo+L3FKROknlVtW+mPbY0mdxT0JyQZFGSO5OsS/KdJJ9J8vgk30vytiRfBS5O8pEkr+y2OS/J17sLp2xOcmJ3Jsx3J7m5e5zXP8Jzvqi7kM0ngdu6sc8n2dpdiGX1wNyHkqzpnuumJKd140/t1m/u9mIeGtjmrQM5ZuxFXTS9WAqaS84C1lbVM4EHgT/vxv+nqp5fVdfsn9idq+rTjF8o5VmMn8vm58BlwE+q6jzgPOB1Sc58hOdcBvxNVS3p1i+tqnOBpcCbkjy5Gz8BuKl7rhuB13XjVwFXdc/XTqaY5KXA4u7xzwHO7S5CJB0VS0FzyX1V9bVu+ePA87vlT08w9yxgV1XdDFBVD3bnsnop8NruvPnfBJ7M+C/nw9lcVfcMrL8pybeBmxg/g+7+bX8B7D9XzlZgUbf8POBfu+VPDjzOS7ufbwG3AE87Qg5pKMeOOoA0hQ4+gLZ//W
cTzM0E8/ePv7Gqvjjkc7bHTvIixvc4nldV/53kBuD47u7/rV8e4NvHkf9tBvj7qnr/kDmkobinoLnkjCTP65YvYfw6uodzJ/CUJOcBdMcTjmX8jLd/1l2LgCS/meSEIZ//14AfdYXwNOC5Q2xzE/BH3fLKgfEvApd2F8khyYIkpw6ZQzosS0FzyXZgVXd645OBfzncxO6a2n8MvLd7u2cj43/VfwC4A7glye3A+xl+j/sLwLHd87+T8V/4R/IW4C+SbGb89Mw/6fJ9ifG3k76R5DbgM8CJQ+aQDsuPpGpO6K5DfH1VzahLJCZ5PPDzqqokK4FLqmrFqHNp9vKYgjS9nQv8c3eltB8Dl442jmY79xSko5TkGcDHDhp+uKqeM4o80tGwFCRJjQeaJUmNpSBJaiwFSVJjKUiSmv8HyxCAj9nPSTgAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [] }, { "cell_type": "markdown", "id": "fb679511", "metadata": {}, "source": [ "9. Graficar la distribución de los n_cores\n" ] }, { "cell_type": "code", "execution_count": 412, "id": "8f32d035", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 412, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYUAAAEGCAYAAACKB4k+AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/YYfK9AAAACXBIWXMAAAsTAAALEwEAmpwYAAASAklEQVR4nO3df7DldV3H8edLllRQE4crrbvYoq0mVoLdKKWMpBR/guYPmBGZstZpwJGyGrGZpJptnFGxJn9MKCgoQgSiaI6CaJJWLrsbyo+VcQvClZVdswKbIqF3f5zvfjzunl0Ocr/3e+7e52Pmzj3fz/l+z3kt7N7X/f76nFQVkiQBPGToAJKk2WEpSJIaS0GS1FgKkqTGUpAkNSuGDvBgHHroobVmzZqhY0jSkrJp06ZvVdXcpOeWdCmsWbOGjRs3Dh1DkpaUJP+6t+c8fCRJaiwFSVJjKUiSGktBktRYCpKkxlKQJDWWgiSpsRQkSY2lIElqlvQdzRLA55/1i0NHAOAXr/380BGkB809BUlSYylIkhpLQZLUWAqSpMZSkCQ1loIkqbEUJEmNpSBJaiwFSVJjKUiSGqe5kLQkbVn/2aEj8JQ/ePbQERacewqSpMY9BUnqydlnnz10BOCB5XBPQZLU9FYKSQ5P8rkkW5LclOT13fjZSb6R5Pru6/lj25yVZGuSW5I8t69skqTJ+jx8dC/whqranOSRwKYkV3fPvaOq3ja+cpIjgZOBpwKPAz6T5ElVdV+PGSVJY3rbU6iq7VW1uXt8N7AFWLWPTU4ELqmqe6rqVmArcExf+SRJe1qUcwpJ1gBHA1/qhs5I8pUk5yc5pBtbBXx9bLNtTCiRJOuSbEyycefOnX3GlqRlp/dSSPII4HLgzKq6C3gP8ETgKGA78PZdq07YvPYYqDq3quaran5ubq6f0JK0TPV6SWqSAxkVwkVV9RGAqrpz7Pn3Ap/oFrcBh49tvhq4o898Q7n9j39y6AgAPP4Pbxg6gqQZ01spJAlwHrClqs4ZG19ZVdu7xZcAN3aPrwQ+nOQcRiea1wIb+sqn+3fsXxw7dAQAvvi6Lw4dYUG88w0fHzoCZ7z9RUNH0Izrc0/hWOBU4IYk13djbwJOSXIUo0NDtwGvBaiqm5JcCtzM6Mql073ySJIWV2+lUFVfYPJ5gk/uY5v1wPq+MkmS9m2/m+bip3/vwqEjsOmtrx46giT9QJzmQpLUWAqSpMZSkCQ1loIkqbEUJEmNpSBJaiwFSVJjKUiSmv3u5jVJD876V71s6Aj8wYcuGzrCsuWegiSpsRQkSY2lIElqLAVJUmMpSJIaS0GS1FgKkqTGUpAkNZaCJKmxFCRJjaUgSWosBUlSYylIkhpLQZLUWAqSpMZSkCQ1loIkqbEUJElNb6WQ5PAkn0uyJclNSV7fjT8mydVJvtZ9P2Rsm7OSbE1yS5Ln9pVNkjRZn3sK9wJvqKqnAD8HnJ7kSOCNwDVVtRa4plume+5k4KnACcC7kxzQYz5J0m56K4Wq2l5Vm7vHdwNbgFXAicAF3WoXA
Cd1j08ELqmqe6rqVmArcExf+SRJe1qUcwpJ1gBHA18CDquq7TAqDuCx3WqrgK+PbbatG5MkLZLeSyHJI4DLgTOr6q59rTphrCa83rokG5Ns3Llz50LFlCTRcykkOZBRIVxUVR/phu9MsrJ7fiWwoxvfBhw+tvlq4I7dX7Oqzq2q+aqan5ub6y+8JC1DfV59FOA8YEtVnTP21JXAad3j04CPjY2fnOShSY4A1gIb+sonSdrTih5f+1jgVOCGJNd3Y28C3gJcmuQ1wO3AywGq6qYklwI3M7py6fSquq/HfJKk3fRWClX1BSafJwA4fi/brAfW95VJkrRv3tEsSWosBUlSYylIkhpLQZLUWAqSpMZSkCQ1loIkqbEUJEmNpSBJaiwFSVJjKUiSGktBktRYCpKkxlKQJDWWgiSpsRQkSY2lIElqLAVJUmMpSJIaS0GS1FgKkqTGUpAkNZaCJKmxFCRJjaUgSWosBUlSM1UpJLlmmjFJ0tK2Yl9PJnkYcBBwaJJDgHRPPQp4XM/ZJEmLbJ+lALwWOJNRAWzie6VwF/Cu/mJJkoawz8NHVfXnVXUE8LtV9YSqOqL7elpVvXNf2yY5P8mOJDeOjZ2d5BtJru++nj/23FlJtia5JclzH/SfTJL0gN3fngIAVfUXSZ4JrBnfpqou3MdmHwDeCey+zjuq6m3jA0mOBE4Gnspor+QzSZ5UVfdNk0+StDCmKoUkHwSeCFwP7PpBXez5A7+pqmuTrJkyx4nAJVV1D3Brkq3AMcA/TLm9JGkBTFUKwDxwZFXVArznGUleDWwE3lBV/w6sAv5xbJ1t3dgekqwD1gE8/vGPX4A4kqRdpr1P4UbgRxbg/d7DaI/jKGA78PZuPBPWnVhAVXVuVc1X1fzc3NwCRJIk7TLtnsKhwM1JNgD37Bqsqhc/kDerqjt3PU7yXuAT3eI24PCxVVcDdzyQ15YkPXjTlsLZC/FmSVZW1fZu8SWM9kAArgQ+nOQcRiea1wIbFuI9JUnTm/bqo88/0BdOcjFwHKMb37YBbwaOS3IUo0NDtzG6D4KquinJpcDNwL3A6V55JEmLb9qrj+7me8f4fwg4EPivqnrU3rapqlMmDJ+3j/XXA+unySNJ6se0ewqPHF9OchKjS0YlSfuRH2iW1Kr6KPDshY0iSRratIePXjq2+BBG9y0sxD0LkqQZMu3VRy8ae3wvo5PEJy54GknSoKY9p/BrfQeRJA1v2g/ZWZ3kim7W0zuTXJ5kdd/hJEmLa9oTze9ndIPZ4xjNSfTxbkyStB+ZthTmqur9VXVv9/UBwImHJGk/M20pfCvJq5Ic0H29Cvi3PoNJkhbftKXw68ArgG8ymt30ZYAnnyVpPzPtJal/ApzWffYBSR4DvI1RWUiS9hPT7in81K5CAKiqbwNH9xNJkjSUaUvhIUkO2bXQ7SlMu5chSVoipv3B/nbg75Ncxmh6i1fgjKaStN+Z9o7mC5NsZDQJXoCXVtXNvSaTJC26qQ8BdSVgEUjSfuwHmjpbkrR/shQkSY2lIElqLAVJUmMpSJIaS0GS1FgKkqTGUpAkNZaCJKmxFCRJjaUgSWosBUlSYylIkpreSiHJ+Ul2JLlxbOwxSa5O8rXu+/gH95yVZGuSW5I8t69ckqS963NP4QPACbuNvRG4pqrWAtd0yyQ5EjgZeGq3zbuTHNBjNknSBL2VQlVdC3x7t+ETgQu6xxcAJ42NX1JV91TVrcBW4Ji+skmSJlvscwqHVdV2gO77Y7vxVcDXx9bb1o3tIcm6JBuTbNy5c2evYSVpuZmVE82ZMFaTVqyqc6tqvqrm5+bmeo4lScvLYpfCnUlWAnTfd3Tj24DDx9ZbDdyxyNkkadlb7FK4Ejite3wa8LGx8ZOTPDTJEcBaYMMiZ5OkZW9FXy+c5GLgOODQJNuANwNvAS5N8hrgduDlAFV1U5JLgZuBe4HTq+q+vrJJkibrrRSq6pS9PHX8XtZfD6zvK48k6f7NyolmSdIMs
BQkSY2lIElqLAVJUmMpSJIaS0GS1FgKkqTGUpAkNZaCJKmxFCRJjaUgSWosBUlSYylIkhpLQZLUWAqSpMZSkCQ1loIkqbEUJEmNpSBJaiwFSVJjKUiSGktBktRYCpKkxlKQJDWWgiSpsRQkSY2lIElqLAVJUmMpSJKaFUO8aZLbgLuB+4B7q2o+yWOAvwLWALcBr6iqfx8inyQtV0PuKfxSVR1VVfPd8huBa6pqLXBNtyxJWkSzdPjoROCC7vEFwEnDRZGk5WmoUijgqiSbkqzrxg6rqu0A3ffHTtowybokG5Ns3Llz5yLFlaTlYZBzCsCxVXVHkscCVyf56rQbVtW5wLkA8/Pz1VdASVqOBtlTqKo7uu87gCuAY4A7k6wE6L7vGCKbJC1ni14KSQ5O8shdj4HnADcCVwKndaudBnxssbNJ0nI3xOGjw4Arkux6/w9X1aeSXAdcmuQ1wO3AywfIJknL2qKXQlX9C/C0CeP/Bhy/2HkkSd8zS5ekSpIGZilIkhpLQZLUWAqSpMZSkCQ1loIkqbEUJEmNpSBJaiwFSVJjKUiSGktBktRYCpKkxlKQJDWWgiSpsRQkSY2lIElqLAVJUmMpSJIaS0GS1FgKkqTGUpAkNZaCJKmxFCRJjaUgSWosBUlSYylIkhpLQZLUWAqSpMZSkCQ1M1cKSU5IckuSrUneOHQeSVpOZqoUkhwAvAt4HnAkcEqSI4dNJUnLx0yVAnAMsLWq/qWq/he4BDhx4EyStGykqobO0CR5GXBCVf1Gt3wq8LNVdcbYOuuAdd3ik4FbFjjGocC3Fvg1+2DOhWXOhbUUci6FjNBPzh+tqrlJT6xY4Dd6sDJh7Ptaq6rOBc7tLUCysarm+3r9hWLOhWXOhbUUci6FjLD4OWft8NE24PCx5dXAHQNlkaRlZ9ZK4TpgbZIjkvwQcDJw5cCZJGnZmKnDR1V1b5IzgE8DBwDnV9VNixyjt0NTC8ycC8ucC2sp5FwKGWGRc87UiWZJ0rBm7fCRJGlAloIkqbEUOknOT7IjyY1DZ9mXJIcn+VySLUluSvL6oTPtLsnDkmxI8uUu4x8NnWlfkhyQ5J+SfGLoLHuT5LYkNyS5PsnGofPsTZJHJ7ksyVe7v6PPGDrT7pI8ufvvuOvrriRnDp1rkiS/3f0bujHJxUke1vt7ek5hJMmzgO8AF1bVTwydZ2+SrARWVtXmJI8ENgEnVdXNA0drkgQ4uKq+k+RA4AvA66vqHweONlGS3wHmgUdV1QuHzjNJktuA+aqa6ZutklwA/F1Vva+7gvCgqvqPgWPtVTe1zjcY3ST7r0PnGZdkFaN/O0dW1X8nuRT4ZFV9oM/3dU+hU1XXAt8eOsf9qartVbW5e3w3sAVYNWyq71cj3+kWD+y+ZvK3jySrgRcA7xs6y1KX5FHAs4DzAKrqf2e5EDrHA/88a4UwZgXw8CQrgINYhPu2LIUlLMka4GjgSwNH2UN3SOZ6YAdwdVXNXMbOnwG/D/zfwDnuTwFXJdnUTfUyi54A7ATe3x2Oe1+Sg4cOdT9OBi4eOsQkVfUN4G3A7cB24D+r6qq+39dSWKKSPAK4HDizqu4aOs/uquq+qjqK0V3pxySZuUNySV4I7KiqTUNnmcKxVfV0RjMIn94d7pw1K4CnA++pqqOB/wJmdvr77vDWi4G/HjrLJEkOYTQh6BHA44CDk7yq7/e1FJag7jj95cBFVfWRofPsS3f44G+BE4ZNMtGxwIu74/WXAM9O8qFhI01WVXd033cAVzCaUXjWbAO2je0VXsaoJGbV84DNVXXn0EH24peBW6tqZ1V9F/gI8My+39RSWGK6k7jnAVuq6pyh80ySZC7Jo7vHD2f0l/urg4aaoKrOqqrVVbWG0WGEz1ZV77+JPVBJDu4uKqA7HPMcYOaukquqbwJfT/Lkbuh4YGYugJjgFGb00FHnduDnkhzU/bs/ntE5xF5ZCp0kFwP/ADw5ybYkrxk6014cC5zK6LfaXZfUP
X/oULtZCXwuyVcYzWd1dVXN7OWeS8BhwBeSfBnYAPxNVX1q4Ex78zrgou7//VHAnw4bZ7IkBwG/wui375nU7XFdBmwGbmD087r3KS+8JFWS1LinIElqLAVJUmMpSJIaS0GS1FgKkqTGUpAG0E3EJs0cS0HaTZI13bTP7+2mLb6quwlv0ro/luQz3TThm5M8MSNv7aY7viHJK7t1j+umPf8wcEM3P9Rbk1yX5CtJXtuttzLJtd09KDcm+YVF/ONrmZupz2iWZsha4JSq+s1uyuJfBSZNgXER8JaquqKb6/4hwEsZ3bj1NOBQ4Lok13brHwP8RFXd2k1s959V9TNJHgp8MclV3fafrqr13R7FQT3+OaXvYylIk91aVdd3jzcBa3ZfoZt6YlVVXQFQVf/Tjf88cHFV3QfcmeTzwM8AdwEbqurW7iWeA/xUkpd1yz/MqIyuA87v5rj66FgOqXeWgjTZPWOP7wMmHT7KXrbd2ziMZg4dX+91VfXpPV5gNAvqC4APJnlrVV14P3mlBeE5BekH1E1Zvi3JSQBJHtrNqXMt8MrunMEcow+e2TDhJT4N/Fa3R0CSJ3WT3/0ooym938to8sNZnmlU+xn3FKQH51TgL5P8MfBd4OWMprZ+BvBlRh+O8/tV9c0kP77btu9jdFhqczcL5k7gJOA44PeSfJfRR8S+uv8/hjTihHiSpMbDR5KkxsNH0hSSvIvRZ1mM+/Oqev8QeaS+ePhIktR4+EiS1FgKkqTGUpAkNZaCJKmxFCRJzf8D7Ia1lcGFL9QAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [] }, { "cell_type": "markdown", "id": "cdf5a6a3", "metadata": {}, "source": [ "### 2. Missing-value handling, dataset repair and variable encoding" ] }, { "cell_type": "code", "execution_count": 681, "id": "50eb9f51", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "f5afdd5a", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 684, "id": "c5cb1a97", "metadata": {}, "outputs": [], "source": [ "# Standardization: rescales each feature to zero mean and unit variance\n", "scaler = StandardScaler()" ] }, { "cell_type": "code", "execution_count": null, "id": "8635441d", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "7fbb4302", "metadata": {}, "source": [ "### 3. Define the training and test sets\n", "\n", "1. Build a feature matrix X with the 20 predictor columns (every column except price_range), then split the data into 80% train and 20% test.\n", "2. Print the shape (dimensions) of the training matrix (x_train)\n", "3. 
Print the shape (dimensions) of the test matrix (x_test)\n", "\n", "Hint: use scikit-learn's train_test_split function: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html\n", "\n" ] }, { "cell_type": "code", "execution_count": 752, "id": "15bafdd7", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 753, "id": "1a6bb765", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Training matrix shape (1600, 20)\n" ] } ], "source": [ "print(\"Training matrix shape\", x_train.shape)" ] }, { "cell_type": "code", "execution_count": 754, "id": "a8209442", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Test matrix shape (400, 20)\n" ] } ], "source": [ "print(\"Test matrix shape\", x_test.shape)" ] }, { "cell_type": "code", "execution_count": 755, "id": "4f20d139", "metadata": {}, "outputs": [], "source": [ "# Standardize the features, fitting the scaler on the training set only\n", "x_train = scaler.fit_transform(x_train)" ] }, { "cell_type": "markdown", "id": "4f345f89", "metadata": {}, "source": [ "### 4. Model training\n", "\n", "\n", "1. Create an MLPClassifier model with scikit-learn: https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html\n", "2. Train the model\n", "3. Use 4 hidden layers of (5000, 1000, 1000, 10) neurons\n", "4. Use early_stopping=True (according to the scikit-learn documentation this option only takes effect with solver='sgd' or 'adam', so it has no effect with lbfgs)\n", "5. Use alpha=1e-5\n", "6. 
Use solver='lbfgs'\n", "\n", "Hints:\n", "\n", "- Use the fit method\n", "- Use only the training set (x_train, y_train)" ] }, { "cell_type": "code", "execution_count": 798, "id": "d333355e", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 799, "id": "782040b7", "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/plain": [ "MLPClassifier(alpha=1e-05, early_stopping=True,\n", " hidden_layer_sizes=(5000, 1000, 1000, 10), random_state=1,\n", " solver='lbfgs', verbose=True, warm_start=True)" ] }, "execution_count": 799, "metadata": {}, "output_type": "execute_result" } ], "source": [] }, { "cell_type": "code", "execution_count": 800, "id": "c33c43b4", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of iterations used to train the model 51\n" ] } ], "source": [ "print(\"Number of iterations used to train the model\", clf.n_iter_)" ] }, { "cell_type": "markdown", "id": "68998628", "metadata": {}, "source": [ "### 5. Compute the evaluation metrics\n", "\n", "**Note:** Run the following function, which builds the confusion matrix and computes some metrics. 
" ] }, { "cell_type": "code", "execution_count": 801, "id": "29a5fb36", "metadata": {}, "outputs": [], "source": [ "def metrics(y_true, y_pred):\n", "    \"\"\"\n", "    Compute accuracy, precision, recall and F1-score, and build a confusion-matrix figure.\n", "\n", "    Args:\n", "        y_true (numpy_array): true classes\n", "        y_pred (numpy_array): predicted classes\n", "\n", "    Returns:\n", "        cm_fig (ConfusionMatrixDisplay): confusion-matrix figure\n", "        accuracy (float): accuracy\n", "        report (dict): per-class and averaged precision, recall, F1-score and support\n", "    \"\"\"\n", "    # Confusion matrix normalized over the true class (each row sums to 1)\n", "    cm = confusion_matrix(y_true, y_pred, normalize='true')\n", "    report = classification_report(y_true, y_pred, output_dict=True)\n", "    cm_fig = ConfusionMatrixDisplay(confusion_matrix=cm)\n", "    return cm_fig, report[\"accuracy\"], report" ] }, { "cell_type": "markdown", "id": "6be37a3d", "metadata": {}, "source": [ "1. Use the predict() method to build the vector of predictions\n", "\n", "\n", "Hint: use the test set (x_test)" ] }, { "cell_type": "code", "execution_count": 802, "id": "a65904f2", "metadata": {}, "outputs": [], "source": [ "# Apply the scaler fitted on the training set (transform only, no refit)\n", "x_test_norma = scaler.transform(x_test)" ] }, { "cell_type": "code", "execution_count": 803, "id": "8c99b503", "metadata": {}, "outputs": [], "source": [ "y_predict = clf.predict(x_test_norma)" ] }, { "cell_type": "code", "execution_count": 804, "id": "b3babd3a", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ACCURACY 0.95\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Use the metrics function: replace y_test with the test-set labels\n", "# and y_predict with the predictions obtained from your model.\n", "cm_fig, test_score, report = metrics(y_test, y_predict)\n", "cm_fig.plot(cmap=plt.cm.Blues)\n", "print(\"ACCURACY\", test_score)" ] }, { "cell_type": "markdown", "id": "9f29571a", "metadata": {}, "source": [ "### 6. Conclusions\n", "\n", "Briefly describe the results, including the accuracy, and comment on how the model classifies samples from each of the four classes." ] }, { "cell_type": "markdown", "id": "b04952a7", "metadata": {}, "source": [ "The model performs well: on test samples it never saw during training it reached an accuracy of 0.95, and it also classifies all four classes well:\n", "\n", "- 0 (low cost)\n", "- 1 (medium cost)\n", "- 2 (high cost)\n", "- 3 (very high cost)." ] }, { "cell_type": "markdown", "id": "3f4d3c16", "metadata": {}, "source": [ "\n", "\n", "\n", "Professor: Jose Alberto Arango Sánchez
[](https://www.linkedin.com/in/jose-alberto-arango-sanchez-79a337142/)\n", "\n", " \n", "\n", "@jose.arangos
[](https://github.com/josearangos)" ] }, { "cell_type": "markdown", "id": "b41d532b", "metadata": {}, "source": [ "" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.12" } }, "nbformat": 4, "nbformat_minor": 5 }