{ "cells": [ { "cell_type": "markdown", "id": "03309598", "metadata": {}, "source": [ "" ] },
{ "cell_type": "markdown", "id": "fcca3bb1", "metadata": {}, "source": [ "# Lab: Data Processing with Python" ] },
{ "cell_type": "code", "execution_count": 2, "id": "6f7c4a67", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd" ] },
{ "cell_type": "markdown", "id": "a027e1d7", "metadata": {}, "source": [ "# NumPy exercises:" ] },
{ "cell_type": "markdown", "id": "713d35ab", "metadata": {}, "source": [ "## Exercise #1:" ] },
{ "cell_type": "markdown", "id": "67ac4ae7", "metadata": {}, "source": [ "Compute the Cauchy matrix: Given two vectors (1D NumPy arrays) **x** and **y**, build the Cauchy matrix:\n", "\n", "$\n", " C_{ij} = \\frac{1}{x_i - y_j} \n", "$\n", "\n", "See https://en.wikipedia.org/wiki/Cauchy_matrix. Each element of the Cauchy matrix is the reciprocal of the difference between an element of **x** and an element of **y**, where rows correspond to **x** and columns to **y**.\n", "\n", "**NOTES:** \n", "1. Create a matrix of zeros with the desired output shape, broadcast **x** as a column vector with reshape(-1,1) against **y** as a row vector, and round the final result to 8 decimal places.\n", "\n", "\n", "#### Example\n", "\n", "```\n", ">> x = np.array([45, 31, 67, 75, 54])\n", ">> y = np.array([17, 7, 15, 15, 18])\n", ">> cauchy(x, y)\n", "array([[0.03571429, 0.02631579, 0.03333333, 0.03333333, 0.03703704],\n", "       [0.07142857, 0.04166667, 0.0625    , 0.0625    , 0.07692308],\n", "       [0.02      , 0.01666667, 0.01923077, 0.01923077, 0.02040816],\n", "       [0.01724138, 0.01470588, 0.01666667, 0.01666667, 0.01754386],\n", "       [0.02702703, 0.0212766 , 0.02564103, 0.02564103, 0.02777778]])\n", "```" ] },
{ "cell_type": "code", "execution_count": 1, "id": "211e4c78", "metadata": {}, "outputs": [], "source": [ "def cauchy(x, y):\n", "\n", "    result = ...  # Write your code inside this function\n", "    \n", "    return result" ] },
{ "cell_type": "code", "execution_count": null, "id": "83cbc764", "metadata": {}, "outputs": [], "source": [ "# This call should raise a ValueError: x and y share the value 75, so x_i - y_j is zero for that pair\n", "x = np.array([45, 31, 67, 75, 54])\n", "y = np.array([17, 7, 15, 75, 18])\n", "cauchy(x,y)" ] },
{ "cell_type": "markdown", "id": "87b4bcbb", "metadata": {}, "source": [ "## Exercise #2:" ] },
{ "cell_type": "markdown", "id": "2ec3f726", "metadata": {}, "source": [ "**Position of closest scalar:** Create a function called minimo(X, v) which, given a 1D vector **X**, finds the position of the element closest to v.\n", "\n", "**CHALLENGE: solve it in one line of code**\n", "\n", "\n", "#### Example\n", "```\n", ">> x=np.array([25, 28, 31, 34, 37, 40, 43, 46, 49, 52])\n", ">> minimo(x,34)\n", "Answer = 3\n", "```" ] },
{ "cell_type": "markdown", "id": "26409a8e", "metadata": {}, "source": [ "## Exercise #3:" ] },
{ "cell_type": "markdown", "id": "78c57274", "metadata": {}, "source": [ "**Subtracting row mean:** Create a function called my_media(X) which, given a matrix, returns a new matrix with the same dimensions in which the mean of its own row is subtracted from each element.\n", "\n", "**CHALLENGE:** Solve it in one line of code\n", "\n", "**HINT:** Use broadcasting\n", "\n", "##### Example:\n", "\n", "```\n", "\n", ">> X = np.array([[1, 2, 3], [4, 5, 6],[7,8,9]])\n", ">> my_media(X)\n", "array([[-1.,  0.,  1.],\n", "       [-1.,  0.,  1.],\n", "       [-1.,  0.,  1.]])\n", "```\n" ] },
{ "cell_type": "code", "execution_count": null, "id": "cf0edca4", "metadata": {}, "outputs": [], "source": [] },
{ "cell_type": "markdown", "id": "30ea700f", "metadata": {}, "source": [ "## Exercise #4:" ] },
{ "cell_type": "markdown", "id": "b66adb52", "metadata": {}, "source": [ "**Double the diagonal:** Create a function that takes a matrix as a parameter and returns a matrix with the same dimensions but with the main diagonal multiplied by 2.\n", "\n", "\n", "#### Example:\n", "\n", "```\n", ">> X = np.array([[79, 45, 67,  8, 37],\n", ">>               [47, 40,  5, 79, 86],\n", ">>               [72, 25, 44, 45, 22],\n", ">>               [12, 85,  8, 53, 28],\n", ">>               [ 4, 37, 36, 40, 16]])\n", ">> \n", ">> doublediag(X)\n", "\n", "array([[158.,  45.,  67.,   8.,  37.],\n", "       [ 47.,  80.,   5.,  79.,  86.],\n", "       [ 72.,  25.,  88.,  45.,  22.],\n", "       [ 12.,  85.,   8., 106.,  28.],\n", "       [  4.,  37.,  36.,  40.,  32.]])\n", "```\n", "\n", "**HINT:** Use np.eye\n", "\n", "**CHALLENGE:** Solve it in one line of code" ] },
{ "cell_type": "markdown", "id": "ce25288c", "metadata": {}, "source": [ "# Pandas exercises:" ] },
{ "cell_type": "code", "execution_count": 10, "id": "f5809eb9", "metadata": {}, "outputs": [], "source": [ "def create_df(missing=False, n=10):\n", "    # Random item ids, categories (0 to 2), prices and margins\n", "    itemid = np.random.randint(100000, size=n)+1000\n", "    category = np.random.randint(3, size=n)\n", "    price = np.round(np.random.normal(loc=100, scale=10, size=n),2)\n", "    margin = np.round(np.random.normal(loc=10, scale=1, size=n),2)\n", "    \n", "    if missing:\n", "        # Replace a random subset of prices (at least 2) with NaN\n", "        nmissing = np.random.randint(len(price)//2)+2\n", "        price[np.random.permutation(len(price))[:nmissing]] = np.nan\n", "    \n", "    d = pd.DataFrame(np.r_[[price, category, margin]].T, index=itemid, columns=[\"price\", \"category\", \"margin\"])\n", "    d.index.name=\"itemid\"\n", "    # Randomly drop the margin column about half of the time\n", "    if np.random.random()>.5:\n", "        d = d[d.columns[:2]]\n", "    \n", "    return d" ] },
{ "cell_type": "markdown", "id": "b422ce1f", "metadata": {}, "source": [ "## Exercise #1:\n", "\n", "**Extract data:** Given a DataFrame named **df** with two columns, **price** and **category**, write a function that selects the rows whose price > 100. If the DataFrame also contains the margin column, it must instead select the rows whose margin > 10 or price > 100. 
Your function must return A LIST with the item ids of the selected rows.\n", "\n", "\n", "**NOTE:** your function must not modify the original DataFrame; if you need to change anything, work on a copy.\n", "\n", "#### Example:\n", "\n", "```\n", "If the input DataFrame is:\n", "\n", "         price  category  margin\n", "itemid                          \n", "39059    98.11       0.0   11.04\n", "19526    98.11       1.0   11.25\n", "78176    94.34       1.0   10.51\n", "50948   102.37       1.0   10.77\n", "12111    98.07       1.0    8.50\n", "56191    98.53       1.0   11.65\n", "38887    91.49       2.0   11.24\n", "77915   117.30       0.0    8.64\n", "55010    96.13       0.0    8.95\n", "45925    98.59       1.0   10.45\n", "\n", "The list to return is:\n", "\n", "[39059, 19526, 78176, 50948, 56191, 38887, 77915, 45925] \n", " \n", "```" ] },
{ "cell_type": "code", "execution_count": 14, "id": "844d3734", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "         price  category\n", "itemid                  \n", "18952    88.77       2.0\n", "32835    92.82       2.0\n", "63324   104.86       2.0\n", "68229   106.17       1.0\n", "6935    101.07       1.0\n", "48194    85.84       2.0\n", "48968    97.08       0.0\n", "31170    86.57       2.0\n", "50766    91.61       1.0\n", "60814   108.91       2.0" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = create_df()\n", "df" ] },
{ "cell_type": "markdown", "id": "7ab6aae5", "metadata": {}, "source": [ "## Exercise #2:" ] },
{ "cell_type": "markdown", "id": "65efba39", "metadata": {}, "source": [ "**Group statistics:** Create a function that returns a DataFrame with the maximum, minimum, and mean price per category.\n", "\n", "**Note: use the DataFrame from Exercise #1.**\n", "\n", "\n", "**NOTE:** your function must not modify the original DataFrame; if you need to change anything, work on a copy.\n", "\n", "\n", "### Example\n", "\n", "```\n", "If the DataFrame were\n", "\n", "         price  category  margin\n", "itemid                          \n", "17946    93.85       1.0   10.64\n", "61190    91.72       1.0    9.76\n", "39639   100.16       1.0   10.67\n", "17791   110.44       2.0    9.65\n", "7333    101.05       1.0    9.69\n", "77362   122.33       0.0   11.14\n", "92646   108.13       2.0   10.58\n", "27797    85.52       2.0   10.88\n", "31746    97.56       0.0    9.75\n", "12355   101.04       2.0    9.51\n", "\n", "The result must be\n", "\n", "              media  maximo  minimo\n", "categoria                          \n", "0          109.9450  122.33   97.56\n", "1           96.6950  101.05   91.72\n", "2          101.2825  110.44   85.52\n", "\n", "```" ] },
{ "cell_type": "code", "execution_count": null, "id": "9fdd7922", "metadata": {}, "outputs": [], "source": [] },
{ "cell_type": "markdown", "id": "fc789ba5", "metadata": {}, "source": [ "## Exercise #3:" ] },
{ "cell_type": "markdown", "id": "015bd064", "metadata": {}, "source": [ "**Fill in missing data:** Given a DataFrame named **df_miss**, fill in the missing values in the price column using the following procedure:\n", " \n", " - 1) Compute the mean and standard deviation of the available prices.\n", " - 2) 
Draw samples from a normal distribution with the mean and standard deviation computed in the previous step (see [np.random.normal](https://numpy.org/doc/stable/reference/random/generated/numpy.random.normal.html)), generating as many samples as there are missing values.\n", " - 3) Replace the missing values with the samples.\n", "\n", "\n", "**NOTE:** your function must not modify the original DataFrame; make a copy of it, fill in the values in the copy, and return the copy.\n", "\n", "\n", "#### Example:\n", "\n", "```\n", "If the input DataFrame is:\n", "\n", "         price  category  margin\n", "itemid                          \n", "18922      NaN       1.0   10.32\n", "69500   121.25       1.0   10.22\n", "76442    90.25       1.0   12.60\n", "33863   106.51       0.0   10.26\n", "15904    95.87       1.0   11.51\n", "41946   103.47       2.0    9.85\n", "85451    93.08       2.0    9.56\n", "70028   116.68       1.0    9.11\n", "26860      NaN       2.0    9.71\n", "12807    91.48       0.0    9.77\n", "\n", "The result must be:\n", "\n", "             price  category  margin\n", "itemid                              \n", "18922    97.441188       1.0   10.32\n", "69500   121.250000       1.0   10.22\n", "76442    90.250000       1.0   12.60\n", "33863   106.510000       0.0   10.26\n", "15904    95.870000       1.0   11.51\n", "41946   103.470000       2.0    9.85\n", "85451    93.080000       2.0    9.56\n", "70028   116.680000       1.0    9.11\n", "26860   103.294843       2.0    9.71\n", "12807    91.480000       0.0    9.77\n", "\n", "```" ] },
{ "cell_type": "code", "execution_count": 16, "id": "a13cfbab", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "         price  category  margin\n", "itemid                          \n", "40550    90.04       2.0   11.10\n", "51615   108.37       2.0   11.25\n", "79037      NaN       2.0   11.91\n", "95728    88.69       1.0    8.89\n", "14299    92.42       1.0    9.65\n", "86576    96.30       2.0   11.09\n", "59888   115.16       2.0    9.48\n", "38770   113.17       0.0    9.63\n", "34309    93.58       0.0    9.75\n", "73805      NaN       1.0    8.49" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_miss = create_df(missing=True)\n", "\n", "df_miss" ] },
{ "cell_type": "code", "execution_count": null, "id": "9e5d7520", "metadata": {}, "outputs": [], "source": [] },
{ "cell_type": "markdown", "id": "21e41ce0", "metadata": {}, "source": [ "# Contact information\n", "\n", "Instructor: Jose Alberto Arango Sánchez
[LinkedIn](https://www.linkedin.com/in/jose-alberto-arango-sanchez-79a337142/)\n", "\n", "@jose.arangos
[GitHub](https://github.com/josearangos)" ] }, { "cell_type": "markdown", "id": "747ca244", "metadata": {}, "source": [ "" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.12" } }, "nbformat": 4, "nbformat_minor": 5 }