2 Solución reto día del hackathon.
En este apartado está descrito cómo nuestro equipo solucionó este reto durante el Hackathon. Se han añadido comentarios para dar más claridad al desarrollo. Aparte de estas notas no hemos añadido nada sustancial al análisis que no fuese incluido durante la competición.
La solución ampliada al reto, que contendrá lo que nos hubiese gustado hacer con más tiempo, aún no está desarrollada. Si quieres ver cómo lo vamos haciendo sólo tienes que seguir este repositorio de GitHub - Hackathon_Talento_Reto_2_Wind_Farm.
2.1 Carga de las librerias necesarias.
library(tidyverse)
library(readxl)
library(VIM)
library(GGally)
library(lubridate)
library(stringr)
2.2 Preparación datos “train”.
Construimos tres listas con los archivos de training, cada una con un tipo de archivo: ‘logs’, ‘metmast’ y ‘signals’.
logs_files <- list.files(path = "data/Training", pattern = "wind-farm-1-logs*")
metmast_files <- list.files(path = "data/Training", pattern = "wind-farm-1-metmast*")
signals_files <- list.files(path = "data/Training", pattern = "wind-farm-1-signals*")
Utilizamos las tres listas creadas para importar todos los datasets de training y unirlos en tres: ‘signals_data’, ‘metmast_data’ y ‘logs_data’.
signals_data <- signals_files %>%
map(function(x) {
read_xlsx(paste0("./data/Training/", x))
}) %>%
reduce(rbind)
metmast_data <- metmast_files %>%
map(function(x) {
read_xlsx(paste0("./data/Training/", x))
}) %>%
reduce(rbind)
logs_data <- logs_files %>%
map(function(x) {
read_xlsx(paste0("./data/Training/", x))
}) %>%
reduce(rbind)
2.2.1 Exploración de las tablas.
Inspeccionamos rápidamente el contenido de las tablas, realizamos algún ajuste de formato y selección de variables.
2.2.1.1 Logs data
Echamos un vistazo a la tabla logs_data obteniendo los principales estadísticos de cada variable.
summary(logs_data)
## TimeDetected TimeReset UnitTitle
## Length:122090 Length:122090 Length:122090
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
## Remark UnitTitleDestination
## Length:122090 Mode:logical
## Class :character NA's:122090
## Mode :character
head(logs_data)
## # A tibble: 6 x 5
## TimeDetected TimeReset UnitTitle Remark UnitTitleDestina~
## <chr> <chr> <chr> <chr> <lgl>
## 1 2016-04-01T00:01:~ <NA> T01 Yaw Speed Exc: ~ NA
## 2 2016-04-01T00:12:~ <NA> T01 External power ~ NA
## 3 2016-04-01T00:21:~ <NA> T11 External power ~ NA
## 4 2016-04-01T00:22:~ <NA> T06 External power ~ NA
## 5 2016-04-01T00:23:~ <NA> T07 External power ~ NA
## 6 2016-04-01T00:33:~ <NA> T11 Gen. int. vent.~ NA
En principio de esta tabla solo vamos a necesitar la fecha y la identificacion de la turbina. Por lo que creamos una nueva tabla con únicamente estos dos campos.
logs_data_1 <- logs_data %>%
select(TimeDetected,
UnitTitle)
logs_data_1$TimeDetected <- ymd_hms(logs_data_1$TimeDetected)
logs_data_1$UnitTitle <- as.factor(logs_data_1$UnitTitle)
Solo hay logs en 4 unidades. Hay 4 NAs en las fechas
summary(logs_data_1)
## TimeDetected UnitTitle
## Min. :2016-01-01 00:02:18 T01:36891
## 1st Qu.:2016-04-24 07:21:40 T06:24721
## Median :2016-07-23 11:17:18 T07:30939
## Mean :2016-07-09 03:21:27 T11:29539
## 3rd Qu.:2016-09-18 12:21:16
## Max. :2016-12-31 23:57:33
## NA's :4
2.2.1.2 Metmast data
Echamos un vistazo a la tabla metmast_data. Hay varios campos que parecen no tener informacion valida.
summary(metmast_data)
## Timestamp Min_Windspeed1 Max_Windspeed1 Avg_Windspeed1
## Length:52697 Min. : 0.00 Min. : 0.000 Min. : 0.00
## Class :character 1st Qu.: 2.30 1st Qu.: 4.600 1st Qu.: 3.50
## Mode :character Median : 3.80 Median : 7.400 Median : 5.60
## Mean : 4.35 Mean : 8.259 Mean : 6.31
## 3rd Qu.: 6.00 3rd Qu.:11.000 3rd Qu.: 8.50
## Max. :16.40 Max. :26.300 Max. :21.70
##
## Var_Windspeed1 Min_Windspeed2 Max_Windspeed2 Avg_Windspeed2
## Min. : 0.0000 Min. : 0.000 Min. : 0.000 Min. : 0.000
## 1st Qu.: 0.1400 1st Qu.: 2.200 1st Qu.: 4.600 1st Qu.: 3.400
## Median : 0.4400 Median : 3.800 Median : 7.400 Median : 5.600
## Mean : 0.7948 Mean : 4.339 Mean : 8.252 Mean : 6.301
## 3rd Qu.: 1.0400 3rd Qu.: 6.000 3rd Qu.:11.000 3rd Qu.: 8.500
## Max. :35.1000 Max. :16.300 Max. :26.700 Max. :21.800
##
## Var_Windspeed2 Min_Winddirection2 Max_Winddirection2
## Min. : 0.0000 Min. : 0 Min. :236
## 1st Qu.: 0.1400 1st Qu.:236 1st Qu.:236
## Median : 0.4400 Median :236 Median :236
## Mean : 0.7984 Mean :236 Mean :236
## 3rd Qu.: 1.0400 3rd Qu.:236 3rd Qu.:236
## Max. :35.9300 Max. :236 Max. :236
##
## Avg_Winddirection2 Var_Winddirection2 Min_AmbientTemp Max_AmbientTemp
## Min. :213 Min. : 0.000 Min. :-40.00 Min. : 6.00
## 1st Qu.:236 1st Qu.: 0.000 1st Qu.: 14.00 1st Qu.:15.00
## Median :236 Median : 0.000 Median : 17.00 Median :18.00
## Mean :236 Mean : 0.137 Mean : 18.32 Mean :19.16
## 3rd Qu.:236 3rd Qu.: 0.000 3rd Qu.: 22.00 3rd Qu.:23.00
## Max. :236 Max. :4683.000 Max. : 39.00 Max. :40.00
##
## Avg_AmbientTemp Min_Pressure Max_Pressure Avg_Pressure
## Min. : 5.00 Min. : 991 Min. : 992 Min. : 992
## 1st Qu.:15.00 1st Qu.:1007 1st Qu.:1007 1st Qu.:1007
## Median :18.00 Median :1009 Median :1009 Median :1009
## Mean :18.71 Mean :1010 Mean :1010 Mean :1010
## 3rd Qu.:23.00 3rd Qu.:1013 3rd Qu.:1013 3rd Qu.:1013
## Max. :40.00 Max. :1028 Max. :1028 Max. :1028
##
## Min_Humidity Max_Humidity Avg_Humidity Min_Precipitation
## Min. : 13.00 Min. : 14.00 Min. : 13.0 Min. : 0.00000
## 1st Qu.: 57.00 1st Qu.: 60.00 1st Qu.: 58.0 1st Qu.: 0.00000
## Median : 70.00 Median : 72.00 Median : 71.0 Median : 0.00000
## Mean : 68.21 Mean : 70.68 Mean : 69.4 Mean : 0.04547
## 3rd Qu.: 80.00 3rd Qu.: 83.00 3rd Qu.: 81.0 3rd Qu.: 0.00000
## Max. :100.00 Max. :100.00 Max. :100.0 Max. :55.00000
## NA's :1 NA's :1
## Max_Precipitation Avg_Precipitation Min_Raindetection Max_Raindetection
## Min. : 0.0000 Min. : 0.0000 Min. :0 Min. :0.0e+00
## 1st Qu.: 0.0000 1st Qu.: 0.0000 1st Qu.:0 1st Qu.:0.0e+00
## Median : 0.0000 Median : 0.0000 Median :0 Median :0.0e+00
## Mean : 0.1291 Mean : 0.0864 Mean :0 Mean :1.9e-05
## 3rd Qu.: 0.0000 3rd Qu.: 0.0000 3rd Qu.:0 3rd Qu.:0.0e+00
## Max. :79.0000 Max. :74.0000 Max. :0 Max. :1.0e+00
##
## Avg_Raindetection Anemometer1_Freq Anemometer1_Offset
## Min. :0 Min. :0.0499 Min. :0.24
## 1st Qu.:0 1st Qu.:0.0499 1st Qu.:0.24
## Median :0 Median :0.0499 Median :0.24
## Mean :0 Mean :0.0499 Mean :0.24
## 3rd Qu.:0 3rd Qu.:0.0499 3rd Qu.:0.24
## Max. :0 Max. :0.0499 Max. :0.24
##
## Anemometer1_CorrGain Anemometer1_CorrOffset Anemometer2_Freq
## Min. :1 Min. :0 Min. :0.0499
## 1st Qu.:1 1st Qu.:0 1st Qu.:0.0499
## Median :1 Median :0 Median :0.0499
## Mean :1 Mean :0 Mean :0.0499
## 3rd Qu.:1 3rd Qu.:0 3rd Qu.:0.0499
## Max. :1 Max. :0 Max. :0.0499
##
## Anemometer2_Offset Anemometer2_CorrGain Anemometer2_CorrOffset
## Min. :0.24 Min. :1 Min. :0
## 1st Qu.:0.24 1st Qu.:1 1st Qu.:0
## Median :0.24 Median :1 Median :0
## Mean :0.24 Mean :1 Mean :0
## 3rd Qu.:0.24 3rd Qu.:1 3rd Qu.:0
## Max. :0.24 Max. :1 Max. :0
##
## DistanceAirPress AirRessureSensorZeroOffset Anemometer1_Avg_Freq
## Min. :0 Min. :600 Min. : 0
## 1st Qu.:0 1st Qu.:600 1st Qu.: 66
## Median :0 Median :600 Median :108
## Mean :0 Mean :600 Mean :123
## 3rd Qu.:0 3rd Qu.:600 3rd Qu.:166
## Max. :0 Max. :600 Max. :431
##
## Anemometer2_Avg_Freq Pressure_Avg_Freq
## Min. : 0.0 Min. :392.0
## 1st Qu.: 65.0 1st Qu.:407.0
## Median :108.0 Median :409.0
## Mean :122.9 Mean :410.1
## 3rd Qu.:166.0 3rd Qu.:413.0
## Max. :434.0 Max. :428.0
##
Nos quedamos solo con los campos que parecen tener informacion valida
metmast_data_1 <- metmast_data %>%
select(-Anemometer1_CorrOffset,
-Anemometer2_Freq,
-Anemometer2_Offset,
-Anemometer2_CorrGain,
-Anemometer2_CorrOffset,
-DistanceAirPress,
-AirRessureSensorZeroOffset,
-Min_Precipitation,
-Max_Precipitation,
-Avg_Precipitation,
-Min_Raindetection,
-Avg_Raindetection,
-Max_Raindetection,
-Min_Winddirection2,
-Max_Winddirection2,
-Avg_Winddirection2,
-Var_Winddirection2,
-Anemometer1_CorrGain,
)
# Cambiamos el formato de la variable Timestamp.
metmast_data_1$Timestamp <- ymd_hms(metmast_data_1$Timestamp)
Vemos de nuevo los principales estadísticos de cada variable.
summary(metmast_data_1)
## Timestamp Min_Windspeed1 Max_Windspeed1
## Min. :2016-01-01 00:00:00 Min. : 0.00 Min. : 0.000
## 1st Qu.:2016-04-01 12:40:00 1st Qu.: 2.30 1st Qu.: 4.600
## Median :2016-07-02 00:20:00 Median : 3.80 Median : 7.400
## Mean :2016-07-02 00:07:10 Mean : 4.35 Mean : 8.259
## 3rd Qu.:2016-10-01 12:00:00 3rd Qu.: 6.00 3rd Qu.:11.000
## Max. :2016-12-31 23:50:00 Max. :16.40 Max. :26.300
##
## Avg_Windspeed1 Var_Windspeed1 Min_Windspeed2 Max_Windspeed2
## Min. : 0.00 Min. : 0.0000 Min. : 0.000 Min. : 0.000
## 1st Qu.: 3.50 1st Qu.: 0.1400 1st Qu.: 2.200 1st Qu.: 4.600
## Median : 5.60 Median : 0.4400 Median : 3.800 Median : 7.400
## Mean : 6.31 Mean : 0.7948 Mean : 4.339 Mean : 8.252
## 3rd Qu.: 8.50 3rd Qu.: 1.0400 3rd Qu.: 6.000 3rd Qu.:11.000
## Max. :21.70 Max. :35.1000 Max. :16.300 Max. :26.700
##
## Avg_Windspeed2 Var_Windspeed2 Min_AmbientTemp Max_AmbientTemp
## Min. : 0.000 Min. : 0.0000 Min. :-40.00 Min. : 6.00
## 1st Qu.: 3.400 1st Qu.: 0.1400 1st Qu.: 14.00 1st Qu.:15.00
## Median : 5.600 Median : 0.4400 Median : 17.00 Median :18.00
## Mean : 6.301 Mean : 0.7984 Mean : 18.32 Mean :19.16
## 3rd Qu.: 8.500 3rd Qu.: 1.0400 3rd Qu.: 22.00 3rd Qu.:23.00
## Max. :21.800 Max. :35.9300 Max. : 39.00 Max. :40.00
##
## Avg_AmbientTemp Min_Pressure Max_Pressure Avg_Pressure
## Min. : 5.00 Min. : 991 Min. : 992 Min. : 992
## 1st Qu.:15.00 1st Qu.:1007 1st Qu.:1007 1st Qu.:1007
## Median :18.00 Median :1009 Median :1009 Median :1009
## Mean :18.71 Mean :1010 Mean :1010 Mean :1010
## 3rd Qu.:23.00 3rd Qu.:1013 3rd Qu.:1013 3rd Qu.:1013
## Max. :40.00 Max. :1028 Max. :1028 Max. :1028
##
## Min_Humidity Max_Humidity Avg_Humidity Anemometer1_Freq
## Min. : 13.00 Min. : 14.00 Min. : 13.0 Min. :0.0499
## 1st Qu.: 57.00 1st Qu.: 60.00 1st Qu.: 58.0 1st Qu.:0.0499
## Median : 70.00 Median : 72.00 Median : 71.0 Median :0.0499
## Mean : 68.21 Mean : 70.68 Mean : 69.4 Mean :0.0499
## 3rd Qu.: 80.00 3rd Qu.: 83.00 3rd Qu.: 81.0 3rd Qu.:0.0499
## Max. :100.00 Max. :100.00 Max. :100.0 Max. :0.0499
## NA's :1 NA's :1
## Anemometer1_Offset Anemometer1_Avg_Freq Anemometer2_Avg_Freq
## Min. :0.24 Min. : 0 Min. : 0.0
## 1st Qu.:0.24 1st Qu.: 66 1st Qu.: 65.0
## Median :0.24 Median :108 Median :108.0
## Mean :0.24 Mean :123 Mean :122.9
## 3rd Qu.:0.24 3rd Qu.:166 3rd Qu.:166.0
## Max. :0.24 Max. :431 Max. :434.0
##
## Pressure_Avg_Freq
## Min. :392.0
## 1st Qu.:407.0
## Median :409.0
## Mean :410.1
## 3rd Qu.:413.0
## Max. :428.0
##
2.2.1.3 Signals data
Inspeccionamos la tabla signals_data. Hay varios campos que parecen no tener información valida.
summary(signals_data)
## Turbine_ID Timestamp Gen_RPM_Max Gen_RPM_Min
## Length:207905 Length:207905 Min. : 0.0 Min. : 0.0
## Class :character Class :character 1st Qu.: 296.8 1st Qu.: 183.2
## Mode :character Mode :character Median :1405.2 Median :1231.1
## Mean :1166.5 Mean : 915.3
## 3rd Qu.:1676.6 3rd Qu.:1342.2
## Max. :2040.9 Max. :1666.1
## Gen_RPM_Avg Gen_RPM_Std Gen_Bear_Temp_Avg Gen_Phase1_Temp_Avg
## Min. : 0.0 Min. : 0.00 Min. : 18.00 Min. : 21.00
## 1st Qu.: 255.4 1st Qu.: 13.70 1st Qu.: 34.00 1st Qu.: 41.00
## Median :1271.9 Median : 28.10 Median : 43.00 Median : 59.00
## Mean :1043.7 Mean : 67.17 Mean : 45.39 Mean : 62.71
## 3rd Qu.:1571.0 3rd Qu.: 69.80 3rd Qu.: 54.00 3rd Qu.: 74.00
## Max. :1683.7 Max. :823.10 Max. :205.00 Max. :205.00
## Gen_Phase2_Temp_Avg Gen_Phase3_Temp_Avg Hyd_Oil_Temp_Avg
## Min. : 21.00 Min. : 21.00 Min. :22.00
## 1st Qu.: 41.00 1st Qu.: 41.00 1st Qu.:29.00
## Median : 60.00 Median : 59.00 Median :33.00
## Mean : 62.95 Mean : 62.07 Mean :35.19
## 3rd Qu.: 75.00 3rd Qu.: 74.00 3rd Qu.:41.00
## Max. :205.00 Max. :205.00 Max. :55.00
## Gear_Oil_Temp_Avg Gear_Bear_Temp_Avg Nac_Temp_Avg Rtr_RPM_Max
## Min. :25.00 Min. :23.00 Min. :17.00 Min. : 0.00
## 1st Qu.:43.00 1st Qu.:44.00 1st Qu.:25.00 1st Qu.: 2.60
## Median :48.00 Median :53.00 Median :29.00 Median :12.40
## Mean :47.58 Mean :51.37 Mean :29.43 Mean :10.25
## 3rd Qu.:53.00 3rd Qu.:60.00 3rd Qu.:33.00 3rd Qu.:14.80
## Max. :61.00 Max. :71.00 Max. :50.00 Max. :16.80
## Rtr_RPM_Min Rtr_RPM_Avg Amb_WindSpeed_Max Amb_WindSpeed_Min
## Min. : 0.000 Min. : 0.000 Min. : 0.40 Min. : 0.400
## 1st Qu.: 1.700 1st Qu.: 2.300 1st Qu.: 6.80 1st Qu.: 0.700
## Median :10.900 Median :11.300 Median :10.80 Median : 1.500
## Mean : 8.014 Mean : 9.166 Mean :12.61 Mean : 1.627
## 3rd Qu.:11.900 3rd Qu.:13.900 3rd Qu.:17.60 3rd Qu.: 2.200
## Max. :14.800 Max. :14.900 Max. :70.00 Max. :14.600
## Amb_WindSpeed_Avg Amb_WindSpeed_Std Amb_WindDir_Relative_Avg
## Min. : 0.400 Min. : 0.000 Min. :-180.0000
## 1st Qu.: 3.100 1st Qu.: 0.600 1st Qu.: -10.5000
## Median : 5.200 Median : 1.000 Median : 0.2000
## Mean : 5.941 Mean : 1.166 Mean : 0.3694
## 3rd Qu.: 8.100 3rd Qu.: 1.500 3rd Qu.: 10.0000
## Max. :22.500 Max. :55.100 Max. : 180.0000
## Amb_WindDir_Abs_Avg Amb_Temp_Avg Prod_LatestAvg_ActPwrGen0
## Min. : 0.0 Min. : 6.00 Min. :-5027.0
## 1st Qu.:100.8 1st Qu.:15.00 1st Qu.: -731.0
## Median :163.5 Median :19.00 Median : 0.0
## Mean :183.2 Mean :19.62 Mean : -397.2
## 3rd Qu.:272.5 3rd Qu.:24.00 3rd Qu.: 0.0
## Max. :359.0 Max. :41.00 Max. : 796.0
## Prod_LatestAvg_ActPwrGen1 Prod_LatestAvg_ActPwrGen2
## Min. : -736 Min. :0
## 1st Qu.: 0 1st Qu.:0
## Median : 34073 Median :0
## Mean : 86771 Mean :0
## 3rd Qu.:141527 3rd Qu.:0
## Max. :334398 Max. :0
## Prod_LatestAvg_TotActPwr Prod_LatestAvg_ReactPwrGen0
## Min. : -5027 Min. :-5046.0
## 1st Qu.: -675 1st Qu.: -246.0
## Median : 34052 Median : 0.0
## Mean : 86374 Mean : -289.7
## 3rd Qu.:141526 3rd Qu.: 0.0
## Max. :334398 Max. : 500.0
## Prod_LatestAvg_ReactPwrGen1 Prod_LatestAvg_ReactPwrGen2
## Min. :-77957 Min. :0
## 1st Qu.:-26894 1st Qu.:0
## Median :-15572 Median :0
## Mean :-15112 Mean :0
## 3rd Qu.: 0 3rd Qu.:0
## Max. :166806 Max. :0
## Prod_LatestAvg_TotReactPwr HVTrafo_Phase1_Temp_Avg
## Min. :-77957 Min. : 28.00
## 1st Qu.:-26896 1st Qu.: 50.00
## Median :-15582 Median : 62.00
## Mean :-15402 Mean : 61.65
## 3rd Qu.: -1229 3rd Qu.: 71.00
## Max. :166806 Max. :104.00
## HVTrafo_Phase2_Temp_Avg HVTrafo_Phase3_Temp_Avg
## Min. : 28.00 Min. : 27.00
## 1st Qu.: 54.00 1st Qu.: 50.00
## Median : 69.00 Median : 67.00
## Mean : 68.65 Mean : 67.27
## 3rd Qu.: 79.00 3rd Qu.: 80.00
## Max. :119.00 Max. :137.00
## Grd_InverterPhase1_Temp_Avg Cont_Top_Temp_Avg Cont_Hub_Temp_Avg
## Min. :29.00 Min. :23.00 Min. :14.0
## 1st Qu.:35.00 1st Qu.:35.00 1st Qu.:25.0
## Median :39.00 Median :39.00 Median :29.0
## Mean :38.11 Mean :39.52 Mean :29.4
## 3rd Qu.:40.00 3rd Qu.:43.00 3rd Qu.:33.0
## Max. :58.00 Max. :58.00 Max. :49.0
## Cont_VCP_Temp_Avg Gen_SlipRing_Temp_Avg Spin_Temp_Avg
## Min. :26.00 Min. : 12.00 Min. : 8.00
## 1st Qu.:36.00 1st Qu.: 23.00 1st Qu.:18.00
## Median :41.00 Median : 28.00 Median :22.00
## Mean :40.05 Mean : 29.52 Mean :22.47
## 3rd Qu.:44.00 3rd Qu.: 35.00 3rd Qu.:26.00
## Max. :66.00 Max. :205.00 Max. :44.00
## Blds_PitchAngle_Min Blds_PitchAngle_Max Blds_PitchAngle_Avg
## Min. :-4.200 Min. :-2.20 Min. :-2.300
## 1st Qu.:-2.400 1st Qu.: 0.20 1st Qu.:-1.700
## Median :-2.200 Median : 4.50 Median :-0.100
## Mean : 6.391 Mean :13.62 Mean : 9.009
## 3rd Qu.:20.800 3rd Qu.:24.00 3rd Qu.:24.000
## Max. :90.000 Max. :90.00 Max. :90.000
## Blds_PitchAngle_Std Cont_VCP_ChokcoilTemp_Avg Grd_RtrInvPhase1_Temp_Avg
## Min. : 0.000 Min. : 20.00 Min. :29.0
## 1st Qu.: 0.100 1st Qu.: 53.00 1st Qu.:35.0
## Median : 0.400 Median : 90.00 Median :39.0
## Mean : 1.697 Mean : 79.67 Mean :38.4
## 3rd Qu.: 1.000 3rd Qu.:101.00 3rd Qu.:41.0
## Max. :45.000 Max. :149.00 Max. :59.0
## Grd_RtrInvPhase2_Temp_Avg Grd_RtrInvPhase3_Temp_Avg Cont_VCP_WtrTemp_Avg
## Min. :29.00 Min. :29.00 Min. :22.00
## 1st Qu.:35.00 1st Qu.:35.00 1st Qu.:33.00
## Median :39.00 Median :38.00 Median :38.00
## Mean :38.53 Mean :38.46 Mean :36.71
## 3rd Qu.:41.00 3rd Qu.:41.00 3rd Qu.:39.00
## Max. :60.00 Max. :59.00 Max. :54.00
## Grd_Prod_Pwr_Avg Grd_Prod_CosPhi_Avg Grd_Prod_Freq_Avg
## Min. : -30.1 Min. :0.2000 Min. : 0.0
## 1st Qu.: -4.1 1st Qu.:0.8000 1st Qu.:50.0
## Median : 204.3 Median :0.9000 Median :50.0
## Mean : 518.2 Mean :0.8718 Mean :50.0
## 3rd Qu.: 849.1 3rd Qu.:1.0000 3rd Qu.:50.0
## Max. :2000.5 Max. :1.0000 Max. :50.1
## Grd_Prod_VoltPhse1_Avg Grd_Prod_VoltPhse2_Avg Grd_Prod_VoltPhse3_Avg
## Min. : 0.0 Min. : 0.0 Min. : 0.0
## 1st Qu.:398.2 1st Qu.:396.4 1st Qu.:395.7
## Median :399.3 Median :397.7 Median :397.2
## Mean :399.3 Mean :397.7 Mean :397.3
## 3rd Qu.:400.4 3rd Qu.:398.9 3rd Qu.:398.8
## Max. :419.7 Max. :418.1 Max. :419.7
## Grd_Prod_CurPhse1_Avg Grd_Prod_CurPhse2_Avg Grd_Prod_CurPhse3_Avg
## Min. : 1.0 Min. : 1.0 Min. : 1.0
## 1st Qu.: 12.6 1st Qu.: 12.0 1st Qu.: 13.7
## Median : 195.8 Median : 219.7 Median : 201.0
## Mean : 447.8 Mean : 463.5 Mean : 448.3
## 3rd Qu.: 712.2 3rd Qu.: 739.2 3rd Qu.: 714.2
## Max. :1699.8 Max. :1712.8 Max. :1688.4
## Grd_Prod_Pwr_Max Grd_Prod_Pwr_Min Grd_Busbar_Temp_Avg Rtr_RPM_Std
## Min. : -27.7 Min. : -90.4 Min. :20.00 Min. :0.0000
## 1st Qu.: -2.5 1st Qu.: -24.8 1st Qu.:33.00 1st Qu.:0.1000
## Median : 402.7 Median : 50.5 Median :38.00 Median :0.3000
## Mean : 736.4 Mean : 294.7 Mean :37.43 Mean :0.6057
## 3rd Qu.:1390.4 3rd Qu.: 392.2 3rd Qu.:42.00 3rd Qu.:0.7000
## Max. :2120.6 Max. :1996.1 Max. :77.00 Max. :7.4000
## Amb_WindSpeed_Est_Avg Grd_Prod_Pwr_Std Grd_Prod_ReactPwr_Avg
## Min. : 0.000 Min. : 0.00 Min. :-467.90
## 1st Qu.: 3.100 1st Qu.: 3.50 1st Qu.:-161.40
## Median : 5.300 Median : 56.50 Median : -93.50
## Mean : 5.932 Mean : 94.46 Mean : -92.41
## 3rd Qu.: 8.100 3rd Qu.: 149.00 3rd Qu.: -7.40
## Max. :22.100 Max. :1009.60 Max. :1000.00
## Grd_Prod_ReactPwr_Max Grd_Prod_ReactPwr_Min Grd_Prod_ReactPwr_Std
## Min. :-449.5 Min. :-1001.3 Min. : 0.00
## 1st Qu.:-147.2 1st Qu.: -202.0 1st Qu.: 2.50
## Median : -74.6 Median : -129.1 Median : 4.10
## Mean : -27.0 Mean : -145.9 Mean : 21.41
## 3rd Qu.: -0.4 3rd Qu.: -26.6 3rd Qu.: 7.90
## Max. :1003.3 Max. : 997.6 Max. :575.70
## Grd_Prod_PsblePwr_Avg Grd_Prod_PsblePwr_Max Grd_Prod_PsblePwr_Min
## Min. : 0.0 Min. : 0.0 Min. : 0.0
## 1st Qu.: 19.6 1st Qu.: 75.7 1st Qu.: 0.0
## Median : 230.9 Median : 457.0 Median : 65.3
## Mean : 552.5 Mean : 771.1 Mean : 318.3
## 3rd Qu.: 921.5 3rd Qu.:1520.6 3rd Qu.: 417.6
## Max. :2000.0 Max. :2000.0 Max. :2000.0
## Grd_Prod_PsblePwr_Std Grd_Prod_PsbleInd_Avg Grd_Prod_PsbleInd_Max
## Min. : 0.0 Min. :-1000.0 Min. :-1000.0
## 1st Qu.: 12.7 1st Qu.: -985.8 1st Qu.: -583.3
## Median : 59.1 Median : -627.0 Median : -239.9
## Mean : 96.9 Mean : -534.6 Mean : -368.7
## 3rd Qu.:153.2 3rd Qu.: 0.0 3rd Qu.: 0.0
## Max. :999.5 Max. : 0.0 Max. : 200.0
## Grd_Prod_PsbleInd_Min Grd_Prod_PsbleInd_Std Grd_Prod_PsbleCap_Avg
## Min. :-1000.0 Min. : 0.00 Min. : 0.0
## 1st Qu.:-1000.0 1st Qu.: 0.00 1st Qu.: 0.0
## Median :-1000.0 Median : 16.20 Median : 508.7
## Mean : -669.4 Mean : 74.53 Mean : 504.6
## 3rd Qu.: 0.0 3rd Qu.:140.40 3rd Qu.: 965.1
## Max. : 0.0 Max. :498.90 Max. :1000.0
## Grd_Prod_PsbleCap_Max Grd_Prod_PsbleCap_Min Grd_Prod_PsbleCap_Std
## Min. : 0.0 Min. :-200.0 Min. : 0.00
## 1st Qu.: 0.0 1st Qu.: 0.0 1st Qu.: 0.00
## Median :1000.0 Median : 239.9 Median : 29.60
## Mean : 657.8 Mean : 322.7 Mean : 81.33
## 3rd Qu.:1000.0 3rd Qu.: 517.5 3rd Qu.:152.50
## Max. :1000.0 Max. :1000.0 Max. :498.90
## Gen_Bear2_Temp_Avg Nac_Direction_Avg
## Min. : 20.00 Min. : 0.0
## 1st Qu.: 32.00 1st Qu.:102.8
## Median : 40.00 Median :184.8
## Mean : 42.37 Mean :189.9
## 3rd Qu.: 51.00 3rd Qu.:278.8
## Max. :205.00 Max. :359.0
Realizamos un par de cambios de formato en sendas variables.
signals_data$Timestamp <- ymd_hms(signals_data$Timestamp)
signals_data$Turbine_ID <- as.factor(signals_data$Turbine_ID)
Comprobamos la corrección del cambio de formato de Timestamp y vemos a la vez el rango de fechas.
summary(signals_data$Timestamp)
## Min. 1st Qu. Median
## "2016-01-01 00:00:00" "2016-03-31 07:20:00" "2016-06-29 17:30:00"
## Mean 3rd Qu. Max.
## "2016-07-01 11:35:06" "2016-10-01 23:50:00" "2016-12-31 23:50:00"
Hacemos lo mismo con Turbine_ID.
summary(signals_data$Turbine_ID)
## T01 T06 T07 T11
## 52439 50575 52445 52446
Ahora tenemos que juntar las tres tablas. Las tablas metmast y signals no hay problema para el cruce. Lo hacemos por TimeStamp.
metmast_signals_data <- signals_data %>%
left_join(metmast_data_1, by = 'Timestamp')
Exportamos los datasets a csv para su procesado en BIGml. Donde además de crear los modelos uniremos las dos tablas que exportamos a continuación.
write_csv(metmast_signals_data, "final_csvs/Training/metmast_signals_data.csv")
write_csv(logs_data_1, "final_csvs/Training/logs_data_1.csv")
2.3 Preparación datos “test”.
Repetimos el proceso para los datos de test.
test_metmast <- read_xlsx(path = "./Data/Test/wind-farm-1-metmast-testing.xlsx")
test_metmast_data_1 <- test_metmast %>%
select(-Anemometer1_CorrOffset,
-Anemometer2_Freq,
-Anemometer2_Offset,
-Anemometer2_CorrGain,
-Anemometer2_CorrOffset,
-DistanceAirPress,
-AirRessureSensorZeroOffset,
-Min_Precipitation,
-Max_Precipitation,
-Avg_Precipitation,
-Min_Raindetection,
-Avg_Raindetection,
-Max_Raindetection,
-Min_Winddirection2,
-Max_Winddirection2,
-Avg_Winddirection2,
-Var_Winddirection2,
-Anemometer1_CorrGain,
)
test_signals <- read_xlsx(path = "./Data/Test/wind-farm-1-signals-testing.xlsx")
test_metmast_signals_data <- test_signals %>%
left_join(test_metmast_data_1, by = 'Timestamp')
write_csv(test_metmast_signals_data, "./final_csvs/Test/test_metmast_data_1.csv")
2.4 Modelización en BigML.
A continuación describimos lo realizado en BigML.
Aquí se ven las fuente de datos creadas para este reto. “logs_data_1.csv” es la fuente a partir de los errores de las torres. “metmast_signals_data.csv” es la fuente a partir de los datos de las condiciones meteorológicas y los datos de los sensores SCADA (procesadas en R para hacer el join). Y “test_metmast_data_1.csv” es la fuente con los datos de test para hacer las predicciones.
Aquí se ven los datasets que creamos a partir de las dos fuentes. Como en el reto anterior se puede ver que creamos un dataset y sobre él fuimos iterando. Lo primero que hicimos fue normalizar la fecha de los errores que se podian producir en cualquier minuto mientras que las mediciones se realizaban cada 10 minutos. Luego hicimos un join de los datasets por el identificador de la torre y la fecha normalizada con lo que ya teniamos un dataset a partir del que entrenar el modelo.
Detalle del dataset de errores con el campo fecha normalizada.
Detalle de la función Lisp que utilizamos para normalizar los minutos de los logs.
Detalle del dataset final que utilizamos para entrenar el modelo con los campos que eliminamos para que no influyesen en la predicción.
Detalle del dataset final que utilizamos para entrenar el modelo con los campos que eliminamos para que no influyesen en la predicción.
Por tiempo solo nos dio tiempo a entrenar correctamente un arbol con el dataset.
Predicción batch para el modelo creada con el dataset de prueba. Detalle de la predicción.