The Weather Prediction Dataset contains meteorological data collected from 18 European locations: Basel (Switzerland), Budapest (Hungary), Dresden, Düsseldorf, Kassel, München (Germany), De Bilt and Maastricht (the Netherlands), Heathrow (UK), Ljubljana (Slovenia), Malmo and Stockholm (Sweden), Montélimar, Perpignan and Tours (France), Oslo (Norway), Roma (Italy), and Sonnblick (Austria).
The dataset covers the years 2000 to 2010, resulting in 3654 daily observations. The variables include mean temperature, maximum temperature, minimum temperature, cloud cover, wind speed, wind gust, humidity, sea level pressure, global radiation, precipitation, and sunshine. The data has undergone basic cleaning: columns with more than 5% invalid entries were removed, and invalid entries in the remaining columns were replaced with mean values. As part of the preprocessing, all attributes were also scaled to similar ranges, and some units were transformed for consistency.
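The cleaning steps described above can be sketched roughly as follows. This is a minimal illustration, not the dataset authors' actual pipeline: it assumes invalid entries are encoded as `NaN`, and the column names, raw units, and divisors are hypothetical.

```python
import numpy as np
import pandas as pd


def basic_clean(df: pd.DataFrame, max_invalid_frac: float = 0.05) -> pd.DataFrame:
    """Drop columns with too many invalid entries, then mean-impute the rest."""
    # Fraction of invalid (NaN) entries per column; NaN encoding is an assumption.
    invalid_frac = df.isna().mean()
    kept = df.loc[:, invalid_frac <= max_invalid_frac]
    # Replace the remaining invalid entries with each column's mean.
    return kept.fillna(kept.mean())


def rescale_units(df: pd.DataFrame, divisors: dict) -> pd.DataFrame:
    """Divide selected columns by a constant so all values land in similar ranges."""
    out = df.copy()
    for col, divisor in divisors.items():
        if col in out.columns:
            out[col] = out[col] / divisor
    return out


# Toy data with hypothetical column names and assumed raw units.
rng = np.random.default_rng(0)
n = 40
raw = pd.DataFrame({
    "BASEL_pressure": rng.normal(1015.0, 5.0, n),   # assumed hPa
    "BASEL_humidity": rng.uniform(40.0, 100.0, n),  # assumed percent
    "BASEL_wind_gust": rng.uniform(0.0, 20.0, n),   # m/s
})
raw.loc[3, "BASEL_pressure"] = np.nan    # 1/40 = 2.5% invalid: kept and imputed
raw.loc[:5, "BASEL_wind_gust"] = np.nan  # 6/40 = 15% invalid: column dropped

clean = basic_clean(raw)
scaled = rescale_units(clean, {"BASEL_pressure": 1000.0, "BASEL_humidity": 100.0})
print(scaled.describe())
```

Dividing by a fixed constant, rather than standardizing each column, keeps the values physically interpretable, which matches the description of units being transformed rather than normalized.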
The dataset comprises 165 variables over 3654 days. Units were chosen so that the values fall in similar ranges: temperatures are given in degrees Celsius, wind speed and gust in m/s, humidity as a fraction of 100%, sea level pressure in units of 1000 hPa, global radiation in units of 100 W/m², precipitation in centimeters, and sunshine in hours.
Algorithm | No. of Clusters | Davies-Bouldin Score | Silhouette Score |
---|---|---|---|
KMeans Clustering | 2 | 0.937 | 0.414 |
Affinity Propagation Clustering | 2 | 1.502 | 0.171 |
Mean Shift Clustering | 3 | 0.955 | -0.002 |
Agglomerative Clustering | 2 | 1.021 | 0.354 |
Spectral Clustering | 2 | 0.939 | 0.412 |
OPTICS Clustering | 2 | 1.300 | -0.255 |
Gaussian Clustering | 2 | 0.997 | 0.381 |
BIRCH Clustering | 2 | 0.970 | 0.378 |
Ensemble Clustering | 3 | 0.683 | 0.277 |
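
The two metrics in the table can be computed with scikit-learn once cluster labels are available. The sketch below is an illustration only: it uses synthetic blobs as a stand-in for the weather features, and the KMeans settings are assumptions rather than the configuration behind the reported scores. A lower Davies-Bouldin score and a higher Silhouette score (range -1 to 1) both indicate better-separated clusters.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score, silhouette_score

# Synthetic stand-in for the scaled weather features (assumption).
X, _ = make_blobs(n_samples=300, centers=2, n_features=5, random_state=0)

# Cluster into two groups, mirroring the cluster count reported for KMeans.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Both metrics take the feature matrix and the predicted labels.
db_score = davies_bouldin_score(X, labels)
sil_score = silhouette_score(X, labels)

print(f"Davies-Bouldin: {db_score:.3f}  Silhouette: {sil_score:.3f}")
```

The same two calls apply to labels from any of the listed algorithms, which keeps the comparison consistent across rows.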