Summarize Data
Members:
Ryan P
Cesar
Travis
Sebastian Calleja
Christopher Vuong
Amir Love
Nicholas Stout
Melanie Grudinschi
Evan
Data Format:
Provided as NumPy arrays (.npy files) in separate Training and Test folders inside a ZIP file
Training Data (1.18 GB) [93,028 Performance Records] [214 Daily Records across a 30km Grid]
inputs_weather_train.npy: Crop Statistics (seven daily weather variables)
Average Direct Normal Irradiance (ADNI)
Average Precipitation (AP)
Average Relative Humidity (ARH)
Maximum Direct Normal Irradiance (MDNI)
Maximum Surface Temperature (MaxSur)
Minimum Surface Temperature (MinSur)
Average Surface Temperature (AvgSur)
inputs_others_train.npy: Crop Information (Maturity, Genotype ID, State, Year, Location)
yield_train.npy: Crop Performance (Yield value)
Test Data (130 MB) [10,337 Performance Records] [214 Daily Records across a 30km Grid]
inputs_weather_test.npy: Crop Statistics (See above)
inputs_others_test.npy: Crop Information (See above)
Genotype Cluster Data (Optional)
clusterID_genotype.npy: 5839 genotype IDs
5839 x 5839 correlation matrix based on K-means algorithm (See publication)
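Before any analysis, it may be worth confirming that the arrays match the record counts listed above. The sketch below is only an illustration: the axis order (records, days, variables) and the helper name check_shapes are assumptions, not something stated in the data description.

```python
import numpy as np

# Counts taken from the notes above; the axis order is an assumption.
EXPECTED_RECORDS = {"train": 93_028, "test": 10_337}
N_DAYS = 214          # daily records per performance record
N_WEATHER_VARS = 7    # ADNI, AP, ARH, MDNI, MaxSur, MinSur, AvgSur
N_INFO_COLS = 5       # Maturity, Genotype ID, State, Year, Location


def check_shapes(weather, others, yields=None,
                 n_records=EXPECTED_RECORDS["train"]):
    """Return a list of mismatches between the arrays and the expected layout."""
    problems = []
    if weather.shape != (n_records, N_DAYS, N_WEATHER_VARS):
        problems.append(f"unexpected weather shape: {weather.shape}")
    if others.shape != (n_records, N_INFO_COLS):
        problems.append(f"unexpected crop-information shape: {others.shape}")
    if yields is not None and np.asarray(yields).size != n_records:
        problems.append(f"unexpected yield count: {np.asarray(yields).size}")
    return problems
```

An empty list means the arrays agree with the notes. For the test split, pass n_records=EXPECTED_RECORDS["test"] and leave yields as None.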
Plan of Action:
Load the NumPy arrays and format them into presentable DataFrames in a Jupyter Notebook
Example:
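A minimal sketch of this step, assuming the weather array has shape (records, days, variables) and the crop-information array has one column per field in the order listed under Data Format. The column names, the folder argument, and the helper names load_split and to_frames are hypothetical and should be checked against the actual files.

```python
from pathlib import Path

import numpy as np
import pandas as pd

# Names follow the order given in the notes; the real column order inside
# the .npy files is an assumption.
WEATHER_VARS = ["ADNI", "AP", "ARH", "MDNI", "MaxSur", "MinSur", "AvgSur"]
INFO_COLS = ["Maturity", "GenotypeID", "State", "Year", "Location"]


def load_split(folder, split="train"):
    """Load the arrays for one split from a hypothetical unzipped folder."""
    folder = Path(folder)
    weather = np.load(folder / f"inputs_weather_{split}.npy")
    # allow_pickle=True only matters if the array was saved with object dtype.
    others = np.load(folder / f"inputs_others_{split}.npy", allow_pickle=True)
    # The notes list a yield file only for the training data.
    yield_path = folder / f"yield_{split}.npy"
    yields = np.load(yield_path) if yield_path.exists() else None
    return weather, others, yields


def to_frames(weather, others, yields=None):
    """Build a per-record DataFrame and a long-format daily weather DataFrame."""
    records = pd.DataFrame(others, columns=INFO_COLS)
    records.index.name = "Record"
    if yields is not None:
        records["Yield"] = np.asarray(yields).ravel()

    # One row per (record, day), one column per weather variable.
    n_records, n_days, n_vars = weather.shape
    index = pd.MultiIndex.from_product(
        [range(n_records), range(n_days)], names=["Record", "Day"]
    )
    daily = pd.DataFrame(
        weather.reshape(n_records * n_days, n_vars),
        index=index,
        columns=WEATHER_VARS,
    )
    return records, daily
```

In a notebook this could be used as records, daily = to_frames(*load_split("Training")), where "Training" stands in for wherever the ZIP file was extracted. The two frames share the Record index, so daily weather can be joined back to crop information when needed.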
Visualize the data:
Plot yield differences across State, Year, Genotype, and Maturity Group
Plot weather data over the 13 years
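The two plots above could be sketched as follows. This is one possible approach, not a settled design: it assumes a records DataFrame with a Yield column plus grouping columns such as State or Year, and a weather array of shape (records, days, variables). The function names are hypothetical.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_yield_by(records, group_col, ax=None):
    """Draw one box of Yield values per level of group_col."""
    if ax is None:
        _, ax = plt.subplots()
    groups = records.groupby(group_col)["Yield"]
    labels = [str(key) for key, _ in groups]
    ax.boxplot([values.to_numpy() for _, values in groups])
    # Boxes are drawn at positions 1..N, so label those positions.
    ax.set_xticks(list(range(1, len(labels) + 1)))
    ax.set_xticklabels(labels)
    ax.set_xlabel(group_col)
    ax.set_ylabel("Yield")
    return ax


def yearly_weather_means(weather, years, var_names):
    """Average each weather variable over days, then over records per year."""
    season_means = weather.mean(axis=1)  # shape: (records, variables)
    frame = pd.DataFrame(season_means, columns=var_names)
    frame["Year"] = np.asarray(years)
    return frame.groupby("Year").mean()


def plot_weather_by_year(yearly):
    """One subplot per variable, since the variables use different units."""
    return yearly.plot(subplots=True, marker="o", figsize=(8, 10))
```

Box plots work for State, Year, and Maturity Group, which have few levels. With 5839 genotype IDs a box per genotype would be unreadable, so grouping genotypes first (for example by the optional cluster IDs) is probably a better starting point.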