Summarize Data

Members:

  • Ryan P

  • Cesar

  • Travis

  • Sebastian Calleja

  • Christopher Vuong

  • Amir Love

  • Nicholas Stout

  • Melanie Grudinschi

  • Evan

 

Data Format:

  • Provided as NumPy arrays (.npy) in separate Training and Test folders inside a ZIP file

  • Training Data (1.18 GB) [93,028 Performance Records] [214 Daily Records across a 30km Grid]

    • inputs_weather_train.npy: Weather Statistics (7 variables)

      • Average Direct Normal Irradiance (ADNI)

      • Average Precipitation (AP)

      • Average Relative Humidity (ARH)

      • Maximum Direct Normal Irradiance (MDNI)

      • Maximum Surface Temperature (MaxSur)

      • Minimum Surface Temperature (MinSur)

      • Average Surface Temperature (AvgSur)

    • inputs_others_train.npy: Crop Information (Maturity, Genotype ID, State, Year, Location)

    • yield_train.npy: Crop Performance (Yield value)

  • Test Data (130 MB) [10,337 Performance Records] [214 Daily Records across a 30km Grid]

    • inputs_weather_test.npy: Weather Statistics (See above)

    • inputs_others_test.npy: Crop Information (See above)

  • Genotype Cluster Data (Optional)

    • clusterID_genotype.npy: 5839 genotype IDs

      • 5839 x 5839 correlation matrix based on K-means algorithm (See publication)
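
A quick way to check the files against the sizes listed above is to load each array and print its shape. The following is a minimal sketch, not the team's actual notebook: the helper name `summarize_npy_folder` is hypothetical, the demo runs on tiny synthetic stand-in arrays rather than the real data, and the axis order (records, days, variables) is an assumption.

```python
import tempfile
from pathlib import Path

import numpy as np

# Abbreviations taken from the weather variable list above.
WEATHER_VARS = ["ADNI", "AP", "ARH", "MDNI", "MaxSur", "MinSur", "AvgSur"]


def summarize_npy_folder(folder):
    """Return {file name: (shape, dtype)} for every .npy file in a folder."""
    summary = {}
    for path in sorted(Path(folder).glob("*.npy")):
        # Note: arrays saved with object dtype would need allow_pickle=True.
        arr = np.load(path)
        summary[path.name] = (arr.shape, str(arr.dtype))
    return summary


# Demo on small synthetic stand-ins (NOT the real data).
# Assumed layout: weather is (records, days, variables), yield is (records, 1).
with tempfile.TemporaryDirectory() as tmp:
    np.save(Path(tmp) / "inputs_weather_train.npy",
            np.zeros((4, 214, len(WEATHER_VARS))))
    np.save(Path(tmp) / "yield_train.npy", np.zeros((4, 1)))
    demo_summary = summarize_npy_folder(tmp)

for name, (shape, dtype) in demo_summary.items():
    print(name, shape, dtype)
```

Pointing `summarize_npy_folder` at the unzipped Training or Test folder should show whether the first axis matches the record counts listed above.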

 

Plan of Action:

  • Load the NumPy arrays and format them into readable DataFrames in a Jupyter Notebook

    • Example:

  • Visualize the data to get a feel for it

    • Plot yield differences across State, Year, Genotype, and Maturity Group

    • Plot weather data over the 13 years
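
The plan above could be sketched roughly as follows with pandas. This is only an illustration under stated assumptions: the column order in `OTHER_COLS`, the weather axis order (records, days, variables), and the helper names `build_frame` and `yearly_weather_means` are all hypothetical, and the arrays are small synthetic stand-ins rather than the real challenge data.

```python
import numpy as np
import pandas as pd

# Assumed column order for inputs_others_*.npy (not confirmed by the source).
OTHER_COLS = ["Maturity", "GenotypeID", "State", "Year", "Location"]
WEATHER_VARS = ["ADNI", "AP", "ARH", "MDNI", "MaxSur", "MinSur", "AvgSur"]


def build_frame(others, yields):
    """Combine crop information and yield values into one DataFrame."""
    df = pd.DataFrame(others, columns=OTHER_COLS)
    df["Yield"] = np.asarray(yields).reshape(-1)
    return df


def yearly_weather_means(weather, years):
    """Average each weather variable over the days axis, then per year."""
    per_record = pd.DataFrame(weather.mean(axis=1), columns=WEATHER_VARS)
    per_record["Year"] = np.asarray(years)
    return per_record.groupby("Year").mean()


# Synthetic stand-in data (NOT the real records); years are made up.
rng = np.random.default_rng(0)
others = np.column_stack([
    [1, 2, 1, 2, 1, 2],                     # Maturity
    [10, 11, 12, 10, 11, 12],               # GenotypeID
    [0, 0, 1, 1, 0, 1],                     # State
    [2003, 2003, 2003, 2004, 2004, 2004],   # Year
    [5, 5, 6, 6, 5, 6],                     # Location
])
yields = np.array([[50.0], [60.0], [40.0], [55.0], [65.0], [45.0]])
weather = rng.random((6, 214, len(WEATHER_VARS)))

df = build_frame(others, yields)
yield_by_state = df.groupby("State")["Yield"].mean()
weather_by_year = yearly_weather_means(weather, df["Year"])

print(df.head())
print(yield_by_state)
print(weather_by_year)
```

With matplotlib installed, calling `yield_by_state.plot(kind="bar")` or `weather_by_year.plot()` in the notebook would turn these summaries into the plots described in the plan; the same grouping works for Year, GenotypeID, or Maturity.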