Datasets

Lorenz System Dataset with Multi Rho and Gaussian Noise

Lorenz system time series for multiple values of the rho parameter, with Gaussian noise added at each integration step. Details:

  • 2 CSV files with similar structure but different random initial conditions; they can be used for training and testing machine learning approaches.
  • File names: Training_Lorenz_Multi_Rho.csv and Testing_Lorenz_Multi_Rho.csv
  • Generated using the Runge-Kutta (RK) ODE45 solver in MATLAB.
  • dt: 0.05.
  • time span: [0, 500].
  • parameters: rho in range [5, 225] with a step of 1, sigma 10, and beta 8/3.
  • simulations: 10 per rho value.
  • initial conditions: drawn at random from [0, 1] for each of the 3 variables (x, y, z), separately for each simulation.
  • Noise: Gaussian noise N(0, 1) added at each time step, with a different draw for each variable.
  • File structure:
    • Columns (all numeric, indexed from 0): 0: Simulation | 1: Sub-Sim (same as Simulation) | 2: Time | 3: X | 4: Y | 5: Z | 6: RHO
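
The column layout listed above can be read with Python's standard csv module. This is a minimal sketch that assumes the files are comma-separated and have no header row (neither is stated in the description). The field names, helper functions, and sample rows are hypothetical and not taken from the dataset.

```python
import csv
import io
from collections import defaultdict

# Column order as listed in the file structure (indexed from 0).
COLUMNS = ("simulation", "sub_sim", "time", "x", "y", "z", "rho")

def read_rows(handle):
    """Yield one dict of floats per CSV row, keyed by COLUMNS."""
    for raw in csv.reader(handle):
        if not raw:
            continue  # skip blank lines
        yield dict(zip(COLUMNS, map(float, raw)))

def group_by_simulation(rows):
    """Collect the (time, x, y, z) samples of each (rho, simulation) pair."""
    series = defaultdict(list)
    for row in rows:
        key = (row["rho"], row["simulation"])
        series[key].append((row["time"], row["x"], row["y"], row["z"]))
    return dict(series)

# Made-up sample rows for illustration only (not real data).
SAMPLE = io.StringIO(
    "1,1,0.00,0.42,0.13,0.77,5\n"
    "1,1,0.05,0.51,0.20,0.70,5\n"
    "2,2,0.00,0.09,0.88,0.31,5\n"
)
series = group_by_simulation(read_rows(SAMPLE))
print(len(series), len(series[(5.0, 1.0)]))
```

For a real file, the same helpers could be applied to a handle opened with open("Training_Lorenz_Multi_Rho.csv", newline=""). Grouping by (rho, simulation) keeps each trajectory separate, which matters because there are several simulations per rho value.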

Lorenz System Time Series Across Multiple Parameter Regimes

The datasets consist of simulated trajectories of the Lorenz system generated using MATLAB.

Data Generation Details

  • Numerical integration performed using the ODE45 solver with multistep integration
  • Initial conditions sampled uniformly at random from the interval [0, 1]
  • System parameter (ρ) range: 5 to 225
  • Simulation time span: 1 to 500
  • Time step: 0.05
  • Number of simulations: 10 independent runs for each ρ value
  • Length per simulation: 19,810 data points

Datasets

  • Training dataset: 891,450 data points
  • Testing dataset: 891,450 data points
  • Training and testing sets are generated independently

Noise Model

At each simulation step, additive Gaussian noise is injected into all state variables.
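
The noise model can be sketched in Python as follows. This is not the authors' MATLAB code: it swaps ODE45's adaptive Runge-Kutta scheme for a fixed-step classical RK4. It also assumes that the noisy state is carried into the next step, which is one reading of "injected". The noise standard deviation noise_std, the seed, and all function names are illustrative assumptions.

```python
import random

def lorenz(state, sigma, rho, beta):
    """Right-hand side of the Lorenz equations."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)

def rk4_step(state, dt, sigma, rho, beta):
    """One classical fourth-order Runge-Kutta step (fixed step, not ODE45)."""
    def shifted(base, slope, scale):
        return tuple(b + scale * s for b, s in zip(base, slope))
    k1 = lorenz(state, sigma, rho, beta)
    k2 = lorenz(shifted(state, k1, dt / 2), sigma, rho, beta)
    k3 = lorenz(shifted(state, k2, dt / 2), sigma, rho, beta)
    k4 = lorenz(shifted(state, k3, dt), sigma, rho, beta)
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )

def simulate(rho, n_steps, dt=0.05, sigma=10.0, beta=8.0 / 3.0,
             noise_std=1.0, seed=0):
    """Return rows (t, x, y, z) with Gaussian noise added after every step."""
    rng = random.Random(seed)
    # Initial conditions drawn uniformly from [0, 1).
    state = tuple(rng.random() for _ in range(3))
    rows = [(0.0, *state)]
    for step in range(1, n_steps + 1):
        state = rk4_step(state, dt, sigma, rho, beta)
        # Assumption: an independent draw per variable, fed back into the state.
        state = tuple(v + rng.gauss(0.0, noise_std) for v in state)
        rows.append((step * dt, *state))
    return rows

rows = simulate(rho=28.0, n_steps=100)
print(len(rows))
```

Setting noise_std=0.0 gives the noise-free trajectory from the same initial condition, which is a convenient baseline when comparing forecasts.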

These datasets are intended for evaluating regime-dependent predictability and long-term forecasting performance of sequence models under parameter variations.

Lorenz System Simulation Dataset

The datasets were generated in MATLAB using the ODE45 solver with multi-step integration. ODE45 implements a Runge-Kutta method with a variable time step for efficient computation. Each simulation starts from random initial conditions in the range [0, 1]. The time span of each simulation is 1 to 100, with a time step of 0.01. Each CSV file contains a 2D array, mostly of the form: sim number, subsim number, t, x, y, z, rho or rho_0 value. The default Lorenz system parameters are sigma = 10, rho = 28 and beta = 8/3. Files generated with other parameter values include those values in their names.
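
For reference, sigma, rho and beta are the coefficients of the standard Lorenz equations, written here in LaTeX notation with x, y, z matching the CSV columns:

```latex
\begin{aligned}
\dot{x} &= \sigma\,(y - x),\\
\dot{y} &= x\,(\rho - z) - y,\\
\dot{z} &= x y - \beta z.
\end{aligned}
```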

OpenFungi: A Machine Learning Dataset for Fungal Image Recognition Tasks

The dataset contains 1249 high-quality macroscopic and microscopic images of 5 different filamentous fungi genera: Aspergillus spp. (further subdivided into Aspergillus section Flavi and Aspergillus section Nigri), Penicillium spp., Rhizopus spp., Alternaria spp. and Fusarium spp. Mixed cultures of different filamentous fungi are also included. Of the 1249 images, 678 are macroscopic images of fungal cultures and 571 are microscopic images with lactophenol cotton blue staining.