3. Benchmarks information
This section explains how to propose a new benchmark during the call-for-benchmarks period.
Benchmark proposal template
The benchmark proposal can be submitted as a PDF file, a markdown file, a LaTeX source file, or a shared Google Document. It is recommended that it contain the following sections:
# Benchmark Proposal: Title of the benchmark
The title should be concise and descriptive.
## Contributors
List the name(s), affiliation(s), and contact email(s) of the proposer(s). Optionally, add an ORCID, GitHub handle, or project link.
## Tags
Official `FireBench` tags (list available at the end of this page)
It is recommended to add one *Metric type* tag, at least one *Model context* tag, and at least one *Application context* tag.
These tags are important for referencing the proposed benchmark.
Optional free tags or keywords are welcome.
## Short description
A 1-2 sentence overview of the benchmark goal, scope, and what is being tested.
## Detailed description
It should contain:
- Scientific background and motivation.
- Description of the modeled process or scenario.
- Relevance of the benchmark to real-world application or theoretical exploration.
- Diagrams/schematics of the benchmark are welcome.
## Data description
- Input data:
  - Description of the required input datasets (terrain, fuels, weather, etc.).
  - Indicate availability (open source, proprietary with access restrictions, not yet available). Open source is preferred.
  - Indicate whether the data is provided with the benchmark, whether it can be accessed upon request to run the benchmark (and under which conditions), and whether it can be integrated within `FireBench` directly.
- Expected output data:
  - Definition of the expected output fields and format.
  - Ground truth availability (if applicable). Indicate whether this data is provided with the benchmark or available upon request (and under which conditions), and whether it can be integrated within `FireBench` directly.
## Initial conditions and configuration
- Detailed description of the initial setup.
- Simulation parameters or constants
- Timeline or duration of the benchmark
- Mesh properties
## Metrics definition
- Definition of primary metrics (RMSE, bias, runtime, etc.) and derived metrics (burned area agreement, time to ignition, statistical comparison of plumes, etc.)
- Usage of existing `FireBench` post-processing tools (or need for new tools)
- Units and interpretation.
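
As an illustration of how two of the primary metrics named above could be specified, here is a minimal sketch of bias and RMSE in plain Python. The function names, the sample values, and the sign convention (predicted minus observed) are assumptions made for this example and are not part of `FireBench`; a proposal should state its own convention and units explicitly.

```python
import math


def bias(predicted, observed):
    """Mean error, computed as predicted minus observed (assumed convention).

    A positive value means the model over-predicts on average.
    """
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p - o for p, o in zip(predicted, observed)) / len(predicted)


def rmse(predicted, observed):
    """Root-mean-square error, in the same units as the inputs."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(predicted))


# Hypothetical rate-of-spread values in m/s, for illustration only.
observed = [0.10, 0.20, 0.30, 0.40]
predicted = [0.12, 0.18, 0.33, 0.41]

print(f"bias = {bias(predicted, observed):.4f} m/s")
print(f"RMSE = {rmse(predicted, observed):.4f} m/s")
```

Stating the formula, the units, and the sign convention next to each metric leaves less room for interpretation when different groups compute it.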
## Publication status
- Is this benchmark:
  - linked to a publication (in review, published, preprint)?
  - embargoed until a specific date?
- Citation to use (if applicable)
## Licensing and Use Terms
- License for any data or code provided
- Attribution and reuse policy
## Additional notes
## Optional: Benchmark difficulty
Optional indicator of how difficult the benchmark is to run:
- low: fast/approximate, educational or conceptual
- medium: realistic inputs, moderate compute
- high: high fidelity, coupled models, research grade
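
Before submitting a proposal written in markdown, it can help to check that the recommended headings of the template are present. The following is a minimal sketch of such a check; `RECOMMENDED_HEADINGS` and `missing_headings` are hypothetical names introduced for this example and are not part of `FireBench`. The optional sections (additional notes and benchmark difficulty) are deliberately left out of the list.

```python
# Recommended second-level headings from the proposal template above.
RECOMMENDED_HEADINGS = [
    "## Contributors",
    "## Tags",
    "## Short description",
    "## Detailed description",
    "## Data description",
    "## Initial conditions and configuration",
    "## Metrics definition",
    "## Publication status",
    "## Licensing and Use Terms",
]


def missing_headings(markdown_text):
    """Return the recommended headings that do not appear in the text.

    Matching is exact and case-sensitive, after stripping surrounding
    whitespace from each line.
    """
    present = {line.strip() for line in markdown_text.splitlines()}
    return [h for h in RECOMMENDED_HEADINGS if h not in present]


# Hypothetical, incomplete draft used only to demonstrate the check.
draft = """# Benchmark Proposal: Example title
## Contributors
## Tags
## Short description
"""

for heading in missing_headings(draft):
    print(f"missing: {heading}")
```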
Run a benchmark and submit your results
This guide explains how to run an existing benchmark and submit your results to the FireBench community. The list of existing benchmarks is shared here.
# FireBench Benchmark Execution Guidelines
## 1. Before you start
### Select a benchmark
- Visit the submitted benchmarks registry
- Choose a benchmark that:
  - Matches the capabilities of your model(s)
  - Has clear input data and metric definition
### Review the Benchmark Page
Read the following parts of the benchmark page:
- Description and objectives
- Input/output requirements
- Metrics definition
- Tags
- Evaluation procedure
- Data availability/licensing
## 2. Prepare your Evaluation
### Model setup
Clearly document:
- The name, version (commit hash if it can be shared), and configuration of your model(s)
- Any custom parameters or simplifications
- Whether it is operational, experimental, ML-based, etc.
### Input data processing
- Follow the benchmark's instructions precisely
- Document any necessary pre-processing (*e.g.* interpolation for resolution adjustment)
- Avoid any optional processing
- Confirm compatibility of coordinate systems, units, and formats
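
As an example of the kind of pre-processing that should be documented, the sketch below linearly resamples a one-dimensional profile to a different grid spacing. It is only an illustration: `resample_linear` is a hypothetical helper, the elevation values are made up, and a given benchmark may prescribe a different interpolation method or work on two- or three-dimensional fields.

```python
import math


def resample_linear(values, source_spacing, target_spacing):
    """Linearly resample a 1-D profile to a new, uniform grid spacing.

    `values` are samples taken every `source_spacing` (same length unit as
    `target_spacing`), starting at position 0. The output covers the same
    extent, truncated to a whole number of target intervals.
    """
    if len(values) < 2:
        raise ValueError("at least two samples are required")
    if source_spacing <= 0 or target_spacing <= 0:
        raise ValueError("spacings must be positive")
    extent = source_spacing * (len(values) - 1)
    n_target = int(math.floor(extent / target_spacing + 1e-9)) + 1
    resampled = []
    for i in range(n_target):
        position = i * target_spacing / source_spacing
        # Clamp the left index so the last point uses the final interval.
        left = min(int(position), len(values) - 2)
        fraction = position - left
        resampled.append(values[left] * (1 - fraction) + values[left + 1] * fraction)
    return resampled


# Hypothetical terrain elevations (m) sampled every 30 m, resampled to 15 m.
elevation_30m = [100.0, 110.0, 105.0]
elevation_15m = resample_linear(elevation_30m, source_spacing=30.0, target_spacing=15.0)
print(elevation_15m)
```

Recording the method (here, linear), the source spacing, and the target spacing in the report lets others reproduce the same inputs.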
### Output Requirements
- Ensure your outputs match the required fields (document any special processing needed to obtain the requested outputs).
- If uncertain, contact the benchmark proposer or the FireBench team
## 3. Run the Benchmark
- Run the simulation(s) as defined in the benchmark scenario.
- Ensure reproducibility:
  - Fix seeds if stochastic components are used
  - Log the software and hardware environment (*e.g.* CPU/GPU, OS)
  - Prefer containerization (*e.g.* Docker) if available
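
The reproducibility points above can be sketched with the Python standard library as follows. This is a minimal example under stated assumptions: `environment_record` is a hypothetical helper, the seed value is arbitrary, and only Python's built-in `random` generator is seeded. Models that rely on other random number generators (NumPy, ML frameworks, compiled code) need their own seeding, and GPU details have to be collected with vendor-specific tools.

```python
import json
import platform
import random


def environment_record(seed):
    """Collect basic software and hardware information for the run log."""
    return {
        "seed": seed,
        "python_version": platform.python_version(),
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "processor": platform.processor(),
    }


SEED = 42  # arbitrary value chosen for this example

# Seeding makes the pseudo-random sequence repeatable across runs.
random.seed(SEED)
first_draw = random.random()
random.seed(SEED)
second_draw = random.random()
print("repeatable:", first_draw == second_draw)

# Store this record next to the benchmark outputs.
print(json.dumps(environment_record(SEED), indent=2))
```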
## 4. Report Your Results
Prepare a **benchmark evaluation report** using the following structure:
1. Title and benchmark ID
Match the benchmark registry title.
2. Contributors
Name(s), affiliation(s), contact, ORCID
3. Model description
Type, versions, capabilities, known limitations, etc.
4. Run configuration
Inputs used, any modifications to benchmark setup, runtime
5. Results
Raw, unprocessed outputs, visuals (plots, contours, cross sections, etc.), and computed metrics (according to the benchmark definition).
6. Interpretation
Comment on model performance, strengths/weaknesses, unexpected behavior
7. Reproducibility
Link to code or container, software version, OS, runtime environment.
Acceptable formats: PDF, markdown, or reStructuredText
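
For reports written in markdown, the structure above can be turned into an empty skeleton to fill in. The sketch below is one possible way to do so; `REPORT_SECTIONS` and `report_skeleton` are hypothetical names for this example, and the heading layout is an assumption rather than a required `FireBench` format.

```python
# Section names and hints taken from the report structure above.
REPORT_SECTIONS = [
    ("Title and benchmark ID", "Match the benchmark registry title."),
    ("Contributors", "Name(s), affiliation(s), contact, ORCID"),
    ("Model description", "Type, versions, capabilities, known limitations"),
    ("Run configuration", "Inputs used, modifications to benchmark setup, runtime"),
    ("Results", "Raw outputs, visuals, and computed metrics"),
    ("Interpretation", "Model performance, strengths/weaknesses, unexpected behavior"),
    ("Reproducibility", "Link to code or container, software version, OS, runtime environment"),
]


def report_skeleton(benchmark_title):
    """Return a markdown skeleton with one numbered heading per section."""
    lines = [f"# Benchmark evaluation report: {benchmark_title}", ""]
    for number, (name, hint) in enumerate(REPORT_SECTIONS, start=1):
        lines.append(f"## {number}. {name}")
        lines.append(f"<!-- {hint} -->")
        lines.append("")
    return "\n".join(lines)


# Hypothetical benchmark title, for illustration only.
print(report_skeleton("Example benchmark"))
```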
## 5. Submit Your Report
Send your completed report, along with any important data, via one of the following:
- Email: aurelien.costes@sjsu.edu