2021 Caldor Fire

Version: 2026.0
Case ID: FB001
FireBench IO std version: >= 1.0
Date of last update: 01/14/2025

Contributors

Aurélien Costes, Wildfire Interdisciplinary Research Center, San Jose State University, aurelien.costes@sjsu.edu, ORCID
Angel Farguell Caus, Wildfire Interdisciplinary Research Center, San Jose State University, angel.farguellcaus@sjsu.edu, ORCID
Adam Kochanski, Wildfire Interdisciplinary Research Center, San Jose State University, adam.kochanski@sjsu.edu, ORCID

Description

This collection of benchmarks uses publicly available resources on the 2021 Caldor Fire. It contains over 300 benchmarks built on observation datasets for:

Building damage (CALFIRE)
Burn severity (MTBS)
Burn severity (RAVG)
Canopy bottom height (LANDFIRE)
Canopy bulk density (LANDFIRE)
Canopy cover loss (RAVG)
Canopy height (LANDFIRE)
Infrared fire perimeters (NIROPS)
Live basal area change (RAVG)
Weather stations (Synoptic)

Building damage

Dataset

The data has been collected using CAL FIRE Damage Inspection (DINS) Data (version of 2025/11/05). The original CSV file containing multiple fires has been processed to extract only the buildings damaged by the Caldor Fire. The dataset includes the positions (lat, lon) of buildings within the area of influence of the fire. The state of buildings is one of the following:

‘No Damage’,
‘Affected (1-9%)’,
‘Minor (10-25%)’,
‘Major (26-50%)’,
‘Destroyed (>50%)’,
‘Inaccessible’.

The sha256 of the source file is: 0190a5a51aafafa20270fe046a7ae17a53697b1fb218ff8096a3d8ebbc9ef983.
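A downloaded copy of the source file can be checked against this hash. The following is a minimal sketch using Python's standard `hashlib` module; the function name `sha256_of_file` and the file name in the usage note are placeholders and are not part of FireBench.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large files are never loaded in one piece.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Hash quoted in the dataset description above.
EXPECTED_SHA256 = "0190a5a51aafafa20270fe046a7ae17a53697b1fb218ff8096a3d8ebbc9ef983"
```

For a local copy saved under a hypothetical name such as `dins_source.csv`, the check would be `sha256_of_file("dins_source.csv") == EXPECTED_SHA256`.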

If the evaluated model does not explicitly represent individual buildings, it should treat all buildings within a cell as sharing the cell value for building damage (deterministic models) or the median of the building damage distribution (probabilistic models).

Figure 1 shows the spatial distribution of building damage for the Caldor Fire.

Fig. 1 : Building damage map

Figure 2 shows the distribution of building damage for the Caldor Fire. The following Table shows the number of structures in each damage category.

Damage category	Counts [-]
No Damage	3356
Affected (1-9%)	56
Minor (10-25%)	18
Major (26-50%)	7
Destroyed (>50%)	1005
Inaccessible	2
Total	4444

Fig. 2 : Distribution of building damage

Processing of dataset

Performed at obs dataset level

The data values from the original CSV file were standardized without modification. Only two column names were changed, from “* Damage” to “Damage” and from “* Incident Name” to “Incident Name”, to simplify processing.

Binary classes of building damage

Performed at benchmark run level

To perform some calculations, the building damage classes can be aggregated into binary classes. The Inaccessible category is ignored. The following aggregation method is used:

unburnt binary class contains No Damage, Affected (1-9%), and Minor (10-25%),
burnt binary class contains Major (26-50%), and Destroyed (>50%).
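The aggregation can be sketched as a simple lookup. This is an illustration only: the encoding (1 for burnt, 0 for unburnt, `None` for ignored) and the function name `to_binary_class` are assumptions, not the FireBench implementation. The counts are taken from the table of damage categories above.

```python
# Category labels as listed in the dataset description.
UNBURNT = {"No Damage", "Affected (1-9%)", "Minor (10-25%)"}
BURNT = {"Major (26-50%)", "Destroyed (>50%)"}


def to_binary_class(damage):
    """Map a damage category to 1 (burnt), 0 (unburnt), or None (ignored)."""
    if damage in BURNT:
        return 1
    if damage in UNBURNT:
        return 0
    return None  # 'Inaccessible' (or any unknown label) is excluded


# Number of structures per damage category, from the table above.
counts = {
    "No Damage": 3356,
    "Affected (1-9%)": 56,
    "Minor (10-25%)": 18,
    "Major (26-50%)": 7,
    "Destroyed (>50%)": 1005,
    "Inaccessible": 2,
}

n_burnt = sum(n for label, n in counts.items() if to_binary_class(label) == 1)
n_unburnt = sum(n for label, n in counts.items() if to_binary_class(label) == 0)
print(n_burnt, n_unburnt)  # 1012 3430
```

With this aggregation, 1012 structures fall in the burnt class and 3430 in the unburnt class; the 2 inaccessible structures are left out.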

Benchmarks

See Key Performance Indicator (KPI) and normalization definitions here.

Binary Structure Loss Accuracy

Short IDs: BD01
KPI: Binary Structure Loss Accuracy
Normalization: Linear Bounded Normalization with \(a=0\), \(b=1\)
Name in Score Card: Binary Structure Loss Accuracy
This benchmark is performed on the binary classes for damaged buildings.

Binary Structure Loss Precision

Short IDs: BD02
KPI: Binary Structure Loss Precision
Normalization: Linear Bounded Normalization with \(a=0\), \(b=1\)
Name in Score Card: Binary Structure Loss Precision
This benchmark is performed on the binary classes for damaged buildings.

Binary Structure Loss Recall

Short IDs: BD03
KPI: Binary Structure Loss Recall
Normalization: Linear Bounded Normalization with \(a=0\), \(b=1\)
Name in Score Card: Binary Structure Loss Recall
This benchmark is performed on the binary classes for damaged buildings.

Binary Structure Loss Specificity

Short IDs: BD04
KPI: Binary Structure Loss Specificity
Normalization: Linear Bounded Normalization with \(a=0\), \(b=1\)
Name in Score Card: Binary Structure Loss Specificity
This benchmark is performed on the binary classes for damaged buildings.

Binary Structure Loss Negative Predictive Value

Short IDs: BD05
KPI: Binary Structure Loss Negative Predictive Value
Normalization: Linear Bounded Normalization with \(a=0\), \(b=1\)
Name in Score Card: Binary Structure Loss Negative Predictive Value
This benchmark is performed on the binary classes for damaged buildings.

Binary Structure Loss F1 Score

Short IDs: BD06
KPI: Binary Structure Loss F1 Score
Normalization: Linear Bounded Normalization with \(a=0\), \(b=1\)
Name in Score Card: Binary Structure Loss F1 Score
This benchmark is performed on the binary classes for damaged buildings.
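The six building damage KPIs above correspond to the usual scores derived from a binary confusion matrix. The sketch below uses the standard textbook definitions and assumes that burnt (1) is the positive class; the authoritative definitions are the ones on the KPI page, and `binary_scores` is a hypothetical helper, not a FireBench function.

```python
def binary_scores(observed, predicted):
    """Return standard confusion-matrix scores for two sequences of 0/1 labels."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)  # burnt, predicted burnt
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)  # unburnt, predicted unburnt
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)  # unburnt, predicted burnt
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)  # burnt, predicted unburnt

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "npv": ratio(tn, tn + fn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Made-up labels for six buildings (not taken from the dataset).
observed = [1, 1, 0, 0, 0, 1]
predicted = [1, 0, 0, 1, 0, 1]
scores = binary_scores(observed, predicted)
```

All six scores lie between 0 and 1, which matches the bounds \(a=0\), \(b=1\) used by the normalization.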

Burn severity from MTBS

Dataset

The data has been collected using Monitoring Trends in Burn Severity (MTBS). The original zip file contains burn severity, pre/post burn images, and the final fire perimeter. The source of the burn severity used in FireBench is the file ca3858612053820210815_20210805_20220723_dnbr6.tif. The source of the final fire perimeter is the kmz file ca3858612053820210815_20210805_20220723.kmz.

The burn severity categories, described with the corresponding index used in the dataset, are the following:

‘no data’: 0
‘unburnt to low’: 1
‘low’: 2
‘moderate’: 3
‘high’: 4
‘increased greenness’: 5

The hashes of the original source files are:

zip file: 171b9604c0654d8612eaabcfcad93d2374762661ab34b4d62718630a13469841
tif dnbr6: 33db74d3c5798c41ff3a4fc5ee57da9105fdc7a75d7f8af0d053d2f82cfdc0b6
final perimeter kmz: 4ed7a0ee585f8118b65a29375a3d5ee8a69e85a95ee155205ba5d781289c6e2b

Figure 3 shows the MTBS map from the original source.

Fig. 3 : Map of burn severity from MTBS. Source: MTBS (`ca3858612053820210815_map.pdf`)

Processing of dataset

Performed at obs dataset level

The burn severity array is extracted from the original file without any modification. The latitude and longitude arrays are reconstructed using projection parameters (see firebench.standardize.mtbs.standardize_mtbs_from_geotiff). The final perimeter has been processed using QGIS. The original data (kmz file) has been imported and cleaned. Extra perimeters have been removed to keep only the final fire perimeter. No modification to the polygons has been performed. Then, the multipolygons were exported to KML format and integrated into the dataset HDF5 file.

Binary classes for high severity

Performed at benchmark run level

To perform the high-severity benchmarks using a binary confusion matrix, we construct a binary field based on the high-severity index. All points with a burn severity of 4 (‘high’) are assigned the value 1. The other points are assigned a value of 0. This processing is done when the benchmark is performed.
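As an illustration, the binary field reduces to an elementwise comparison with the index 4. The sketch below uses plain nested lists to stay dependency-free; the name `high_severity_mask` and the sample grid are hypothetical, and the same comparison applies unchanged to an array.

```python
HIGH_SEVERITY = 4  # index of the 'high' category in the list above


def high_severity_mask(severity):
    """Return a 2D list with 1 where severity is 'high' and 0 elsewhere."""
    return [[1 if value == HIGH_SEVERITY else 0 for value in row] for row in severity]


# Made-up 3x3 grid of burn severity indices.
severity = [
    [0, 1, 2],
    [3, 4, 4],
    [5, 4, 1],
]
binary = high_severity_mask(severity)
print(binary)  # [[0, 0, 0], [0, 1, 1], [0, 1, 0]]
```

Note that ‘no data’ (0) and ‘increased greenness’ (5) both map to 0 under this rule, exactly like the lower severity classes.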

Benchmarks

See Key Performance Indicator (KPI) and normalization definitions here.

Binary High Severity Accuracy

Short IDs: SV01
KPI: Binary High Severity Accuracy
Normalization: Linear Bounded Normalization with \(a=0\), \(b=1\)
Name in Score Card: Binary High Severity Accuracy
This benchmark is performed on the binary classes for high severity points (Binary High severity processed variable)

Binary High Severity Precision

Short IDs: SV02
KPI: Binary High Severity Precision
Normalization: Linear Bounded Normalization with \(a=0\), \(b=1\)
Name in Score Card: Binary High Severity Precision
This benchmark is performed on the binary classes for high severity points (Binary High severity processed variable)

Binary High Severity Recall

Short IDs: SV03
KPI: Binary High Severity Recall
Normalization: Linear Bounded Normalization with \(a=0\), \(b=1\)
Name in Score Card: Binary High Severity Recall
This benchmark is performed on the binary classes for high severity points (Binary High severity processed variable)

Binary High Severity Specificity

Short IDs: SV04
KPI: Binary High Severity Specificity
Normalization: Linear Bounded Normalization with \(a=0\), \(b=1\)
Name in Score Card: Binary High Severity Specificity
This benchmark is performed on the binary classes for high severity points (Binary High severity processed variable)

Binary High Severity Negative Predictive Value

Short IDs: SV05
KPI: Binary High Severity Negative Predictive Value
Normalization: Linear Bounded Normalization with \(a=0\), \(b=1\)
Name in Score Card: Binary High Severity Negative Predictive Value
This benchmark is performed on the binary classes for high severity points (Binary High severity processed variable)

Binary High Severity F1 Score

Short IDs: SV06
KPI: Binary High Severity F1 Score
Normalization: Linear Bounded Normalization with \(a=0\), \(b=1\)
Name in Score Card: Binary High Severity F1 Score
This benchmark is performed on the binary classes for high severity points (Binary High severity processed variable)

Canopy cover loss

Dataset

The data has been collected using Rapid Assessment of Vegetation Condition after Wildfire (RAVG). The source of the canopy cover loss used in FireBench is the dataset over CONUS for 2021, ravg_2021_cc5.tif. The region around the Caldor Fire has been processed and standardized using the following bounding box:

south west: (38.4, -120.8)
north east: (39.0, -119.7)

The canopy cover loss categories, described with the corresponding index used in the dataset, are the following:

‘Unmappable’: 0
‘0%’: 1
‘>0-<25%’: 2
‘25-<50%’: 3
‘50-<75%’: 4
‘75-100%’: 5
‘Outside burn area’: 9

In addition, a bounding box has been used to remove the data from another fire (forced to 0):

south west: (38.6, -119.9)
north east: (38.805, -119.7)

Figure 4 shows the processed RAVG dataset available in FireBench.

Fig. 4 : Map of standardized canopy cover loss from RAVG for Caldor Fire.

Processing of dataset

Performed at obs dataset level

A bounding box has been used to remove the data from another fire (forced to 0):

south west: (38.6, -119.9)
north east: (38.805, -119.7)
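The removal step can be sketched as a point-in-box test on the (lat, lon) coordinates. This is a minimal illustration, assuming that the box edges are inclusive and that the data are available as flat sequences; `in_box` and `remove_other_fire` are hypothetical names.

```python
# Bounding box of the other fire, as (lat, lon) corners from the text.
SOUTH_WEST = (38.6, -119.9)
NORTH_EAST = (38.805, -119.7)


def in_box(lat, lon, south_west=SOUTH_WEST, north_east=NORTH_EAST):
    """True if the point lies inside the box (edges included, an assumption)."""
    return (
        south_west[0] <= lat <= north_east[0]
        and south_west[1] <= lon <= north_east[1]
    )


def remove_other_fire(lats, lons, values):
    """Return a copy of `values` with every point inside the box forced to 0."""
    return [0 if in_box(lat, lon) else v for lat, lon, v in zip(lats, lons, values)]


# Three made-up points: inside the box, west of it, and north of it.
lats = [38.70, 38.70, 38.90]
lons = [-119.80, -120.20, -119.80]
values = [5, 3, 4]
cleaned = remove_other_fire(lats, lons, values)
print(cleaned)  # [0, 3, 4]
```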

Masking using LANDFIRE dataset

Performed at benchmark run level

To perform an evaluation of high canopy cover loss, a mask is defined using three LANDFIRE datasets:

Canopy bulk density
Canopy height
Canopy bottom height

The variable masked high binary canopy cover loss used in various benchmarks is computed only where all LANDFIRE canopy variables (interpolated using the nearest method on the RAVG grid) are strictly greater than 0 (presence of canopy fuel) and is defined as a binary variable:

1 if RAVG canopy cover loss value is 5,
0 if RAVG canopy cover loss value is between 1 and 4,
nan otherwise.
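The rule above can be written as a small point-wise function. This is a sketch under stated assumptions: the LANDFIRE values are taken as already sampled on the RAVG grid, and `masked_high_binary_ccl` is a hypothetical name rather than the FireBench implementation.

```python
import math


def masked_high_binary_ccl(ravg, cbd, ch, cbh):
    """Masked high binary canopy cover loss for one grid point.

    `ravg` is the RAVG class index; `cbd`, `ch`, and `cbh` are the LANDFIRE
    canopy bulk density, canopy height, and canopy bottom height at that point.
    """
    # Mask: canopy fuel is present only if all three variables are > 0.
    if not (cbd > 0 and ch > 0 and cbh > 0):
        return math.nan
    if ravg == 5:
        return 1.0
    if 1 <= ravg <= 4:
        return 0.0
    return math.nan  # 0 (unmappable) or 9 (outside burn area)


print(masked_high_binary_ccl(5, 0.1, 12.0, 2.0))  # 1.0
print(masked_high_binary_ccl(3, 0.1, 12.0, 2.0))  # 0.0
print(masked_high_binary_ccl(5, 0.0, 12.0, 2.0))  # nan
```

Points set to nan are excluded from the confusion matrix, so the benchmarks only compare locations where canopy fuel was present and RAVG provides a mapped value.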

Figure 5 shows the processed masked high binary canopy cover loss dataset used for related benchmarks.

Fig. 5 : Map of masked high binary canopy cover loss derived from RAVG for Caldor Fire.

Benchmarks

See Key Performance Indicator (KPI) and normalization definitions here.

Masked High Binary Canopy Cover Loss Accuracy

Short IDs: CC01
KPI: Binary High Canopy Cover Loss Accuracy
Normalization: Linear Bounded Normalization with \(a=0\), \(b=1\)
Name in Score Card: Binary High Canopy Cover Loss Accuracy
This benchmark is performed on the binary classes masked high binary canopy cover loss.

Masked High Binary Canopy Cover Loss Precision

Short IDs: CC02
KPI: Binary High Canopy Cover Loss Precision
Normalization: Linear Bounded Normalization with \(a=0\), \(b=1\)
Name in Score Card: Binary High Canopy Cover Loss Precision
This benchmark is performed on the binary classes masked high binary canopy cover loss.

Masked High Binary Canopy Cover Loss Recall

Short IDs: CC03
KPI: Binary High Canopy Cover Loss Recall
Normalization: Linear Bounded Normalization with \(a=0\), \(b=1\)
Name in Score Card: Binary High Canopy Cover Loss Recall
This benchmark is performed on the binary classes masked high binary canopy cover loss.

Masked High Binary Canopy Cover Loss Specificity

Short IDs: CC04
KPI: Binary High Canopy Cover Loss Specificity
Normalization: Linear Bounded Normalization with \(a=0\), \(b=1\)
Name in Score Card: Binary High Canopy Cover Loss Specificity
This benchmark is performed on the binary classes masked high binary canopy cover loss.

Masked High Binary Canopy Cover Loss Negative Predictive Value

Short IDs: CC05
KPI: Binary High Canopy Cover Loss Negative Predictive Value
Normalization: Linear Bounded Normalization with \(a=0\), \(b=1\)
Name in Score Card: Binary High Canopy Cover Loss Negative Predictive Value
This benchmark is performed on the binary classes masked high binary canopy cover loss.

Masked High Binary Canopy Cover Loss F1 Score

Short IDs: CC06
KPI: Binary High Canopy Cover Loss F1 Score
Normalization: Linear Bounded Normalization with \(a=0\), \(b=1\)
Name in Score Card: Binary High Canopy Cover Loss F1 Score
This benchmark is performed on the binary classes masked high binary canopy cover loss.

Infrared fire perimeters

Dataset

The infrared fire perimeters have been gathered from the NIROPS dataset. Every original file has been manually processed to extract only the perimeter. The time stamp of the perimeter has been defined from the imaging report (e.g. Report for 2021/08/17) using the Imagery Date and Imagery Time. The burn area obtained using the KML file and Python tools has been verified against the Interpreted Acreage when specified in the reports. Each fire perimeter (see Fig. 6) is stored as a group within the HDF5 data file with attributes containing the path of the KML file that contains the fire perimeter dataset. The perimeters have been processed from August 17th (first IR perimeter available) to September 10th, when the burn area is 99% of the final burn area, as shown in Figure 7 (source: CALFIRE). The final dataset contains 21 perimeters.

The study periods (see Fig. 7) are defined in the following Table:

Name	Start time	End time	Duration	Burn area [acre]
W1	Aug 17 20h20 PDT	Sep 10 23h34 PDT	24d 3h 14min	166,256
W2	Aug 19 20h45 PDT	Aug 21 21h15 PDT	2d 0h 30min	24,941
W3	Aug 26 02h30 PDT	Aug 28 20h30 PDT	2d 18h 0min	19,992
W4	Aug 28 20h30 PDT	Sep 3 00h40 PDT	5d 4h 10min	56,272
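The durations in the table follow directly from the start and end times. The sketch below recomputes them with Python's `datetime` module; naive datetimes are used because all times share the same zone (PDT), and the year 2021 is taken from the case title.

```python
from datetime import datetime

# Start and end times from the table above, all in PDT (year 2021).
PERIODS = {
    "W1": (datetime(2021, 8, 17, 20, 20), datetime(2021, 9, 10, 23, 34)),
    "W2": (datetime(2021, 8, 19, 20, 45), datetime(2021, 8, 21, 21, 15)),
    "W3": (datetime(2021, 8, 26, 2, 30), datetime(2021, 8, 28, 20, 30)),
    "W4": (datetime(2021, 8, 28, 20, 30), datetime(2021, 9, 3, 0, 40)),
}


def format_duration(start, end):
    """Format end - start as 'Dd Hh Mmin', matching the table layout."""
    delta = end - start
    hours, remainder = divmod(delta.seconds, 3600)
    return f"{delta.days}d {hours}h {remainder // 60}min"


durations = {name: format_duration(*bounds) for name, bounds in PERIODS.items()}
print(durations["W1"])  # 24d 3h 14min
```

W3 ends exactly when W4 starts (Aug 28 20h30 PDT), so the two periods are contiguous, while W1 spans the whole perimeter record.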

Figure 6 shows the processed fire perimeter as a colored solid contour. The color of the contour indicates the timestamp of the perimeter.

Fig. 6 : Infrared fire perimeters from August 17th to September 10th.

Fig. 7 : Burn area derived from IR perimeters from August 17th to September 10th. The red dashed line shows the final burn area from CALFIRE. The orange dashed line shows the final burn area from the MTBS final perimeter.

Benchmarks

See Key Performance Indicator (KPI) and normalization definitions here.

Average Jaccard Index over study period

Short IDs: See Table
KPI: Average Jaccard Index
Normalization: Linear Bounded Normalization with \(a=0\), \(b=1\)
Name in Score Card: See Table
The first perimeter at the start of the period can serve as an initial condition for the fire perimeter. The first perimeter is not used to compute any metric. The area-preserving projection used is EPSG:5070.

The following Table gives the correspondence between the benchmark ID and the study period:

ID	Study period	Name in Score Card
FP01	W1	Average Jaccard Index W1
FP02	W2	Average Jaccard Index W2
FP03	W3	Average Jaccard Index W3
FP04	W4	Average Jaccard Index W4
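For intuition, the Jaccard index of two burned areas is the area of their intersection divided by the area of their union. The sketch below is a simplification and not the FireBench implementation: it assumes that each perimeter has been rasterized into a set of equal-area grid cells, standing in for polygon areas computed in EPSG:5070. The cell sets are made up.

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two sets of burned cells."""
    union = a | b
    return len(a & b) / len(union) if union else float("nan")


# Hypothetical burned cells (row, col) at two perimeter times after the start
# of a study period. The first perimeter of the period is not included.
observed = [{(0, 0), (0, 1)}, {(0, 0), (0, 1), (1, 0), (1, 1)}]
modeled = [{(0, 0)}, {(0, 0), (0, 1), (1, 0), (2, 0)}]

indices = [jaccard(o, m) for o, m in zip(observed, modeled)]
average_jaccard = sum(indices) / len(indices)
print(indices)  # [0.5, 0.6]
```

The same list of per-perimeter indices gives the minimum and maximum variants through `min(indices)` and `max(indices)`.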

Minimum Jaccard Index over study period

Short IDs: See Table
KPI: Minimum Jaccard Index
Normalization: Linear Bounded Normalization with \(a=0\), \(b=1\)
Name in Score Card: See Table
The first perimeter at the start of the period can serve as an initial condition for the fire perimeter. The first perimeter is not used to compute any metric. The area-preserving projection used is EPSG:5070.

The following Table gives the correspondence between the benchmark ID and the study period:

ID	Study period	Name in Score Card
FP05	W1	Minimum Jaccard Index W1
FP06	W2	Minimum Jaccard Index W2
FP07	W3	Minimum Jaccard Index W3
FP08	W4	Minimum Jaccard Index W4

Maximum Jaccard Index over study period

Short IDs: See Table
KPI: Maximum Jaccard Index
Normalization: Linear Bounded Normalization with \(a=0\), \(b=1\)
Name in Score Card: See Table
The first perimeter at the start of the period can serve as an initial condition for the fire perimeter. The first perimeter is not used to compute any metric. The area-preserving projection used is EPSG:5070.

The following Table gives the correspondence between the benchmark ID and the study period:

ID	Study period	Name in Score Card
FP09	W1	Maximum Jaccard Index W1
FP10	W2	Maximum Jaccard Index W2
FP11	W3	Maximum Jaccard Index W3
FP12	W4	Maximum Jaccard Index W4

Average Dice-Sorensen Index over study period

Short IDs: See Table
KPI: Average Dice-Sorensen Index
Normalization: Linear Bounded Normalization with \(a=0\), \(b=1\)
Name in Score Card: See Table
The first perimeter at the start of the period can serve as an initial condition for the fire perimeter. The first perimeter is not used to compute any metric. The area-preserving projection used is EPSG:5070.

The following Table gives the correspondence between the benchmark ID and the study period:

ID	Study period	Name in Score Card
FP13	W1	Average Dice-Sorensen Index W1
FP14	W2	Average Dice-Sorensen Index W2
FP15	W3	Average Dice-Sorensen Index W3
FP16	W4	Average Dice-Sorensen Index W4
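The Dice-Sorensen index is twice the intersection area divided by the sum of the two areas, and it is tied to the Jaccard index \(J\) by \(D = 2J/(1+J)\). The sketch below is a simplification, assuming perimeters rasterized into sets of equal-area cells rather than polygons in EPSG:5070; the cell sets are made up.

```python
def dice_sorensen(a, b):
    """Dice-Sorensen index 2|A ∩ B| / (|A| + |B|) of two sets of burned cells."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else float("nan")


# Hypothetical burned cells (row, col) for one perimeter time.
observed = {(0, 0), (0, 1), (1, 0), (1, 1)}
modeled = {(0, 0), (0, 1), (1, 0), (2, 0)}

dice = dice_sorensen(observed, modeled)
jaccard = len(observed & modeled) / len(observed | modeled)
print(dice)  # 0.75
```

Because \(D \geq J\) for any pair of areas, the Dice-Sorensen benchmarks give scores at least as high as the corresponding Jaccard benchmarks for the same perimeters.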

Minimum Dice-Sorensen Index over study period

Short IDs: See Table
KPI: Minimum Dice-Sorensen Index
Normalization: Linear Bounded Normalization with \(a=0\), \(b=1\)
Name in Score Card: See Table
The first perimeter at the start of the period can serve as an initial condition for the fire perimeter. The first perimeter is not used to compute any metric. The area-preserving projection used is EPSG:5070.

The following Table gives the correspondence between the benchmark ID and the study period:

ID	Study period	Name in Score Card
FP17	W1	Minimum Dice-Sorensen Index W1
FP18	W2	Minimum Dice-Sorensen Index W2
FP19	W3	Minimum Dice-Sorensen Index W3
FP20	W4	Minimum Dice-Sorensen Index W4

Maximum Dice-Sorensen Index over study period

Short IDs: See Table
KPI: Maximum Dice-Sorensen Index
Normalization: Linear Bounded Normalization with \(a=0\), \(b=1\)
Name in Score Card: See Table
The first perimeter at the start of the period can serve as an initial condition for the fire perimeter. The first perimeter is not used to compute any metric. The area-preserving projection used is EPSG:5070.

The following Table gives the correspondence between the benchmark ID and the study period:

ID	Study period	Name in Score Card
FP21	W1	Maximum Dice-Sorensen Index W1
FP22	W2	Maximum Dice-Sorensen Index W2
FP23	W3	Maximum Dice-Sorensen Index W3
FP24	W4	Maximum Dice-Sorensen Index W4

Final Burn Area Bias

Short IDs: See Table
KPI: Burn Area Bias
Normalization: Symmetric Exponential Open Normalization (\(m\) value in Table)
Name in Score Card: See Table
The first perimeter, at the start of the period, can be used as an initial condition for the fire perimeter. The bias is calculated on the last perimeter of the study period as the difference between the model and the observed burn area. A bias of \(m\) acres, representing \(B_{50}\)% of the burn area during the study period, leads to a score of 50.00. The value of \(m\) represents the benchmark difficulty (smaller \(m\) means greater difficulty) and must be chosen by the community.

The following Table gives the correspondence between the benchmark ID and the study period:

ID	Study period	Name in Score Card	\(m\)	\(B_{50}\)
FP25	W1	Burn Area Bias W1	80,000	48%
FP26	W2	Burn Area Bias W2	5,000	20%
FP27	W3	Burn Area Bias W3	5,000	25%
FP28	W4	Burn Area Bias W4	17,000	30%
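The \(B_{50}\) column is the ratio of \(m\) to the burn area of the study period, expressed in percent. The short sketch below recomputes it from the two tables; rounding to the nearest integer is an assumption about how the published values were obtained.

```python
# Burn area per study period [acre], from the study period table.
BURN_AREA = {"W1": 166_256, "W2": 24_941, "W3": 19_992, "W4": 56_272}
# Normalization parameter m per study period [acre], from the table above.
M_VALUE = {"W1": 80_000, "W2": 5_000, "W3": 5_000, "W4": 17_000}

# B50 in percent, rounded to the nearest integer.
b50 = {name: round(100 * M_VALUE[name] / BURN_AREA[name]) for name in BURN_AREA}
print(b50)  # {'W1': 48, 'W2': 20, 'W3': 25, 'W4': 30}
```

For example, a model that ends W2 with a burn area 5,000 acres above or below the observation is off by about 20% of the area burned during W2 and receives a score of 50.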

Burn Area RMSE

Short IDs: See Table
KPI: Burn Area RMSE
Normalization: Symmetric Exponential Open Normalization (\(m\) value in Table)
Name in Score Card: See Table
The first perimeter, at the start of the period, can be used as an initial condition for the fire perimeter. An RMSE of \(m\) acres, representing \(B_{50}\)% of the burn area during the study period, leads to a score of 50.00. The value of \(m\) represents the benchmark difficulty (smaller \(m\) means greater difficulty) and must be chosen by the community.

The following Table gives the correspondence between the benchmark ID and the study period:

ID	Study period	Name in Score Card	\(m\)	\(B_{50}\)
FP29	W1	Burn Area RMSE W1	80,000	48%
FP30	W2	Burn Area RMSE W2	5,000	20%
FP31	W3	Burn Area RMSE W3	5,000	25%
FP32	W4	Burn Area RMSE W4	17,000	30%

Weather stations

Dataset

Weather station datasets have been gathered from Synoptic. All the stations available in the following bounding box have been processed:

south west: (38.4, -120.8)
north east: (39.0, -119.7)

The following variables have been processed (following FireBench namespace):

air_temperature
relative_humidity
solar_radiation
fuel_moisture_content_10h
wind_direction
wind_gust
wind_speed

Note

If you want to process more variables or require new benchmarks for existing variables, please reach out to the FireBench team to integrate these changes into a future version of the benchmarks.

Some stations don’t have data for the period W1 and have been excluded from the dataset. The list of excluded stations for missing data in the study period is: 403_PG, 412_PG, 413_PG, F9934. Also, some stations did not meet the data quality criterion and have been excluded from the dataset. The list of excluded stations for data quality reasons is: AV833, BLCC1, C9148, COOPDAGN2, COOPMINN2, FOIC1, FPDC1, G0658, GEOC1, LNLC1, PFHC1, SBKC1, SLPC1, STAN2, UTRC1, WDFC1, XOHC1.

Sensor height data has been extracted following the sensor height priority rules defined here. The current state of knowledge about sensor heights for the weather stations of this case is:

10 stations with a complete dataset (sensor height found in the source file)
98 stations with missing metadata
21 stations skipped
81 datasets with sensor height metadata
0 datasets from trusted stations from the FireBench database
0 datasets from trusted history from the FireBench database
5 datasets from the FireBench provider default database
394 datasets using FireBench default metadata

Therefore, 81 datasets are considered trusted and are used in the “trusted source only” (TSO) benchmarks. All 399 datasets are used in the “all sources” benchmarks.

Note

If you have information about sensor height and want to help increase the number of trusted datasets, please get in touch with the FireBench Team.

Weather stations are stored in the HDF5 file using their STID.

Benchmarks

See Key Performance Indicator (KPI) and normalization definitions here.

Air temperature

Short IDs: See Table
KPI: Air temperature MAE/RMSE/Bias
Normalization: Symmetric Exponential Open Normalization (\(m\) value in Table)
Name in Score Card: See Table
Each metric (MAE, RMSE, Bias) is calculated for each station from the model and observational datasets over a specified period. Summary statistics (e.g., min, mean, Q3) are then applied across all available weather stations before the normalization. The metrics are implemented in firebench.metrics.stats.mae, firebench.metrics.stats.rmse, and firebench.metrics.stats.bias. Datasets are converted into degC for comparison. The normalization parameter \(m\) sets which KPI value gives a Score of 50. It represents the difficulty of the benchmark.
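The two-step procedure (a metric per station, then a summary statistic across stations) can be sketched as follows. The local functions `mae`, `rmse`, and `bias` are stand-ins written for this example and are not the firebench.metrics.stats functions, whose signatures are not reproduced here. The bias sign convention (model minus observation), the station IDs, and the temperature values are assumptions.

```python
import math


def mae(model, obs):
    """Mean absolute error between paired model and observed values."""
    return sum(abs(m - o) for m, o in zip(model, obs)) / len(obs)


def rmse(model, obs):
    """Root mean square error between paired model and observed values."""
    return math.sqrt(sum((m - o) ** 2 for m, o in zip(model, obs)) / len(obs))


def bias(model, obs):
    """Mean of model minus observation (sign convention assumed)."""
    return sum(m - o for m, o in zip(model, obs)) / len(obs)


# Made-up air temperature series [degC] for two hypothetical stations.
stations = {
    "STA1": {"model": [20.0, 22.0, 24.0], "obs": [19.0, 23.0, 24.0]},
    "STA2": {"model": [15.0, 15.0, 15.0], "obs": [12.0, 12.0, 12.0]},
}

# Step 1: one metric value per station.
per_station_mae = [mae(s["model"], s["obs"]) for s in stations.values()]

# Step 2: summary statistics across stations, computed before normalization.
summary = {
    "min": min(per_station_mae),
    "mean": sum(per_station_mae) / len(per_station_mae),
    "max": max(per_station_mae),
}
```

Each summary value is then passed to the normalization with the corresponding \(m\), giving one benchmark per combination of metric, summary statistic, and study period.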

The following Table gives the correspondence between the benchmark ID and the study period:

ID	Study period	Metric	Name in Score Card	\(m\)	trusted source only
WX001	W1	MAE	Air temp MAE min W1 TSO	5.0 degC	False
WX002	W1	MAE	Air temp MAE mean W1 TSO	5.0 degC	False
WX003	W1	MAE	Air temp MAE max W1 TSO	5.0 degC	False
WX004	W1	MAE	Air temp MAE min W1	5.0 degC	True
WX005	W1	MAE	Air temp MAE mean W1	5.0 degC	True
WX006	W1	MAE	Air temp MAE max W1	5.0 degC	True
WX007	W1	RMSE	Air temp RMSE min W1 TSO	5.0 degC	False
WX008	W1	RMSE	Air temp RMSE mean W1 TSO	5.0 degC	False
WX009	W1	RMSE	Air temp RMSE max W1 TSO	5.0 degC	False
WX010	W1	RMSE	Air temp RMSE min W1	5.0 degC	True
WX011	W1	RMSE	Air temp RMSE mean W1	5.0 degC	True
WX012	W1	RMSE	Air temp RMSE max W1	5.0 degC	True
WX013	W1	Bias	Air temp Bias min W1 TSO	5.0 degC	False
WX014	W1	Bias	Air temp Bias mean W1 TSO	5.0 degC	False
WX015	W1	Bias	Air temp Bias max W1 TSO	5.0 degC	False
WX016	W1	Bias	Air temp Bias min W1	5.0 degC	True
WX017	W1	Bias	Air temp Bias mean W1	5.0 degC	True
WX018	W1	Bias	Air temp Bias max W1	5.0 degC	True
WX019	W2	MAE	Air temp MAE min W2 TSO	5.0 degC	False
WX020	W2	MAE	Air temp MAE mean W2 TSO	5.0 degC	False
WX021	W2	MAE	Air temp MAE max W2 TSO	5.0 degC	False
WX022	W2	MAE	Air temp MAE min W2	5.0 degC	True
WX023	W2	MAE	Air temp MAE mean W2	5.0 degC	True
WX024	W2	MAE	Air temp MAE max W2	5.0 degC	True
WX025	W2	RMSE	Air temp RMSE min W2 TSO	5.0 degC	False
WX026	W2	RMSE	Air temp RMSE mean W2 TSO	5.0 degC	False
WX027	W2	RMSE	Air temp RMSE max W2 TSO	5.0 degC	False
WX028	W2	RMSE	Air temp RMSE min W2	5.0 degC	True
WX029	W2	RMSE	Air temp RMSE mean W2	5.0 degC	True
WX030	W2	RMSE	Air temp RMSE max W2	5.0 degC	True
WX031	W2	Bias	Air temp Bias min W2 TSO	5.0 degC	False
WX032	W2	Bias	Air temp Bias mean W2 TSO	5.0 degC	False
WX033	W2	Bias	Air temp Bias max W2 TSO	5.0 degC	False
WX034	W2	Bias	Air temp Bias min W2	5.0 degC	True
WX035	W2	Bias	Air temp Bias mean W2	5.0 degC	True
WX036	W2	Bias	Air temp Bias max W2	5.0 degC	True
WX037	W3	MAE	Air temp MAE min W3 TSO	5.0 degC	False
WX038	W3	MAE	Air temp MAE mean W3 TSO	5.0 degC	False
WX039	W3	MAE	Air temp MAE max W3 TSO	5.0 degC	False
WX040	W3	MAE	Air temp MAE min W3	5.0 degC	True
WX041	W3	MAE	Air temp MAE mean W3	5.0 degC	True
WX042	W3	MAE	Air temp MAE max W3	5.0 degC	True
WX043	W3	RMSE	Air temp RMSE min W3 TSO	5.0 degC	False
WX044	W3	RMSE	Air temp RMSE mean W3 TSO	5.0 degC	False
WX045	W3	RMSE	Air temp RMSE max W3 TSO	5.0 degC	False
WX046	W3	RMSE	Air temp RMSE min W3	5.0 degC	True
WX047	W3	RMSE	Air temp RMSE mean W3	5.0 degC	True
WX048	W3	RMSE	Air temp RMSE max W3	5.0 degC	True
WX049	W3	Bias	Air temp Bias min W3 TSO	5.0 degC	False
WX050	W3	Bias	Air temp Bias mean W3 TSO	5.0 degC	False
WX051	W3	Bias	Air temp Bias max W3 TSO	5.0 degC	False
WX052	W3	Bias	Air temp Bias min W3	5.0 degC	True
WX053	W3	Bias	Air temp Bias mean W3	5.0 degC	True
WX054	W3	Bias	Air temp Bias max W3	5.0 degC	True
WX055	W4	MAE	Air temp MAE min W4 TSO	5.0 degC	False
WX056	W4	MAE	Air temp MAE mean W4 TSO	5.0 degC	False
WX057	W4	MAE	Air temp MAE max W4 TSO	5.0 degC	False
WX058	W4	MAE	Air temp MAE min W4	5.0 degC	True
WX059	W4	MAE	Air temp MAE mean W4	5.0 degC	True
WX060	W4	MAE	Air temp MAE max W4	5.0 degC	True
WX061	W4	RMSE	Air temp RMSE min W4 TSO	5.0 degC	False
WX062	W4	RMSE	Air temp RMSE mean W4 TSO	5.0 degC	False
WX063	W4	RMSE	Air temp RMSE max W4 TSO	5.0 degC	False
WX064	W4	RMSE	Air temp RMSE min W4	5.0 degC	True
WX065	W4	RMSE	Air temp RMSE mean W4	5.0 degC	True
WX066	W4	RMSE	Air temp RMSE max W4	5.0 degC	True
WX067	W4	Bias	Air temp Bias min W4 TSO	5.0 degC	False
WX068	W4	Bias	Air temp Bias mean W4 TSO	5.0 degC	False
WX069	W4	Bias	Air temp Bias max W4 TSO	5.0 degC	False
WX070	W4	Bias	Air temp Bias min W4	5.0 degC	True
WX071	W4	Bias	Air temp Bias mean W4	5.0 degC	True
WX072	W4	Bias	Air temp Bias max W4	5.0 degC	True

Relative Humidity

Short IDs: See Table
KPI: Relative humidity MAE/RMSE/Bias
Normalization: Symmetric Exponential Open Normalization (\(m\) value in Table)
Name in Score Card: See Table
Each metric (MAE, RMSE, Bias) is calculated for each station from the model and observational datasets over a specified period. Summary statistics (e.g., min, mean, Q3) are then applied across all available weather stations before the normalization. The metrics are implemented in firebench.metrics.stats.mae, firebench.metrics.stats.rmse, and firebench.metrics.stats.bias. Datasets are converted into percent for comparison. The normalization parameter \(m\) sets which KPI value gives a Score of 50. It represents the difficulty of the benchmark.

The following Table gives the correspondence between the benchmark ID and the study period:

ID	Study period	Metric	Name in Score Card	\(m\)	trusted source only
WX073	W1	MAE	RH MAE min W1 TSO	15.0 percent	False
WX074	W1	MAE	RH MAE mean W1 TSO	15.0 percent	False
WX075	W1	MAE	RH MAE max W1 TSO	15.0 percent	False
WX076	W1	MAE	RH MAE min W1	15.0 percent	True
WX077	W1	MAE	RH MAE mean W1	15.0 percent	True
WX078	W1	MAE	RH MAE max W1	15.0 percent	True
WX079	W1	RMSE	RH RMSE min W1 TSO	15.0 percent	False
WX080	W1	RMSE	RH RMSE mean W1 TSO	15.0 percent	False
WX081	W1	RMSE	RH RMSE max W1 TSO	15.0 percent	False
WX082	W1	RMSE	RH RMSE min W1	15.0 percent	True
WX083	W1	RMSE	RH RMSE mean W1	15.0 percent	True
WX084	W1	RMSE	RH RMSE max W1	15.0 percent	True
WX085	W1	Bias	RH Bias min W1 TSO	15.0 percent	False
WX086	W1	Bias	RH Bias mean W1 TSO	15.0 percent	False
WX087	W1	Bias	RH Bias max W1 TSO	15.0 percent	False
WX088	W1	Bias	RH Bias min W1	15.0 percent	True
WX089	W1	Bias	RH Bias mean W1	15.0 percent	True
WX090	W1	Bias	RH Bias max W1	15.0 percent	True
WX091	W2	MAE	RH MAE min W2 TSO	15.0 percent	False
WX092	W2	MAE	RH MAE mean W2 TSO	15.0 percent	False
WX093	W2	MAE	RH MAE max W2 TSO	15.0 percent	False
WX094	W2	MAE	RH MAE min W2	15.0 percent	True
WX095	W2	MAE	RH MAE mean W2	15.0 percent	True
WX096	W2	MAE	RH MAE max W2	15.0 percent	True
WX097	W2	RMSE	RH RMSE min W2 TSO	15.0 percent	False
WX098	W2	RMSE	RH RMSE mean W2 TSO	15.0 percent	False
WX099	W2	RMSE	RH RMSE max W2 TSO	15.0 percent	False
WX100	W2	RMSE	RH RMSE min W2	15.0 percent	True
WX101	W2	RMSE	RH RMSE mean W2	15.0 percent	True
WX102	W2	RMSE	RH RMSE max W2	15.0 percent	True
WX103	W2	Bias	RH Bias min W2 TSO	15.0 percent	False
WX104	W2	Bias	RH Bias mean W2 TSO	15.0 percent	False
WX105	W2	Bias	RH Bias max W2 TSO	15.0 percent	False
WX106	W2	Bias	RH Bias min W2	15.0 percent	True
WX107	W2	Bias	RH Bias mean W2	15.0 percent	True
WX108	W2	Bias	RH Bias max W2	15.0 percent	True
WX109	W3	MAE	RH MAE min W3 TSO	15.0 percent	False
WX110	W3	MAE	RH MAE mean W3 TSO	15.0 percent	False
WX111	W3	MAE	RH MAE max W3 TSO	15.0 percent	False
WX112	W3	MAE	RH MAE min W3	15.0 percent	True
WX113	W3	MAE	RH MAE mean W3	15.0 percent	True
WX114	W3	MAE	RH MAE max W3	15.0 percent	True
WX115	W3	RMSE	RH RMSE min W3 TSO	15.0 percent	False
WX116	W3	RMSE	RH RMSE mean W3 TSO	15.0 percent	False
WX117	W3	RMSE	RH RMSE max W3 TSO	15.0 percent	False
WX118	W3	RMSE	RH RMSE min W3	15.0 percent	True
WX119	W3	RMSE	RH RMSE mean W3	15.0 percent	True
WX120	W3	RMSE	RH RMSE max W3	15.0 percent	True
WX121	W3	Bias	RH Bias min W3 TSO	15.0 percent	False
WX122	W3	Bias	RH Bias mean W3 TSO	15.0 percent	False
WX123	W3	Bias	RH Bias max W3 TSO	15.0 percent	False
WX124	W3	Bias	RH Bias min W3	15.0 percent	True
WX125	W3	Bias	RH Bias mean W3	15.0 percent	True
WX126	W3	Bias	RH Bias max W3	15.0 percent	True
WX127	W4	MAE	RH MAE min W4 TSO	15.0 percent	False
WX128	W4	MAE	RH MAE mean W4 TSO	15.0 percent	False
WX129	W4	MAE	RH MAE max W4 TSO	15.0 percent	False
WX130	W4	MAE	RH MAE min W4	15.0 percent	True
WX131	W4	MAE	RH MAE mean W4	15.0 percent	True
WX132	W4	MAE	RH MAE max W4	15.0 percent	True
WX133	W4	RMSE	RH RMSE min W4 TSO	15.0 percent	False
WX134	W4	RMSE	RH RMSE mean W4 TSO	15.0 percent	False
WX135	W4	RMSE	RH RMSE max W4 TSO	15.0 percent	False
WX136	W4	RMSE	RH RMSE min W4	15.0 percent	True
WX137	W4	RMSE	RH RMSE mean W4	15.0 percent	True
WX138	W4	RMSE	RH RMSE max W4	15.0 percent	True
WX139	W4	Bias	RH Bias min W4 TSO	15.0 percent	False
WX140	W4	Bias	RH Bias mean W4 TSO	15.0 percent	False
WX141	W4	Bias	RH Bias max W4 TSO	15.0 percent	False
WX142	W4	Bias	RH Bias min W4	15.0 percent	True
WX143	W4	Bias	RH Bias mean W4	15.0 percent	True
WX144	W4	Bias	RH Bias max W4	15.0 percent	True

Wind Speed

Short IDs: See Table
KPI: Wind Speed MAE/RMSE/Bias
Normalization: Symmetric Exponential Open Normalization (\(m\) value in Table)
Name in Score Card: See Table
Each metric (MAE, RMSE, Bias) is calculated for each station from the model and observational datasets over a specified period. Summary statistics (e.g., min, mean, Q3) are then applied across all available weather stations before the normalization. The metrics are implemented in firebench.metrics.stats.mae, firebench.metrics.stats.rmse, and firebench.metrics.stats.bias. Datasets are converted into m/s for comparison. The normalization parameter \(m\) sets which KPI value gives a Score of 50. It represents the difficulty of the benchmark.

The following Table gives the correspondence between the benchmark ID and the study period:

ID	Study period	Summary stats func	Name in Score Card	\(m\)	trusted source only
WX145	W1	MAE	Wind Speed MAE min W1 TSO	5.0 m/s	False
WX146	W1	MAE	Wind Speed MAE mean W1 TSO	5.0 m/s	False
WX147	W1	MAE	Wind Speed MAE max W1 TSO	5.0 m/s	False
WX148	W1	MAE	Wind Speed MAE min W1	5.0 m/s	True
WX149	W1	MAE	Wind Speed MAE mean W1	5.0 m/s	True
WX150	W1	MAE	Wind Speed MAE max W1	5.0 m/s	True
WX151	W1	RMSE	Wind Speed RMSE min W1 TSO	5.0 m/s	False
WX152	W1	RMSE	Wind Speed RMSE mean W1 TSO	5.0 m/s	False
WX153	W1	RMSE	Wind Speed RMSE max W1 TSO	5.0 m/s	False
WX154	W1	RMSE	Wind Speed RMSE min W1	5.0 m/s	True
WX155	W1	RMSE	Wind Speed RMSE mean W1	5.0 m/s	True
WX156	W1	RMSE	Wind Speed RMSE max W1	5.0 m/s	True
WX157	W1	Bias	Wind Speed Bias min W1 TSO	5.0 m/s	False
WX158	W1	Bias	Wind Speed Bias mean W1 TSO	5.0 m/s	False
WX159	W1	Bias	Wind Speed Bias max W1 TSO	5.0 m/s	False
WX160	W1	Bias	Wind Speed Bias min W1	5.0 m/s	True
WX161	W1	Bias	Wind Speed Bias mean W1	5.0 m/s	True
WX162	W1	Bias	Wind Speed Bias max W1	5.0 m/s	True
WX163	W2	MAE	Wind Speed MAE min W2 TSO	5.0 m/s	False
WX164	W2	MAE	Wind Speed MAE mean W2 TSO	5.0 m/s	False
WX165	W2	MAE	Wind Speed MAE max W2 TSO	5.0 m/s	False
WX166	W2	MAE	Wind Speed MAE min W2	5.0 m/s	True
WX167	W2	MAE	Wind Speed MAE mean W2	5.0 m/s	True
WX168	W2	MAE	Wind Speed MAE max W2	5.0 m/s	True
WX169	W2	RMSE	Wind Speed RMSE min W2 TSO	5.0 m/s	False
WX170	W2	RMSE	Wind Speed RMSE mean W2 TSO	5.0 m/s	False
WX171	W2	RMSE	Wind Speed RMSE max W2 TSO	5.0 m/s	False
WX172	W2	RMSE	Wind Speed RMSE min W2	5.0 m/s	True
WX173	W2	RMSE	Wind Speed RMSE mean W2	5.0 m/s	True
WX174	W2	RMSE	Wind Speed RMSE max W2	5.0 m/s	True
WX175	W2	Bias	Wind Speed Bias min W2 TSO	5.0 m/s	False
WX176	W2	Bias	Wind Speed Bias mean W2 TSO	5.0 m/s	False
WX177	W2	Bias	Wind Speed Bias max W2 TSO	5.0 m/s	False
WX178	W2	Bias	Wind Speed Bias min W2	5.0 m/s	True
WX179	W2	Bias	Wind Speed Bias mean W2	5.0 m/s	True
WX180	W2	Bias	Wind Speed Bias max W2	5.0 m/s	True
WX181	W3	MAE	Wind Speed MAE min W3 TSO	5.0 m/s	False
WX182	W3	MAE	Wind Speed MAE mean W3 TSO	5.0 m/s	False
WX183	W3	MAE	Wind Speed MAE max W3 TSO	5.0 m/s	False
WX184	W3	MAE	Wind Speed MAE min W3	5.0 m/s	True
WX185	W3	MAE	Wind Speed MAE mean W3	5.0 m/s	True
WX186	W3	MAE	Wind Speed MAE max W3	5.0 m/s	True
WX187	W3	RMSE	Wind Speed RMSE min W3 TSO	5.0 m/s	False
WX188	W3	RMSE	Wind Speed RMSE mean W3 TSO	5.0 m/s	False
WX189	W3	RMSE	Wind Speed RMSE max W3 TSO	5.0 m/s	False
WX190	W3	RMSE	Wind Speed RMSE min W3	5.0 m/s	True
WX191	W3	RMSE	Wind Speed RMSE mean W3	5.0 m/s	True
WX192	W3	RMSE	Wind Speed RMSE max W3	5.0 m/s	True
WX193	W3	Bias	Wind Speed Bias min W3 TSO	5.0 m/s	False
WX194	W3	Bias	Wind Speed Bias mean W3 TSO	5.0 m/s	False
WX195	W3	Bias	Wind Speed Bias max W3 TSO	5.0 m/s	False
WX196	W3	Bias	Wind Speed Bias min W3	5.0 m/s	True
WX197	W3	Bias	Wind Speed Bias mean W3	5.0 m/s	True
WX198	W3	Bias	Wind Speed Bias max W3	5.0 m/s	True
WX199	W4	MAE	Wind Speed MAE min W4 TSO	5.0 m/s	False
WX200	W4	MAE	Wind Speed MAE mean W4 TSO	5.0 m/s	False
WX201	W4	MAE	Wind Speed MAE max W4 TSO	5.0 m/s	False
WX202	W4	MAE	Wind Speed MAE min W4	5.0 m/s	True
WX203	W4	MAE	Wind Speed MAE mean W4	5.0 m/s	True
WX204	W4	MAE	Wind Speed MAE max W4	5.0 m/s	True
WX205	W4	RMSE	Wind Speed RMSE min W4 TSO	5.0 m/s	False
WX206	W4	RMSE	Wind Speed RMSE mean W4 TSO	5.0 m/s	False
WX207	W4	RMSE	Wind Speed RMSE max W4 TSO	5.0 m/s	False
WX208	W4	RMSE	Wind Speed RMSE min W4	5.0 m/s	True
WX209	W4	RMSE	Wind Speed RMSE mean W4	5.0 m/s	True
WX210	W4	RMSE	Wind Speed RMSE max W4	5.0 m/s	True
WX211	W4	Bias	Wind Speed Bias min W4 TSO	5.0 m/s	False
WX212	W4	Bias	Wind Speed Bias mean W4 TSO	5.0 m/s	False
WX213	W4	Bias	Wind Speed Bias max W4 TSO	5.0 m/s	False
WX214	W4	Bias	Wind Speed Bias min W4	5.0 m/s	True
WX215	W4	Bias	Wind Speed Bias mean W4	5.0 m/s	True
WX216	W4	Bias	Wind Speed Bias max W4	5.0 m/s	True

Wind Direction

Short IDs: See Table
KPI: Wind Direction circular Bias
Normalization: Symmetric Exponential Open Normalization (\(m\) value in Table)
Name in Score Card: See Table
The metric is computed per station by comparing the model and observational datasets over a specified period. Summary statistics (e.g., min, mean, Q3) are then applied across all available weather stations before the normalization. The metric is implemented in firebench.metrics.stats.circular_bias_deg. Datasets are converted to degrees for comparison. The normalization parameter \(m\) is the KPI value that yields a Score of 50; it sets the difficulty of the benchmark.
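Wind direction needs a circular treatment because 355 degrees and 5 degrees are only 10 degrees apart. The sketch below shows one common way to compute a circular bias: wrap each difference into [-180, 180) and take the circular mean of the wrapped differences. It is an illustration only; the exact definition and sign convention used by firebench.metrics.stats.circular_bias_deg are not given here, so the function name and the model-minus-observation sign are assumptions.

```python
import math


def wrap_deg(delta):
    """Wrap an angular difference in degrees into the interval [-180, 180)."""
    return (delta + 180.0) % 360.0 - 180.0


def circular_bias_sketch(model_deg, obs_deg):
    """Circular mean of the wrapped (model - obs) differences, in degrees."""
    diffs = [math.radians(wrap_deg(m - o)) for m, o in zip(model_deg, obs_deg)]
    mean_sin = sum(math.sin(d) for d in diffs) / len(diffs)
    mean_cos = sum(math.cos(d) for d in diffs) / len(diffs)
    return math.degrees(math.atan2(mean_sin, mean_cos))


# Hypothetical directions: the model is 10 degrees clockwise of the observation,
# but the pair straddles north, so a plain subtraction would give -350.
across_north = circular_bias_sketch([5.0], [355.0])

# Hypothetical directions with errors of -20 and +20 degrees that cancel out.
cancelling = circular_bias_sketch([350.0, 10.0], [10.0, 350.0])
print(across_north, cancelling)
```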

The following Table gives the correspondence between the benchmark ID and the study period:

ID	Study period	Summary stats func	Name in Score Card	\(m\)	trusted source only
WX217	W1	circular bias	Wind Direction circular bias min W1 TSO	45.0 degree	False
WX218	W1	circular bias	Wind Direction circular bias mean W1 TSO	45.0 degree	False
WX219	W1	circular bias	Wind Direction circular bias max W1 TSO	45.0 degree	False
WX220	W1	circular bias	Wind Direction circular bias min W1	45.0 degree	True
WX221	W1	circular bias	Wind Direction circular bias mean W1	45.0 degree	True
WX222	W1	circular bias	Wind Direction circular bias max W1	45.0 degree	True
WX223	W2	circular bias	Wind Direction circular bias min W2 TSO	45.0 degree	False
WX224	W2	circular bias	Wind Direction circular bias mean W2 TSO	45.0 degree	False
WX225	W2	circular bias	Wind Direction circular bias max W2 TSO	45.0 degree	False
WX226	W2	circular bias	Wind Direction circular bias min W2	45.0 degree	True
WX227	W2	circular bias	Wind Direction circular bias mean W2	45.0 degree	True
WX228	W2	circular bias	Wind Direction circular bias max W2	45.0 degree	True
WX229	W3	circular bias	Wind Direction circular bias min W3 TSO	45.0 degree	False
WX230	W3	circular bias	Wind Direction circular bias mean W3 TSO	45.0 degree	False
WX231	W3	circular bias	Wind Direction circular bias max W3 TSO	45.0 degree	False
WX232	W3	circular bias	Wind Direction circular bias min W3	45.0 degree	True
WX233	W3	circular bias	Wind Direction circular bias mean W3	45.0 degree	True
WX234	W3	circular bias	Wind Direction circular bias max W3	45.0 degree	True
WX235	W4	circular bias	Wind Direction circular bias min W4 TSO	45.0 degree	False
WX236	W4	circular bias	Wind Direction circular bias mean W4 TSO	45.0 degree	False
WX237	W4	circular bias	Wind Direction circular bias max W4 TSO	45.0 degree	False
WX238	W4	circular bias	Wind Direction circular bias min W4	45.0 degree	True
WX239	W4	circular bias	Wind Direction circular bias mean W4	45.0 degree	True
WX240	W4	circular bias	Wind Direction circular bias max W4	45.0 degree	True

Fuel Moisture Content 10h

Short IDs: See Table
KPI: FMC 10h MAE/RMSE/Bias
Normalization: Symmetric Exponential Open Normalization (\(m\) value in Table)
Name in Score Card: See Table
Each metric (MAE, RMSE, Bias) is computed per station by comparing the model and observational datasets over a specified period. Summary statistics (e.g., min, mean, Q3) are then applied across all available weather stations before the normalization. The metrics are implemented in firebench.metrics.stats.mae, firebench.metrics.stats.rmse, and firebench.metrics.stats.bias. Datasets are converted to percent for comparison. The normalization parameter \(m\) is the KPI value that yields a Score of 50; it sets the difficulty of the benchmark.
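To make the role of \(m\) concrete, the sketch below uses one possible functional form with the stated properties: it is symmetric in the sign of the KPI, decays exponentially, and returns a Score of 50 when the KPI magnitude equals \(m\). The formula and the function name are assumptions chosen for illustration; the actual FireBench definition of the Symmetric Exponential Open Normalization may differ.

```python
def symmetric_exponential_score(kpi, m):
    """Map a KPI value to a score in (0, 100]; assumed form, not FireBench's code."""
    if m <= 0:
        raise ValueError("m must be positive")
    # Halve the score every time |kpi| grows by m, so |kpi| == m gives 50.
    return 100.0 * 2.0 ** (-abs(kpi) / m)


# FMC 10h benchmarks in the Table use m = 5.0 percent.
m_fmc = 5.0
scores = {kpi: symmetric_exponential_score(kpi, m_fmc) for kpi in (0.0, 5.0, -5.0, 10.0)}
print(scores)
```

Under this assumed form, a smaller \(m\) makes the benchmark harder: the same error yields a lower Score.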

The following Table gives the correspondence between the benchmark ID and the study period:

ID	Study period	Summary stats func	Name in Score Card	\(m\)	trusted source only
WX241	W1	MAE	FMC 10h MAE min W1 TSO	5.0 percent	False
WX242	W1	MAE	FMC 10h MAE mean W1 TSO	5.0 percent	False
WX243	W1	MAE	FMC 10h MAE max W1 TSO	5.0 percent	False
WX244	W1	MAE	FMC 10h MAE min W1	5.0 percent	True
WX245	W1	MAE	FMC 10h MAE mean W1	5.0 percent	True
WX246	W1	MAE	FMC 10h MAE max W1	5.0 percent	True
WX247	W1	RMSE	FMC 10h RMSE min W1 TSO	5.0 percent	False
WX248	W1	RMSE	FMC 10h RMSE mean W1 TSO	5.0 percent	False
WX249	W1	RMSE	FMC 10h RMSE max W1 TSO	5.0 percent	False
WX250	W1	RMSE	FMC 10h RMSE min W1	5.0 percent	True
WX251	W1	RMSE	FMC 10h RMSE mean W1	5.0 percent	True
WX252	W1	RMSE	FMC 10h RMSE max W1	5.0 percent	True
WX253	W1	Bias	FMC 10h Bias min W1 TSO	5.0 percent	False
WX254	W1	Bias	FMC 10h Bias mean W1 TSO	5.0 percent	False
WX255	W1	Bias	FMC 10h Bias max W1 TSO	5.0 percent	False
WX256	W1	Bias	FMC 10h Bias min W1	5.0 percent	True
WX257	W1	Bias	FMC 10h Bias mean W1	5.0 percent	True
WX258	W1	Bias	FMC 10h Bias max W1	5.0 percent	True
WX259	W2	MAE	FMC 10h MAE min W2 TSO	5.0 percent	False
WX260	W2	MAE	FMC 10h MAE mean W2 TSO	5.0 percent	False
WX261	W2	MAE	FMC 10h MAE max W2 TSO	5.0 percent	False
WX262	W2	MAE	FMC 10h MAE min W2	5.0 percent	True
WX263	W2	MAE	FMC 10h MAE mean W2	5.0 percent	True
WX264	W2	MAE	FMC 10h MAE max W2	5.0 percent	True
WX265	W2	RMSE	FMC 10h RMSE min W2 TSO	5.0 percent	False
WX266	W2	RMSE	FMC 10h RMSE mean W2 TSO	5.0 percent	False
WX267	W2	RMSE	FMC 10h RMSE max W2 TSO	5.0 percent	False
WX268	W2	RMSE	FMC 10h RMSE min W2	5.0 percent	True
WX269	W2	RMSE	FMC 10h RMSE mean W2	5.0 percent	True
WX270	W2	RMSE	FMC 10h RMSE max W2	5.0 percent	True
WX271	W2	Bias	FMC 10h Bias min W2 TSO	5.0 percent	False
WX272	W2	Bias	FMC 10h Bias mean W2 TSO	5.0 percent	False
WX273	W2	Bias	FMC 10h Bias max W2 TSO	5.0 percent	False
WX274	W2	Bias	FMC 10h Bias min W2	5.0 percent	True
WX275	W2	Bias	FMC 10h Bias mean W2	5.0 percent	True
WX276	W2	Bias	FMC 10h Bias max W2	5.0 percent	True
WX277	W3	MAE	FMC 10h MAE min W3 TSO	5.0 percent	False
WX278	W3	MAE	FMC 10h MAE mean W3 TSO	5.0 percent	False
WX279	W3	MAE	FMC 10h MAE max W3 TSO	5.0 percent	False
WX280	W3	MAE	FMC 10h MAE min W3	5.0 percent	True
WX281	W3	MAE	FMC 10h MAE mean W3	5.0 percent	True
WX282	W3	MAE	FMC 10h MAE max W3	5.0 percent	True
WX283	W3	RMSE	FMC 10h RMSE min W3 TSO	5.0 percent	False
WX284	W3	RMSE	FMC 10h RMSE mean W3 TSO	5.0 percent	False
WX285	W3	RMSE	FMC 10h RMSE max W3 TSO	5.0 percent	False
WX286	W3	RMSE	FMC 10h RMSE min W3	5.0 percent	True
WX287	W3	RMSE	FMC 10h RMSE mean W3	5.0 percent	True
WX288	W3	RMSE	FMC 10h RMSE max W3	5.0 percent	True
WX289	W3	Bias	FMC 10h Bias min W3 TSO	5.0 percent	False
WX290	W3	Bias	FMC 10h Bias mean W3 TSO	5.0 percent	False
WX291	W3	Bias	FMC 10h Bias max W3 TSO	5.0 percent	False
WX292	W3	Bias	FMC 10h Bias min W3	5.0 percent	True
WX293	W3	Bias	FMC 10h Bias mean W3	5.0 percent	True
WX294	W3	Bias	FMC 10h Bias max W3	5.0 percent	True
WX295	W4	MAE	FMC 10h MAE min W4 TSO	5.0 percent	False
WX296	W4	MAE	FMC 10h MAE mean W4 TSO	5.0 percent	False
WX297	W4	MAE	FMC 10h MAE max W4 TSO	5.0 percent	False
WX298	W4	MAE	FMC 10h MAE min W4	5.0 percent	True
WX299	W4	MAE	FMC 10h MAE mean W4	5.0 percent	True
WX300	W4	MAE	FMC 10h MAE max W4	5.0 percent	True
WX301	W4	RMSE	FMC 10h RMSE min W4 TSO	5.0 percent	False
WX302	W4	RMSE	FMC 10h RMSE mean W4 TSO	5.0 percent	False
WX303	W4	RMSE	FMC 10h RMSE max W4 TSO	5.0 percent	False
WX304	W4	RMSE	FMC 10h RMSE min W4	5.0 percent	True
WX305	W4	RMSE	FMC 10h RMSE mean W4	5.0 percent	True
WX306	W4	RMSE	FMC 10h RMSE max W4	5.0 percent	True
WX307	W4	Bias	FMC 10h Bias min W4 TSO	5.0 percent	False
WX308	W4	Bias	FMC 10h Bias mean W4 TSO	5.0 percent	False
WX309	W4	Bias	FMC 10h Bias max W4 TSO	5.0 percent	False
WX310	W4	Bias	FMC 10h Bias min W4	5.0 percent	True
WX311	W4	Bias	FMC 10h Bias mean W4	5.0 percent	True
WX312	W4	Bias	FMC 10h Bias max W4	5.0 percent	True

Requirements

The following sections list the datasets’ requirements to run the different benchmarks. When the benchmark script runs, each requirement is validated against the HDF5 file provided as input (from the model output/data the user wants to evaluate). If a requirement is met, each corresponding benchmark is run. Each requirement lists the required datasets/groups (as paths) and the mandatory attributes for each dataset/group. The current version of FireBench does not support more complex checks (e.g., array size and dtype).
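The kind of check described above (mandatory paths, then mandatory attributes on each path) can be sketched as follows. This is an illustrative sketch, not the FireBench validation code: `check_requirement` and `FakeNode` are hypothetical names, and the attribute values are made up. The function only relies on `path in h5file` and `h5file[path].attrs`, which an open `h5py.File` also provides, so the stand-in dictionary is used here only to keep the example free of external dependencies.

```python
def check_requirement(h5file, requirement):
    """Return a list of problems; an empty list means the requirement is met."""
    problems = []
    for path, mandatory_attrs in requirement.items():
        if path not in h5file:
            problems.append(f"missing {path}")
            continue
        attrs = h5file[path].attrs
        for name in mandatory_attrs:
            if name not in attrs:
                problems.append(f"{path}: missing attribute {name}")
    return problems


# Requirement R02 from this document: path -> mandatory attributes.
R02 = {
    "/spatial_2d/Caldor_MTBS": ["crs"],
    "/spatial_2d/Caldor_MTBS/fire_burn_severity": ["units", "_FillValue"],
    "/spatial_2d/Caldor_MTBS/position_lat": ["units"],
    "/spatial_2d/Caldor_MTBS/position_lon": ["units"],
}


class FakeNode:
    """Stand-in for an HDF5 group or dataset; it only carries attributes."""

    def __init__(self, attrs):
        self.attrs = attrs


# Hypothetical model output: _FillValue and position_lon are deliberately absent.
fake_file = {
    "/spatial_2d/Caldor_MTBS": FakeNode({"crs": "EPSG:4326"}),
    "/spatial_2d/Caldor_MTBS/fire_burn_severity": FakeNode({"units": "dimensionless"}),
    "/spatial_2d/Caldor_MTBS/position_lat": FakeNode({"units": "degrees"}),
}

problems = check_requirement(fake_file, R02)
for line in problems:
    print(line)
```

With an empty `problems` list, the benchmarks attached to the requirement (SV01 to SV06 for R02) would be eligible to run.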

Requirement	Benchmarks
R01	BD01 to BD06
R02	SV01 to SV06
R03	FP01, FP05, FP09, FP13, FP17, FP21, FP25, FP29
R04	FP02, FP06, FP10, FP14, FP18, FP22, FP26, FP30
R05	FP03, FP07, FP11, FP15, FP19, FP23, FP27, FP31
R06	FP04, FP08, FP12, FP16, FP20, FP24, FP28, FP32
R07	CC01 to CC06
R08	WX001 to WX072
R09	WX073 to WX144
R10	WX145 to WX216
R11	WX217 to WX240
R12	WX241 to WX312

R01

Mandatory group/dataset	Mandatory attributes
`/points/building_damaged/building_damage`	units

R02

Mandatory group/dataset	Mandatory attributes
`/spatial_2d/Caldor_MTBS`	crs
`/spatial_2d/Caldor_MTBS/fire_burn_severity`	units, _FillValue
`/spatial_2d/Caldor_MTBS/position_lat`	units
`/spatial_2d/Caldor_MTBS/position_lon`	units

R03

Mandatory group/dataset	Mandatory attributes
`/polygons/Caldor_2021-08-18T20:30-07:00`	rel_path, time
`/polygons/Caldor_2021-08-19T20:45-07:00`	rel_path, time
`/polygons/Caldor_2021-08-20T20:20-07:00`	rel_path, time
`/polygons/Caldor_2021-08-21T21:15-07:00`	rel_path, time
`/polygons/Caldor_2021-08-24T22:07-07:00`	rel_path, time
`/polygons/Caldor_2021-08-26T03:30-06:00`	rel_path, time
`/polygons/Caldor_2021-08-26T22:15-06:00`	rel_path, time
`/polygons/Caldor_2021-08-27T00:22-06:00`	rel_path, time
`/polygons/Caldor_2021-08-28T21:30-06:00`	rel_path, time
`/polygons/Caldor_2021-08-29T22:32-07:00`	rel_path, time
`/polygons/Caldor_2021-08-30T21:09-07:00`	rel_path, time
`/polygons/Caldor_2021-08-31T21:08-07:00`	rel_path, time
`/polygons/Caldor_2021-09-01T21:12-07:00`	rel_path, time
`/polygons/Caldor_2021-09-03T00:40-07:00`	rel_path, time
`/polygons/Caldor_2021-09-04T23:29-07:00`	rel_path, time
`/polygons/Caldor_2021-09-05T23:41-07:00`	rel_path, time
`/polygons/Caldor_2021-09-06T23:09-07:00`	rel_path, time
`/polygons/Caldor_2021-09-07T22:40-07:00`	rel_path, time
`/polygons/Caldor_2021-09-08T22:33-07:00`	rel_path, time
`/polygons/Caldor_2021-09-10T23:34-07:00`	rel_path, time

Files (KML) at path defined in rel_path attributes must exist.

R04

Mandatory group/dataset	Mandatory attributes
`/polygons/Caldor_2021-08-20T20:20-07:00`	rel_path, time
`/polygons/Caldor_2021-08-21T21:15-07:00`	rel_path, time

Files (KML) at path defined in rel_path attributes must exist.

R05

Mandatory group/dataset	Mandatory attributes
`/polygons/Caldor_2021-08-26T22:15-06:00`	rel_path, time
`/polygons/Caldor_2021-08-27T00:22-06:00`	rel_path, time
`/polygons/Caldor_2021-08-28T21:30-06:00`	rel_path, time

Files (KML) at path defined in rel_path attributes must exist.

R06

Mandatory group/dataset	Mandatory attributes
`/polygons/Caldor_2021-08-29T22:32-07:00`	rel_path, time
`/polygons/Caldor_2021-08-30T21:09-07:00`	rel_path, time
`/polygons/Caldor_2021-08-31T21:08-07:00`	rel_path, time
`/polygons/Caldor_2021-09-01T21:12-07:00`	rel_path, time
`/polygons/Caldor_2021-09-03T00:40-07:00`	rel_path, time

Files (KML) at path defined in rel_path attributes must exist.

R07

Mandatory group/dataset	Mandatory attributes
`/spatial_2d/ravg_cc`	crs
`/spatial_2d/ravg_cc/ravg_canopy_cover_loss`	units, _FillValue
`/spatial_2d/ravg_cc/position_lat`	units
`/spatial_2d/ravg_cc/position_lon`	units

R08

Verify that the model and observational datasets contain the same weather station groups with the following datasets:

Mandatory group/dataset	Mandatory attributes
`/time_series/station_<name>/time`	None
`/time_series/station_<name>/air_temperature`	None

R09

Verify that the model and observational datasets contain the same weather station groups with the following datasets:

Mandatory group/dataset	Mandatory attributes
`/time_series/station_<name>/time`	None
`/time_series/station_<name>/relative_humidity`	None

R10

Verify that the model and observational datasets contain the same weather station groups with the following datasets:

Mandatory group/dataset	Mandatory attributes
`/time_series/station_<name>/time`	None
`/time_series/station_<name>/wind_speed`	None

R11

Verify that the model and observational datasets contain the same weather station groups with the following datasets:

Mandatory group/dataset	Mandatory attributes
`/time_series/station_<name>/time`	None
`/time_series/station_<name>/wind_direction`	None

R12

Verify that the model and observational datasets contain the same weather station groups with the following datasets:

Mandatory group/dataset	Mandatory attributes
`/time_series/station_<name>/time`	None
`/time_series/station_<name>/fuel_moisture_content_10h`	None
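Requirements R08 to R12 all follow the same pattern: both files must expose the same `station_<name>` groups, each with a `time` dataset and the variable of interest. A minimal sketch of that comparison is shown below, assuming each `/time_series` group behaves like a mapping from station group name to a mapping of dataset names (an `h5py` group offers the same iteration and membership operations). The function names and sample stations are hypothetical.

```python
def stations_with(ts_group, variable):
    """Names of station groups that contain both `time` and `variable`."""
    return {
        name
        for name in ts_group
        if name.startswith("station_")
        and "time" in ts_group[name]
        and variable in ts_group[name]
    }


def compare_stations(model_ts, obs_ts, variable):
    """Split station names into shared, model-only, and observation-only."""
    model = stations_with(model_ts, variable)
    obs = stations_with(obs_ts, variable)
    return {
        "shared": sorted(model & obs),
        "only_in_model": sorted(model - obs),
        "only_in_obs": sorted(obs - model),
    }


# Hypothetical contents of /time_series in the model and observation files.
model_ts = {
    "station_A": {"time": [], "fuel_moisture_content_10h": []},
    "station_B": {"time": [], "wind_speed": []},
}
obs_ts = {
    "station_A": {"time": [], "fuel_moisture_content_10h": []},
    "station_C": {"time": [], "fuel_moisture_content_10h": []},
}

report = compare_stations(model_ts, obs_ts, "fuel_moisture_content_10h")
print(report)
```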

Aggregation Schemes

This section describes the weights used to aggregate KPI unit scores. More information about aggregation methods is available in the FireBench documentation. If aggregation scheme 0 is specified, no aggregation is performed, so group scores and total scores are not computed.

Group definition

All benchmarks have a default weight of 1 in each group. If custom weights are applied, refer to the custom weight Table.

Weight precedence:

Default benchmark weight: 1
Group benchmark overrides: apply to all schemes unless overridden
Scheme benchmark overrides: apply only within that scheme and override everything else
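The precedence rules above can be read as a three-level lookup. The sketch below is one way to express it; the function name, the dictionary layout, and the override values are hypothetical and only illustrate the order in which the levels are consulted.

```python
DEFAULT_WEIGHT = 1.0


def resolve_weight(benchmark_id, group, scheme, group_overrides, scheme_overrides):
    """Return the weight of a benchmark, honoring the documented precedence."""
    # 1. Scheme benchmark overrides win over everything else.
    scheme_level = scheme_overrides.get(scheme, {})
    if benchmark_id in scheme_level:
        return scheme_level[benchmark_id]
    # 2. Group benchmark overrides apply to every scheme unless overridden above.
    group_level = group_overrides.get(group, {})
    if benchmark_id in group_level:
        return group_level[benchmark_id]
    # 3. Otherwise the default benchmark weight applies.
    return DEFAULT_WEIGHT


# Hypothetical overrides, for illustration only.
group_overrides = {"Building Damage": {"BD01": 2.0}}
scheme_overrides = {"B": {"BD01": 3.0, "BD02": 0.5}}

weights = {
    ("BD01", "A"): resolve_weight("BD01", "Building Damage", "A", group_overrides, scheme_overrides),
    ("BD01", "B"): resolve_weight("BD01", "Building Damage", "B", group_overrides, scheme_overrides),
    ("BD02", "A"): resolve_weight("BD02", "Building Damage", "A", group_overrides, scheme_overrides),
    ("BD03", "B"): resolve_weight("BD03", "Building Damage", "B", group_overrides, scheme_overrides),
}
print(weights)
```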

Group	Benchmark ID
Building Damage	BD01 to BD06
Burn Severity	SV01 to SV06
Fire Perimeter W1	FP01, FP05, FP09, FP13, FP17, FP21, FP25, FP29
Fire Perimeter W2	FP02, FP06, FP10, FP14, FP18, FP22, FP26, FP30
Fire Perimeter W3	FP03, FP07, FP11, FP15, FP19, FP23, FP27, FP31
Fire Perimeter W4	FP04, FP08, FP12, FP16, FP20, FP24, FP28, FP32
Canopy Cover Loss	CC01 to CC06
Air temperature W1	WX001 to WX018
Air temperature W2	WX019 to WX036
Air temperature W3	WX037 to WX054
Air temperature W4	WX055 to WX072
Relative humidity W1	WX073 to WX090
Relative humidity W2	WX091 to WX108
Relative humidity W3	WX109 to WX126
Relative humidity W4	WX127 to WX144
Wind speed W1	WX145 to WX162
Wind speed W2	WX163 to WX180
Wind speed W3	WX181 to WX198
Wind speed W4	WX199 to WX216
Wind direction W1	WX217 to WX222
Wind direction W2	WX223 to WX228
Wind direction W3	WX229 to WX234
Wind direction W4	WX235 to WX240
Fuel Moisture 10h W1	WX241 to WX258
Fuel Moisture 10h W2	WX259 to WX276
Fuel Moisture 10h W3	WX277 to WX294
Fuel Moisture 10h W4	WX295 to WX312

Scheme A

Scheme A contains all the groups with default weights. It can be used to evaluate complete model performance with balanced weighting.

Scheme B

Scheme B contains only the building damage group. It is used to evaluate the model only on building damage benchmarks.

Group	Group Weight
Building Damage	1

Scheme CC

Scheme CC contains only the canopy cover loss group. It is used to evaluate crown fire models.

Group	Group Weight
Canopy Cover Loss	1

Scheme FP

Scheme FP contains only the fire perimeter groups. It is used to evaluate the model only on fire perimeter benchmarks for all of the study periods.

Group	Group Weight
Fire Perimeter W1	1
Fire Perimeter W2	1
Fire Perimeter W3	1
Fire Perimeter W4	1

Scheme short_all

Scheme short_all contains all the groups except those related to the W1 study period. The index i therefore ranges from 2 to 4.

Group	Group Weight
Air Temp Wi	1
Building Damage	1
Burn Severity	1
Canopy Cover Loss	1
Fire Perimeter Wi	1
FMC 10h Wi	1
RH Wi	1
Wind Direction Wi	1
Wind Speed Wi	1
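The `Wi` entries in the table above are placeholders for one group per study period. A small sketch of the expansion, assuming the placeholder is always the literal suffix ` Wi` (the function name is hypothetical):

```python
def expand_groups(templates, periods):
    """Replace each trailing ' Wi' placeholder with one entry per study period."""
    groups = []
    for template in templates:
        if template.endswith(" Wi"):
            # Drop the trailing 'i' and append the period index, e.g. 'RH W2'.
            groups.extend(f"{template[:-1]}{i}" for i in periods)
        else:
            groups.append(template)
    return groups


short_all_templates = [
    "Air Temp Wi",
    "Building Damage",
    "Burn Severity",
    "Canopy Cover Loss",
    "Fire Perimeter Wi",
    "FMC 10h Wi",
    "RH Wi",
    "Wind Direction Wi",
    "Wind Speed Wi",
]

# Scheme short_all uses study periods W2 to W4.
short_all_groups = expand_groups(short_all_templates, range(2, 5))
print(len(short_all_groups), short_all_groups[:4])
```

Six templated groups over three study periods plus three period-independent groups give 21 groups, each with a group weight of 1.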

Scheme S

Scheme S contains only the burn severity group. It is used to evaluate the model only on the burn severity benchmarks from MTBS.

Group	Group Weight
Burn Severity	1

Scheme WXi

Schemes WXi, for i from 1 to 4, each contain all the groups related to weather stations for a specific study period (W1 to W4).

Group	Group Weight
Air Temp Wi	1
FMC 10h Wi	1
RH Wi	1
Wind Direction Wi	1
Wind Speed Wi	1

Scheme WX_short

Scheme WX_short contains all the groups except those related to the W1 study period and the fire perimeter groups. The index i therefore ranges from 2 to 4.

Group	Group Weight
Air Temp Wi	1
Building Damage	1
Burn Severity	1
Canopy Cover Loss	1
FMC 10h Wi	1
RH Wi	1
Wind Direction Wi	1
Wind Speed Wi	1

Notes

Benchmark identifiers consist of a case ID and a short ID, for example FB001-BD01. Throughout the documentation, the short ID alone (e.g. BD01) is used when the benchmark case is unambiguous, in order to improve readability. The full identifier (FB001-BD01) is used whenever the case context must be explicit, such as when comparing benchmarks across different cases.
Each file hash has been computed using firebench.standardize.calculate_sha256.
Collection of forecasts or reanalysis is authorized for the benchmark period (e.g., for fire perimeters) but has to be detailed in the model report attached to the Report sent back to the FireBench team for collection and validation of results.
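A source file can be checked against a SHA-256 hash listed in this document without FireBench, using only the Python standard library. The sketch below is an independent illustration and not the code of firebench.standardize.calculate_sha256; the function name, the chunk size, and the sample file are assumptions.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Hypothetical file standing in for a downloaded source dataset.
payload = b"lat,lon,damage\n"
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.csv"
    sample.write_bytes(payload)
    file_hash = sha256_of_file(sample)

# In practice, compare `file_hash` with the hash string given in the documentation.
expected = hashlib.sha256(payload).hexdigest()
print(file_hash == expected)
```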

Acknowledgment

We gratefully acknowledge Synoptic for granting permission to redistribute selected weather-station data as part of the FireBench benchmarking framework.
I would like to thank my colleague Muthu K. Selvaraj (WPI) for his help in this project.