Why human forecasting matters: Dissecting sources of TC predictability via multi-modal learning
Preprint
Abstract
For decades, National Hurricane Center (NHC) forecasters have published a textual discussion of every Atlantic tropical cyclone that distills expert reasoning about storm intensity and trajectory, yet operational data-driven cyclone-prediction systems do not consume these reports. We study whether this prose carries intensity-change information beyond what standard numerical inputs provide, using 637 Atlantic storms from 2002 to 2023 paired with NHC discussions, 13 SHIPS environmental predictors, and ERA5 10 m wind fields. Across hybrid, end-to-end neural, and temporal architectures targeting change in maximum sustained wind and minimum sea-level pressure at lead times from 24 to 120 hours, a model trained on the discussion text alone is competitive with a SHIPS-only neural baseline at 24-hour \(\Delta V_{\max}\) on 2022–2023 held-out storms, and adding the discussion to the tabular branch yields a statistically resolved per-observation MAE reduction. Integrated-gradients attribution and input-text perturbations rule out current-intensity numerals, storm-name climatology, and forecaster-outlook prose as the source of the gain, localizing it instead to a compact intensity-trajectory vocabulary (e.g., ‘intensify’, ‘weaken’, ‘eyewall’ and ‘category’). The result quantifies the predictive content of operational forecaster discussions and motivates cyclone-prediction systems that consume both numerical observations and the natural-language analyses that meteorological agencies already produce.
Preprint under review; it will be available soon.