Introducing ARFBench: A time series question-answering benchmark based on real incidents
The authors introduce ARFBench, a new benchmark that uses real-world Datadog incident data to evaluate how well AI models perform time series question answering (TSQA). Whi…