It should be possible to update benchmark results incrementally, e.g., when questions are added or changed or when a new model is added, without recomputing everything.
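One way this requirement could be met is to cache each result per (model, question) pair together with a hash of the question content, and to re-evaluate only pairs that are missing or whose hash has changed. The following is a minimal sketch under that assumption, not a description of an existing implementation: the names `question_hash`, `update_results`, and `evaluate`, as well as the in-memory `cache` layout, are hypothetical and chosen only for illustration.

```python
import hashlib
import json
from typing import Callable, Dict, List, Tuple

# Hypothetical cache layout: (model_name, question_id) -> {"hash": str, "score": float}
Cache = Dict[Tuple[str, str], dict]


def question_hash(question: dict) -> str:
    """Return a stable content hash, so an edited question is detected as changed."""
    payload = json.dumps(question, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def update_results(
    cache: Cache,
    models: List[str],
    questions: Dict[str, dict],
    evaluate: Callable[[str, dict], float],
) -> int:
    """Evaluate only missing or stale (model, question) pairs.

    Updates `cache` in place and returns the number of evaluations performed.
    `evaluate` is an assumed user-supplied function that scores one model
    on one question.
    """
    computed = 0
    live_keys = set()
    for model in models:
        for qid, question in questions.items():
            key = (model, qid)
            live_keys.add(key)
            digest = question_hash(question)
            entry = cache.get(key)
            if entry is not None and entry["hash"] == digest:
                continue  # result is still valid, skip recomputation
            cache[key] = {"hash": digest, "score": evaluate(model, question)}
            computed += 1
    # Drop results for questions or models that no longer exist.
    for key in list(cache):
        if key not in live_keys:
            del cache[key]
    return computed
```

With this layout, adding a model costs one evaluation per question, and changing a question costs one evaluation per model; everything else is reused. Persisting `cache` between runs (for instance as a JSON or SQLite file) is left out of the sketch.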