In the LLM benchmark, we need to measure the energy consumption of the different tasks. To do this, we should measure energy consumption on the inference server and attribute it to the individual tasks in proportion to their running time. Since an exact per-task measurement seems hard, we should probably work with average values and derive estimates of the energy consumed per input and output token. We should also compare our measurements with publicly reported performance figures, in particular for parallel requests: when a publication reports a certain throughput in tokens per second on a given GPU, we can combine that figure with the GPU's maximum power consumption to derive an upper bound on the energy consumed per token.
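
As a rough sketch of both ideas (all function names, field names, and numbers below are illustrative assumptions, not part of any existing benchmark code), the attribution of measured server energy by running time and the upper bound derived from a published throughput figure could look like this:

import sys

def attribute_energy(total_energy_j, tasks):
    """Split the total measured server energy across tasks in proportion to their
    running time, then estimate joules per token for each task.

    tasks: list of dicts with 'runtime_s', 'input_tokens', 'output_tokens'.
    Adds 'energy_j' and 'j_per_token' estimates to each task and returns the list.
    """
    total_runtime = sum(t["runtime_s"] for t in tasks)
    for t in tasks:
        # Assumption: energy scales with running time; this ignores load differences.
        t["energy_j"] = total_energy_j * t["runtime_s"] / total_runtime
        t["j_per_token"] = t["energy_j"] / (t["input_tokens"] + t["output_tokens"])
    return tasks

def max_energy_per_token(gpu_max_power_w, reported_tokens_per_s):
    """Upper bound on energy per token: if the GPU never draws more than its maximum
    power, energy per token is at most P_max / throughput (joules per token)."""
    return gpu_max_power_w / reported_tokens_per_s

if __name__ == "__main__":
    # Illustrative values: a publication reporting 1500 tokens/s on a 700 W GPU
    bound = max_energy_per_token(700.0, 1500.0)
    print(f"Upper bound: {bound:.3f} J/token")
    sys.exit(0)

The proportional split is only a first approximation; comparing its per-token estimates against the upper bound from reported throughput gives a sanity check on both numbers.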