Latency is a performance metric, also known as response time.
Latency (response time) is the amount of time it takes a system to process a request, i.e. the time until the first response.
In other words, it is the time elapsed between making a request and receiving the first data requested. You can implement this measurement as follows: take the time just before sending the request and just after the first response has been received.
A request can be local or remote, and can originate from inside or outside the system.
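As a minimal sketch of that measurement (assuming a plain HTTP GET against a placeholder URL), the clock is read just before the request is sent and just after the first byte of the response arrives:

```java
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class LatencyProbe {
    public static void main(String[] args) throws Exception {
        // Placeholder target; substitute the service under test.
        URL url = new URL("https://example.com/");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();

        long start = System.nanoTime();                // just before sending the request
        try (InputStream in = conn.getInputStream()) {
            in.read();                                 // blocks until the first response byte
        }
        long latencyNanos = System.nanoTime() - start; // just after the first response

        System.out.printf("Latency (time to first byte): %.3f ms%n", latencyNanos / 1e6);
    }
}
```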
Network protocol analysers (such as Wireshark) measure the time at which bytes are actually sent or received on the network interface.
See also: Algorithm - (Performance|Running Time|Fast)
Can we answer “What was the performance?” with “It took 15 seconds”? Performance’s units (“inverse seconds”) can be awkward; latency, expressed directly in seconds, is the more natural measure.
Latency is measured via a timer metric, as sketched below.
See also: CPU - (CPU|Processor) Time Counter
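As an illustrative sketch, a timer metric from the Dropwizard Metrics library (the registry setup and the metric name "requests" are assumptions, not anything mandated by the source) records each duration and exposes percentiles:

```java
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.Snapshot;
import com.codahale.metrics.Timer;

public class TimerExample {
    public static void main(String[] args) {
        MetricRegistry registry = new MetricRegistry();
        Timer requests = registry.timer("requests"); // hypothetical metric name

        // Time one unit of work; the timer records its duration.
        try (Timer.Context ignored = requests.time()) {
            doWork(); // stand-in for the request being measured
        }

        Snapshot snapshot = requests.getSnapshot();
        // Durations are stored in nanoseconds.
        System.out.printf("p99 latency: %.3f ms%n", snapshot.get99thPercentile() / 1e6);
    }

    private static void doWork() {
        try { Thread.sleep(15); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }
}
```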
Service level expectations are stated in percentiles: if you haven’t stated percentiles and a max, you haven’t specified your requirements.
Source: “Response Times: The 3 Important Limits” (Jakob Nielsen)
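A minimal sketch of checking such a percentile-plus-max requirement against collected samples (the samples and thresholds below are made-up examples):

```java
import java.util.Arrays;

public class PercentileCheck {
    // Nearest-rank percentile over a sorted array of latency samples (ms).
    static long percentile(long[] sorted, double p) {
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }

    public static void main(String[] args) {
        long[] samplesMs = {12, 15, 14, 90, 13, 16, 240, 15, 14, 13}; // made-up data
        Arrays.sort(samplesMs);

        long p99 = percentile(samplesMs, 99.0);
        long max = samplesMs[samplesMs.length - 1];

        // Made-up requirement: p99 < 100 ms and max < 1000 ms.
        System.out.printf("p99 = %d ms, max = %d ms%n", p99, max);
        System.out.println("meets requirement: " + (p99 < 100 && max < 1000));
    }
}
```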
Response time over load, for a system with a uniform response time, i.e. without a lot of variance: 99.7% of requests fall within three standard deviations of the mean.
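For intuition, a small sketch with made-up, normally distributed latencies, showing the 99.7% / three-standard-deviations relationship mentioned above:

```java
import java.util.Random;

public class ThreeSigma {
    public static void main(String[] args) {
        // Simulated low-variance latencies (illustrative numbers only).
        Random rnd = new Random(42);
        double[] ms = new double[100_000];
        for (int i = 0; i < ms.length; i++) ms[i] = 50 + 5 * rnd.nextGaussian();

        double mean = 0;
        for (double v : ms) mean += v;
        mean /= ms.length;

        double variance = 0;
        for (double v : ms) variance += (v - mean) * (v - mean);
        double sd = Math.sqrt(variance / ms.length);

        long within = 0;
        for (double v : ms) if (Math.abs(v - mean) <= 3 * sd) within++;

        // For a normal distribution this prints roughly 99.7%.
        System.out.printf("within 3 standard deviations: %.2f%%%n", 100.0 * within / ms.length);
    }
}
```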
HdrHistogram - a data structure to capture latency values; see the sketch below.
See also: https://tideways.io/profiler/blog/developing-a-time-series-database-based-on-hdrhistogram
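A minimal sketch with the Java HdrHistogram library (the value range, precision, and sample values below are assumptions for illustration):

```java
import org.HdrHistogram.Histogram;

public class HdrExample {
    public static void main(String[] args) {
        // Track latencies from 1 ns up to 1 hour, with 3 significant decimal digits.
        Histogram histogram = new Histogram(3_600_000_000_000L, 3);

        // Record some latency samples in nanoseconds (made-up values).
        histogram.recordValue(12_000_000L);  // 12 ms
        histogram.recordValue(15_000_000L);  // 15 ms
        histogram.recordValue(240_000_000L); // 240 ms outlier

        System.out.printf("p50 = %.1f ms%n", histogram.getValueAtPercentile(50.0) / 1e6);
        System.out.printf("p99 = %.1f ms%n", histogram.getValueAtPercentile(99.0) / 1e6);
        System.out.printf("max = %.1f ms%n", histogram.getMaxValue() / 1e6);
    }
}
```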