TL;DR: time_in_nanos is reported incorrectly; we need to implement a more advanced algorithm to compute it.
Full version.
I wanted to check the influence of the hyperion framework itself on the measured data. To do that I used the same technique as @Shimuuar did for criterion. For criterion, the result was that collecting statistics during the run had a very high impact on the results, and the solution was to write the raw data to a temporary file and analyze it later; so I wanted to see whether hyperion is affected in the same way.
The idea is to run a large number of very simple measurements:
```haskell
#!/usr/bin/env stack
-- stack --no-nix-pure runghc --package hyperion
{-# LANGUAGE OverloadedLists #-}
module Main where

import Hyperion.Benchmark
import Hyperion.Main
import Hyperion.Run
import Data.Monoid
import Text.Printf

benchmarks :: [Benchmark]
benchmarks =
  [ bench (printf "%03i" n) $ nf exp (0.73 :: Double) | n <- [1..999 :: Int] ]

main :: IO ()
main =
  -- defaultMainWith defaultConfig{configMonoidSamplingStrategy = First $ Just $ fixed 1} "hyperion-example-micro-benchmarks" benchmarks
  defaultMain "hyperion-example-micro-benchmarks" benchmarks
```
I expected all benchmarks to show more or less equal results. I used R to analyze the results:
```
qnikst@qwork ~/workspace/tweag/hyperion $ ./dist-newstyle/build/hyperion-0.1.0.0/build/hyperion-sanity-check/hyperion-sanity-check --flat output.flat.json
```
```r
> library(jsonlite)
> dat <- fromJSON("output.flat.json")
> summary(dat$time_in_nanos)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  160.1   336.8   501.0   524.8   662.5 12990.0
> mean(dat$time_in_nanos)
[1] 524.8087
> sd(dat$time_in_nanos)
[1] 485.5656
> d <- dat[with(dat, order(bench_name)),]
> plot(d$time_in_nanos)
```
We see that the standard deviation is almost equal to the mean. We also see that time_in_nanos grows with the test index. Up to this point we might conclude that the measurement machinery affects the experiment in an undesirable way. As mentioned above, criterion had exactly that problem, so I modified hyperion a bit to use Storable vectors instead of Unboxed ones, moving the samples off the Haskell heap, and this actually helped: the measurements no longer show that trend. Still, what we can tell from the graph is that we have a very large deviation in the measurements, and that rings a bell.
In addition, even with Storable vectors and the program run with --pure, we still see an increasing time_in_nanos trend over the experiment number. So that solution is not complete.
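One way to quantify that trend, rather than eyeballing the plot (a suggestion of mine, not part of the original session; it reuses `d` from above, which is ordered by bench_name and hence by run order):

```r
# Regress the reported time on the run index; a positive, significant
# slope confirms that time_in_nanos grows as the run progresses.
d$index <- seq_len(nrow(d))
summary(lm(time_in_nanos ~ index, data = d))$coefficients
```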
Happily, hyperion does provide the raw measurement data, so I tried to analyze it myself. Skipping all the iterations and surprises along the way, I ended up with the following code.
```
qnikst@qwork ~/workspace/tweag/hyperion $ ./dist-newstyle/build/hyperion-0.1.0.0/build/hyperion-sanity-check/hyperion-sanity-check --raw --flat output.flat.json
```
Because, like criterion, hyperion runs each measurement as a number of batches, we can fit a linear model, duration = overhead + batchSize * t, where t is the per-iteration time: the constant per-measurement cost ends up in the intercept and is thereby ruled out of the result. See https://blog.janestreet.com/core_bench-micro-benchmarking-for-ocaml/ for more details.
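To illustrate why the slope is the right estimator, here is a small simulation (my own sketch; the overhead and per-iteration numbers are made up, loosely matching the magnitudes seen above):

```r
# Simulated batches with a fixed per-measurement overhead and a true
# per-iteration cost. The regression slope recovers the per-iteration
# cost, while the naive total/iterations estimate is inflated by the
# overhead spread over the batches.
batchSize <- 1:20
duration  <- 400 + batchSize * 115 + rnorm(20, sd = 10)  # 400ns overhead, 115ns/iter
lm(duration ~ batchSize)$coefficients["batchSize"]       # close to 115
sum(duration) / sum(batchSize)                           # noticeably larger (~153)
```

With that in mind, here is the analysis over the real data: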
```r
dat <- fromJSON("output.flat.json")            # read the raw data
dat <- dat$results                             # ignore metadata
dat <- dat[, c("bench_name", "measurements")]  # select only the interesting fields
dat <- dat[with(dat, order(bench_name)),]      # order the data
# fit a linear model for each experiment and keep the batchSize slope
result <- apply(dat, 1, function(x)
  lm(formula = duration ~ batchSize, data = x$measurements)$coefficients["batchSize"])
# reorder the data so it's reported in the same order as it ran
r <- result[order(names(result))]
plot(r)
```
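One detail the issue leaves out is how the variable `cleared` used below was computed; a minimal sketch, assuming a standard 1.5-IQR filter over the per-benchmark slopes `r`, could be:

```r
# Hypothetical outlier filter: keep slopes within 1.5 IQR of the quartiles.
q <- quantile(r, c(0.25, 0.75))
iqr <- q[2] - q[1]
cleared <- r[r > q[1] - 1.5 * iqr & r < q[2] + 1.5 * iqr]
```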
Ignoring the outliers (those will be the subject of further analysis), we see a flat line! After removing them we get:
```r
> mean(cleared)
[1] 115.1444
> sd(cleared)
[1] 24.45192
```
These are perfect results! They mean that hyperion actually does its job right, and that a correct analysis of the data leads to correct results.
So what was the problem? After doing this I went to the sources and saw the following:
hyperion/src/Hyperion/Analysis.hs, lines 49 to 66 at commit ccdccfa:
```haskell
    , _reportTimeInNanos =
        totalDuration / trueNumIterations
    , _reportCycles = Nothing
    , _reportAlloc = Nothing
    , _reportGarbageCollections = Nothing
    , _reportMeasurements = Just samp
    }
  where
    totalDuration =
      ala
        Sum
        (foldMapOf (measurements.each.duration.to realToFrac))
        samp
    trueNumIterations =
      ala
        Sum
        (foldMapOf (measurements.each.batchSize.to realToFrac))
        samp
```
Instead of fitting a linear regression, we simply sum all the measured times and divide by the total number of iterations! As a result, the per-measurement overhead is not eliminated. I understand the purpose of this: such an analysis works for all sampling strategies, both fixed and incremental batches. However, as the analysis above shows, it is unreliable.
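To make the difference concrete, one can compare the two estimators on the raw data directly. This is my own sketch, assuming `dat` from the analysis above, where each element of `dat$measurements` is a data frame with `batchSize` and `duration` columns:

```r
m <- dat$measurements[[1]]                   # measurements of the first benchmark
naive <- sum(m$duration) / sum(m$batchSize)  # what Analysis.hs computes
slope <- lm(duration ~ batchSize, data = m)$coefficients["batchSize"]
c(naive = naive, regression = slope)         # the naive estimate carries the overhead
```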