Merge eugeneia/snabb:timeline-raptorjit into Vita #65

Merged 44 commits on Mar 8, 2019

@eugeneia eugeneia commented Dec 7, 2018

This builds on @lukego’s work on the timeline log, a probabilistic flight recorder for Snabb, see: snabbco#849 snabbco#873 snabbco#916 snabbco#973 snabbco#1011 snabbco#1098 snabbco#1112

My current working branch for the timeline is eugeneia/snabb:timeline-raptorjit. This branch merges the feature into Vita and adds some application-specific events (user events).

See #58 for some example plots.

lukego and others added 30 commits November 8, 2016 09:34
This is a very useful instruction for self-benchmarking programs that
want to read the CPU timestamp counter efficiently.

See Intel whitepaper for details:
http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/ia-32-ia-64-benchmark-code-execution-paper.pdf
Use 'double' instead of 'uint64_t' for values in the timeline file.

This change is motivated by making timeline files easier to process by
R. In the future we may switch back to uint64_t for the TSC counter
and/or argument values for improved precision. The major_version file
header field can be used to avoid confusion.

The obvious downside to using doubles is that the TSC value will lose
precision as the server uptime increases (the TSC starts at zero and
increases at the base frequency of the CPU, e.g. 2 GHz). The impact seems
to be modest though. For example, a 2 GHz CPU would start rounding TSC
values to the nearest 128 (likely quite acceptable in practice) after
approximately 9 years of operation (2^59 cycles; doubles represent
integers exactly only up to 2^53).
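
The precision loss can be estimated with a few lines of Lua. This is a minimal sketch, assuming IEEE 754 doubles (53-bit significand) and a TSC that ticks at a constant 2 GHz; `spacing` and `uptime_years` are hypothetical helper names, not part of the timeline code. Under these assumptions the rounding step reaches 128 once the counter passes 2^59 cycles.

```lua
-- Rounding step (gap between adjacent doubles) for a non-negative
-- integer-valued counter x. Below 2^53 every integer is exact (step 1);
-- the step doubles each time x crosses the next power of two.
local function spacing(x)
   local step, limit = 1, 2^53
   while x >= limit do
      step = step * 2
      limit = limit * 2
   end
   return step
end

-- Uptime in years needed to accumulate `cycles` ticks at `hz` ticks/second.
-- Assumption: the TSC runs at a constant rate and starts at zero.
local function uptime_years(cycles, hz)
   return cycles / hz / (365.25 * 24 * 3600)
end

print(2^53 + 1 == 2^53)                                 -- true: odd values are lost
print(spacing(2^53))                                    -- 2
print(spacing(2^59))                                    -- 128
print(string.format("%.1f", uptime_years(2^59, 2e9)))   -- 9.1
```

The first rounding (to the nearest 2) already appears after 2^53 cycles, which is roughly 52 days at 2 GHz.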

So - storing the TSC as a double-float is definitely a kludge - but
unlikely to cause any real harm and expedient for the short-term goal of
putting this code to use without getting blocked due to e.g. my lack of
sophistication as an R hacker.

Resolved conflict in app.lua between adding timeline events and the new
breath topological-sort machinery.
Simplify the code and eliminate unwanted branches from the engine loop
by drawing a random timeline level from a log-uniform distribution that
mathematically favors higher log levels over lower ones.

Plucked log5() out of the air i.e. each log level should be enabled for
5x more breaths than the one below.

Here is how the distribution of log level choice looks in practice using
this algorithm:

    > t = {0,0,0,0,0,0,0,0,0}
    > for i = 1, 1e8 do
         local n = math.max(1,math.ceil(math.log(math.random(5^9))/math.log(5)))
         t[n] = t[n]+1
      end
    > for i,n in ipairs(t) do print(i,n) end
    1       560
    2       2151
    3       10886
    4       55149
    5       273376
    6       1367410
    7       6844261
    8       34228143
    9       171120244

Note: Lua provides only natural logarithm functions but it is easy to
derive other bases from this (google "log change of base formula").
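
The change-of-base step and the resulting distribution can be written out as follows. This is a sketch rather than the engine's actual code: `log5`, `random_level` and `expected_share` are hypothetical names, and `expected_share` ignores floating-point rounding at exact powers of 5.

```lua
-- Change of base: log_b(x) = ln(x) / ln(b).
local function log5(x)
   return math.log(x) / math.log(5)
end

-- Draw a level in 1..9 using the same expression as the session above.
local function random_level()
   return math.max(1, math.ceil(log5(math.random(5^9))))
end

-- Ideal share of draws that land on level n. Level n covers the integers
-- in (5^(n-1), 5^n]; level 1 also absorbs the draw r = 1.
local function expected_share(n)
   local lo = (n == 1) and 0 or 5^(n - 1)
   return (5^n - lo) / 5^9
end

for n = 1, 9 do
   print(n, string.format("%.6f", expected_share(n)))
end
```

Each level from 3 upward is expected five times as often as the one below it, and level 9 is drawn about 80% of the time, which is in line with the counts shown above.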
I suspect that it is a misfeature for the timeline to sample the
contents of packets. Do we really want user data potentially appearing
in debug logs? Removed for now.

Clean up the timeline integration in core.app a little as part of the merge.

This fixes a bug where the timeline log level was rerolled after the end of a
breath but before the post-breath events, causing sampling to affect the event
lag of the polled_timers event.
Changes the syntax of event specs to

  <level>,<rate>|<eventname>: ...

The previous level digit becomes the event’s "rate" and retains its semantics
with regard to the logging frequency of the specified event. The "stack depth"
of the event is now decoupled as the new, leading level digit and specified
independently. The new level semantics are as follows:

 - level ranges from 0-9 (10 levels in total)
 - 0 is the topmost level while 9 is the lowest
 - levels 0-4 are reserved for use by the engine
 - user applications can use levels 5-9 to create hierarchy in their events

Caveat: users should avoid defining events with a higher level and a lower
event rate than an enclosed event if the higher level event is supposed to
serve as a latency anchor for the lower level event.

  RIGHT                  WRONG
  5,3|op_start:          5,2|op_start:
    6,2|op_iter:           6,3|op_iter:
  5,3|op_end:            5,2|op_end:

In the WRONG example (right-hand column), the anchor of the op_iter event
depends on the log rate at the time of sampling.
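
To make the new spec header concrete, here is a minimal sketch of parsing it and checking the caveat. `parse_event_header` and `anchor_ok` are hypothetical helpers written for illustration, not functions from the timeline module, and the check encodes only the RIGHT/WRONG rule shown above.

```lua
-- Parse the "<level>,<rate>|<eventname>:" header of an event spec.
-- Returns a table with level, rate and name, or nil if the line does not match.
local function parse_event_header(line)
   local level, rate, name = line:match("^%s*(%d),(%d)|([%w_]+):")
   if not level then return nil end
   return { level = tonumber(level), rate = tonumber(rate), name = name }
end

-- Caveat check: an enclosing (anchor) event should not have a lower
-- rate than the event it encloses.
local function anchor_ok(outer, inner)
   return outer.rate >= inner.rate
end

local right = anchor_ok(parse_event_header("5,3|op_start:"),
                        parse_event_header("  6,2|op_iter:"))
local wrong = anchor_ok(parse_event_header("5,2|op_start:"),
                        parse_event_header("  6,3|op_iter:"))
print(right, wrong)   -- true    false
```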
eugeneia added a commit that referenced this pull request Mar 8, 2019
@eugeneia eugeneia mentioned this pull request Mar 8, 2019
@eugeneia eugeneia merged commit c7a2106 into inters:master Mar 8, 2019