Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prepare the VM to profile itself with times and frequencies for all instructions - output in both Scheme and CSV formats. #530

Merged
merged 9 commits into from
Jul 20, 2024

Conversation

jpellegrini
Copy link
Contributor

@jpellegrini jpellegrini commented Apr 5, 2023

This recovers an old profiling functionality that used to be in vm.c. The previous version would count the frequencies of
instructions. The new code also measures the time taken by each instruction.

Instructions for using the profiler are added to vm.adoc. It is possible to dump the data to a CSV file, open in a spreadsheet and sort by total time taken by each instruction, or by average time per instruction...

CAVEAT: This only works for the USE_COMPUTED_GOTO compilation. For some reason, the standard code (without computed gotos) does not seem to be working (doesn't compile), even without this patch. See PR #528

A description follows.

The VM will, at the end of each iteration, check the instruction that was just executed and update:

  • The number of times that this instruction was executed
  • The number of times that this instruction was executed after the previous one (so it's possible to tell what pairs of instructions are more common)
  • The time taken to execute this instruction

This is done in the tick() C function (which is compiled conditionally on STAT_VM):

void tick(STk_instr b) {
  couple_instr[previous_op][b]++;
  cpt_inst[b]++;
  previous_op = b;

  current_time = clock();
  if (previous_time > 0)
      time_inst[b] += ((double)(current_time - previous_time)) / CLOCKS_PER_SEC;
  previous_time = current_time;
}

Two Scheme primitives are then available:

  • (%vm-dump-stats fname format) will dump the statistics to a file whose name is fname. This is NOT a Scheme port, but a Scheme string. The file will be open using plain fopen. When the format argument is the keyword :csv, then the output is in CSV format; otherwise, there will be one single S-expression in the file, readable from Scheme. This S-expression is an alist, documented in the file itself (see below).
  • (%vm-reset-stats) will reset all counters.

Here's the header from the Scheme version of a statistics dump:

;; STklos VM statistics. It can be read in Scheme, and it represents one single
;; object: an alist with the names of instructions, and each CDR is a list containing:
;;
;; * count (the number of times this instruction was executed)
;; * time (the total time the program spent on this instruction)
;; * avg time (the average number of time spent on each execution of this instruction)
;; * an alist containing, for each OTHER instruction, the number of times they appeared\n"
;;   together in the code.
;;
;; ( (INS1 count1 time1 avgtime1 ( (INS1 . count1) (INS2 . count2) ... (INSn  .countn)))
;;   (INS2 count2 time2 avgtime2 ( (INS1 . count1) (INS2 . count2) ... (INSn  .countn)))
;;   ...
;;   (INS2 countk timek avgtimek ( (INS1 . count1) (INS2 . count2) ... (INSn  .countn))) )

The C code for printing the instructions is in the functions dump_couple_instr_csv and dump_couple_instr_scm.

@jpellegrini
Copy link
Contributor Author

jpellegrini commented Apr 5, 2023

I added a profiling macro...

(%with-profile-data "file-name.csv" :csv
  (display a)
  (newline)
  (values 1 2 3))

This will write the profiling data to file-name.csv, and will return the same values that the code would (1, 2, 3).
The expansion is

(receive
  result-4
  (begin (%vm-reset-stats)
         (display a)
         (newline)
         (values 1 2 3))
  (begin (%vm-dump-stats "file-name.csv" #:csv)
         (apply values result-4)))

Also, since this macro needs to know if the stats code is available, we also include a primitive, (%vm-has-stats).

When the stats code is not compiled in, the macro just expands to (begin ...code...).

It is still marked with %, but I suppose this could be useful for the general user in the future?

@jpellegrini
Copy link
Contributor Author

And just for completness...

  • (%vm-collecting-stats?) returns true when the system is collecting statistics
    (that is, whenthe variable collect_stats is one), and false otherwise.
  • (%vm-collect-stats-start) and (%vm-collect-stats-stop) wil turn collecting on
    and off, respectively.

Collecting statistics is off by default, especially because compiling STklos is very slow with statistics gathering. It should be turned on before profiling.

The status of "collecting" or "not collecting" reflects what happens internally in the statistics gathering code, when it is compiled in. If STklos was compiled without STAT_VM, then those procedures are just not available at all.

@jpellegrini
Copy link
Contributor Author

It only measures the times for VM instructions. However, for example...I have a benchmark in which UGREF_INVOKE takes a lot of time. But it's not, itself, slow. It happens to send the control flow to FUNCALL, which I think is the slow part of it. So maybe we could also measure, besides instructions, some specific parts of the VM (like FUNCALL?)

@jpellegrini
Copy link
Contributor Author

I have written tick() as a nested function, but that's a GNU extension. I'll fix it

@jpellegrini
Copy link
Contributor Author

@egallesio I think I have the statistics collection working (for the computed goto version), but I have one question...

These are declared static in vm.c:

static int couple_instr[NB_VM_INSTR][NB_VM_INSTR];
static int cpt_inst[NB_VM_INSTR];
static double time_inst[NB_VM_INSTR];

but then, wouldn't several threads clobber the data being collected by each other? (For example, in the couple_instr matrix one thread would overwrite the previous instruction from the other, and the data wouldn't correspond to wha tactually happens) -- right? Should we move these to inside run_vm()? Or something else?

@jpellegrini
Copy link
Contributor Author

Hi @egallesio !
If #561 is merged, I'll update this one (which will be necessary, I think).

@jpellegrini
Copy link
Contributor Author

I'm making adjustments to this one, as it would currently do the wrong thing, depending on the use of computed gotos.

@jpellegrini
Copy link
Contributor Author

OK @egallesio I think this one is ready now! :)

This recovers an old profiling functionality that used to be in
`vm.c`. The previous version would count the frequencies of
instructions. The new code also measures the time taken by
each instruction.

A description follows.

The VM will, at the end of each iteration, check the instruction that
was just executed and update:

* The number of times that this instruction was executed
* The number of times that this instruction was executed after the previous one
  (so it's possible to tell what pairs of instructions are more common)
* The time taken to execute this instruction

This is done in the `tick()` C function (which is compiled conditionally on `STAT_VM`):

```
void tick(STk_instr b) {
  couple_instr[previous_op][b]++;
  cpt_inst[b]++;
  previous_op = b;

  current_time = clock();
  if (previous_time > 0)
      time_inst[b] += ((double)(current_time - previous_time)) / CLOCKS_PER_SEC;
  previous_time = current_time;
}
```

Two Scheme primitives are then available:

* `(%vm-dump-stats fname format)` will dump the statistics to a file
  whose name is `fname`. This is *NOT* a Scheme port, but a Scheme
  string. The file will be open using plain `fopen`. When the `format`
  argument is the keyword `:csv`, then the output is in CSV format;
  otherwise, there will be one single S-expression in the file,
  readable from Scheme. This S-expression is an alist, documented in
  the file itself (see below).
* `%vm-reset-stats` will reset all counters.

The documentation for the Scheme format for the dumped instructions is
shown below.

```
;; STklos VM statistics. It can be read in Scheme, and it represents one single
;; object: an alist with the names of instructions, and each CDR is a list containing:
;;
;; * count (the number of times this instruction was executed)
;; * time (the total time the program spent on this instruction)
;; * avg time (the average number of time spent on each execution of this instruction)
;; * an alist containing, for each OTHER instruction, the number of times they appeared\n"
;;   together in the code.
;;
;; ( (INS1 count1 time1 avgtime1 ( (INS1 . count1) (INS2 . count2) ... (INSn  .countn)))
;;   (INS2 count2 time2 avgtime2 ( (INS1 . count1) (INS2 . count2) ... (INSn  .countn)))
;;   ...
;;   (INS2 countk timek avgtimek ( (INS1 . count1) (INS2 . count2) ... (INSn  .countn))) )
```

The C code for printing the instructions is in the functions `dump_couple_instr_csv` and
`dump_couple_instr_scm`.
`(%with-profile-data filename [ format ] body)`
will profile a specific piece of code. It returns
the same values that the code would return.

Also, since this macro needs to know if the stats code is
available, we also include a primitive, `(%vm-has-stats)`.
* `(%vm-collecting-stats?)` returns true when the system is collecting
  statistics (that is, whenthe variable `collect_stats` is one), and
  false otherwise.
* `(%vm-collect-stats-start)` and `(%vm-collect-stats-stop)` wil turn
  collecting on and off, respectively.
by excluding the collecting code itself from the timing.
And some other adjustments.
@egallesio egallesio merged commit bff475f into egallesio:master Jul 20, 2024
@egallesio
Copy link
Owner

After 15 months, I have finally merged this great PR 😄

I have applied small modifications (replaced the 3 functions around getting and setting the stats or not) by a unique parameter. I have also suppressed the primitive (%vm-has-stats) since it can be easily determined from the new version of (%vm-config) in PR #560

Finally, macro %with-profile-data is only defined when STklos is compiled with the STAT_VM flag set.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants