<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook EBNF Module V1.1CR1//EN"
"http://www.oasis-open.org/docbook/xml/ebnf/1.1CR1/dbebnf.dtd">
<!--
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
"http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd" []>
-->
<article>
<articleinfo>
<title>arbtt – The Automatic Rule-Base Time Tracker</title>
<authorgroup>
<author>
<firstname>Joachim</firstname>
<surname>Breitner</surname>
<contrib>Main author of arbtt</contrib>
<email>mail@joachim-breitner.de</email>
</author>
<author id="sergey">
<firstname>Sergey</firstname>
<surname>Astanin</surname>
<contrib>Contributor</contrib>
<email>s.astanin@gmail.com</email>
</author>
<author id="martin">
<firstname>Martin</firstname>
<surname>Kiefel</surname>
<contrib>Contributor</contrib>
<email>mk@nopw.de</email>
</author>
<author id="muharem">
<firstname>Muharem</firstname>
<surname>Hrnjadovic</surname>
<contrib>Contributor</contrib>
<email>muharem@linux.com</email>
</author>
<author id="hauck">
<firstname>Markus</firstname>
<surname>Hauck</surname>
<contrib>Contributor</contrib>
<email>markus1189@gmail.com</email>
</author>
<author id="thomasz">
<firstname>Tomasz</firstname>
<surname>Miąsko</surname>
<contrib>Contributor</contrib>
<email>tomasz.miasko@gmail.com</email>
</author>
<author id="waldir">
<firstname>Waldir</firstname>
<surname>Pimenta</surname>
<contrib>Documentation writer</contrib>
<email>waldyrious@gmail.com</email>
</author>
<author id="gwern">
<firstname>Gwern</firstname>
<surname>Branwen</surname>
<contrib>Documentation writer</contrib>
<email>gwern@gwern.net</email>
</author>
<author id="paolo">
<firstname>Paolo G.</firstname>
<surname>Giarrusso</surname>
<contrib>Contributor</contrib>
<email>p.giarrusso@gmail.com</email>
</author>
<author id="michal">
<firstname>Michal J.</firstname>
<surname>Gajda</surname>
<contrib>Contributor</contrib>
<email>migamake@migamake.com</email>
</author>
</authorgroup>
</articleinfo>
<abstract>
<para>
arbtt is a background daemon that records which windows are open, which one
has the focus and how long it has been since your last action (and possibly
more sources later). It is also a program that will, based on expressive
rules you specify, derive what you were doing, and what for.
</para>
<para>
It is comparable to the window trackers <ulink
url="https://www.rescuetime.com/">RescueTime</ulink>, <ulink
url="https://github.com/gurgeh/selfspy">selfspy</ulink>, <ulink
url="http://www.timesnapper.com/">TimeSnapper</ulink>, and
<ulink url="https://etopian.com/software/automatic-screenshots-windows-mac-linux/">
Productive Peach</ulink>; but it differs from manual time trackers like <ulink
url="http://projecthamster.wordpress.com/about/">Project Hamster</ulink>, which require
the user to type a description of their activities.
<warning><para>The log file might contain very sensitive private data. Make sure
you understand the consequences of a full-time logger and be careful with this
data.</para></warning>
</para>
</abstract>
<sect1 id="installation">
<title>Installation</title>
<sect2>
<title>Obtaining arbtt</title>
<para>
See the project website <ulink url="https://arbtt.nomeata.de/#install">for installation instructions</ulink>.
</para>
</sect2>
<sect2>
<title>Setting up the capture program</title>
<para>
To have arbtt gather useful data, you need to make sure that
<command>arbtt-capture</command> is started with your X session. If you use
GNOME or KDE, you can copy the file
<filename>arbtt-capture.desktop</filename> to
<filename>~/.config/autostart/</filename>. If you did not do a system-wide
installation, you might need to put the full path to
<command>arbtt-capture</command> in the Exec line there.
</para>
<para>
By default, <command>arbtt-capture</command> will save one data sample per
minute. If you want to change that, you can pass <option>--sample-rate
<replaceable>RATE</replaceable></option> to <command>arbtt-capture</command>, where
<replaceable>RATE</replaceable> specifies the sample rate in seconds.
</para>
</sect2>
<sect2>
<title>Building the documentation</title>
<para>
Since you are reading this, you already have the documentation. If you still want to
build it yourself, enter the directory <filename>doc/</filename> and run
<command>make</command> to generate it in HTML and PDF format.
</para>
</sect2>
<sect2>
<title>Development version</title>
<para>
If you want to try the latest unreleased state of the code, or want to
contribute to arbtt, you can fetch the code with
<screen>darcs get <ulink url="http://darcs.nomeata.de/arbtt"/></screen>
</para>
</sect2>
</sect1>
<sect1 id="configuration">
<title>Configuring the arbtt categorizer (<command>arbtt-stats</command>)</title>
<para>
Once <command>arbtt-capture</command> is running, it records data without
any configuration. You only need to configure the categorizer to analyze
the recorded data. Every time the categorizer
(<command>arbtt-stats</command>) runs, it applies the categorization rules to all
recorded data and tags it accordingly. Thus, if you improve your
categorization rules later, they will also apply to all previous data
samples!
</para>
<sect2>
<title>Configuration example</title>
<para>
The configuration file needs to be placed in
<filename>~/.arbtt/categorize.cfg</filename>. An
example is included in the source distribution, and it is reproduced here:
see <xref linkend="catex"/>.
It should be more enlightening than a formal description.
</para>
<example id="catex">
<title><filename>categorize.cfg</filename></title>
<programlisting><xi:include href="../categorize.cfg" parse="text"
xmlns:xi="http://www.w3.org/2001/XInclude" /></programlisting>
</example>
</sect2>
<sect2>
<title>The semantics (informal)</title>
<para>
A data sample consists of the time of recording, the time passed since the
user’s last action, the name of the current workspace and the list of
windows. For each window this information is available:
<itemizedlist>
<listitem><simpara>the window title</simpara></listitem>
<listitem><simpara>the program name</simpara></listitem>
<listitem><simpara>the desktop (workspace) the window belongs to</simpara></listitem>
<listitem><simpara>whether the window was the active window</simpara></listitem>
<listitem><simpara>whether the window is hidden (minimized, on another desktop, unmapped, …)</simpara></listitem>
</itemizedlist>
Based on this information and on the rules in
<filename>categorize.cfg</filename>, the categorizer
(<command>arbtt-stats</command>) assigns <emphasis>tags</emphasis> to
each sample.
</para>
<para>
A simple rule consists of a condition followed by an arrow
(<literal>==></literal>) and a tag expression
(the <literal>tag</literal> keyword followed by a tag name).
Rules are separated from each other by commas (<literal>,</literal>).
</para>
<para>
The keyword <literal>tag</literal>, usually preceded by a condition,
assigns a <emphasis>tag</emphasis> to the sample; it is followed by a tag
name (any sequence of alphanumeric characters, underscores and hyphens).
If the tag name contains a colon (<literal>:</literal>), the part of the
name before the colon is considered to be the tag's
<emphasis>category</emphasis>.
</para>
<para>
For example, this rule
<programlisting>month $date == 1 ==> tag month:January,</programlisting>
assigns, if its condition holds, the tag <literal>January</literal> in the
category <literal>month</literal>.
</para>
<para>If the tag has a <emphasis>category</emphasis>, it will only be
assigned if no other tag of that category has been assigned. This means
that for each sample and each category, there can be at most one tag
in that category. Tags can contain references to group matches in the
regular expressions used in conditions (<literal>$1</literal>,
<literal>$2</literal>, ...). Tags can also reference some
variables such as the window title (<literal>$current.title</literal>) or
the program name (<literal>$current.program</literal>).
</para>
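<para>
The following sketch illustrates the one-tag-per-category behaviour. It assumes
that rules are tried in the order in which they appear in the file; the program
name <literal>emacs</literal> and the tag names are hypothetical. A sample whose
active window is Emacs with <quote>arbtt</quote> in its title receives
<literal>Project:arbtt</literal> from the first rule; the second rule also
matches, but since the category <literal>Project</literal> already has a tag,
it assigns nothing. Other Emacs windows fall through to the second rule.
</para>
<programlisting>
current window $title =~ /arbtt/ ==> tag Project:arbtt,
current window $program == "emacs" ==> tag Project:other,
</programlisting>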
<para>
The variable <literal>$idle</literal> contains the idle time of the user,
measured in seconds. Usually, it is used to assign the tag
<literal>inactive</literal>, which is handled specially by
<command>arbtt-stats</command>, as can be seen in <xref linkend="catex"/>.
</para>
<para>
When applying the rules, the categorizer has a notion of
the <emphasis>window in scope</emphasis>, and the variables
<literal>$title</literal>, <literal>$program</literal>,
<literal>$wdesktop</literal>, <literal>$active</literal> and
<literal>$hidden</literal> always refer to the window in scope.
By default, no window is in scope. Conditions using these variables should
be prefixed with either <literal>current window</literal> or <literal>any
window</literal> to define the scope of these variables.
</para>
<para>
The name of the current desktop (or workspace) is available as
<literal>$desktop</literal>.
</para>
<para>
For <literal>current window</literal>, the currently active window is in
scope. If there is no such window, the condition is false.
</para>
<para>
For <literal>any window</literal>, the condition is applied to each
window in turn, and if any of the windows matches, the result is true. If
more than one window matches, it is undefined from which match the
variables <literal>$1</literal>, ... are taken (see more about regular
expressions below).
</para>
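<para>
The difference between the two scopes can be seen in this sketch; the program
names <literal>urxvt</literal> and <literal>rhythmbox</literal> and the desktop
name <literal>mail</literal> are only assumptions about what your system
reports. The first rule looks only at the focused window, while the second
succeeds whenever some window, focused or not, belongs to the music player and
is not hidden. The last rule uses <literal>$desktop</literal>, which needs no
window scope.
</para>
<programlisting>
current window $program == "urxvt" ==> tag Program:terminal,
any window ($program == "rhythmbox" &amp;&amp; !$hidden) ==> tag music-visible,
$desktop == "mail" ==> tag Desktop:mail,
</programlisting>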
<para>
The variable <literal>$time</literal> refers to the time-of-day of the
sample (i.e. the time since 00:00 that day, local time), while
<literal>$sampleage</literal> refers to the
time span from when the sample was recorded until now, the time of
evaluating the statistics. The latter variable is especially useful when
passed to the <option>--filter</option> option of
<command>arbtt-stats</command>. Both can be compared with literals
of the form "hh:mm", for example
<programlisting>$time >= 8:00 &amp;&amp; $time &lt; 12:00 ==> tag time-of-day:morning</programlisting>
</para>
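<para>
Extending the example above, a complete set of time-of-day tags might look as
follows; this is only a sketch, and the boundaries and tag names are arbitrary.
Because <literal>$time</literal> wraps around at midnight, the night range
needs <literal>||</literal> rather than <literal>&amp;&amp;</literal>. The last
rule tags every sample recorded during the past 24 hours.
</para>
<programlisting>
$time >=  8:00 &amp;&amp; $time &lt; 12:00 ==> tag time-of-day:morning,
$time >= 12:00 &amp;&amp; $time &lt; 18:00 ==> tag time-of-day:afternoon,
$time >= 18:00 &amp;&amp; $time &lt; 23:00 ==> tag time-of-day:evening,
$time >= 23:00 || $time &lt;  8:00 ==> tag time-of-day:night,
$sampleage &lt;= 24:00 ==> tag last-day,
</programlisting>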
<para>
The variable <literal>$date</literal> refers to the date and time of the
recorded sample. It can be compared with date literals in the form
YYYY-MM-DD, which stand for midnight at the start of that day. So
<programlisting>$date == 2001-01-01</programlisting> will not do what you want, but
<programlisting>$date >= 2001-01-01 &amp;&amp; $date &lt;= 2001-01-02</programlisting>
would. All dates are evaluated in local time.
</para>
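<para>
Since a date literal stands for midnight, a whole period is most easily
described as a half-open range. In this sketch (the period and tag name are
arbitrary), the rule covers every sample from the start of 2014 up to, but
excluding, midnight at the start of 2015.
</para>
<programlisting>
$date >= 2014-01-01 &amp;&amp; $date &lt; 2015-01-01 ==> tag period:2014,
</programlisting>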
<para>
The expression <literal>format $date</literal> evaluates to a string with
the date formatted according to ISO 8601, i.e. as "YYYY-MM-DD". For example,
the 19th of March 2010 is formatted as "2010-03-19". Formatted dates can be
compared to strings, which may be useful to tag particular dates or date
ranges. But note that this is a rather expensive operation that can slow
down your data processing.
</para>
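<para>
A sketch of tagging particular days via <literal>format $date</literal>; the
dates and tag names are only examples. The second rule compares against a list
of strings and succeeds if the formatted date equals any of them.
</para>
<programlisting>
format $date == "2010-03-19" ==> tag event:conference,
format $date == ["2010-12-24", "2010-12-25", "2010-12-26"] ==> tag event:holidays,
</programlisting>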
<para>
The expression <literal>month $date</literal> evaluates to an integer from 1
to 12, corresponding to the month number. The expression <literal>year
$date</literal> evaluates to the year number.
The expression <literal>day of month $date</literal> evaluates to an integer
from 1 to 31, corresponding to the day of the month.
The expression <literal>day of week $date</literal> evaluates to an integer
from 1 to 7, corresponding to the day of the week, where Monday is 1 and
Sunday is 7.
The expression <literal>week of year $date</literal> evaluates to an integer
from 0 to 53, corresponding to the week of the year; January 1 falls in week 0.
All of these expressions are integers, and can be combined and compared as such.
</para>
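<para>
A sketch of calendar-based tags built from these expressions (the tag names
are arbitrary): the first two rules split the week into workdays and the
weekend, and the third tags the first seven days of each month.
</para>
<programlisting>
day of week $date &lt;= 5 ==> tag week-part:workday,
day of week $date >= 6 ==> tag week-part:weekend,
day of month $date &lt;= 7 ==> tag first-week-of-month,
</programlisting>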
<para>
Expressions can be compared to literal values with the <literal>==</literal>
(equal), <literal>/=</literal> (not equal), <literal>&lt;</literal>,
<literal>&lt;=</literal>, <literal>>=</literal> and
<literal>></literal> operators. String expressions
(such as <literal>$program</literal> and <literal>$title</literal>) can be matched
against regular expressions with the <literal>=~</literal> operator. With these
operators, the right-hand side can also be a comma-separated list of
literals enclosed in square brackets (<literal>[</literal>
<emphasis>...</emphasis>, <emphasis>...</emphasis>, <literal>]</literal>); the
comparison then succeeds if it succeeds for any of them.
</para>
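<para>
As a sketch of the list form, the following rule puts several browsers into a
single tag. The program names are assumptions; check what your browsers
actually report before relying on them.
</para>
<programlisting>
current window $program == ["Navigator", "chromium", "epiphany"] ==> tag Program:browser,
</programlisting>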
<para>
Integer expressions can be combined with the <literal>+</literal>
(addition), <literal>-</literal> (subtraction) and <literal>*</literal>
(multiplication) operators.
</para>
<para>Regular expressions are written either between slashes
(<literal>/</literal> regular expression <literal>/</literal>),
or after a letter <literal>m</literal> followed by any symbol
(<literal>m</literal> <emphasis>c</emphasis> regular expression <emphasis>c</emphasis>, where <emphasis>c</emphasis> is any symbol).
The second appearance of that symbol ends the expression.
You can find both variants in <xref linkend="catex"/>.
</para>
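<para>
Both notations side by side, in a sketch with hypothetical paths and tag
names: the slash form suits patterns without slashes, while the
<literal>m!...!</literal> form avoids escaping the slashes of a directory name.
</para>
<programlisting>
current window $title =~ /\.hs/ ==> tag Language:Haskell,
current window $title =~ m!/home/[^/]+/thesis/! ==> tag Project:thesis,
</programlisting>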
<para>Complex conditions may be constructed from simpler ones
using the Boolean AND (<literal>&amp;&amp;</literal>), OR
(<literal>||</literal>) and NOT (<literal>!</literal>) operators and
parentheses.
</para>
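<para>
A sketch combining these operators (the program names and the 60-second
threshold are assumptions): it tags time spent in either of two editors while
the user is not idle, unless the window title contains <literal>.tex</literal>.
</para>
<programlisting>
current window (($program == "gvim" || $program == "emacs") &amp;&amp; !($title =~ /\.tex/))
  &amp;&amp; $idle &lt; 60
  ==> tag Activity:programming,
</programlisting>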
<para>
You can define short-hand names for conditions using
<literal>condition</literal>:
<programlisting>
condition arbtt = current window $title =~ m/arbtt/ in {
$arbtt &amp;&amp; $time &lt; 14:00 ==> tag arbtt-morning,
$arbtt &amp;&amp; $time >= 14:00 ==> tag arbtt-afternoon
}
</programlisting>
Any valid condition can be bound to a name this way, and you can
reference the bound name in rules by prefixing it with a
dollar sign (<literal>$</literal>).
</para>
</sect2>
<sect2>
<title>The syntax</title>
<para>
<filename>categorize.cfg</filename> is a plain text file.
Whitespace is insignificant and Haskell-style comments are allowed.
A formal grammar is provided in <xref linkend="grammar"/>.
</para>
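<para>
For example, both kinds of Haskell comments can be used to annotate rules or
to disable them temporarily; the rules themselves are only a sketch.
</para>
<programlisting>
-- A line comment runs to the end of the line.
$idle > 60 ==> tag inactive,
{- A block comment can span several lines,
   here disabling a rule:
current window $program == "xterm" ==> tag Program:terminal,
-}
tag Program:$current.program
</programlisting>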
<figure id="grammar">
<title>The formal grammar of <filename>categorize.cfg</filename></title>
<productionset>
<production id="g-rules">
<lhs>Rules</lhs>
<rhs>
[ <nonterminal def="#g-aliasspec"/> ]
<nonterminal def="#g-rule"/> ( (<quote>,</quote>
<nonterminal def="#g-rule"/>)* | ( <quote>;</quote>
<nonterminal def="#g-rule"/>)* )
</rhs>
</production>
<production id="g-aliasspec">
<lhs>AliasSpec</lhs>
<rhs><quote>aliases</quote> <quote>(</quote> <nonterminal
def="#g-alias"/> (<quote>,</quote> <nonterminal def="#g-alias"/>)*
<quote>)</quote> </rhs>
</production>
<production id="g-alias">
<lhs>Alias</lhs>
<rhs>Literal <quote>-></quote> Literal</rhs>
</production>
<production id="g-rule">
<lhs>Rule</lhs>
<rhs><quote>{</quote> <nonterminal def="#g-rules"/>
<quote>}</quote>
</rhs>
<rhs>
<nonterminal def="#g-cond"/> <quote>==></quote>
<nonterminal def="#g-rule"/> | <quote>if</quote>
<nonterminal def="#g-cond"/> <quote>then</quote>
<nonterminal def="#g-rule"/> <quote>else</quote>
<nonterminal def="#g-rule"/>
</rhs>
<rhs>
<quote>tag</quote> <nonterminal def="#g-tag"/>
</rhs>
<rhs><nonterminal def="#g-condition"/></rhs>
</production>
<production id="g-cond">
<lhs>Cond</lhs>
<rhs><quote>(</quote> <nonterminal def="#g-cond"/>
<quote>)</quote>
</rhs>
<rhs><quote>!</quote> <nonterminal def="#g-cond"/> |
<nonterminal def="#g-cond"/> <quote>&amp;&amp;</quote>
<nonterminal def="#g-cond"/> |
<nonterminal def="#g-cond"/> <quote>||</quote> <nonterminal
def="#g-cond"/>
</rhs>
<rhs> <quote>$active</quote> </rhs>
<rhs> <quote>$hidden</quote> </rhs>
<rhs> <nonterminal def="#g-string"/> <nonterminal def="#g-cmpop"/>
<nonterminal def="#g-string"/> </rhs>
<rhs> <nonterminal def="#g-string"/> <nonterminal def="#g-cmpop"/>
<quote>[</quote> <nonterminal def="#g-listofstring"/>
<quote>]</quote>
</rhs>
<rhs> <nonterminal def="#g-string"/> <quote>=~</quote>
<nonterminal def="#g-regex"/></rhs>
<rhs> <nonterminal def="#g-string"/> <quote>=~</quote>
<quote>[</quote> <nonterminal def="#g-listofregex"/>
<quote>]</quote>
</rhs>
<rhs> <nonterminal def="#g-number"/> <nonterminal def="#g-cmpop"/>
<nonterminal def="#g-number"/> </rhs>
<rhs> <nonterminal def="#g-timediff"/> <nonterminal def="#g-cmpop"/>
<nonterminal def="#g-timediff"/> </rhs>
<rhs> <nonterminal def="#g-date"/> <nonterminal def="#g-cmpop"/>
<nonterminal def="#g-date"/> </rhs>
<rhs> <quote>current window</quote> <nonterminal def="#g-cond"/> </rhs>
<rhs> <quote>any window</quote> <nonterminal def="#g-cond"/> </rhs>
<rhs> <quote>$</quote> Literal </rhs>
</production>
<production id="g-string">
<lhs>String</lhs>
<rhs> <quote>$title</quote> </rhs>
<rhs> <quote>$program</quote> </rhs>
<rhs> <quote>$wdesktop</quote> </rhs>
<rhs> <quote>$desktop</quote> </rhs>
<rhs> <quote>format</quote> <nonterminal def="#g-date" /> </rhs>
<rhs> <quote>"</quote> string literal <quote>"</quote> </rhs>
</production>
<production id="g-listofstring">
<lhs>
ListOfString</lhs>
<rhs> <quote>"</quote> string literal <quote>"</quote> </rhs>
<rhs> <quote>"</quote> string literal <quote>"</quote> , <nonterminal def="#g-listofstring"/> </rhs>
</production>
<production id="g-number">
<lhs>Number</lhs>
<rhs> <quote>$idle</quote> </rhs>
<rhs> <quote>day of week</quote> <nonterminal def="#g-date" /> </rhs>
<rhs> <quote>day of month</quote> <nonterminal def="#g-date" /> </rhs>
<rhs> <quote>week of year</quote> <nonterminal def="#g-date" /> </rhs>
<rhs> <quote>month</quote> <nonterminal def="#g-date" /> </rhs>
<rhs> <quote>year</quote> <nonterminal def="#g-date" /> </rhs>
<rhs> <nonterminal def="#g-number"/> <nonterminal def="#g-mathop"/>
<nonterminal def="#g-number"/> </rhs>
<rhs> number literal </rhs>
</production>
<production id="g-date">
<lhs>Date</lhs>
<rhs> <quote>$date</quote> </rhs>
<rhs> <quote>$now</quote> </rhs>
</production>
<production id="g-timediff">
<lhs>TimeDiff</lhs>
<rhs> <quote>$time</quote> </rhs>
<rhs> <quote>$sampleage</quote> </rhs>
<!-- <rhs> <nonterminal def="#g-date"/> <quote>-</quote> <nonterminal def="#g-date"/></rhs> -->
<rhs>( Digit )* Digit <quote>:</quote> Digit Digit</rhs>
</production>
<production id="g-tag">
<lhs>Tag</lhs>
<rhs> [ Literal <quote>:</quote> ] Literal </rhs>
</production>
<production id="g-regex">
<lhs>RegEx</lhs>
<rhs> <quote>/</quote> Literal <quote>/</quote> |
<quote>m</quote> <replaceable>c</replaceable> Literal
<replaceable>c</replaceable><lineannotation>Where
<replaceable>c</replaceable> can be any
character.</lineannotation> </rhs>
</production>
<production id="g-listofregex">
<lhs>ListOfRegex</lhs>
<rhs> <quote>"</quote> <nonterminal def="#g-regex"/> <quote>"</quote> </rhs>
<rhs> <quote>"</quote> <nonterminal def="#g-regex"/> <quote>"</quote> , <nonterminal def="#g-listofregex"/> </rhs>
</production>
<production id="g-cmpop">
<lhs>CmpOp</lhs>
<rhs><quote>&lt;=</quote> | <quote>&lt;</quote> |
<quote>==</quote> | <quote>!=</quote>
| <quote>></quote> | <quote>>=</quote></rhs>
</production>
<production id="g-mathop">
<lhs>MathOp</lhs>
<rhs><quote>+</quote> | <quote>-</quote> |
<quote>*</quote></rhs>
</production>
<production id="g-condition">
<lhs>ConditionBinding</lhs>
<rhs><quote>condition</quote> Literal <quote>=</quote> <nonterminal def="#g-cond"/> <quote>in</quote> <nonterminal def="#g-rule"/></rhs>
</production>
</productionset>
</figure>
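<para>
The grammar also admits two kinds of rules that the informal description above
does not show: a nested block of rules guarded by <literal>==></literal>, and an
<literal>if</literal>/<literal>then</literal>/<literal>else</literal> rule. The
following sketch uses hypothetical program and tag names. Inside the block, the
more specific rule comes first, so that it wins the <literal>Web</literal>
category when both apply.
</para>
<programlisting>
current window $program == "Navigator" ==> {
  current window $title =~ /Wikipedia/ ==> tag Web:reference,
  tag Web:other
},

if current window $program == "urxvt"
  then tag Activity:shell
  else tag Activity:other
</programlisting>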
<para>
A <literal>String</literal> refers to a double-quoted string of
characters, while a <literal>Literal</literal> is not quoted.
<nonterminal def="#g-tag">Tags</nonterminal> may only consist of
letters, dashes and underscores, or variable interpolations. A tag may
optionally be prefixed with a category, separated by a colon. The
category itself follows the same lexical rules as the tag. A variable
interpolation can be one of the following:
<variablelist>
<varlistentry>
<term><literal>$1</literal>, <literal>$2</literal>,...</term>
<listitem><simpara> will be replaced by the respective group in the last
successfully applied regular expression in the conditions enclosing the
current rule.
</simpara></listitem>
</varlistentry>
<varlistentry>
<term><literal>$current.title</literal></term>
<term><literal>$current.program</literal></term>
<listitem><simpara> will be replaced by the title of the currently active
window, or by the name of the currently active program, respectively.
If no window happens to be active, this tag will be ignored.
</simpara></listitem>
</varlistentry>
</variablelist>
</para>
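<para>
A sketch using both kinds of interpolation; the directory layout is
hypothetical. The first rule tags the sample with the name of the project
directory captured by the first group of the regular expression, and the
second records the name of the active program under the category
<literal>Program</literal>.
</para>
<programlisting>
current window $title =~ m!/home/[^/]+/src/([^/]+)! ==> tag Project:$1,
tag Program:$current.program,
</programlisting>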
<para>
A regular expression is, as in Perl, either enclosed in forward
slashes or, alternatively, in any character of your choice with an
<literal>m</literal> (for <quote>match</quote>) in front. The latter is handy if you need
to write regular expressions that match directory names, which contain slashes.
Otherwise, the syntax of the regular expressions is that of Perl-compatible
regular expressions.
</para>
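<para>
Since the syntax is that of Perl-compatible regular expressions, features such
as non-capturing groups and inline flags should be available. In this sketch
(paths and tag names are hypothetical), <literal>(?:...)</literal> groups the
alternatives without affecting the numbering of <literal>$1</literal>, and
<literal>(?i)</literal> makes the match case-insensitive.
</para>
<programlisting>
current window $title =~ m!(?:~|/home/[^/]+)/projects/([^/]+)! ==> tag Project:$1,
current window $title =~ /(?i)wikipedia/ ==> tag Web:reference,
</programlisting>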
</sect2>
</sect1>
<sect1 id="effective-use">
<title>Effective Use of Arbtt</title>
<para>
Now that the syntax has been described and the toolbox laid out,
how do you practically go about using and configuring arbtt?
</para>
<sect2>
<title>Enabling data collection</title>
<para>
After installing arbtt, you need to configure it to run. There
are many ways you can run the <literal>arbtt-capture</literal>
daemon. One standard way is to include the command
<programlisting>
arbtt-capture &amp;
</programlisting>
in your desktop environment's startup script, e.g.
<filename>~/.xinitrc</filename> or similar.
</para>
<para>
Another option is to add it as a <ulink
url="https://en.wikipedia.org/wiki/Cron"><literal>cron</literal></ulink>
job. To do so, edit your crontab file (<literal>crontab -e</literal>) and
add a line like this:
</para>
<programlisting>
DISPLAY=:0
@reboot arbtt-capture --logfile=/home/username/doc/arbtt/capture.log
</programlisting>
<para>
At boot, <literal>arbtt-capture</literal> will be run in the
background and will capture a snapshot of the X metadata of the
open windows every 60 seconds (the default). If you want more
fine-grained time data at the expense of doubling storage use,
you could halve the sampling interval with an option like
<literal>--sample-rate=30</literal>. To be resilient to errors
or segfaults, you could also wrap it in an infinite loop that restarts
the daemon should it ever crash, with a command like
</para>
<programlisting>
DISPLAY=:0
@reboot while true; do arbtt-capture --sample-rate=30; sleep 1m; done
</programlisting>
</sect2>
<sect2>
<title>Checking data availability</title>
<para>
arbtt tracks <ulink url="https://en.wikipedia.org/wiki/X_Window_System_protocols_and_architecture#Attributes_and_properties"
>X</ulink> properties like window title, class, and running
program, and you write rules to classify those strings as
you wish; but this assumes that the necessary data is present in
those properties.
</para>
<para>
For some programs, this is the case. For example, web browsers
like Firefox typically set the X title to the
HTML <literal>&lt;title&gt;</literal> element of the web page in the
currently-focused tab, which is enough for classification.
</para>
<para>
Some programs have title-setting available as plugins. The IRC client <ulink url="http://www.irssi.org/">irssi</ulink>
in a GNU screen or X terminal usually sets the title to just "<literal>irssi</literal>",
which blocks more accurate time-classification based on IRC channel (one channel may be for
recreation, another for programming, and yet another for work), but can be easily configured
to set the title using the extension
<ulink url="http://scripts.irssi.org/scripts/title.pl"><literal>title.pl</literal></ulink>.
</para>
<para>
Some programs do not set titles or class, and all arbtt sees is
empty strings like <literal>""</literal>; or they may set the title/class
to a constant like <literal>"Liferea"</literal>, which may be acceptable if
that program is used for only one purpose, but if it is used for
many purposes, then you cannot write a rule matching it without
producing highly misleading time analyses. (For example, a web
browser may be used for countless purposes, ranging from work to
research to music to writing to programming; but if the web
browser's title/class were always just <literal>"Web browser"</literal>,
how would you classify 5 hours spent using the web browser? If the
5 hours are classified as any or all of those purposes, then the
results will be misleading garbage: you probably did not spend 5
hours just listening to music, but a mixture of those purposes,
which changes from day to day.)
</para>
<para>
You should check for such problematic programs as soon as you start
using arbtt. It would be unfortunate if you were to log for a few
months, go back for a detailed report for some reason, and
discover that the necessary data was never available for
arbtt to log!
</para>
<para>
Such programs can sometimes be configured to set a more informative
title; otherwise, you can file a bug report with the maintainers, or
set their titles externally with
<ulink url="https://en.wikipedia.org/wiki/Wmctrl"><literal>wmctrl</literal></ulink>
or
<ulink url="http://jonisalonen.com/2014/setting-x11-window-properties-with-xprop/"><literal>xprop</literal></ulink>.
</para>
<sect3>
<title><literal>xprop</literal></title>
<para>
You can check the X properties of a running window by running
the command
<ulink url="http://www.xfree86.org/current/xprop.1.html"><literal>xprop</literal></ulink>
and clicking on the window; <literal>xprop</literal> will print
out all the relevant X information. For example, the output for
Emacs might look like this:
</para>
<programlisting>
$ xprop | tail -5
WM_CLASS(STRING) = "emacs", "Emacs"
WM_ICON_NAME(STRING) = "emacs@elan"
_NET_WM_ICON_NAME(UTF8_STRING) = "emacs@elan"
WM_NAME(STRING) = "emacs@elan"
_NET_WM_NAME(UTF8_STRING) = "emacs@elan"
</programlisting>
<para>
This is not very helpful: it does not tell us the filename being
edited, the mode being used, or anything else. You could classify
time spent in Emacs as "programming" or
"writing", but this would be imperfect, especially if
you do both activities regularly. However, Emacs can be
customized by editing <literal>~/.emacs</literal>, and after
some searching with queries like "setting Emacs window
title", the
<ulink url="http://www.emacswiki.org/emacs-en/FrameTitle">Emacs
wiki</ulink> and
<ulink url="https://www.gnu.org/software/emacs/manual/html_node/efaq/Displaying-the-current-file-name-in-the-titlebar.html">manual</ulink>
advise us to put something like this Elisp in our
<literal>.emacs</literal> file:
</para>
<programlisting>
(setq frame-title-format "%f")
</programlisting>
<para>
Now the output looks different:
</para>
<programlisting>
$ xprop | tail -5
WM_CLASS(STRING) = "emacs", "Emacs"
WM_ICON_NAME(STRING) = "/home/gwern/arbtt.page"
_NET_WM_ICON_NAME(UTF8_STRING) = "/home/gwern/arbtt.page"
WM_NAME(STRING) = "/home/gwern/arbtt.page"
_NET_WM_NAME(UTF8_STRING) = "/home/gwern/arbtt.page"
</programlisting>
<para>
With this, we can usefully classify all such time samples as
being “writing”:
</para>
<programlisting>
current window $title == "/home/gwern/arbtt.page" ==> tag Writing,
</programlisting>
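<para>
Matching the exact file name only covers a single document. Since the title
now contains the full path, a more general sketch could classify by file
extension instead. It assumes that arbtt reports the program as
<literal>emacs</literal>; the extensions and tag names are only examples.
</para>
<programlisting>
current window ($program == "emacs" &amp;&amp; $title =~ /\.(page|md|tex)$/) ==> tag Activity:writing,
current window ($program == "emacs" &amp;&amp; $title =~ /\.(hs|py|c)$/) ==> tag Activity:programming,
</programlisting>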
<para>
Another common gap is terminals/shells: they often do not
include information in the title like the current working
directory or last shell command. For example, urxvt/Bash:
</para>
<programlisting>
WM_COMMAND(STRING) = { "urxvt" }
_NET_WM_ICON_NAME(UTF8_STRING) = "urxvt"
WM_ICON_NAME(STRING) = "urxvt"
_NET_WM_NAME(UTF8_STRING) = "urxvt"
WM_NAME(STRING) = "urxvt"
</programlisting>
<para>
Programmers may spend many hours in the shell doing a variety of
things (like Emacs), so this is a problem. Fortunately, this is
also solvable by customizing one's <literal>.bashrc</literal> to
make Bash emit, before each command, an escape code interpreted by the
terminal (baroque, but it works). The following will include the
working directory and the last command, the latter preceded by a timestamp
if <literal>HISTTIMEFORMAT</literal> is set:
</para>
<programlisting>
trap 'echo -ne "\033]2;$(pwd); $(history 1 | sed "s/^[ ]*[0-9]*[ ]*//g")\007"' DEBUG
</programlisting>
<para>
Now the urxvt samples are useful:
</para>
<programlisting>
_NET_WM_NAME(UTF8_STRING) = "/home/gwern/wiki; 2014-09-03 13:39:32 arbtt-stats --help"
</programlisting>
<para>
Some distributions (e.g. Debian) already provide the relevant
configuration for this to happen. If it does not work for you, you can try to add
<programlisting>. /etc/profile.d/vte.sh</programlisting>
to your <filename>~/.bashrc</filename>.
</para>
<para>
A rule could classify based on the directory you are working in,
the command you ran, or both. Other shells like zsh can be
configured this way too, but the exact command may differ; you
will need to research and experiment.
</para>
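<para>
As an illustration of classifying on the directory, the command,
or both, the following Python sketch splits a title produced by
the trap above back into its two parts. The helper names and the
tag choices are hypothetical, and the sketch assumes the directory
name itself never contains <literal>"; "</literal>. In arbtt itself
you would express the same idea as a regexp on
<literal>$title</literal>.
</para>
<programlisting>
from typing import NamedTuple, Optional


class ShellTitle(NamedTuple):
    directory: str  # output of $(pwd)
    command: str    # history line, including the timestamp if enabled


def parse_shell_title(title: str) -> Optional[ShellTitle]:
    """Split a "DIRECTORY; COMMAND" title; return None for other titles."""
    directory, sep, command = title.partition("; ")
    if not sep:
        return None  # e.g. a plain "urxvt" title
    return ShellTitle(directory, command)


def example_tag(title: str) -> Optional[str]:
    """Hypothetical rule: tag by working directory first, then by command."""
    parsed = parse_shell_title(title)
    if parsed is None:
        return None
    wiki = "/home/gwern/wiki"
    if parsed.directory == wiki or parsed.directory.startswith(wiki + "/"):
        return "Writing"
    if "git" in parsed.command.split():  # any git invocation
        return "Programming"
    return None
</programlisting>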
<para>
Some programs' window titles can be tricky to set. The
<ulink url="http://feh.finalrewind.org/">X image viewer
feh</ulink> has a <literal>--title</literal> option, but it
cannot be set in the configuration file
<literal>.config/feh/themes</literal> because it needs to be
specified dynamically; instead, set up a shell alias or script
that wraps the command, like
<literal>feh --title "$(pwd) / %f / %n"</literal>.
</para>
</sect3>
<sect3>
<title>Raw samples</title>
<para>
<literal>xprop</literal> can be tedious to use on every running
window, and you may forget to check seldom-used programs. A better
approach is to use <literal>arbtt-stats</literal>’s
<literal>--dump-samples</literal> option: it prints the collected
data for the specified time period, allowing you to examine the X
properties en masse. Combined with the
<literal>--exclude=</literal> option, it prints only the
<emphasis>samples not matched by existing rules</emphasis>, which
is indispensable for improving coverage and suggesting ideas for
new rules. A good way to figure out what customizations to make is
to run arbtt as a daemon for a day or so, and then begin examining
the raw samples for problems.
</para>
<example>
<title>An initial configuration session</title>
<para>
An example: suppose I create a simple category file named
<literal>foo</literal> with just the line
</para>
<programlisting>
$idle > 30 ==> tag inactive
</programlisting>
<para>
I can then dump all my arbtt samples for the past day with a
command like this:
</para>
<programlisting>
arbtt-stats --categorizefile=foo --m=0 --filter='$sampleage <24:00' --dump-samples
</programlisting>
<para>
Because there are so many open windows, this produces a large
amount (26586 lines) of hard-to-read output:
</para>
<programlisting>
...
( ) Navigator: /r/Touhou's Favorite Arranges! Part 71: Retribution for the Eternal Night ~ Imperishable Night : touhou - Iceweasel
( ) Navigator: Configuring the arbtt categorizer (arbtt-stats) - Iceweasel
( ) evince: ATTACHMENT02
( ) evince: 2009-geisler.pdf — Heart rate variability predicts self-control in goal pursuit
( ) urxvt: /home/gwern; arbtt-stats --categorizefile=foo --m=0 --filter='$sampleage <24:00' --dump-samples
( ) mnemosyne: Mnemosyne
( ) urxvt: /home/gwern; 2014-09-03 13:11:45 xprop
( ) urxvt: /home/gwern; 2014-09-03 13:42:17 history 1 | cut --delimiter=' ' --fields=5-
( ) urxvt: /home/gwern; 2014-09-03 13:12:21 git log -p .emacs
(*) emacs: emacs@elan
( ) urxvt: /home/gwern/blackmarket-mirrors/silkroad2-forums; 2014-08-31 23:20:10 mv /home/gwern/cookies.txt ./; http_proxy="localhost:8118" wget...
( ) urxvt: /home/gwern/blackmarket-mirrors/agora; 2014-08-31 23:15:50 mv /home/gwern/cookies.txt ./; http_proxy="localhost:8118" wget --mirror ...
( ) urxvt: /home/gwern/blackmarket-mirrors/evolution-forums; 2014-08-31 23:04:10 mv ~/cookies.txt ./; http_proxy="localhost:8118" wget --mirror ...
( ) puddletag: puddletag: /home/gwern/music
</programlisting>
<para>
Active windows are marked with an asterisk, so I can focus and
simplify by adding a pipe like <literal>| fgrep '(*)'</literal>,
producing more manageable output like
</para>
<programlisting>
(*) urxvt: irssi
(*) urxvt: irssi
(*) urxvt: irssi
(*) Navigator: Pyramid of Technology - NextNature.net - Iceweasel
(*) Navigator: Search results - gwern0@gmail.com - Gmail - Iceweasel
(*) Navigator: [New comment] The Wrong Path - gwern0@gmail.com - Gmail - Iceweasel
(*) Navigator: Iceweasel
(*) Navigator: Litecoin Exchange Rate - $4.83 USD - litecoinexchangerate.org - Iceweasel
(*) Navigator: PredictionBook: LiteCoin will trade at >=10 USD per ltc in 2 years, - Iceweasel
(*) urxvt: irssi
(*) Navigator: Bug#691547 closed by Mikhail Gusarov <dottedmag@dottedmag.net> (Re: s3cmd: Man page: --default-mime-type documentation incomplete...)
(*) Navigator: Bug#691547 closed by Mikhail Gusarov <dottedmag@dottedmag.net> (Re: s3cmd: Man page: --default-mime-type documentation incomplete...)
(*) Navigator: Bug#691547 closed by Mikhail Gusarov <dottedmag@dottedmag.net> (Re: s3cmd: Man page: --default-mime-type documentation incomplete...)
(*) urxvt: /home/gwern; 2014-09-02 14:25:17 man s3cmd
(*) evince: bayesiancausality.pdf
(*) evince: bayesiancausality.pdf
(*) puddletag: puddletag: /home/gwern/music
(*) puddletag: puddletag: /home/gwern/music
(*) evince: bayesiancausality.pdf
(*) Navigator: ▶ Umineko no Naku Koro ni Music Box 4 - オルガン小曲 第2億番 ハ短調 - YouTube - Iceweasel
...
</programlisting>
<para>
This is better. We can see a few things: the windows now all
produce enough information to be usefully classified (Gmail can
be classified under email, irssi as IRC, the urxvt usage clearly
as programming, the PDF being read as statistics, etc.), in part
because of the customizations to bash/urxvt. The duplication
still impedes focus, and we don't know what's most common. We can
use another pipeline to sort, count duplicates, and sort by
number of duplicates
(<literal>| sort | uniq --count | sort --general-numeric-sort</literal>),
yielding:
</para>
<programlisting>
...
14 (*) Navigator: A Bluer Shade of White Chapter 4, a frozen fanfic | FanFiction - Iceweasel
14 (*) Navigator: Iceweasel
15 (*) evince: 2009-geisler.pdf — Heart rate variability predicts self-control in goal pursuit
15 (*) Navigator: Tool use by animals - Wikipedia, the free encyclopedia - Iceweasel
16 (*) Navigator: Hacker News | Add Comment - Iceweasel
17 (*) evince: bayesiancausality.pdf
17 (*) Navigator: Comments - Less Wrong Discussion - Iceweasel
17 (*) Navigator: Keith Gessen · Why not kill them all?: In Donetsk · LRB 11 September 2014 - Iceweasel
17 (*) Navigator: Notes on the Celebrity Data Theft | Hacker News - Iceweasel
18 (*) Navigator: A Bluer Shade of White Chapter 1, a frozen fanfic | FanFiction - Iceweasel
19 (*) gl: mplayer2
19 (*) Navigator: Neural networks and deep learning - Iceweasel
20 (*) Navigator: Harry Potter and the Philosopher's Zombie, a harry potter fanfic | FanFiction - Iceweasel
20 (*) Navigator: [OBNYC] Time tracking app - gwern0@gmail.com - Gmail - Iceweasel
25 (*) evince: ps2007.pdf — untitled
35 (*) emacs: /home/gwern/arbtt.page
43 (*) Navigator: CCC comments on The Octopus, the Dolphin and Us: a Great Filter tale - Less Wrong - Iceweasel
62 (*) evince: The physics of information processing superobjects - Anders Sandberg - 1999.pdf — Brains2
69 (*) liferea: Liferea
82 (*) evince: BMS_raftery.pdf — untitled
84 (*) emacs: emacs@elan
87 (*) Navigator: overview for gwern - Iceweasel
109 (*) puddletag: puddletag: /home/gwern/music
150 (*) urxvt: irssi
</programlisting>
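<para>
If you prefer to post-process the dump in a script rather than a
shell pipeline, the following Python sketch approximates
<literal>fgrep '(*)' | sort | uniq --count | sort --general-numeric-sort</literal>.
It is a hypothetical helper, not part of arbtt, and assumes each
window appears on its own line in the
<literal>(*) program: title</literal> form shown above; ties are
ordered by the line text, which may differ slightly from
<literal>sort</literal> in your locale.
</para>
<programlisting>
from collections import Counter
from typing import Iterable, List, Tuple


def active_counts(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Count identical active-window lines, least common first.

    Roughly: fgrep '(*)' | sort | uniq --count | sort --general-numeric-sort
    """
    # Keep only lines for the active window, marked "(*)"
    active = (line.strip() for line in lines if "(*)" in line)
    counts = Counter(active)
    # Ascending by count; ties broken by the line text
    return sorted((n, text) for text, n in counts.items())
</programlisting>
<para>
To use it on real data, pass it the lines of the dump, for example
<literal>sys.stdin</literal> in a script that sits at the end of
the <literal>arbtt-stats --dump-samples</literal> pipe.
</para>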
<para>
Put this way, we can see what rules we should write: the
activities here fall into a few categories, "Recreation",
"Statistics", "Music", "Email", "IRC", "Research", and "Writing",
so we can add some rules like these to our category file
<literal>foo</literal>:
</para>
<programlisting>
$idle > 30 ==> tag inactive,
current window $title =~ [/.*Hacker News.*/, /.*Less Wrong.*/, /.*overview for gwern.*/, /.*[fF]an[fF]ic.*/, /.* LRB .*/]
|| current window $program == "liferea" ==> tag Recreation,
current window $title =~ [/.*puddletag.*/, /.*mplayer2.*/] ==> tag Music,
current window $title =~ [/.*[bB]ayesian.*/, /.*[nN]eural [nN]etworks.*/, /.*ps2007.pdf.*/, /.*[Rr]aftery.*/] ==> tag Statistics,
current window $title =~ [/.*Wikipedia.*/, /.*Heart rate variability.*/, /.*Anders Sandberg.*/] ==> tag Research,
current window $title =~ [/.*Gmail.*/] ==> tag Email,
current window $title =~ [/.*arbtt.*/] ==> tag Writing,
current window $title == "irssi" ==> tag IRC,
</programlisting>
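<para>
Before rerunning <literal>arbtt-stats</literal>, it can help to
preview which tags a given title would get. The Python sketch
below is a rough, hypothetical emulation of a subset of the title
rules above; it is not arbtt's evaluator, ignores
<literal>$idle</literal> and <literal>$program</literal>, assumes
that every matching rule contributes its tag (so a title can
receive several), and uses Python's <literal>re</literal> module,
whose regexp dialect may differ from arbtt's in edge cases.
</para>
<programlisting>
import re
from typing import List, Set, Tuple

# Hypothetical subset of the rules above: (list of regexps, tag)
Rule = Tuple[List[str], str]
RULES: List[Rule] = [
    ([r".*Hacker News.*", r".*Less Wrong.*", r".*[fF]an[fF]ic.*"], "Recreation"),
    ([r".*puddletag.*", r".*mplayer2.*"], "Music"),
    ([r".*[bB]ayesian.*", r".*[Rr]aftery.*"], "Statistics"),
    ([r".*Gmail.*"], "Email"),
    ([r".*arbtt.*"], "Writing"),
]


def tags_for(title: str, rules: List[Rule] = RULES) -> Set[str]:
    """Collect the tag of every rule with at least one matching regexp."""
    tags = set()
    for patterns, tag in rules:
        if any(re.search(p, title) for p in patterns):
            tags.add(tag)
    if title == "irssi":  # the exact-match rule: $title == "irssi"
        tags.add("IRC")
    return tags
</programlisting>
<para>
For example, <literal>tags_for("bayesiancausality.pdf")</literal>
returns <literal>{"Statistics"}</literal>, while a title no rule
covers yields an empty set, which marks it as a candidate for the
next round.
</para>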
<para>
Rerunning the same command would produce the same output, so we
need to leverage our new rules and <emphasis>exclude</emphasis>
any samples matching our current tags. Now we run a command like:
</para>
<programlisting>
arbtt-stats --categorizefile=foo --filter='$sampleage <24:00' --dump-samples \
  --exclude=Recreation --exclude=Music --exclude=Statistics \
  --exclude=Research --exclude=Email --exclude=Writing --exclude=IRC |
  fgrep '(*)' | sort | uniq --count | sort --general-numeric-sort
</programlisting>
<para>
Now the previous samples disappear, leaving us with a fresh
batch of unclassified samples to work with:
</para>
<programlisting>
9 (*) Navigator: New Web Order > Nik Cubrilovic - - Notes on the Celebrity Data Theft - Iceweasel
9 ( ) urxvt: /home/gwern; arbtt-stats --categorizefile=foo --filter='$sampleage <24:00' --dump-samples | fgrep '(*)' | less
10 (*) evince: ATTACHMENT02
10 (*) Navigator: These Giant Copper Orbs Show Just How Much Metal Comes From a Mine | Design | WIRED - Iceweasel
12 (*) evince: [Jon_Elster]_Alchemies_of_the_Mind_Rationality_an(BookFi.org).pdf — Alchemies of the mind
12 (*) Navigator: Morality Quiz/Test your Morals, Values & Ethics - YourMorals.Org - Iceweasel
33 ( ) urxvt: /home/gwern; arbtt-stats --categorizefile=foo --filter='$sampleage <24:00' --dump-samples | fgrep '(*)'...
</programlisting>
<para>
We can add rules categorizing these as 'Recreation',
'Writing', 'Research', 'Recreation', 'Research', 'Writing',
and 'Writing' respectively. At this point we might decide that
'Writing' is becoming overloaded and split it into two tags,
'Writing' and 'Programming'. After adding another
<literal>--exclude=Programming</literal> to our command, we can
repeat the process.
</para>
<para>
As we refine our rules, we will quickly spot instances where the
title/class/program are insufficient to allow accurate
classification, and we will figure out the best collection of
tags for our particular purposes. A few iterations are enough for
most purposes.
</para>
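<para>
One round of this loop can be sketched in Python as follows. The
helper is hypothetical; it assumes you already have each
active-window title paired with the set of tags your rules assign
to it, and it models <literal>--exclude=TAG</literal> as dropping
every sample that carries any excluded tag before counting what is
left.
</para>
<programlisting>
from collections import Counter
from typing import Iterable, List, Set, Tuple


def remaining_after_exclude(
    tagged: Iterable[Tuple[str, Set[str]]], excluded: Set[str]
) -> List[Tuple[int, str]]:
    """Count samples carrying none of the excluded tags, least common first."""
    # A sample survives only if it shares no tag with the excluded set
    counts = Counter(title for title, tags in tagged if tags.isdisjoint(excluded))
    return sorted((n, title) for title, n in counts.items())
</programlisting>
<para>
Each pass grows the excluded set, so the list of survivors shrinks
toward the samples that still need a rule.
</para>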
</example>
</sect3>
</sect2>
<sect2>
<title>Categorizing advice</title>
<para>
When building up rules, a few rules of thumb should be kept in
mind:
</para>
<sect3>
<title>
Categorize by purpose, not by program
</title>
<para>
Categorizing by program leads to misleading time reports. Avoid,
for example, lumping all web browser time into a single category
named 'Internet'; this is more misleading than helpful. Good
categories describe an activity or goal, such as 'Work' or
'Recreation', not a tool, like 'Emacs' or 'Vim'.
</para>
</sect3>
<sect3>
<title>
When in doubt, write narrow rules and generalize later
</title>
<para>
Regexps are tricky, and it is easy to write rules far broader
than intended. Because the <literal>--exclude</literal> filters
hide every sample that carries an excluded tag, you will never see
samples that a rule matches accidentally. If in doubt, take a
specific sample you want to match plus several similar strings,
and check how well your regexp handles them in Emacs's
<ulink url="http://www.emacswiki.org/emacs/ReBuilder">regexp-builder</ulink>
or online regexp testers like
<ulink url="http://regexpal.com/">regexpal</ulink>.
</para>
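<para>
The same kind of check can be scripted. The hypothetical Python
helper below reports which candidate titles a pattern matches,
using Python's <literal>re.search</literal> as a stand-in for
arbtt's regexp matching (the dialects may differ). The candidates
are titles from the sample session above, with the shell command
shortened; they show that the Writing rule
<literal>/.*arbtt.*/</literal> also catches a browser page and the
shell running <literal>arbtt-stats</literal>, while an anchored
pattern (or simply the exact comparison
<literal>$title == "/home/gwern/arbtt.page"</literal> used earlier)
matches only the Emacs buffer.
</para>
<programlisting>
import re
from typing import Dict, List

# Titles from the session above; the shell command is shortened
CANDIDATES: List[str] = [
    "/home/gwern/arbtt.page",
    "Configuring the arbtt categorizer (arbtt-stats) - Iceweasel",
    "/home/gwern; arbtt-stats --dump-samples",
]


def preview(pattern: str, titles: List[str]) -> Dict[str, bool]:
    """Map each title to whether the pattern matches anywhere in it."""
    compiled = re.compile(pattern)
    return {title: compiled.search(title) is not None for title in titles}


if __name__ == "__main__":
    broad = r".*arbtt.*"                   # the Writing rule as written
    narrow = r"^/home/gwern/arbtt\.page$"  # only the Emacs buffer
    for pattern in (broad, narrow):
        print(pattern, preview(pattern, CANDIDATES))
</programlisting>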
</sect3>
<sect3>
<title>
Don't try to classify everything
</title>
<para>
You will never classify 100% of samples: some programs do not
set useful X properties and cannot be fixed, some samples predate
your fixes, and some windows (like popups and dialogues) are too
transient to be worth fixing. Nor is it necessary to classify
100% of your time: as long as the most common programs and, say,
<ulink url="https://en.wikipedia.org/wiki/Pareto_principle">80%</ulink>
of your time are classified, you have most of the value.
It is easy to waste more time tweaking arbtt than you gain
from increased accuracy or more finely-grained tags.
</para>
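<para>
To find which titles are worth writing rules for, a quick
Pareto-style check helps: sort titles by sample count and take the
most common ones until they cover the target share of samples. The
Python sketch below is hypothetical and operates on
<literal>(count, title)</literal> pairs such as those produced by
the <literal>uniq --count</literal> pipeline; the counts in its
usage example are made up.
</para>
<programlisting>
from typing import List, Tuple


def titles_for_coverage(counts: List[Tuple[int, str]], target: float = 0.8) -> List[str]:
    """Most common titles that together cover at least `target` of all samples."""
    total = sum(n for n, _ in counts)
    chosen: List[str] = []
    covered = 0
    for n, title in sorted(counts, reverse=True):  # most common first
        if covered >= target * total:
            break  # target share reached; the rest is the long tail
        chosen.append(title)
        covered += n
    return chosen


if __name__ == "__main__":
    # Made-up counts for illustration only
    counts = [(150, "irssi"), (109, "puddletag"), (87, "Gmail"), (10, "popup"), (5, "dialog")]
    print(titles_for_coverage(counts))
</programlisting>
<para>
With these made-up counts, three titles already cover more than
80% of the samples; the popup and dialog are the kind of long tail
not worth chasing.
</para>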
</sect3>