Code generation human evaluation

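Human evaluation of the code generated by NL2ProcessOps and by Copilot. For each system, the score on each criterion is listed first, followed by the variance of that score.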

NL2ProcessOps

Criterion 1: 4.364285714285714
Criterion 2: 4.535714285714286
Criterion 3: 4.085714285714285
Variance Criterion 1: 0.617295918367347
Variance Criterion 2: 0.8630102040816325
Variance Criterion 3: 1.0069387755102037

Copilot

Criterion 1: 3.807142857142857
Criterion 2: 3.942857142857143
Criterion 3: 4.171428571428572
Variance Criterion 1: 1.3556632653061225
Variance Criterion 2: 1.3395918367346937
Variance Criterion 3: 0.9563265306122448
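
For readers who want to produce a summary in the same shape from their own annotation data, the following is a minimal sketch, not the script used for the numbers above. It assumes that each reported score is the arithmetic mean of the individual ratings and that the variance is the population variance; neither assumption is confirmed by this README. The `ratings` dictionary and the `summarize` helper are hypothetical, and the ratings in it are made-up placeholders, not the study's data.

```python
from statistics import fmean, pvariance

# Hypothetical ratings for illustration only; these are NOT the study's data.
# One list of human scores per criterion.
ratings = {
    "Criterion 1": [5, 4, 4, 3, 5],
    "Criterion 2": [4, 4, 5, 5, 3],
    "Criterion 3": [3, 4, 5, 4, 2],
}


def summarize(scores_by_criterion):
    """Return {criterion: (mean, population variance)} for each criterion.

    Assumption: population variance (divide by n). Swap in
    statistics.variance for the sample variance (divide by n - 1).
    """
    return {
        name: (fmean(scores), pvariance(scores))
        for name, scores in scores_by_criterion.items()
    }


summary = summarize(ratings)

# Print in the same layout as the lists above: scores first, then variances.
for name, (mean, _) in summary.items():
    print(f"{name}: {mean}")
for name, (_, variance) in summary.items():
    print(f"Variance {name}: {variance}")
```

If the original evaluation used the sample variance instead, replacing `pvariance` with `statistics.variance` is the only change needed.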