
HPGN (Hierarchical Pointer-Generator Networks): a source code migration model based on a hierarchical attention mechanism


Xmr-nxbx/HPGN


Source Code Migration Model Based on a Hierarchical Attention Mechanism

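This work builds a source code migration model based on a hierarchical attention mechanism, the Hierarchical Pointer-Generator Network (HPGN). During migration, HPGN attends to both the syntax and the semantics of code statements, which further improves the semantic consistency of the migrated code.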
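HPGN builds on pointer-generator networks, which at each decoding step either generate a token from a fixed vocabulary or copy one from the source code through attention. The pure-Python sketch below illustrates that idea under stated assumptions: the output distribution uses the standard pointer-generator mixture of See et al. (2017), and the "hierarchical" part is approximated by a generic two-level rescaling in which each token's attention is multiplied by the attention of the statement that contains it. The function names, toy scores, and statement grouping are hypothetical; this is not the implementation in `Network/`.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hierarchical_attention(token_scores, statement_scores, statement_of_token):
    """Two-level attention: rescale each token's weight by the weight of its statement.

    ASSUMPTION: generic token/statement rescaling for illustration only;
    not necessarily HPGN's exact formulation.
    """
    token_attn = softmax(token_scores)
    stmt_attn = softmax(statement_scores)
    rescaled = [a * stmt_attn[statement_of_token[i]] for i, a in enumerate(token_attn)]
    total = sum(rescaled)
    return [r / total for r in rescaled]  # renormalize so the weights sum to 1


def final_distribution(p_gen, vocab_dist, attention, source_tokens):
    """Pointer-generator mixture (See et al., 2017):
    P(w) = p_gen * P_vocab(w) + (1 - p_gen) * (attention mass on source positions holding w).
    Source tokens outside the vocabulary (e.g. rare identifiers) can still be copied.
    """
    dist = {w: p_gen * p for w, p in vocab_dist.items()}
    for a, tok in zip(attention, source_tokens):
        dist[tok] = dist.get(tok, 0.0) + (1.0 - p_gen) * a
    return dist


# Hypothetical toy input: two statements, "int s = n ;" and "return s ;".
source_tokens = ["int", "s", "=", "n", ";", "return", "s", ";"]
statement_of_token = [0, 0, 0, 0, 0, 1, 1, 1]
attention = hierarchical_attention(
    token_scores=[0.1, 1.2, 0.0, 2.0, 0.0, 0.5, 1.5, 0.0],
    statement_scores=[0.3, 1.0],
    statement_of_token=statement_of_token,
)
dist = final_distribution(
    p_gen=0.6,
    vocab_dist={"return": 0.5, "int": 0.3, "<unk>": 0.2},
    attention=attention,
    source_tokens=source_tokens,
)
print(max(dist, key=dist.get))  # most probable next token under these toy numbers
```

The copy term is what lets a pointer-generator reproduce identifiers such as `n`, which never appear in the toy vocabulary but still receive probability mass from the source.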

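This paper has been accepted by Application Research of Computers (《计算机应用研究》) and is scheduled for publication between October and November 2023.
Paper title: Source Code Migration Model Based on a Hierarchical Attention Mechanism (基于层次注意力机制的源代码迁移模型)
Authors: Li Zheng, Xu Mingrui, Wu Yonghao, Liu Yong, Chen Xiang, Wu Shumei, Liu Hengyuan (李征 徐明瑞 吴永豪 刘勇 陈翔 武淑美 刘恒源)
The citation format will be added after publication.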

Figure: HPGN model architecture

Figure: residual state-passing mechanism

Figure: gated state-passing mechanism

The project structure is currently being reorganized, and some of the code is still being changed.

HPGN
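│  hpgn_main.py                  Hierarchical pointer-generator network: training
│  metric_result_analysis.py     Metric analysis
│  pgn_main.py                   Pointer-generator network: training
│  readme.md
├─dataset
│  └─Csharp_Java      Data, models, tokenizers, reference outputs
│      ├─dataset        Preprocessed dataset and vocabulary
│      ├─model          Models and sample outputs
│      ├─trained-model  Sample outputs of the models trained in this experiment
│      └─Tokenizers     Tokenizers
├─transformer_result    Transformer outputs
├─evaluator         BLEU, CodeBLEU
│  └─CodeBLEU
└─Network           Pointer-generator network, hierarchical pointer-generator network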
  

Environment

python=3.9
tensorflow=2.7

Evaluation Metrics and Dataset

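  • BLEU: computes the clipped n-gram precisions of the generated sequence against the reference sequence, combines them with a brevity penalty, and returns a score between 0 and 100%. A higher BLEU score means the generated sequence is closer to the reference.
  • Exact Match (EM): checks whether the predicted output is identical to the reference output.
  • CodeBLEU: extends BLEU's n-gram matching with abstract syntax tree (AST) matching and data-flow matching, computed with code parsing tools. CodeBLEU thus evaluates code on its text, syntax, and semantics, and returns a score between 0 and 100%. A higher CodeBLEU score indicates higher-quality generated code.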
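The BLEU and CodeBLEU definitions above can be sketched in a few lines of Python. This is a minimal, unsmoothed sentence-level BLEU plus the weighted-sum form of CodeBLEU; the equal 0.25 weights, the simple regex tokenizer, and all function names are assumptions for illustration. The numbers in the tables come from the scripts in `evaluator/`, which may differ (for example corpus-level aggregation, smoothing, or tokenization), and the AST and data-flow match scores require a real CodeBLEU implementation.

```python
import math
import re
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU on a 0-100 scale, without smoothing.

    Geometric mean of clipped n-gram precisions (n = 1..max_n) times a brevity penalty.
    """
    if not candidate:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand, ref = ngram_counts(candidate, n), ngram_counts(reference, n)
        # Clipping: a candidate n-gram counts at most as often as it occurs in the reference.
        overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
        total = sum(cand.values())
        if overlap == 0 or total == 0:
            return 0.0  # unsmoothed BLEU is 0 when any precision is 0
        log_precision_sum += math.log(overlap / total)
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return 100.0 * brevity_penalty * math.exp(log_precision_sum / max_n)


def codebleu(bleu, weighted_bleu, ast_match, dataflow_match, weights=(0.25, 0.25, 0.25, 0.25)):
    """CodeBLEU as a weighted sum of four component scores on the same scale.

    ASSUMPTION: equal weights; the components (keyword-weighted n-gram match,
    AST match, data-flow match) must come from a real CodeBLEU implementation.
    """
    alpha, beta, gamma, delta = weights
    return alpha * bleu + beta * weighted_bleu + gamma * ast_match + delta * dataflow_match


def tokenize(code):
    # Simplified lexer (assumption): identifiers/numbers and single punctuation characters.
    return re.findall(r"\w+|[^\w\s]", code)


# Example 1 from the case studies: reference vs. Gate-HPGN output.
reference = tokenize("public static WeightedTerm[] GetTerms(Query query){return GetTerms(query, false);}")
gate_hpgn = tokenize("public static WeightedTerm[] GetTerms(Query query)) { return GetTerms(query,false); }")
print(round(sentence_bleu(gate_hpgn, reference), 2))
```

The extra `)` in the Gate-HPGN output lowers the clipped unigram precision, so the score falls below 100 even though the rest of the line matches.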

Dataset: CodeTrans, a Java-C# parallel dataset built from real-world projects.

Experimental Results

HPGN configuration: 64-dimensional hidden layer, 1 network layer

Java to C#

Model                       BLEU   EM     CodeBLEU
Naive                       18.54  0.0    -
PBSMT                       43.53  12.5   42.71
tree-to-tree                36.34  3.4    42.13
Transformer                 58.53  34.4   64.20
Resnet-HPGN                 58.74  19.5   63.08
Gate-HPGN                   61.40  27.3   64.93
Base-HPGN                   59.79  20.7   63.88
Pointer-generator network   26.18  13.8   43.87

C# to Java

Model                       BLEU   EM     CodeBLEU
Naive                       18.69  0.0    -
PBSMT                       40.06  16.1   43.48
tree-to-tree                32.09  4.4    43.86
Transformer                 52.87  34.7   58.56
Resnet-HPGN                 60.28  29.5   64.20
Gate-HPGN                   60.95  32.1   64.62
Base-HPGN                   58.26  30.0   62.35
Pointer-generator network   27.84  20.5   44.88

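Human Evaluation

Scoring criteria

Score  Syntax                                Semantics
5      No syntax errors                      Semantically consistent
4      A few syntax errors                   A little semantic content missing
3      Some syntax errors                    Some semantic content missing, or a little irrelevant semantic content present
2      Many syntax errors                    Much semantic content missing, or some irrelevant semantic content present
1      No recognizable syntactic structure   Semantics completely unrelated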

Results

Model        No.   Java to C#            C# to Java
                   Syntax   Semantics    Syntax   Semantics
Gate-HPGN    1     4.44     4.47         4.49     4.49
             2     4.16     4.47         4.07     4.38
             3     4.66     4.68         4.73     4.74
             4     4.30     4.34         4.67     4.54
             5     4.45     4.50         4.62     4.53
             Avg.  4.40     4.49         4.52     4.54
Transformer  1     4.64     4.21         4.73     4.29
             2     4.28     4.14         4.22     4.16
             3     4.79     4.67         4.83     4.69
             4     4.44     4.12         4.78     4.37
             5     4.64     4.45         4.79     4.35
             Avg.  4.56     4.32         4.67     4.37
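The Avg. rows are the arithmetic means of rows 1 to 5 in each column. The short check below recomputes them from the table values; the variable and function names are illustrative only.

```python
# Human-evaluation scores copied from the table above (rows 1 to 5).
# Column order: Java to C# syntax, Java to C# semantics, C# to Java syntax, C# to Java semantics.
ROWS = {
    "Gate-HPGN": [
        (4.44, 4.47, 4.49, 4.49),
        (4.16, 4.47, 4.07, 4.38),
        (4.66, 4.68, 4.73, 4.74),
        (4.30, 4.34, 4.67, 4.54),
        (4.45, 4.50, 4.62, 4.53),
    ],
    "Transformer": [
        (4.64, 4.21, 4.73, 4.29),
        (4.28, 4.14, 4.22, 4.16),
        (4.79, 4.67, 4.83, 4.69),
        (4.44, 4.12, 4.78, 4.37),
        (4.64, 4.45, 4.79, 4.35),
    ],
}
REPORTED_AVG = {
    "Gate-HPGN": (4.40, 4.49, 4.52, 4.54),
    "Transformer": (4.56, 4.32, 4.67, 4.37),
}


def column_means(rows):
    """Arithmetic mean of each column."""
    return [sum(column) / len(column) for column in zip(*rows)]


for model, rows in ROWS.items():
    recomputed = column_means(rows)
    # Each reported average should equal the recomputed mean to two decimals.
    matches = all(abs(m - r) < 0.005 for m, r in zip(recomputed, REPORTED_AVG[model]))
    print(model, [round(m, 2) for m in recomputed], "matches reported Avg.:", matches)
```

The recomputed means also make the trade-off in the table explicit: Gate-HPGN's average semantics scores are higher than Transformer's in both directions, while its average syntax scores are lower.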

Output Examples

Case study: comparison with baseline models

Example 1
Source code: public static final WeightedTerm[] getTerms(Query query){return getTerms(query,false);}
Reference code: public static WeightedTerm[] GetTerms(Query query){return GetTerms(query, false);}
tree-to-tree: public static WeightedTerm[] identifier(Query query) { return capacity(query, false); }
Transformer: public static WeightedTerm[] GetTerms(Query queryTerm) { return GetTerms(query, queryTerms, Term); }
Gate-HPGN: public static WeightedTerm[] GetTerms(Query query)) { return GetTerms(query,false); }
Example 2
Source code: public long skip(long n){int s = (int) Math.min(available(), Math.max(0, n));ptr += s;return s;}
Reference code: public override long Skip(long n){int s = (int)Math.Min(Available(), Math.Max(0, n)); ptr += s;return s;}
tree-to-tree: public override long skip(long n) { int s = (int) Math.Min(0, n); ptr += n; return s; }
Transformer: public override long Skip(long n) { int s = (int)(MIN_SHIFT); return ((long)((ulong)block >> shift)) & unchecked((int)(0xff)); }
Gate-HPGN: public virtual long skip(long n) { int s = (int)(offsetmin(available(), Math.max(0, n)); ptr += s; return s; }

Case study: comparison with ablation models

Example 3
Source code: public final boolean hasPassedThroughNonGreedyDecision() { return passedThroughNonGreedyDecision; }
Reference code: public bool hasPassedThroughNonGreedyDecision() { return passedThroughNonGreedyDecision; }
Resnet-HPGN: public bool H hasPassedThroughNonGreedyDecision() { return passedThroughNonGreedyDecision; }
Gate-HPGN: public bool hasPassedThroughNonGreedyDecision() { return passedThroughNonGreedyDecision; }
Base-HPGN: public bool HPasassedThroughNonGreedyDecision() { return passedThroughNonGreedyDecision; }
Pointer-generator network: public virtual bool hasPassedThroughNonGreedyDecision() { return passedroughNonGreedyDecision; } } } } } } } } } } }
Example 4
Source code: public UpdateUserRequest(String userName) { setUserName(userName); }
Reference code: public UpdateUserRequest(string userName) { _userName = userName; }
Resnet-HPGN: public UpdateUserRequest(string userName) { _userName =NameName; }
Gate-HPGN: public UpdateUserRequest(string userName) { _userName = userName; }
Base-HPGN: public UpdateUserRequest(string userName) { _userName = groupName; }
Pointer-generator network: public UpdateUserRequest(string userName) { _UserNameuserName); }

Case study: comparison with a code migration tool

Example 5
Source code: public LatvianStemFilterFactory(Map args) { super(args);     if (!args.isEmpty()) { throw new IllegalArgumentException("Unknown parameters: " + args); } }
Reference code: public LatvianStemFilterFactory(IDictionary args): base(args) {     if (args.Count > 0) { throw new System.ArgumentException("Unknown parameters: " + args); } }
Code migration tool: using System.Collections.Generic; //JAVA TO C# CONVERTER WARNING: The following constructor is declared outside of its associated class: //ORIGINAL LINE: public LatvianStemFilterFactory(Map args) public LatvianStemFilterFactory(IDictionary args) : base(args) {     if (args.Count > 0) {throw new System.ArgumentException("Unknown parameters: " + args);} }
Gate-HPGN: public LatvianStemFilterFactory(IDictionary args): base(args) {     if (args.Count > 0) {throw new System.ArgumentException("Unknown parameters: " + args);} }
Example 6
Source code: public PutIntegrationResult putIntegration(PutIntegrationRequest request) {     request = beforeClientExecution(request);     return executePutIntegration(request); }
Reference code: public virtual PutIntegrationResponse PutIntegration(PutIntegrationRequest request) {     var options = new InvokeOptions();     options.RequestMarshaller = PutIntegrationRequestMarshaller.Instance;     options.ResponseUnmarshaller = PutIntegrationResponseUnmarshaller.Instance;     return Invoke(request, options); }
Code migration tool: public virtual PutIntegrationResult putIntegration(PutIntegrationRequest request) {     request = beforeClientExecution(request);     return executePutIntegration(request); }
Gate-HPGN: public virtual PutIntegrationResponse PutIntegration(PutIntegrationRequest request) {     var options = new InvokeOptions();     options.RequestMarshaller = PutIntegrationRequestMarshaller.Instance;     options.ResponseUnmarshaller = PutIntegrationResponseUnmarshaller.Instance;     return Invoke(request, options); }

Case study: erroneous HPGN outputs

Errors: outputs that fail the exact match (EM) criterion

Example 7
Source code: public boolean remove(Object o) { synchronized (mutex) { return delegate().remove(o); } }
Reference code: public virtual bool remove(object @object) { lock (mutex) { return c.remove(@object); } }
Gate-HPGN: public override bool Remove(object o) { lock (mutex) { return } }
Example 8
Source code: public static String toHex(long value) { StringBuilder sb = new StringBuilder(16); writeHex(sb, value, 16, "");     return sb.toString(); }
Reference code: public static string ToHex(int value) { return ToHex((long)value, 8); }
Gate-HPGN: public static string ToHex(long value) { StringBuilder sb = new StringBuilder(16); writeHex(sb, value, 16, ""); return sb.ToString(); }
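Example 8 shows how strict EM is: the Gate-HPGN output closely follows the source method, but the reference is a different method body (an `int` overload delegating to `ToHex((long)value, 8)`), so the prediction scores 0 under EM. How strings are normalized before comparison also matters. The sketch below is a hypothetical comparison of raw-string EM and token-level EM, applied to Examples 5 and 8 as their strings appear in this README; the regex lexer and function names are assumptions, not the evaluator used in this repository.

```python
import re

# ASSUMPTION: a simplified lexer (string literals, identifiers/numbers, single punctuation
# characters); it is not the tokenizer used by this repository's evaluator.
TOKEN_RE = re.compile(r'"(?:\\.|[^"\\])*"|\w+|[^\w\s]')


def exact_match_raw(pred, ref):
    """EM on raw strings: only leading/trailing whitespace is ignored."""
    return pred.strip() == ref.strip()


def exact_match_tokens(pred, ref):
    """EM on token sequences: whitespace between tokens is ignored."""
    return TOKEN_RE.findall(pred) == TOKEN_RE.findall(ref)


# Example 5 (reference vs. Gate-HPGN): differs only in spacing around braces.
ref5 = 'public LatvianStemFilterFactory(IDictionary args): base(args) {     if (args.Count > 0) { throw new System.ArgumentException("Unknown parameters: " + args); } }'
hpgn5 = 'public LatvianStemFilterFactory(IDictionary args): base(args) {     if (args.Count > 0) {throw new System.ArgumentException("Unknown parameters: " + args);} }'

# Example 8 (reference vs. Gate-HPGN): a different method body altogether.
ref8 = 'public static string ToHex(int value) { return ToHex((long)value, 8); }'
hpgn8 = 'public static string ToHex(long value) { StringBuilder sb = new StringBuilder(16); writeHex(sb, value, 16, ""); return sb.ToString(); }'

print(exact_match_raw(hpgn5, ref5), exact_match_tokens(hpgn5, ref5))  # False True
print(exact_match_raw(hpgn8, ref8), exact_match_tokens(hpgn8, ref8))  # False False
```

Token-level comparison forgives formatting differences such as `{throw` versus `{ throw`, but no normalization rescues Example 8, whose prediction and reference implement different methods.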
