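This work builds a source code migration model based on a hierarchical attention mechanism: the Hierarchical Pointer-Generator Network (HPGN). During migration, HPGN attends to both the syntax and the semantics of code statements, which further improves the semantic consistency of the migrated code.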
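The two ideas named above, hierarchical (statement-level plus token-level) attention and a pointer-generator output layer, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the HPGN implementation in `Network/`: the rescaling scheme (each token's within-statement attention multiplied by its statement's attention) is one common way to build hierarchical attention and is assumed here, while the mixture in `pointer_generator_distribution` is the standard pointer-generator formulation (See et al., 2017). The raw scores, `p_gen`, and the toy vocabulary are hypothetical inputs that a real model would compute from its encoder and decoder states. How the Gate, Resnet, and Base HPGN variants fuse the two levels is not modeled.

```python
import math
from collections import defaultdict


def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hierarchical_attention(token_scores, statement_scores):
    """Rescale token-level attention by statement-level attention (assumed scheme).

    token_scores[i][j] is the raw score of token j in statement i and
    statement_scores[i] is the raw score of statement i (hypothetical inputs).
    Returns one weight per source token; the weights sum to 1.
    """
    statement_weights = softmax(statement_scores)
    flat = []
    for stmt_weight, scores in zip(statement_weights, token_scores):
        flat.extend(stmt_weight * tok_weight for tok_weight in softmax(scores))
    return flat


def pointer_generator_distribution(p_gen, vocab_probs, attention, source_tokens):
    """Mix generating from the vocabulary with copying from the source.

    P(w) = p_gen * P_vocab(w) + (1 - p_gen) * (attention mass on source positions holding w)
    Source tokens missing from the vocabulary still receive probability by copying.
    """
    final = defaultdict(float)
    for word, prob in vocab_probs.items():
        final[word] += p_gen * prob
    for weight, token in zip(attention, source_tokens):
        final[token] += (1.0 - p_gen) * weight
    return dict(final)


# Toy example: two "statements" of Java tokens (tokenization is illustrative only).
statements = [["query", "=", "q"], ["return", "getTerms", "(", "query", ")"]]
attn = hierarchical_attention([[0.1, 0.0, 0.5], [0.2, 2.0, 0.0, 1.0, 0.0]], [0.0, 1.5])
tokens = [tok for stmt in statements for tok in stmt]
dist = pointer_generator_distribution(0.3, {"return": 0.5, "GetTerms": 0.5}, attn, tokens)
print(max(dist, key=dist.get))
```

In this toy setting the copied Java-style name `getTerms` outweighs the generated C#-style `GetTerms`, which shows why the balance set by `p_gen` matters when identifier casing differs between the two languages.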
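
This paper has been accepted by 《计算机应用研究》 (Application Research of Computers) and is scheduled for publication between October and November 2023.

Paper title: 基于层次注意力机制的源代码迁移模型 (A Source Code Migration Model Based on a Hierarchical Attention Mechanism)

Authors: 李征 (Li Zheng), 徐明瑞 (Xu Mingrui), 吴永豪 (Wu Yonghao), 刘勇 (Liu Yong), 陈翔 (Chen Xiang), 武淑美 (Wu Shumei), 刘恒源 (Liu Hengyuan)

The citation format will be updated after the paper is published.

The project structure is currently being reorganized, and some of the code is still being changed.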
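HPGN
│  hpgn_main.py                 Hierarchical pointer-generator network: training
│  metric_result_analysis.py    Metric analysis
│  pgn_main.py                  Pointer-generator network: training
│  readme.md
├─dataset
│  └─Csharp_Java                Data, models, tokenizers, and reference outputs
│      ├─dataset                Preprocessed dataset and vocabulary
│      ├─model                  Models and sample outputs
│      ├─trained-model          Sample outputs of the models trained in this experiment
│      └─Tokenizers             Tokenizers
├─transformer_result            Transformer output results
├─evaluator                     BLEU and CodeBLEU
│  └─CodeBLEU
└─Network                       Pointer-generator network and hierarchical pointer-generator network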

Environment:

- python=3.9
- tensorflow=2.7
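Before running `hpgn_main.py` or `pgn_main.py`, a quick check that the installed versions match the ones listed above can save debugging time. The sketch below is a hypothetical helper, not part of the repository; it assumes TensorFlow was installed under the distribution name `tensorflow` (variants such as `tensorflow-gpu` would be reported as missing) and treats any patch release of 2.7 as a match.

```python
import sys
from importlib import metadata

# Versions listed in this README; patch releases (e.g. 2.7.0) count as a match.
EXPECTED = {"python": "3.9", "tensorflow": "2.7"}


def check_environment():
    """Return {name: (installed, expected, ok)} for Python and TensorFlow."""
    found = {"python": f"{sys.version_info.major}.{sys.version_info.minor}"}
    try:
        found["tensorflow"] = metadata.version("tensorflow")
    except metadata.PackageNotFoundError:
        found["tensorflow"] = None  # not installed under this distribution name
    report = {}
    for name, expected in EXPECTED.items():
        actual = found[name]
        ok = actual is not None and (actual == expected or actual.startswith(expected + "."))
        report[name] = (actual, expected, ok)
    return report


for name, (actual, expected, ok) in check_environment().items():
    print(f"{name}: installed={actual} expected={expected} -> {'OK' if ok else 'MISMATCH'}")
```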
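- BLEU: measures the n-gram overlap between the generated sequence and the reference sequence (clipped n-gram precision, typically up to 4-grams, combined with a brevity penalty) and returns a score between 0 and 100%. A higher BLEU score means the generated sequence is closer to the reference.
- Exact Match (EM): checks whether the predicted output is exactly identical to the reference output.
- CodeBLEU: extends BLEU's n-gram matching (plain and keyword-weighted) with abstract syntax tree (AST) matching and data-flow matching, both obtained with a code parser. CodeBLEU therefore evaluates code on its text, syntax, and semantics, and returns a score between 0 and 100%. A higher CodeBLEU score indicates higher-quality generated code.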

Dataset: CodeTrans, a Java/C# dataset built from real-world projects.

HPGN settings: 64-dimensional hidden layer, 1 network layer.

Automatic evaluation, Java → C#:

Model | BLEU (%) | EM (%) | CodeBLEU (%) |
---|---|---|---|
Naive | 18.54 | 0.0 | - |
PBSMT | 43.53 | 12.5 | 42.71 |
tree-to-tree | 36.34 | 3.4 | 42.13 |
Transformer | 58.53 | 34.4 | 64.20 |
Resnet-HPGN | 58.74 | 19.5 | 63.08 |
Gate-HPGN | 61.40 | 27.3 | 64.93 |
Base-HPGN | 59.79 | 20.7 | 63.88 |
Pointer-generator network | 26.18 | 13.8 | 43.87 |


Automatic evaluation, C# → Java:

Model | BLEU (%) | EM (%) | CodeBLEU (%) |
---|---|---|---|
Naive | 18.69 | 0.0 | - |
PBSMT | 40.06 | 16.1 | 43.48 |
tree-to-tree | 32.09 | 4.4 | 43.86 |
Transformer | 52.87 | 34.7 | 58.56 |
Resnet-HPGN | 60.28 | 29.5 | 64.20 |
Gate-HPGN | 60.95 | 32.1 | 64.62 |
Base-HPGN | 58.26 | 30.0 | 62.35 |
Pointer-generator network | 27.84 | 20.5 | 44.88 |

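To read the two automatic-evaluation tables side by side, the sketch below picks the best-scoring model per metric. The numbers are copied from the tables (listed in table order); `best_per_metric` is a hypothetical helper for illustration and is not the repository's `metric_result_analysis.py`.

```python
METRICS = ("BLEU", "EM", "CodeBLEU")

# Scores copied from the two automatic-evaluation tables (None = not reported).
FIRST_TABLE = {
    "Naive": (18.54, 0.0, None),
    "PBSMT": (43.53, 12.5, 42.71),
    "tree-to-tree": (36.34, 3.4, 42.13),
    "Transformer": (58.53, 34.4, 64.20),
    "Resnet-HPGN": (58.74, 19.5, 63.08),
    "Gate-HPGN": (61.40, 27.3, 64.93),
    "Base-HPGN": (59.79, 20.7, 63.88),
    "Pointer-generator": (26.18, 13.8, 43.87),
}
SECOND_TABLE = {
    "Naive": (18.69, 0.0, None),
    "PBSMT": (40.06, 16.1, 43.48),
    "tree-to-tree": (32.09, 4.4, 43.86),
    "Transformer": (52.87, 34.7, 58.56),
    "Resnet-HPGN": (60.28, 29.5, 64.20),
    "Gate-HPGN": (60.95, 32.1, 64.62),
    "Base-HPGN": (58.26, 30.0, 62.35),
    "Pointer-generator": (27.84, 20.5, 44.88),
}


def best_per_metric(table):
    """Return {metric: (model, score)} for the highest reported score of each metric."""
    best = {}
    for i, metric in enumerate(METRICS):
        reported = {model: s[i] for model, s in table.items() if s[i] is not None}
        model = max(reported, key=reported.get)
        best[metric] = (model, reported[model])
    return best


for name, table in (("first table", FIRST_TABLE), ("second table", SECOND_TABLE)):
    print(name, best_per_metric(table))
```

In both tables Gate-HPGN has the highest BLEU and CodeBLEU, while Transformer keeps the highest EM.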
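
Human evaluation scoring criteria:

Score | Syntax | Semantics |
---|---|---|
5 | No syntax errors | Semantically consistent |
4 | A few syntax errors | A little semantic content missing |
3 | Some syntax errors | Some semantic content missing, or a little irrelevant semantics |
2 | Many syntax errors | Much semantic content missing, or some irrelevant semantics |
1 | No recognizable syntactic structure | Semantics completely unrelated |
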

Human evaluation results:

Model | No. | Java → C# syntax | Java → C# semantics | C# → Java syntax | C# → Java semantics |
---|---|---|---|---|---|
Gate-HPGN | 1 | 4.44 | 4.47 | 4.49 | 4.49 |
 | 2 | 4.16 | 4.47 | 4.07 | 4.38 |
 | 3 | 4.66 | 4.68 | 4.73 | 4.74 |
 | 4 | 4.30 | 4.34 | 4.67 | 4.54 |
 | 5 | 4.45 | 4.50 | 4.62 | 4.53 |
 | Average | 4.40 | 4.49 | 4.52 | 4.54 |
Transformer | 1 | 4.64 | 4.21 | 4.73 | 4.29 |
 | 2 | 4.28 | 4.14 | 4.22 | 4.16 |
 | 3 | 4.79 | 4.67 | 4.83 | 4.69 |
 | 4 | 4.44 | 4.12 | 4.78 | 4.37 |
 | 5 | 4.64 | 4.45 | 4.79 | 4.35 |
 | Average | 4.56 | 4.32 | 4.67 | 4.37 |

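Each average row in the human evaluation table is the arithmetic mean of the five numbered rows above it, column by column. The short check below recomputes those means from the copied scores; the column order follows the table (Java to C# syntax, Java to C# semantics, C# to Java syntax, C# to Java semantics).

```python
# Scores copied from the human evaluation table, one tuple per numbered row.
SCORES = {
    "Gate-HPGN": [
        (4.44, 4.47, 4.49, 4.49),
        (4.16, 4.47, 4.07, 4.38),
        (4.66, 4.68, 4.73, 4.74),
        (4.30, 4.34, 4.67, 4.54),
        (4.45, 4.50, 4.62, 4.53),
    ],
    "Transformer": [
        (4.64, 4.21, 4.73, 4.29),
        (4.28, 4.14, 4.22, 4.16),
        (4.79, 4.67, 4.83, 4.69),
        (4.44, 4.12, 4.78, 4.37),
        (4.64, 4.45, 4.79, 4.35),
    ],
}


def column_means(rows):
    """Mean of each column across the given rows."""
    return [sum(col) / len(col) for col in zip(*rows)]


for model, rows in SCORES.items():
    print(model, [f"{m:.2f}" for m in column_means(rows)])
```

The recomputed means match the reported averages: Gate-HPGN scores higher on semantics in both directions, while Transformer scores higher on syntax.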

Sample 1 | |
---|---|
Source code | public static final WeightedTerm[] getTerms(Query query){return getTerms(query,false);} |
Reference code | public static WeightedTerm[] GetTerms(Query query){return GetTerms(query, false);} |
tree-to-tree | public static WeightedTerm[] identifier(Query query) { return capacity(query, false); } |
Transformer | public static WeightedTerm[] GetTerms(Query queryTerm) { return GetTerms(query, queryTerms, Term); } |
Gate-HPGN | public static WeightedTerm[] GetTerms(Query query)) { return GetTerms(query,false); } |


Sample 2 | |
---|---|
Source code | public long skip(long n){int s = (int) Math.min(available(), Math.max(0, n));ptr += s;return s;} |
Reference code | public override long Skip(long n){int s = (int)Math.Min(Available(), Math.Max(0, n)); ptr += s;return s;} |
tree-to-tree | public override long skip(long n) { int s = (int) Math.Min(0, n); ptr += n; return s; } |
Transformer | public override long Skip(long n) { int s = (int)(MIN_SHIFT); return ((long)((ulong)block >> shift)) & unchecked((int)(0xff)); } |
Gate-HPGN | public virtual long skip(long n) { int s = (int)(offsetmin(available(), Math.max(0, n)); ptr += s; return s; } |


Sample 3 | |
---|---|
Source code | public final boolean hasPassedThroughNonGreedyDecision() { return passedThroughNonGreedyDecision; } |
Reference code | public bool hasPassedThroughNonGreedyDecision() { return passedThroughNonGreedyDecision; } |
Resnet-HPGN | public bool H hasPassedThroughNonGreedyDecision() { return passedThroughNonGreedyDecision; } |
Gate-HPGN | public bool hasPassedThroughNonGreedyDecision() { return passedThroughNonGreedyDecision; } |
Base-HPGN | public bool HPasassedThroughNonGreedyDecision() { return passedThroughNonGreedyDecision; } |
Pointer-generator network | public virtual bool hasPassedThroughNonGreedyDecision() { return passedroughNonGreedyDecision; } } } } } } } } } } } |


Sample 4 | |
---|---|
Source code | public UpdateUserRequest(String userName) { setUserName(userName); } |
Reference code | public UpdateUserRequest(string userName) { _userName = userName; } |
Resnet-HPGN | public UpdateUserRequest(string userName) { _userName =NameName; } |
Gate-HPGN | public UpdateUserRequest(string userName) { _userName = userName; } |
Base-HPGN | public UpdateUserRequest(string userName) { _userName = groupName; } |
Pointer-generator network | public UpdateUserRequest(string userName) { _UserNameuserName); } |


Sample 5 | |
---|---|
Source code | public LatvianStemFilterFactory(Map args) { super(args);<br>if (!args.isEmpty()) { throw new IllegalArgumentException("Unknown parameters: " + args); }<br>} |
Reference code | public LatvianStemFilterFactory(IDictionary args): base(args) {<br>if (args.Count > 0) { throw new System.ArgumentException("Unknown parameters: " + args); }<br>} |
Code migration tool | using System.Collections.Generic;<br>//JAVA TO C# CONVERTER WARNING: The following constructor is declared outside of its associated class:<br>//ORIGINAL LINE: public LatvianStemFilterFactory(Map args)<br>public LatvianStemFilterFactory(IDictionary args) : base(args) {<br>if (args.Count > 0) {throw new System.ArgumentException("Unknown parameters: " + args);}<br>} |
Gate-HPGN | public LatvianStemFilterFactory(IDictionary args): base(args) {<br>if (args.Count > 0) {throw new System.ArgumentException("Unknown parameters: " + args);}<br>} |


Sample 6 | |
---|---|
Source code | public PutIntegrationResult putIntegration(PutIntegrationRequest request) {<br>request = beforeClientExecution(request);<br>return executePutIntegration(request);<br>} |
Reference code | public virtual PutIntegrationResponse PutIntegration(PutIntegrationRequest request) {<br>var options = new InvokeOptions();<br>options.RequestMarshaller = PutIntegrationRequestMarshaller.Instance;<br>options.ResponseUnmarshaller = PutIntegrationResponseUnmarshaller.Instance;<br>return Invoke(request, options);<br>} |
Code migration tool | public virtual PutIntegrationResult putIntegration(PutIntegrationRequest request) {<br>request = beforeClientExecution(request);<br>return executePutIntegration(request);<br>} |
Gate-HPGN | public virtual PutIntegrationResponse PutIntegration(PutIntegrationRequest request) {<br>var options = new InvokeOptions();<br>options.RequestMarshaller = PutIntegrationRequestMarshaller.Instance;<br>options.ResponseUnmarshaller = PutIntegrationResponseUnmarshaller.Instance;<br>return Invoke(request, options);<br>} |


Error cases: outputs that fail the EM (exact match) criterion


Sample 7 | |
---|---|
Source code | public boolean remove(Object o) { synchronized (mutex) { return delegate().remove(o); } } |
Reference code | public virtual bool remove(object @object) { lock (mutex) { return c.remove(@object); } } |
Gate-HPGN | public override bool Remove(object o) { lock (mutex) { return } } |


Sample 8 | |
---|---|
Source code | public static String toHex(long value) {<br>StringBuilder sb = new StringBuilder(16);<br>writeHex(sb, value, 16, "");<br>return sb.toString();<br>} |
Reference code | public static string ToHex(int value) { return ToHex((long)value, 8); } |
Gate-HPGN | public static string ToHex(long value) {<br>StringBuilder sb = new StringBuilder(16);<br>writeHex(sb, value, 16, "");<br>return sb.ToString();<br>} |