Has the fine-tuning dataset not been uploaded? #2

Duanexiao · 2019-06-12T02:01:00Z

Hi, where can I find the dataset you used for fine-tuning?

lonePatient · 2019-06-12T02:18:25Z

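@Duanexiao Hi, I used a subset of the THUCNews dataset (you can find it via Baidu; I'll upload it later). The dataset is fairly small, so it is well suited to debugging an algorithm before moving on to the dataset of a real project.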

Duanexiao · 2019-06-12T02:25:40Z

@lonePatient Thanks. Is your text classification multi-class or multi-label?

lonePatient · 2019-06-12T02:37:59Z

@Duanexiao The current data is multi-class. For multi-label I used the Kaggle toxic dataset; see https://github.com/lonePatient/Bert-Multi-Label-Text-Classification

Duanexiao · 2019-06-12T03:06:49Z

OK, thanks.

sunyh214 · 2019-07-22T07:26:37Z

Hi, could I see how your data is stored? What does it look like in the CSV? Thank you.

lonePatient · 2019-07-22T07:31:20Z

@sunyh214 The dataset is very simple: each line is just in the format "label context".
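
To illustrate the "label context" layout, here is a minimal sketch of reading such a file. The tab separator, the `read_examples` helper and the sample rows are assumptions made for illustration; the thread does not state the separator, and this is not the repository's own loader.

```python
import io


def read_examples(handle, sep="\t"):
    """Yield (label_id, text) pairs from lines of the form 'label<sep>text'.

    The separator is an assumption; adjust it to match the actual file.
    """
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Split only once so that separators inside the text are preserved.
        label, text = line.split(sep, 1)
        yield int(label), text


# Hypothetical two-line sample standing in for a real data file.
sample = io.StringIO("0\tfirst news article\n3\tsecond news article\n")
examples = list(read_examples(sample))
```

Each element of `examples` pairs an integer label with the raw text of one line.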

sunyh214 · 2019-07-22T07:38:30Z

Does each "label context" line include the label? Could you upload a sample?

sunyh214 · 2019-07-22T10:04:18Z

Sorry to bother you again. I downloaded a subset of the THUCNews data, but I got the following error:
feature = self.build_features(example)
File "/home/company/Textclassification/ERNIE-text-classification-pytorch-master/ERNIE-text-classification-pytorch-master/pyernie/io/dataset.py", line 105, in build_features
label_id = int(example.label)
ValueError: invalid literal for int() with base 10: '伊朗举行全国范围空军演习保护敏感核设施(图)新华网消息：据法新社14日消息，伊朗高级军事官员称伊朗军方在该国敏感核设施附近进行了防御性演练，他还说伊朗16日开始将在全国范围内展开空军演习。据伊朗法尔斯通讯社报道，伊朗空军将领米盖尼表示，伊朗空军将自16日开始进行为期5天的军事演习，演习将在全国范围进行，以增强防御能力。伊朗梅尔通讯社援引伊朗空军将领米盖尼的话称，“今年，我们在弗多、德黑兰、纳坦兹

sunyh214 · 2019-07-22T10:08:30Z

I don't know what went wrong. How did the label end up being the text?
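
The traceback shows `int(example.label)` receiving article text, which suggests that the label and text fields are not where the loader expects them (swapped columns, a different separator, or a file with no label column are all possibilities, none confirmed in this thread). A quick check along these lines may help narrow it down; `find_label_column` and the tab separator are assumptions for illustration only.

```python
def find_label_column(line, sep="\t"):
    """Return the index of the first field that parses as an integer, or None.

    A result other than 0 hints that the file does not follow the
    'label<sep>text' layout (assumed separator: tab).
    """
    for index, field in enumerate(line.rstrip("\n").split(sep)):
        try:
            int(field)
        except ValueError:
            continue  # not a numeric label, try the next field
        return index
    return None


# Hypothetical lines covering the three cases mentioned above.
ok_line = "3\tsome article text"
swapped_line = "some article text\t3"
no_label_line = "some article text only"
```

Running `find_label_column` on the first few lines of the downloaded file shows whether the label sits in column 0, in another column, or is missing altogether.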

sunyh214 · 2019-07-26T05:46:42Z

I've got the program running. I'm training on CPU, and now I get this error:
OSError: [Errno 12] Cannot allocate memory
I don't know how to fix it. Could you advise? Thank you!

sunyh214 · 2019-07-26T06:14:33Z

@Duanexiao I've got the program running. I'm training on CPU, and now I get this error:
OSError: [Errno 12] Cannot allocate memory
I don't know how to fix it. Could you advise? Thank you!

lonePatient · 2019-07-26T06:19:03Z

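@sunyh214 Either memory is exhausted or the process limit has been reached. When running on CPU, try limiting the number of processes or the memory usage.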

sunyh214 · 2019-07-26T06:26:31Z

@Duanexiao I set the number of threads to 1. Would setting it to 0 fix it? It shouldn't be a memory problem.

sunyh214 · 2019-07-26T06:45:14Z

@Duanexiao OK, thank you!

wzjj98 · 2019-08-12T12:34:42Z

Hello. Each of my samples has roughly six to seven hundred characters. How should I change max_seq_len, and what is np.percentile used for?

lonePatient · 2019-08-12T13:27:10Z

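@wzjj98 If the text is longer than 512, it generally depends on your task. For a simple classification task, just truncate: you can take the head plus the tail, or the head plus the middle; which works better has to be found by experiment. For a sequence task, the usual approach is a sliding window that is shifted along the text according to the window size.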
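
The two strategies above can be sketched as follows. This is a minimal illustration on plain token lists, not the repository's implementation; the function names, `head_len`, and `stride` are assumptions. BERT-style models also reserve positions for special tokens, so `max_len` would normally be set somewhat below 512.

```python
def truncate_head_tail(tokens, max_len, head_len):
    """Keep the first head_len tokens and fill the remaining budget from the end."""
    if len(tokens) <= max_len:
        return list(tokens)
    head_len = min(head_len, max_len)  # never keep more than the budget
    tail_len = max_len - head_len
    tail = list(tokens[-tail_len:]) if tail_len > 0 else []
    return list(tokens[:head_len]) + tail


def sliding_windows(tokens, window, stride):
    """Split tokens into chunks of at most `window` tokens, moving by `stride`."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    chunks = []
    start = 0
    while True:
        chunks.append(list(tokens[start:start + window]))
        if start + window >= len(tokens):
            break  # the last chunk reached the end of the sequence
        start += stride
    return chunks


tokens = list(range(10))  # stand-in for a tokenized document
short = truncate_head_tail(tokens, max_len=6, head_len=2)
chunks = sliding_windows(tokens, window=4, stride=4)
```

With `stride` equal to `window` the chunks do not overlap; a smaller `stride` gives overlapping windows, which is a common choice when a label may straddle a chunk boundary.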

EternalWhiteShrimp · 2019-09-22T03:02:56Z

Hello! I want to use this to train a text-similarity model, but how should I write 'label_to_id'? My file is in TSV format: id|sentence1|sentence2|label
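
One possible way to derive such a mapping, offered only as a sketch: it assumes the file is tab-separated with a header row naming the four columns above, and it does not reflect how the repository itself defines `label_to_id`. The `build_label_to_id` helper and the sample rows are hypothetical.

```python
import csv
import io


def build_label_to_id(rows):
    """Map each distinct label string to a stable integer id (sorted order)."""
    labels = sorted({row["label"] for row in rows})
    return {label: index for index, label in enumerate(labels)}


# Hypothetical sample in the id / sentence1 / sentence2 / label layout.
tsv_text = (
    "id\tsentence1\tsentence2\tlabel\n"
    "1\thow are you\thow do you do\t1\n"
    "2\thow are you\twhere are you\t0\n"
)
reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
rows = list(reader)
label_to_id = build_label_to_id(rows)
```

The two sentences of each row would then be fed to the model as a pair, with `label_to_id[row["label"]]` as the target.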
