On a PySpark + Hadoop cluster, loading a custom dictionary file always fails: the path to the custom dictionary cannot be found #975

Closed
NYcleaner opened this issue Aug 29, 2022 · 3 comments

Comments

@NYcleaner

NYcleaner commented Aug 29, 2022

INFO SparkContext:54 - Added file hdfs://xxxx/user_dict_2022.txt at hdfs://xxxx/user_dict_2022.txt with timestamp 1661761280375
Utils:54 - Fetching hdfs://xxxx/user_dict_2022.txt to /data/data16/yarn/nm2/usercache/o_zzzz/appcache/application_1655780863565_yyyy/spark-f9b4a2ca-aeba-45d7-ae8c-f3a40ddbab15/userFiles-9da5a8ee-8220-41dd-bd77-73aee4e92042/fetchFileTemp9124107129410970331.tmp
Traceback (most recent call last):
  File "project1_jieba_train_online.py", line 137, in <module>
    jieba.load_userdict(user_dict_path)
  File "/data/data13/yarn/nm2/usercache/o_zzzz/appcache/application_1655780863565_yyyy/container_e4075_1655780863565_3544813_01_000001/py3/lib/python3.7/site-packages/jieba/__init__.py", line 398, in load_userdict
    f = open(f, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: 'hdfs://xxxx/user_dict_2022.txt'
ERROR ApplicationMaster:70 - User application exited with status 1
The error output is shown above.

1. I have already passed the absolute HDFS path of the custom dictionary through the --file option of the PySpark submit command:
--file hdfs://xxxx/user_dict_2022.txt

2. The code in the .py file that loads the custom dictionary path is as follows:
`
jieba.initialize()

user_dict_path='hdfs://xxxx/user_dict_2022.txt '

ss.sparkContext.addFile(user_dict_path)

jieba.load_userdict(user_dict_path)

main(ss, jieba)
`

I looked through the earlier issues and did not find anyone with this problem, so I am asking for help here. Thanks.

@hvgdfx

hvgdfx commented Feb 23, 2023

Can jieba read an HDFS path directly? You should probably read it through FileSystem.
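
The traceback in the report shows `jieba.load_userdict` passing the string to the built-in `open()`, which only understands local filesystem paths, so an `hdfs://` URI raises `FileNotFoundError`. As a rough sketch of one way around this (not necessarily what was meant by "FileSystem"), a file shipped with `SparkContext.addFile` can be looked up by its base name with `SparkFiles.get`, which returns the local copy. The helper names `distributed_file_name` and `load_userdict_from_spark` are hypothetical, and this has not been verified on the reporter's cluster.

```python
import posixpath
from urllib.parse import urlparse


def distributed_file_name(uri: str) -> str:
    """Return the base name of a file URI, which is the name SparkFiles.get expects."""
    # Strip stray whitespace: the path string in the report ends with a space.
    path = urlparse(uri.strip()).path
    return posixpath.basename(path)


def load_userdict_from_spark(spark, jieba_module, uri: str) -> str:
    """Hypothetical helper: ship `uri` with Spark, then load the local copy in jieba."""
    # Imported lazily so the name helper above works without pyspark installed.
    from pyspark import SparkFiles

    spark.sparkContext.addFile(uri.strip())
    # SparkFiles.get maps the file name to its absolute local path.
    local_path = SparkFiles.get(distributed_file_name(uri))
    jieba_module.load_userdict(local_path)
    return local_path
```

In the reporter's script this would be called as `load_userdict_from_spark(ss, jieba, user_dict_path)`, assuming `ss` is the SparkSession used there.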

@NYcleaner
Author

Final solution:
Use --archives hdfs://path/file.zip#alias
Then call jieba.load_userdict(alias)
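
The thread does not say whether the alias ends up naming a file or a directory. On YARN, an archive passed with `--archives` is normally unpacked into a directory named after the alias in the container's working directory, so the dictionary may sit inside that directory. A minimal sketch that handles both cases, assuming the dictionary has a `.txt` suffix; the function name `resolve_user_dict` and the `suffix` parameter are hypothetical:

```python
import os


def resolve_user_dict(alias: str, suffix: str = ".txt") -> str:
    """Return a local path suitable for jieba.load_userdict."""
    # Case 1: the alias already points at a regular file.
    if os.path.isfile(alias):
        return alias
    # Case 2: the alias is the directory an archive was unpacked into;
    # take the first file (in sorted order per directory) with the expected suffix.
    if os.path.isdir(alias):
        for root, _dirs, files in os.walk(alias):
            for name in sorted(files):
                if name.endswith(suffix):
                    return os.path.join(root, name)
    raise FileNotFoundError(f"no dictionary file found under {alias!r}")
```

The result would then be passed on as `jieba.load_userdict(resolve_user_dict(alias))`, with the alias given as a string.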

@hvgdfx
Copy link

hvgdfx commented May 17, 2023 via email
