More than 100% data in Alluxio #18577
Comments
Which Alluxio version do you use?
You need to refresh the metadata.
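For illustration (not part of the original reply), commands of this shape can force Alluxio to re-sync its metadata with the under file system; availability and flags vary by Alluxio version, and the table path is taken from later in this thread:

```shell
# Sketch: force a metadata re-load for the table path (version-dependent)
bin/alluxio fs loadMetadata /user/hive/warehouse/edw_user.db/adm_cs_device_tag_df
# checkConsistency compares Alluxio metadata against the UFS; -r attempts repair
bin/alluxio fs checkConsistency -r /user/hive/warehouse/edw_user.db/adm_cs_device_tag_df
```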
Thank you for your reply!
I'm wondering what's causing this. I don't bypass Alluxio to write to the UFS directly, so there should be no metadata inconsistency.
In addition, the file was written by Spark 2.4.8 through Hive (the Hive table metadata points to Alluxio). The Alluxio logs from before and after the file was generated are as follows:
Have you tried refreshing Hive's metadata? I think the problem is caused by a mismatch between the metadata in Hive and the metadata in Alluxio.
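As a hedged illustration (not part of the original reply), these are typical ways to refresh table metadata on the compute side; the table name is inferred from the warehouse path mentioned later in this thread:

```shell
# Recover/refresh partition metadata in the Hive metastore
hive -e "MSCK REPAIR TABLE edw_user.adm_cs_device_tag_df;"
# Invalidate Spark's cached metadata for the table
spark-sql -e "REFRESH TABLE edw_user.adm_cs_device_tag_df;"
```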
You can share more information from master.log:
https://github.com/Alluxio/alluxio/blame/26919b8894d251b803c82513cb1eeee562bace0a/core/server/master/src/main/java/alluxio/master/file/InodeSyncStream.java#L503
@jasondrogba Of course, I'll share more logs. The following two log entries show up repeatedly in master.log. Note that every file in /user/hive/warehouse/edw_user.db/adm_cs_device_tag_df/batch=20240416 gets the same two alerts.
One thing that bothered me is that the logs say the sync failed because the files didn't exist, but when I checked manually the files were always there. Additionally, the Alluxio process is started as the HDFS superuser, which has full access to /user/hive/warehouse. After that, I ran the following command to make sure I didn't miss any critical logs:
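The exact command was not preserved in this thread; a hypothetical reconstruction of that kind of log search (the path fragment and keywords are assumptions) would look like:

```shell
# Hypothetical: scan master.log for sync failures touching the affected partition
grep "batch=20240416" logs/master.log | grep -iE "sync|not exist|fail"
```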
Could this be because I set the ACLs manually? Although I didn't see any permission errors, the user and group on tables written through Alluxio differ from those on tables written without Alluxio (note that the ACL I set manually in Alluxio is the same as the ACL in HDFS).
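For illustration (not from the original comment), the owner/group difference can be compared by listing the same path through Alluxio and directly in HDFS:

```shell
# Owner/group as Alluxio reports them
bin/alluxio fs ls /user/hive/warehouse/edw_user.db/adm_cs_device_tag_df
# Owner/group as HDFS reports them
hdfs dfs -ls /user/hive/warehouse/edw_user.db/adm_cs_device_tag_df
```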
@jasondrogba Hi, has this issue been forgotten? The problem still exists. If possible, please help confirm whether this is a bug and how to solve it. If you need any logs or information, please feel free to ask.
Oh! @ziyangRen, can you share the worker log? And why do you have so many block replicas?
Hi @jasondrogba, I tried my best to gather the worker logs and didn't find any errors, but I have summarized a few recurring events from that time that might help. The first is a large number of block transfers, and the two block IDs with the bad data went through this process:
The following logs appear to show normal reads from and writes to HDFS; I have listed them below:
As for the problem of too many replicas you mentioned, I am also confused. Although clients do a lot of concurrent reading and writing in the actual scenario, a large number of replicas should be avoidable with the relevant Alluxio configuration, as follows:
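The configuration snippet referenced above was not preserved in this thread. As a hedged sketch, these are real Alluxio client properties commonly used to limit in-Alluxio replicas; the values shown are illustrative, not taken from this thread:

```shell
# Sketch: client-side properties that cap replica growth (illustrative values)
cat >> conf/alluxio-site.properties <<'EOF'
# Maximum number of in-Alluxio replicas per block (-1 means unlimited)
alluxio.user.file.replication.max=1
# Don't create an extra local cached copy on remote reads
alluxio.user.file.passive.cache.enabled=false
EOF
```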
If there is any further need, please let me know at any time and I will provide the relevant information as soon as possible.
I see you're using different medium types, MEM and SSD, and based on your URIStatus, there are many blocks. I suspect this might be due to your configuration of multiple-tier storage. |
@jasondrogba Thanks for your quick reply. But if that is the case, I have three questions:
@jasondrogba Thanks again for your patience, but I still have some follow-up questions:
@yuzhu You have more wisdom and experience.
@ziyangRen We recommend using a single tiered store; you can try it.
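A minimal sketch of what a single-tier worker configuration can look like (Alluxio 2.x property names; the path and quota below are placeholders, not values from this thread):

```shell
# Sketch: collapse tiered storage to a single MEM tier (placeholder values)
cat >> conf/alluxio-site.properties <<'EOF'
alluxio.worker.tieredstore.levels=1
alluxio.worker.tieredstore.level0.alias=MEM
alluxio.worker.tieredstore.level0.dirs.path=/mnt/ramdisk
alluxio.worker.tieredstore.level0.dirs.quota=16GB
EOF
```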
@jasondrogba Thank you for your suggestions and patient answers. I will change to a single tiered store. If the same problem still occurs after the change, I will update this issue.
Alluxio Version:
What version of Alluxio are you using?
Describe the bug
The error log I observed in Spark was:
Protocol message tag had invalid wire type.
The error logs I observed in Trino were:
I checked this file's information in Alluxio as follows: the file shows more than 100% of its data in Alluxio. Specifically, it is made up of two 256 MB blocks, while the size of the file in HDFS is 258.1 MB. It should be noted that the data in HDFS was written through Alluxio; I used alluxio.user.file.metadata.sync.interval=216000000 and alluxio.user.file.writetype.default=CACHE_THROUGH.
When I switched the table's metadata back to HDFS, the job worked, meaning the HDFS data was fine but Alluxio's copy of the data was causing the problem.
Here's how this file looks in Alluxio:
In addition, neither checksum nor copyToLocal attempts succeeded for this file, and I didn't see any errors from the master or worker when the Spark and Trino tasks failed.
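For reference, a sketch of those cross-checks in command form; the exact file name is not preserved above, so <file> is a placeholder:

```shell
# Compare the file as Alluxio serves it against the copy in HDFS (placeholder name)
bin/alluxio fs checksum /user/hive/warehouse/edw_user.db/adm_cs_device_tag_df/batch=20240416/<file>
bin/alluxio fs copyToLocal /user/hive/warehouse/edw_user.db/adm_cs_device_tag_df/batch=20240416/<file> /tmp/<file>
hdfs dfs -checksum /user/hive/warehouse/edw_user.db/adm_cs_device_tag_df/batch=20240416/<file>
```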