
Does creating an index on existing, unindexed data cause data inconsistency? #144

Closed
zhouyang209117 opened this issue Oct 30, 2018 · 1 comment · Fixed by #150
Labels
bug Something isn't working
Milestone

Comments

@zhouyang209117

The backend is Cassandra.

  1. Add a large amount of data to the graph.
  2. Create an index on the userid property of user vertices:
schema.indexLabel('userById').onV('user').by('userid').secondary().ifNotExist().create();
  3. Continue adding data.

I analyzed the database table structure. Before the index was created, the table graph_secondary_indexes was empty. The table has three columns:

field_values 
index_label_id
element_ids

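After the index was created and more data was added, rows started appearing in graph_secondary_indexes.
From this, field_values holds the property value and element_ids holds the IDs of the vertices that have that value, so the table lets you quickly find vertex IDs by property value. Looking at the data, I found the following:
When the index was created, no entries for the existing data were added to graph_secondary_indexes; only data added after the index was created got entries. So if I query the userid of an old vertex (one added to Cassandra before the index existed), that value is not in the index. What is the query strategy in this case? Does it check graph_secondary_indexes first and fall back to a full scan if nothing is found? Does it simply return "not found"? Or is there some other query path?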
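To make the observation above concrete, here is a minimal in-memory model of graph_secondary_indexes as described: each row is keyed by (index_label_id, field_values) and holds a set of element_ids. The class `SecondaryIndexTable`, its methods, and the label id `1L` are hypothetical and are not HugeGraph code. The sketch only shows that an index-only lookup cannot find a vertex whose index row was never written.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of the graph_secondary_indexes table (not HugeGraph code).
// Row key: (index_label_id, field_values); row value: element_ids.
final class SecondaryIndexTable {
    private final Map<List<Object>, Set<String>> rows = new HashMap<>();

    // Record that the element elementId has fieldValue under the given index label.
    void put(long indexLabelId, String fieldValue, String elementId) {
        rows.computeIfAbsent(List.of(indexLabelId, fieldValue), k -> new HashSet<>())
            .add(elementId);
    }

    // Index-only lookup: the stored element_ids for this value, or an empty set if no row
    // was ever written (e.g. for data added before the index existed and never backfilled).
    Set<String> lookup(long indexLabelId, String fieldValue) {
        return rows.getOrDefault(List.of(indexLabelId, fieldValue), Set.of());
    }

    public static void main(String[] args) {
        SecondaryIndexTable table = new SecondaryIndexTable();
        long userById = 1L; // hypothetical id of the 'userById' index label
        // Vertex "v1" (userid "u1") was written before the index existed, so it has no row.
        // Vertex "v2" (userid "u2") was written afterwards, so a row is added for it.
        table.put(userById, "u2", "v2");
        System.out.println(table.lookup(userById, "u1")); // []
        System.out.println(table.lookup(userById, "u2")); // [v2]
    }
}
```

Whether a real query would then fall back to a scan is exactly the reporter's question; this model does not answer it and simply returns what the index holds.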

@zhoney
Contributor

zhoney commented Oct 30, 2018

@zhouyang209117 Thanks for the report! This is indeed a bug.

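When an index is created:

  1. If there is no data yet, the index label is created.
  2. If data already exists, the index label is created and index entries are backfilled for the existing data.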

Therefore, once the index label has been created and the backfill has finished, both old and new data can be queried.
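The two cases above can be sketched with a toy graph that stores only user vertices and their userid. `UserGraphSketch`, `createUserIdIndex`, and the other names are hypothetical, not HugeGraph's API; the sketch only illustrates the order of operations: register the index, index every vertex that already exists, and from then on index new vertices on write. Updates and deletes are ignored for brevity.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical toy graph (not HugeGraph's API): only 'user' vertices with a 'userid' property.
final class UserGraphSketch {
    private final Map<String, String> userIdByVertex = new HashMap<>();
    // userid -> vertex ids; null until the index label has been created.
    private Map<String, Set<String>> userIdIndex = null;

    void addVertex(String vertexId, String userId) {
        userIdByVertex.put(vertexId, userId);
        if (userIdIndex != null) {
            indexVertex(vertexId, userId); // once the index exists, new data is indexed on write
        }
    }

    void createUserIdIndex() {
        if (userIdIndex != null) {
            return; // like ifNotExist(): creating an existing index does nothing
        }
        userIdIndex = new HashMap<>(); // case 1: create the index label
        // Case 2: backfill entries for vertices that already exist (no-op if there are none).
        for (Map.Entry<String, String> e : userIdByVertex.entrySet()) {
            indexVertex(e.getKey(), e.getValue());
        }
    }

    Set<String> findByUserId(String userId) {
        if (userIdIndex == null) {
            throw new IllegalStateException("no index on userid");
        }
        return userIdIndex.getOrDefault(userId, Set.of());
    }

    private void indexVertex(String vertexId, String userId) {
        userIdIndex.computeIfAbsent(userId, k -> new HashSet<>()).add(vertexId);
    }
}
```

In this sketch a vertex added before `createUserIdIndex()` is found through the index only because the backfill loop wrote its entry, which is the step that failed in this issue.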

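It failed because the backfill in step 2 above submitted all of the backfilled index entries in a single commit. With a Cassandra backend, that commit exceeded Cassandra's batch limit of 65535, so the backfill failed. Patch #149 changes this to commit in batches, which fixes the problem.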
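Committing in batches instead of all at once can be sketched generically. `BatchedCommit`, `commitInBatches`, and the `commit` callback are hypothetical stand-ins for the backend's commit call; the limit 65535 is the value quoted in this issue, and a real implementation may well use a smaller batch size.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of splitting one large commit into several bounded ones (not HugeGraph code).
final class BatchedCommit {
    // Batch limit quoted in this issue for the Cassandra backend.
    static final int BATCH_LIMIT = 65535;

    // Passes consecutive slices of 'pending', each at most batchLimit long, to 'commit'.
    // Returns the number of commits performed.
    static <T> int commitInBatches(List<T> pending, int batchLimit, Consumer<? super List<T>> commit) {
        if (batchLimit <= 0) {
            throw new IllegalArgumentException("batchLimit must be positive");
        }
        int commits = 0;
        int from = 0;
        while (from < pending.size()) {
            // Computed this way to avoid int overflow when batchLimit is very large.
            int to = from + Math.min(batchLimit, pending.size() - from);
            commit.accept(pending.subList(from, to));
            commits++;
            from = to;
        }
        return commits;
    }
}
```

With 150,000 pending entries and the 65535 limit, this makes three commits of 65535, 65535 and 18930 entries. The trade-off is that the rebuild as a whole is no longer a single all-or-nothing commit.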

Linary pushed a commit that referenced this issue Oct 30, 2018
To avoid commit all together during rebuilding index,
especially for Cassandra backend, which has batch limit 65535

fixed: #144
implemented: #82

Change-Id: I88ff4bc878bc24122f0bb6ecf9964246a083b9ab
Linary pushed a commit that referenced this issue Nov 1, 2018
1. To avoid commit all together during rebuilding index,
   especially for Cassandra backend, which has batch limit 65535
2. Also move async codes to com.baidu.hugegraph.job package
3. fix bug that CacheManager might create more than one cache with same name

fixed: #144
implemented: #82

Change-Id: I88ff4bc878bc24122f0bb6ecf9964246a083b9ab
@javeme javeme added this to the 0.8 milestone Apr 13, 2019
@javeme javeme added the bug Something isn't working label Apr 13, 2019
VGalaxies pushed a commit that referenced this issue Aug 3, 2024
… writing (#144)

* #2578
fixed memory leaks occur in HugeGraph Server during data writing

3 participants