Enable creating an index "from_existing" #174

Merged
merged 8 commits into main from feat/RAAE-149/index-from-existing
Jul 2, 2024

tylerhutcherson
Collaborator

Since version 2.8.10 of the search module in Redis, FT.INFO reports the contents of the index, its associated fields, and all low-level attributes. Raw response from Redis:

 1) index_name
 2) user_simple
 3) index_options
 4) (empty array)
 5) index_definition
 6) 1) key_type
    2) HASH
    3) prefixes
    4) 1) user_simple_docs
    5) default_score
    6) "1"
 7) attributes
 8) 1) 1) identifier
       2) user
       3) attribute
       4) user
       5) type
       6) TAG
       7) SEPARATOR
       8) ,
    2) 1) identifier
       2) credit_score
       3) attribute
       4) credit_score
       5) type
       6) TAG
       7) SEPARATOR
       8) ,
    3) 1) identifier
       2) job
       3) attribute
       4) job
       5) type
       6) TEXT
       7) WEIGHT
       8) "1"
    4) 1) identifier
       2) age
       3) attribute
       4) age
       5) type
       6) NUMERIC
    5)  1) identifier
        2) user_embedding
        3) attribute
        4) user_embedding
        5) type
        6) VECTOR
        7) algorithm
        8) FLAT
        9) data_type
       10) FLOAT32
       11) dim
       12) (integer) 3
       13) distance_metric
       14) COSINE
 9) num_docs
10) "0"
11) max_doc_id
12) "0"
13) num_terms
14) "0"
15) num_records
16) "0"
17) inverted_sz_mb
18) "0"
19) vector_index_sz_mb
20) "0.00818634033203125"
21) total_inverted_index_blocks
22) "1813"
23) offset_vectors_sz_mb
24) "0"
25) doc_table_size_mb
26) "0"
27) sortable_values_size_mb
28) "0"
29) key_table_size_mb
30) "0"
31) geoshapes_sz_mb
32) "0"
33) records_per_doc_avg
34) "nan"
35) bytes_per_record_avg
36) "nan"
37) offsets_per_term_avg
38) "nan"
39) offset_bits_per_record_avg
40) "nan"
41) hash_indexing_failures
42) "0"
43) total_indexing_time
44) "0"
45) indexing
46) "0"
47) percent_indexed
48) "1"
49) number_of_uses
50) (integer) 2
51) cleaning
52) (integer) 0
53) gc_stats
54)  1) bytes_collected
     2) "0"
     3) total_ms_run
     4) "0"
     5) total_cycles
     6) "0"
     7) average_cycle_time_ms
     8) "nan"
     9) last_run_time_ms
    10) "0"
    11) gc_numeric_trees_missed
    12) "0"
    13) gc_blocks_denied
    14) "0"
55) cursor_stats
56) 1) global_idle
    2) (integer) 0
    3) global_total
    4) (integer) 0
    5) index_capacity
    6) (integer) 128
    7) index_total
    8) (integer) 0
57) dialect_stats
58) 1) dialect_1
    2) (integer) 0
    3) dialect_2
    4) (integer) 0
    5) dialect_3
    6) (integer) 0
    7) dialect_4
    8) (integer) 0
59) Index Errors
60) 1) indexing failures
    2) (integer) 0
    3) last indexing error
    4) N/A
    5) last indexing error key
    6) "N/A"
61) field statistics
62) 1) 1) identifier
       2) user
       3) attribute
       4) user
       5) Index Errors
       6) 1) indexing failures
          2) (integer) 0
          3) last indexing error
          4) N/A
          5) last indexing error key
          6) "N/A"
    2) 1) identifier
       2) credit_score
       3) attribute
       4) credit_score
       5) Index Errors
       6) 1) indexing failures
          2) (integer) 0
          3) last indexing error
          4) N/A
          5) last indexing error key
          6) "N/A"
    3) 1) identifier
       2) job
       3) attribute
       4) job
       5) Index Errors
       6) 1) indexing failures
          2) (integer) 0
          3) last indexing error
          4) N/A
          5) last indexing error key
          6) "N/A"
    4) 1) identifier
       2) age
       3) attribute
       4) age
       5) Index Errors
       6) 1) indexing failures
          2) (integer) 0
          3) last indexing error
          4) N/A
          5) last indexing error key
          6) "N/A"
    5) 1) identifier
       2) user_embedding
       3) attribute
       4) user_embedding
       5) Index Errors
       6) 1) indexing failures
          2) (integer) 0
          3) last indexing error
          4) N/A
          5) last indexing error key
          6) "N/A"

This makes it possible to "hydrate" a RedisVL IndexSchema class from this output, which makes state management for index information MUCH simpler. The caveat is that this capability is only available in newer versions of redis/search, so we have to hide it behind a "feature flag" of sorts. This is also directly needed in our integration clients like LangChain and LlamaIndex.
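
To make the "hydrate" idea concrete, here is a minimal sketch of turning the raw reply above into a plain schema dictionary. It is not the code in this PR: the helper names (`pairs_to_dict`, `schema_from_info`) and the output layout (`index` / `fields` with `name`, `type`, `attrs`) are assumptions chosen for illustration. It also assumes every attribute entry is a strictly alternating key/value list, as in the sample above; flag-style options that appear without a value are not handled.

```python
from typing import Any, Dict, List


def pairs_to_dict(flat: List[Any]) -> Dict[str, Any]:
    """Turn an alternating [key, value, key, value, ...] reply into a dict."""
    return {str(flat[i]): flat[i + 1] for i in range(0, len(flat) - 1, 2)}


def schema_from_info(info: List[Any]) -> Dict[str, Any]:
    """Build a schema-like dict from a decoded FT.INFO reply (hypothetical layout)."""
    top = pairs_to_dict(info)
    definition = pairs_to_dict(top["index_definition"])

    fields = []
    for raw in top["attributes"]:
        attr = pairs_to_dict(raw)
        field: Dict[str, Any] = {"name": attr["attribute"], "type": attr["type"].lower()}
        # Anything besides identifier/attribute/type is treated as a field option.
        options = {
            key.lower(): value
            for key, value in attr.items()
            if key not in ("identifier", "attribute", "type")
        }
        if options:
            field["attrs"] = options
        fields.append(field)

    return {
        "index": {
            "name": top["index_name"],
            "prefix": definition["prefixes"][0],
            "storage_type": definition["key_type"].lower(),
        },
        "fields": fields,
    }


# A trimmed-down version of the reply shown above, already decoded to str/int.
info = [
    "index_name", "user_simple",
    "index_definition",
    ["key_type", "HASH", "prefixes", ["user_simple_docs"], "default_score", "1"],
    "attributes",
    [
        ["identifier", "job", "attribute", "job", "type", "TEXT", "WEIGHT", "1"],
        ["identifier", "user_embedding", "attribute", "user_embedding",
         "type", "VECTOR", "algorithm", "FLAT", "data_type", "FLOAT32",
         "dim", 3, "distance_metric", "COSINE"],
    ],
]

schema = schema_from_info(info)
```

Only `index_name`, `index_definition`, and `attributes` are read; the statistics that follow them in the reply are ignored because they describe runtime state rather than the schema.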

@tylerhutcherson tylerhutcherson added the enhancement New feature or request label Jul 1, 2024
Collaborator

@rbs333 rbs333 left a comment

seems like we have some mypy problems

@tylerhutcherson
Collaborator Author

> seems like we have some mypy problems

Yep, still have a few things I found I need to fix. Sorry for the early tag.

installed_modules = unpack_redis_modules(
    convert_bytes(redis_client.module_list())
)
validate_modules(installed_modules, [{"name": "search", "ver": 20810}])
Collaborator

@bsbodden bsbodden Jul 1, 2024

Is search the only module we need to validate? I'm looking at the remote possibility that somebody added their own modules in the configuration with mismatched versions, but that would likely require us to maintain a compatibility matrix with at least 3 columns: Redis version, Search module version, and JSON module version.

Collaborator Author

Reading the schema back from the FT.INFO output isn't tied to any module besides Search. I'm not sure we can invest in supporting additional/ad-hoc modules (yet); it will likely require a bit of work to get there. But I do want to clean up the module and connection factory interfaces soon to make them cleaner and more generic.
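
For readers following the thread, the version gate under discussion can be sketched roughly as below. This is an illustration only, not the `validate_modules` implementation from this PR: `version_int` and `meets_minimums` are hypothetical names, and the "every listed module must meet its minimum" rule is an assumption. The one grounded detail is that `20810` corresponds to 2.8.10, using the usual module version encoding of major * 10000 + minor * 100 + patch.

```python
from typing import Any, Dict, List


def version_int(major: int, minor: int, patch: int) -> int:
    """Encode a semantic version the way module versions are reported, e.g. 2.8.10 -> 20810."""
    return major * 10000 + minor * 100 + patch


def meets_minimums(installed: Dict[str, int], required: List[Dict[str, Any]]) -> bool:
    """Return True only if every required module is installed at or above its minimum.

    `installed` maps module name -> integer version (hypothetical shape).
    A missing module is treated as version -1, so it always fails the check.
    """
    return all(
        installed.get(str(req["name"]), -1) >= int(req["ver"])
        for req in required
    )


required = [{"name": "search", "ver": version_int(2, 8, 10)}]
supported = meets_minimums({"search": 20810}, required)
```

A compatibility matrix like the one raised above would amount to adding more entries to `required` (for example a JSON module minimum), at the cost of having to maintain those minimums per release.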

bsbodden
bsbodden previously approved these changes Jul 1, 2024
Collaborator

@bsbodden bsbodden left a comment

Looks very good to me! 🚀

rbs333
rbs333 previously approved these changes Jul 1, 2024
bsbodden
bsbodden previously approved these changes Jul 1, 2024
Collaborator

@bsbodden bsbodden left a comment

LGTM!

@tylerhutcherson tylerhutcherson dismissed stale reviews from bsbodden and rbs333 via 80e9549 July 2, 2024 13:23
@tylerhutcherson tylerhutcherson force-pushed the feat/RAAE-149/index-from-existing branch from 2fc6733 to 80e9549 Compare July 2, 2024 13:23
@tylerhutcherson tylerhutcherson merged commit eb1a907 into main Jul 2, 2024
20 checks passed
@tylerhutcherson tylerhutcherson deleted the feat/RAAE-149/index-from-existing branch July 2, 2024 13:42

3 participants