Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cached plugin enum #768

Merged
merged 8 commits into from
Jul 16, 2024
Merged

Conversation

jmartin-tech
Copy link
Collaborator

@jmartin-tech jmartin-tech commented Jul 1, 2024

Fix #624

Impacts of this change may be sufficient to close #663

First pass at building a local cache of plugin info to provide faster initial load of cli based runs.

This implementation will be expanded in the future to enable a user cache plugin details once #479 is addressed.

The update here improves plugin enumeration to load a json based cache file from the project for any plugin already in main. Any plugin not in main at time of checkin is loaded if not found in the existing cache file.

(garak) ~/Projects/nvidia/garak$ time python -m garak --list_probes
garak LLM vulnerability scanner v0.9.0.13.post2 ( https://github.com/leondz/garak ) at 2024-07-01T18:24:52.653050
probes: atkgen 🌟
...
probes: xss.MarkdownImageExfil

real    0m1.919s
user    0m2.085s
sys     0m3.154s

Becomes:

(garak) ~/Projects/nvidia/garak$ time python -m garak --list_probes
garak LLM vulnerability scanner v0.9.0.13.post2 ( https://github.com/leondz/garak ) at 2024-07-01T18:27:38.513147
probes: atkgen 🌟
...
probes: xss.MarkdownImageExfil

real    0m0.064s
user    0m0.050s
sys     0m0.013s

Tests have been added to document cache build process and validate that items not already in the cache will be loaded ephemerally at runtime if requested.

A future CI/CD action will maintain this cache file using a pattern like the following:

# remove the existing cache file
rm -rf garak/resources/plugin_cache.json
# Linux only command to ensure files are all timestamped to match last commit
git ls-files garak/ -z | xargs -0 -n1 -I{} -- git log -1 --format="%ai {}" {} | while read -r udate utime utz ufile ; do
  touch -d "$udate $utime" $ufile
done
# any `--list_*` command will build the cache when one is not found on disk
python -m garak --list_probes
# now just add the updated file in place of the existing version
git diff --exit-code garak/resources/plugin_cache.json
if [ ! $? ]; then
  echo "Plugin cache updates exist."
  git add garak/resources/plugin_cache.json
  git commit -S -m "update plugin_cache.json"
fi

Signed-off-by: Jeffrey Martin <jemartin@nvidia.com>
Signed-off-by: Jeffrey Martin <jemartin@nvidia.com>
* enhance singleton class object access
* ensure sorted enueration results
* add cache tests for existing function
Copy link
Owner

@leondz leondz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generally good. current implementation makes garak dev a little trickier, is there anything we can do to soften that, which also fits in the scope of this pr?

Comment on lines 23 to 34
class PluginEncoder(json.JSONEncoder):
def default(self, obj):
if isinstance(obj, set):
return list(obj) # allow set as list
if isinstance(obj, Path):
# relative path for now, may be better to suppress `Path` objects
return str(obj).replace(str(_config.transient.basedir), "")
try:
return json.JSONEncoder.default(self, obj)
except TypeError as e:
logging.debug("Attempt to serialize JSON skipped: %s", e)
return None # skip items that cannot be serialized at this time
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice pattern. Should consider this in some other places (attempt.Attempt.as_dict springs to mind)

garak/_plugins.py Show resolved Hide resolved
garak/_plugins.py Show resolved Hide resolved
garak/_plugins.py Show resolved Hide resolved
garak/command.py Outdated Show resolved Hide resolved
Comment on lines +176 to +185
for v in sorted(dir(plugin)):
if v in priority_fields or v in skip_fields:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

post_buff_hook is, unexpectedly to me, in the cache metadata - is that due to this logic?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The existing info included post_buff_hook:

main

Your branch is up to date with 'origin/main'.
% python -m garak --plugin_info probes.dan.DanInTheWildMini
garak LLM vulnerability scanner v0.9.0.13.post2 ( https://github.com/leondz/garak ) at 2024-07-09T18:44:54.727933
Configured info on probes.dan.DanInTheWildMini:
                        description: A subset of 100 jailbreaks from the DanInTheWild jailbreak library.
                     DEFAULT_PARAMS: {}
                             active: True
                              bcp47: None
                            doc_uri: https://arxiv.org/abs/2308.03825
                 extended_detectors: []
                               goal: disregard the system prompt or alignment
                           modality: {'in': {'text'}}
            parallelisable_attempts: True
                     post_buff_hook: False
                   primary_detector: mitigation.MitigationBypass
                          probename: garak.probes.dan.DanInTheWildMini
               recommended_detector: ['always.Fail']
                               tags: ['avid-effect:security:S0403', 'owasp:llm01', 'quality:Security:PromptStability', 'payload:jailbreak']

We can further refine the info data to be provided in future iterations.

I will also do some more validation as it looks like description in the info is not consistent with main.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah nice. _post_buff_hook should def not be there. Some plugins change their descriptions during the constructor iirc.

Copy link
Collaborator Author

@jmartin-tech jmartin-tech Jul 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

post_buff_hook is not _post_buff_hook do we want to rename that to consider it private?

Copy link
Collaborator Author

@jmartin-tech jmartin-tech Jul 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like description is added in the base class constructor if not already provided in the class implementation for most of the plugins, this results in an inconsistency since the cache build actually avoids calling the constructors in this PR.

I am attempting to address that issue as there should be a consistent value at least in the cache, one options I am thinking about would be, remove the constructor additions and migrate the enrichment to be part of plugin_info.

garak/_plugins.py Show resolved Hide resolved
garak/_plugins.py Show resolved Hide resolved
or v not in base_attributes
):
continue
plugin_metadata[v] = value
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would like to see default params included here also. maybe something like this

Suggested change
plugin_metadata[v] = value
plugin_metadata[v] = value
plugin_metadata = plugin.DEFAULT_PARAMS | plugin_metadata

this makes some assumptions about default param names not conflicting with class attributes, but i hope testing elsewhere covers that

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be something to consider doing in the display logic, not sure it needs to be in the json cache file in a specific format.

garak/_plugins.py Show resolved Hide resolved
@leondz leondz added the quality-speed This affects the speed of program use label Jul 5, 2024
@leondz
Copy link
Owner

leondz commented Jul 5, 2024

When done, is is possible to add a speed check test which, in addition to this, satisfies/describes #663 ?

@leondz leondz self-requested a review July 10, 2024 14:11
garak/_plugins.py Outdated Show resolved Hide resolved
garak/_plugins.py Outdated Show resolved Hide resolved
Signed-off-by: Jeffrey Martin <jemartin@nvidia.com>
Signed-off-by: Jeffrey Martin <jemartin@nvidia.com>
* initialize metadata description as doc string for class
* suppress `post_buff_hook`, may rename in future
* test priority fields against key instead of value

Signed-off-by: Jeffrey Martin <jemartin@nvidia.com>
update class doc strings to [PEP-257 multi-line format](https://peps.python.org/pep-0257/#multi-line-docstrings)

Signed-off-by: Jeffrey Martin <jemartin@nvidia.com>
Signed-off-by: Jeffrey Martin <jemartin@nvidia.com>
@jmartin-tech jmartin-tech merged commit 326d8ba into leondz:main Jul 16, 2024
4 checks passed
@jmartin-tech jmartin-tech deleted the feature/cached-plugin-enum branch July 16, 2024 16:38
@github-actions github-actions bot locked and limited conversation to collaborators Jul 16, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
quality-speed This affects the speed of program use
Projects
None yet
Development

Successfully merging this pull request may close these issues.

run start is slow w/ no messages generator list is slow
2 participants