-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add docstrings and type hints to Evaluate class. #935
base: main
Are you sure you want to change the base?
Conversation
Still some typing errors:
Will check these as type permits. |
With kind help from the mypy folks, fixed the overload hints for the `__call__` method on Evaluate.
With help from the mypy folks, the type errors are now fixed, and mypy likes the file. |
|
||
# noinspection PyUnresolvedReferences |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are these comments needed? prefer to remove non-code related comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are used to quash linter complaints. I'd prefer to keep them so that I don't see the same linter complaints over and over.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't maintain these comments throughout the repo and it's actually fine to flag the linter and add in the changes within the PR. would prefer to remove them.
|
||
try: | ||
from IPython.display import HTML | ||
from IPython.display import display as ipython_display | ||
# noinspection PyPackageRequirements |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here as well
dspy/evaluate/evaluate.py
Outdated
except ImportError: | ||
ipython_display = print | ||
|
||
def HTML(x): | ||
def HTML(x): # noqa - linters dislike upper case function name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can remove this as well
dspy/evaluate/evaluate.py
Outdated
@@ -29,17 +34,40 @@ def HTML(x): | |||
|
|||
|
|||
class Evaluate: | |||
""" | |||
Reification of the process of evaluating a model. Invoked using its |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the evaulate class is for a DSPy program, not a model. Also, can the wording be simplified? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what you mean by simplifying the wording. I can't think of a synonym for "reification," if that's what you mean.
Class representing the process of evaluating a DSpy program.
Would that be better?
dspy/evaluate/evaluate.py
Outdated
score: float, results: list of Prediction | ||
if `return_all_scores` is False and `return_outputs` is True. | ||
score: float | ||
if both flags are false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
keep consistent with phrasing in 222
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. Fixed this. Also one of the lines was wrong, and I added a check (to produce a more understandable error message when required arguments are missing).
@@ -132,8 +257,10 @@ def wrapped_program(example_idx, example): | |||
|
|||
# increment assert and suggest failures to program's attributes | |||
if hasattr(program, "_assert_failures"): | |||
# noinspection PyProtectedMember |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove comments
dspy/evaluate/evaluate.py
Outdated
@@ -151,11 +278,14 @@ def wrapped_program(example_idx, example): | |||
if creating_new_thread: | |||
del thread_stacks[threading.get_ident()] | |||
|
|||
devset = list(enumerate(devset)) | |||
tqdm.tqdm._instances.clear() | |||
devset = list(enumerate(devset)) # This actually changes the type of `devset` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, but renamed. See if you are ok with the change, please.
df = df.map(truncate_cell) | ||
else: | ||
df = df.applymap(truncate_cell) | ||
df = df.applymap(truncate_cell) # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove comments
Docstring for the return value had inconsistent wording and one error. Also added explicit check for missing `metric` and `devset` arguments, since these do not have defaults.
@arnavsinghvi11 @rpgoldman what is left for this to be merged? |
@cdowellmdb I think it's good to go. @arnavsinghvi11 didn't like the comments I put in to muffle linters; I prefer to keep them. Is that a show-stopper? If not, do you need me to merge in the changes since this was pushed? |
|
||
# noinspection PyUnresolvedReferences |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't maintain these comments throughout the repo and it's actually fine to flag the linter and add in the changes within the PR. would prefer to remove them.
========== | ||
devset: Iterable[Example] | ||
An iterable of Examples | ||
metric: Callable, optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
link was for L53 above, corresponding to the metric description
self, | ||
program, | ||
metric: Optional[Callable] = None, | ||
devset: Optional[Iterable] = None, # Needs more specific type, if possible |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the latterUnion[List[bool], List[float]]
makes sense for this!
program._assert_failures += dspy.settings.get("assert_failures") | ||
if hasattr(program, "_suggest_failures"): | ||
# noinspection PyProtectedMember |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
flagging the comments
devset = list(enumerate(devset)) | ||
tqdm.tqdm._instances.clear() | ||
ndevset: List[int, Example] = list(enumerate(devset)) | ||
# noinspection PyProtectedMember |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
flagging the comments
Thanks for circling back to this @rpgoldman @cdowellmdb ! Left a few comments back on the review and it should be good to merge once the last parts are resolved! (There's also a merge conflict but I believe rebasing to main should fix this). The in-line comments are definitely not a show-stopper, but I'd prefer to leave them out from committed code to ensure they catch any changes needed. |
Not sure I understand this last. What's "they" here? I assume you are saying that we should continue to see these linter warnings. Here's an example of why I disagree: E.g., I put in
One of 2 things should happen, IMO:
I'm ok with either solution, but a third solution, where we just have linters continue shout at us, seems generally bad. Bad because if we end up with lots of these linter shoutings then it's just human nature to start ignoring the linter warnings altogether. And we certainly cannot put "must build cleanly" into our testing process, if we do this. I have been bitten many times by code repositories that emit lots of warnings, training their developers to ignore linter warnings. Because sooner or later there's a warning that's really important, but that signal is lost in the noise. I'm a strong believer in building clean. If there's a warning, either it's real and you fix it, or it's not real and you quash it. |
Add docstrings and type hints for the Evaluate class.
Needs review!