feat: add support for custom judges via evaluation metric key #86
base: main
    Default Judge-specific AI Config with required evaluation metric key.
    """
    messages: Optional[List[LDMessage]] = None
    # Deprecated: evaluation_metric_key is used instead
Since we are pre-1.0, as long as we can guarantee the API always returns the new single key, we should be able to just drop this and make a breaking change. The only thing that really makes this breaking is that people will need to update their defaults if they defined it. If you want to drop it now, update the PR title to be "feat!: ".
I won't block if you want to leave this in for a little while, but it likely isn't necessary. The real question is how long we want to continue sending the old values in the API, since that is what will break older SDKs.
For now we want to make sure this is non-breaking, but soon we're going to remove "legacy" support. To keep this change as minimal and safe as possible, I'd err on the side of caution and keep it in for the time being.
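To make the trade-off in this thread concrete, here is a minimal sketch of a default config that keeps the deprecated list next to the new single key. It is not the SDK's actual class: `JudgeDefaultSketch` and its `to_dict` method are hypothetical, and emitting the legacy list during serialization is an assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class JudgeDefaultSketch:
    """Hypothetical stand-in for a judge default config (not the SDK class)."""

    evaluation_metric_key: Optional[str] = None
    # Deprecated: kept only so existing defaults keep working.
    evaluation_metric_keys: Optional[List[str]] = None

    def to_dict(self) -> Dict[str, Any]:
        result: Dict[str, Any] = {}
        if self.evaluation_metric_key is not None:
            result["evaluationMetricKey"] = self.evaluation_metric_key
        # Assumption: the legacy list is still emitted while it is supported.
        if self.evaluation_metric_keys is not None:
            result["evaluationMetricKeys"] = list(self.evaluation_metric_keys)
        return result
```

Dropping the deprecated field later would then be a one-line removal, at the cost of breaking callers who still pass it in their defaults.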
Requirements
Related issues
https://launchdarkly.atlassian.net/browse/REL-11511
See tech spec at https://docs.google.com/document/d/1lzYwQqCcTzN_2zkxJZDfJtgUcEJ4jbpx0KSsJ2bRENw/edit?tab=t.0#heading=h.69bdm7karsxh
Describe the solution you've provided
Updates the SDK to check the AI Config's new evaluationMetricKey property, falling back to the original evaluationMetricKeys list when the new property is absent. Also adds tests that were missing from the previous implementation.
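The lookup order described here can be sketched roughly as follows. This is an illustration rather than the SDK's code: `resolve_metric_key` is a hypothetical helper, and it assumes the variation arrives as a plain dict carrying the `evaluationMetricKey` and `evaluationMetricKeys` properties.

```python
from typing import Any, Dict, Optional


def resolve_metric_key(variation: Dict[str, Any]) -> Optional[str]:
    """Hypothetical helper: choose the single metric key for a judge config."""
    # Prefer the new single-key property when it is a non-empty string.
    key = variation.get("evaluationMetricKey")
    if isinstance(key, str) and key:
        return key

    # Otherwise fall back to the first entry of the legacy list.
    legacy = variation.get("evaluationMetricKeys")
    if isinstance(legacy, list) and legacy:
        first = legacy[0]
        if isinstance(first, str) and first:
            return first

    # Neither property yields a usable key.
    return None
```

For example, `resolve_metric_key({"evaluationMetricKeys": ["relevance", "tone"]})` returns `"relevance"`, while an empty dict yields `None`.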
Note
Introduces single-key judge evaluation with backward compatibility and test coverage.
- Uses evaluationMetricKey (fallback to first in evaluationMetricKeys) and validates responses accordingly
- EvaluationSchemaBuilder builds schema for a single metric key and can return None when absent
- AIJudgeConfig(Default) include evaluation_metric_key and serialize evaluationMetricKey; evaluation_metric_keys retained only for backward compatibility
- LDAIClient.__evaluate now returns the flag variation; judge_config extracts evaluation_metric_key from that single variation to prevent race conditions
- create_judge, chat/agent paths adjusted to new types; minor cleanup of unused imports/comments
Written by Cursor Bugbot for commit d67d0ab.
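A single-key schema builder that can return None, together with matching response validation, might look like the sketch below. The function names and the response shape (one object per metric key with `score` and `reasoning` fields) are assumptions for illustration; the real EvaluationSchemaBuilder may differ.

```python
from typing import Any, Dict, Optional


def build_schema(metric_key: Optional[str]) -> Optional[Dict[str, Any]]:
    """Hypothetical: JSON-schema-style dict for one metric key, else None."""
    if not metric_key:
        return None
    return {
        "type": "object",
        "properties": {
            metric_key: {
                "type": "object",
                # Assumed fields; the real response shape may differ.
                "properties": {
                    "score": {"type": "number"},
                    "reasoning": {"type": "string"},
                },
                "required": ["score", "reasoning"],
            }
        },
        "required": [metric_key],
    }


def is_valid_response(response: Dict[str, Any], metric_key: str) -> bool:
    """Check that the response carries an entry for the single metric key."""
    entry = response.get(metric_key)
    if not isinstance(entry, dict):
        return False
    score = entry.get("score")
    # bool is a subclass of int in Python, so exclude it explicitly.
    is_number = isinstance(score, (int, float)) and not isinstance(score, bool)
    return is_number and isinstance(entry.get("reasoning"), str)
```

Returning None instead of an empty schema lets the caller skip evaluation entirely when no metric key is configured.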