feat: add support for custom judges via evaluation metric key #86

knfreemLD · 2026-01-21T15:01:01Z

Requirements

I have added test coverage for new or changed functionality
I have followed the repository's pull request submission guidelines
I have validated my changes against all supported platform versions

Related issues

https://launchdarkly.atlassian.net/browse/REL-11511
See tech spec at https://docs.google.com/document/d/1lzYwQqCcTzN_2zkxJZDfJtgUcEJ4jbpx0KSsJ2bRENw/edit?tab=t.0#heading=h.69bdm7karsxh

Describe the solution you've provided

Updates the SDK to read the AI Config's new evaluationMetricKey property, falling back to the original evaluationMetricKeys list when the single key is absent. Also adds tests that were missing from the previous implementation.
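For reviewers, the lookup order can be sketched roughly as below. This is a minimal illustration, not the code in this PR: `resolve_metric_key` is a hypothetical helper, and representing the variation as a plain dict is an assumption. Only the property names `evaluationMetricKey` and `evaluationMetricKeys` come from the change itself.

```python
from typing import Any, Dict, Optional


def resolve_metric_key(variation: Dict[str, Any]) -> Optional[str]:
    """Hypothetical helper: prefer the single key, else the first legacy key."""
    key = variation.get("evaluationMetricKey")
    if isinstance(key, str) and key:
        return key

    # Fallback for configs that still only carry the legacy list.
    legacy = variation.get("evaluationMetricKeys")
    if isinstance(legacy, list) and legacy:
        first = legacy[0]
        if isinstance(first, str) and first:
            return first

    # Neither property yields a usable key.
    return None
```

Under these assumptions, a variation carrying both properties resolves to the single key, and one carrying only the list resolves to its first entry.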


Note

Introduces single-key judge evaluation with backward compatibility and test coverage.

Judge now uses evaluationMetricKey (fallback to first in evaluationMetricKeys) and validates responses accordingly
EvaluationSchemaBuilder builds schema for a single metric key and can return None when absent
Models updated: AIJudgeConfig(Default) include evaluation_metric_key and serialize evaluationMetricKey; evaluation_metric_keys retained only for backward compatibility
LDAIClient.__evaluate now returns the flag variation; judge_config extracts evaluation_metric_key from that single variation to prevent race conditions
create_judge, chat/agent paths adjusted to new types; minor cleanup of unused imports/comments
New tests cover Judge behavior, schema builder, model serialization, and client extraction/consistency

Written by Cursor Bugbot for commit d67d0ab. This will update automatically on new commits.
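As a rough illustration of the single-key schema behavior summarized in the note, the sketch below builds a JSON-Schema-style dict for one metric key and returns None when no key is available. `build_schema` is a hypothetical stand-in, and the schema shape (an `evaluations` object with `score` and `reasoning` fields) is an assumption made for this example, not necessarily what EvaluationSchemaBuilder emits.

```python
from typing import Any, Dict, Optional


def build_schema(metric_key: Optional[str]) -> Optional[Dict[str, Any]]:
    """Hypothetical single-key schema builder; returns None without a key."""
    if not metric_key:
        return None

    # Assumed shape: one required entry under "evaluations", named by the key.
    return {
        "type": "object",
        "properties": {
            "evaluations": {
                "type": "object",
                "properties": {
                    metric_key: {
                        "type": "object",
                        "properties": {
                            "score": {"type": "number", "minimum": 0, "maximum": 1},
                            "reasoning": {"type": "string"},
                        },
                        "required": ["score", "reasoning"],
                    }
                },
                "required": [metric_key],
            }
        },
        "required": ["evaluations"],
    }
```

A caller would treat a None result as "no judge evaluation possible" and skip the request rather than send a schema with no metric.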

packages/sdk/server-ai/src/ldai/client.py

jsonbailey · 2026-01-21T16:31:24Z

packages/sdk/server-ai/src/ldai/models.py

    Default Judge-specific AI Config with required evaluation metric key.
    """
    messages: Optional[List[LDMessage]] = None
+    # Deprecated: evaluation_metric_key is used instead


Since we are pre-1.0, as long as we can guarantee the API always returns the new single key, we should be able to just drop this and make it a breaking change. The only thing that really makes this breaking is that people will need to update their defaults if they defined it. If you want to drop it now, update the PR title to "feat!: ".

I won't block if you want to leave this in for a little while, but it likely isn't necessary. The real question is how long we want to continue sending the old values in the API, as that is what will break older SDKs.

For now we want to make sure this is non-breaking, but soon we're going to remove "legacy" support. To keep this change as minimal and safe as possible, I'd err on the side of caution and keep it in for the time being.
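To make the non-breaking option concrete, here is a minimal sketch of a default model that keeps the deprecated list next to the new single key and serializes whichever is set. `JudgeDefaultSketch` is a hypothetical class written for this thread, not the SDK's AIJudgeConfigDefault; only the field names and the camelCase output keys are taken from the PR.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class JudgeDefaultSketch:
    """Hypothetical model used only to illustrate the compatibility approach."""

    evaluation_metric_key: Optional[str] = None
    # Deprecated: retained so existing caller-defined defaults keep working.
    evaluation_metric_keys: Optional[List[str]] = None

    def to_dict(self) -> Dict[str, Any]:
        """Serialize only the fields that were actually provided."""
        result: Dict[str, Any] = {}
        if self.evaluation_metric_key is not None:
            result["evaluationMetricKey"] = self.evaluation_metric_key
        if self.evaluation_metric_keys is not None:
            result["evaluationMetricKeys"] = list(self.evaluation_metric_keys)
        return result
```

Dropping the deprecated field later would then be a one-line removal plus the "feat!: " title change discussed above.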

Add support for custom judges via evaluation metric key

8d01693

knfreemLD changed the title ~~[REL-11511] Add support for custom judges via evaluation metric key~~ feat: add support for custom judges via evaluation metric key Jan 21, 2026

knfreemLD added 2 commits January 21, 2026 10:12

fixed linter issues

350f884

Linting

00a265e

knfreemLD requested a review from jsonbailey January 21, 2026 15:21

knfreemLD marked this pull request as ready for review January 21, 2026 15:43

knfreemLD requested a review from a team as a code owner January 21, 2026 15:43

knfreemLD requested review from andrewklatzke and mattrmc1 January 21, 2026 15:44

jsonbailey requested changes Jan 21, 2026


knfreemLD added 2 commits January 21, 2026 12:27

Addressed PR feedback; fixed race condition

d277b49

modified default behaviour

d67d0ab

knfreemLD requested a review from jsonbailey January 21, 2026 17:45

jsonbailey approved these changes Jan 21, 2026


mattrmc1 approved these changes Jan 21, 2026


knfreemLD mentioned this pull request Jan 22, 2026

feat: Added custom judge support for ai configs launchdarkly/js-core#1073

Open

3 tasks


4 participants
