Skip to content

Conversation

@major
Copy link
Contributor

@major major commented Jan 23, 2026

Description

Integrates Splunk HEC telemetry into the rlsapi v1 /infer endpoint. This is the final PR in the Splunk integration series (building on #1031 and #1032).

Changes:

  • Add _get_rh_identity_context() helper to extract org_id/system_id from request.state
  • Add _queue_splunk_event() to build and queue telemetry events via FastAPI BackgroundTasks
  • Add timing measurement around inference calls
  • Queue infer_with_llm events on success, infer_error events on failure
  • Add user-facing documentation (docs/splunk.md)
  • Add developer documentation (src/observability/README.md)

Events are sent asynchronously and never block or affect the main request flow.

Type of change

  • New feature
  • Documentation Update
  • Unit tests improvement
  • Integration tests improvement

Tools used to create PR

  • Assisted-by: Claude (Anthropic)
  • Generated by: N/A

Related Tickets & Documents

  • Related Issue # RSPEED-2326
  • Closes # N/A (part of multi-PR implementation)

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  1. Unit tests: uv run pytest tests/unit/app/endpoints/test_rlsapi_v1.py -v (24 tests)
  2. Integration tests: uv run pytest tests/integration/endpoints/test_rlsapi_v1_integration.py -v (11 tests)
  3. Observability tests: uv run pytest tests/unit/observability/ -v (11 tests)

New tests verify:

  • RH Identity context extraction (with/without identity data, empty values)
  • Splunk event queuing on successful inference
  • Splunk error event queuing on failed inference
  • Event payload includes correct org_id/system_id from RH Identity

Summary by CodeRabbit

  • New Features

    • Splunk HEC integration for async inference telemetry — includes timing, request identity context, and events for successes and error paths.
  • Documentation

    • Added Splunk HEC configuration guide and expanded Observability docs with event formats, troubleshooting, graceful degradation, and extension guidance.
  • Tests

    • Updated/added unit and integration tests to cover telemetry behavior, identity extraction, and background task handling.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 23, 2026

Walkthrough

Adds Splunk HEC telemetry: the /rlsapi/v1/infer endpoint now accepts FastAPI Request and BackgroundTasks, extracts RH Identity context (with fallback), measures inference time, and queues asynchronous Splunk events for successes and specific error paths. Documentation and tests updated.

Changes

Cohort / File(s) Summary
Documentation
docs/splunk.md, src/observability/README.md
New Splunk HEC integration docs and observability README covering config, token handling/rotation, HEC envelope & event formats, emitting endpoints, graceful degradation, troubleshooting, and extension guidance.
Endpoint & Telemetry
src/app/endpoints/rlsapi_v1.py
infer_endpoint signature extended to accept Request and BackgroundTasks; added AUTH_DISABLED, _get_rh_identity_context, _queue_splunk_event, CLA version extraction, and timing instrumentation. Queues Splunk events on success and on API errors (APIConnectionError, RateLimitError, APIStatusError).
Integration Tests
tests/integration/endpoints/test_rlsapi_v1_integration.py
Added helpers/fixtures to create mock Request and BackgroundTasks; updated integration tests to pass these mocks into infer_endpoint across scenarios.
Unit Tests
tests/unit/app/endpoints/test_rlsapi_v1.py
Exported and tested AUTH_DISABLED and _get_rh_identity_context. Added mock request/background helpers; refactored tests to pass mocks and added assertions for RH Identity extraction and Splunk event queuing behavior.

Sequence Diagram

sequenceDiagram
    participant Client
    participant InferEndpoint as "infer_endpoint\n(Request, BackgroundTasks)"
    participant RHContext as "RH Identity\nContext"
    participant Processor as "Inference\nProcessor"
    participant Background as "BackgroundTasks"
    participant SplunkHEC as "Splunk HEC"

    Client->>InferEndpoint: POST /rlsapi/v1/infer
    InferEndpoint->>RHContext: extract org_id, system_id
    RHContext-->>InferEndpoint: context or AUTH_DISABLED
    InferEndpoint->>InferEndpoint: start inference timer
    InferEndpoint->>Processor: run inference
    alt Success
        Processor-->>InferEndpoint: response_text
        InferEndpoint->>InferEndpoint: compute inference_time
        InferEndpoint->>Background: add_task(_queue_splunk_event, success payload + RH context)
    else API/RateLimit/Status Error
        Processor-->>InferEndpoint: raises error
        InferEndpoint->>InferEndpoint: compute inference_time
        InferEndpoint->>Background: add_task(_queue_splunk_event, error payload + RH context)
    end
    InferEndpoint-->>Client: response or HTTP error
    Background->>SplunkHEC: async send event
    SplunkHEC-->>Background: acknowledgement
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly related PRs

Suggested labels

ok-to-test

Suggested reviewers

  • tisnik
🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title accurately describes the main change: integrating Splunk telemetry into the v1 /infer endpoint, which is the primary objective across all modified files.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@major major force-pushed the rspeed-2326-splunk-endpoint branch from 0ae1460 to 40b19fa Compare January 26, 2026 17:44
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@docs/splunk.md`:
- Around line 139-142: The fenced log snippet in docs/splunk.md is missing a
language tag which triggers markdownlint MD040; update the code fence
surrounding the log line "Splunk HEC request failed with status 403: Invalid
token" from ``` to ```text so the block is explicitly marked as plain text and
renders/lints correctly.

In `@src/app/endpoints/rlsapi_v1.py`:
- Around line 262-317: The except branches currently call
_queue_splunk_event(...) then raise HTTPException(...), which drops FastAPI
background tasks and loses telemetry; instead, after recording inference_time,
metrics, logging, and calling _queue_splunk_event(...), construct the
appropriate response model (ServiceUnavailableResponse, QuotaExceededResponse,
or InternalServerErrorResponse.generic()), convert it to a dict via
model_dump(), and return a fastapi.responses.JSONResponse with that body, the
correct status_code, and background=background_tasks so the queued telemetry
runs; update the handlers for APIConnectionError, RateLimitError, and
APIStatusError (the except blocks around retrieve_simple_response) to return
JSONResponse(...) rather than raising HTTPException.

In `@tests/integration/endpoints/test_rlsapi_v1_integration.py`:
- Around line 40-46: In _create_mock_request replace the deletion of the
non-existent attribute (del mock_request.state.rh_identity_data) with explicitly
setting mock_request.state.rh_identity_data = None so it won't raise
AttributeError; update the mock setup in the _create_mock_request function to
assign None instead of deleting the attribute to reflect absent identity data.

In `@tests/unit/app/endpoints/test_rlsapi_v1.py`:
- Around line 40-50: The helper _create_mock_request should avoid
unconditionally deleting mock_request.state.rh_identity_data because del will
raise AttributeError if the attribute doesn't exist; instead, check for
existence (e.g., hasattr/getattr) and only delete or set to None when present.
Update the _create_mock_request implementation to conditionally remove or clear
mock_request.state.rh_identity_data using a safe existence check so tests won’t
break when the attribute was never set.
🧹 Nitpick comments (1)
src/app/endpoints/rlsapi_v1.py (1)

182-196: Bring helper docstrings up to Google style.

_get_cla_version and _queue_splunk_event need full Args/Returns sections to match module conventions.

✍️ Suggested docstring shape
-def _get_cla_version(request: Request) -> str:
-    """Extract CLA version from User-Agent header."""
+def _get_cla_version(request: Request) -> str:
+    """Extract CLA version from the User-Agent header.
+
+    Args:
+        request: The FastAPI request object.
+
+    Returns:
+        The User-Agent header value or an empty string.
+    """
@@
 def _queue_splunk_event(
@@
 ) -> None:
-    """Build and queue a Splunk telemetry event for background sending."""
+    """Build and queue a Splunk telemetry event for background sending.
+
+    Args:
+        background_tasks: FastAPI background tasks queue.
+        infer_request: Inference request payload.
+        request: FastAPI request for headers/state.
+        request_id: Unique request identifier.
+        response_text: LLM response or error message.
+        inference_time: Total inference duration in seconds.
+        sourcetype: Splunk sourcetype (e.g., infer_with_llm).
+    """
As per coding guidelines, please keep Google-style docstrings for all functions.

@major major force-pushed the rspeed-2326-splunk-endpoint branch from 40b19fa to a15a818 Compare January 26, 2026 19:34
…endpoint

- Add _get_rh_identity_context() to extract org_id/system_id from request.state
- Add _queue_splunk_event() to build and queue telemetry events via BackgroundTasks
- Add timing measurement around inference calls
- Queue infer_with_llm events on success, infer_error on failure
- Add unit tests for RH Identity context extraction and Splunk integration
- Update integration tests for new endpoint signature
- Add user-facing docs (docs/splunk.md) and developer docs (src/observability/README.md)

Signed-off-by: Major Hayden <major@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant