Skip to content

Conversation

@njshah301
Copy link
Collaborator

@njshah301 njshah301 commented Jan 13, 2026

Introduces the metrics exporter for the MoSAPI system.

  • Implements MosApiMetrics to export TLD and service states to Cloud Monitoring.
  • Maps ICANN MoSAPI status codes to numeric gauges: 1 (UP), 0 (DOWN), and 2 (DISABLED/INCONCLUSIVE).
  • Sets MAX_TIMESERIES_PER_REQUEST to respect Cloud Monitoring API limits

This change is Reviewable

@njshah301 njshah301 changed the title Add MoSApiMetrics exporter Add MosApiMetrics exporter Jan 13, 2026
@njshah301 njshah301 enabled auto-merge January 13, 2026 09:17
@njshah301 njshah301 added kokoro:force-run Force a Kokoro build. and removed kokoro:force-run Force a Kokoro build. labels Jan 13, 2026
@njshah301 njshah301 force-pushed the njshah301/mosapi-metrics-logic branch from ee71228 to 44605b9 Compare January 13, 2026 10:08
Introduces the metrics exporter for the MoSAPI system.

- Implements `MosApiMetrics` to export TLD and service states to Cloud Monitoring.
- Maps ICANN status codes to numeric gauges: 1 (UP), 0 (DOWN), and 2 (DISABLED/INCONCLUSIVE).
- Sets `MAX_TIMESERIES_PER_REQUEST` to 195 to respect Cloud Monitoring API limits
@njshah301 njshah301 force-pushed the njshah301/mosapi-metrics-logic branch from 44605b9 to 9fdf976 Compare January 13, 2026 16:10
@njshah301 njshah301 requested a review from weiminyu January 18, 2026 07:45
Copy link
Collaborator

@gbrodman gbrodman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gbrodman reviewed 4 files and all commit messages, and made 18 comments.
Reviewable status: 4 of 6 files reviewed, 18 unresolved discussions (waiting on @CydeWeys, @njshah301, @ptkach, and @weiminyu).


core/src/main/java/google/registry/mosapi/MosApiMetrics.java line 103 at r1 (raw file):

    // Initialize Metric Descriptors once on startup
    ensureMetricDescriptors();

Don't do work in the constructor. It's bad practice, as it can lead to unexpected oddities.


core/src/main/java/google/registry/mosapi/MosApiMetrics.java line 146 at r1 (raw file):

          } catch (Exception e) {
            logger.atWarning().withCause(e).log(
                "Failed to create metric descriptors (they may already exist).");

if this is expected to happen regularly, it probably shouldn't be in an exception / warning case


core/src/main/java/google/registry/mosapi/MosApiMetrics.java line 156 at r1 (raw file):

      String displayName,
      String description,
      String metricKind,

this is always GAUGE right?


core/src/main/java/google/registry/mosapi/MosApiMetrics.java line 161 at r1 (raw file):

      throws IOException {

    List<LabelDescriptor> labelDescriptors = new ArrayList<>();

Use stream-collect (toImmutableList()) instead of manually constructing these


core/src/main/java/google/registry/mosapi/MosApiMetrics.java line 190 at r1 (raw file):

          try {
            pushBatchMetrics(states);
          } catch (Throwable t) {

Generally, avoid catching a raw Throwable. It can hide things. We should be more precise.


core/src/main/java/google/registry/mosapi/MosApiMetrics.java line 196 at r1 (raw file):

  }

  private void pushBatchMetrics(List<TldServiceState> states) throws IOException {

One batch failure shouldn't cause all the others to fail with an IOException. Catch the exception where it occurs, log it, and then try the next batch.


core/src/main/java/google/registry/mosapi/MosApiMetrics.java line 197 at r1 (raw file):

  private void pushBatchMetrics(List<TldServiceState> states) throws IOException {
    List<TimeSeries> allTimeSeries = new ArrayList<>();

Constructing a list piece by piece over many methods is brittle and it's not immediately clear where everything happens. Instead, convert the stream of TldServiceState into a stream of TimeSeries using flatMap (flatMap instead of just map because it seems like one TldServiceState maps to multiple TimeSeries objects)


core/src/main/java/google/registry/mosapi/MosApiMetrics.java line 199 at r1 (raw file):

    List<TimeSeries> allTimeSeries = new ArrayList<>();
    TimeInterval interval =
        new TimeInterval().setEndTime(new DateTime(System.currentTimeMillis()).toString());

Inject a clock and use that for the time instead of calling System


core/src/main/java/google/registry/mosapi/MosApiMetrics.java line 206 at r1 (raw file):

      // 2. Service-level Metrics
      Map<String, ServiceStatus> services = state.serviceStatuses();

state::serviceStatuses should use an immutable map if it doesn't already

can this ever even be null? if not, don't worry about the null case.


core/src/main/java/google/registry/mosapi/MosApiMetrics.java line 227 at r1 (raw file):

  private void addServiceMetrics(
      List<TimeSeries> list,

Use the full type (ImmutableList, ImmutableMap etc) whenever possible for clarity (this goes for elsewhere in the file, too)

If we ever specify the standard List/Map type it's usually because we're forced to by an API we don't control


core/src/main/java/google/registry/mosapi/MosApiMetrics.java line 254 at r1 (raw file):

      String suffix, Map<String, String> labels, Number val, TimeInterval interval) {
    Metric metric = new Metric().setType(METRIC_DOMAIN + suffix).setLabels(labels);
    MonitoredResource resource =

i don't think we need to recreate this object every time


core/src/main/java/google/registry/mosapi/MosApiMetrics.java line 304 at r1 (raw file):

   */
  private long parseServiceStatus(String status) {
    if (status == null) {

By the time we've reached this point in the code, we should be guaranteed that the status has a non-null value. If the status is null, that means (as far as I'm aware) that the response from MosAPI was bad / wrong in a major way. If that happens, we should be taking care of that error when it happens, not here down the line.


core/src/main/java/google/registry/config/files/default-config.yaml line 651 at r1 (raw file):

  # Metrics Reporting Thread Count
  # Set to 1. Given the current TLD volume and the 5-minute reporting interval,

better phrasing "Defaults to 1".

The rest of this should be phrased in a way that it can apply broadly to any user of Nomulus, not just us. Give guidance on how other people should properly set this field.


core/src/main/java/google/registry/config/RegistryConfigSettings.java line 276 at r1 (raw file):

    public List<String> services;
    public int tldThreadCnt;

no extra line


core/src/main/java/google/registry/mosapi/module/MosApiModule.java line 208 at r1 (raw file):

  /**
   * Provides a single-threaded executor for sequential metrics reporting.

This is a config point, so it's not necessary single-threaded.


core/src/main/java/google/registry/mosapi/module/MosApiModule.java line 210 at r1 (raw file):

   * Provides a single-threaded executor for sequential metrics reporting.
   *
   * <p>Bound to 1 thread because the Google Cloud Monitoring exporter processes batches

"round"?


core/src/test/java/google/registry/mosapi/MosApiMetricsTest.java line 40 at r1 (raw file):

import org.junit.jupiter.api.Test;
import org.mockito.ArgumentCaptor;

nit: link the class being tested in a class-level javadoc comment like we do for other test classes


core/src/main/java/google/registry/config/RegistryConfig.java line 1472 at r1 (raw file):

    @Provides
    @Config("mosapiMetricsThreadCnt")

Use the full name Count, there's no point in saving only two letters when it's already this long

same for the config field too

Copy link
Collaborator Author

@njshah301 njshah301 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

refactored code and resolved commits

@njshah301 made 19 comments and resolved 1 discussion.
Reviewable status: 4 of 6 files reviewed, 17 unresolved discussions (waiting on @CydeWeys, @gbrodman, @ptkach, and @weiminyu).


core/src/main/java/google/registry/config/RegistryConfig.java line 1472 at r1 (raw file):

Previously, gbrodman wrote…

Use the full name Count, there's no point in saving only two letters when it's already this long

same for the config field too

done


core/src/main/java/google/registry/config/RegistryConfigSettings.java line 276 at r1 (raw file):

Previously, gbrodman wrote…

no extra line

done


core/src/main/java/google/registry/config/files/default-config.yaml line 651 at r1 (raw file):

Previously, gbrodman wrote…

better phrasing "Defaults to 1".

The rest of this should be phrased in a way that it can apply broadly to any user of Nomulus, not just us. Give guidance on how other people should properly set this field.

done


core/src/main/java/google/registry/mosapi/MosApiMetrics.java line 103 at r1 (raw file):

Previously, gbrodman wrote…

Don't do work in the constructor. It's bad practice, as it can lead to unexpected oddities.

Agreed. I moved the API calls out of the constructor. I implemented in recordStates() so the descriptors are created exactly once when the first batch of metrics is processed.


core/src/main/java/google/registry/mosapi/MosApiMetrics.java line 146 at r1 (raw file):

Previously, gbrodman wrote…

if this is expected to happen regularly, it probably shouldn't be in an exception / warning case

I have tried to add more try-catch around this process, so it can be more granular


core/src/main/java/google/registry/mosapi/MosApiMetrics.java line 156 at r1 (raw file):

Previously, gbrodman wrote…

this is always GAUGE right?

yes


core/src/main/java/google/registry/mosapi/MosApiMetrics.java line 161 at r1 (raw file):

Previously, gbrodman wrote…

Use stream-collect (toImmutableList()) instead of manually constructing these

done


core/src/main/java/google/registry/mosapi/MosApiMetrics.java line 190 at r1 (raw file):

Previously, gbrodman wrote…

Generally, avoid catching a raw Throwable. It can hide things. We should be more precise.

done


core/src/main/java/google/registry/mosapi/MosApiMetrics.java line 196 at r1 (raw file):

Previously, gbrodman wrote…

One batch failure shouldn't cause all the others to fail with an IOException. Catch the exception where it occurs, log it, and then try the next batch.

yeah, good suggestion, I have added try-catch blocks to not block the other batches because of failure in prev. batch


core/src/main/java/google/registry/mosapi/MosApiMetrics.java line 197 at r1 (raw file):

Previously, gbrodman wrote…

Constructing a list piece by piece over many methods is brittle and it's not immediately clear where everything happens. Instead, convert the stream of TldServiceState into a stream of TimeSeries using flatMap (flatMap instead of just map because it seems like one TldServiceState maps to multiple TimeSeries objects)

made change accordingly


core/src/main/java/google/registry/mosapi/MosApiMetrics.java line 199 at r1 (raw file):

Previously, gbrodman wrote…

Inject a clock and use that for the time instead of calling System

done


core/src/main/java/google/registry/mosapi/MosApiMetrics.java line 206 at r1 (raw file):

Previously, gbrodman wrote…

state::serviceStatuses should use an immutable map if it doesn't already

can this ever even be null? if not, don't worry about the null case.

done


core/src/main/java/google/registry/mosapi/MosApiMetrics.java line 227 at r1 (raw file):

Previously, gbrodman wrote…

Use the full type (ImmutableList, ImmutableMap etc) whenever possible for clarity (this goes for elsewhere in the file, too)

If we ever specify the standard List/Map type it's usually because we're forced to by an API we don't control

made change accordingly


core/src/main/java/google/registry/mosapi/MosApiMetrics.java line 254 at r1 (raw file):

Previously, gbrodman wrote…

i don't think we need to recreate this object every time

I have put it inside constructor


core/src/main/java/google/registry/mosapi/MosApiMetrics.java line 304 at r1 (raw file):

Previously, gbrodman wrote…

By the time we've reached this point in the code, we should be guaranteed that the status has a non-null value. If the status is null, that means (as far as I'm aware) that the response from MosAPI was bad / wrong in a major way. If that happens, we should be taking care of that error when it happens, not here down the line.

You are absolutely right. this class should assume the data is valid

I have removed the null checks from this class. Instead, I added a strict validation filter upstream in MosApiStateService.If the MoSAPI response violates the contract (e.g., null status), we now log a severe error and drop that specific TLD there, ensuring this class only ever receives well-formed data


core/src/main/java/google/registry/mosapi/module/MosApiModule.java line 208 at r1 (raw file):

Previously, gbrodman wrote…

This is a config point, so it's not necessary single-threaded.

yeah, refactored


core/src/main/java/google/registry/mosapi/module/MosApiModule.java line 210 at r1 (raw file):

Previously, gbrodman wrote…

"round"?

done


core/src/test/java/google/registry/mosapi/MosApiMetricsTest.java line 40 at r1 (raw file):

Previously, gbrodman wrote…

nit: link the class being tested in a class-level javadoc comment like we do for other test classes

done

@njshah301 njshah301 requested a review from gbrodman January 22, 2026 15:03
Copy link
Collaborator

@gbrodman gbrodman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gbrodman reviewed 6 files and all commit messages, made 12 comments, and resolved 15 discussions.
Reviewable status: 6 of 8 files reviewed, 12 unresolved discussions (waiting on @CydeWeys, @njshah301, @ptkach, and @weiminyu).


core/src/main/java/google/registry/mosapi/MosApiMetrics.java line 156 at r1 (raw file):

Previously, njshah301 (Nilay Shah) wrote…

yes

the point is you can just hardcode it if it's always the same


core/src/main/java/google/registry/mosapi/MosApiMetrics.java line 190 at r1 (raw file):

Previously, njshah301 (Nilay Shah) wrote…

done

why are you catching a generic exception here? if the whole push failed it'll get logged, and this could swallow it and make the caller think that it succeeded.


core/src/main/java/google/registry/mosapi/MosApiMetrics.java line 120 at r2 (raw file):

  // Defines the custom metrics in Cloud Monitoring
  private void ensureMetricDescriptors() {

"ensure" is not the right word. We are creating them. "Ensure" does not sufficiently describe it by itself.


core/src/main/java/google/registry/mosapi/MosApiMetrics.java line 167 at r2 (raw file):

      String valueType,
      ImmutableList<String> labelKeys) {
    try {

one should keep the scope of try/catch blocks as small as reasonably possible, only where the expected exception can possibly be thrown


core/src/main/java/google/registry/mosapi/MosApiMetrics.java line 192 at r2 (raw file):

      monitoringClient.projects().metricDescriptors().create(projectName, descriptor).execute();
    } catch (GoogleJsonResponseException e) {
      if (e.getStatusCode() == METRICS_ALREADY_EXIST) {

is there a way to check if the metric already exists via a "normal" call, rather than via exception?


core/src/main/java/google/registry/mosapi/MosApiMetrics.java line 200 at r2 (raw file):

            metricTypeSuffix, e.getStatusCode());
      }
    } catch (Exception e) {

If we failed to create the metrics with an unexpected exception (i.e., not just "the metric already exists"), I think it makes sense to fail catastrophically.


core/src/main/java/google/registry/mosapi/MosApiMetrics.java line 207 at r2 (raw file):

  /** Accepts a list of states and processes them in a single async batch task. */
  public void recordStates(ImmutableList<TldServiceState> states) {

this is the main entry point of whatever calls this class, so it should be at the top (below the constructor)


core/src/main/java/google/registry/mosapi/MosApiMetrics.java line 228 at r2 (raw file):

        states.stream()
            .flatMap(state -> createMetricsForState(state, interval))
            .collect(toImmutableList());

i suppose technically we don't even need to collect these into a list -- Iterators.partition(someStream.iterator(), BATCH_SIZE).forEach(....)


core/src/main/java/google/registry/mosapi/MosApiMetrics.java line 241 at r2 (raw file):

            "Successfully pushed batch of %d time series to Cloud Monitoring.", chunk.size());
      } catch (IOException e) {
        logger.atWarning().withCause(e).log(

we should keep track of how many batches succeeded and how many batches failed, and log each of those


core/src/main/java/google/registry/mosapi/MosApiMetrics.java line 152 at r1 (raw file):

  private void createMetricDescriptor(
      String projectName,

this also doesn't need to be a param because it's just a constant based off project ID


core/src/main/java/google/registry/config/RegistryConfigSettings.java line 275 at r2 (raw file):

    public List<String> tlds;
    public List<String> services;
    public int tldThreadCnt;

still using the abbreviated form here


core/src/main/java/google/registry/mosapi/module/MosApiModule.java line 210 at r2 (raw file):

   * Provides executor for metrics reporting.
   *
   * <p>We should use a single thread to ensure metric batches are sent sequentially. This acts as a

If this is actually true, that "we should use a single thread", why is this even a configuration point?

- Kept projectName as part constant instead of inside method signature
- Added Summary logs for metrics execution
- Metric Executor defaults to Single Threaded
@njshah301 njshah301 requested a review from gbrodman January 23, 2026 12:37
Copy link
Collaborator Author

@njshah301 njshah301 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have resolved all comments and also tested current version in lower environment, which works as expected with clear summary for metrics push and metric descriptor

@njshah301 partially reviewed 8 files and made 13 comments.
Reviewable status: 2 of 8 files reviewed, 12 unresolved discussions (waiting on @CydeWeys, @gbrodman, @ptkach, and @weiminyu).


core/src/main/java/google/registry/config/RegistryConfigSettings.java line 275 at r2 (raw file):

Previously, gbrodman wrote…

still using the abbreviated form here

done


core/src/main/java/google/registry/mosapi/MosApiMetrics.java line 152 at r1 (raw file):

Previously, gbrodman wrote…

this also doesn't need to be a param because it's just a constant based off project ID

done


core/src/main/java/google/registry/mosapi/MosApiMetrics.java line 156 at r1 (raw file):

Previously, gbrodman wrote…

the point is you can just hardcode it if it's always the same

it's true that all current metrics are gauges, I've kept this as a parameter to ensure the helper method remains a general-purpose utility for future MoSAPI metrics. This allows us to easily add other MetricKind metrics in the future without having to modify the method signature later.


core/src/main/java/google/registry/mosapi/MosApiMetrics.java line 190 at r1 (raw file):

Previously, gbrodman wrote…

why are you catching a generic exception here? if the whole push failed it'll get logged, and this could swallow it and make the caller think that it succeeded.

Since this is an asynchronous task submitted to an executor, the caller returns immediately and never waits for the result. Therefore, the caller is already decoupled and does not assume success or failure.

We rely on Cloud Monitoring alerts to detect if metrics stop flowing for extended periods. This try/catch block acts as a safety boundary to ensure that unexpected crashes are captured and logged. I have upgraded the log level to SEVERE to ensure these errors are clearly visible for debugging


core/src/main/java/google/registry/mosapi/MosApiMetrics.java line 120 at r2 (raw file):

Previously, gbrodman wrote…

"ensure" is not the right word. We are creating them. "Ensure" does not sufficiently describe it by itself.

named "createCustomMetricDescriptors"


core/src/main/java/google/registry/mosapi/MosApiMetrics.java line 167 at r2 (raw file):

Previously, gbrodman wrote…

one should keep the scope of try/catch blocks as small as reasonably possible, only where the expected exception can possibly be thrown

done


core/src/main/java/google/registry/mosapi/MosApiMetrics.java line 192 at r2 (raw file):

Previously, gbrodman wrote…

is there a way to check if the metric already exists via a "normal" call, rather than via exception?

Actually, the Google API throws an exception if we try to get() a metric that doesn't exist, so we would still require try-catch logic either way. Since this initialization runs only once per instance at startup, I recommend keeping it as it is to avoid unnecessary API calls


core/src/main/java/google/registry/mosapi/MosApiMetrics.java line 200 at r2 (raw file):

Previously, gbrodman wrote…

If we failed to create the metrics with an unexpected exception (i.e., not just "the metric already exists"), I think it makes sense to fail catastrophically.

Actually, I prefer the 'Soft Failure' approach here. Cloud Monitoring implicitly auto-creates metric definitions if they are missing during the first createTimeSeries call (just without the custom display name/description).

If explicit descriptor creation fails, I'd rather log the warning and proceed to attempt writing the data rather than throwing an exception and dropping the entire metrics batch


core/src/main/java/google/registry/mosapi/MosApiMetrics.java line 207 at r2 (raw file):

Previously, gbrodman wrote…

this is the main entry point of whatever calls this class, so it should be at the top (below the constructor)

done


core/src/main/java/google/registry/mosapi/MosApiMetrics.java line 228 at r2 (raw file):

Previously, gbrodman wrote…

i suppose technically we don't even need to collect these into a list -- Iterators.partition(someStream.iterator(), BATCH_SIZE).forEach(....)

Done. That's a great suggestion. I switched to Iterators.partition(stream.iterator(), ...) to process the metrics lazily


core/src/main/java/google/registry/mosapi/MosApiMetrics.java line 241 at r2 (raw file):

Previously, gbrodman wrote…

we should keep track of how many batches succeeded and how many batches failed, and log each of those

Done. I updated the logic to explicitly track the number of successful and failed batches. A summary log is now written at the end of the process


core/src/main/java/google/registry/mosapi/module/MosApiModule.java line 210 at r2 (raw file):

Previously, gbrodman wrote…

If this is actually true, that "we should use a single thread", why is this even a configuration point?

I agree, I have changed it to keep the executor as default single thread

Copy link
Collaborator

@gbrodman gbrodman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gbrodman reviewed 7 files and all commit messages, made 7 comments, and resolved 10 discussions.
Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @CydeWeys, @njshah301, @ptkach, and @weiminyu).


core/src/main/java/google/registry/mosapi/MosApiMetrics.java line 156 at r1 (raw file):

Previously, njshah301 (Nilay Shah) wrote…

it's true that all current metrics are gauges, I've kept this as a parameter to ensure the helper method remains a general-purpose utility for future MoSAPI metrics. This allows us to easily add other MetricKind metrics in the future without having to modify the method signature later.

don't pre-emptively parameterize things that don't need it, e.g. go/tott/703


core/src/main/java/google/registry/mosapi/MosApiMetrics.java line 200 at r2 (raw file):

Previously, njshah301 (Nilay Shah) wrote…

Actually, I prefer the 'Soft Failure' approach here. Cloud Monitoring implicitly auto-creates metric definitions if they are missing during the first createTimeSeries call (just without the custom display name/description).

If explicit descriptor creation fails, I'd rather log the warning and proceed to attempt writing the data rather than throwing an exception and dropping the entire metrics batch

what happens if the metrics get auto-created (without the display name and description) and later we try to create them with the display name and description?


core/src/test/java/google/registry/mosapi/MosApiMetricsTest.java line 123 at r3 (raw file):

  @Test
  void testRecordStates_mapsStatusesToCorrectValues() throws IOException {

nit: we don't need newlines at the beginning of tests


core/src/test/java/google/registry/mosapi/MosApiMetricsTest.java line 205 at r3 (raw file):

              return (Number) ts.getPoints().get(0).getValue().getInt64Value();
            })
        .orElseThrow(

ehhh if we're never expecting this to actually be hit, then we can just return the value (.get()) rather than worrying about an explicit assertion error

or just return an Optional<Number> and use assertThat(someOptional).hasValue(2)


core/src/main/java/google/registry/mosapi/MosApiMetrics.java line 109 at r3 (raw file):

      @Config("projectId") String projectId,
      Clock clock,
      @Named("mosapiMetricsExecutor") ExecutorService executor) {

Revisiting this, why are we using an executor at all if we're guaranteed to be using a single thread executor?

executors complicate things because we generally want to (and, currently in this PR, do not) track the results


core/src/main/java/google/registry/mosapi/MosApiMetrics.java line 126 at r3 (raw file):

  public void recordStates(ImmutableList<TldServiceState> states) {
    // If this is the first time we are recording, ensure descriptors exist.
    if (isDescriptorInitialized.compareAndSet(false, true)) {

theoretically this could be called multiple times with different instances and we might end up in a weird state with a race condition.

We have a way of doing cross-instance locking, see LockHandler and related classes. It's probably best to lock the "create the metrics if necessary" code


core/src/main/java/google/registry/mosapi/MosApiMetrics.java line 134 at r3 (raw file):

            pushBatchMetrics(states);
          } catch (Exception e) {
            logger.atSevere().withCause(e).log("Async batch metric push failed.");

Why are we swallowing this exception entirely? I feel like by default we'd want the caller to know that it failed.

Copy link
Collaborator Author

@njshah301 njshah301 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have refactored junit test. Thanks!

@njshah301 made 6 comments and resolved 1 discussion.
Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @CydeWeys, @gbrodman, @ptkach, and @weiminyu).


core/src/main/java/google/registry/mosapi/MosApiMetrics.java line 109 at r3 (raw file):

Previously, gbrodman wrote…

Revisiting this, why are we using an executor at all if we're guaranteed to be using a single thread executor?

executors complicate things because we generally want to (and, currently in this PR, do not) track the results

The primary reason for the Executor is to ensure the metrics upload is non-blocking

Even though we enforce sequential execution (single thread) to respect API rate limits, the createTimeSeries call involves blocking network I/O. If we ran this on the caller's thread, we would pause the main MosApiStateService workflow while waiting for the Cloud Monitoring API to respond. I recommend keeping the current implementation for this reason


core/src/main/java/google/registry/mosapi/MosApiMetrics.java line 126 at r3 (raw file):

Previously, gbrodman wrote…

theoretically this could be called multiple times with different instances and we might end up in a weird state with a race condition.

We have a way of doing cross-instance locking, see LockHandler and related classes. It's probably best to lock the "create the metrics if necessary" code

That is a valid concern for general distributed tasks, but in this specific case, we rely on our Cloud Scheduler configuration to enforce a singleton execution.

We deliberately limit this to a single execution path to respect MoSAPI's strict concurrency constraints. Since the architecture guarantees that only one instance handles this flow at a time, we don't require cross-instance locking for this design


core/src/main/java/google/registry/mosapi/MosApiMetrics.java line 134 at r3 (raw file):

Previously, gbrodman wrote…

Why are we swallowing this exception entirely? I feel like by default we'd want the caller to know that it failed.

Since this logic runs asynchronously inside executor.execute(), the original caller has already returned by the time this exception occurs. If we re-throw the exception here, it would simply terminate the background thread. By catching and logging it as SEVERE, we ensure the failure is clearly visible in Cloud Logging.Also we will know if we don't get metrics in stipulated time from cloud monitoring so that case will be covered. I recommend keeping the current implementation for this reason


core/src/test/java/google/registry/mosapi/MosApiMetricsTest.java line 123 at r3 (raw file):

Previously, gbrodman wrote…

nit: we don't need newlines at the beginning of tests

done


core/src/test/java/google/registry/mosapi/MosApiMetricsTest.java line 205 at r3 (raw file):

Previously, gbrodman wrote…

ehhh if we're never expecting this to actually be hit, then we can just return the value (.get()) rather than worrying about an explicit assertion error

or just return an Optional<Number> and use assertThat(someOptional).hasValue(2)

done

@njshah301 njshah301 requested a review from gbrodman January 26, 2026 21:15
Copy link
Collaborator Author

@njshah301 njshah301 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@njshah301 made 1 comment.
Reviewable status: 7 of 8 files reviewed, 6 unresolved discussions (waiting on @CydeWeys, @gbrodman, @ptkach, and @weiminyu).


core/src/main/java/google/registry/mosapi/MosApiMetrics.java line 200 at r2 (raw file):

Previously, gbrodman wrote…

what happens if the metrics get auto-created (without the display name and description) and later we try to create them with the display name and description?

If the metric is auto-created first, the code will simply update the existing definition. The createMetricDescriptor call treats it as a metadata update, so the Display Name and Description will be successfully added without any errors

Copy link
Collaborator Author

@njshah301 njshah301 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolved comments

@njshah301 made 2 comments.
Reviewable status: 6 of 8 files reviewed, 6 unresolved discussions (waiting on @CydeWeys, @gbrodman, @ptkach, and @weiminyu).


core/src/main/java/google/registry/mosapi/MosApiMetrics.java line 156 at r1 (raw file):

Previously, gbrodman wrote…

don't pre-emptively parameterize things that don't need it, e.g. go/tott/703

ok, changed to GAUGE and we will parameterize if we have any need in future

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants