Skip to content

Conversation

@reez
Copy link

@reez reez commented Oct 1, 2025

Description

This PR addresses a panic path in stop gap scan loop. Removes expect used to compute gap boundary and instead tracks consecutive_unused with gap_limit = max(stop_gap, parallel_requests) to decide when to stop scanning.

Spurred by trying to address bitcoindevkit/bdk_wallet#30 for Esplora

Notes to the reviewers

Open to any and all feedback on this.

Behavior is unchanged for typical cases, it just avoids a panic in the control flow.

Changelog notice

### Fix:
- Avoid a panic in the Esplora stop‑gap scan loop by tracking consecutive unused scripts to compute the gap boundary.

Checklists

All Submissions:

New Features:

  • I've added tests for the new feature
  • I've added docs for the new feature

Bugfixes:

  • This pull request breaks the existing API
  • I've added tests to reproduce the issue which are now passing
  • I'm linking the issue being fixed by this PR

@oleonardolima oleonardolima moved this to In Progress in BDK Chain Oct 1, 2025
@reez reez marked this pull request as ready for review October 6, 2025 16:48
@reez reez changed the title (draft) refactor(esplora): clear remaining panic paths refactor(esplora): clear remaining panic paths Oct 9, 2025
Copy link
Member

@evanlinjin evanlinjin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this!

Some great simplifications in logic. Some parts are over-engineered.

Note that panics and expects indicate that "we should never hit this - unless there is a bug!" and I think it's justified to have them (where reasonable).

@reez
Copy link
Author

reez commented Oct 30, 2025

Updated PR due to great feedback from Evan, Mammal, Leonardo.

Noteworthy:

  • Introduced a crate-wide Error enum (in lib.rs) with Client and Checkpoint variants. Both async and blocking extensions now return this so real Esplora failures bubble up via Client while checkpoint ordering issues surface as Checkpoint instead of panicking or masquerading as InvalidResponse
  • Removed the custom ThreadPanic and reverted as suggested so we only guard against genuine remote errors
  • Dropped the processed_any guard added previously
  • Eliminated tests that provided no major value

Copy link
Member

@evanlinjin evanlinjin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid returning errors for internal logic bugs in BDK itself.

@reez reez force-pushed the esplora branch 3 times, most recently from a5bf97f to d4244e3 Compare November 13, 2025 22:48
@reez
Copy link
Author

reez commented Nov 18, 2025

I believe all review comments are addressed now; to recap current state this PR adds logic around removing panic tied to remote input, while leaving panics where appropriate based on PR review feedback. Thanks again for taking time to review this as I iterated on it (happy to continue tweaking if needed)!

Copy link
Collaborator

@ValuedMammal ValuedMammal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK d4244e3

Note: I think it would make sense to use gap_limit = stop_gap.max(parallel_requests). Related discussion #2053 (comment) and bitcoindevkit/bdk_wallet#63.

@ValuedMammal ValuedMammal moved this from In Progress to Needs Review in BDK Chain Dec 2, 2025
@ValuedMammal ValuedMammal added this to the Chain 0.24.0 milestone Dec 2, 2025
@ValuedMammal ValuedMammal added the bug Something isn't working label Dec 2, 2025
@reez
Copy link
Author

reez commented Dec 2, 2025

ACK d4244e3

Note: I think it would make sense to use gap_limit = stop_gap.max(parallel_requests). Related discussion #2053 (comment) and bitcoindevkit/bdk_wallet#63.

@ValuedMammal I just updated PR to use gap_limit = stop_gap.max(parallel_requests) now 👍

@ValuedMammal
Copy link
Collaborator

ValuedMammal commented Dec 2, 2025

Need to debug a CI fail. Maybe try rebasing this PR onto master.

Edit: As a workaround, pinning the rust toolchain to nightly-2025-11-26 seems to have fixed it https://github.com/ValuedMammal/bdk/actions/runs/19878874973/job/56972371610, although I haven't pinpointed the exact issue.

ACK 074aed1

@thunderbiscuit
Copy link
Member

This is a bit of a blocker for the 2.3 release of bdk-ffi, I'd like to include it in there if possible. @reez do you mind trying the pin workaround proposed by @ValuedMammal?

@reez
Copy link
Author

reez commented Dec 10, 2025

This is a bit of a blocker for the 2.3 release of bdk-ffi, I'd like to include it in there if possible. @reez do you mind trying the pin workaround proposed by @ValuedMammal?

@thunderbiscuit sure let me give my context

@ValuedMammal has ACK’d this PR a few times, so after his last comment #2053 (comment) and because I didn’t want him to have to keep review+ACK’ing if no change actually needed on this PR I pinged him to ask if he wanted me to try either/both of those in this PR or keep the PR as is. My understanding is keep the PR as is (and maybe there would be a new issue opened scoped to resolve the coverage job), so I explicitly did not do the pin workaround for those reasons.

Additionally @oleonardolima has reviewed this previously and also let me know he’d be giving this PR a final review too, I know he’s been traveling a bunch though so there’s no rush from me on that.

Then from the bdk-ffi side of things my view point is this would be nice-to-have but is not a blocker for 2.3 release.

@ValuedMammal
Copy link
Collaborator

and maybe there would be a new issue opened scoped to resolve the coverage job

The CI fix is now in a separate PR #2080.

@reez
Copy link
Author

reez commented Dec 10, 2025

#2080

Thanks for the heads up!

I'll try to keep an eye on that PR so when it gets merged I can rebase this PR on it if needed for this PR to be merged.

@reez
Copy link
Author

reez commented Dec 12, 2025

#2080

Thanks for the heads up!

I'll try to keep an eye on that PR so when it gets merged I can rebase this PR on it if needed for this PR to be merged.

Rebased PR on master now that #2080 was merged, CI passing ✅

Copy link
Collaborator

@ValuedMammal ValuedMammal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 92dc6dd

Copy link
Contributor

@oleonardolima oleonardolima left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK 92dc6dd

It looks good for the removal of the single last_index use of .expect().

As a side-note in a follow-up I'd like to take a look on how the remaining ones could be reached (except the join threads one, we discussed that before).

I left a single nit that is not a blocker as well, can be addressed in a follow-up PR.

I'd just suggest updating the PR title and description before merging it.

@ValuedMammal
Copy link
Collaborator

Totally agree on updating the PR, i.e. what panic path remained, and how does this commit fix that?

@reez reez changed the title refactor(esplora): clear remaining panic paths Remove panic from stop gap Jan 12, 2026
@reez reez changed the title Remove panic from stop gap Remove panic from stop gap loop Jan 12, 2026
@reez reez changed the title Remove panic from stop gap loop Remove panic from stop gap scan loop Jan 12, 2026
@reez
Copy link
Author

reez commented Jan 12, 2026

@oleonardolima @ValuedMammal I removed ClientError alias, updated the PR title+description, and amended the commit message to hopefully reflect current state, but let me know if anything is off there. Thanks!

Copy link
Contributor

@oleonardolima oleonardolima left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK 741a30c

@ValuedMammal
Copy link
Collaborator

ACK 741a30c

Copy link
Member

@evanlinjin evanlinjin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 741a30c

let mut last_index = Option::<u32>::None;
let mut last_active_index = Option::<u32>::None;
let mut consecutive_unused = 0usize;
let gap_limit = stop_gap.max(parallel_requests);
Copy link
Member

@evanlinjin evanlinjin Jan 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Add comments explaining rationale. Same for the blocking version.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated with brief rationale comments in both async and blocking. Happy to tweak wording if anyone wants anything different.

https://github.com/bitcoindevkit/bdk/compare/741a30c409bfe2f148b0d11b9905c7fc5c0ef948..81a2d280c975d98c093ccf006ae9d5674d43147f

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@reez that's not really a rationale though. The reader would like to know why we are doing this, not what the code is doing.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@evanlinjin is this more in line with what would be better

// Use consecutive_unused so stop gap is based on consecutive empty spks instead of the last scanned index.
let mut consecutive_unused = 0usize;
// Use stop_gap.max(parallel_requests) since we check at least 1 spk-index for every request in parallel_requests.
let gap_limit = stop_gap.max(parallel_requests);

If so I'll update the pr with these comments instead.

last_index + 1 >= stop_gap as u32
};
if gap_limit_reached {
if consecutive_unused >= gap_limit {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@reez this is the only place that gap_limit is used. Can you explain what is functionally different when the gap_limit is minimally bound to parallel_requests vs when it's not?

I may be missing something, but I'm failing to see a clear rationale for the bound.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a look at this original issue: bitcoindevkit/bdk_wallet#63

I think bounding to 1 makes more sense - since a stop_gap of 0 is undefined and the behaviour we want is to "treat 0 as 1".

Bounding it to parallel requests does the same thing, but the intention is obscured imo. It seems like the intention is to have consecutive_used be a multiple of gap_limit if the gap_limit is originally set lower than parallel_requests? However we are using >=, and this breaks down when gap_limit > parallel_requests anyway.

If the rationale is not obvious, the reader might assume they are missing something and spend time trying to understand it and then find out there is nothing to be understood.

Copy link
Member

@evanlinjin evanlinjin Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider parallel_requests=10, stop_gap=5, first batch processes indices 0-9 where indices 0-4 have transactions and 5-9 are empty:

Original code:

  • last_active_index=4, last_index=9
  • Check: 9 >= 4 + 5? → true, break

If we do .max(1):

  • consecutive_unused=5, gap_limit=5
  • 5 >= 5? → true, break ✓ (matches original)

If we keep this PR as is with .max(parallel_requests):

  • consecutive_unused=5, gap_limit=10
  • 5 >= 10? → false, continue ✗ (different from original)

So .max(parallel_requests) changes the effective stop_gap when the user provides a value less than parallel_requests, which is a behavioral change from the original code. .max(1) preserves the original semantics while handling the stop_gap=0 edge case.

Suggestion

Change to .max(1).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

Status: Needs Review

Development

Successfully merging this pull request may close these issues.

5 participants