Remove panic from stop gap scan loop #2053

reez · 2025-10-01T13:23:27Z

Description

This PR addresses a panic path in stop gap scan loop. Removes expect used to compute gap boundary and instead tracks consecutive_unused with gap_limit = max(stop_gap, parallel_requests) to decide when to stop scanning.

Spurred by trying to address bitcoindevkit/bdk_wallet#30 for Esplora

Notes to the reviewers

Open to any and all feedback on this.

Behavior is unchanged for typical cases, it just avoids a panic in the control flow.

Changelog notice

### Fix:
- Avoid a panic in the Esplora stop‑gap scan loop by tracking consecutive unused scripts to compute the gap boundary.

Checklists

All Submissions:

I followed the contribution guidelines

New Features:

I've added tests for the new feature
I've added docs for the new feature

Bugfixes:

This pull request breaks the existing API
I've added tests to reproduce the issue which are now passing
I'm linking the issue being fixed by this PR

crates/esplora/src/async_ext.rs

crates/esplora/src/blocking_ext.rs

evanlinjin

Thanks for working on this!

Some great simplifications in logic. Some parts are over-engineered.

Note that panics and expects indicate that "we should never hit this - unless there is a bug!" and I think it's justified to have them (where reasonable).

crates/esplora/src/blocking_ext.rs

crates/esplora/src/async_ext.rs

crates/esplora/src/blocking_ext.rs

reez · 2025-10-30T19:38:35Z

Updated PR due to great feedback from Evan, Mammal, Leonardo.

Noteworthy:

Introduced a crate-wide Error enum (in lib.rs) with Client and Checkpoint variants. Both async and blocking extensions now return this so real Esplora failures bubble up via Client while checkpoint ordering issues surface as Checkpoint instead of panicking or masquerading as InvalidResponse
Removed the custom ThreadPanic and reverted as suggested so we only guard against genuine remote errors
Dropped the processed_any guard added previously
Eliminated tests that provided no major value

evanlinjin

Avoid returning errors for internal logic bugs in BDK itself.

crates/esplora/src/lib.rs

crates/esplora/src/blocking_ext.rs

reez · 2025-11-18T22:58:51Z

I believe all review comments are addressed now; to recap current state this PR adds logic around removing panic tied to remote input, while leaving panics where appropriate based on PR review feedback. Thanks again for taking time to review this as I iterated on it (happy to continue tweaking if needed)!

ValuedMammal

ACK d4244e3

Note: I think it would make sense to use gap_limit = stop_gap.max(parallel_requests). Related discussion #2053 (comment) and bitcoindevkit/bdk_wallet#63.

reez · 2025-12-02T16:45:33Z

ACK d4244e3

Note: I think it would make sense to use gap_limit = stop_gap.max(parallel_requests). Related discussion #2053 (comment) and bitcoindevkit/bdk_wallet#63.

@ValuedMammal I just updated PR to use gap_limit = stop_gap.max(parallel_requests) now 👍

ValuedMammal · 2025-12-02T16:58:44Z

Need to debug a CI fail. Maybe try rebasing this PR onto master.

Edit: As a workaround, pinning the rust toolchain to nightly-2025-11-26 seems to have fixed it https://github.com/ValuedMammal/bdk/actions/runs/19878874973/job/56972371610, although I haven't pinpointed the exact issue.

ACK 074aed1

thunderbiscuit · 2025-12-10T15:58:47Z

This is a bit of a blocker for the 2.3 release of bdk-ffi, I'd like to include it in there if possible. @reez do you mind trying the pin workaround proposed by @ValuedMammal?

reez · 2025-12-10T16:29:26Z

This is a bit of a blocker for the 2.3 release of bdk-ffi, I'd like to include it in there if possible. @reez do you mind trying the pin workaround proposed by @ValuedMammal?

@thunderbiscuit sure let me give my context

@ValuedMammal has ACK’d this PR a few times, so after his last comment #2053 (comment) and because I didn’t want him to have to keep review+ACK’ing if no change actually needed on this PR I pinged him to ask if he wanted me to try either/both of those in this PR or keep the PR as is. My understanding is keep the PR as is (and maybe there would be a new issue opened scoped to resolve the coverage job), so I explicitly did not do the pin workaround for those reasons.

Additionally @oleonardolima has reviewed this previously and also let me know he’d be giving this PR a final review too, I know he’s been traveling a bunch though so there’s no rush from me on that.

Then from the bdk-ffi side of things my view point is this would be nice-to-have but is not a blocker for 2.3 release.

ValuedMammal · 2025-12-10T17:14:11Z

and maybe there would be a new issue opened scoped to resolve the coverage job

The CI fix is now in a separate PR #2080.

reez · 2025-12-10T17:18:58Z

#2080

Thanks for the heads up!

I'll try to keep an eye on that PR so when it gets merged I can rebase this PR on it if needed for this PR to be merged.

reez · 2025-12-12T14:29:41Z

#2080

Thanks for the heads up!

I'll try to keep an eye on that PR so when it gets merged I can rebase this PR on it if needed for this PR to be merged.

Rebased PR on master now that #2080 was merged, CI passing ✅

ValuedMammal

ACK 92dc6dd

oleonardolima

utACK 92dc6dd

It looks good for the removal of the single last_index use of .expect().

As a side-note in a follow-up I'd like to take a look on how the remaining ones could be reached (except the join threads one, we discussed that before).

I left a single nit that is not a blocker as well, can be addressed in a follow-up PR.

I'd just suggest updating the PR title and description before merging it.

crates/esplora/src/async_ext.rs

ValuedMammal · 2026-01-09T22:13:36Z

Totally agree on updating the PR, i.e. what panic path remained, and how does this commit fix that?

reez · 2026-01-12T19:37:57Z

@oleonardolima @ValuedMammal I removed ClientError alias, updated the PR title+description, and amended the commit message to hopefully reflect current state, but let me know if anything is off there. Thanks!

oleonardolima

utACK 741a30c

ValuedMammal · 2026-01-14T02:30:03Z

ACK 741a30c

evanlinjin

ACK 741a30c

evanlinjin · 2026-01-14T11:53:25Z

crates/esplora/src/async_ext.rs

-    let mut last_index = Option::<u32>::None;
    let mut last_active_index = Option::<u32>::None;
+    let mut consecutive_unused = 0usize;
+    let gap_limit = stop_gap.max(parallel_requests);


Nit: Add comments explaining rationale. Same for the blocking version.

Updated with brief rationale comments in both async and blocking. Happy to tweak wording if anyone wants anything different.

https://github.com/bitcoindevkit/bdk/compare/741a30c409bfe2f148b0d11b9905c7fc5c0ef948..81a2d280c975d98c093ccf006ae9d5674d43147f

@reez that's not really a rationale though. The reader would like to know why we are doing this, not what the code is doing.

@evanlinjin is this more in line with what would be better

// Use consecutive_unused so stop gap is based on consecutive empty spks instead of the last scanned index. let mut consecutive_unused = 0usize; // Use stop_gap.max(parallel_requests) since we check at least 1 spk-index for every request in parallel_requests. let gap_limit = stop_gap.max(parallel_requests);

If so I'll update the pr with these comments instead.

evanlinjin · 2026-01-16T00:14:24Z

crates/esplora/src/async_ext.rs

-            last_index + 1 >= stop_gap as u32
-        };
-        if gap_limit_reached {
+        if consecutive_unused >= gap_limit {


@reez this is the only place that gap_limit is used. Can you explain what is functionally different when the gap_limit is minimally bound to parallel_requests vs when it's not?

I may be missing something, but I'm failing to see a clear rationale for the bound.

I took a look at this original issue: bitcoindevkit/bdk_wallet#63

I think bounding to 1 makes more sense - since a stop_gap of 0 is undefined and the behaviour we want is to "treat 0 as 1".

Bounding it to parallel requests does the same thing, but the intention is obscured imo. It seems like the intention is to have consecutive_used be a multiple of gap_limit if the gap_limit is originally set lower than parallel_requests? However we are using >=, and this breaks down when gap_limit > parallel_requests anyway.

If the rationale is not obvious, the reader might assume they are missing something and spend time trying to understand it and then find out there is nothing to be understood.

Consider parallel_requests=10, stop_gap=5, first batch processes indices 0-9 where indices 0-4 have transactions and 5-9 are empty:

Original code:

last_active_index=4, last_index=9

Check: 9 >= 4 + 5? → true, break

If we do .max(1):

consecutive_unused=5, gap_limit=5

5 >= 5? → true, break ✓ (matches original)

If we keep this PR as is with .max(parallel_requests):

consecutive_unused=5, gap_limit=10

5 >= 10? → false, continue ✗ (different from original)

So .max(parallel_requests) changes the effective stop_gap when the user provides a value less than parallel_requests, which is a behavioral change from the original code. .max(1) preserves the original semantics while handling the stop_gap=0 edge case.

Suggestion

Change to .max(1).

github-project-automation bot added this to BDK Chain Oct 1, 2025

reez force-pushed the esplora branch from 1d05ee7 to 481f92d Compare October 1, 2025 13:45

ValuedMammal reviewed Oct 1, 2025

View reviewed changes

crates/esplora/src/async_ext.rs Outdated Show resolved Hide resolved

oleonardolima moved this to In Progress in BDK Chain Oct 1, 2025

oleonardolima assigned reez Oct 1, 2025

reez mentioned this pull request Oct 2, 2025

bdk_wallet 2.3/2.4/3.0 bitcoindevkit/bdk-ffi#869

Open

reez marked this pull request as ready for review October 6, 2025 16:48

ValuedMammal reviewed Oct 6, 2025

View reviewed changes