Skip to content

[grid] Apply TCP backpressure across the WebSocket tunnel handler#17543

Open
shs96c wants to merge 1 commit into
SeleniumHQ:trunkfrom
shs96c:grid-tunnel-backpressure
Open

[grid] Apply TCP backpressure across the WebSocket tunnel handler#17543
shs96c wants to merge 1 commit into
SeleniumHQ:trunkfrom
shs96c:grid-tunnel-backpressure

Conversation

@shs96c
Copy link
Copy Markdown
Member

@shs96c shs96c commented May 22, 2026

Summary

The optional transparent TCP tunnel introduced in #17146 forwarded inbound bytes from one channel to the other with no flow control. A slow drain on one leg let the opposite leg keep shipping bytes, which Netty queued in the outbound buffer indefinitely. A session with sustained bidirectional traffic and a slow client could grow the Router's heap until the per-channel outbound queue was many megabytes.

Hook channelWritabilityChanged on each tunnel handler and mirror the state onto the peer's read side via setAutoRead. When this channel's outbound buffer crosses the high-water mark Netty fires the event and the peer stops reading; when the buffer drains below the low-water mark the peer resumes. The two legs of the tunnel each watch their own writability and toggle the other's read side, so backpressure propagates in both directions.

The thresholds come from the channel's existing WriteBufferWaterMark (Netty's default of 32 KiB low / 64 KiB high, or whatever an operator configures). No new numeric cap is introduced.

🤖 Generated with Claude Code

@selenium-ci selenium-ci added C-java Java Bindings B-build Includes scripting, bazel and CI integrations labels May 22, 2026
@shs96c shs96c marked this pull request as ready for review May 22, 2026 14:34
@qodo-code-review
Copy link
Copy Markdown
Contributor

Review Summary by Qodo

Apply TCP backpressure across WebSocket tunnel handler

🐞 Bug fix 🧪 Tests

Grey Divider

Walkthroughs

Description
• Implement TCP backpressure propagation across WebSocket tunnel handler
• Mirror channel writability state to peer's read side via setAutoRead
• Prevent unbounded outbound buffer growth during bidirectional traffic
• Add unit test validating writability transitions and autoRead synchronization
Diagram
flowchart LR
  A["Source Channel<br/>Outbound Buffer"] -->|"exceeds high-water mark"| B["channelWritabilityChanged<br/>Event Fired"]
  B -->|"setAutoRead false"| C["Target Channel<br/>Read Paused"]
  C -->|"peer stops shipping bytes"| D["Backpressure Applied"]
  D -->|"buffer drains below<br/>low-water mark"| E["setAutoRead true"]
  E -->|"peer resumes reading"| F["Flow Resumed"]

Loading

File Changes

1. java/src/org/openqa/selenium/netty/server/TcpTunnelHandler.java 🐞 Bug fix +11/-0

Implement backpressure mirroring via writability events

• Added documentation explaining backpressure mirroring mechanism
• Implemented channelWritabilityChanged override to synchronize peer's autoRead state
• When outbound buffer crosses high-water mark, peer's read is paused
• When buffer drains below low-water mark, peer's read is resumed

java/src/org/openqa/selenium/netty/server/TcpTunnelHandler.java


2. java/test/org/openqa/selenium/netty/server/TcpTunnelHandlerTest.java 🧪 Tests +58/-0

Add unit test for tunnel backpressure synchronization

• New test class validating backpressure behavior in tunnel handler
• Tests writability state transitions on source channel
• Verifies target's autoRead flag mirrors source's writability state
• Uses EmbeddedChannel with custom WriteBufferWaterMark for controlled testing

java/test/org/openqa/selenium/netty/server/TcpTunnelHandlerTest.java


3. java/test/org/openqa/selenium/netty/server/BUILD.bazel ⚙️ Configuration changes +1/-0

Register new tunnel handler test in build configuration

• Added TcpTunnelHandlerTest.java to test sources list

java/test/org/openqa/selenium/netty/server/BUILD.bazel


Grey Divider

Qodo Logo

@qodo-code-review
Copy link
Copy Markdown
Contributor

qodo-code-review Bot commented May 22, 2026

Code Review by Qodo

🐞 Bugs (3) 📘 Rule violations (1)

Grey Divider


Remediation recommended

1. Backpressure trips idle close 🐞 Bug ☼ Reliability ⭐ New
Description
TcpTunnelHandler.channelWritabilityChanged pauses the peer channel by setting peer autoRead=false
when this channel becomes unwritable. Because TcpUpgradeTunnelHandler installs a reader-idle
IdleStateHandler that closes both channels on any IdleStateEvent, sustained backpressure can be
misinterpreted as read-idle and tear down an otherwise healthy tunnel.
Code

java/src/org/openqa/selenium/netty/server/TcpTunnelHandler.java[R65-75]

Evidence
Backpressure explicitly disables reads on the peer channel, while the tunnel’s read-idle logic
closes the connection when there are no reads for the idle window. Disabling reads for an extended
period therefore satisfies the read-idle condition and can close the tunnel even though the pause
was intentional backpressure.

java/src/org/openqa/selenium/netty/server/TcpTunnelHandler.java[65-75]
java/src/org/openqa/selenium/netty/server/TcpUpgradeTunnelHandler.java[324-345]
java/src/org/openqa/selenium/netty/server/TcpUpgradeTunnelHandler.java[390-401]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
The tunnel’s backpressure mechanism disables the peer channel’s `autoRead` during congestion. The tunnel also installs a **read-idle** timeout that closes both channels when no bytes are read for the idle window. When `autoRead` is disabled for backpressure, reads stop by design, which can trigger the read-idle timeout and close the tunnel during prolonged backpressure.

### Issue Context
- Backpressure change: `TcpTunnelHandler.channelWritabilityChanged` calls `target.config().setAutoRead(ctx.channel().isWritable())`.
- Tunnel wiring installs `IdleStateHandler(idleSeconds, 0, 0, ...)` and `IdleCloseHandler` closes on `IdleStateEvent`.

### Fix Focus Areas
- Update idle-close logic to *not* close the tunnel when the channel’s `autoRead` is currently disabled (i.e., the lack of reads is intentional backpressure).
- file: `java/src/org/openqa/selenium/netty/server/TcpUpgradeTunnelHandler.java` — in `IdleCloseHandler.userEventTriggered`, gate the close behavior (e.g., only close if `ctx.channel().config().isAutoRead()` is true).

#### Suggested patch outline
- In `IdleCloseHandler.userEventTriggered`:
 - If `evt instanceof IdleStateEvent` **and** `!ctx.channel().config().isAutoRead()`, return/ignore (or log at FINE and return).
 - Otherwise keep current close-both-channels behavior.

Fix Focus Areas (exact refs):
- java/src/org/openqa/selenium/netty/server/TcpTunnelHandler.java[65-75]
- java/src/org/openqa/selenium/netty/server/TcpUpgradeTunnelHandler.java[324-345]
- java/src/org/openqa/selenium/netty/server/TcpUpgradeTunnelHandler.java[390-401]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


2. setAutoRead cross-eventloop call 📘 Rule violation ☼ Reliability
Description
channelWritabilityChanged directly calls target.config().setAutoRead(...) from the source
channel's callback thread, but target may be owned by a different Netty event loop. This can cause
unsafe cross-thread state changes or unexpected read() behavior on the target channel under load.
Code

java/src/org/openqa/selenium/netty/server/TcpTunnelHandler.java[R65-70]

Evidence
PR Compliance ID 14 requires shared/concurrent state changes to be properly synchronized. The new
code mutates the peer channel's read configuration (target.config().setAutoRead(...)) from within
the current channel's writability callback without ensuring execution on target's event loop.

java/src/org/openqa/selenium/netty/server/TcpTunnelHandler.java[65-70]
Best Practice: Learned patterns

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`TcpTunnelHandler.channelWritabilityChanged` updates the peer channel's `autoRead` directly. If the two tunnel legs are on different Netty event loops, this becomes a cross-thread update and may not be safe.

## Issue Context
Netty generally expects channel operations that can trigger I/O (like enabling `autoRead`, which may cause a `read()`) to run on the channel's own event loop.

## Fix Focus Areas
- java/src/org/openqa/selenium/netty/server/TcpTunnelHandler.java[65-70]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Advisory comments

3. Test bypasses tunnel read 🐞 Bug ⚙ Maintainability ⭐ New
Description
TcpTunnelHandlerTest triggers writability changes via outbound source.write(...) instead of
sending inbound data through TcpTunnelHandler.channelRead, so it doesn’t validate backpressure
mirroring under the actual tunnel forwarding path.
Code

java/test/org/openqa/selenium/netty/server/TcpTunnelHandlerTest.java[R29-55]

Evidence
The handler forwards inbound messages in channelRead via target.writeAndFlush(msg), but the test
never calls writeInbound on the source channel; it only does an outbound source.write(...), so
the forwarding path is not exercised.

java/test/org/openqa/selenium/netty/server/TcpTunnelHandlerTest.java[29-55]
java/src/org/openqa/selenium/netty/server/TcpTunnelHandler.java[44-63]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
The new test does not execute `TcpTunnelHandler.channelRead` (the forwarding path); it writes outbound on the source channel to force a writability event. This validates that the handler reacts to `channelWritabilityChanged`, but not that real tunnel traffic (inbound -> forwarded write) leads to the expected backpressure behavior.

### Issue Context
`TcpTunnelHandler` is fundamentally an inbound-forwarder (`channelRead` writes to `target`). The test should ideally simulate inbound bytes and verify that the *receiving side’s* writability transition causes the *sending side* to be paused via mirrored `autoRead`.

### Fix Focus Areas
- Adjust the unit test to use `writeInbound(...)` to drive `TcpTunnelHandler.channelRead`.
- Consider creating **two** `EmbeddedChannel`s with reciprocal `TcpTunnelHandler`s (A->B and B->A) to mimic the real tunnel symmetry.

Fix Focus Areas (exact refs):
- java/test/org/openqa/selenium/netty/server/TcpTunnelHandlerTest.java[29-55]
- java/src/org/openqa/selenium/netty/server/TcpTunnelHandler.java[44-63]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


4. Writability event swallowed 🐞 Bug ⚙ Maintainability
Description
TcpTunnelHandler.channelWritabilityChanged() handles the event but does not forward it down the
Netty pipeline, so any handlers after TcpTunnelHandler will never observe writability transitions.
This makes the handler non-composable and can silently break future additions that rely on
channelWritabilityChanged propagation.
Code

java/src/org/openqa/selenium/netty/server/TcpTunnelHandler.java[R65-70]

Evidence
The handler overrides channelWritabilityChanged and only toggles the peer autoRead; it does not call
ctx.fireChannelWritabilityChanged, so the event stops at this handler. TcpUpgradeTunnelHandler
installs TcpTunnelHandler into the pipeline as the tunnel handler, meaning any later pipeline
additions would be affected by this behavior.

java/src/org/openqa/selenium/netty/server/TcpTunnelHandler.java[65-70]
java/src/org/openqa/selenium/netty/server/TcpUpgradeTunnelHandler.java[294-305]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`TcpTunnelHandler#channelWritabilityChanged` consumes the Netty event without propagating it. In Netty inbound event handlers, if you override a callback and don’t call `ctx.fire...`, later handlers in the pipeline won’t receive the event.

### Issue Context
This tunnel handler is currently added as the last handler in the tunnel pipelines, but keeping event propagation intact makes the handler safer to reuse and allows future handlers (metrics/throttling/debug) to observe writability changes.

### Fix Focus Areas
- java/src/org/openqa/selenium/netty/server/TcpTunnelHandler.java[65-70]

### Suggested change
After mirroring writability to `target.config().setAutoRead(...)`, call `ctx.fireChannelWritabilityChanged()` (or `super.channelWritabilityChanged(ctx)`).

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

Previous review results

Review updated until commit 88b59df

Results up to commit 1d537cf


🐞 Bugs (1) 📘 Rule violations (1) 📎 Requirement gaps (0)


Remediation recommended
1. setAutoRead cross-eventloop call 📘 Rule violation ☼ Reliability
Description
channelWritabilityChanged directly calls target.config().setAutoRead(...) from the source
channel's callback thread, but target may be owned by a different Netty event loop. This can cause
unsafe cross-thread state changes or unexpected read() behavior on the target channel under load.
Code

java/src/org/openqa/selenium/netty/server/TcpTunnelHandler.java[R65-70]

Evidence
PR Compliance ID 14 requires shared/concurrent state changes to be properly synchronized. The new
code mutates the peer channel's read configuration (target.config().setAutoRead(...)) from within
the current channel's writability callback without ensuring execution on target's event loop.

java/src/org/openqa/selenium/netty/server/TcpTunnelHandler.java[65-70]
Best Practice: Learned patterns

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`TcpTunnelHandler.channelWritabilityChanged` updates the peer channel's `autoRead` directly. If the two tunnel legs are on different Netty event loops, this becomes a cross-thread update and may not be safe.

## Issue Context
Netty generally expects channel operations that can trigger I/O (like enabling `autoRead`, which may cause a `read()`) to run on the channel's own event loop.

## Fix Focus Areas
- java/src/org/openqa/selenium/netty/server/TcpTunnelHandler.java[65-70]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Advisory comments
2. Writability event swallowed 🐞 Bug ⚙ Maintainability
Description
TcpTunnelHandler.channelWritabilityChanged() handles the event but does not forward it down the
Netty pipeline, so any handlers after TcpTunnelHandler will never observe writability transitions.
This makes the handler non-composable and can silently break future additions that rely on
channelWritabilityChanged propagation.
Code

java/src/org/openqa/selenium/netty/server/TcpTunnelHandler.java[R65-70]

Evidence
The handler overrides channelWritabilityChanged and only toggles the peer autoRead; it does not call
ctx.fireChannelWritabilityChanged, so the event stops at this handler. TcpUpgradeTunnelHandler
installs TcpTunnelHandler into the pipeline as the tunnel handler, meaning any later pipeline
additions would be affected by this behavior.

java/src/org/openqa/selenium/netty/server/TcpTunnelHandler.java[65-70]
java/src/org/openqa/selenium/netty/server/TcpUpgradeTunnelHandler.java[294-305]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`TcpTunnelHandler#channelWritabilityChanged` consumes the Netty event without propagating it. In Netty inbound event handlers, if you override a callback and don’t call `ctx.fire...`, later handlers in the pipeline won’t receive the event.

### Issue Context
This tunnel handler is currently added as the last handler in the tunnel pipelines, but keeping event propagation intact makes the handler safer to reuse and allows future handlers (metrics/throttling/debug) to observe writability changes.

### Fix Focus Areas
- java/src/org/openqa/selenium/netty/server/TcpTunnelHandler.java[65-70]

### Suggested change
After mirroring writability to `target.config().setAutoRead(...)`, call `ctx.fireChannelWritabilityChanged()` (or `super.channelWritabilityChanged(ctx)`).

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Qodo Logo

The optional transparent TCP tunnel introduced in 9a2df3a (2026-02-27,
"[grid] Router bypass WebSocket data path via transparent TCP tunnel",
SeleniumHQ#17146) forwarded inbound bytes from one channel to the other with no
flow control. A slow drain on one leg let the opposite leg keep
shipping bytes, which Netty queued in the outbound buffer indefinitely.
A session with sustained bidirectional traffic and a slow client could
grow the Router's heap until the per-channel outbound queue was many
megabytes.

Hook channelWritabilityChanged on each tunnel handler and mirror the
state onto the peer's read side via setAutoRead. When this channel's
outbound buffer crosses the high-water mark Netty fires the event and
the peer stops reading; when the buffer drains below the low-water mark
the peer resumes. The two legs of the tunnel each watch their own
writability and toggle the other's read side, so backpressure
propagates in both directions.

The thresholds come from the channel's existing WriteBufferWaterMark
(Netty's default of 32 KiB low / 64 KiB high, or whatever an operator
configures). No new numeric cap is introduced.

A small EmbeddedChannel test drives the writability transitions both
ways and asserts the peer's autoRead flag tracks them. The existing
end-to-end TunnelWebsocketTest continues to cover the happy path.
@shs96c shs96c force-pushed the grid-tunnel-backpressure branch from 1d537cf to 88b59df Compare May 22, 2026 14:51
@qodo-code-review
Copy link
Copy Markdown
Contributor

qodo-code-review Bot commented May 22, 2026

Persistent review updated to latest commit 88b59df

@shs96c
Copy link
Copy Markdown
Member Author

shs96c commented May 22, 2026

Addressed both qodo findings in 88b59df:

  • Writability event swallowed. Added ctx.fireChannelWritabilityChanged() so the event continues down the pipeline. The handler is currently the last in its pipeline, but propagation is correct Netty hygiene and lets future handlers (metrics, throttling, debugging) observe the transition.
  • Cross-event-loop setAutoRead. Verified this is not a real issue today: TcpUpgradeTunnelHandler puts both legs on the same event loop (Bootstrap().group(clientChannel.eventLoop())), and DefaultChannelConfig.setAutoRead is documented thread-safe in any case — the flag is mutated via an atomic updater and the channel.read() triggered on a false→true transition marshalls itself onto the channel's event loop. Added a comment at the call site explaining why direct invocation is safe, in case a future reader has the same concern.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

B-build Includes scripting, bazel and CI integrations C-java Java Bindings

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants