Fix crash from asio callback during Client::close with pending operations#551
Open
BewareMyPower wants to merge 8 commits intoapache:mainfrom
Open
Fix crash from asio callback during Client::close with pending operations#551BewareMyPower wants to merge 8 commits intoapache:mainfrom
BewareMyPower wants to merge 8 commits intoapache:mainfrom
Conversation
… crash ASIO's async_connect (range_connect_op) holds a raw reference to the socket and calls socket.close() internally between endpoint retry attempts. If ClientConnection is destroyed before those internal calls complete (i.e. the socket_ shared_ptr drops to zero), that raw reference becomes dangling and causes a segfault. Fix by capturing socket_ as a shared_ptr in the async_connect completion handler. Since ASIO stores the completion handler as part of the operation object (range_connect_op), this keeps the socket alive for the entire lifetime of the async_connect operation, including all internal retry logic. Fixes: apache#550 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR addresses a crash caused by ASIO completion handlers running during/after Client::close() when there are still in-flight network operations (notably pending async_connect), which can lead to use-after-free of socket / io_context internals during shutdown.
Changes:
- Adds tracking of in-flight (non-timer) async operations in
ClientConnectionand exposes apendingOperations()counter. - Adjusts connection shutdown behavior to avoid closing the socket while the TCP connect is still pending, deferring close handling to
handleTcpConnected. - Makes
ConnectionPool::close()wait (up to 5s) for each connection’s pending operations to complete before clearing the pool.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/PulsarFriend.h | Uses C++17 lock-guard deduction when locking ClientConnection internals in tests. |
| lib/MockServer.h | Updates mutex locking to use C++17 lock-guard deduction for consistency. |
| lib/ConnectionPool.cc | Waits for pending async ops per connection during pool shutdown to reduce shutdown-time crashes. |
| lib/ClientConnection.h | Introduces pendingOperations_ counter, exposes accessor, and changes mutex type to recursive_mutex. |
| lib/ClientConnection.cc | Instruments connect/handshake/read/write paths for pending-op tracking and modifies shutdown/connect completion handling. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes: #550
Motivation
ClientTest.testCloseClientis flaky that sometimes it could cause segmentation fault. This is becauseclient.close()could be called quickly aftercreateProducerAsync. In this case, theasync_connectoperation might not complete. In theclosemethod,socket_->close()is called, which could complete theasync_connectwithoperation_aborted. However, after closing theConnectionPool, theClientConnectionobject, as well as its internalsocket_object, will be destroyed. In asio, when the completion handler (callback) is called while the caller (socket) is destroyed, it will crash because the callback refers the caller internally.Capturing
socket_in the callback to extend socket's lifetime is not enough, because after closing the pool, theio_contextobject will be stopped by itsstopmethod inExecutorService. Then theio_context's internal queue will be destroyed, if the callback is triggered after that, it could access an destroyed object and causeModifications
closemethod, when the state isPending, i.e. the TCP connection is not done, don't close the socket. Instead, delay the socket close to the callback (handleTcpConnected).isClosedbecausestate_could be modified byclose()method in a different thread.