Skip to content

ByteBufBsonDocument and RawBsonDocument simplifications#1902

Open
rozza wants to merge 1 commit intomongodb:ByteBuffrom
rozza:ByteBufBsonDocumentClone
Open

ByteBufBsonDocument and RawBsonDocument simplifications#1902
rozza wants to merge 1 commit intomongodb:ByteBuffrom
rozza:ByteBufBsonDocumentClone

Conversation

@rozza
Copy link
Member

@rozza rozza commented Feb 26, 2026

Rationale

ByteBufBsonDocument#clone used to return a RawBsonDocument. The recent changes returned a normal BsonDocument, which is potentially expensive depending on its usage.

The ByteBufBsonDocument changes also added complex iterator logic, when RawBsonDocument deferred to BsonDocument iterators. As iteration is essentially a hydrating mechanism, there is opportunity for improvements for both implementations. By changing the RawBsonDocument iterators to be more efficient, ByteBufBsonDocument can now utilize these efficiency gains by proxy, relying on the cachedDocument iterators.

This change both reduces the complexity of ByteBufBsonDocument and relies on an improved RawBsonDocument implementation.

Summary of changes

  • ByteBufBsonDocument:

    • Simplify by returning RawBsonDocument from toBsonDocument, avoiding full BSON deserialization. When there are no sequence fields, the body bytes are cloned directly. When sequence fields exist, BsonBinaryWriter.pipe() merges the body with sequence arrays efficiently.
    • Use toBsonDocument for iterators. This eliminates the need for custom iterators (IteratorMode, CombinedIterator, createBodyIterator, and sequence iterators) since entrySet/values/keySet now delegate to the cached RawBsonDocument.
  • RawBsonDocument:

    • Renamed toBaseBsonDocument to override the default toBsonDocument implementation.
    • Implemented the iterators so that they don't need to fully convert the document to a BsonDocument.
  • Tests:

    • Updated ByteBufBsonDocumentTest iteration tests.
    • Updated ByteBufBsonArrayTest#fromValues as entrySet now returns RawBsonDocument instances.

JAVA-6010

### Rationale

`ByteBufBsonDocument#clone` used to return a `RawBsonDocument`.
The recent changes returned a normal `BsonDocument`, which is potentially
expensive depending on its usage.

The `ByteBufBsonDocument` changes also added complex iterator logic, when
`RawBsonDocument` deferred to `BsonDocument` iterators. As iteration is
essentially a hydrating mechanism, there is opportunity for improvements
for both implementations. By changing the `RawBsonDocument` iterators to
be more efficient, `ByteBufBsonDocument` can now utilize these efficiency
gains by proxy, relying on the `cachedDocument` iterators.

This change both reduces the complexity of `ByteBufBsonDocument` and relies
on an improved `RawBsonDocument` implementation.

### Summary of changes

* **`ByteBufBsonDocument`**:
  * Simplify by returning `RawBsonDocument` from `toBsonDocument`, avoiding
    full BSON deserialization. When there are no sequence fields, the body
    bytes are cloned directly. When sequence fields exist,
    `BsonBinaryWriter.pipe()` merges the body with sequence arrays efficiently.
  * Use `toBsonDocument` for iterators. This eliminates the need for custom
    iterators (`IteratorMode`, `CombinedIterator`, `createBodyIterator`, and
    sequence iterators) since `entrySet`/`values`/`keySet` now delegate to the
    cached `RawBsonDocument`.

* **`RawBsonDocument`**:
  * Renamed `toBaseBsonDocument` to override the default `toBsonDocument`
    implementation.
  * Implemented the iterators so that they don't need to fully convert the
    document to a `BsonDocument`.

* **Tests**:
  * Updated `ByteBufBsonDocumentTest` iteration tests.
  * Updated `ByteBufBsonArrayTest#fromValues` as `entrySet` now returns
    `RawBsonDocument` instances.

JAVA-6010

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@rozza rozza requested a review from strogiyotec February 26, 2026 13:49
@rozza rozza requested a review from a team as a code owner February 26, 2026 13:49
@Override
public Collection<BsonValue> values() {
return toBaseBsonDocument().values();
List<BsonValue> values = new ArrayList<>();
Copy link
Contributor

@strogiyotec strogiyotec Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I understand the logic , the entrySet returns a SET though looking at the previous implementation it can still return duplicates
Here is the test case using latest main without this change

    @Test
    void testLatestPR() {
        BsonDocument doc = new BsonDocument()
                .append("a", new BsonInt32(1))
                .append("b", new BsonInt32(1))
                .append("c", new BsonInt32(1));
        RawBsonDocument rawDoc = new RawBsonDocument(doc, new BsonDocumentCodec());

        // BsonDocument preserves all 3 values
        assertEquals(3, doc.values().size());

        // RawBsonDocument should too — but LinkedHashSet deduplicates them to 1
        assertEquals(3, rawDoc.values().size());
    }

This test case is passing but fails in this branch because LinkedHashSet removes duplicate values, I believe we should preserve an already broken behavior and return duplicates

@Override
public Set<Entry<String, BsonValue>> entrySet() {
return toBaseBsonDocument().entrySet();
List<Entry<String, BsonValue>> entries = new ArrayList<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the javadoc for RawBsonDocument say it's immutable, does it make sense to store entrySet/values in memory instead of calculating them all from scratch ? Something like what java.lang.String does with a hashcode, keep hashcode as class field but calculates it only once during the first hashCode call

Not mandatory

cachedDocument = new RawBsonDocument(clonedBytes);
} else {
// With sequence fields: pipe body + extra elements
BasicOutputBuffer buffer = new BasicOutputBuffer();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this buffer is never closed, can we wrap it with try with resources

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just checked , the close just sets buffer to null, so not mandatory but still good to have in case we later change the class and close() will be more complicated than that

return ByteBufBsonDocument.this.size();
}
};
return toBsonDocument().entrySet();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we add ensureOpen before delegating entrySet , values ,keySet calls to RawBsonDocument ?

public BsonDocument clone() {
ensureOpen();
return toBsonDocument().clone();
return toBsonDocument();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this implementation is not correct according to clone javadoc

Image
 x.clone() != x

so from what I understand
x.clone() != x.clone() should be true as well

    @Test
    void testClone() {
        try (ByteBufBsonDocument byteBufDocument = new ByteBufBsonDocument(documentByteBuf)) {
            final BsonDocument clone1 = byteBufDocument.clone();
            final BsonDocument clone2 = byteBufDocument.clone();
            assertTrue(clone1 != clone2);
        }
    }

This test fails because clone returns same cached document, I believe it's fine as main contract x.clone() != x is true

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants