ByteBufBsonDocument and RawBsonDocument simplifications#1902
ByteBufBsonDocument and RawBsonDocument simplifications#1902rozza wants to merge 1 commit intomongodb:ByteBuffrom
ByteBufBsonDocument and RawBsonDocument simplifications#1902Conversation
### Rationale
`ByteBufBsonDocument#clone` used to return a `RawBsonDocument`.
The recent changes returned a normal `BsonDocument`, which is potentially
expensive depending on its usage.
The `ByteBufBsonDocument` changes also added complex iterator logic, when
`RawBsonDocument` deferred to `BsonDocument` iterators. As iteration is
essentially a hydrating mechanism, there is opportunity for improvements
for both implementations. By changing the `RawBsonDocument` iterators to
be more efficient, `ByteBufBsonDocument` can now utilize these efficiency
gains by proxy, relying on the `cachedDocument` iterators.
This change both reduces the complexity of `ByteBufBsonDocument` and relies
on an improved `RawBsonDocument` implementation.
### Summary of changes
* **`ByteBufBsonDocument`**:
* Simplify by returning `RawBsonDocument` from `toBsonDocument`, avoiding
full BSON deserialization. When there are no sequence fields, the body
bytes are cloned directly. When sequence fields exist,
`BsonBinaryWriter.pipe()` merges the body with sequence arrays efficiently.
* Use `toBsonDocument` for iterators. This eliminates the need for custom
iterators (`IteratorMode`, `CombinedIterator`, `createBodyIterator`, and
sequence iterators) since `entrySet`/`values`/`keySet` now delegate to the
cached `RawBsonDocument`.
* **`RawBsonDocument`**:
* Renamed `toBaseBsonDocument` to override the default `toBsonDocument`
implementation.
* Implemented the iterators so that they don't need to fully convert the
document to a `BsonDocument`.
* **Tests**:
* Updated `ByteBufBsonDocumentTest` iteration tests.
* Updated `ByteBufBsonArrayTest#fromValues` as `entrySet` now returns
`RawBsonDocument` instances.
JAVA-6010
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
| @Override | ||
| public Collection<BsonValue> values() { | ||
| return toBaseBsonDocument().values(); | ||
| List<BsonValue> values = new ArrayList<>(); |
There was a problem hiding this comment.
I think I understand the logic , the entrySet returns a SET though looking at the previous implementation it can still return duplicates
Here is the test case using latest main without this change
@Test
void testLatestPR() {
BsonDocument doc = new BsonDocument()
.append("a", new BsonInt32(1))
.append("b", new BsonInt32(1))
.append("c", new BsonInt32(1));
RawBsonDocument rawDoc = new RawBsonDocument(doc, new BsonDocumentCodec());
// BsonDocument preserves all 3 values
assertEquals(3, doc.values().size());
// RawBsonDocument should too — but LinkedHashSet deduplicates them to 1
assertEquals(3, rawDoc.values().size());
}
This test case is passing but fails in this branch because LinkedHashSet removes duplicate values, I believe we should preserve an already broken behavior and return duplicates
| @Override | ||
| public Set<Entry<String, BsonValue>> entrySet() { | ||
| return toBaseBsonDocument().entrySet(); | ||
| List<Entry<String, BsonValue>> entries = new ArrayList<>(); |
There was a problem hiding this comment.
the javadoc for RawBsonDocument say it's immutable, does it make sense to store entrySet/values in memory instead of calculating them all from scratch ? Something like what java.lang.String does with a hashcode, keep hashcode as class field but calculates it only once during the first hashCode call
Not mandatory
| cachedDocument = new RawBsonDocument(clonedBytes); | ||
| } else { | ||
| // With sequence fields: pipe body + extra elements | ||
| BasicOutputBuffer buffer = new BasicOutputBuffer(); |
There was a problem hiding this comment.
this buffer is never closed, can we wrap it with try with resources
There was a problem hiding this comment.
I just checked , the close just sets buffer to null, so not mandatory but still good to have in case we later change the class and close() will be more complicated than that
| return ByteBufBsonDocument.this.size(); | ||
| } | ||
| }; | ||
| return toBsonDocument().entrySet(); |
There was a problem hiding this comment.
should we add ensureOpen before delegating entrySet , values ,keySet calls to RawBsonDocument ?
| public BsonDocument clone() { | ||
| ensureOpen(); | ||
| return toBsonDocument().clone(); | ||
| return toBsonDocument(); |
There was a problem hiding this comment.
I believe this implementation is not correct according to clone javadoc
x.clone() != x
so from what I understand
x.clone() != x.clone() should be true as well
@Test
void testClone() {
try (ByteBufBsonDocument byteBufDocument = new ByteBufBsonDocument(documentByteBuf)) {
final BsonDocument clone1 = byteBufDocument.clone();
final BsonDocument clone2 = byteBufDocument.clone();
assertTrue(clone1 != clone2);
}
}
This test fails because clone returns same cached document, I believe it's fine as main contract x.clone() != x is true
Rationale
ByteBufBsonDocument#cloneused to return aRawBsonDocument. The recent changes returned a normalBsonDocument, which is potentially expensive depending on its usage.The
ByteBufBsonDocumentchanges also added complex iterator logic, whenRawBsonDocumentdeferred toBsonDocumentiterators. As iteration is essentially a hydrating mechanism, there is opportunity for improvements for both implementations. By changing theRawBsonDocumentiterators to be more efficient,ByteBufBsonDocumentcan now utilize these efficiency gains by proxy, relying on thecachedDocumentiterators.This change both reduces the complexity of
ByteBufBsonDocumentand relies on an improvedRawBsonDocumentimplementation.Summary of changes
ByteBufBsonDocument:RawBsonDocumentfromtoBsonDocument, avoiding full BSON deserialization. When there are no sequence fields, the body bytes are cloned directly. When sequence fields exist,BsonBinaryWriter.pipe()merges the body with sequence arrays efficiently.toBsonDocumentfor iterators. This eliminates the need for custom iterators (IteratorMode,CombinedIterator,createBodyIterator, and sequence iterators) sinceentrySet/values/keySetnow delegate to the cachedRawBsonDocument.RawBsonDocument:toBaseBsonDocumentto override the defaulttoBsonDocumentimplementation.BsonDocument.Tests:
ByteBufBsonDocumentTestiteration tests.ByteBufBsonArrayTest#fromValuesasentrySetnow returnsRawBsonDocumentinstances.JAVA-6010