A StrChunk is an abstraction for an immutable byte string. One might think such an abstraction would be too simple to deserve a detailed explanation, but StrChunks are a bit different.
One of the chief problems in moving data around is the time it takes to make copies. Each copy made of a buffer of data is a waste of memory bandwidth. Often memory bandwidth is even more scarce than CPU speed, especially in programs whose main job is to move data around. These copies happen because protocols often have sections of messages that contain arbitrary data that is not of direct concern to a particular protocol layer. The common strategy is to copy the data that is of concern to the next layer out of the message, and throw the original data and layer wrappings away. StrChunks provide a better way.
Because a StrChunk is only an abstraction for an immutable byte string, the underlying implementation need not actually be a byte string. In fact, two common kinds of StrChunks (StrSubChunks and GroupChunks) use other StrChunks for their implementation. This sounds pretty simple, but has some interesting implications.
Another concept that needs introduction is the idea of a LinearExtent. A LinearExtent is actually rather simple, and perhaps a better name could be thought of. It consists of a starting point or offset, and a length. It is used to refer to portions of a StrChunk.
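As a rough illustration, an extent can be modeled as a pair of numbers plus a bounds check. The `Extent` struct and the `fitsWithin` and `nested` helpers below are hypothetical names made up for this sketch; they are not the library's LinearExtent interface.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a LinearExtent: a starting offset and a length.
struct Extent {
    std::size_t offset;
    std::size_t length;
};

// True if the extent lies entirely inside a byte string of `total` bytes.
// Written so that offset + length is never computed and cannot overflow.
bool fitsWithin(const Extent &e, std::size_t total) {
    return e.offset <= total && e.length <= total - e.offset;
}

// Translate `inner`, which is measured from the start of `outer`, into the
// coordinates that `outer` itself is measured in.
Extent nested(const Extent &outer, const Extent &inner) {
    assert(fitsWithin(inner, outer.length));
    return Extent{outer.offset + inner.offset, inner.length};
}
```

For a 23-byte string, an extent of (offset: 7, length: 10) fits, while (offset: 20, length: 10) would run past the end. Nesting (offset: 7, length: 3) inside (offset: 7, length: 10) yields (offset: 14, length: 3) in the outer coordinates.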
A StrSubChunk is a StrChunk that uses another StrChunk to actually implement the byte string. It uses a LinearExtent to describe which portion of the child StrChunk the StrSubChunk consists of. This sounds kind of confusing, but the idea is simple.
Suppose you create a StrChunk like this:

    BufferChunk *pab1 = new PreAllocBuffer<23>;               // Using a BufferChunk * because we need getVoidP()
    memcpy(pab1->getVoidP(), "George Orwell has fleas", 23);  // from BufferChunk to finish the StrChunk.

*pab1 now contains this sequence of bytes (blank cells hold space characters):
Offset | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Value | G | e | o | r | g | e |   | O | r | w | e | l | l |   | h | a | s |   | f | l | e | a | s |
If you then created a StrSubChunk with a LinearExtent of (offset: 7, length: 10) that used *pab1 to hold the bytes, like this:

    StrChunk *subc = new StrSubChunk(pab1, LinearExtent(7, 10));

this StrSubChunk (which is itself a StrChunk) would look like this:
Offset | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
---|---|---|---|---|---|---|---|---|---|---|
Value | O | r | w | e | l | l |   | h | a | s |
As you can see, a new StrChunk has been created that contains part of another StrChunk without copying that part.
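The idea of viewing part of a buffer without copying it can be sketched in a few lines. This is a toy model, assuming a `std::shared_ptr` for the reference counting; `SubView` and `makeSubView` are hypothetical names and do not reflect how StrSubChunk is actually implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Toy model of a sub-chunk: a shared, read-only buffer plus a window on it.
struct SubView {
    std::shared_ptr<const std::string> base;  // keeps the buffer alive
    std::size_t offset;                       // where the window starts
    std::size_t length;                       // how many bytes it covers

    // Pointer to the first byte of the window, inside the shared buffer.
    const char *data() const { return base->data() + offset; }

    // Byte i of the window, counted from the window's own offset 0.
    char at(std::size_t i) const {
        assert(i < length);
        return (*base)[offset + i];
    }
};

// Build a window onto `base`, checking that it stays inside the buffer.
SubView makeSubView(std::shared_ptr<const std::string> base,
                    std::size_t offset, std::size_t length) {
    assert(offset <= base->size() && length <= base->size() - offset);
    return SubView{std::move(base), offset, length};
}
```

With a buffer holding "George Orwell has fleas", `makeSubView(buf, 7, 10)` exposes the bytes "Orwell has", and its `data()` pointer points into the original buffer rather than at a copy.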
A GroupChunk is a kind of StrChunk that can contain several other
StrChunks. A GroupChunk appears as the concatenation of all the StrChunks it
contains. Suppose you have two StrChunks:

    BufferChunk *bc1 = new PreAllocBuffer<10>;
    memcpy(bc1->getVoidP(), "What hath ", 10);
    BufferChunk *bc2 = new PreAllocBuffer<12>;
    memcpy(bc2->getVoidP(), "God wrought?", 12);

*bc1 contains:
Offset | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
---|---|---|---|---|---|---|---|---|---|---|
Value | W | h | a | t |   | h | a | t | h |   |
*bc2 contains:
Offset | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Value | G | o | d |   | w | r | o | u | g | h | t | ? |
Then, suppose you create a new GroupChunk and add these two StrChunks to it. A GroupChunk is a StrChunk, so, from a StrChunk perspective, the new GroupChunk contains:
Offset | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Value | W | h | a | t |   | h | a | t | h |   | G | o | d |   | w | r | o | u | g | h | t | ? |
This allows you to 'virtually' concatenate StrChunks together without having to make any copies.
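The virtual concatenation comes down to translating an offset in the whole into an offset in one of the parts. The sketch below shows that translation under simplifying assumptions: `Group`, `add`, and `at` are hypothetical names for this example, and the parts are plain shared strings rather than real StrChunks.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Toy model of a group: an ordered list of shared, read-only buffers.
struct Group {
    std::vector<std::shared_ptr<const std::string>> parts;

    // Append a part; only the pointer is stored, the bytes are not copied.
    void add(std::shared_ptr<const std::string> part) {
        assert(part != nullptr);
        parts.push_back(std::move(part));
    }

    // Total length of the virtual concatenation.
    std::size_t length() const {
        std::size_t total = 0;
        for (const auto &p : parts) total += p->size();
        return total;
    }

    // Byte at offset i of the virtual concatenation.
    char at(std::size_t i) const {
        for (const auto &p : parts) {
            if (i < p->size()) return (*p)[i];
            i -= p->size();  // skip this part and keep looking
        }
        throw std::out_of_range("offset past end of group");
    }
};
```

Adding "What hath " and then "God wrought?" gives a group of length 22 in which offset 10 resolves to the first byte of the second part.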
You can combine StrSubChunks and GroupChunks together to do anything you would with C character arrays and pointers without having to actually move any of your data bytes around. Better yet, these entities are reference counted, immutable after creation, don't treat '\0' differently, and are resistant to buffer overflows.
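To make the composition concrete, here is a small self-contained model in which a window and a concatenation are both just further implementations of one byte-string interface. Every name in it (`Chunk`, `Bytes`, `Sub`, `Group`, `flatten`) is hypothetical and chosen for this sketch; the real classes have their own interfaces, and `std::shared_ptr` stands in for the library's reference counting.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy interface standing in for the "immutable byte string" abstraction.
class Chunk {
public:
    virtual ~Chunk() = default;
    virtual std::size_t length() const = 0;
    virtual char at(std::size_t i) const = 0;  // precondition: i < length()
};
using ChunkPtr = std::shared_ptr<const Chunk>;

// Leaf: owns real bytes. Embedded '\0' bytes are stored like any other.
class Bytes : public Chunk {
public:
    explicit Bytes(std::string s) : s_(std::move(s)) {}
    std::size_t length() const override { return s_.size(); }
    char at(std::size_t i) const override { return s_[i]; }
private:
    const std::string s_;
};

// Window onto another chunk; the range is checked once, at construction.
class Sub : public Chunk {
public:
    Sub(ChunkPtr child, std::size_t offset, std::size_t len)
        : child_(std::move(child)), offset_(offset), len_(len) {
        assert(offset_ <= child_->length() && len_ <= child_->length() - offset_);
    }
    std::size_t length() const override { return len_; }
    char at(std::size_t i) const override { return child_->at(offset_ + i); }
private:
    const ChunkPtr child_;
    const std::size_t offset_, len_;
};

// Virtual concatenation of other chunks, which may themselves be windows.
class Group : public Chunk {
public:
    explicit Group(std::vector<ChunkPtr> parts) : parts_(std::move(parts)) {}
    std::size_t length() const override {
        std::size_t total = 0;
        for (const auto &p : parts_) total += p->length();
        return total;
    }
    char at(std::size_t i) const override {
        for (const auto &p : parts_) {
            if (i < p->length()) return p->at(i);
            i -= p->length();
        }
        assert(false && "offset past end of group");
        return '\0';
    }
private:
    const std::vector<ChunkPtr> parts_;
};

// Copies a chunk's bytes out, only so the result can be inspected.
std::string flatten(const Chunk &c) {
    std::string out;
    for (std::size_t i = 0; i < c.length(); ++i) out.push_back(c.at(i));
    return out;
}
```

Grouping a `Bytes` holding "What hath " with a `Sub` covering offset 7, length 6 of "George Orwell has fleas" produces a 16-byte chunk that reads "What hath Orwell", and no byte of either original buffer is moved until `flatten` is called.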