# LZ4 Streaming API Example: Line-by-Line Text Compression

by *Takayuki Matsuoka*

`blockStreaming_lineByLine.c` is an LZ4 Streaming API example that implements line-by-line incremental (de)compression.

Please note the following restrictions:

- First, read "LZ4 Streaming API Basics".
- This is a relatively advanced application example.
- The output file is not compatible with lz4frame and is platform dependent.

## What's the point of this example?

- Line-by-line incremental (de)compression.
- Handles huge files with a small amount of memory.
- Generally achieves a better compression ratio than the Block API.
- Non-uniform block sizes.

## How the compression works

First of all, allocate a "Ring Buffer" for input and an LZ4 compressed data buffer for output.

```
(1)
    Ring Buffer

    +--------+
    | Line#1 |
    +---+----+
        |
        v
     {Out#1}


(2)
    Prefix Mode Dependency
          +----+
          |    |
          v    |
    +--------+-+------+
    | Line#1 | Line#2 |
    +--------+---+----+
                 |
                 v
              {Out#2}


(3)
          Prefix   Prefix
          +----+   +----+
          |    |   |    |
          v    |   v    |
    +--------+-+------+-+------+
    | Line#1 | Line#2 | Line#3 |
    +--------+--------+---+----+
                          |
                          v
                       {Out#3}


(4)
                         External Dictionary Mode
                +----+   +----+
                |    |   |    |
                v    |   v    |
    ------+--------+-+------+-+--------+
          |  ....  | Line#X | Line#X+1 |
    ------+--------+--------+-----+----+
    ^                             |
    |                             v
    |                         {Out#X+1}
    |
    Reset


(5)
                                    Prefix
                                    +-----+
                                    |     |
                                    v     |
    ------+--------+--------+----------+--+-------+
          |  ....  | Line#X | Line#X+1 | Line#X+2 |
    ------+--------+--------+----------+-----+----+
    ^                                        |
    |                                        v
    |                                    {Out#X+2}
    |
    Reset
```

Next (see (1)), read the first line into the ring buffer and compress it with `LZ4_compress_continue()` (deprecated in current LZ4 releases in favor of `LZ4_compress_fast_continue()`).
Since this is the first call, LZ4 has no previous data to depend on,
so it simply compresses the line without dependencies and writes the compressed line {Out#1} to the LZ4 compressed data buffer.
After that, write {Out#1} to the file and advance the ring buffer offset.
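
For the decompressor to find the line boundaries again, each compressed line has to be stored together with its size. The sketch below assumes a simple framing that this document does not spell out: a 16-bit length in native byte order, followed by the compressed bytes. A native-endian prefix like this would be one reason for the output to be platform dependent. `write_block()` and `read_block()` are hypothetical helper names, not part of LZ4.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Write one compressed line as [uint16_t length][payload].
   The length is stored in native byte order (an assumption of this sketch). */
static int write_block(FILE *fp, const char *data, uint16_t len)
{
    assert(fp != NULL && data != NULL);
    if (fwrite(&len, sizeof len, 1, fp) != 1) return -1;
    if (len > 0 && fwrite(data, 1, (size_t)len, fp) != (size_t)len) return -1;
    return 0;
}

/* Read one block into buf. Returns the payload size, or -1 at end of file,
   on a read error, or when the block does not fit into buf. */
static int read_block(FILE *fp, char *buf, size_t cap)
{
    uint16_t len;
    assert(fp != NULL && buf != NULL);
    if (fread(&len, sizeof len, 1, fp) != 1) return -1;
    if ((size_t)len > cap) return -1;
    if (len > 0 && fread(buf, 1, (size_t)len, fp) != (size_t)len) return -1;
    return (int)len;
}
```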

Do the same for the second line (see (2)).
This time, however, LZ4 can use its dependency on Line#1 to improve the compression ratio.
This dependency is called "Prefix Mode".

Eventually, we'll reach the end of the ring buffer at Line#X (see (4)).
At this point, we must reset the ring buffer offset.
After the reset, the pointer for Line#X+1 is no longer adjacent to the previous data, but LZ4 still keeps its memory of that data.
This is called "External Dictionary Mode".
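
The offset handling can be sketched in isolation. The rule below is an assumption: it resets the offset as soon as fewer than `MESSAGE_MAX_BYTES` bytes remain, so that the next line always fits in one contiguous piece. The sizes and the name `ring_forward()` are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizes, chosen only for illustration. */
enum {
    MESSAGE_MAX_BYTES = 1024,                          /* longest accepted line */
    RING_BUFFER_BYTES = 1024 * 256 + MESSAGE_MAX_BYTES /* ring plus one spare line */
};

/* Forward the ring buffer offset by the size of the line just processed.
   Returns 0 (a reset) when a full-length line may no longer fit behind it. */
static size_t ring_forward(size_t offset, size_t line_bytes)
{
    assert(line_bytes <= MESSAGE_MAX_BYTES);
    assert(offset < RING_BUFFER_BYTES - MESSAGE_MAX_BYTES);

    offset += line_bytes;
    if (offset >= RING_BUFFER_BYTES - MESSAGE_MAX_BYTES) {
        offset = 0; /* the next line is no longer adjacent to the previous one */
    }
    return offset;
}
```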

At Line#X+2 (see (5)), LZ4 finally forgets almost all of its memory and retains only Line#X+1.
This is the same situation as Line#2.

Continue this procedure until the end of the text file.

## How the decompression works

Decompression works in the reverse direction.

- Read a compressed line from the file into a buffer.
- Decompress it into the ring buffer.
- Write the decompressed plain-text line to the output file.
- Advance the ring buffer offset. If the offset exceeds the end of the ring buffer, reset it.

Continue this procedure until the end of the compressed file.