LZ4 Streaming API Example: Line by Line Text Compression

by Takayuki Matsuoka

blockStreaming_lineByLine.c is an LZ4 Streaming API example that implements line-by-line incremental (de)compression.

Please note the following restrictions:

First, read "LZ4 Streaming API Basics".
This is a relatively advanced application example.
The output file is not compatible with lz4frame and is platform dependent.

What's the point of this example?

Line-by-line incremental (de)compression
Handles huge files in a small amount of memory
Generally better compression ratio than the Block API
Non-uniform block size

How the compression works

First of all, allocate a "Ring Buffer" for input and an LZ4 compressed data buffer for output.
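A minimal sketch of this allocation step, assuming a single maximum line length (max_line) is known up front. The struct and function names (line_buffers, line_buffers_init, line_buffers_free, compress_bound) are hypothetical. compress_bound() restates the LZ4_COMPRESSBOUND() formula from lz4.h (valid for inputs up to LZ4_MAX_INPUT_SIZE) only so the snippet compiles without the library; real code would use that macro or LZ4_compressBound() directly.

```c
#include <stdlib.h>

/* Restates lz4.h's LZ4_COMPRESSBOUND() for inputs within LZ4_MAX_INPUT_SIZE:
   the worst-case compressed size of n input bytes. */
static size_t compress_bound(size_t n) {
    return n + n / 255 + 16;
}

/* Hypothetical holder for the two buffers described above. */
typedef struct {
    char  *ring;      /* input ring buffer: lines are read into it in order */
    size_t ring_size;
    char  *cmp;       /* output buffer for one compressed line */
    size_t cmp_size;
} line_buffers;

/* Allocates a ring buffer of ring_size bytes and a compressed-data buffer
   large enough for the worst case of one line of up to max_line bytes.
   Returns 0 on success, -1 on bad sizes or allocation failure. */
static int line_buffers_init(line_buffers *b, size_t ring_size, size_t max_line) {
    if (max_line == 0 || max_line > ring_size) return -1;
    b->ring_size = ring_size;
    b->cmp_size  = compress_bound(max_line);
    b->ring = malloc(ring_size);
    b->cmp  = malloc(b->cmp_size);
    if (b->ring == NULL || b->cmp == NULL) {
        free(b->ring);   /* free(NULL) is a no-op */
        free(b->cmp);
        b->ring = b->cmp = NULL;
        return -1;
    }
    return 0;
}

static void line_buffers_free(line_buffers *b) {
    free(b->ring);
    free(b->cmp);
    b->ring = b->cmp = NULL;
}
```

For example, line_buffers_init(&b, 256 * 1024 + 1024, 1024) gives a 263168-byte ring and a 1044-byte output buffer; these sizes are illustrative and not taken from the example program.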

(1)
    Ring Buffer

    +--------+
    | Line#1 |
    +---+----+
        |
        v
     {Out#1}


(2)
    Prefix Mode Dependency
          +----+
          |    |
          v    |
    +--------+-+------+
    | Line#1 | Line#2 |
    +--------+---+----+
                 |
                 v
              {Out#2}


(3)
          Prefix   Prefix
          +----+   +----+
          |    |   |    |
          v    |   v    |
    +--------+-+------+-+------+
    | Line#1 | Line#2 | Line#3 |
    +--------+--------+---+----+
                          |
                          v
                       {Out#3}


(4)
                        External Dictionary Mode
                +----+   +----+
                |    |   |    |
                v    |   v    |
    ------+--------+-+------+-+--------+
          |  ....  | Line#X | Line#X+1 |
    ------+--------+--------+-----+----+
                            ^     |
                            |     v
                            |  {Out#X+1}
                            |
                          Reset


(5)
                                    Prefix
                                    +-----+
                                    |     |
                                    v     |
    ------+--------+--------+----------+--+-------+
          |  ....  | Line#X | Line#X+1 | Line#X+2 |
    ------+--------+--------+----------+-----+----+
                            ^                |
                            |                v
                            |            {Out#X+2}
                            |
                          Reset

Next (see (1)), read the first line into the ring buffer and compress it with LZ4_compress_continue(). The first time, LZ4 has no previous data to depend on, so it compresses the line on its own and writes the compressed line {Out#1} to the LZ4 compressed data buffer. After that, write {Out#1} to the file and advance the ring buffer offset.
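The text does not spell out the file layout, so this sketch of the "write {Out#1} to the file" step assumes one hypothetical framing: a 2-byte length stored in native byte order, followed by the compressed bytes. cmp_bytes stands for the value returned by the compression call, which is not shown here, and write_record() is a hypothetical helper. Writing the length with fwrite() straight from memory is one way a file like this ends up platform dependent.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical framing: native-endian uint16 length, then the payload.
   Returns 0 on success, -1 if the size does not fit or a write fails. */
static int write_record(FILE *fp, const char *cmp, int cmp_bytes) {
    uint16_t len;
    if (cmp_bytes <= 0 || cmp_bytes > UINT16_MAX) return -1;
    len = (uint16_t)cmp_bytes;
    /* Byte order of len on disk depends on the machine writing it. */
    if (fwrite(&len, sizeof len, 1, fp) != 1) return -1;
    if (fwrite(cmp, 1, (size_t)cmp_bytes, fp) != (size_t)cmp_bytes) return -1;
    return 0;
}
```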

Do the same for the second line (see (2)). This time, however, LZ4 can use Line#1 as a dependency to improve the compression ratio. This dependency is called "Prefix mode".

Eventually, we reach the end of the ring buffer at Line#X (see (4)). At this point, we should reset the ring buffer offset. After the reset, Line#X+1 is no longer adjacent to the previous data, but LZ4 still keeps its memory of that data. This is called "External Dictionary Mode".

At Line#X+2 (see (5)), LZ4 finally forgets almost all of its earlier memory but still retains Line#X+1. This is the same situation as Line#2.
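The offset bookkeeping behind diagrams (1) to (5) can be simulated without LZ4 at all. This sketch assumes the reset rule is "wrap to 0 when fewer than max_line bytes would remain", so every line fits contiguously; advance_offset(), plan_lines() and the dep_mode labels are hypothetical names, and the reported mode only describes the buffer layout, not LZ4's internal state.

```c
#include <stddef.h>

/* Hypothetical labels for how a line relates to the previous one. */
typedef enum { DEP_NONE, DEP_PREFIX, DEP_EXT_DICT } dep_mode;

/* Advances the write offset past a line of line_len bytes. If fewer than
   max_line bytes would remain, resets to 0 so the next line fits
   contiguously. Assumes max_line <= ring_size and line_len <= max_line. */
static size_t advance_offset(size_t offset, size_t line_len,
                             size_t ring_size, size_t max_line) {
    offset += line_len;
    if (offset > ring_size - max_line) offset = 0;
    return offset;
}

/* Fills offsets[i] and modes[i] for n lines of the given lengths.
   A line starting exactly where the previous one ended is in prefix mode;
   a line starting anywhere else (after a reset) sees the previous data
   as an external dictionary. */
static void plan_lines(const size_t *lens, size_t n, size_t ring_size,
                       size_t max_line, size_t *offsets, dep_mode *modes) {
    size_t offset = 0, prev_end = 0;
    for (size_t i = 0; i < n; i++) {
        offsets[i] = offset;
        if (i == 0)
            modes[i] = DEP_NONE;          /* (1): nothing to depend on */
        else if (offset == prev_end)
            modes[i] = DEP_PREFIX;        /* (2), (3), (5): adjacent */
        else
            modes[i] = DEP_EXT_DICT;      /* (4): first line after reset */
        prev_end = offset + lens[i];
        offset = advance_offset(offset, lens[i], ring_size, max_line);
    }
}
```

With a 32-byte ring, max_line = 8 and six 8-byte lines, the offsets come out as 0, 8, 16, 24, 0, 8 and the modes as none, prefix, prefix, prefix, external dictionary, prefix, which matches Line#X+1 and Line#X+2 in (4) and (5).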

Continue this procedure until the end of the text file.

How the decompression works

Decompression works in the reverse order:

Read a compressed line from the file into a buffer.
Decompress it into the ring buffer.
Write the decompressed plain-text line to the output file.
Advance the ring buffer offset. If the offset exceeds the end of the ring buffer, reset it.
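The "read a compressed line" step mirrors how the file was written. This sketch assumes a hypothetical framing (a native-endian 2-byte length, then the payload), and read_record() is a hypothetical helper. The payload would then go to the streaming decoder (for example LZ4_decompress_safe_continue() in lz4.h), writing into the ring buffer at the current offset; the simplest way to keep the needed history in place is to advance and reset that offset with the same rule the compressor used.

```c
#include <stdint.h>
#include <stdio.h>

/* Reads one record: native-endian uint16 length, then that many bytes.
   Returns the payload length, 0 at a clean end of file, or -1 on a
   truncated/empty/oversized record or an I/O error. */
static int read_record(FILE *fp, char *buf, size_t cap) {
    uint16_t len;
    size_t got = fread(&len, 1, sizeof len, fp);
    if (got == 0 && feof(fp)) return 0;     /* no more records */
    if (got != sizeof len) return -1;       /* truncated header or error */
    if (len == 0 || len > cap) return -1;   /* empty or too large for buf */
    if (fread(buf, 1, (size_t)len, fp) != (size_t)len) return -1;
    return (int)len;
}
```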

Continue this procedure until the end of the compressed file.
