Pression  2.0.0
Compressor, decompressor, uploader and downloader plugins
Slicer.md
Restriping the output of a compressor for storage
============

This document specifies the transformation of the output of a data
compressor into a smaller set of larger output slices. The primary use
case is as a backend of the memcached keyv::Map, which has a maximum
value size of one megabyte.

## Requirements

The new compression plugin API: @subpage data

For an input of:
* data::CompressorInfo
* uncompressed data
* max output slice size

The slicer produces:
* n output slices of size <= max output slice size
* the uncompressed data itself if the data is uncompressible, with zero-copy
  during compression and decompression

For an input of n output slices (see above), the slicer produces the
uncompressed data.

## API

    namespace pression
    {
    namespace data
    {
    class Slicer
    {
    public:
        struct Result { uint8_t* data; uint32_t size; };
        typedef std::vector< Result > Results; //!< Set of result slices
        typedef std::vector< uint32_t > ResultSizes; //!< Remaining slice sizes

        Slicer( const CompressorInfo& compressor );

        // returned pointers are valid until next compress(), delete of
        // input data, or dtor of Slicer called. The vector itself is
        // returned by value (moved, not copied).
        Results compress( const uint8_t* data, size_t size,
                          uint32_t sliceSize );

        // input: first slice, output: remaining slice sizes
        ResultSizes getRemainingSizes( const uint8_t* data, uint32_t size );

        // input: first slice, output: total decompressed data size
        size_t getDecompressedSize( const uint8_t* data, uint32_t size );

        /** @overload convenience wrapper */
        void decompress( const Results& input, uint8_t* data );
    };
    }
    }

## Implementation

compress() allocates a compressor and compresses the input data. The data
is treated as uncompressible if pression::getDataSize() exceeds the input
size minus the header overhead.
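The uncompressibility test can be sketched as follows. This is a minimal illustration and not the Pression implementation: `compressedHeaderSize()` and `isUncompressible()` are hypothetical helpers, the header size is derived from the compressed header layout given in this document, and `compressedSize` stands in for the value reported by pression::getDataSize().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: size of the header slice for compressed output,
// 16 byte magic + 16 byte compressor name hash + 8 byte input size +
// 4 byte nChunks + nChunks * 4 byte chunk sizes.
size_t compressedHeaderSize( const uint32_t nChunks )
{
    return 16 + 16 + 8 + 4 + size_t( nChunks ) * 4;
}

// Hypothetical helper: true if compressedSize exceeds inputSize minus
// headerSize, that is, if the compressed form would not save space.
bool isUncompressible( const size_t compressedSize, const size_t inputSize,
                       const size_t headerSize )
{
    // guard the sum below against overflow
    assert( compressedSize <= SIZE_MAX - headerSize );
    // written as a sum to avoid unsigned underflow for tiny inputs
    return compressedSize + headerSize > inputSize;
}
```

Comparing the sum rather than subtracting the header size keeps the test well defined when the input is smaller than the header.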

Uncompressible output is returned as:
* one zero-copy slice if size <= sliceSize
* or:
  * one header slice: 16 byte magic 'uncompressed', 8 byte input size,
    4 byte slice size
  * n zero-copy output slices of sliceSize, pointing to input data memory
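A minimal sketch of the multi-slice uncompressed layout, under assumptions this document does not specify: the magic is zero-padded to 16 bytes, header fields are stored in host byte order, and the last data slice may be shorter than sliceSize. `Slice`, `Header`, `makeUncompressedHeader()` and `sliceZeroCopy()` are hypothetical names, not part of the Slicer API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical slice type: points into the caller's input memory.
struct Slice { const uint8_t* data; uint32_t size; };
typedef std::vector< uint8_t > Header;

// Builds the 28 byte header slice: 16 byte magic, 8 byte input size,
// 4 byte slice size (host byte order is an assumption).
Header makeUncompressedHeader( const uint64_t inputSize,
                               const uint32_t sliceSize )
{
    static const char magic[16] = "uncompressed"; // zero-padded to 16 bytes
    Header header( 28 );
    std::memcpy( header.data(), magic, 16 );
    std::memcpy( header.data() + 16, &inputSize, 8 );
    std::memcpy( header.data() + 24, &sliceSize, 4 );
    return header;
}

// Splits the input into slices of at most sliceSize bytes without copying.
// The returned pointers are only valid as long as the input data is.
std::vector< Slice > sliceZeroCopy( const uint8_t* data, const size_t size,
                                    const uint32_t sliceSize )
{
    assert( sliceSize > 0 );
    std::vector< Slice > slices;
    for( size_t offset = 0; offset < size; offset += sliceSize )
    {
        const size_t left = size - offset;
        const uint32_t n = left < sliceSize ? uint32_t( left ) : sliceSize;
        slices.push_back( Slice{ data + offset, n } );
    }
    return slices;
}
```

For a 10 byte input and a sliceSize of 4, this yields slices of 4, 4 and 2 bytes, all pointing into the original buffer.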

Compressible output is returned as:
* one header slice: 16 byte magic 'compressed', 16 byte compressor name hash,
  8 byte input size, 4 byte nChunks, nChunks * 4 byte chunkSizes
* nSlices: complete, compressed chunks up to sliceSize

The first implementation throws if the header size exceeds sliceSize for
compressed output, or if a chunk is bigger than a slice.
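One way to group complete chunks into slices is a greedy pass over the chunk sizes, sketched below. Greedy packing is an assumption, since this document only requires that each slice holds complete chunks up to sliceSize. `packChunks()` is a hypothetical helper that returns the byte size of each data slice and throws in the two cases named above.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper: packs complete chunks into slices of at most
// sliceSize bytes and returns the resulting size of each data slice.
std::vector< uint32_t > packChunks( const std::vector< uint32_t >& chunkSizes,
                                    const uint32_t sliceSize )
{
    // header slice: 16 byte magic, 16 byte compressor name hash,
    // 8 byte input size, 4 byte nChunks, 4 byte per chunk size
    const uint64_t headerSize = 16 + 16 + 8 + 4 +
                                4 * uint64_t( chunkSizes.size( ));
    if( headerSize > sliceSize )
        throw std::runtime_error( "header exceeds slice size" );

    std::vector< uint32_t > sliceSizes;
    uint32_t current = 0; // bytes already assigned to the open slice
    for( const uint32_t chunk : chunkSizes )
    {
        if( chunk > sliceSize )
            throw std::runtime_error( "chunk exceeds slice size" );

        // close the open slice if the chunk does not fit anymore
        if( current > 0 && chunk > sliceSize - current )
        {
            sliceSizes.push_back( current );
            current = 0;
        }
        current += chunk;
        assert( current <= sliceSize );
    }
    if( current > 0 )
        sliceSizes.push_back( current );
    return sliceSizes;
}
```

With chunk sizes 30, 30, 50, 100 and 10 and a sliceSize of 100, this produces four data slices of 60, 50, 100 and 10 bytes.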

## Examples

    void Keyv::memcached::Plugin::insert( const std::string& key,
                                          const void* ptr, const size_t size )
    {
        const auto data = _slicer.compress(
            static_cast< const uint8_t* >( ptr ), size, LB_1MB );
        servus::uint128_t hash = servus::make_uint128( key );

        // slice i is stored under the key hash + i, so that the first
        // slice can be looked up with the unmodified hash
        for( const auto& slice : data )
        {
            const std::string& sliceKey = hash.getString();
            memcached_set( _instance, sliceKey.c_str(), sliceKey.length(),
                           reinterpret_cast< const char* >( slice.data ),
                           slice.size, (time_t)0, (uint32_t)0 );
            ++hash;
        }
    }

    std::string Keyv::memcached::Plugin::operator [] ( const std::string& key )
    {
        const std::string& hash = servus::make_uint128( key ).getString();
        size_t size = 0;
        uint32_t flags = 0;
        memcached_return_t error = MEMCACHED_SUCCESS;

        // fetch the first slice; error handling omitted for brevity
        pression::data::Slicer::Results slices( 1 );
        slices[0].data = reinterpret_cast< uint8_t* >(
            memcached_get( _instance, hash.c_str(), hash.length(),
                           &size, &flags, &error ));
        slices[0].size = uint32_t( size );

        // takeValues() fetches the remaining slices and returns them as Results
        const auto remaining = _slicer.getRemainingSizes( slices[0].data,
                                                          slices[0].size );
        const auto values = takeValues( hash, remaining );
        slices.insert( slices.end(), values.begin(), values.end( ));

        std::string value( _slicer.getDecompressedSize( slices[0].data,
                                                        slices[0].size ), '\0' );
        _slicer.decompress( slices, reinterpret_cast< uint8_t* >( &value[0] ));
        return value;
    }
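The lookup example depends on getRemainingSizes() to learn how many further slices to fetch. For the 'uncompressed' header described under Implementation, that computation might look like the sketch below. It assumes a zero-padded magic, host byte order and a possibly shorter last slice, none of which this document specifies. `UncompressedHeader`, `parseUncompressedHeader()` and `remainingSizes()` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical in-memory form of the 28 byte 'uncompressed' header slice.
struct UncompressedHeader { uint64_t inputSize; uint32_t sliceSize; };

// Returns false if the slice is not a well-formed 'uncompressed' header.
bool parseUncompressedHeader( const uint8_t* data, const uint32_t size,
                              UncompressedHeader& out )
{
    static const char magic[16] = "uncompressed"; // zero-padded to 16 bytes
    if( size != 28 || std::memcmp( data, magic, 16 ) != 0 )
        return false;
    std::memcpy( &out.inputSize, data + 16, 8 );
    std::memcpy( &out.sliceSize, data + 24, 4 );
    return out.sliceSize > 0;
}

// Sizes of the data slices following the header: full slices of
// sliceSize, plus one shorter slice for the remainder, if any.
std::vector< uint32_t > remainingSizes( const UncompressedHeader& header )
{
    assert( header.sliceSize > 0 );
    const size_t nFull = size_t( header.inputSize / header.sliceSize );
    std::vector< uint32_t > sizes( nFull, header.sliceSize );
    const uint32_t rest = uint32_t( header.inputSize % header.sliceSize );
    if( rest > 0 )
        sizes.push_back( rest );
    return sizes;
}
```

The sum of the returned sizes equals the input size stored in the header, which is also what getDecompressedSize() would report for uncompressed data.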

## Issues

### Issue 1: What is the maximum allowed slice size?

_Resolution: 4GB_

It is unlikely that a storage system uses larger slices. Memcached has
a recommended limit of one megabyte.
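The 4GB resolution corresponds to the range of the uint32_t sizes used in the API. A small sketch of a range check a caller could apply before narrowing a 64 bit size, with `maxSliceSize`, `fitsSliceSize()` and `toSliceSize()` as hypothetical names:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Largest slice size representable in the API: 2^32 - 1 bytes, just
// under 4GB.
const uint64_t maxSliceSize = std::numeric_limits< uint32_t >::max();

// Hypothetical helper: true if size can be used as a slice size.
bool fitsSliceSize( const uint64_t size )
{
    return size <= maxSliceSize;
}

// Hypothetical helper: narrows a 64 bit size to the API's slice size type.
uint32_t toSliceSize( const uint64_t size )
{
    assert( fitsSliceSize( size ));
    return uint32_t( size );
}
```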