1. 28 Feb, 2015 1 commit
  2. 27 Feb, 2015 2 commits
  3. 15 Jan, 2015 1 commit
  4. 09 Jan, 2015 2 commits
  5. 08 Jan, 2015 8 commits
  6. 07 Jan, 2015 7 commits
    • Unroll PUT_BITS() loop and improve compile-time visibility. · 69d57944
      David Woodhouse authored
      We *know* that we'll go round the loop at least once if (nr) >= 8, and
      we *might* go round it one more time. There's no need for a loop.
      This gives around 8% performance improvement.
      Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
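A minimal sketch of the unrolling argument, assuming a bit writer that holds 0 to 7 pending bits between calls and is never asked to write more than 16 bits at once. The struct, field and function names are hypothetical (this is not the project's actual PUT_BITS() macro), and -1 stands in for whatever error the real code returns when the output buffer is full. With 0 to 7 bits pending, adding nr >= 8 bits must complete at least one byte and can complete at most two, so the loop becomes one flush guarded by a test on nr (a constant the compiler can fold when nr is known at compile time) plus one further check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit-writer state. Between calls, 0..7 pending bits sit in the
 * low-order end of 'outbits'; bits are emitted most-significant first. */
struct bitwriter {
	uint32_t outbits;
	int nr_outbits;
	uint8_t *dst;
	size_t outpos;
	size_t dstlen;
};

/* Emit the oldest 8 pending bits as one output byte. */
static inline int flush_byte(struct bitwriter *bw)
{
	if (bw->outpos == bw->dstlen)
		return -1;                       /* output buffer full */
	bw->nr_outbits -= 8;
	bw->dst[bw->outpos++] = (uint8_t)(bw->outbits >> bw->nr_outbits);
	return 0;
}

/* Loop version: keep flushing while a whole byte is pending. */
static inline int put_bits_loop(struct bitwriter *bw, int nr, uint32_t bits)
{
	assert(nr >= 1 && nr <= 16 && (bits >> nr) == 0);
	bw->outbits = (bw->outbits << nr) | bits;
	bw->nr_outbits += nr;
	while (bw->nr_outbits >= 8)
		if (flush_byte(bw))
			return -1;
	return 0;
}

/* Unrolled version: at most two flushes are ever needed. */
static inline int put_bits_unrolled(struct bitwriter *bw, int nr, uint32_t bits)
{
	assert(nr >= 1 && nr <= 16 && (bits >> nr) == 0);
	bw->outbits = (bw->outbits << nr) | bits;
	bw->nr_outbits += nr;
	/* 0..7 pending plus nr >= 8 always completes a byte; this test is a
	 * compile-time constant whenever nr is. */
	if (nr >= 8 && flush_byte(bw))
		return -1;
	/* Now 0..15 bits are pending, so at most one more byte is complete. */
	if (bw->nr_outbits >= 8 && flush_byte(bw))
		return -1;
	return 0;
}
```

Both versions produce identical output; the unrolled one simply replaces a data-dependent loop with two straight-line checks, the first of which disappears entirely for short constant-width writes.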
    • Simplify the common (literal) case in LZS compression · c7042d53
      David Woodhouse authored
      If we're only adding one byte, then don't use the loop and don't
      recalculate the hash we already have.
      This gives roughly a 2% performance improvement.
      Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
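A sketch of what such a fast path might look like, assuming a compressor that records every input position in a table of chain heads plus a per-position chain array. HASH_BITS, INVALID_OFS's value, hash2(), insert_one() and advance() are hypothetical names, and the two-byte hash is an arbitrary choice rather than the project's. After a literal only the current position needs recording, and its hash was already computed for the match search, so it is reused directly; the general path (after a match) walks every covered position and hashes each one.

```c
#include <stddef.h>
#include <stdint.h>

#define HASH_BITS   12                   /* hypothetical table size */
#define HASH_SIZE   (1u << HASH_BITS)
#define INVALID_OFS 0xffff               /* hypothetical "no entry" marker */

struct lzs_state {
	uint16_t hash_table[HASH_SIZE];      /* most recent position per hash */
	uint16_t hash_chain[65536];          /* previous position with same hash */
};

/* Hypothetical hash of the two bytes starting at src[pos]. */
static unsigned hash2(const uint8_t *src, size_t pos)
{
	return (((unsigned)src[pos] << 4) ^ src[pos + 1]) & (HASH_SIZE - 1);
}

/* Push 'pos' onto the front of the chain for 'hash'. */
static void insert_one(struct lzs_state *st, uint16_t pos, unsigned hash)
{
	st->hash_chain[pos] = st->hash_table[hash];
	st->hash_table[hash] = pos;
}

/* Record 'n' consumed input bytes starting at 'pos' (len = packet length).
 * 'hash' must already equal hash2(src, pos). */
static void advance(struct lzs_state *st, const uint8_t *src, size_t len,
		    size_t pos, size_t n, unsigned hash)
{
	if (n == 1) {
		/* Common literal case: no loop, no second hash computation. */
		insert_one(st, (uint16_t)pos, hash);
		return;
	}
	/* General case (after a match): hash every covered position that
	 * still has two bytes available. */
	for (size_t i = 0; i < n && pos + i + 1 < len; i++)
		insert_one(st, (uint16_t)(pos + i), hash2(src, pos + i));
}
```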
    • Revert LZS optimisation experiments · 44476f1a
      David Woodhouse authored
      They weren't helping. With the 65536-byte packet tests (1500 would perhaps
      be more representative but didn't make them look any better):
      a99260d6 Add LZS test harness
      Samples: 6K of event 'cycles', Event count (approx.): 5627230524
      Overhead       Samples  Command  Shared Object      Symbol
        69.71%          4498  lzstest  lzstest            [.] lzs_compress
        12.91%           834  lzstest  lzstest            [.] lzs_decompress
      69156824 Reduce per-packet computation overhead for LZS compression
      Samples: 7K of event 'cycles', Event count (approx.): 6190866454
      Overhead       Samples  Command  Shared Object      Symbol
        71.37%          5050  lzstest  lzstest            [.] lzs_compress
        11.67%           826  lzstest  lzstest            [.] lzs_decompress
      64d6b19f Simplify LZS compression again
      Samples: 6K of event 'cycles', Event count (approx.): 6074401505
      Overhead       Samples  Command  Shared Object      Symbol
        71.27%          4923  lzstest  lzstest            [.] lzs_compress
        11.54%           797  lzstest  lzstest            [.] lzs_decompress
      The memset() just isn't as expensive as the other tricks we play to
      avoid it. So let's just stick with the simple version for now, unless
      someone comes up with some better ideas.
      Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
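Putting the three perf reports side by side makes the conclusion concrete. Treating perf's approximate cycle counts as comparable (that is, assuming each run processed the same test data), both experiments cost more cycles in total than the a99260d6 baseline, and the cycles attributed to lzs_compress (its overhead share times the event count) grew by an even larger proportion:

$$
\begin{aligned}
\frac{6190866454}{5627230524} &\approx 1.100 &&\text{(69156824: about 10\% more cycles overall)}\\
\frac{6074401505}{5627230524} &\approx 1.079 &&\text{(64d6b19f: about 8\% more cycles overall)}\\[4pt]
0.6971 \times 5627230524 &\approx 3.92 \times 10^{9} &&\text{(a99260d6: lzs\_compress baseline)}\\
0.7137 \times 6190866454 &\approx 4.42 \times 10^{9} &&\text{(69156824: about 13\% more)}\\
0.7127 \times 6074401505 &\approx 4.33 \times 10^{9} &&\text{(64d6b19f: about 10\% more)}
\end{aligned}
$$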
    • Simplify LZS compression again · 64d6b19f
      David Woodhouse authored
      I'm going to keep the previous commit in the history in case we rethink
      it later, but it seems overly complicated. By the time I'd finished it,
      it was apparent that it doesn't actually *matter* what crap is in the hash
      tables; we can just be robust enough to cope.
      So go back to a simple 16-bit hash offset in the data structures (but
      keep them allocated at setup time instead of on the stack).
      When we first start walking a hash chain, it's simple enough to check
      that the first hofs we get from the hash_table is valid:
       - if it's later than the current position, it's obviously invalid.
       - if the hash value at hofs doesn't match, it's obviously invalid.
       - conversely, if the hash value *does* match and it's in the part of
         the packet that we have already processed, then we know it's valid
         because we will have *put* it there when we processed that offset in
         the current packet.
      So just do that validity check on hash_table[hash] when we first start
      looking, and reset it to INVALID_OFS if appropriate.
      This adds a little overhead, but it should still be cheaper than doing
      the full memset() each time, and it is simpler than the previous version,
      with more consistent performance.
      Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
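The validity check described above can be sketched as follows, assuming 16-bit offsets into the current packet and a hash computed over the two bytes at each position. hash2(), chain_head(), HASH_BITS and the 0xffff value for INVALID_OFS are hypothetical choices for illustration; only hash_table, hofs and the INVALID_OFS name come from the commit message. The check is done only on the chain head, as the message describes.

```c
#include <stddef.h>
#include <stdint.h>

#define HASH_BITS   12                   /* hypothetical table size */
#define HASH_SIZE   (1u << HASH_BITS)
#define INVALID_OFS 0xffff               /* hypothetical "no entry" marker */

/* Hypothetical hash of the two bytes at src[pos]. */
static unsigned hash2(const uint8_t *src, size_t pos)
{
	return (((unsigned)src[pos] << 4) ^ src[pos + 1]) & (HASH_SIZE - 1);
}

/* Return a trustworthy head for the chain of 'hash' at input position
 * 'inpos', where hash == hash2(src, inpos). The table may hold leftovers
 * from earlier packets, so the head is checked before use and reset to
 * INVALID_OFS if it is stale. */
static uint16_t chain_head(uint16_t *hash_table, const uint8_t *src,
			   uint16_t inpos, unsigned hash)
{
	uint16_t hofs = hash_table[hash];

	if (hofs == INVALID_OFS)
		return INVALID_OFS;
	/* At or beyond the current position: this packet cannot have put it
	 * there yet. Otherwise the bytes at hofs must hash the same, or it was
	 * left behind by an older packet. Since hofs < inpos here, reading
	 * src[hofs + 1] stays inside the already-processed data. */
	if (hofs >= inpos || hash2(src, hofs) != hash) {
		hash_table[hash] = INVALID_OFS;
		return INVALID_OFS;
	}
	/* Already processed and the hash matches: we put it there. */
	return hofs;
}
```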
    • Reduce per-packet computation overhead for LZS compression · 69156824
      David Woodhouse authored
      Since there is no history preserved from one packet to the next, we don't
      actually keep a real history. We just use the packet data instead.
      However, we can *pretend* we do, and pretend that we keep a full 32 bits
      (4GiB) of it. The current packet data represents just the latest part
      of it, and we should never be looking at anything older anyway.
      By using 32 bits for the hash offsets, and starting each new packet
      sufficiently far from the previous packet that it cannot 'see' history
      that it shouldn't, we can avoid having to clear the entire hash table
      data structure for each packet.
      The data structures are now twice as big, but only cleared once in every
      65536 packets.
      Be a lot more paranoid about the contents of the hash tables now that
      they are more long-lived, to prevent problems caused by corruption. Now
      the worst that should happen is you waste some time looking for matches
      where there are none, even if you inherit complete crap in the data.
      Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
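A sketch of the "pretend 32-bit history" scheme, under stated assumptions: no packet is longer than 65536 bytes, each packet starts 65536 virtual bytes after the previous one (which is consistent with a clear every 2^32 / 2^16 = 65536 packets), and the table is reset to all-ones, a value that can never fall inside the current packet's window. The struct, functions and constants are hypothetical, and only a chain-head table is shown; the range check via unsigned subtraction is one way to be "paranoid" about stale or corrupt entries, not necessarily the commit's.

```c
#include <stdint.h>
#include <string.h>

#define HASH_SIZE   4096u      /* hypothetical table size */
#define PKT_SPACING 65536u     /* assumed: no packet is longer than 64KiB */

struct lzs_state32 {
	uint32_t base;                     /* virtual offset of this packet's byte 0 */
	uint32_t hash_table[HASH_SIZE];    /* virtual offsets, possibly stale */
};

static void lzs_state32_init(struct lzs_state32 *st)
{
	/* The first start_packet() wraps base to 0 and clears the table. */
	st->base = 0u - PKT_SPACING;
}

/* Move to a fresh 64KiB window; clear the table only when the 32-bit
 * virtual space wraps, i.e. once every 65536 packets. */
static void start_packet(struct lzs_state32 *st)
{
	st->base += PKT_SPACING;
	if (st->base == 0)
		/* All-ones is never inside [base, base + inpos) for inpos <= 65535. */
		memset(st->hash_table, 0xff, sizeof(st->hash_table));
}

static void record(struct lzs_state32 *st, uint32_t inpos, unsigned hash)
{
	st->hash_table[hash % HASH_SIZE] = st->base + inpos;
}

/* Packet-relative offset of the chain head for 'hash', or -1 if the entry
 * does not point into the part of this packet already processed. */
static long chain_head32(const struct lzs_state32 *st, uint32_t inpos,
			 unsigned hash)
{
	/* Unsigned subtraction: entries from older packets (or garbage below
	 * 'base') wrap around to values >= 65536 and so fail the test. */
	uint32_t rel = st->hash_table[hash % HASH_SIZE] - st->base;

	return rel < inpos ? (long)rel : -1;
}
```

The cost of this design is visible in the struct: 32-bit entries instead of 16-bit, which matches the commit's note that the data structures are now twice as big.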
    • Add LZS compression support · 33a74166
      David Woodhouse authored
      Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
    • Add decompress-only support for LZS · 84ecff1b
      David Woodhouse authored
      Newer gateways support LZS compression, and it seems to be mutually
      exclusive with deflate.
      Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
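Neither of the last two commits spells out the LZS bitstream that both the compressor and decompressor handle. As a reference point, here is a sketch of an encoder for the token format as it is usually described for Stac LZS (ANSI X3.241): a literal is a 0 bit followed by the 8-bit byte; a match is a 1 bit, then either 1 plus a 7-bit offset or 0 plus an 11-bit offset, then a variable-length length code; the end marker is 110000000. The one-bit-at-a-time writer, the zero padding after the end marker, and all names are this sketch's own choices, not the project's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal MSB-first bit writer (hypothetical, deliberately simple). */
struct lzs_bw {
	uint8_t *dst;
	size_t dstlen;
	size_t nbits;              /* total bits written so far */
};

static void put_bits(struct lzs_bw *w, unsigned nr, uint32_t bits)
{
	for (unsigned i = nr; i-- > 0; ) {
		size_t byte = w->nbits / 8;
		uint8_t mask = (uint8_t)(0x80 >> (w->nbits % 8));

		assert(byte < w->dstlen);
		if ((bits >> i) & 1)
			w->dst[byte] |= mask;
		else
			w->dst[byte] &= (uint8_t)~mask;
		w->nbits++;
	}
}

/* Literal: 0 followed by the 8-bit byte. */
static void lzs_literal(struct lzs_bw *w, uint8_t c)
{
	put_bits(w, 9, c);
}

/* Match: offset code, then length code. */
static void lzs_match(struct lzs_bw *w, unsigned offset, unsigned length)
{
	assert(offset >= 1 && offset < 2048 && length >= 2);
	if (offset < 128)
		put_bits(w, 9, 0x180 | offset);    /* 11 + 7-bit offset */
	else
		put_bits(w, 13, 0x1000 | offset);  /* 10 + 11-bit offset */

	if (length <= 4) {
		put_bits(w, 2, length - 2);          /* 00, 01, 10 => 2, 3, 4 */
	} else if (length <= 7) {
		put_bits(w, 4, 0xc | (length - 5));  /* 1100, 1101, 1110 => 5, 6, 7 */
	} else {
		put_bits(w, 4, 0xf);                 /* 1111 prefix => 8 or more */
		length -= 8;
		while (length >= 15) {               /* each extra 1111 adds 15 */
			put_bits(w, 4, 0xf);
			length -= 15;
		}
		put_bits(w, 4, length);              /* final nibble, 0..14 */
	}
}

/* End marker 110000000, then zero bits up to a byte boundary. */
static void lzs_end(struct lzs_bw *w)
{
	put_bits(w, 9, 0x180);
	while (w->nbits % 8)
		put_bits(w, 1, 0);
}
```

For example, the literal 'A' followed by the end marker is the 18-bit sequence 001000001 110000000, which this sketch pads to the three bytes 0x20 0xE0 0x00.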