HTTP Output Compression at Application Level

HTTP has a nice yet little-known feature: content can be compressed just before it is sent to the client. This is attractive because HTML is mostly text and therefore compresses well, so one can expect savings of 60 to 70% on average.

Given that compression costs next to nothing on today's systems, and taking into account that loading times are an SEO factor, output compression is quite promising. To get a feel for this, you might want to try this tool.
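
If you would rather measure the savings yourself, a minimal sketch along the following lines should do. The helper name compressionSavings() and the sample markup are made up for illustration; the sample is far more repetitive than a real page, so it compresses better than the 60 to 70% quoted above.

```php
<?php
// Hypothetical helper: percentage of bytes saved by gzip-encoding $html.
function compressionSavings(string $html, int $level = 6): float {
  $original = strlen($html);
  if ($original === 0) {
    return 0.0; // nothing to compress
  }
  $compressed = strlen(gzencode($html, $level));
  return (1 - $compressed / $original) * 100;
}

// Made-up, highly repetitive sample markup (an assumption, not real page data).
$sample = '<ul>'
  . str_repeat('<li class="item"><a href="/page">Some link text</a></li>', 200)
  . '</ul>';

printf("Saved %.1f%%\n", compressionSavings($sample));
```

Feeding the function the output of one of your own pages gives a more realistic figure.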

Now let's have a look at possible compression schemes:

  • gzip / x-gzip
    This is probably the best-supported scheme. It was introduced with HTTP version 1.0 (cf. RFC 1945 - Section 3.5: Content Codings). x-gzip is the original name, which became gzip in HTTP version 1.1 (cf. RFC 2616 - Section 3.5: Content Codings). RFC 2616 requires clients to treat x-gzip and gzip equally. The content is expected to be in the gzip format according to RFC 1952, which can be generated by PHP's gzencode() function.
  • compress / x-compress
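    The compress scheme is mentioned in RFC 1945 as well. It is the format produced by the UNIX compress program, i.e. Lempel-Ziv-Welch (LZW) output. Despite its name, PHP's gzcompress() does not produce this format, and I am not aware of a native PHP function that does.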
  • deflate
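    deflate is new in HTTP v1.1. RFC 2616 defines it as the zlib format (RFC 1950) wrapped around a deflate stream (RFC 1951), so compared to gzip it has a smaller header and an Adler-32 checksum instead of CRC-32. Strictly speaking this is what gzcompress() produces; gzdeflate() emits the raw deflate stream without the zlib wrapper, which many clients accept as well.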
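
To see how the three zlib-based functions relate, here is a small sketch that compresses the same (made-up) string with each of them and prints the size and the first two bytes of the result. It only illustrates the container formats; it is not meant as a recommendation for which one to send.

```php
<?php
// Arbitrary sample input (an assumption for illustration).
$text = str_repeat('Hello, compressed world! ', 50);

$gzip    = gzencode($text, 6);   // gzip container (RFC 1952)
$zlib    = gzcompress($text, 6); // zlib container (RFC 1950)
$deflate = gzdeflate($text, 6);  // raw deflate stream (RFC 1951)

foreach (array('gzip' => $gzip, 'zlib' => $zlib, 'deflate' => $deflate) as $name => $data) {
  printf("%-8s %4d bytes, starts with 0x%s\n", $name, strlen($data), bin2hex(substr($data, 0, 2)));
}

// Each format has to be decoded with its matching counterpart.
var_dump(gzdecode($gzip) === $text);
var_dump(gzuncompress($zlib) === $text);
var_dump(gzinflate($deflate) === $text);
```

The gzip output begins with the magic bytes 1f 8b, and the three results differ only in the header and trailer wrapped around the deflate data.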

There is another scheme that the RFCs do not mention: bzip2. Bzip2 is an algorithm that offers better compression than gzip at the cost of more CPU cycles. To my knowledge, Lynx and its spin-offs are the only clients to support it. Another notable alternative, which sees no support at all at the moment, is LZF.

If clients wish to receive compressed content, they are required to tell the server which compression schemes they support via the Accept-Encoding HTTP header. The server is supposed to answer with a Content-Encoding header containing the name of a scheme that is both supported by the server and understood by the client (if there is no such scheme, the header is omitted and the content is sent uncompressed).

Now let's have a hands-on example. I think my code snippet from a recent post might be a good start...

#!/usr/bin/php -n
<?php
 
function getEncoding() {
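  // Do not compress if PHP is doing so already. ini_get() may return "1",
  // "On" or a buffer size here, so anything but "", "0" and "off" counts as enabled.
  $zlib = strtolower((string) ini_get('zlib.output_compression'));
  if(!in_array($zlib, array('', '0', 'off'), true)) {
    return false;
  }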
 
  // Don't compress if the client doesn't wish to receive compressed content
  if(!array_key_exists('HTTP_ACCEPT_ENCODING', $_SERVER)) {
    return false;
  }
 
  $acceptEncoding = explode(', ', $_SERVER['HTTP_ACCEPT_ENCODING']);
  $encodings = array('gzip', 'x-gzip', 'bzip2', 'deflate', 'compress', 'x-compress');
  foreach($acceptEncoding as $encoding) {
    if(in_array($encoding, $encodings)) {
      return $encoding;
    }
  }
 
  // That's it... Only so much, we can do.
  return false;
}
 
$output = <<< EOT
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
  <head>
    <title>Source of {$_SERVER['REQUEST_URI']}</title>
    <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
  </head>
  <body>
    <h1>Source of {$_SERVER['REQUEST_URI']}</h1>
    <hr>
EOT;
$output .= highlight_file($_SERVER['argv'][1], true);
$generated = @date('r', $_SERVER['REQUEST_TIME']);
$output .= <<< EOT
    <hr>
   <p align="center">
     Generated $generated. Almost certainly <a href="http://validator.w3.org/check?uri=referer">valid HTML 4.01 Transitional</a>.
   </p>
  </body>
</html>
EOT;
 
if($encoding = getEncoding()) {
  switch($encoding) {
    case 'gzip':
    case 'x-gzip':
      $output = gzencode($output, 6);
      break;
    case 'deflate':
      $output = gzdeflate($output, 6);
      break;
    case 'bzip2':
      $output = bzcompress($output, 6);
      break;
    default:
      // compress / x-compress require LZW output, which no native PHP
      // function produces, so fall back to sending the content uncompressed
      $encoding = false;
  }
  if($encoding) {
    printf("Content-Encoding: %s\r\n", $encoding);
  }
}
 
printf("Content-Type: text/html\r\nContent-Length: %u\r\n\r\n", strlen($output));
echo $output;

As such, this is a working example. However, there are some caveats:

  1. As this is a mere academic example, nearly every imaginable scheme is supported. In production, you should limit yourself to gzip/x-gzip.
  2. Starting with HTTP v1.1, the encodings listed in the Accept-Encoding header can be weighted with q-values (cf. RFC 2616 - Section 14.3: Accept-Encoding, RFC 1945 - Appendix D 2.3: Accept Encoding). The example doesn't take this into account. It even fails altogether if an encoding carries a weight, because an entry such as gzip;q=0.8 no longer matches any of the known names.
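
For the second caveat, a weight-aware selection could look roughly like the sketch below. The function name pickEncoding() is made up for illustration, and the sketch deliberately ignores the "*" wildcard and the identity encoding, both of which RFC 2616 also allows in this header.

```php
<?php
// Hypothetical helper: return the supported encoding with the highest q-value,
// or false if the client accepts none of them.
// Simplification: "*" and "identity" are not handled.
function pickEncoding(string $header, array $supported) {
  $best = false;
  $bestQ = 0.0;
  foreach (explode(',', $header) as $item) {
    $parts = explode(';', $item);
    $name = strtolower(trim($parts[0]));
    $q = 1.0; // RFC 2616: an entry without a q parameter has q=1
    foreach (array_slice($parts, 1) as $param) {
      $param = trim($param);
      if (stripos($param, 'q=') === 0) {
        $q = (float) substr($param, 2);
      }
    }
    // q=0 means "not acceptable"; on equal weights the first entry wins
    if ($q > $bestQ && in_array($name, $supported, true)) {
      $best = $name;
      $bestQ = $q;
    }
  }
  return $best;
}

var_dump(pickEncoding('gzip;q=0.5, deflate;q=0.8', array('gzip', 'deflate')));
```

With the header used above, deflate is chosen because it carries the higher weight. Splitting on a bare comma also copes with clients that omit the space after it.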

Up next: How to make proper use of HTTP cache mechanisms :)

Update: In an earlier version of this article I stated that gzcompress() would produce the output format required by the compress scheme. It appears I was entirely wrong about that; this scheme asks for output as generated by the Lempel-Ziv-Welch algorithm, which no native PHP function I am aware of returns. Sorry for that.
