Monday, 10 May 2010

Speeding Up Web Access and Reducing Traffic With Apache

One of the parameters affecting our websites' response times that we, as developers or system administrators, do have under our control is the size of the HTTP response. Careful analysis and engineering can reduce this size so that responses are efficient and free of redundancy. Nevertheless, developers often forget that, just before the web server returns the HTTP response to the client, there's one last thing that can be done if you're using at least HTTP/1.1 (which will almost invariably be the case): apply a compression algorithm.

Compression algorithms are everywhere and the HTTP protocol is no exception. Although you should carefully analyze your application's resource consumption to discover potential bottlenecks, users typically spend much of their time waiting for pages to load. Images, scripts, embedded objects, the page markup: all of them contribute to the bandwidth usage that affects your web application's response time. Just as you save hard disk storage when you compress your images or your music with an appropriate compression algorithm, you'll save bandwidth (and hence time) if you compress your responses.

A Short Introduction

A bit of background before going on. For compressed output to be understood by user agents, the server and the browser must coordinate: that's why HTTP/1.1 formalized and standardized how and when compression can be used. Basically, clients and servers exchange information to determine whether compressed requests and responses can be used and, if both support a common algorithm, they use it. This exchange mostly happens through two HTTP headers: the client lists the encodings it understands in Accept-Encoding, and the server declares the encoding it applied in Content-Encoding.
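One way to watch this negotiation from the server side is to log both headers. The following is only a rough sketch, assuming mod_log_config is loaded (it normally is); the format nickname "encoding" and the log file path are made-up names for illustration.

```apache
# Log, for each request, what the client offered and what the server applied.
# %{...}i reads a request header, %{...}o reads a response header.
# "encoding" and logs/encoding_log are hypothetical names.
LogFormat "%h \"%r\" %>s \"%{Accept-Encoding}i\" \"%{Content-Encoding}o\"" encoding
CustomLog logs/encoding_log encoding
```

A request from a client that accepts gzip, answered with a compressed response, should show both fields filled in; a "-" in the last field suggests the response went out uncompressed.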

HTTP/1.1 specifies three compression methods that can be used: gzip, deflate and compress. Many clients and servers support gzip: notably, the Apache HTTP server does so. Some also support deflate, although browsers handle it less consistently than gzip.

gzip, surely familiar to UNIX users, produces good compression rates for text: it's not uncommon to shrink text files, typical HTML markup or JavaScript code by 70% or more.
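If you'd rather measure the compression rate on your own traffic than trust a rule of thumb, mod_deflate can record it in a log. The sketch below follows the logging example in the mod_deflate documentation and assumes the module is already enabled (as described in the following sections); the note names and the log file path are arbitrary choices.

```apache
# Ask mod_deflate to attach input size, output size and ratio to each request
DeflateFilterNote Input instream
DeflateFilterNote Output outstream
DeflateFilterNote Ratio ratio

# Write them to a dedicated log: request line, compressed/original bytes, ratio
LogFormat '"%r" %{outstream}n/%{instream}n (%{ratio}n%%)' deflate
CustomLog logs/deflate_log deflate
```

The ratio logged here is the compressed size as a percentage of the original, so lower numbers mean better compression.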

Configuring Your Apache Web Server

Configuring the Apache HTTP Server to compress its output is pretty easy. One of the things you should take into account is almost obvious: not every content type will compress well and compression has a cost. So, depending on the content served by your application, consider configuring your web server accordingly so that precious CPU cycles aren't wasted compressing something that should not be. Text, and hence HTML markup, JavaScript, CSS and so on, will almost certainly compress well. Formats that are already compressed, such as JPEG images, most PDFs, and multimedia files such as mp3, ogg and flac, will not.

Enabling mod_deflate

mod_deflate is a module, bundled with standard Apache 2 distributions, that provides the filter you need to compress your traffic. To enable mod_deflate you must modify your Apache configuration file accordingly. Open httpd.conf and verify that mod_deflate is enabled:

[...snip...]
LoadModule deflate_module libexec/mod_deflate.so
[...snip...]
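The path to the module file varies between installations, so the line above may look slightly different on your system. If the same configuration file is shared by servers that may or may not have the module, one option is to guard the compression settings so that Apache still starts without it. This is just a sketch:

```apache
# Only read the enclosed directives when mod_deflate is actually loaded
<IfModule mod_deflate.c>
    # mod_deflate configuration here
</IfModule>
```

The trade-off is that a missing module then goes unnoticed: the server starts, but nothing is compressed. Running httpd -M (on versions that support it) lists the modules that are really loaded.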

Deciding When and What To Compress

The next choice you have to make is when and what to compress. Apache is pretty flexible and you can apply compression at distinct levels of your configuration, for example:
  • Apply it to everything.
  • Apply it at the level of one or more <Location/> sections.
  • Apply it at the <VirtualHost/> level.

The "best" configuration will depend on how you're using your Apache HTTP Server. If you're using it as a proxy and to manage different virtual hosts, you might be interested in reducing configuration complexity:
  • Disable compression on every web server proxied by your front-end Apache server.
  • Configure compression on Apache by using appropriate <Location/> sections at the virtual host level.

The last web server I configured for a client of mine acted as a proxy for a great number of virtual hosts. Since every virtual host was serving compressible content, we applied just one configuration at the / location:

<Location />
[...snip...]
# mod_deflate configuration here
</Location>
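To make the idea more concrete, here is a minimal sketch of what one such front-end virtual host might look like. The host name and the backend address are hypothetical, and the sketch assumes mod_proxy and mod_proxy_http are loaded alongside mod_deflate.

```apache
# Sketch of a front-end virtual host; names and addresses are hypothetical
<VirtualHost *:80>
    ServerName app.example.com

    # Forward everything to the application server behind Apache
    # (requires mod_proxy and mod_proxy_http)
    ProxyPass        / http://127.0.0.1:8080/
    ProxyPassReverse / http://127.0.0.1:8080/

    # Compress once, on the way out to the client
    <Location />
        SetOutputFilter DEFLATE
    </Location>
</VirtualHost>
```

Keeping the compression settings in the front-end means the application server behind it can stay untouched and serve plain responses.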

Take time to analyze the characteristics of your traffic before blindly turning on compression: you may save CPU cycles. Do remember to disable compression behind your Apache server: there's probably no point in compressing more than once, so do it only at the last step, just before the response is sent to your clients.

An Example Configuration

A typical configuration will take into account:
  • Browsers' non-compliant behaviors.
  • Content types not to compress.

The configuration we usually run is more or less the one given as an example in the official mod_deflate documentation:

[...snip...]
# Insert filter
SetOutputFilter DEFLATE

# Netscape 4.x has some problems...
BrowserMatch ^Mozilla/4 gzip-only-text/html

# Netscape 4.06-4.08 have some more problems
BrowserMatch ^Mozilla/4\.0[678] no-gzip

# MSIE masquerades as Netscape, but it is fine
# BrowserMatch \bMSIE !no-gzip !gzip-only-text/html

# NOTE: Due to a bug in mod_setenvif up to Apache 2.0.48
# the above regex won't work. You can use the following
# workaround to get the desired effect:
BrowserMatch \bMSI[E] !no-gzip !gzip-only-text/html

BrowserMatch Safari gzip-only-text/html

# Don't compress images
SetEnvIfNoCase Request_URI \
\.(?:gif|jpe?g|png)$ no-gzip dont-vary

# Make sure proxies don't deliver the wrong content
Header append Vary User-Agent env=!dont-vary
[...snip...]

A brief explanation of the configuration example follows:
  • The SetOutputFilter directive enables the DEFLATE output filter.
  • The four active BrowserMatch directives tell Apache to check the client's User-Agent and work around some well-known browser quirks.
  • The SetEnvIfNoCase directive matches the request URI against a regular expression: if it matches, compression is not applied. In this case, as you can see, it matches common image formats that are already compressed and would gain little.
  • The last line tells Apache to append a Vary: User-Agent header to the response, so that proxies will not deliver cached (compressed) responses to clients that cannot accept them.

Next Steps

Needless to say, that's just a basic configuration and much finer tuning can be done. One of the first things you might want to tweak is how you control which files will be compressed. Instead of using the SetEnvIfNoCase directive as shown in the example above, you could for example use the AddOutputFilterByType directive to register the DEFLATE filter and associate it with the MIME types of the files you want to compress. To do that, remove the SetOutputFilter directive from the example above and use the following instead:

[...snip...]
AddOutputFilterByType DEFLATE application/xhtml+xml
AddOutputFilterByType DEFLATE application/rss+xml
[...snip...]

and so on.
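As a slightly fuller sketch, the directive accepts several MIME types on one line, so a list covering the usual text-based content could look like the following. The selection of types is only an assumption about what a typical site serves: adjust it to your own content.

```apache
# Hypothetical selection of text-based MIME types; adjust to your content
AddOutputFilterByType DEFLATE text/html text/plain text/xml text/css
AddOutputFilterByType DEFLATE application/javascript application/x-javascript
AddOutputFilterByType DEFLATE application/xhtml+xml application/rss+xml
```

With this approach anything not listed, images included, is simply left alone, so the rule that excluded images by URI may no longer be needed.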

If you're managing many applications with a variety of web and application servers on your boxes, consider using a front-end Apache to centralize such a configuration. Instead of configuring each of your servers, you'll reduce your infrastructure's complexity and improve its maintainability. If you want to know how to configure Apache Virtual Hosts, a previous blog post is a good starting point.
