Preload Perl modules - Real Numbers

It's crap!!!

First define the goal of the testing, and probably use GTop to make the process easier.
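
Perhaps something along these lines for getting the real numbers (a rough sketch; it assumes GTop is installed, and CGI.pm is just an example of a module one might preload):

  use GTop ();

  my $gtop   = GTop->new;
  my $before = $gtop->proc_mem($$)->size;

  require CGI;              # the module whose real cost we want to measure
  CGI->compile(':all');     # precompile its methods, as we would at server startup

  my $after = $gtop->proc_mem($$)->size;
  printf "CGI.pm adds %d KB\n", ($after - $before) / 1024;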

###########################################

debug.pod:

Collect all these notes about stack traces generation (here in this file)

The stack trace looks right. It would be more useful to see the line numbers, which you can get if you follow this tip for building mod_perl from the SUPPORT doc:

CORE DUMPS

If you get a core dump, please send a backtrace if possible. Before you try, build mod_perl with

  perl Makefile.PL PERL_DEBUG=1

which will:

  - add `-g' to EXTRA_CFLAGS
  - turn on PERL_TRACE
  - set PERL_DESTRUCT_LEVEL=2 (additional checks during Perl cleanup)
  - link against libperld if it exists

###########################################

Add a new section:

Transitioning from Apache::Registry to Apache handlers

That should be pretty easy to convert, since you already have your main program logic off in a separate module.

> Should I just use Apache::Request instead of CGI.pm? (I do use the
> cookie & checkbox/pulldown functionality often).

Use Apache::Request and Apache::Cookie (both in the libapreq distribution). I don't use the sticky forms stuff in CGI.pm, so I don't know if there's a handy replacement. Check CPAN or just roll your own. Maybe you could make a subclass of Apache::Request that adds some methods for this.

> If there are any tutorials out there, I'd love some links.

It really isn't difficult enough to merit a tutorial, in my opinion. It's just ordinary perl module stuff.
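
Still, perhaps a minimal sketch is worth showing (the package, parameter and cookie names are invented; it assumes libapreq is installed):

  package My::App;
  # a bare-bones handler doing with Apache::Request and Apache::Cookie
  # what a CGI.pm script used to do
  use strict;
  use Apache::Constants qw(OK);
  use Apache::Request ();
  use Apache::Cookie ();

  sub handler {
      my $r   = shift;
      my $apr = Apache::Request->new($r);

      my $name    = $apr->param('name');       # like CGI::param()
      my %cookies = Apache::Cookie->fetch;     # incoming cookies

      # send a cookie back, like CGI::cookie() + header()
      Apache::Cookie->new($r,
          -name    => 'last_visitor',
          -value   => $name,
          -expires => '+1M',
      )->bake;

      $r->send_http_header('text/plain');
      $r->print("Hello, $name\n");
      return OK;
  }
  1;

It would be mapped to a URI with the usual SetHandler perl-script / PerlHandler My::App pair.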

###########################################

The Eagle book's Appendix B lists all the Makefile.PL options (we'd better include them all).

###########################################

> > META: Does it matter where we declare Apache::Peek, i.e. before/after
> > Apache::Status?
> >
> > No, it doesn't.
>
> I asked this because when you talked about Apache::Status you made it
> very clear that it *had to be first*, so I kind of felt that since you'd
> been explicit about that you ought to have said that the order of
> Apache::Peek didn't matter -- since both Apache::Status and Apache::DBI
> have order dependencies, I think you should say which ones don't have
> order dependencies.

###########################################

A reader suggested:

> It might be helpful to replicate some information about the Apache Life
> Cycle (Eagle book pg 56) before talking about the server startup file.

config.pod

###########################################

debug: =head3 Safe Resource Locking

You must review the section and correct this issue:

If the file is reopened in the next script invocation, the previous fh will be closed and unlocked, but only from within the same process.

Regarding leakages: if you use open IN, ... there is probably no leakage because it's the same handle. If you use gensym or IO::File, you should check this issue.

All the above is tentative and should be validated.
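
To make the point concrete, the corrected section could show something like this (an untested sketch, file name made up) - an anonymous filehandle that is unlocked and closed as soon as it goes out of scope, instead of a bareword handle that lingers in the package symbol table between requests:

  use Fcntl qw(:flock);
  use Symbol ();

  sub update_counter {
      my $fh = Symbol::gensym();    # fresh handle, nothing left from a previous request
      open $fh, "+</tmp/counter.txt" or die "open: $!";
      flock $fh, LOCK_EX            or die "flock: $!";
      my $count = <$fh>;
      seek $fh, 0, 0                or die "seek: $!";
      print $fh $count + 1;
      close $fh;                    # releases the lock as well
  }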

###########################################

I think that the Sys::Signal material is duplicated in two places!

###########################################

From: "Joseph R. Junkin" <jjunkin@datacrawler.com> To: Stas Bekman <sbekman@stason.org> Subject: Re: [summary+rfc] When One Machine is not Enough...

I doubt it will help, but you are free to look at a talk I gave:

Analysis of the Open Source Application Platform
http://www.datacrawler.com/talk/tech/platform/

It addresses the scalability issues you have already covered.

###########################################

die() issue:

merge porting.html#die_and_mod_perl and snippets.html#Redirecting_Errors_to_the_Client

add notes from Matt's email: Warning: $SIG{__DIE__} considered dangerous -- regarding its interaction with eval {} / if ($@) try/catch usage
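
The core of Matt's warning, roughly sketched (the handler body is just for illustration):

  # A global __DIE__ handler fires even for exceptions that the code
  # below fully expects to catch, so it can log or mail bogus "errors"
  # and interfere with the normal eval {} / if ($@) try/catch flow.
  $SIG{__DIE__} = sub { print STDERR "ALERT: $_[0]" };

  eval { die "this one is handled locally\n" };
  if ($@) {
      # we recover here, but the __DIE__ handler above has already fired
  }

The usual workarounds are to localize the handler or to have it check $^S (true inside an eval) before doing anything drastic.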

###########################################

###########################################

Check whether this one is documented somewhere else too:

install: What Compiler Should Be Used to Build mod_perl?

I'm not sure, but Ged says it is.

###########################################

CREDITS! CREDITS! CREDITS! put all the credits in place, especially Ged!!!

###########################################

Mention the Apache::Watchdog::RunAway in the debug.pod where you have a META to fill.

###########################################

There are lots of META tags in the Guide; it's time to go and fill in the missing details.

###########################################

> Are you familiar with Apache::Resource and setrlimit? If so, what do the
> soft and hard limits mean? Is the soft limit the one that can temporarily
> be lower than the current usage of something? And is the hard limit the
> one that, if reached, triggers the killoff?

From BSD's man page for setrlimit:

A resource limit is specified as a soft limit and a hard limit.  When a soft limit is exceeded a process may receive a signal (for example, if the cpu time or file size is exceeded), but it will be allowed to continue execution until it reaches the hard limit (or modifies its resource limit).  The rlimit structure is used to specify the hard and soft limits on a resource,
...
The system refuses to extend the data or stack space when the limits would be exceeded in the normal way: a brk(2) call fails if the data space limit is reached.  When the stack limit is reached, the process receives a segmentation fault (SIGSEGV); if this signal is not caught by a handler using the signal stack, this signal will kill the process.

A file I/O operation that would create a file larger than the process' soft limit will cause the write to fail and a signal SIGXFSZ to be generated; this normally terminates the process, but may be caught.  When the soft cpu time limit is exceeded, a signal SIGXCPU is sent to the offending process.
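
For the Guide, a sketch of how Apache::Resource is typically wired up in httpd.conf could follow (the limit values below are arbitrary examples; PERL_RLIMIT_* values are soft:hard, with the memory sizes in megabytes):

  # untested sketch -- pick limits that match your real memory budget
  PerlSetEnv PERL_RLIMIT_DATA 48:64
  PerlSetEnv PERL_RLIMIT_CPU  120:180
  PerlChildInitHandler Apache::Resource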

###########################################

control.pod:

> > META: Stas: I would not have renamed apachectl to httpd_perl if I had
> > META: Stas: renamed httpd to httpd_perl. This will be confusing for a
> > META: Stas: beginner. How about 'httpd_perl_ctl start' etc?
> >
> > Neither symbolic links nor httpd_perl/docs themselves are visible from
> > PATH. Do you want to change the symlink name?
>
> I don't know if I've misunderstood you, or if I've failed to make my
> point well. It wasn't about PATH. My thought was that if our reader
> creates a binary called `/usr/whatever/httpd_perl' and also renames a
> script from `/etc/init.d/apachectl' to `/etc/init.d/httpd_perl' he has
> *two* files called httpd_perl and that could confuse him, at a time
> when he's already likely to be confused about many other things. The
> symlinks are OK because they all have a `S87...' (or whatever) prefix.

###########################################

###########################################

Jeffrey W. Baker said:

How about mod_backhand? http://www.backhand.org/. It is capable of not only buffering, but also load balancing and failover.

If all you want is buffering, it would be very easy to write a small program that accepted http requests and forwarded them to another daemon, then buffered the response. Perhaps this would be a good weekend project for one of us.

I've been tinkering with the idea of writing an httpd. As much as I like Apache, there are many things I don't like about it, and a ton of functionality that I won't ever need. All I really want is a web server that:

1) Is multithreaded or otherwise highly parallel.
2) Has a request stage interface like Apache's.
3) Uses async I/O.

Number 3 would relieve us of this squid hack that we are all using. That doesn't seem too hard.

###########################################

Server split issue:

(Greg Stark)

I've learned the hard way that a proxy does not completely replace the need to put images and other static components on a separate server. There are two reasons why you really, really want to be serving images from another server (possibly running on the same machine, of course).

1) Netscape/IE won't intermix slow dynamic requests with fast static requests on the same keep-alive connection

2) static images won't be delayed when the proxy gets bogged down waiting on the backend dynamic server.

Both of these result in a very slow user experience if the dynamic content server gets at all slow -- even out of proportion to the slowdown.

Eg, if the dynamic content generation becomes slow enough to cause a 2s backlog of connections for dynamic content, then a proxy will not protect the static images from that delay. Netscape or IE may queue those requests after another dynamic content request, and even if they don't the proxy server will eventually have every slot taken up waiting on the dynamic server.

So *every* image on the page will have another 2s latency, instead of just a 2s latency for the entire page. This is worst in Netscape, of course, where the page can't draw until all the image sizes are known.

This doesn't mean having a proxy is a bad idea. But it doesn't replace putting your images on pics.mydomain.foo, even if that resolves to the same address, and running a separate Apache instance for them.

===>

> > I think if you can avoid hitting a mod_perl server for the images,
> > you've won more than half the battle, especially on a graphically
> > intensive site.
>
> I've learned the hard way that a proxy does not completely replace the need to
> put images and other static components on a separate server. There are
> two reasons that you really really want to be serving images from another
> server (possibly running on the same machine of course).

I agree that it is correct to serve images from a lightweight server, but I don't quite understand how these points relate. A proxy should avoid the need to hit the backend server for static content if the cached copy is current, unless the user hits the reload button and the browser sends the request with 'pragma: no-cache'.

> 1) Netscape/IE won't intermix slow dynamic requests with fast static requests > on the same keep-alive connection

I thought they just opened several connections in parallel without regard for the type of content.

> 2) static images won't be delayed when the proxy gets bogged down waiting on > the backend dynamic server.

Is this under NT where mod_perl is single threaded? Serving a new request should not have any relationship to delays handling other requests on unix unless you have hit your child process limit.

> Eg, if the dynamic content generation becomes slow enough to cause a 2s > backlog of connections for dynamic content, then a proxy will not protect the > static images from that delay. Netscape or IE may queue those requests after > another dynamic content request, and even if they don't the proxy server will > eventually have every slot taken up waiting on the dynamic server.

A proxy that already has the cached image should deliver it with no delay, and a request back to the same server should be serviced immediately anyway.

> So *every* image on the page will have another 2s latency, instead of just a > 2s latency for the entire page. This is worst in Netscape of course course > where the page can't draw until all the images sizes are known.

Putting the sizes in the IMG SRC tag is a good idea anyway.

> This doesn't mean having a proxy is a bad idea. But it doesn't replace putting > your images on pics.mydomain.foo even if that resolves to the same address and > run a separate apache instance for them.

This is a good idea because it is easy to move to a different machine if the load makes it necessary. However, a simple approach is to use a non-mod_perl apache as a non-caching proxy front end for the dynamic content and let it deliver the static pages directly. A short stack of RewriteRules can arrange this if you use the [L] or [PT] flags on the matches you want the front end to serve and the [P] flag on the matches to proxy.

===>

> I agree that it is correct to serve images from a lightweight server > but I don't quite understand how these points relate. A proxy should > avoid the need to hit the backend server for static content if the > cache copy is current unless the user hits the reload button and > the browser sends the request with 'pragma: no-cache'.

I'll try to expand a bit on the details:

> > 1) Netscape/IE won't intermix slow dynamic requests with fast static requests
> > on the same keep-alive connection
>
> I thought they just opened several connections in parallel without regard
> for the type of content.

Right, that's the problem. If the two types of content are coming from the same proxy server (as far as NS/IE is concerned) then they will intermix the requests and the slow page could hold up several images queued behind it. I actually suspect IE5 is cleverer about this, but you still know more than it does.

By putting them on different hostnames the browser will open a second set of parallel connections to that server and keep the two types of requests separate.

> > 2) static images won't be delayed when the proxy gets bogged down waiting on > > the backend dynamic server.

Picture the following situation: The dynamic server normally generates pages in about 500ms or about 2/s; the mod_perl server runs 10 processes so it can handle 20 connections per second. The mod_proxy runs 200 processes and it handles static requests very quickly, so it can handle some huge number of static requests, but it can still only handle 20 proxied requests per second.

Now something happens to your mod_perl server and it starts taking 2s to generate pages. The proxy server continues to get up to 20 requests per second for proxied pages, for each request it tries to connect to the mod_perl server. The mod_perl server can now only handle 5 requests per second though. So the proxy server processes quickly end up waiting in the backlog queue.

Now *all* the mod_proxy processes are in "R" state and handling proxied requests. The result is that the static images -- which under normal conditions are handled quickly -- become delayed until a proxy process is available to handle the request. Eventually the backlog queue will fill up and the proxy server will hand out errors.

> This is a good idea because it is easy to move to a different machine > if the load makes it necessary. However, a simple approach is to > use a non-mod_perl apache as a non-caching proxy front end for the > dynamic content and let it deliver the static pages directly. A > short stack of RewriteRules can arrange this if you use the > [L] or [PT] flags on the matches you want the front end to serve > and the [P] flag on the matches to proxy.

That's what I thought. I'm trying to help others avoid my mistake :)

Use a separate hostname for your pictures; it's a pain for the HTML authors, but it's worth it in the long run.

=====>

> > > 1) Netscape/IE won't intermix slow dynamic requests with fast static requests
> > > on the same keep-alive connection
> >
> > I thought they just opened several connections in parallel without regard
> > for the type of content.
>
> Right, that's the problem. If the two types of content are coming from the
> same proxy server (as far as NS/IE is concerned) then they will intermix the
> requests and the slow page could hold up several images queued behind it. I
> actually suspect IE5 is cleverer about this, but you still know more than it
> does.

They have a maximum number of connections they will open at once but I don't think there is any concept of queueing involved.

> > > 2) static images won't be delayed when the proxy gets bogged down waiting on
> > > the backend dynamic server.
>
> Picture the following situation: The dynamic server normally generates pages
> in about 500ms or about 2/s; the mod_perl server runs 10 processes so it can
> handle 20 connections per second. The mod_proxy runs 200 processes and it
> handles static requests very quickly, so it can handle some huge number of
> static requests, but it can still only handle 20 proxied requests per second.
>
> Now something happens to your mod_perl server and it starts taking 2s to
> generate pages.

The 'something happens' is the part I don't understand. On a unix server, nothing one httpd process does should affect another one's ability to serve up a static file quickly, mod_perl or not. (Well, almost anyway).

> The proxy server continues to get up to 20 requests per second > for proxied pages, for each request it tries to connect to the mod_perl > server. The mod_perl server can now only handle 5 requests per second though. > So the proxy server processes quickly end up waiting in the backlog queue.

If you are using squid or a caching proxy, those static requests would not be passed to the backend most of the time anyway.

> Now *all* the mod_proxy processes are in "R" state and handling proxied > requests. The result is that the static images -- which under normal > conditions are handled quicly -- become delayed until a proxy process is > available to handle the request. Eventually the backlog queue will fill up and > the proxy server will hand out errors.

But only if it doesn't cache or know how to serve static content itself.

> Use a separate hostname for your pictures, it's a pain on the html authors but > it's worth it in the long run.

That depends on what happens in the long run. If your domain name or vhost changes, all of those non-relative links will have to be fixed again.

==>

> Welcome to the real world however where "something" can and does happen. > Developers accidentally put untuned SQL code in a new page that takes too long > to run. Database backups slow down normal processing. Disks crash slowing down > the RAID array (if you're lucky). Developers include dependencies on services > like mail directly in the web server instead of handling mail asynchronously > and mail servers slow down for no reason at all. etc.

Of course. I have single httpd processes screw up all the time. They don't affect the speed of other httpd processes unless they consume all of the machine's resources or lock something in common. I suppose if you have a small limit on the number of backend programs you could get to a point where they are all busy doing something wrong.

> > If you are using squid or a caching proxy, those static requests
> > would not be passed to the backend most of the time anyway.
>
> Please reread the analysis more carefully. I explained that. That is
> precisely the scenario I'm describing faults in.

I read it, but just wasn't convinced. I'd like to understand this better, though. What did you do to show that there is a difference when netscape accesses different hostnames for fast static content as opposed to the same one where a cache responds quickly but dynamic content is slow? I thought Netscape would open 6 or so separate connections regardless and would only wait if all 6 were used. That is, it should not make anything wait unless you have dynamically-generated images (or redirects) tying up the other connections besides the one supplying the main html. Do you have some reason to think it will open fewer connections if they are all to the same host?

###########################################

Makefile.PL: build.pl gets called twice when running from the CPAN shell, because it does 'make' and 'make install', and both call 'manifypods'. Need to add a change control so that if manifypods was called once for 'make', it won't be called again for 'make install'.

###########################################

When you do use locking, be very, very careful if you use Apache::DBI or similar persistent connections. MySQL threads keep tables locked until the thread ends (the connection is closed) or the tables are unlocked. If your session die()s while tables are locked, they will stay neatly locked, as your connection won't be closed either.... This was a nasty one I bumped into...
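
The section should probably also show the defensive pattern (a sketch; the table, column, id and DSN are invented): wrap the locked work in eval and always unlock, even when something dies.

  use DBI ();

  # With Apache::DBI the connection outlives the request, so an uncaught
  # die() between LOCK and UNLOCK leaves the tables locked for everybody.
  my $dbh = DBI->connect("DBI:mysql:test", "user", "secret",
                         { RaiseError => 1 });
  my $id = 42;                  # whatever identifies the row

  $dbh->do("LOCK TABLES sessions WRITE");
  eval {
      $dbh->do("UPDATE sessions SET hits = hits + 1 WHERE id = ?",
               undef, $id);
  };
  my $err = $@;
  $dbh->do("UNLOCK TABLES");    # always runs, even if the update died
  die $err if $err;             # re-throw after the cleanup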

###########################################

scenario: "One Light and One Heavy Server where ALL htmls are Perl-Generated" introduced a lot of info duplication in its tricks section! remove/modify/merge it.

###########################################

What's needed in order to successfully debug segfaulting modules under gdb:

Apache::DB/ httpd -X -D DEBUG

> if you set OPTIMIZE => '-g', in the Makefile.PL and start httpd under gdb,
> it's easy to debug.

###########################################

########################################### (merge of status.pod and debug.pod)

I'm thinking of merging the Apache::Status and the Debug sections, since both are closely related and Apache::Status allows you to debug the code to some extent.

META: I think of moving here the code to trap errors and to produce nice
messages for the user, both when the error occurs because of a user
mistake and when something went wrong on the server side. For user
errors, add some code that deploys CGI's stickiness of variables, so you
redisplay just the erroneous fields (give a code snippet from User
Subscribe from singlesheaven). Notice that error handling is an art if
you really care about users being loyal and staying with your service.

like Doug said:

A virtuous Apache module must let at least two people know when a problem has occurred: you, the module's author, and the remote user. You can communicate errors and other exception conditions to yourself by writing out entries to the server log. For alerting the user when a problem has occurred, you can take advantage of the simple but flexible Apache ErrorDocument system, use CGI::Carp, or roll your own error handler.
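
A rough sketch of the kind of code that META refers to (the handler name and wording are invented; a real version would use ErrorDocument or CGI::Carp as Doug suggests):

  package My::ErrorTrap;
  # log the gory details for the author, show the user something friendly
  use strict;
  use Apache::Constants qw(OK);

  sub do_real_work { die "boom\n" }   # stand-in for the real page logic

  sub handler {
      my $r  = shift;
      my $ok = eval { do_real_work($r); 1 };
      unless ($ok) {
          $r->log_error("request for " . $r->uri . " failed: $@");
          $r->send_http_header('text/html');
          $r->print("<h1>Sorry!</h1>Something went wrong on our side. ",
                    "Please try again later; the admins have been notified.");
      }
      return OK;
  }
  1;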

#####################################################################

Important for both the book and the guide: the strategy chapter talks about performance improvement, among other things. The performance chapter doesn't mention it (refer to it), but this is a very important part of it.

#####################################################################

Include the very important performance improvement notes from:

http://www.apache.org/docs/misc/perf-tuning.html
http://www.apache.org/docs/misc/perf.html

#####################################################################

add benchmarks with and without keep-alive!

#####################################################################

Mention this package in the debug section.

Devel::Symdump - dump symbol names or the symbol table

Apache::Symdump - shows

Apache::Status uses it to show the process' internals

#####################################################################

Describe the BackLog (performance...)

On that note you might want to set the BackLog parameter (I forget the precise name), it depends on whether you want users to wait indefinitely or just get an error.

#####################################################################

> > What is the best way to have a Location directive apply to an entire
> > site except for a single directory?
>
> Set the site-wide handler in a <Location "/"> and override the handler
> for the "register" dir by setting the default handler in <Location
> "/register">. Unfortunately, I don't know the name of the default
> handler.

SetHandler default-handler
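
So the answer, as a config sketch (the site-wide handler name is made up):

  <Location />
    SetHandler perl-script
    PerlHandler My::SiteHandler
  </Location>

  # hand /register back to Apache's core (static file) handler
  <Location /register>
    SetHandler default-handler
  </Location>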

#####################################################################

META: add a section about setting and passing environment variables. It includes and merges (PerlSetVar, SetVar and Pass*), %ENV, (creating your own directives?), and the subprocess environment.

Notes:

* I'd suggest using $r->subprocess_env() instead. I guess %ENV will work in many situations, but it might bite you later when you can't figure out why a particular env variable isn't getting set in certain situations (speaking from experience).

* I was going to suggest that too. %ENV controls the environment of the currently running Perl process, but child processes come from the "subprocess env", which only the call above sets.
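
In code, the difference looks roughly like this (the variable name is invented):

  my $r = Apache->request;

  # visible to spawned children (CGI scripts, backticks, pipes), which is
  # usually what "setting an environment variable" is meant to achieve
  $r->subprocess_env(REMOTE_SITE => 'example.com');

  # only affects the currently running Perl interpreter
  $ENV{REMOTE_SITE} = 'example.com';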

#####################################################################

Add a MOD_PERL_TRACE=all example...

An email:

> > > Any suggestions? How might I debug this?
> >
> > hmm, can you put a warn() trace in your sub SiteMap, I wonder if it's
> > called the first time, but util.pm is not reloaded when Apache restarts
> > itself on startup.
> > any difference if you turn Off PerlFreshRestart?
> > is mod_perl configured as a dso or static?
> >
> > -Doug
>
> mod_perl is static (my initial message included commands I used to build
> mod_perl/apache).
>
> PerlFreshRestart Off has no effect.
>
> It does look like it's failing to load on the second pass, though, since I
> get one response from the "warn" you suggested:
>
> # bin/httpd -X
> util.pm: MSELproxy::util about to bootstrap MSELproxy::util ...
> [Fri Oct 1 00:43:05 1999] null: ...saw SiteMap...
> Syntax error on line 14 of /usr/local/apache/conf/perl.conf:
> Invalid command 'SiteMap', perhaps mis-spelled or defined by a
> module not included in the server configuration

... more evidence ... output of # MOD_PERL_TRACE=all bin/httpd -X

  perl_parse args: '/dev/null' ...allocating perl interpreter...ok
  constructing perl interpreter...ok
  ok
  running perl interpreter...ok
  mod_perl: 0 END blocks encountered during server startup
  perl_cmd_require: conf/perl-startup.pl
  attempting to require `conf/perl-startup.pl'
  loading perl module 'Apache::Constants::Exports'...ok
  loading perl module 'Apache'...ok
  loading perl module 'MSELproxy::util'...[Fri Oct 1 00:54:26 1999] util.pm: MSELproxy::util about to bootstrap MSELproxy::util ...
  ok
  loading perl module 'Apache'...ok
  loading perl module 'MSELproxy::AccessManager'...ok
  loading perl module 'Apache'...ok
  loading perl module 'MSELproxy::OCLC'...ok
  loading perl module 'Apache'...ok
  loading perl module 'MSELproxy::RLG'...ok
  blessing cmd_parms=(0xbfffdb2c)
  [Fri Oct 1 00:54:26 1999] null: ...saw SiteMap...   <---
  [root@pembroke apache]# loading perl module 'Apache'...ok
  perl_startup: perl aleady running...ok
  loading perl module 'Apache'...ok
  cmd_cleanup: SvREFCNT($MSELproxy::util::$obj) == 1
  cmd_cleanup: SvREFCNT($MSELproxy::util::$obj) == 1
  loading perl module 'Apache'...ok
  perl_cmd_require: conf/perl-startup.pl
  attempting to require `conf/perl-startup.pl'
  loading perl module 'Apache'...ok
  loading perl module 'MSELproxy::util'...ok
  loading perl module 'Apache'...ok
  loading perl module 'MSELproxy::AccessManager'...ok
  loading perl module 'Apache'...ok
  loading perl module 'MSELproxy::OCLC'...ok
  loading perl module 'Apache'...ok
  loading perl module 'MSELproxy::RLG'...ok
  Syntax error on line 14 of /usr/local/apache/conf/perl.conf:
  Invalid command 'SiteMap', perhaps mis-spelled or defined by a
  module not included in the server configuration

#######################################################################

This is stuff to be integrated into the DB section, mostly all by jwb:

Date: Thu, 14 Oct 1999 17:21:18 -0700
From: Jeffrey Baker <jwb@cp.net>
To: modperl@apache.org, dbi-users@isc.org
Cc: sbekman@iil.intel.com
Subject: More on web application performance with DBI

Hi all,

I have converted my text-only guide to web application performance using mod_perl and DBI into HTML. The guide now lives alongside my DBI examples page at http://www.saturn5.com/~jwb/dbi-performance.html .

I have also conducted a silly benchmark to see how all of these optimizations affect performance. Please remember that it is dangerous to extrapolate the results of a benchmark, especially one as rudimentary as this. With that said, please consider the following data.

Environment:
  DB Server: Oracle 8.0.6, Sun Ultra2, 2 CPUs, 2GB RAM, Sun A1000 disks
  App Server: Linux, PII 350, 128MB RAM, Apache 1.3.6, mod_perl 1.19
  Benchmark Client: ApacheBench on same machine as application server

Each benchmark consisted of a single request selecting one row from the database with a randomly selected primary key. The benchmark was run through 1000 requests with 10 simultaneous clients. The results were recorded using each level of optimization from my tutorial.

  Zero optimization:                        41.67 requests/second
  Stage 1 (persistent connections):        140.17 requests/second
  Stage 2 (bound parameters):              139.20 requests/second
  Stage 3 (persistent statement handles):  251.13 requests/second

It is interesting that the Stage 2 optimization didn't gain anything over Stage 1. I think this is because of the relative simplicity of my query, the small size of the test database (1000 rows), and the lack of other clients connecting to the database at the same time. In a real application, the cache thrashing that is caused by dynamic SQL statements would probably be detrimental to performance. In any case Stage 2 paves the way for Stage 3, which certainly does increase the request rate!

So, check it out at http://www.saturn5.com/~jwb/dbi-performance.html
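
For the Guide, a condensed sketch of what the three stages amount to in code (this is my own condensation, not jwb's; the DSN, table and column names are invented):

  use DBI ();

  my $dbh;         # Stage 1: Apache::DBI keeps the connection alive across requests
  my %sth_cache;   # Stage 3: statement handles persist between requests too

  sub lookup_name {
      my ($id) = @_;
      $dbh ||= DBI->connect('dbi:Oracle:orcl', 'user', 'secret',
                            { RaiseError => 1 });

      # Stage 2: a placeholder instead of interpolating $id into the SQL
      my $sql = 'SELECT name FROM users WHERE id = ?';
      my $sth = $sth_cache{$sql} ||= $dbh->prepare($sql);

      $sth->execute($id);
      my ($name) = $sth->fetchrow_array;
      $sth->finish;
      return $name;
  }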

Date: Wed, 23 Feb 2000 23:18:23 -0800
From: Jeffrey W. Baker <jwbaker@acm.org>
To: modperl@apache.org
Subject: Re: Database connection pooling... (beyond Apache::DBI)

Greg Stark wrote:
>
> Sean Chittenden <sean@serverninjas.com> writes:
>
> > Howdy. We're all probably pretty familiar with Apache::DBI and
> > the fact that it opens a database connection per apache process. Sounds
> > groovy and works well with only one or two servers. Everything is gravy
> > until you get a cluster of servers, ie 20-30 machines, each with 300+
> > processes.
>
> 300+ perl processes per machine? No way. The only way that would make _any_
> sense is if your perl code is extremely i/o dependent and your perl code is
> extremely light. Even then you're way better off having the i/o operations
> queued quickly and processed asynchronously.

This conversation happens on an approximately biweekly schedule, either on modperl or dbi-users, or some other list I have the misfortune of frequenting. Please allow me to expand upon this subject a bit.

I have not yet gotten a satisfactory answer from anyone who starts these threads regarding why they want connection pooling. I suspect that people think it is needed because everyone else (Netscape, Microsoft, Bea) is doing it. There is a particular kind of application where pooled connections are useful, and there are particular situations where it is a waste. Every project I have ever done falls into the latter category, and I can only think of a few cases that fall under the former.

Connection pooling is a system where your application server threads or processes, which number n on a single machine, share a pool of database connections which number fewer than n. This is done to minimize the number of database connections which are open at once, which in turn is supposed to reduce the load on the database server. This is effective when database activity is a small fraction of the total load on the application server. For example, if your application server mostly performs matrix manipulation, and only occasionally hits the database, it would make sense for it to relinquish the connection when it is not in use.

The downside to connection pooling is that it imposes some overhead. The connections must be managed properly, and the scheme should be transparent to the programmer whose code is using it. So when a piece of code requests a database connection, the connection manager needs to decide which one to return. It may have to wait for one to free up, or it may have to open one based on some low-water-mark heuristic. It may also need to decide that a connection consumer has died or gone away, possibly taking the connection with it. So you can see that opening a pooled connection is more computationally expensive than opening a dedicated connection.

This pooling overhead is a total waste of time when the majority of what your application is doing is database-related. If your program will issue 100 queries and perform a transaction during the course of fulfilling a request, pooled connections will not make sense. The reason is that Apache already provides a mechanism for killing off database connections in this scenario. If a process or thread is sitting about idle, Apache will come along and terminate it, freeing the database connection in the process. For database-bound or transactional programs, the one-to-one mapping of processes to database connections is ideal.

Pooling is also less attractive because modern databases can handle many connections. Oracle with MTS will run fine with just as many connections as you care to open. The application designer should study how many connections he realistically plans to open. If your application is bound by database performance, it makes sense to cap the number of clients, so you would not allow your applications to open too many connections. If your application is transactional, you don't have any choice but to give each process its own dedicated connection. If your application is compute-bound, then your database is lightly loaded and you won't mind opening a lot of connections.

The summary is that if your application is database-bound, or is processing transactions, you don't need or even want connection pooling.

###################################

=> Security

It's a good idea to protect your various monitors, like perl-status and the like, with a password. The less information you provide for intruders, the harder their break-in task will be! One of the biggest helps you can give these bad guys is showing them all the scripts you use, if some of them are in the public domain (and they can find out most of them just by browsing your site). The moment they know the name of a script, they can grab its source from the web (wherever the script came from), study it, and probably find a few or even many security breaches. Security by obscurity doesn't really work against a determined intruder, but it definitely helps to wave away some of the less determined malicious fellas.

e.g:

  <Location /sys-monitor>
    SetHandler perl-script
    PerlHandler Apache::VMonitor
    AuthUserFile /home/httpd/perl/.htpasswd
    AuthGroupFile /dev/null
    AuthName "SH Admin"
    AuthType Basic
    <Limit GET POST>
      require user foo bar
    </Limit>
  </Location>

And the passwd file /home/httpd/perl/.htpasswd:

  foo:1SA3h/d27mCp
  bar:WbWQhZM3m4kl

###################################

> There's nothing wrong with Ralf's guide per se, but I think
> you should mention in your Adding a proxy server section that
> mod_rewrite might be necessary if dynamic content is intermixed
> with static content.

That sounds reasonable indeed. I'll add it. Don't get me wrong - I'm not against adding more things, I'm against duplication, which creates a mess. So now that you've made it clear that we need it - I'll certainly add it.

Would you add something about using mod_rewrite to handle my scenario to the guide?

Perhaps what you're looking for resembles this:

  RewriteRule ^/(images|static)/ - [S=1]
  RewriteRule (.+) http://backend$1 [P,L]

John D Groenveld wrote:
>
> I've been using mod_proxy
> to proxypass my static content away from my /modperl
> directories. Now, I'd like to make my root
> dynamic and thus pass everything except /images and
> /static.
> I've looked at the guide and tuning docs, as well
> as the mod_proxy docs, but I must be missing
> something.

###################################

Just a snippet to try...

try this (in the mod_perl-x.xx directory):

  % make start_httpd
  % strace -o strace.out -p `cat t/logs/httpd.pid` &
  % make run_tests
  % grep open strace.out | grep .htaccess > send_to_modperl_list
  % make kill_httpd

and send us that file. I have the feeling there's a .htaccess in your tree that the process can't read.

###################################

Apache::RegistryNG is just waiting for more people to bang on it. So, if you make your module a sub-class of Apache::RegistryNG, that will help things move forward a bit :)

###################################

In the strategy section, put this (but work on it first):

REDUCING THE NUMBER OF LARGE PROCESSES

Unfortunately, simply reducing the size of each HTTPD process is not enough on a very busy site. You also need to reduce the quantity of these processes. This reduces memory consumption even more, and results in fewer processes fighting for the attention of the CPU. If you can reduce the quantity of processes to fit into RAM, your response time improves even more.

The idea of the techniques outlined below is to offload the normal document delivery (such as static HTML and GIF files) from the mod_perl HTTPD, and let it only handle the mod_perl requests. This way, your large mod_perl HTTPD processes are not tied up delivering simple content when a smaller process could perform the same job more efficiently.

In the techniques below where there are two HTTPD configurations, the same httpd executable can be used for both configurations; there is no need to build HTTPD both with and without mod_perl compiled into it. With Apache 1.3 this can be done with the DSO configuration -- just configure one httpd invocation to dynamically load mod_perl and the other not to do so.

These approaches work best when most of the requests are for static content rather than mod_perl programs. Log file analysis becomes a bit of a challenge when you have multiple servers running on the same host, since you must log to different files.

TWO MACHINES

The simplest way is to put all static content on one machine, and all mod_perl programs on another. The only trick is to make sure all links are properly coded to refer to the proper host. The static content will be served up by lots of small HTTPD processes (configured not to use mod_perl), and the relatively few mod_perl requests can be handled by the smaller number of large HTTPD processes on the other machine.

The drawback is that you must maintain two machines, and this can get expensive. For extremely large projects, this is the best way to go.

TWO IP ADDRESSES

Similar to above, but one HTTPD runs bound to one IP address, while the other runs bound to another IP address. The only difference is that one machine runs both servers. Total memory usage is reduced because the majority of files are served by the smaller HTTPD processes, so there are fewer large mod_perl HTTPD processes sitting around.

This is accomplished using the httpd.conf directive BindAddress to make each HTTPD respond only to one IP address on this host. One will have mod_perl enabled, and the other will not.

USING ProxyPass WITH TWO SERVERS

To overcome the limitation of the alternate port above, you can use dual Apache HTTPD servers with just a slight difference in configuration. Essentially, you set up two servers just as you would with the two-ports-on-the-same-IP-address method above. However, in your primary HTTPD configuration you add a line like this:

ProxyPass /programs http://localhost:8042/programs

Where your mod_perl enabled HTTPD is running on port 8042, and has only the directory programs within its DocumentRoot. This assumes that you have included the mod_proxy module in your server when it was built.

Now, when you access http://www.domain.com/programs/printenv it will internally be passed through to your HTTPD running on port 8042 as the URL http://localhost:8042/programs/printenv and the result relayed back transparently. To the client, it all seems as if it is just one server running. This can also be used on the dual-host version to hide the second server from view if desired.

The directory structure assumes that F is the C directory, and the mod_perl programs are in F and F. I start them as follows:

  daemon httpd
  daemon httpd -f conf/httpd+perl.conf

SQUID ACCELERATOR

Another approach to reducing the number of large HTTPD processes on one machine is to use an accelerator such as Squid (which can be found at http://squid.nlanr.net/Squid/ on the web) between the clients and your large mod_perl HTTPD processes. The idea here is that Squid will handle the static objects from its cache while the HTTPD processes will handle mostly just the mod_perl requests once the cache is primed. This reduces the number of HTTPD processes and thus reduces the amount of memory used.

To set this up, just install the current version of Squid (at this writing, this is version 1.1.22) and use the RunAccel script to start it. You will need to reconfigure your HTTPD to use an alternate port, such as 8042, rather than its default port 80. To do this, you can either change the F line C or add a C directive to match the port specified in the F file. Your URLs do not need to change. The benefit of using the C directive is that redirected URLs will still use the default port 80 rather than your alternate port, which might reveal your real server location to the outside world and bypass the accelerator.

In the F file, you will probably want to add C and C to the C parameter so that these are always passed through to the HTTPD server, under the assumption that they always produce different results. This is very similar to the two port, ProxyPass version above, but the Squid cache may be more flexible to fine-tune for dynamic documents that do not change on every view. The Squid proxy server also seems to be more stable and robust than the Apache 1.2.4 proxy module.

One drawback to using this accelerator is that the logfiles will always report access from IP address 127.0.0.1, which is the local host loopback address. Also, any access permissions or other user tracking that requires the remote IP address will always see the local address. The following code uses a feature of recent mod_perl versions (tested with mod_perl 1.16 and Apache 1.3.3) to trick Apache into logging the real client address and giving that information to mod_perl programs for their purposes. First, in your F file add the following code:

  use Apache::Constants qw(OK);

  sub My::SquidRemoteAddr ($) {
      my $r = shift;
      if (my ($ip) = $r->header_in('X-Forwarded-For') =~ /([^,\s]+)$/) {
          $r->connection->remote_ip($ip);
      }
      return OK;
  }

Next, add this to your F file:

  PerlPostReadRequestHandler My::SquidRemoteAddr

This will cause every request to have its C address overridden by the value set in the C header added by Squid. Note that if you have multiple proxies between the client and the server, you want the IP address of the last machine before your accelerator. This will be the right-most address in the X-Forwarded-For header (assuming the other proxies append their addresses to this same header, like Squid does). If you use Apache with mod_proxy at your frontend, you can use Ask Bjørn Hansen's mod_proxy_add_forward module from ftp://ftp.netcetera.dk/pub/apache/ to make it insert the C header.

#################################################

config.pod: use Eric's presentation: http://conferences.oreilly.com/cd/apache/presentations/echolet/contents.html

#################################################

mod_perl Humour.

* mod_perl for embedded devices:

  Q: mod_perl for my Palm Pilot dumps core when built as a DSO, and the Palm
     lacks the memory to build statically, what should I do?

  A: you should get another Palm Pilot to act as a reverse proxy

  (by Eric Cholet)

#################################################

DBI tips to improve performance. Need to work on the snippets below:

What if the user_id has something that needs to be quoted? I speak of the general case. User data should not get anywhere *near* an SQL line... it should always be inserted via placeholders or very very careful consideration to quoting.

Ahh, I see. I basically do the latter, with $dbh->quote. The contents of $Session are entirely system-generated. The user gives a ticket through the URL, yes, but that is parsed and validated and checked for presence in the DB before you even get to code that works like I had described.

I agree - but you should always be aware of the issues with using placeholders for the database engine that you use. Sybase in particular has a deficient implementation, which tends to run out of space and creates locking contention. Using stored procs instead is a lot better (although it doesn't solve the quoting problems).

OTOH, Oracle caches compiled SQL, and using placeholders means it's not caching SQL with specific data in it. The values can get bound into the compiled SQL just as easily, and it speeds things up by a noticeable amount (a factor of ~3 in my tests).

If we are on this topic, I have a few questions. I've just read the DBI manpage; there is a prepare_cached() call. It's useless in mod_cgi if used only once with the same params across the script. If I use Apache::DBI and replace all prepare statements (which include placeholders) with prepare_cached(), does it mean that, like with module preloading, the prepare will be called only once per unique statement through the whole life of the child? Otherwise the usage of placeholders is useless if you do only one execute() call per unique prepare() statement. The only benefit is of DBI taking care of quoting the values for you. I don't remember anyone ever mentioning prepare_cached(). What's the verdict?

Simply adding the "_cached" to "prepare()" in one of my utilities increased the performance eight fold (Oracle, non-mod_perl environment). I don't know the fine points of whether it is possible to share cached prepares across children (can you even fork with db connections?), but if your code is doing the same query(ies) over and over, definitely give it a try.

Not necessarily; it depends on your database. Oracle does caching which persists until it needs the space for something else; if you're finding information about customers, it's much more efficient for there to be one entry in the library cache like this:

  select * from customers where customer_id = :p1

than it is for there to be lots of them like:

  select * from customers where customer_id = 123
  select * from customers where customer_id = 465
  select * from customers where customer_id = 789

since Oracle has to parse, compile and cache each one separately. I don't know if other databases do this kind of caching.

Ok, this makes sense. I just read the MySQL manual - with all grief, it doesn't cache :( So, I still think of using prepare_cached() to cache on the DBI side, but it's said to work through the life of $dbh, and since my $dbh is a my() lexical variable, I don't understand whether I get this benefit or not. I know that Apache::DBI maintains a pool of connections; does it preserve the cache of prepared statements as well (I mean, does it preserve the whole $dbh object)? If it does, I get a speedup at least for the whole life of a single connection. I think that the speedup is even better than the one you have been talking about, since if Oracle caches the prepared statement, DBI still reaches out to Oracle; if it's a local cache we get a little more savings. Has anyone deployed the scenario I have tried to present here? Seems like a good candidate for the performance chapter of the guide if it really makes things faster...

The statement cursors will be cached per $dbh, which Apache::DBI caches, so there is an extreme performance boost... as your application runs, caching all its cursors, database queries will come down to execution speed; no query parsing will be involved anymore. On Oracle, the performance improvement I saw was 100% by using the prepare_cached functionality. If you have just a small number of web servers, the caching difference between Oracle & MySQL will be small on the db end. It's when you have a lot of DBI handles that things might get inefficient. But I'm sure you are running a proxy front end, right Stas? :)

Be warned: there are some pitfalls associated with prepare_cached(). It actually gives you a reference to the *same* cached statement handle, not just a similar copy. So you can't do this:

  my $sth1 = $dbh->prepare_cached('select name from table where id=?');
  my $sth2 = $dbh->prepare_cached('select name from table where id=?');

$sth1 & $sth2 are now the same object! If you try to use them independently, they'll stomp all over each other. That said, prepare_cached() can be a huge win when using a slow database like Oracle. For mysql, it doesn't seem to help much, since mysql is so darn fast at preparing its statements.

Sometimes you have to be careful about that, yes. For instance, I was repeatedly executing a statement to insert data into a varchar column. The first value to insert just happened to be a number, so DBD::mysql thought that it was a numeric column, and subsequent insertions failed using that same statement handle. I'm not sure what the correct solution should have been in that case, but I reverted back to calling $dbh->quote($val) and putting it directly into the SQL. My opinion is that mysql should do a better job of figuring out which fields are actually numeric and which are strings - i.e. get the info from the database, not from the format of the data I'm passing it.

Actually, I'm a big fan of placeholders. I think they make the programming task a lot easier, since you don't have to worry about quoting data values. They can also be quite nice when you've got values in a nice data structure and you want to pass them all to the database - just put them in the bound-vars list, and forget about constructing some big SQL string. I believe mysql just emulates true placeholders by doing the quoting, etc. behind the scenes. So it's probably not much faster to use placeholders than direct embedded values. But I think placeholders are cleaner, generally, and more fun.

In my experience, prepare_cached() is just a judgment call. It hasn't seemed to be a big performance win for mysql, so sometimes I use it, sometimes I don't. I always use it with Oracle, though.

prepare_cached() is implemented by the database handle (and really the database itself). For example, in Oracle it speeds things up. In MySQL, it is exactly the same as prepare(), because DBD::mysql does not implement it, because MySQL itself has no mechanism for doing this. As I said in a previous message, prepare_cached() doesn't cache anything under MySQL. However, you can implement your own statement handle caching scheme pretty easily by either subclassing DBI or writing a DB access module of your own (my preferred method).

  my $db  = MyDB->new;
  my $sql = 'SELECT 1';
  my $sth = $db->get_sth($sql);
  $sth->execute or die $dbh->errstr;
  my ($numone) = $sth->fetchrow_array;
  $sth->finish or die $dbh->errstr;  # This is doubly necessary with this caching scheme!

  sub get_sth {
      my $self = shift;
      my $sql  = shift;
      return $self->{sth_cache}->{$sql} if exists $self->{sth_cache}->{$sql};
      $self->{sth_cache}->{$sql} = $self->{dbh}->prepare($sql)
          or die $self->{dbh}->errstr;
      return $self->{sth_cache}->{$sql};
  }

I've used that in a few situations and it appears to speed things up a bit. For mod_perl, we would probably want to make $self->{sth_cache} global.

You know, I just benchmarked this on a machine running PostgreSQL and it didn't actually speed things up (or slow it down). However, I suspect that under mod_perl, if this were something that were globally shared inside a child process, it might make a difference. Plus it also depends on the database used.

(Contributors: Randal L. Schwartz, Steve Willer, Michael Peppler, Mark Cogan, Eric Hammond, Russell D. Weiss, Joshua Chamas, Ken Williams, Peter Grimes)

#################################################

As a quick side note, I actually found that it's faster to write the logs directly into a .gz, and read them out of the .gz, through pipes. It takes longer (significantly, by my experience) to read 100 megs from the drive than it does to compress or uncompress 5 megs of data.

#################################################

performance.pod - extend on the Apache::TimeIt package

#################################################

Add a new section - contributing to the guide - with incentives and guidelines for contributions (diff against the pod...)

#################################################

security.pod: add the Apache::Auth* modules

#################################################

examples of Apache::Session::DBI code:

  use strict;
  use DBI;
  use Apache::Session::DBI;
  use CGI;
  use CGI::Carp qw(fatalsToBrowser);

  # Recommendation from mod_perl_traps:
  use Carp ();
  local $SIG{__WARN__} = \&Carp::cluck;

  [...]

  # Initiate a session ID
  my $session = ();
  my $opts = { autocommit => 0, lifetime => 3600 };   # 3600 is one hour

  # Read in the cookie if this is an old session
  my $r = Apache->request;
  my $no_cookie = '';
  my $cookie = $r->header_in('Cookie');
  {
      # eliminate logging from Apache::Session::DBI's use of `warn'
      local $^W = 0;

      if (defined($cookie) && $cookie ne '') {
          $cookie =~ s/SESSION_ID=(\w*)/$1/;
          $session = Apache::Session::DBI->open($cookie, $opts);
          $no_cookie = 'Y' unless defined($session);
      }
      # Could have been obsolete - get a new one
      $session = Apache::Session::DBI->new($opts) unless defined($session);
  }

  # Might be a new session, so let's give them a cookie back
  if (! defined($cookie) || $no_cookie) {
      local $^W = 0;

      my $session_cookie = "SESSION_ID=$session->{'_ID'}";
      $r->header_out("Set-Cookie" => $session_cookie);
  }

#################################################
