Changes for version 0.09003_03
- POE
- Note: The POE engine changes in this release are fairly critical. If you were having problems before, you should probably upgrade to this release.
- Be smarter about how dispatch() gets called. The dispatch state is now invoked more efficiently, so cycles are no longer wasted merely attempting to dispatch requests.
- Allow "0" as a value for keepalive.keep_alive. This is a very important parameter if you're using Gungho through a proxy. If keep-alive is enabled while behind a proxy, PoCo::Client::Keepalive will reuse the single cached connection to the proxy for every request, and Gungho will lose all parallelism.
- Allow setting the number of PoCo::Client::HTTP instances to spawn via the client.spawn parameter. This is required if you're dealing with a relatively large number of URLs at once; otherwise, PoCo::Client::HTTP tends to jam up after a while.
Changes for version 0.09003_02
- Throttle
- Fix throttling to delegate throttling decisions, which allows you to stack throttlers.
- Update the prerequisite for Data::Throttler::Memcached
Changes for version 0.09003_01
- General
- Upload blunder. I meant to upload this as 0.09002_01, but I forgot to rename the file. I don't want 0.09002 to be a general release, so here's 0.09003_01 with no code changes.
Changes for version 0.09002_01
- General
- DNS will not be resolved by Gungho if you do one of the following:
  - specify dns => { disable => 1 } in your config
  - specify client => { proxy => ... } in your config (POE engine only)
  - specify HTTP_PROXY in the environment (POE engine only)
- Tests
- Add more tests
Documentation
An Extensible, High-Performance Web Crawler Framework
Modules
Yet Another High Performance Web Crawler Framework
Base Class For Various Gungho Objects
Base For Classes That Won't Be Instantiated
Component Base Class For Gungho
Base Class For WWW Authentication
Add Basic Auth To Gungho
Block Requests With Private IP Address
Use Cache In Your App
Gungho Core Methods
Respect robots.txt
A Rule Object
RobotRules Storage Base Class
Cache Storage For RobotRules
DB_File Storage For RobotRules
Automatically Parse Robots META
Web::Scraper From Within Gungho
Routines To Setup Gungho
Base Class To Throttle Requests
Throttle By Domain
Throttle By Number Of Requests
Data::Throttler Based Throttling
Base Class For Gungho Engine
Gungho Engine Using Danga::Socket
IO::Async Engine
POE Engine For Gungho
Gungho Exceptions
Base Class For Gungho Handlers
Write Out Fetched Contents To File
A Handler That Does Nothing
Inline Your Providers And Handlers
Log Base Class For Gungho
Log::Dispatch-Based Log For Gungho
Simple Gungho Log Class
Gungho Plugin Base Class
Stop Execution In Long-Running Processes
Log Requests
Keep Track Of Time To Finish Request
Gather Crawler Statistics
Format Statistics As XML
Base Class For Gungho Providers
Provide Requests From A Simple File
An In-Memory, Simple Provider
Specify Requests In YAML Format
A Gungho Request Object
HTTP-Specific Utilities
Gungho HTTP Response Object
Provides
in lib/Gungho/Engine/IO/Async.pm
in lib/Gungho/Inline.pm
in lib/Gungho/Plugin/Apoptosis.pm
in lib/Gungho/Inline.pm
Examples
- examples/robotrules/simple.yml
- examples/simple-file/simple-file.yml
- examples/simple-file/url.txt
- examples/simple-log-dispatch/simple.yml
- examples/simple-write-to-file/simple-write-to-file.yml
- examples/simple/simple.yml
- examples/throttle-simple/throttle-simple.yml
- examples/throttle-simple/url.txt
- examples/yaml/config.yml
- examples/yaml/url.yml