Revision history for WWW-Crawler-Mojo
0.26 2019/11/13
0.25 2019/11/13
- Reduce memory usage.
0.24 2019/08/16
- Fixed a bug where links with invalid URL schemes were not omitted.
0.23 2019/04/02
- Enqueue methods now return the actual jobs.
- Crawling is now breadth-first under memory capacity control.
0.22 2019/03/27
- Fixed the default error callback.
0.21 2019/03/06
- Adjusted to latest Mojolicious release.
- Fixed a bug where URLs containing white space were not handled properly.
- Form submission now includes password and date fields.
- Started using AppVeyor to test on Windows.
0.20 2017/01/17
- Added the ability to modify HTML handlers for scraping.
- Improved form submission emulator to respect unnamed submit-type elements.
0.19 2016/05/23
- Updated tests for recent Mojo::Home changes.
0.18 2016/03/16
- Added cap attribute to the queue to limit its length.
0.17 2016/03/15
- Fixed form submission emulation.
0.16 2016/01/14
- Added req event for purposes such as request modification (gfdev).
0.15 2015/11/04
- Fixed a bug where the MySQL queue did not skip correctly.
- Fixed a bug where form submission emulation failed under certain conditions.
0.14 2015/10/30
- Added experimental MySQL support for queue (harshals).
- Improved documentation.
- Added some example code.
- Improved form submission emulation for unselected and multi-selected options.
0.13 2015/03/25
- Removed referrer_url attribute from job class.
- Changed scrape API significantly.
- Added context option for scrape method.
- Improved documentation.
- Improved examples.
0.12 2015/02/27
- Updated dependency to Mojolicious v6.0.
0.11 2015/02/19
- Updated dependency to Mojolicious v5.79.
- Fixed a bug where the connection count was not detected for redirected URLs.
- Fixed a bug in the checkbot example.
- Fixed a small bug in form submission.
- Fixed a small bug in base tag detection.
- Fixed a small bug in charset detection.
- Removed the refer event in favor of a callback for the scraper.
- Removed the peeping server feature entirely.
- Deprecated the resolved_uri attribute of the Job class in favor of url.
- Deprecated original_uri in favor of original_url.
- Now crawls sitemap.xml too.
- Improved HTML document detection by accepting more HTML-like MIME types.
- Reduced memory usage.
- Improved internal code.
- Added clock_speed attribute.
0.10 2015/02/08
- Removed collect_urls_html method in favor of scrape.
- Improved documentation.
- Improved URL detection in CSS.
- Improved element handlers, which are now customizable with CSS selectors.
0.09 2015/02/03
- Removed depth option.
- Changed URL collecting methods to instance methods.
- Improved form manipulation for emulating manual submission.
- Improved documentation.
0.08 2015/01/29
- Renamed browse method to scrape.
- Improved documentation.
0.07 2015/01/26
- Removed additional_props on job class.
- Renamed discover method to browse.
- Removed peeping attribute in favor of peeping_port.
- Improved error event API.
- Added a feature to automatically stop crawling when the queue becomes empty.
- Improved checkbot example.
- Improved tests.
- Improved documentation.
0.06 2014/09/20
- Improved URL detection.
0.05 2014/09/15
- Fixed a bug on request body generation.
0.04 2014/09/07
- Fixed class and method terminology.
0.03 2014/09/07
- Resolved URIs are now always Mojo::URL instances.
- Added original_uri to retrieve the original URI from the redirect history.
- Added requeue method to retry in case an error occurred.
0.02 2014/09/07
- Recovered from a failed release.
0.01 2014/09/07
- Initial release.