YiJing::0x11 - Gu\/ (Web Crawler)


"Beware: There is only a thin line between a crawler and a worm!"



Web crawler: Nice and fun. Suitable for sifting through the great Data Flow. Test-run it for three days before sending it out; analyze the data for three days before sending it out again.



This hexagram is emblematic of the trouble that you would face in writing or managing a web crawler: the program has to strike out on its own and unobtrusively sift great gobs of data in any number of messy formats.

It needs testing and retesting, planning and monitoring. It has to follow old standards and accept new ones, and tolerate sites that don't follow standards at all. To do its work, the web crawler has to get as much data from as many sites as it can, without bothering any webmasters in the process.

It has to be efficient, but deliberate. It is a matter of contradictory goals -- a situation that comes up in all sorts of systems besides web crawlers.
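
One way to picture "without bothering any webmasters" is a crawler that consults a site's robots.txt before every fetch and honours any requested delay. The following is a minimal sketch in Python using the standard library's `urllib.robotparser`. The robots.txt content, the agent name `ExampleBot`, and the helper names `make_policy`, `may_fetch`, and `polite_delay` are all assumptions made for illustration, not part of this module.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would download it
# from the site it is about to visit.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def make_policy(robots_text):
    """Build a parser from the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def may_fetch(parser, agent, url):
    """Return True if the site's rules allow this agent to fetch the URL."""
    return parser.can_fetch(agent, url)


def polite_delay(parser, agent, default=1.0):
    """Seconds to wait between requests: the site's Crawl-delay if it
    gives one, otherwise an assumed default of one second."""
    delay = parser.crawl_delay(agent)
    return float(delay) if delay is not None else default


policy = make_policy(ROBOTS_TXT)
for url in ("http://example.com/index.html", "http://example.com/private/x"):
    verdict = "fetch" if may_fetch(policy, "ExampleBot", url) else "skip"
    print(verdict, url)
print("wait", polite_delay(policy, "ExampleBot"), "seconds between requests")
```

The default delay is a judgment call: being deliberate costs throughput, which is exactly the tension between efficiency and courtesy described above.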



Gathering Data under Standards is the Image of a Web Crawler. A wise hacker makes careful use of it to provide people with interesting information while maintaining the proper ethics.


  • 初六。干父之蠱。有子。考無咎。厲終吉。

    Crawling and the Data.

    ... The web crawler will harvest some bad data. Make sure it can recover well and move on correctly.

  • 九二。干母之蠱。不可貞。

    Crawling and the Network.

    ... The web crawler should back off from network trouble, wait, compromise, and improvise.

  • 九三。干父小有晦。無大咎。

    Fine tuning.

    ... You'll have to fix some mistakes that shouldn't have been made, but it's no big deal.

  • 六四。裕父之蠱。往見吝。

    Obvious and oblivious.

    ... The implementation is nice and simple, and dangerously wrong. Watch it upset everyone!

  • 六五。干父之蠱。用譽。

    Public attention.

    ... Make it clear that you're listening to what people say. The web crawler depends on the kindness of strangers.

  • 上九。不事王侯。高尚其事。

    Stepping back.

    ... You should act on principle despite authority's demands, so that you can serve a higher goal.
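
The first two lines above (recovering from bad data, backing off from network trouble) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `fetch` and `parse` are hypothetical callables supplied by the caller, network failures are assumed to surface as `OSError` (which covers `ConnectionError`, `TimeoutError`, and `urllib.error.URLError`), and bad data is assumed to raise `ValueError`.

```python
import time


def fetch_with_backoff(fetch, url, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url). On a network error, wait base_delay * 2**n seconds
    and try again; return None once all attempts are used up."""
    for attempt in range(attempts):
        try:
            return fetch(url)
        except OSError:
            if attempt == attempts - 1:
                return None  # give up on this URL, but not on the crawl
            sleep(base_delay * 2 ** attempt)
    return None


def crawl(urls, fetch, parse, sleep=time.sleep):
    """Fetch and parse each URL, skipping those that fail or hold bad data."""
    results, skipped = {}, []
    for url in urls:
        body = fetch_with_backoff(fetch, url, sleep=sleep)
        if body is None:
            skipped.append(url)
            continue
        try:
            results[url] = parse(body)
        except ValueError:
            # Bad data: record it and move on rather than aborting the run.
            skipped.append(url)
    return results, skipped
```

Passing `sleep` in as a parameter keeps the sketch easy to test-run without real waiting, in the spirit of testing for three days before sending the crawler out.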
