Thursday, May 3, 2012

crawler 0.1.0

crawler 0.1.0 : Python Package Index

Python crawler.

Latest Version: 0.1.2

Python crawler.
=====
## Example
=====

from crawler.crawler import Crawler

mycrawler = Crawler()
seeds = ['http://www.example.com/']  # list of seed URLs
mycrawler.add_seeds(seeds)
# list of regular expressions for the URLs that the crawler will work on
url_patterns = [r'^(.+example\.com)(.+)']
mycrawler.start(url_patterns)  # start crawling

#################
data files
#################

Three database (Berkeley DB) files will be generated:

queue.db
webpage.db
duplcheck.db
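The package's internals are not shown here, so the following is only a rough sketch of how URL patterns and the three data files might fit together. It assumes, from the file names alone, that queue.db holds pending URLs, webpage.db holds fetched pages, and duplcheck.db records URLs already seen. The names `allowed`, `crawl`, `fake_fetch`, and `FAKE_SITE` are hypothetical and not part of the crawler package, and plain in-memory Python containers stand in for the Berkeley DB files.

```python
import re
from collections import deque


def allowed(url, url_patterns):
    """Return True if the URL matches at least one regular expression."""
    return any(re.match(pattern, url) for pattern in url_patterns)


def crawl(seeds, url_patterns, fetch):
    """Breadth-first crawl sketch.

    fetch(url) is a hypothetical callback returning (html, links).
    """
    queue = deque(seeds)   # assumed role of queue.db: URLs waiting to be fetched
    seen = set(seeds)      # assumed role of duplcheck.db: URLs already queued
    webpages = {}          # assumed role of webpage.db: URL -> page content
    while queue:
        url = queue.popleft()
        html, links = fetch(url)
        webpages[url] = html
        for link in links:
            # Only follow new links that match one of the URL patterns.
            if link not in seen and allowed(link, url_patterns):
                seen.add(link)
                queue.append(link)
    return webpages


# A tiny in-memory "site" stands in for real HTTP requests.
FAKE_SITE = {
    'http://www.example.com/': ['http://www.example.com/a', 'http://other.org/x'],
    'http://www.example.com/a': ['http://www.example.com/', 'http://www.example.com/b'],
    'http://www.example.com/b': [],
}


def fake_fetch(url):
    """Return fake page content and the outgoing links of a URL."""
    return '<html>%s</html>' % url, FAKE_SITE.get(url, [])


pages = crawl(['http://www.example.com/'], [r'^(.+example\.com)(.+)'], fake_fetch)
print(sorted(pages))
```

In this sketch the link to other.org is never queued because it does not match the pattern, and the link back to the seed is skipped by the duplicate check.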

 
