I've got similar problems. I'm trying to index several sites that have literally thousands of links.
One in particular I was able to index with a different search script (which I didn't like nearly as much), and it collected over 20,000 links before I stopped it, having realized I liked this one better. Anyway, the most I can get out of that same site with PhpDig is 357 pages. |
Last night I manually submitted 4,445 pages to PhpDig (which contained a total of 90,000+ links).
The script placed all 4,445 links in tempspider, but then deleted them one by one without indexing them. First it would change the value in "index" to '1', then it would delete the page altogether (in tempspider). What am I doing wrong? |
Try setting LIMIT_TO_DIRECTORY to false and PHPDIG_IN_DOMAIN to true (both in the config file) and then, from the admin panel, use a large search depth, set links per to zero, and use the no option.
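As a rough sketch, the two config changes suggested above might look like this in the config file. The `define()` form is an assumption about how the file declares these constants; check the existing lines in your copy and edit them in place rather than adding duplicates.

```php
<?php
// Assumed form of the existing config entries; edit them in place.
define('LIMIT_TO_DIRECTORY', false); // do not restrict the spider to the starting directory
define('PHPDIG_IN_DOMAIN', true);    // allow links elsewhere within the same domain
```

The search depth and links-per values are set from the admin panel, not in this file.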
|
Thanks for the reply.
Trying that now. |
BTW, this is a fantastic script. I'm sure I'm going to love it once I get it working!
|
I must be missing something huge.
It found 312 more links (I tested at a low level), then said "Optimizing tables... Indexing complete !", but there are still only 311 links for that particular site. In other words, it didn't index the new pages (although some were dupes). |
If any of the links are in heavy JavaScript then PhpDig won't follow them. Try setting a larger search depth, use links per of zero, and the no option. You can increase the search depth beyond twenty by changing SPIDER_MAX_LIMIT in the config file. Also, if the pages contain any META revisit-after tags, PhpDig attempts to obey those times. Tip: start indexing the site from the sitemap, if present, and check out this thread.
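For illustration, raising the depth ceiling might look like the line below. The `define()` form and the value 50 are assumptions, not recommended settings; the thread only says the limit is twenty unless SPIDER_MAX_LIMIT is changed.

```php
<?php
// Assumed form of the existing config entry; 50 is an arbitrary example value.
define('SPIDER_MAX_LIMIT', 50); // upper bound for the search depth offered in the admin panel
```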
|
OK, I'll check that out.
At first the links were a problem, as they weren't visible to the spider script, but then I dug them out myself to index. So now I've deleted the site and I'm attempting to start over... |
Yep. One of my major problems here is the JavaScript links.
No way around it, eh? |
DOH! :bang:
I first set PhpDig up in a /test/ dir, then I moved it to the main dir and forgot to set my permissions. I think this is probably my problem (I hope). As for the JavaScript link issue, it'd be great if there were a way around it. Some of the sites I'm indexing are going to be a real problem with that. Thanks again... |
PhpDig tries to follow links in window.open() calls or window.location assignments in JavaScript, but nothing more complex. If you want PhpDig to try to follow other JavaScript-type links, such as window.navigate(), try editing the following line in the robot_functions.php file, but note that edits to this line won't make PhpDig parse heavy JavaScript the way a browser does:
Code:
while (eregi("(<frame[^>]*src[[:blank:]]*=|href[[:blank:]]*=|http-equiv=['\"]refresh['\"] *content=['\"][0-9]+;[[:blank:]]*url[[:blank:]]*=|window[.]location[[:blank:]]*=|window[.]open[[:blank:]]*[(])[[:blank:]]*[\'\"]?((([a-z]{3,5}://)+(([.a-zA-Z0-9-])+(:[0-9]+)*))*($allowed_link_chars\[?$allowed_link_chars\]?$allowed_link_chars))(#[.a-zA-Z0-9-]*)?[\'\" ]?",$eval,$regs)) { |
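To show the kind of edit meant here, below is a much-simplified, standalone sketch of adding a `window.navigate(` alternative next to `window.open(`. It is not the PhpDig line itself: it uses `preg_match_all` in place of `eregi`, the URL part is a stand-in for the `$allowed_link_chars` logic, and the sample HTML is made up. In the real line, the equivalent change would presumably be one more alternative of the form `|window[.]navigate[[:blank:]]*[(]` inside the first group.

```php
<?php
// Simplified illustration only; not the full PhpDig pattern.
// Assumption: window.navigate() takes a quoted URL, like window.open().
$link_prefix = '(?:href\s*=|window\.location\s*=|window\.open\s*\(|window\.navigate\s*\()';

// Optional opening quote, then a run of characters that cannot end a URL here.
$pattern = '~' . $link_prefix . '\s*[\'"]?([^\'" >)]+)~i';

// Hypothetical sample page with one plain link and one JavaScript link.
$html = '<a href="a.html">A</a> <script>window.navigate("b.html");</script>';

preg_match_all($pattern, $html, $matches);
print_r($matches[1]); // captured link targets
```

As with the original line, this only catches URLs written literally in the call; anything built up at runtime by the script stays invisible to the spider.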