#1 |
Green Mole
Join Date: Sep 2005
Posts: 8
|
I've got similar problems. I'm trying to index several sites that have literally thousands of links.
One in particular I was able to index with a different search script (that I didn't like nearly as much) and received over 20,000 links (then I stopped it after I realized I liked this one better). Anyway, the most I can get out of that same site with PhpDig is 357 pages.
#2 |
Green Mole
Join Date: Sep 2005
Posts: 8
|
Last night I manually submitted 4445 pages to PhpDig (which contained a total of 90,000+ links).
The script placed all 4445 links in tempspider, but then deleted them one by one without indexing them. First it would change the value in "index" to '1', then it would delete the page altogether (in tempspider). What am I doing wrong?
#3 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Try setting LIMIT_TO_DIRECTORY to false and PHPDIG_IN_DOMAIN to true (both in the config file) and then, from the admin panel, use a large search depth, set "links per" to zero, and use the "no" option.
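For reference, the two config changes above might look like the sketch below. It assumes the config file sets these options with PHP define() calls; the exact lines and comments in your copy may differ, so change the existing lines rather than pasting new ones.

```php
<?php
// Assumed form of the two options in the PhpDig config file.
// Edit the existing lines; defining a constant twice will not override it.

// As I understand it: do not restrict the spider to the starting directory.
define('LIMIT_TO_DIRECTORY', false);

// As I understand it: allow the spider to follow links to other hosts in the same domain.
define('PHPDIG_IN_DOMAIN', true);
```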
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension. |
#4 |
Green Mole
Join Date: Sep 2005
Posts: 8
|
Thanks for the reply.
Trying that now.
#5 |
Green Mole
Join Date: Sep 2005
Posts: 8
|
BTW, this is a fantastic script. I'm sure I'm going to love it once I get it working!
#6 |
Green Mole
Join Date: Sep 2005
Posts: 8
|
I must be missing something huge.
It found 312 more links (I tested at a low level). Then it said "Optimizing tables... Indexing complete !" but there are still only 311 links for that particular site. In other words, it didn't index the new pages (although some were dupes).
#7 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
If any of the links are in heavy JavaScript then PhpDig won't follow them. Try setting a larger search depth, use a "links per" value of zero, and the "no" option. You can increase search depth beyond twenty by changing SPIDER_MAX_LIMIT in the config file. Also, if there are any META revisit-after HTML tags, PhpDig attempts to obey those times. Tip: start indexing the site from the sitemap if present and check out this thread.
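Raising the search depth limit might look like the sketch below. It assumes SPIDER_MAX_LIMIT is set with a PHP define() call in the config file; the value 50 is only an illustration, not a recommended setting.

```php
<?php
// Assumed form of the option in the PhpDig config file.
// 50 is an arbitrary example value; the post above only says the
// limit can be raised beyond twenty.
define('SPIDER_MAX_LIMIT', 50);
```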
#8 |
Green Mole
Join Date: Sep 2005
Posts: 8
|
OK, I'll check that out.
At first the links were a problem as they weren't visible to the spider script. But then I dug them out myself to index. Sooo... now I've deleted the site and I'm attempting to start over...
#9 |
Green Mole
Join Date: Sep 2005
Posts: 8
|
Yep. One of my major problems here is the JavaScript links.
No way around it, eh?
#10 |
Green Mole
Join Date: Sep 2005
Posts: 8
|
DOH!
I first set PhpDig up in a /test/ dir, then I moved it to the main dir and forgot to set my permissions. I think this is probably my problem (I hope). As for that JavaScript link issue, though, it'd be great if there were a way around it. Some of the sites I'm indexing are going to be a real problem with that. Thanks again...
#11 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
PhpDig tries to follow links in window.open() calls and window.location = assignments in JavaScript, but nothing more complex. If you want PhpDig to try to follow other JavaScript-style links, such as window.navigate(), try editing the following line in the robot_functions.php file, but note that edits to this line won't make PhpDig parse heavy JavaScript the way a browser does:
Code:
while (eregi("(<frame[^>]*src[[:blank:]]*=|href[[:blank:]]*=|http-equiv=['\"]refresh['\"] *content=['\"][0-9]+;[[:blank:]]*url[[:blank:]]*=|window[.]location[[:blank:]]*=|window[.]open[[:blank:]]*[(])[[:blank:]]*[\'\"]?((([a-z]{3,5}://)+(([.a-zA-Z0-9-])+(:[0-9]+)*))*($allowed_link_chars\[?$allowed_link_chars\]?$allowed_link_chars))(#[.a-zA-Z0-9-]*)?[\'\" ]?",$eval,$regs)) {
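To illustrate the kind of edit meant here, the sketch below adds a window.navigate( alternative next to the window.open( one. It is not the PhpDig line itself: the pattern is a simplified stand-in, the function name extract_links is hypothetical, and it uses preg_match_all instead of eregi (the ereg functions were removed in PHP 7). In the real line, the equivalent change would be adding one more alternative inside the first parenthesized group.

```php
<?php
// Hypothetical helper, not part of PhpDig: collects link targets from
// href attributes and a few simple JavaScript navigation calls.
function extract_links(string $html): array
{
    // Alternatives that may precede a link target. The last one,
    // window.navigate(, is the addition being illustrated.
    $prefix = '(?:href\s*=|window\.location\s*=|window\.open\s*\(|window\.navigate\s*\()';

    // Optional opening quote, then the target up to a quote, space, > or ).
    // The i flag makes the match case-insensitive, as eregi was.
    $pattern = '~' . $prefix . '\s*[\'"]?([^\'" >)]+)~i';

    preg_match_all($pattern, $html, $matches);

    return $matches[1]; // captured link targets, in document order
}

// Example usage with a made-up page fragment.
$sample = '<a href="a.html">A</a>'
        . '<script>window.navigate("b.html"); window.open(\'c.html\');</script>';
print_r(extract_links($sample));
```

As with the original line, this only catches targets written as plain string literals; links built up by script logic at run time are still invisible to it.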