PhpDig.net

Old 09-08-2005, 05:42 PM   #1
ashyra
Green Mole
 
Join Date: Sep 2005
Posts: 8
I've got similar problems. I'm trying to index several sites that have literally thousands of links.

One in particular I was able to index with a different search script (which I didn't like nearly as much). It had collected over 20,000 links when I stopped it, after I realized I liked this one better.

Anyway, the most I can get out of that same site with PhpDig is 357 pages.
Old 09-09-2005, 06:09 AM   #2
ashyra
Green Mole
 
Join Date: Sep 2005
Posts: 8
Last night I manually submitted 4,445 pages to PhpDig (which had a total of 90,000+ links contained within them).

The script placed all 4,445 links in tempspider, but then deleted them one by one without indexing them. First it would change the value in "index" to '1', then it would delete the page altogether (in tempspider).

What am I doing wrong?
Old 09-09-2005, 06:15 AM   #3
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Try setting LIMIT_TO_DIRECTORY to false and PHPDIG_IN_DOMAIN to true (both in the config file) and then, from the admin panel, use a large search depth, set links per to zero, and use the no option.
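For anyone following along, the two config changes suggested above might look roughly like this. It is only a sketch: the constant names come from the post, but the assumption that the stock config file sets them with define() is not confirmed here, and the comments are a best-effort reading of each setting rather than official documentation.

```php
<?php
// Sketch of the two config-file settings named in the post above.
// Assumption: the stock config file already defines both constants with
// define(), so you would edit the existing lines rather than add new ones.
define('LIMIT_TO_DIRECTORY', false); // don't confine the spider to the start URL's directory
define('PHPDIG_IN_DOMAIN', true);    // let the spider follow links elsewhere in the same domain
```

The search depth, links per, and no option are chosen in the admin panel when the index run is started, not in this file.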
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.
Old 09-09-2005, 06:26 AM   #4
ashyra
Green Mole
 
Join Date: Sep 2005
Posts: 8
Thanks for the reply.

Trying that now.
Old 09-09-2005, 06:28 AM   #5
ashyra
Green Mole
 
Join Date: Sep 2005
Posts: 8
BTW, this is a fantastic script. I'm sure I'm going to love it once I get it working!
Old 09-09-2005, 07:03 AM   #6
ashyra
Green Mole
 
Join Date: Sep 2005
Posts: 8
I must be missing something huge.

It found 312 more links (I tested at a low level). Then it said:

Optimizing tables...
Indexing complete !

But there are still only 311 links for that particular site.

In other words, it didn't index the new pages (although some were dupes).
Old 09-09-2005, 08:31 AM   #7
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
If any of the links are in heavy JavaScript, then PhpDig won't follow them. Try setting a larger search depth, use links per of zero, and the no option. You can increase the search depth beyond twenty by changing SPIDER_MAX_LIMIT in the config file. Also, if the HTML contains any META revisit-after tags, PhpDig attempts to obey those times. Tip: start indexing the site from the sitemap, if present, and check out this thread.
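To see whether a page carries a revisit-after tag that PhpDig may try to honor, a quick check along these lines can help. It's a minimal sketch, not PhpDig's own code: the function name findRevisitAfter is made up for illustration, and the pattern assumes the name attribute appears before content inside the tag.

```php
<?php
/**
 * Return the content of a <meta name="revisit-after" ...> tag, or null
 * if the HTML has none. Hypothetical helper, not part of PhpDig.
 */
function findRevisitAfter(string $html): ?string
{
    // Assumes name="..." precedes content="..." in the tag.
    $pattern = '/<meta\s+name\s*=\s*["\']revisit-after["\']\s+content\s*=\s*["\']([^"\']*)["\']/i';
    if (preg_match($pattern, $html, $m)) {
        return trim($m[1]);
    }
    return null;
}

$page = '<html><head><meta name="revisit-after" content="7 days"></head></html>';
var_dump(findRevisitAfter($page));
```

If the check finds a tag on pages that are being skipped, that is at least a hint that the revisit time, rather than the search depth, is what to look at.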
Old 09-09-2005, 09:57 AM   #8
ashyra
Green Mole
 
Join Date: Sep 2005
Posts: 8
OK, I'll check that out.

At first the links were a problem, as they weren't visible to the spider script. But then I dug them out myself to index.

Sooo... now I've deleted the site and I'm attempting to start over...
Old 09-09-2005, 10:41 AM   #9
ashyra
Green Mole
 
Join Date: Sep 2005
Posts: 8
Yep. One of my major problems here is the JavaScript links.

No way around it, eh?
Old 09-10-2005, 04:48 AM   #10
ashyra
Green Mole
 
Join Date: Sep 2005
Posts: 8
DOH!

I first set PhpDig up in a /test/ dir.

Then I moved it to the main dir and forgot to set my permissions.

I think this is probably my problem (I hope).

As for that JavaScript link thing, though, it'd be great if there were a way to get around that issue. Some of the sites I'm indexing are going to be a real problem with that.

Thanks again...
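A quick way to confirm a permissions problem after moving an install is to ask PHP which directories it cannot write to. This is a rough sketch: findUnwritableDirs is a hypothetical helper, and the directory names in the usage lines are assumptions about a typical PhpDig layout, so check them against the install documentation for your version.

```php
<?php
/**
 * Return the paths under $baseDir that are missing or not writable
 * by the current PHP process. Hypothetical helper, not part of PhpDig.
 */
function findUnwritableDirs(string $baseDir, array $relativeDirs): array
{
    $problems = [];
    foreach ($relativeDirs as $rel) {
        $path = rtrim($baseDir, '/') . '/' . $rel;
        if (!is_dir($path) || !is_writable($path)) {
            $problems[] = $path;
        }
    }
    return $problems;
}

// Assumed directory names; adjust to match your own install.
$dirs = ['text_content', 'include', 'admin/temp'];
foreach (findUnwritableDirs(__DIR__, $dirs) as $path) {
    echo "Not writable (or missing): $path\n";
}
```

Run it from the directory PhpDig was moved to, under the same user the web server uses, since a shell user and the web server user often have different rights.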
Old 09-11-2005, 07:02 AM   #11
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
PhpDig tries to follow links in simple window.open() calls and window.location assignments in JavaScript, but nothing more complex. If you want PhpDig to try to follow other JavaScript-style links, such as window.navigate(), try editing the following line in the robot_functions.php file. Note that edits to this line won't make PhpDig parse heavy JavaScript the way a browser can:
Code:
         while (eregi("(<frame[^>]*src[[:blank:]]*=|href[[:blank:]]*=|http-equiv=['\"]refresh['\"] *content=['\"][0-9]+;[[:blank:]]*url[[:blank:]]*=|window[.]location[[:blank:]]*=|window[.]open[[:blank:]]*[(])[[:blank:]]*[\'\"]?((([a-z]{3,5}://)+(([.a-zA-Z0-9-])+(:[0-9]+)*))*($allowed_link_chars\[?$allowed_link_chars\]?$allowed_link_chars))(#[.a-zA-Z0-9-]*)?[\'\" ]?",$eval,$regs)) {
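To illustrate the kind of edit being described, here is a much-simplified, standalone sketch that adds a window.navigate( alternative next to the window.open( one. It is not PhpDig's actual pattern: extractLinkTargets is a made-up name, the pattern requires quoted URLs, and it uses preg_match_all() because eregi() was deprecated in PHP 5.3 and removed in PHP 7.0. In the original POSIX-style line, the analogous change would presumably be an extra alternative such as window[.]navigate[[:blank:]]*[(], but that is an untested guess.

```php
<?php
/**
 * Collect quoted link targets that follow href=, window.location =,
 * window.open( or window.navigate(. Hypothetical helper for illustration.
 */
function extractLinkTargets(string $text): array
{
    // Each alternative is a "link introducer"; the capture group is the URL,
    // stopping at a quote, '#', whitespace, or '>'.
    $pattern = '~(?:href\s*=|window\.location\s*=|window\.open\s*\(|window\.navigate\s*\()'
             . '\s*[\'"]([^\'"#\s>]+)~i';
    preg_match_all($pattern, $text, $matches);
    // Drop duplicates while keeping first-seen order.
    return array_values(array_unique($matches[1]));
}

$sample = <<<'HTML'
<a href="page1.html">One</a>
<script>
window.open('popup.html');
window.navigate("next.html");
window.location = '/home.php?id=2';
</script>
HTML;

print_r(extractLinkTargets($sample));
```

As the post says, this only catches URLs written literally next to those calls. Links that JavaScript builds from variables or string concatenation would still be missed.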
All times are GMT -8.