Can't Index Here Either


smirk
04-11-2005, 03:39 AM
I have this same problem trying to index my site at:
http://socialwave.net/forums/index.php

It's a rather complex site and just about all of the URLs are dynamic calls. I don't know if that has anything to do with the difficulties with indexing. The version of PHPdig that's on my site right now is actually an older version. I'm testing the stable release on my site via a testing server and it's just not having much luck.

PHPdig only seems to like a few of the pages that I ask it to start indexing from. I have good luck indexing the one above, except that it's really cumbersome to index my entire site from the main index. I have to set the recursion to 5 and links to 60 for it to actually get into any meaningful content.

I really want to index from a series of search index pages that would send PHPdig right to the meat of the new content, but it doesn't like any of these search index pages that I feed it. I get the No Link in Temporary Table error:

SITE : http://socialwave.org/
Exclude paths :
- @NONE@
1:http://socialwave.org/searchindex/
(time : 00:00:06)
No link in temporary table

My search index URLs are not publicly accessible, but here's some sample HTML from one of the pages (/searchindex/index_posts.php)


<HTML>
<HEAD><TITLE>Social Wave Search Index Page</TITLE>
<meta name="ROBOTS" content="INDEX, FOLLOW">
</HEAD>
<BODY>

<A HREF="http://socialwave.net/forums/index.php?showtopic=1355">We all live in California now. Gas avg. over $2/gal.</A>
<br />
<A HREF="http://socialwave.net/forums/index.php?showtopic=1418">Review: Hyperlinked Web Services(Prof Svcs)</A>
<br />
<A HREF="http://socialwave.net/forums/index.php?showtopic=1417">Review: Hyperlinked Web Services(Prof Svcs)</A><br />


... You get the idea. It's nothing special, but I can't index it.

I've checked my server settings: allow_url_fopen is enabled and LIMIT_TO_DIRECTORY is set to false. I also adjusted the permissions on all the admin directories accordingly.

HELP! How do I get PHPDig to index my site from the indexing points that I want it to start at instead of always starting over at the front door of my site each time I want to index?

Charter
04-11-2005, 06:05 AM
> It's nothing special, but I can't index it.

See section 2.4 of the documentation (http://www.phpdig.net/navigation.php?action=doc#toc2) for how to index content protected by an htaccess file: http://username:password@www.domain.com
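For illustration, here is a minimal sketch of building a start URL in the http://username:password@www.domain.com form mentioned above. The helper name `with_credentials` is hypothetical, and the percent-encoding of special characters is an assumption on my part: the thread does not say whether PHPdig decodes encoded credentials, so plain alphanumeric usernames and passwords are the safe case.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def with_credentials(url, username, password):
    """Return url with 'username:password@' placed in front of the host.

    Hypothetical helper for illustration only. Special characters in the
    credentials are percent-encoded so the result stays a valid URL.
    """
    parts = urlsplit(url)
    userinfo = f"{quote(username, safe='')}:{quote(password, safe='')}"
    netloc = f"{userinfo}@{parts.netloc}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(with_credentials("http://www.domain.com/", "username", "password"))
```

The resulting string is what would go into the spider's start URL text box.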

smirk
04-11-2005, 01:46 PM
Thanks Chart, but the site I'm trying to index wasn't protected by an .htaccess file at the time I tried to index it. I tried it again. I unprotected the site, and asked PHPdig to start indexing from several different URIs. It had varying levels of success.

The problems that it's having on my site are actually my secondary concern right now. As a first step, I'm just trying to understand why PHPdig is having trouble indexing the HTML file below. It's pretty simple. It indexes just the page and doesn't follow any of the links. I get a "No link in temporary table" error whenever I try to index it.

<HTML>
<HEAD><TITLE>Social Wave Search Index Page</TITLE>
<meta name="ROBOTS" content="INDEX, FOLLOW">
</HEAD>
<BODY>
<TABLE BORDER='0' cellpadding='4'><TR>
<A HREF="http://socialwave.net/forums/index.php?showtopic=1355">We all live in California now. Gas avg. over $2/gal.</A>
<br /></TD></TR>
<TR>
<A HREF="http://socialwave.net/forums/index.php?showtopic=1418">Review: Hyperlinked Web Services(Prof Svcs)</A>
<br /></TD></TR>
<TR>
<A HREF="http://socialwave.net/forums/index.php?showtopic=1417">Review: Hyperlinked Web Services(Prof Svcs)</A>

<br /></TD></TR>
<TR>
<A HREF="http://socialwave.net/forums/index.php?showtopic=1413">Groups: Social Groups User Guide</A>
<br /></TD></TR>
<TR>
<A HREF="http://socialwave.net/forums/index.php?showtopic=1412">Calendar: Events Calendar Userguide</A>
<br /></TD></TR>
<TR>
<A HREF="http://socialwave.net/forums/index.php?showtopic=1411">Email: Preventing Email Services from Marking Social Wave Email as Spam</A>
<br /></TD></TR>
<TR>
<A HREF="http://socialwave.net/forums/index.php?showtopic=1410"> What is Social Wave?</A>

<br /></TD></TR>
<TR>
<A HREF="http://socialwave.net/forums/index.php?showtopic=1405">Email Notifications</A>
<br /></TD></TR>
<TR>
<A HREF="http://socialwave.net/forums/index.php?showtopic=1403">sadfsdf adsf</A>
<br /></TD></TR>
<TR>
<A HREF="http://socialwave.net/forums/index.php?showtopic=1402">adfasdf sadf</A>
<br /></TD></TR>
<TR>
<A HREF="http://socialwave.net/forums/index.php?showtopic=1382">Review: dfasd(Misc)</A>
<br /></TD></TR>

<TR>
<A HREF="http://socialwave.net/forums/index.php?showtopic=1381">Review: dsfdasf(Entertainment)</A>
<br /></TD></TR>
<TR>
<A HREF="http://socialwave.net/forums/index.php?showtopic=1380">Review: SDAADSF(Dining)</A>
<br /></TD></TR>
<TR>
<A HREF="http://socialwave.net/forums/index.php?showtopic=1379">Review: blue blue blue "(Phone/Data)</A>
<br /></TD></TR>
<TR>
<A HREF="http://socialwave.net/forums/index.php?showtopic=1378">dfs d</A>
<br /></TD></TR>
<TR>

<A HREF="http://socialwave.net/forums/index.php?showtopic=1354">S.F. International Asian American Film Festival</A>
<br /></TD></TR>
<TR>
<A HREF="http://socialwave.net/forums/index.php?showtopic=1344">Cinequest Film Festival</A>
<br /></TD></TR>
<TR>
<A HREF="http://socialwave.net/forums/index.php?showtopic=384">NHL Hockey Lockout Next Season?</A>
<br /></TD></TR>
<TR>
<A HREF="http://socialwave.net/forums/index.php?showtopic=1352">Review: Wild Parrots of Telegraph Hill(Movies)</A>
<br /></TD></TR>
<TR>
<A HREF="http://socialwave.net/forums/index.php?showtopic=1350">Better Business Bureau, Worth the $$$ to Join?</A>

<br /></TD></TR>
<TR>
<A HREF="http://socialwave.net/forums/index.php?showtopic=549">College Hockey</A>
<br /></TD></TR>
<TR>
<A HREF="http://socialwave.net/forums/index.php?showtopic=1349">Review: Born into Brothels(Movies)</A>
<br /></TD></TR>
<TR>
<A HREF="http://socialwave.net/forums/index.php?showtopic=1348">Review: Be Cool(Movies)</A>
<br /></TD></TR>
<TR>
<A HREF="http://socialwave.net/forums/index.php?showtopic=1339">Who has the Best Burgers in the South Bay?</A>
<br /></TD></TR>

<TR>
<A HREF="http://socialwave.net/forums/index.php?showtopic=1300">CSET Test Advice</A>
<br /></TD></TR>
</TABLE>
<br /><br /><br /></body>
</HTML>
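One way to check a page like the one above independently of PHPdig is to list the links a generic HTML parser can see, together with the host each link points to. This is only a rough sketch using Python's standard library, not PHPdig's own parser; `LinkCollector` and `link_hosts` are names made up for this example, and the sample string is shortened from the page above.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <A HREF=...> matches.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def link_hosts(html):
    """Return (host, link) pairs for every link found in the HTML text."""
    collector = LinkCollector()
    collector.feed(html)
    return [(urlsplit(link).hostname, link) for link in collector.links]


sample = """<TABLE BORDER='0' cellpadding='4'><TR>
<A HREF="http://socialwave.net/forums/index.php?showtopic=1355">We all live in California now.</A>
<br /></TD></TR>
</TABLE>"""

for host, link in link_hosts(sample):
    print(host, link)
```

The printed hosts can then be compared with the host on the SITE line of the spider log.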

smirk
04-11-2005, 01:47 PM
BTW, I have tried indexing one of my other sites, which is located on a virtual host on the same server, and it worked fine there. That other site is http://smirking.com. I just had it start at the homepage and it crawled through the number of levels and links the way I expected it to.

smirk
04-11-2005, 01:57 PM
Scuse the multiple posting. I'm just trying a few other things to see if it works. I tried running it on another Invision board that I manage and it doesn't seem to work at all.

On http://valiantclan.org/forums/index.php

Spidering stops immediately after I start it. Once the spider starts running I get this message with no delay:

Spidering in progress...
Optimizing tables...
Indexing complete !

Levels and Links were set to 10 or above.

Charter
04-12-2005, 08:31 AM
Set PHPDIG_IN_DOMAIN to true in the config file. When that constant is false, xyz.domain.com and domain.com are treated differently. Also try indexing with and without the www in the links you enter in the text box. Some servers are okay with http://domain.com while others still require the full http://www.domain.com. It's a matter of how the DNS is set up.
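To make the host distinction concrete, here is a small sketch of the difference between comparing exact hosts and comparing by shared domain. It is an illustration of the general idea only, not how PHPdig implements PHPDIG_IN_DOMAIN; the function names `same_host` and `in_domain` are invented for this example.

```python
from urllib.parse import urlsplit


def same_host(url_a, url_b):
    """Strict comparison: the two URLs must have exactly the same host."""
    return urlsplit(url_a).hostname == urlsplit(url_b).hostname


def in_domain(url_a, url_b, base_domain):
    """Looser comparison: both hosts are base_domain or a subdomain of it."""

    def belongs(url):
        host = urlsplit(url).hostname or ""
        return host == base_domain or host.endswith("." + base_domain)

    return belongs(url_a) and belongs(url_b)


# Strict matching treats the bare domain and the www host as different.
print(same_host("http://domain.com/", "http://www.domain.com/"))
# Domain-level matching groups subdomains with the bare domain.
print(in_domain("http://domain.com/", "http://xyz.domain.com/", "domain.com"))
```

Under strict matching, a start URL entered without the www would not match links that include it, which is one reason to try both forms.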