
Site not updating - head requests?


guinessec
11-30-2004, 01:53 PM
Hi,

When I try to spider a site again from the admin control panel, no new links are found (even though I know there are some).
The spider successfully indexed a portion of the site initially, but from that point on, no further pages were indexed, either then or on any later spider attempt.

Output:
SITE : http://www.mydomain.com/
Exclude paths :
- @NONE@
<SNIP>
No link in temporary table
links found : 0

My 'LIMIT_DAYS' is set to 0, my PHP allow_url_fopen is set to On, safe mode is off, and I am trying to respider by clicking the 'green checkmark' next to the 'root' in the update section of the PhpDig control panel.

I have also tried a fresh attempt by using these instructions from another thread:

# empty all the PhpDig database tables
# delete all files that may be in the temp dir
# delete all files in the text_content dir except keepalive.txt
# run spider.php from a browser
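The two file-clearing steps above can be scripted. This is a minimal sketch, not part of PhpDig: the directory names `temp` and `text_content` under a single install root are assumptions (adjust them to your install), and emptying the database tables still has to be done separately in MySQL.

```python
from pathlib import Path
import shutil


def clear_dir(directory, keep=frozenset()):
    """Delete every entry in `directory` except names listed in `keep`.

    Returns the sorted names that were removed.
    """
    removed = []
    for entry in Path(directory).iterdir():
        if entry.name in keep:
            continue
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # remove a subdirectory and its contents
        else:
            entry.unlink()  # remove a plain file or a symlink
        removed.append(entry.name)
    return sorted(removed)


def reset_phpdig_files(phpdig_root):
    """Hypothetical layout: <root>/temp and <root>/text_content."""
    root = Path(phpdig_root)
    clear_dir(root / "temp")
    # keepalive.txt must survive, per the instructions quoted above
    clear_dir(root / "text_content", keep=frozenset({"keepalive.txt"}))
```

After running it, start spider.php from the browser as in the last step.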

When I try to telnet, as suggested in an older thread, I receive a "Connection closed by foreign host." message, which seems to suggest that HEAD requests may be a problem. Er, maybe?

So my question: how does a server admin allow HEAD requests, and are there any security implications?
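Instead of typing a request by hand in telnet, the HEAD check can be scripted. HEAD is a standard HTTP method: the server returns the same headers a GET would, but no body. This sketch uses Python's `http.client`; the function name `head_status` and the User-Agent string are illustrative, not anything PhpDig uses.

```python
import http.client


def head_status(host, path="/", port=80, timeout=10.0):
    """Send one HTTP HEAD request; return (status, reason, headers).

    Header names are lower-cased so lookups do not depend on how the
    server capitalised them.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", path, headers={"User-Agent": "head-check/0.1"})
        resp = conn.getresponse()
        resp.read()  # a HEAD response has no body; this just drains it
        headers = {name.lower(): value for name, value in resp.getheaders()}
        return resp.status, resp.reason, headers
    finally:
        conn.close()
```

For example, `head_status("www.mydomain.com")` (the placeholder host from the output above) should return a 2xx or 3xx status if HEAD is answered; a connection error or a 405 status would point at the server refusing HEAD.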

My site has over 35000 pages, but I'm sure that isn't an issue for phpdig.

Thanks,

Dale.

guinessec
12-01-2004, 11:14 AM
Quick update on this thread:

I have managed to force updates using this command from the shell:

php -f path/spider.php forceall

By my estimate, it will take about 4 days to crawl the whole site.
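As a rough sanity check on that estimate, using the figure of about 35000 pages mentioned earlier:

$$
\frac{35\,000\ \text{pages}}{4 \times 86\,400\ \text{s}} = \frac{35\,000}{345\,600} \approx 0.10\ \text{pages/s}
\quad\Longleftrightarrow\quad
\frac{345\,600\ \text{s}}{35\,000\ \text{pages}} \approx 9.9\ \text{s per page}
$$

So the forced crawl averages roughly ten seconds per page; since the site has somewhat more than 35000 pages, the true average is slightly lower.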

I was also thinking that maybe I've been looking for a more complicated answer than necessary.

Because I have <meta name="Revisit-after" content="5 Days"> on all my web pages, would this prevent PhpDig from spidering a second time from the browser admin interface?
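Here is a minimal sketch of how a spider could honour a revisit-after value unless an update is forced. It is not PhpDig's code: the regex only accepts values like "5 Days", and using LIMIT_DAYS as the fallback when the META tag is absent is an assumption made for illustration.

```python
import re
from datetime import datetime, timedelta

# Accepts e.g. "5 Days" or "1 day" (case-insensitive); anything else is ignored.
_REVISIT_RE = re.compile(r"^\s*(\d+)\s*days?\s*$", re.IGNORECASE)


def revisit_days(content, default_days=0):
    """Parse a META revisit-after value.

    Falls back to default_days (standing in for LIMIT_DAYS, an assumption)
    when the value is missing or not in "<n> days" form.
    """
    if content:
        match = _REVISIT_RE.match(content)
        if match:
            return int(match.group(1))
    return default_days


def due_for_respider(last_indexed, now, revisit_content=None,
                     default_days=0, force=False):
    """Return True if a page should be fetched again."""
    if force:  # analogous to "spider.php forceall": ignore the schedule
        return True
    wait = timedelta(days=revisit_days(revisit_content, default_days))
    return now - last_indexed >= wait
```

Under this model, a page carrying "5 Days" and indexed yesterday is skipped on a normal update, but refetched when forced.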

Thanks,

Dale.

darjanp
12-14-2004, 10:16 AM
I have the same problem. The spider won't index a new link in my index.shtml; instead it tries to index other pages that already exist in the index (returning "File date unchanged"), even though those files are not linked from index.shtml.

I have tried everything... just can't get it to work right.
It's the same whether I use version 1.8.5 or 1.8.4.

Also, I can't get the results page to display META tags.

The older 1.6.x versions handled these issues better, and I'm thinking of installing an old version of PhpDig again.

Charter
12-14-2004, 12:03 PM
PhpDig tries to follow META revisit-after unless doing a force. The update section is for refreshing pages that are already indexed. If you don't want a page, click the delete icon. If you want to reindex, use the text box. If you want to index across directories, set LIMIT_TO_DIRECTORY to false in the config file. If you want it to index many pages, set the search depth to a large number and set "links per" to zero. If you want to increase the maximum search depth, change the *_MAX_LIMIT constants in the config file. If you use an old version of PhpDig, you may leave yourself open to known exploits. If you want META description and META keywords in search results, set APPEND_TITLE_META to true, again in the config file.
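To make the interaction of search depth, "links per" and LIMIT_TO_DIRECTORY concrete, here is a toy breadth-first crawler. It is a sketch under stated assumptions, not PhpDig's traversal: `get_links` is a hypothetical callback returning absolute URLs, "links per = 0" is read as "no per-page limit", and the directory test is a simple same-host, path-prefix check.

```python
from collections import deque
from urllib.parse import urlsplit


def in_start_directory(url, start_url):
    """True if url is on start_url's host and under its directory."""
    u, s = urlsplit(url), urlsplit(start_url)
    base = s.path.rsplit("/", 1)[0] + "/"  # "/dir/index.shtml" -> "/dir/"
    return u.netloc == s.netloc and u.path.startswith(base)


def crawl(start_url, get_links, max_depth, links_per=0,
          limit_to_directory=False):
    """Breadth-first crawl; returns URLs in the order they were visited."""
    seen = {start_url}
    order = []
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        if depth >= max_depth:
            continue  # do not follow links beyond the search depth
        links = get_links(url)
        if links_per > 0:
            links = links[:links_per]  # 0 means follow every link
        for link in links:
            if link in seen:
                continue
            if limit_to_directory and not in_start_directory(link, start_url):
                continue
            seen.add(link)
            queue.append((link, depth + 1))
    return order
```

In this model, a depth of 1 from index.shtml already reaches every page linked directly from it, while a nonzero "links per" can silently cut off links near the end of a long index page.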

darjanp
12-14-2004, 09:41 PM
I already did everything you wrote.
I have APPEND_TITLE_META set to true, DESCRIPTION set to true, and snippets set to false.
Still, on the results pages there is no META description; instead it always displays text from the body, the same as the snippets do.
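One way to confirm that the pages themselves carry a parseable META description is to extract it outside PhpDig. This sketch uses Python's `html.parser`; the function `result_summary` and its "description first, else body text" rule only illustrate the behaviour being asked for and are not PhpDig's display logic.

```python
from html.parser import HTMLParser


class MetaAndText(HTMLParser):
    """Collects <meta name=... content=...> pairs and visible body text."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.body_text = []
        self._in_body = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("name") and a.get("content") is not None:
            self.meta[a["name"].lower()] = a["content"]
        elif tag == "body":
            self._in_body = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_body and not self._skip_depth and data.strip():
            self.body_text.append(data.strip())


def result_summary(html, max_len=150):
    """Prefer the META description; fall back to the start of the body text."""
    parser = MetaAndText()
    parser.feed(html)
    parser.close()
    desc = parser.meta.get("description", "").strip()
    if desc:
        return desc
    return " ".join(parser.body_text)[:max_len]
```

If this returns body text for a page, the page has no usable description for any indexer to show.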

As for updating... I have a very dynamic site (new movies) and I add up to 10 pages every day. In version 1.6.x, all I had to do was update index.shtml, where all the links to newly added movies are. The spider found all the new HTML pages and stopped.

In version 1.8.5, the spider doesn't find new pages, even if the links are on index.shtml, no matter how I try. I tried the text box (putting the link to index.shtml in), I set LIMIT_TO_DIRECTORY to false, and I tried depths from 2 to 20 with link depth set to 0... It doesn't work. The spider always tries to spider some other HTML pages, which are not even linked from index.shtml in any way (and were already indexed some time ago).

Now, if I put all the new HTML pages in the text box manually, then it works.
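The manual workaround can be semi-automated by listing the links on index.shtml that are not yet indexed, then pasting those URLs into the text box. A sketch under assumptions: `already_indexed` is a hypothetical set of URLs you would export from the update list or database yourself, and only http/https links are kept.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def new_links(page_url, html, already_indexed):
    """Absolute http(s) links on the page not in already_indexed.

    Fragments are stripped and duplicates dropped; page order is kept.
    """
    collector = LinkCollector()
    collector.feed(html)
    out = []
    for href in collector.hrefs:
        url, _fragment = urldefrag(urljoin(page_url, href))
        if urlsplit(url).scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, etc.
        if url not in already_indexed and url not in out:
            out.append(url)
    return out
```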

Charter
12-15-2004, 03:00 AM
Things used to update like this (http://www.phpdig.net/forum/showthread.php?t=1161), but now they are different. This thread (http://www.phpdig.net/forum/showthread.php?t=1586) might help you with META tags. Maybe run the "clean" options too.