How to index one page and nothing else


kristian
02-07-2005, 05:21 AM
Hi

I would like to control the indexing process when I index my dynamic pages. I have generated a list of all the URLs I would like to index:

(...)
http://localhost/anatomi/index.php?vis=Aa1-1&placering=praeparater
http://localhost/anatomi/index.php?vis=Aa1-10&placering=praeparater
http://localhost/anatomi/index.php?vis=Aa1-100&placering=praeparater
http://localhost/anatomi/index.php?vis=Aa1-101&placering=praeparater
(...)

But when I paste these into the box and start spidering, it finds lots of duplicate pages that have already been indexed. I have set the "search depth" to 0 and the "links per" to 0.

I'd be grateful if anyone can help me with this...
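A URL list like the one in this post can be generated rather than typed by hand. The following is a minimal PHP sketch, not part of PhpDig: the page IDs in `$ids` and the helper name `build_page_urls()` are hypothetical, and only the URL pattern is taken from the list above.

```php
<?php
// Hypothetical helper (not part of PhpDig): builds one URL per page ID,
// following the index.php?vis=...&placering=praeparater pattern above.
function build_page_urls(string $base, array $ids): array
{
    $urls = [];
    foreach ($ids as $id) {
        // http_build_query() URL-encodes each value; the explicit '&'
        // separator avoids surprises if arg_separator.output was changed.
        $urls[] = $base . '?' . http_build_query(
            ['vis' => $id, 'placering' => 'praeparater'],
            '',
            '&'
        );
    }
    return $urls;
}

// Hypothetical IDs: in practice these would come from the site's database.
$ids  = ['Aa1-1', 'Aa1-10', 'Aa1-100', 'Aa1-101'];
$urls = build_page_urls('http://localhost/anatomi/index.php', $ids);

// Print one URL per line, as in the list above.
echo implode("\n", $urls), "\n";
```

Saved as, say, `make_urls.php` (a hypothetical name), it can be run with `php make_urls.php > urls.txt`. The output can be pasted into the admin box or, assuming your PhpDig version's command-line spider accepts a file of URLs as its argument (the 1.8.x documentation describes this for `admin/spider.php`), passed to the spider directly.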

Charter
02-07-2005, 09:30 AM
Use zero, zero, and also choose no.

kristian
02-07-2005, 10:19 AM
Thanks for the quick reply! I tried zero, zero, but I'm not sure about the "no" option. I will try it later this week.

kristian
02-11-2005, 12:03 AM
I get the same problem using zero, zero, and "no" for the "Use values from Update sites table if present and use default values if values absent from table" option.

It still checks all the other links that were indexed previously. Could it be some other setting, maybe in config.php?

kristian
02-11-2005, 03:10 AM
Just to clarify: I would like to index one file only, without updating all the other files/URLs.

Example URL: http://localhost/anatomi/index.php?vis=Aa1-109&placering=praeparater

There are thousands of pages, and it takes a very long time if it has to check/update all the URLs that have already been indexed. I know they haven't changed anyway.

I'm using the command line, as it seems to be more stable.

Charter
02-11-2005, 06:03 AM
Is your tempspider table empty?

kristian
02-11-2005, 06:46 AM
Yes it's empty.

It also says "Temporary table : 0 Entries"

Info:
I'm using PhpDig v.1.8.7
Safe-mode: Off
allow_url_fopen is enabled

Charter
02-11-2005, 06:53 AM
>> I'm using command line as it seems to be more stable.

Missed that. Try the following config options:

define('SPIDER_MAX_LIMIT',0); //max recurse levels in spider
define('LINKS_MAX_LIMIT',0); //max links per each level

kristian
02-11-2005, 07:28 AM
I have tried with these settings:

define('SPIDER_MAX_LIMIT',0); //max recurse levels in spider
define('RESPIDER_LIMIT',0); //recurse respider limit for update
define('LINKS_MAX_LIMIT',0); //max links per each level
define('RELINKS_LIMIT',0); //recurse links limit for an update

Same result.

__________________

Another question:
At some point I will need to spider some pages with iframes. I got that to work earlier, when I set the depth to 1 and links per to 10. I was using the web interface, and I had also modified config.php so I could dig iframes.

Now I can't really use the web interface, because it wants to index/update everything all the time. When it does, it crashes (sometimes with an Apache error) or simply stops. It doesn't do that from the command line.

I have tried to get PhpDig to index the content of the iframes from the command line, with define('SPIDER_MAX_LIMIT',1); and define('LINKS_MAX_LIMIT',10); in config.php, but it didn't index the iframes. Should I try other settings, or is it not possible to do from the command line?

Any help is very welcome. I'm sorry; I think this is probably a hard case...

Charter
02-11-2005, 07:56 AM
Go to the admin panel and click the update sites link. Make sure that links and depth are both zero. Also, there is a mod here (http://www.phpdig.net/forum/showthread.php?t=1095) that you may find useful, although it may need tweaking. You might also try updating one page from the admin panel: click the site, the update button, the blue arrow, and then the green check mark next to the page. PhpDig doesn't index iframe tags. Maybe you modded the robot_functions.php file to include iframe?

kristian
02-17-2005, 03:01 AM
I found a way to index the iframe content: I index the folders where the iframe content files are located and put some JavaScript in the content files that redirects the user to the right page.
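The redirect trick can be sketched as follows. This is an assumption about how such a script might look, not the actual code used here: each iframe content file checks whether it is being shown on its own (for example, when opened from a search result) and, if so, sends the visitor to the page that normally contains it. The helper name `iframe_redirect_script()` and the choice of parent URL are hypothetical.

```php
<?php
// Hypothetical helper: returns a <script> tag to place in an iframe content
// file. Inside its parent page the script does nothing; when the file is
// opened directly, the browser is sent to the parent page instead.
function iframe_redirect_script(string $parentUrl): string
{
    // json_encode() yields a quoted, escaped JavaScript string literal;
    // JSON_HEX_TAG escapes '<' and '>' so the URL cannot close the tag.
    $target = json_encode($parentUrl, JSON_HEX_TAG | JSON_UNESCAPED_SLASHES);

    return "<script>\n"
        . "if (window.top === window.self) {\n"
        . "  window.location.replace($target);\n"
        . "}\n"
        . "</script>\n";
}

echo iframe_redirect_script(
    'http://localhost/anatomi/index.php?vis=Aa1-109&placering=praeparater'
);
```

Since PhpDig does not execute JavaScript, the spider still indexes the content file's text, while a visitor who follows the search result ends up on the full page.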

The problem with indexing one page only might be due to the fact that the URLs in my site list have query strings in them, plus maybe some config settings. I don't know...
This is not a big problem for me now, as I have indexed all the pages. Thanks for all the help and for a great search tool!
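If query strings are the suspect, one way to rule out one kind of duplicate is to normalize the URL list before using it, so that the same page written with its parameters in a different order collapses to a single entry. This is a minimal PHP sketch of that idea, not something PhpDig is known to do itself; `normalize_url()` is a hypothetical name, and sorting parameters covers only one of many ways two URLs can point at the same page.

```php
<?php
// Hypothetical helper (not part of PhpDig): rebuilds a URL with its query
// parameters sorted by name, so ?vis=X&placering=Y and ?placering=Y&vis=X
// compare as equal. User info and any #fragment are dropped.
function normalize_url(string $url): string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['query'])) {
        return $url; // malformed URL or no query string: leave unchanged
    }
    // Note: parse_str() turns '.' and ' ' in parameter names into '_'.
    parse_str($parts['query'], $params);
    ksort($params);

    $base = (isset($parts['scheme']) ? $parts['scheme'] . '://' : '')
        . ($parts['host'] ?? '')
        . (isset($parts['port']) ? ':' . $parts['port'] : '')
        . ($parts['path'] ?? '');

    return $base . '?' . http_build_query($params, '', '&');
}

$urls = [
    'http://localhost/anatomi/index.php?vis=Aa1-1&placering=praeparater',
    // Made-up variant of the first URL with the parameters reordered.
    'http://localhost/anatomi/index.php?placering=praeparater&vis=Aa1-1',
    'http://localhost/anatomi/index.php?vis=Aa1-10&placering=praeparater',
];

// Normalize, then keep the first occurrence of each distinct URL.
$unique = array_values(array_unique(array_map('normalize_url', $urls)));
echo implode("\n", $unique), "\n";
```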