02-10-2004, 09:23 AM | #1 |
Green Mole
Join Date: Feb 2004
Posts: 7
|
Indexation problems
Hi,
I have run into three different problems:
1) Most of my URLs look like this: http://resoform/services/index.php?s...a-liste&m=tous
The script that is called is always index.php. For each page found, the spider also tries to index index.php without the query string, which makes no sense. The complete URIs are indexed correctly, but the URL "index.php" (without query string) is also found each time and flagged as a duplicate. Is it possible to change this?
2) I have some listing pages where navigation works through links between the pages. Because links are provided only to the first, last, next or previous page, the number of levels required to visit all the items of the list may be very large (perhaps greater than the phpDig limit). Is there a solution to this problem?
3) I used the phpDig HTML comments to exclude and include some parts of the HTML code. However, I saw that some links which should have been excluded were visited. That does not seem normal to me. Does the exclude comment stop both the content being indexed and the links being followed?
Thanks for your help, and sorry for asking three questions at once.
Régis
__________________
Régis |
02-11-2004, 01:58 AM | #2 |
Orange Mole
Join Date: Feb 2004
Posts: 47
|
hello rpiel,
try:
1) config.php, line 97: change false to true in
    define('PHPDIG_DEFAULT_INDEX',false);
2) config.php, lines 84-86: set the limits to e.g. 100 or more
    define('SPIDER_MAX_LIMIT',100);
    define('SPIDER_DEFAULT_LIMIT',100);
    define('RESPIDER_LIMIT',100);
3) you have to use the expressions set on lines 92 and 94 of config.php (defaults: <!-- phpdigExclude --> and <!-- phpdigInclude -->), each on a line of its own in your HTML code, e.g.
    <html>
    <body>
    text to be searched....
    <!-- phpdigExclude -->
    text not to be searched....
    <!-- phpdigInclude -->
    ......
    </body>
    </html>
hope this helps a little
tomas
02-11-2004, 04:49 AM | #3 |
Green Mole
Join Date: Feb 2004
Posts: 7
|
Hello Tomas,
Thank you for your answer.
1) Setting PHPDIG_DEFAULT_INDEX to true didn't solve my problem: now the URL http://resodorm/services/ is indexed x times. Each attempt takes a few seconds before the spider sees that it is a duplicate. In any case, in my system the script "index.php" isn't a page in itself; it only works through dynamic inclusion of other scripts and templates, so it doesn't make sense to crawl it.
2) OK, setting the limit very high seems to be a solution. However, when I do this the same pages are indexed many times and the whole process takes several hours when it should take about 15 minutes... I think a more reliable solution would be to maintain (or compute) a page containing all the links to index. This might solve the problem of needing multiple levels to go through the items of a list.
3) I did use the <!-- phpdigExclude --> and <!-- phpdigInclude --> comments in my pages, but sometimes the result was not what I expected.
Sincerely,
Régis
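To put a number on point 2, here is a rough model of the listing navigation, not phpDig code. It assumes the list pages are numbered 1 to N, that each page links only to the first, previous, next and last page, and that the spider starts at page 1. The function name levels_needed is made up for the illustration.

```php
<?php
// Illustrative model only: pages 1..$pages, each linking to the first,
// previous, next and last page. This is not how phpDig itself is written.
function levels_needed(int $pages): int
{
    $depth = [1 => 0];   // page number => level at which it is first reached
    $queue = [1];        // breadth-first frontier, starting from page 1

    while ($queue) {
        $page  = array_shift($queue);
        $links = [1, max(1, $page - 1), min($pages, $page + 1), $pages];
        foreach ($links as $target) {
            if (!isset($depth[$target])) {
                $depth[$target] = $depth[$page] + 1;
                $queue[] = $target;
            }
        }
    }

    // The deepest page decides how many levels the spider must be allowed.
    return max($depth);
}

echo levels_needed(200), "\n";
```

Under these assumptions the required depth grows with half the number of list pages, because the spider works inwards from both ends. A page that links to every item directly needs a depth of 1 whatever the length of the list.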
02-11-2004, 05:24 AM | #4 |
Green Mole
Join Date: Feb 2004
Posts: 7
|
About the <!-- phpdigExclude --> and <!-- phpdigInclude --> comments:
They appear to work as far as indexing or not indexing the content of a page is concerned: the words in the excluded parts of the document are not indexed. However, it seems to me that the links in these parts are still followed. That is exactly what I want to avoid! Has anyone dealt with this issue?
Thanks in advance,
Régis
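For illustration, the behaviour asked for here could look like the sketch below: the excluded regions are removed first, and links are collected only from what is left. This is not phpDig's own code, and the function name links_outside_excluded is hypothetical. It assumes the default marker comments and double-quoted href attributes.

```php
<?php
// Sketch only, NOT phpDig's implementation: drop everything between the
// exclude and include markers, then collect the remaining href values.
function links_outside_excluded(string $html): array
{
    // The /s flag lets "." span line breaks; ".*?" stops at the nearest include marker.
    $visible = preg_replace(
        '/<!--\s*phpdigExclude\s*-->.*?<!--\s*phpdigInclude\s*-->/s',
        '',
        $html
    );

    // Simplified link pattern: <a ... href="..."> with double quotes only.
    preg_match_all('/<a\s[^>]*href="([^"]*)"/i', $visible, $matches);

    return $matches[1];
}

$html = "<a href=\"/keep\">kept</a>\n"
      . "<!-- phpdigExclude -->\n"
      . "<a href=\"/skip\">skipped</a>\n"
      . "<!-- phpdigInclude -->\n"
      . "<a href=\"/also\">kept too</a>\n";

print_r(links_outside_excluded($html));
```

A spider that extracts links from the full page before applying the markers would still visit /skip, which matches what is described above.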
02-11-2004, 12:44 PM | #5 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. For one, try the code in this post but replace:
PHP Code:
PHP Code:
For three, this thread may help.
02-11-2004, 11:52 PM | #6 |
Green Mole
Join Date: Feb 2004
Posts: 7
|
Hi Charter,
Thank you for your answer.
I got the best results by building a special page for the indexing process. This page has a robots meta tag with "noindex, follow" as its content. Because the parts of the site I want to index can be retrieved with a simple query on my database, building this special page was easy. Now I only need a depth of 1 for the spidering process and it runs very fast. All the pages I want indexed are retrieved exactly once.
I think that, whenever possible, building special pages to index a site is a good solution. The path is simpler, and it avoids testing a lot of pages just to see whether they are duplicates.
For three, I still have to run some tests.
Sincerely,
Régis
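A minimal sketch of such a special page is shown below. The function name build_seed_page and the example.com URLs are placeholders, and the database query that would supply the real URL list is left out because its details are not given in this thread.

```php
<?php
// Hypothetical helper, not part of phpDig: renders a page whose only job
// is to hand the spider every URL to index at depth 1.
function build_seed_page(array $urls): string
{
    $items = '';
    foreach ($urls as $url) {
        // Escape so that "&" in query strings stays valid inside href="...".
        $safe   = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
        $items .= '<li><a href="' . $safe . '">' . $safe . "</a></li>\n";
    }

    // "noindex, follow": do not index this page itself, but follow its links.
    return "<html>\n<head>\n"
        . "<meta name=\"robots\" content=\"noindex, follow\">\n"
        . "<title>Spider entry page</title>\n"
        . "</head>\n<body>\n<ul>\n" . $items . "</ul>\n</body>\n</html>\n";
}

// Placeholder URLs; on a real site these would come from the database query.
$urls = [
    'http://example.com/services/index.php?item=1&m=tous',
    'http://example.com/services/index.php?item=2&m=tous',
];

echo build_seed_page($urls);
```

Pointing the spider at this page with a depth of 1 means each listed URL is fetched once, without walking the first/previous/next/last navigation.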