|
03-21-2004, 09:05 AM | #1 |
Green Mole
Join Date: Mar 2004
Posts: 6
|
more indexing problems: No link in temporary table
Hi.
I've installed phpdig 1.8.0, but when I attempt to index my site, I get:
SITE : http://www.deco-dreams.com/
Exclude paths :
-
- @NONE@
No link in temporary table
links found : 0
...Was recently indexed
Optimizing tables...
Indexing complete !
My remote host is running PHP 4.3.3 and MySQL 3.23.49. Any help or suggestions? Many thanks |
03-21-2004, 09:43 AM | #2 |
Green Mole
Join Date: Mar 2004
Posts: 6
|
I've since discovered that when I have my robots.txt in place, indexing fails. If I remove it, indexing works. My robots.txt has these contents:
# /robots.txt file for http://www.deco-dreams.com/
# mail webmaster@deco-dreams.com for constructive criticism
User-agent: *
Disallow: /unused
Disallow: /admindeco
Disallow: /decoMyAdmin
Disallow: /Connections
Disallow: /FX_DataCounter
Disallow: /gallery
Disallow: /rcdstnav
Disallow: /rv
Disallow: /rayform11
Disallow: /rayform11b
Disallow: /mm
Disallow: /rv
Disallow: /uploads
Disallow: index2.php
Disallow: info.php
Disallow: results1.php
Disallow: results.php
Disallow: results2.php
Disallow: sp_images.php
Am I making an obvious mistake? Thanks |
03-23-2004, 12:08 AM | #3 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. Perhaps try the following.
In robot_functions.php is the phpdigReadRobotsTxt function. In this function, replace: PHP Code:
PHP Code:
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension. |
03-23-2004, 01:22 AM | #4 |
Green Mole
Join Date: Mar 2004
Posts: 6
|
Hi
Thanks for the reply, Charter. I tried replacing the line but am getting parse errors. Here is the function from the original robot_functions.php: PHP Code:
Thanks again. |
03-23-2004, 01:50 AM | #5 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. In the block of code you posted, replace:
PHP Code:
PHP Code:
03-23-2004, 02:35 AM | #6 |
Green Mole
Join Date: Mar 2004
Posts: 6
|
Sorry you had to repeat yourself. The first time I was getting errors, but once I got the formatting right, the page loaded OK (no parse error). Unfortunately the output is still:
SITE : http://www.deco-dreams.com/
Exclude paths :
-
- @NONE@
No link in temporary table
links found : 0
...Was recently indexed
Optimizing tables...
Indexing complete ! |
03-23-2004, 05:34 AM | #7 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
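Hi. Did you create your robots.txt file on a Mac? It reads in as only one key/value pair, which is what happens when the file uses Mac (carriage-return only) line endings and PHP does not detect them. If you have PHP 4.3.0+ then the ini_set function can be used to turn on auto_detect_line_endings and correct this.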
Try replacing the whole phpdigReadRobotsTxt function with the following, as you see it onscreen: Code:
//=================================================
//search robots.txt in a site
function phpdigReadRobotsTxt($site) { //$site needs its trailing slash
    if (phpdigTestUrl($site.'robots.txt') == 'PLAINTEXT') {
        @ini_set("auto_detect_line_endings","1"); // needs PHP 4.3.0+
        $robots = @file($site.'robots.txt');
        while (list($id,$line) = @each($robots)) {
            // skip comment lines and blank lines
            if ((strpos(trim($line),"#") === 0) || (trim($line) == ""))
                continue;
            if (ereg('^user-agent:[ ]*([a-z0-9*]+)',strtolower($line),$regs)) {
                if ($regs[1] == "*") {
                    $user_agent = "'$regs[1]'";
                }
                else {
                    $user_agent = $regs[1];
                }
            }
            if (eregi('[[:blank:]]*disallow:[[:blank:]]*([/]?([a-z0-9_/*+%.-]*))',$line,$regs)) {
                if ($regs[1] == '/') {
                    $exclude[$user_agent]['@ALL@'] = 1;
                }
                else {
                    // escape . and +, turn * into .* so the path can be used as a pattern
                    $exclude[$user_agent][str_replace('*','.*',str_replace('+','\+',str_replace('.','\.',$regs[2])))] = 1;
                }
            }
            elseif (($user_agent == 'phpdig') && (eregi('[[:blank:]]*disallow:[[:blank:]]*',$line,$regs))) {
                $exclude['@NONE@'] = 1;
                return $exclude;
            }
        }
        if (isset($exclude['phpdig']) && is_array($exclude['phpdig']))
            return $exclude['phpdig'];
        elseif (isset($exclude['\'*\'']) && is_array($exclude['\'*\'']))
            return $exclude['\'*\''];
    }
    $exclude['@NONE@'] = 1;
    return $exclude;
}
With PHP 4.3.0+ the robots.txt file should now read in as multiple key/value pairs, and the other changes in the function should let you index your site, so at a search depth of one you should get the following:
SITE : http://www.deco-dreams.com/
Exclude paths :
- unused
- admindeco
- decoMyAdmin
- Connections
- FX_DataCounter
- gallery
- rcdstnav
- rv
- rayform11
- rayform11b
- mm
- uploads
- index2\.php
- info\.php
- results1\.php
- results\.php
- results2\.php
- sp_images\.php
1:http://www.deco-dreams.com/ (time : 00:00:10)
+ + + + + + + +
level 1...
2:http://www.deco-dreams.com/privacy.php (time : 00:00:28)
3:http://www.deco-dreams.com/links.php (time : 00:00:36)
4:http://www.deco-dreams.com/aboutus.php (time : 00:00:44)
5:http://www.deco-dreams.com/index.php?pageNum_Recordset1=123& (time : 00:00:53)
6:http://www.deco-dreams.com/index.php?pageNum_Recordset1=1& (time : 00:01:02)
7:http://www.deco-dreams.com/buy.php?vartab1_id=676 (time : 00:01:10)
8:http://www.deco-dreams.com/ordering.php (time : 00:01:18)
9:http://www.deco-dreams.com/contactus.php (time : 00:01:29)
No link in temporary table
--------------------------------------------------------------------------------
links found : 9
http://www.deco-dreams.com/
http://www.deco-dreams.com/privacy.php
http://www.deco-dreams.com/links.php
http://www.deco-dreams.com/aboutus.php
http://www.deco-dreams.com/index.php?pageNum_Recordset1=123&
http://www.deco-dreams.com/index.php?pageNum_Recordset1=1&
http://www.deco-dreams.com/buy.php?vartab1_id=676
http://www.deco-dreams.com/ordering.php
http://www.deco-dreams.com/contactus.php
Optimizing tables...
Indexing complete !
Remember to remove any "word" wrapping in the above code.
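For anyone wondering why a Mac-saved robots.txt comes through as a single line, here is a minimal sketch of the effect. It is not phpDig code: splitAnyLineEndings is a hypothetical helper, the sample string is made up, and the sketch is written for a current PHP release rather than PHP 4.3.

```php
<?php
// Hypothetical helper (not part of phpDig): split text into lines
// whether it uses CRLF, CR-only, or LF line endings.
function splitAnyLineEndings(string $text): array
{
    // "\r\n" is listed first so a Windows ending is not split twice.
    return preg_split('/\r\n|\r|\n/', $text);
}

// Made-up robots.txt body saved with classic Mac (CR-only) endings.
$macRobots = "User-agent: *\rDisallow: /unused\rDisallow: /gallery\r";

// Splitting on "\n" alone finds no separator, so the whole file
// comes back as one element.
$naive = explode("\n", $macRobots);

// Splitting on any ending recovers the individual directives;
// array_filter drops the empty string left after the final "\r".
$lines = array_values(array_filter(splitAnyLineEndings($macRobots), 'strlen'));

echo count($naive), "\n"; // 1
echo count($lines), "\n"; // 3
```

Normalizing the text this way is an alternative to relying on the auto_detect_line_endings setting, and it does not depend on the server's PHP configuration.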
03-23-2004, 09:02 AM | #8 |
Green Mole
Join Date: Mar 2004
Posts: 6
|
Well spotted!
I am indeed using a Mac (MacOS 10.3.3 Server), and when I looked at my robots.txt I saw that I had created a .txt file with Macintosh line endings, as you suggested. I've now saved it with Unix line endings and indexing is working correctly. Superb! Thanks Charter. |
03-23-2004, 09:04 AM | #9 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. Glad it's working. Are you using the new phpdigReadRobotsTxt function from a couple of posts before this post?
03-23-2004, 09:33 AM | #10 |
Green Mole
Join Date: Mar 2004
Posts: 6
|
I didn't need to, because as soon as I saved the file with Unix-style line endings (I use BBEdit, a Mac/Unix text editor) it started indexing OK.
But I've now tested it with the new phpdigReadRobotsTxt function. I created a different robots.txt saved with Macintosh-style line endings, and it seems that your new function does the trick. It's now indexing perfectly. Thank you so much for considering the needs of us Mac heads and our eccentric line endings! |
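If you are not sure which line endings your own robots.txt was saved with, a rough check along these lines may help. This is only a sketch: detectLineEndings is a hypothetical helper, not something phpDig provides, and the sample strings are made up.

```php
<?php
// Hypothetical helper (not part of phpDig): report the dominant
// line-ending style found in a piece of text.
function detectLineEndings(string $text): string
{
    $crlf = substr_count($text, "\r\n");
    $cr = substr_count($text, "\r") - $crlf; // CR not followed by LF
    $lf = substr_count($text, "\n") - $crlf; // LF not preceded by CR

    if ($crlf === 0 && $cr === 0 && $lf === 0) {
        return 'none';
    }
    if ($crlf >= $cr && $crlf >= $lf) {
        return 'CRLF';
    }
    return $cr > $lf ? 'CR' : 'LF';
}

// Made-up samples in the three common styles.
$macStyle = detectLineEndings("User-agent: *\rDisallow: /unused\r");
$unixStyle = detectLineEndings("User-agent: *\nDisallow: /unused\n");
$windowsStyle = detectLineEndings("User-agent: *\r\nDisallow: /unused\r\n");

echo $macStyle, " ", $unixStyle, " ", $windowsStyle, "\n"; // CR LF CRLF
```

To check a real file, you could pass it the result of file_get_contents() on your local copy of robots.txt. A result of CR means the file has the Macintosh-style endings discussed in this thread.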