PhpDig.net


comko 03-23-2004 10:28 AM

index intershop-sites?
 
Hi folks,

I have successfully installed PhpDig on my Linux server (LAMP). Great work, guys, and thanks for it!
I have no problems with most of the sites I want to index.
Only one won't work: www.junfermann.de, a site generated with INTERSHOP, and that's my problem:
none of the pages it contains have an actual page file name in the URL. For example, one page is: http://www.junfermann.de/cgi-bin/junfermann.storefront/DE/Catalog/1011/
Q: How can I index this site?

Thanks a lot for your help,
Ingo

Charter 03-24-2004 03:15 PM

Hi. First, download the ZIP file in this thread and replace robot_functions.php with the one in the ZIP file.

Next, in the new robot_functions.php file, search for "<frame" (without the quotes) and on this line add in a [[:blank:]]* so that:
PHP Code:

while (blah blah *content=['\"][0-9]+;url blah blah) { 

becomes the following:
PHP Code:

while (blah blah *content=['\"][0-9]+;[[:blank:]]*url blah blah) { 
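
To illustrate why the extra [[:blank:]]* matters, here is a minimal sketch. The two patterns and the sample meta tag are simplified assumptions for illustration, not the actual line from robot_functions.php, and preg_match() stands in for eregi(), which was removed in PHP 7.

```php
<?php
// Hypothetical, simplified patterns: the real line in robot_functions.php is longer.
$strict  = '/content=[\'"][0-9]+;url=([^\'"]+)/i';
$relaxed = '/content=[\'"][0-9]+;[[:blank:]]*url=([^\'"]+)/i';

// Assumed example of a refresh tag that puts a space after the semicolon.
$tag = '<meta http-equiv="refresh" content="0; url=/cgi-bin/junfermann.storefront/DE/Catalog/1011/">';

$strictHit  = preg_match($strict, $tag);          // space after ";" blocks the match
$relaxedHit = preg_match($relaxed, $tag, $regs);  // optional blanks are now allowed

var_dump($strictHit);   // int(0)
var_dump($relaxedHit);  // int(1)
echo $regs[1], "\n";    // the redirect target captured from the tag
```

With the relaxed pattern, a redirect written as "0; url=..." is followed just like one written as "0;url=...".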

Last, the server for that site does not return a content-type for certain pages. You would need to force the content-type for these pages so, in the new robot_functions.php file, search for:
PHP Code:

if (!eregi('[a-z0-9]+',$answer)) { 

and right before that line add:
PHP Code:

// THIS CODE IS ONLY FOR WHEN CONTENT-TYPE IS NOT RETURNED
// IT IS NOT FOR GENERAL INCLUSION IN THE CORE PHPDIG CODE
elseif (!eregi("Content-Type: *([a-z]+)/([a-z.-]+)",$answer,$regs)) {
    $status = 'HTML'; // no content-type so set to html
}


Remember to remove any "word" wrapping in the above code.
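
The fallback idea can be tried outside PhpDig with a small sketch like the one below. The function name guess_status() and the 'OTHER' value are hypothetical and exist only for this illustration; the sample header blocks are made up, and preg_match() is used in place of eregi(), which is no longer available in PHP 7 and later.

```php
<?php
// Hypothetical helper, not part of PhpDig: classify a raw HTTP header block.
function guess_status(string $answer): string
{
    if (preg_match('~Content-Type: *([a-z]+)/([a-z.-]+)~i', $answer, $regs)) {
        // A type was sent: only text/html counts as HTML in this sketch.
        $type = strtolower($regs[1] . '/' . $regs[2]);
        return ($type === 'text/html') ? 'HTML' : 'OTHER';
    }
    // No Content-Type header at all, so fall back to HTML.
    return 'HTML';
}

// Made-up header blocks for illustration.
$withType    = "HTTP/1.1 200 OK\r\nContent-Type: text/html; charset=iso-8859-1\r\n\r\n";
$withoutType = "HTTP/1.1 200 OK\r\nServer: Apache\r\n\r\n";
$pdfType     = "HTTP/1.1 200 OK\r\nContent-Type: application/pdf\r\n\r\n";

echo guess_status($withType), "\n";     // HTML
echo guess_status($withoutType), "\n";  // HTML (forced by the fallback)
echo guess_status($pdfType), "\n";      // OTHER
```

The fallback only fires when the header is missing entirely, so responses that do declare a type are still classified by what they declare.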

comko 03-24-2004 11:33 PM

Yikes, it works!!
Thanks for the great work!

Ingo

malieut 03-30-2004 06:09 AM

Hello Charter,
I followed your suggestion, but when I click the "Dig this" button I only get the following screen:

------------------------------
Spidering in progress...

------------------------------
Nothing else happens; the spidering just ends.

Charter 03-30-2004 08:22 AM

Hi. Recheck the mods and make sure to refresh the admin/index.php page before indexing. Only apply the last bit of code if a content type is not returned, which is generally not the case.


All times are GMT -8.