index intershop-sites?


comko
03-23-2004, 11:28 AM
Hi folks,

I've successfully installed phpdig on my Linux server (LAMP). Great work, guys! Thanks for it.
No problems with most of the sites I want to search.
But one won't index: www.junfermann.de (http://www.junfermann.de), a site generated with INTERSHOP, and that's my problem:
all the pages it contains are WITHOUT any "pages" in the URL. For example, one page is: http://www.junfermann.de/cgi-bin/junfermann.storefront/DE/Catalog/1011/ (http://www.junfermann.de/cgi-bin/junfermann.storefront/DE/Catalog/1011)
Q: How can I index this site? :bang:

Thanks a lot for your help,
Ingo

Charter
03-24-2004, 04:15 PM
Hi. First, download the ZIP file in this (http://www.phpdig.net/showthread.php?threadid=573) thread and replace robot_functions.php with the one in the ZIP file.

Next, in the new robot_functions.php file, search for "<frame" (without the quotes) and on this line add in a [[:blank:]]* so that:

while (blah blah *content=['\"][0-9]+;url blah blah) {

becomes the following:

while (blah blah *content=['\"][0-9]+;[[:blank:]]*url blah blah) {
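
To see why the extra [[:blank:]]* matters, here is a minimal sketch, not the actual phpdig line. Only the "content=...;url" fragment comes from the post; the surrounding pattern, the sample tag and the variable names are assumptions, and preg_match() stands in for eregi(), which newer PHP versions no longer provide.

```php
<?php
// Hypothetical patterns: only the "content=['\"][0-9]+;url" part is from the post above.
$old = '/content=[\'"][0-9]+;url/i';
$new = '/content=[\'"][0-9]+;[[:blank:]]*url/i';

// Assumed sample tag with a space between ";" and "URL".
$tag = '<meta http-equiv="refresh" content="0; URL=/cgi-bin/example.storefront">';

// The old pattern needs "url" right after the semicolon, so the space breaks it.
$oldMatches = preg_match($old, $tag) === 1;
// The new pattern allows any number of blanks (spaces or tabs) before "url".
$newMatches = preg_match($new, $tag) === 1;

var_dump($oldMatches); // bool(false)
var_dump($newMatches); // bool(true)
```

If the site writes the redirect with a space after the semicolon, the unmodified pattern would presumably skip it, which fits the symptom described above.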

Last, the server for that site does not return a content-type for certain pages. You would need to force the content-type for these pages so, in the new robot_functions.php file, search for:

if (!eregi('[a-z0-9]+',$answer)) {

and right before that line add:

// THIS CODE IS ONLY FOR WHEN CONTENT-TYPE IS NOT RETURNED
// IT IS NOT FOR GENERAL INCLUSION IN THE CORE PHPDIG CODE
elseif (!eregi("Content-Type: *([a-z]+)/([a-z.-]+)",$answer,$regs)) {
    $status = 'HTML'; // no content-type so set to html
}

Remember to remove any "word" wrapping in the above code.
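
The idea behind that fallback can be tried on its own. This is a rough sketch under assumptions: guess_status() is a hypothetical helper that does not exist in phpdig, the header strings are made up, and preg_match() replaces eregi() so it runs on current PHP. It only shows the "no Content-Type means treat as HTML" decision, not the rest of the robot_functions.php logic.

```php
<?php
// Hypothetical helper, not part of phpdig.
function guess_status(string $answer): string
{
    // Same expression as in the post, made case-insensitive with the "i" flag.
    if (preg_match('~Content-Type: *([a-z]+)/([a-z.-]+)~i', $answer, $regs)) {
        // A content type was sent: return it as "type/subtype".
        return strtolower($regs[1] . '/' . $regs[2]);
    }
    return 'HTML'; // no content-type so set to html
}

// Assumed example headers, one with and one without a Content-Type line.
$withType    = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n";
$withoutType = "HTTP/1.1 200 OK\r\nServer: example\r\n";

echo guess_status($withType), "\n";    // text/html
echo guess_status($withoutType), "\n"; // HTML
```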

comko
03-25-2004, 12:33 AM
Yikes, it works!!
Thanks for the great work!

Ingo

malieut
03-30-2004, 07:09 AM
Hello Charter,
I followed your suggestion, but got the following screen when I clicked the "dig this" button.

------------------------------
Spidering in progress...

------------------------------
Nothing else happened; the spidering just ended.

Charter
03-30-2004, 09:22 AM
Hi. Recheck the mods and make sure to refresh the admin/index.php page before indexing. Only apply the last bit of code if a content type is not returned, which is generally not the case.