Converted from HTML pages to PHP pages, now no pages will index! Help!


bigals
04-01-2004, 02:17 AM
I have recently converted all my HTML pages into PHP pages and now PhpDig will not index any of them at all!

The pages are extremely important and need indexing, so how do I sort this out? The pages are only little bits of code that link to a template page, which is then populated with data, so PhpDig doesn't seem to be able to spider them now. Can anyone explain a way round this?

cheers,

Alex.

Charter
04-01-2004, 02:34 AM
Hi. What is the code from one of these PHP files? Does it have a header redirect? If so, try the ZIP file in this (http://www.phpdig.net/showthread.php?threadid=573) thread.

bigals
04-01-2004, 02:37 AM
The code in each of the files is as follows:

<?php
// Get the directory part of the current script's path
$path = dirname($_SERVER['PHP_SELF']);
// Split the path into its directories
$dirs = explode("/", $path);
$numdirs = count($dirs) - 1;
// Directory closest to the PHP page
$region = $dirs[$numdirs];
// Directory one level above $region
$country = $dirs[$numdirs - 1];
// Build the template URL, then fetch and output the populated page

$url="http://www.mysite.com/templates/region_template.php";
$url.="?region=".urlencode($region)."&country=".urlencode($country);
$file_output=file_get_contents($url);
echo $file_output;
?>

That's all that is in each index.php file, so how can these be indexed?

I'm not very knowledgeable, so could you possibly explain?

thanks.

Charter
04-01-2004, 03:12 AM
Hi. Are there links to these PHP files?

bigals
04-01-2004, 03:20 AM
The links to the pages are generated by the pages themselves...

It's a directory of the UK.

i.e. an index page placed in a county folder will create the links to all the towns/cities within it. These links would be something like:

leicestershire/leicester/index.php

So the links only exist after the PHP page has been compiled. I think I read somewhere that PhpDig compiles all the PHP and then spiders it afterwards.

The index.php pages become HTML in content, but only when compiled.

hope that helps explain it!

Charter
04-01-2004, 03:26 AM
Hi. I mean when you spider, are you spidering a page that has links to these PHP files, like a directory listing?

BTW, PhpDig doesn't compile PHP; it's compiled server-side. PhpDig checks and reads what is output from the server. ;)

bigals
04-01-2004, 03:33 AM
No, I'm just spidering the main folder, i.e.

http://www.mysite.com/database/world/uk/

The first page it will find is an index.php page. This page displays the countries within the UK. It's laid out like so:

world/
--------uk/index.php
--------uk/england/index.php
--------uk/england/west_midlands/index.php

Charter
04-01-2004, 03:48 AM
Hi. So does the http://www.mysite.com/database/world/uk/england/west_midlands/index.php page call up the http://www.mysite.com/templates/region_template.php?region=west_midlands&country=england page? If so, what do you get when you uncomment //print $answer."<br>\n"; from the robot_functions.php file and then index?
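
To make the mapping concrete, here is a rough sketch of what the short index.php posted above works out for that folder. The function name template_url_for is made up for illustration, and the script path is passed in as an argument rather than read from $_SERVER['PHP_SELF'] so the snippet can be run on its own.

```php
<?php
// Hypothetical helper that mirrors the logic of the short index.php above.
// The function name and its parameters are illustrative assumptions.
function template_url_for(string $scriptPath, string $templateUrl): string
{
    // Directory part of the script path, split into its folders
    $dirs = explode('/', dirname($scriptPath));
    $last = count($dirs) - 1;

    $region  = $dirs[$last];      // folder that holds index.php
    $country = $dirs[$last - 1];  // its parent folder

    return $templateUrl
        . '?region=' . urlencode($region)
        . '&country=' . urlencode($country);
}

echo template_url_for(
    '/database/world/uk/england/west_midlands/index.php',
    'http://www.mysite.com/templates/region_template.php'
), "\n";
```

If the folders are laid out as described, this should print the region_template.php address with region=west_midlands and country=england.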

bigals
04-01-2004, 04:03 AM
Yes, that is what happens, bang on...

I tried uncommenting that line and it indexed just the main HTML pages as before, missing out the entire database folder (as this only consists of these index.php files and folders).

It was returning strange info like this:

5:http://www.mysite.com/features/featurepage.html
(time : 00:00:17)
HTTP/1.1 404 Not Found
HTTP/1.1 200 OK
Date: Thu, 01 Apr 2004 12:56:48 GMT
Server: Apache/1.3.20 Sun Cobalt (Unix) mod_jk mod_ssl/2.8.4 OpenSSL/0.9.6 PHP/4.3.0 FrontPage/5.0.2.2510 mod_auth_pam_external/0.1 mod_perl/1.26
Last-Modified: Mon, 29 Mar 2004 11:21:42 GMT
ETag: "180433c-2156-406806c6"
Accept-Ranges: bytes
Content-Length: 8534
Content-Type: text/css
HTTP/1.1 404 Not Found
HTTP/1.1 200 OK
Date: Thu, 01 Apr 2004 12:56:48 GMT
Server: Apache/1.3.20 Sun Cobalt (Unix) mod_jk mod_ssl/2.8.4 OpenSSL/0.9.6 PHP/4.3.0 FrontPage/5.0.2.2510 mod_auth_pam_external/0.1 mod_perl/1.26
Last-Modified: Mon, 29 Mar 2004 11:21:42 GMT
ETag: "180433c-2156-406806c6"
Accept-Ranges: bytes
Content-Length: 8534
Content-Type: text/css
HTTP/1.1 404 Not Found
HTTP/1.1 404 Not Found
HTTP/1.1 404 Not Found
HTTP/1.1 404 Not Found
HTTP/1.1 404 Not Found
HTTP/1.1 404 Not Found

Each time it says HTTP/1.1 404 Not Found a few times at the end of each of these blocks, as you can see above.

I've commented that line out again, back to the way it was.

Charter
04-01-2004, 04:16 AM
Hi. The 404 means PhpDig is not finding the pages. Are you using a base href tag? If so, there is some code in this (http://www.phpdig.net/showthread.php?threadid=364) thread to account for base href tags.

bigals
04-01-2004, 04:24 AM
No, I don't think I'm using base href. I'm not sure what that means exactly, but if I search for <BASE HREF in my template pages nothing is returned, so it isn't in any of my pages.

argh this is getting confusing!!

Charter
04-01-2004, 04:43 AM
Hi. It seems that there may be a mislink somewhere in the new PHP code, maybe dealing with the $_SERVER['PHP_SELF'] variable. What do you get onscreen when you try the following?

In robot_functions.php right after:

//print $answer."<br>\n";

stick the following:

echo "Page: ".$host.$path."<br>\n";

and see what pages are generating the 404s on index.

bigals
04-01-2004, 05:18 AM
I get all of this output. That double // looks a bit suspicious, and then it goes back to a single /

Page: www.mysite.com//
Page: www.mysite.com//
Page: www.mysite.com//
Page: www.mysite.com//
Page: www.mysite.com//
Page: www.mysite.com//
Page: www.mysite.com//
Page: www.mysite.com//
Page: www.mysite.com//
Page: www.mysite.com/
Page: www.mysite.com/
Page: www.mysite.com/
Page: www.mysite.com/
Page: www.mysite.com/
Page: www.mysite.com/
Page: www.mysite.com/
Page: www.mysite.com/
Page: www.mysite.com/
+ Page: www.mysite.com/database/world//
Page: www.mysite.com/database/world//
Page: www.mysite.com/database/world//
Page: www.mysite.com/database/world//

Charter
04-01-2004, 05:40 AM
Hi. The double slash is okay. It's removed when it needs to be removed. Maybe the thing to notice is that none of the pages have things like uk/england/west_midlands/index.php in them. Without actually seeing/testing your site, I doubt that I can get this narrowed down.
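
For anyone curious what removing the extra slash can look like, below is a minimal sketch. It is not PhpDig's own code; collapse_slashes is a made-up name, and it should only be applied to the path part of a URL, not to the http:// prefix.

```php
<?php
// Hypothetical helper, not taken from PhpDig: collapse runs of slashes
// in a URL path down to a single slash.
function collapse_slashes(string $path): string
{
    return preg_replace('#/{2,}#', '/', $path);
}

echo collapse_slashes('/database/world//'), "\n";
```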

bigals
04-01-2004, 05:49 AM
OK, well here's one of the pages that an index.php page is replaced with:

This might be more help to you, as you can then see how things are accessed by my pages. All the templates work the same way...

I hope this can help! :)

See attachment... it's a PHP file.

Charter
04-01-2004, 05:57 AM
Hi. Are all the index.php files that same short PHP file you posted previously? Are the URLs you want to index the ones listed in the attached file between the option tags, like below?

<option value="http://www.mysite.com/database/world/uk/england/south_west/avon/index.php">Avon</option>

There is no other place that has these URLs other than the attached file?

bigals
04-01-2004, 06:20 AM
The option tags are for a drop-down county list, a quick way of jumping to a county, so the addresses need to be provided there. But they are only addresses for the counties, not all the other pages (towns, cities, regions, etc.).

And yes, the short PHP file is the same for all the index.php files; just the template name changes depending on where the file resides in the site.

The addresses I want to link to are generated around line 334, I think.

Charter
04-01-2004, 06:43 AM
Hi. The index.php file calls up the template, but what page calls up the index.php file besides things generated in the template?

bigals
04-01-2004, 06:49 AM
Nothing, all the pages are generated by these index pages. The names of the folders that the index.php files reside in are used to provide location information and are passed to a locations table to get the root of the county, region or town you see.

The only place links appear is in the county drop-down list:

<option value="http://www.mysite.com/database/world/uk/england/south_west/avon/index.php">Avon</option>

But these are just for the counties.

EVERYTHING IS DYNAMIC, I think I went quite overboard with the dynamicness, didn't I, lol!

Charter
04-01-2004, 07:06 AM
Hi. What do you get when you crawl the full http://www.mysite.com/database/world/uk/england/south_west/avon/ link? My guess is that content will be found. If so, you need to make a list of location URLs to crawl, as PhpDig cannot find the location URLs because there are no links to them.

bigals
04-01-2004, 07:32 AM
Yeah, that worked. It indexes a '-', which is the folder index,

and then the results are available in the search engine.

Is there no other way of generating these links without my creating them? If I create them then I'll need to keep updating my location list whenever I add a new town!

That could be a potentially huge source of mistakes on my part, making it not as dynamic as I thought.

Is there no way round this?

bigals
04-01-2004, 07:42 AM
I am grabbing all my data from a big MySQL table like so:

kingdom | country | region        | county         | town
uk      | england | east_midlands | leicestershire | leicester

I can save this as an Excel table. Is it then possible to put a slash at the end of each entry and add a new column on the left called root, containing http://www.mysite.com/database/world

That would then create a line like:

http://www.mysite.com/database/world/uk/england/east_midlands/leicestershire/leicester/

for each location...

Do you know how to do that in Excel, or is it not possible?

Charter
04-01-2004, 07:44 AM
Hi. PhpDig follows links: no links, no index. ;)

Wouldn't the info in the database tables also need updating when adding a new town, so why not query the tables to generate a list of links?
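
As a rough sketch only, something along these lines could build the list from the table layout posted above. The column names come from that post; location_url is a made-up function name, and the sample row stands in for whatever a query against the locations table would return (the query and connection details are left out, since I haven't seen the actual schema).

```php
<?php
// Hypothetical helper: turn one location row into a crawlable folder URL.
// Column names follow the table posted above; everything else is assumed.
function location_url(string $root, array $row): string
{
    // Keep only the non-empty levels, in order from kingdom down to town
    $levels = ['kingdom', 'country', 'region', 'county', 'town'];
    $parts = [];
    foreach ($levels as $level) {
        if (!empty($row[$level])) {
            $parts[] = rawurlencode($row[$level]);
        }
    }
    return rtrim($root, '/') . '/' . implode('/', $parts) . '/';
}

// Sample row copied from the post above; in practice the rows would come
// from a database query.
$rows = [
    ['kingdom' => 'uk', 'country' => 'england', 'region' => 'east_midlands',
     'county' => 'leicestershire', 'town' => 'leicester'],
];

// Print one plain link per location for the spider to follow
foreach ($rows as $row) {
    $url = htmlspecialchars(location_url('http://www.mysite.com/database/world', $row));
    echo '<a href="' . $url . '">' . $url . "</a><br>\n";
}
```

Saved as its own PHP page and used as the starting point for the spider, the list would stay in step with the table, so adding a town to the database would add its link as well.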

bigals
04-01-2004, 07:51 AM
Yeah, that's what I just posted above. Do you know how to do that?

that would be great!

bigals
04-01-2004, 09:34 AM
Hey man! Something mad has happened!

I made an HTML document, just a big list of these:

<a href="http://www.mysite.com/database/world/uk/england/east_anglia/cambridgeshire/"></a>

for all the counties, regions and countries, and one for the UK...

and now when I spider, it also spiders all the town folders!

The PHP is being generated and spidered. It's like it's had a kick up the arse and a map of where to go!

HAHAHA fantastic! Any idea why this has happened?

cheers for all the help, much appreciated,
hopefully this will be a trick that other people can use!

Alex.