Title of the results - how to change from <phpdig:page_link/>


bforsyth
07-11-2004, 06:30 AM
Hi - I have installed the app and have indexed the site. The one thing that I would like to be able to change is the text that is held in the <phpdig:page_link/>. Currently it displays the relative URL to the resource that the search returns, e.g.:


25. [43.30 %] ?page=issueView&issueid=62

Strangely, if the search term is on the home (index) page, then the title tag of the page is displayed in its place.

I would like for each of the returned results to display the contents of the title tag. Any ideas why this is the case?

All of the pages are generated by dynamically including another page in the index page, e.g. www.mysite.com/index.php?page=articleView etc. However, the title tags for each page are dynamically generated, so they are always different.

Many thanks in advance

Ben

Charter
07-11-2004, 07:03 AM
Hi. In robot_functions.php titles are found with this bit of code:

//extracts title
if ( eregi("<title *>([^<>]*)</title *>",$text,$regs) ) {
    $title = $regs[1];
}
else {
    $title = "";
}

Also in robot_functions.php titles are set with this bit of code:

//set the title in order <title>, filename, or unknown
if (isset($doc_title) && $doc_title) {
    $titre_resume = $doc_title;
}
elseif (isset($file) && $file) {
    $titre_resume = $file;
}
else {
    $titre_resume = "Untitled";
}

Perhaps check your dynamic titles and see if they are found.
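
As a rough illustration of why a relative URL can end up where the title should be, here is a minimal stand-alone sketch of the fallback order quoted above. The function name pick_result_title and the sample title 'Issue 62' are made up for the example and are not part of PhpDig:

```php
<?php
// Hypothetical helper mirroring the quoted order: <title>, then filename, then "Untitled".
function pick_result_title(string $docTitle, string $file): string
{
    if ($docTitle) {
        return $docTitle;   // a non-empty title was found while spidering
    }
    if ($file) {
        return $file;       // no title, so the stored file (with query string) is shown
    }
    return 'Untitled';      // neither is available
}

echo pick_result_title('Issue 62', '?page=issueView&issueid=62'), "\n"; // Issue 62
echo pick_result_title('', '?page=issueView&issueid=62'), "\n";         // ?page=issueView&issueid=62
echo pick_result_title('', ''), "\n";                                   // Untitled
```

If the spider never sees a usable title for a page, the second branch is the one that fires, which would match the result line in the first post.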

bforsyth
07-14-2004, 09:04 PM
The strangest thing was happening. On inspection of the first_words field of the digSpider table, I found that the spider wasn't actually crawling each page. It took ages to find out why. To make the pages W3C compliant, I was writing the dynamic URLs like:

index.php?page=articleView&amp;articleId=336 - now a browser knows to render &amp; as &, but the spider does not - so it was just getting my built-in 'Page cannot be found' error for every article.
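
For anyone hitting the same thing, the general idea is that an href copied out of HTML source needs its entities decoded before it is requested. A minimal sketch using PHP's built-in html_entity_decode() follows; decode_href is a hypothetical helper name, and this is not necessarily how PhpDig itself handles it:

```php
<?php
// Hypothetical helper (not part of PhpDig): turn an href as written in
// HTML source into the URL a browser would actually request.
function decode_href(string $href): string
{
    // &amp; in an attribute value stands for a literal "&"
    return html_entity_decode($href, ENT_QUOTES, 'UTF-8');
}

$raw = 'index.php?page=articleView&amp;articleId=336';
echo decode_href($raw), "\n"; // index.php?page=articleView&articleId=336
```

Without that step the server receives a parameter named "amp;articleId" instead of "articleId", which would explain the error page being indexed.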

Charter
07-15-2004, 08:15 AM
Hi. What version of PhpDig are you using?

bforsyth
07-15-2004, 09:08 AM
Using 1.8.0

Charter
07-15-2004, 09:12 AM
Do you still have a page using &amp; so I can test on it?

bforsyth
07-15-2004, 09:21 AM
Sure thing - I have reverted the code back for you:

See:
http://cgasson.truth.posiweb.net/new/index.php?page=issueView&issue=current

All of the links on that page have &amp; between the GET parameters.

I will probably revert back to the working version in 24 hours or so.

Charter
07-15-2004, 10:11 AM
Thanks, test over... This &amp; issue was fixed as of version 1.8.1, I believe. Using version 1.8.3, when indexing http://cgasson.truth.posiweb.net/new/index.php with LIMIT_TO_DIRECTORY set to true, a search depth of one, and twenty links per, the output and table content follow (note it finds a max of [search depth * links per + 1] links, all within the new/ directory):

Spidering in progress...

--------------------------------------------------------------------------------
SITE : http://cgasson.truth.posiweb.net/
Exclude paths :
- @NONE@
1:http://cgasson.truth.posiweb.net/new/index.php
(time : 00:00:08)
+ + + + + + + + + + + + + + + + + + + +
level 1...
2:http://cgasson.truth.posiweb.net/new/index.php?page=eventView
(time : 00:00:28)

3:http://cgasson.truth.posiweb.net/new/index.php?page=reportSelect
(time : 00:00:35)

4:http://cgasson.truth.posiweb.net/new/index.php?page=freeTrial
(time : 00:00:43)

5:http://cgasson.truth.posiweb.net/new/index.php?page=subscribe
(time : 00:00:49)

6:http://cgasson.truth.posiweb.net/new/index.php?page=projects
(time : 00:00:56)

7:http://cgasson.truth.posiweb.net/new/index.php?page=archiveView
(time : 00:01:02)

8:http://cgasson.truth.posiweb.net/new/index.php?page=issueView&issue=current
(time : 00:01:09)

9:http://cgasson.truth.posiweb.net/new/index.php?page=about
(time : 00:01:16)

10:http://cgasson.truth.posiweb.net/new/index.php?page=advertise
(time : 00:01:22)

11:http://cgasson.truth.posiweb.net/new/index.php?page=press
(time : 00:01:29)

12:http://cgasson.truth.posiweb.net/new/index.php?page=links
(time : 00:01:35)

13:http://cgasson.truth.posiweb.net/new/index.php?page=articleSearch
(time : 00:01:44)

14:http://cgasson.truth.posiweb.net/new/index.php?page=articleView&articleId=336
(time : 00:01:50)

15:http://cgasson.truth.posiweb.net/new/index.php?page=articleView&articleId=349
(time : 00:01:57)

16:http://cgasson.truth.posiweb.net/new/index.php?page=articleView&articleId=332
(time : 00:02:03)

17:http://cgasson.truth.posiweb.net/new/index.php?page=userPwdReminder
(time : 00:02:10)

18:http://cgasson.truth.posiweb.net/new/index.php?page=contact
(time : 00:02:16)

19:http://cgasson.truth.posiweb.net/new/index.php?page=privacy
(time : 00:02:23)

20:http://cgasson.truth.posiweb.net/new/index.php?page=terms
(time : 00:02:31)

21:http://cgasson.truth.posiweb.net/new/index.php?page=copyright
(time : 00:02:37)

No link in temporary table

--------------------------------------------------------------------------------

links found : 21
http://cgasson.truth.posiweb.net/new/index.php
http://cgasson.truth.posiweb.net/new/index.php?page=eventView
http://cgasson.truth.posiweb.net/new/index.php?page=reportSelect
http://cgasson.truth.posiweb.net/new/index.php?page=freeTrial
http://cgasson.truth.posiweb.net/new/index.php?page=subscribe
http://cgasson.truth.posiweb.net/new/index.php?page=projects
http://cgasson.truth.posiweb.net/new/index.php?page=archiveView
http://cgasson.truth.posiweb.net/new/index.php?page=issueView&issue=current
http://cgasson.truth.posiweb.net/new/index.php?page=about
http://cgasson.truth.posiweb.net/new/index.php?page=advertise
http://cgasson.truth.posiweb.net/new/index.php?page=press
http://cgasson.truth.posiweb.net/new/index.php?page=links
http://cgasson.truth.posiweb.net/new/index.php?page=articleSearch
http://cgasson.truth.posiweb.net/new/index.php?page=articleView&articleId=336
http://cgasson.truth.posiweb.net/new/index.php?page=articleView&articleId=349
http://cgasson.truth.posiweb.net/new/index.php?page=articleView&articleId=332
http://cgasson.truth.posiweb.net/new/index.php?page=userPwdReminder
http://cgasson.truth.posiweb.net/new/index.php?page=contact
http://cgasson.truth.posiweb.net/new/index.php?page=privacy
http://cgasson.truth.posiweb.net/new/index.php?page=terms
http://cgasson.truth.posiweb.net/new/index.php?page=copyright
Optimizing tables...
Indexing complete !

Table content:

+-----------------+------------------------------------------+
| path            | file                                     |
+-----------------+------------------------------------------+
| new/            | index.php                                |
| new/            | index.php?page=eventView                 |
| new/            | index.php?page=reportSelect              |
| new/            | index.php?page=freeTrial                 |
| new/            | index.php?page=subscribe                 |
| new/            | index.php?page=projects                  |
| new/            | index.php?page=archiveView               |
| new/            | index.php?page=issueView&issue=current   |
| new/            | index.php?page=about                     |
| new/            | index.php?page=advertise                 |
| new/            | index.php?page=press                     |
| new/            | index.php?page=links                     |
| new/            | index.php?page=articleSearch             |
| new/            | index.php?page=articleView&articleId=336 |
| new/            | index.php?page=articleView&articleId=349 |
| new/            | index.php?page=articleView&articleId=332 |
| new/            | index.php?page=userPwdReminder           |
| new/            | index.php?page=contact                   |
| new/            | index.php?page=privacy                   |
| new/            | index.php?page=terms                     |
| new/            | index.php?page=copyright                 |
+-----------------+------------------------------------------+

Charter
07-15-2004, 10:48 AM
The test did bring about another issue though... blank titles in the search results.

In robot_functions.php find:

//extracts title
if ( eregi("<title *>([^<>]*)</title *>",$text,$regs) ) {
    $title = $regs[1];
}
else {
    $title = "";
}

and replace with:

//extracts title
if ( eregi("<title *>([^<>]*)</title *>",$text,$regs) ) {
    $title = trim($regs[1]);
}
else {
    $title = "";
}
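
One plausible reason trim() helps here: in PHP a string that contains only whitespace is still truthy, so a check like if ($doc_title) passes and a blank title is shown. A small sketch, assuming the captured title was just spaces (the sample input is made up):

```php
<?php
// Assumed input: what the regex would capture from "<title>   </title>"
$captured = "   ";

var_dump((bool) $captured);        // bool(true)  - whitespace-only still counts as a title
var_dump((bool) trim($captured));  // bool(false) - after trim() it is the empty string
```

After trimming, the empty string is falsy, so the later check can fall through to the filename instead of printing a blank title.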

bforsyth
07-15-2004, 03:08 PM
Hey Charter - I have to hand it to you, your enthusiasm for this product is amazing - just a re-affirmation of everything I love about the open source community.

To hit the code that is producing the &amp; problem, you would have had to set your search depth to 3 or 4. This will index about 370 links. I will leave the offending code up for another day or so if you want to try and replicate the problem that I was having.

Thanks for the heads up on the empty title tags - I had only produced the code to dynamically generate the <title> for the article pages. The rest will be done shortly.

Charter
07-15-2004, 05:05 PM
Hi. The page at http://cgasson.truth.posiweb.net/new/index.php has the following link in it:

<a href="index.php?page=issueView&amp;issue=current">Current Issue</a>

This link was followed, the content indexed, and the link stored in the database table as:

http://cgasson.truth.posiweb.net/new/index.php?page=issueView&issue=current

Upgrade and the problem should go away. ;)

bforsyth
07-15-2004, 08:43 PM
Hey Charter - thanks. I will have a go at upgrading to the 1.8.3 version and test it.

Incidentally, I was looking at the code change that you posted to deal with untitled documents (3 posts up ^). The way that it seems to work now is that if there is no title, then the URL is displayed in place of the title in the search results:

//extracts title
if ( eregi("<title *>([^<>]*)</title *>",$text,$regs) ) {
    $title = trim($regs[1]);
}
else {
    $title = "";
}


If I were to change this to:

//extracts title
if ( eregi("<title *>([^<>]*)</title *>",$text,$regs) ) {
    $title = trim($regs[1]);
}
else {
    $title = "Untitled";
}

would this display "Untitled" as the title in the search results? (Only asking because I am away from my computer and can't test it!)

Thanks again for all of your support.

Charter
07-15-2004, 08:53 PM
//extracts title
if ( eregi("<title *>([^<>]*)</title *>",$text,$regs) ) {
    $title = trim($regs[1]);
}
else {
    $title = "Untitled";
}
if (!($title)) { $title = "Untitled"; } // account for a regex match on an empty title
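
For readers on a current PHP, note that eregi() was removed in PHP 7.0. Below is a rough stand-alone equivalent of the snippet above using preg_match(); extract_title is a hypothetical name, not a PhpDig function, and unlike the original it compares against the empty string, so a title of "0" is kept rather than replaced:

```php
<?php
// Hypothetical helper: extract the <title> text, falling back to "Untitled"
// when the tag is missing or contains only whitespace.
function extract_title(string $text): string
{
    $title = '';
    // Same pattern as the eregi() call, with /i for case-insensitive matching
    if (preg_match('/<title *>([^<>]*)<\/title *>/i', $text, $regs)) {
        $title = trim($regs[1]);
    }
    return $title !== '' ? $title : 'Untitled';
}

echo extract_title('<head><title> About us </title></head>'), "\n"; // About us
echo extract_title('<head><title>  </title></head>'), "\n";         // Untitled
echo extract_title('<p>No title tag here</p>'), "\n";               // Untitled
```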