.TEXT Search – How To

I’ve just completed an “advanced searching” module for .TEXT

Just before Christmas I had added an elementary searching feature to my blogs that
simply looked for a substring in the title and body of blog posts. Over
the holiday I found time to enhance search to be a little more
intelligent, so as to ignore text inside HTML tags. I cover the
implementation specifics below….

I should start by indicating that my advanced search feature is
somewhat involved. I wanted to author a solution that would perform
most of the searching in SQL Server but still tie in nicely with the
.TEXT framework. I achieved this by writing a SQL server extended
stored procedure in C++ that called a C# in-process COM engine to
perform the searching work.

The search source code can be downloaded here, and the code changes to .TEXT are illustrated below.


RobGarrett.Com.BlogSearch.dll – Extended Stored Procedure
.

Anyone who tells you that Extended Stored Procedures has written a few of them before. ESP’s are typically written in
native C++ and can be fiddly to
get right. Debugging them can be a real pain
because SQL Server retains
a handle to ESP DLLs when loaded and will not let go of them until the
server
is stopped, which means pulling down SQL Server for each new compile
step after debugging. Fortunately, I had written an ESP before and
knew what I was doing, so I installed a local copy of SQL Server, complete with .TEXT blog
database, which I could tear down as often as I liked.

Extended Stored Procedures work much like regular stored procedures,
they usually involve a query to data tables in the database and return or
update a set of rows before completing. ESP’s are stored and executed as
DLLs, whereas regular stored procedures are stored and compiled with
the database. In this case, I wanted an ESP that would return me a list
of post identifiers from the blog_Content table, which matched
a passed search term. My ESP is passed the blog identifier
(each hosted blog in .TEXT has a unique identifier) along with the
search term.

Once passed a valid search term and blog ID, my ESP will search the
database for all posts containing the search term in the title or text
body. Of those posts returned, the ESP will search the title and body
looking for the search term in text that is not contained in tags
braces
(‘<‘ and ‘>’) – which is where the C# in-process COM DLL comes
in. When my algorithm is satisfied that the post contains the search
term in viewable text it returns the ID of the post as a data row,
otherwise it skips the post and moves on to the next.

I could have written the entire search operation in the ESP, but chose to call
out to a C# in-proc COM server to perform the physical database query and text searching, for a couple of  reasons:

* Since all
ESP code runs in the process of SQL Server, any bad code will cause the
server to shutdown – rendering my blogs off line. It’s very easy to
write unintentional bugs in C++ and to do bad things with memory pointers. C# provides more of a safety
net, the framework takes better care of memory management.

* Executing SQL queries
against the database in C++ requires using DBLIB (if not using a third
party DAL or ODBC library dependency), which can make code complicated and prone to
crash, when done wrong. C# provides much nicer access to the database through ADO.NET.

Calling out to a C# assembly required some work – I provided a COM
interface wrapper around the assembly using .NET COM Interop and then
called the assembly using IDispatch (more on calling C# from COM in a
later post).

RobGarrett.Com.BlogSearchCom.dll – In-Proc COM DLL.

The code in this DLL was real easy. I used ADO.NET to connect with SQL
Server and execute a stored procedure to return the title, body
and ID of blog posts that contained a search term (anywhere in the
post). I filter each post returned from the executed stored procedure,
based on whether the search term was found in
side HTML tag braces of the title or body text or not. If the search
term exists my algorithm appends the post ID to a comma separated list,
which is returned to the ESP and converted to a set of row data results.

In addition to the search filtering code, I also included code to self
register the assembly as a COM component in the registry, add  the
assembly to the GAC and create a type library. This code is executed
when when passing the assembly
path as an argument to InstallUtil.exe.

* After compiling both BlogSearch and BlogSearchCom DLLs, copy them to the Binn directory of the SQL Server installation.
* Run InstallUtil.exe RobGarrett.Com.BlogSearchCom.dll
* Add a new extended stored procedure to the master database, called
xp_BlogSearch, with path to RobGarrett.Com.BlogSearch.dll in the SQL
Server Binn directory.
* Give the database user running .TEXT, execute permission on the ESP just created.

The in-proc DLL requires a registry entry to access to the .TEXT database:

* Create the registry entry: HKEY_LOCAL_MACHINESOFTWARERobGarrett.ComDotTextWeb.
* Add the string value ConnectionStr to the above key. Set the value of
the connection string to the same as that used by .TEXT, in the
Web.config file.

blog_SimpleSearch – Called from In-Proc DLL
.

This stored procedure searches the posts and comments for entries
containing the search term anywhere in the text body or title. This SP
also searches comments and returns the ID of the original post. Add it
to the .TEXT database by executing the following SQL:

CREATE PROC dbo.blog_SimpleSearch

(

 @SearchTerm NVARCHAR(50),

 @BlogID INT

)

AS

 

CREATE TABLE #Results (ID INT)

INSERT INTO #Results

SELECT blog_Content.ID FROM blog_Content

WHERE blog_Content.PostType=1 AND blog_Content.BlogID = @BlogID AND blog_Content.PostConfig & 1 = 1

      AND (blog_Content.Title LIKE ‘%’ + @SearchTerm + ‘%’ OR

           blog_Content.Text LIKE ‘%’ + @SearchTerm + ‘%’)

INSERT INTO #Results

SELECT blog_Content.ParentID FROM blog_Content

WHERE blog_Content.PostType=3 AND blog_Content.BlogID = @BlogID AND blog_Content.PostConfig & 1 = 1

      AND (blog_Content.Title LIKE ‘%’ + @SearchTerm + ‘%’ OR

           blog_Content.Text LIKE ‘%’ + @SearchTerm + ‘%’) AND blog_Content.ParentID NOT IN

(SELECT
DISTINCT
ID FROM #Results)

 

SELECT blog_Content.ID,
blog_Content.Title, blog_Content.Text FROM

#Results, blog_Content where #Results.ID = blog_Content.ID

ORDER BY blog_Content.DateAdded DESC

 

DROP TABLE #Results


blog_Search – Called from .TEXT
.

This stored procedure calls the ESP and is called from .TEXT, add it to the .TEXT database by executing the following SQL:

CREATE PROC dbo.blog_Search
— ”,0

(

      @SearchTerm NVARCHAR(50),

      @BlogID INT

)

AS

 

CREATE TABLE #results (ID INT)

INSERT INTO #results (ID)

EXEC master..xp_BlogSearch
@SearchTerm, @BlogID

 

SELECT blog_Content.BlogID,
blog_Content.[ID], blog_Content.Title, blog_Content.DateAdded,
blog_Content.[Text], blog_Content.[Description],

blog_Content.SourceUrl,
blog_Content.PostType, blog_Content.Author, blog_Content.Email,
blog_Content.SourceName, blog_Content.DateUpdated, blog_Content.TitleUrl,

blog_Content.FeedBackCount,
blog_Content.ParentID, blog_Content.PostConfig,

blog_Content.EntryName FROM blog_Content, #results

WHERE blog_Content.PostType=1 AND blog_Content.ID = #results.ID AND blog_Content.PostConfig & 1 = 1

AND blog_Content.BlogID =
@BlogID

ORDER BY blog_Content.DateAdded DESC

 

DROP TABLE #results

Changes to .TEXT.

The following changes are required to be made to the .TEXT source code, I am using version  0.95.2004.102 as my baseline.

Add the following code to the Cacher class in Dottext.CommonDataCacher.cs:

private static readonly string EntrySearchKey =
“EntrySearch:Terms{0}Blog{1}”;
public static EntryCollection GetSearch(string searchTerms, CacheTime ct, HttpContext
context)
{
    string key = string.Format(EntrySearchKey,searchTerms.Replace(”
“, “_”),BlogID(context));
    EntryCollection search = null;
    try
    {
        search = (EntryCollection)context.Cache[key];
    }
    catch
(InvalidCastException)
    {
        // Not sure why
it’s doing this, but what the hell.

        search = null;
    }
    if(search == null)
    {
        search =
Entries.GetPostsBySearch(searchTerms);
        if(search != null)
        {
            Cacher.CacherCache(key,context,searchTerms,ct);
        }
    }
    return search;
}

Add the following code to DataDTOProvider class in Dottext.FrameworkDataDataDTOProvider.cs:

public EntryCollection
GetEntriesBySearch(string searchTerms)
{
    IDataReader reader = DbProvider.Instance().GetPostsBySearch(searchTerms);
    try
    {
        EntryCollection ec =
DataHelper.LoadEntryCollection(reader);
        return ec;
    }
    finally   
    {
        reader.Close();
    }
}

Add the following code to the IDbProvider interface in Dottext.FrameworkDataIDbProvider.cs:

IDataReader
GetPostsBySearch(string searchTerms);

Add the following code to the IDTOProvider interface in Dottext.FrameworkDataIDTOProvider.cs:

EntryCollection
GetEntriesBySearch(string searchTerms);

Add the following code to the SqlDataProvider class in Dottext.FrameworkDataSqlDataProvider.cs:

public IDataReader
GetPostsBySearch(string searchTerms)
{
    SqlParameter[] p =
    {
         
SqlHelper.MakeInParam(“@SearchTerm”,SqlDbType.NVarChar,50,searchTerms),BlogIDParam};
          return GetReader(“blog_Search”,p);   

        }
}

Add the following code to the Entries class in Dottext.FrameworkEntries.cs::

public static
EntryCollection GetPostsBySearch(string
searchTerms)
{
    return
DTOProvider.Instance().GetEntriesBySearch(searchTerms);
}

Create a Search.ascx file in each of the skin directories in DottextWebSkins, with the following code:

<%@ Control Language=”c#”
AutoEventWireup=”false”
Inherits=”Dottext.Web.UI.Controls.Search” %>
<%@ Register TagPrefix=”uc1″
TagName=”EntryList” Src=”EntryList.ascx” %>

Create Search.cs in DottextWebUIControls with the following code:

namespace Dottext.Web.UI.Controls
{
      using System;
      using Dottext.Common.Data;

      ///
     
/// Search Control.
     
///
     
public  class Search
: Dottext.Web.UI.Controls.BaseControl
      {          
            protected
Dottext.Web.UI.Controls.EntryList Results; 
            protected override void
OnLoad(EventArgs e)
            {
                  base.OnLoad (e);
                
                  string searchTerms
= Request[“q”];
                  if (null != searchTerms &&
searchTerms.Trim().Length > 0)
                  {
                       
Results.EntryListItems =
Cacher.GetSearch(searchTerms,CacheTime.Short,Context);
                       
Results.EntryListTitle = string.Format(“Search Results for {0} “,
searchTerms.Trim());
                       
Dottext.Web.UI.Globals.SetTitle(“Blog
Search Results”, Context);
                  }
            }
      }

Add the following code to the UIText class in DottextWebUIUIText.cs:

public static
string Search
{
    get
   
{
        return
GetSafeConfig(“Search”,”Search”);
    }
}

Add the following line to the Web.config file in the HttpConfigurationHttpHandlers section:

<
HttpHandler pattern = “^(?:/w+/(w|s|.)+/search.aspx)$” controls=”Search.ascx” />

* Recompile the complete .TEXT solution.
* Replace the installation on your server with the binaries and content files from this new hybrid version.

Make the following changes to the CSS and
New/Announcements options in your blog admin. This is a little naughty,
but it works, and it’s easy to reconfigure the search box without
making code future code changes.

CSS:

.BlogSearch
{
    border-style: solid;
    border-width: 1px;
    border-color: #1649B0;
    width: 100px;
}

News/Announcements:
(Change the URL in
the redirect function to reflect the path in your blog – search.aspx
will not exists, an Http handler kicks in when this URL is requested).

<H3>Search</H3>

<table cellpadding=”5″ cellspacing=”2″ width=”100%” ID=”Table1″>

<tr>

<td align=”center” valign=”middle”>

<input class=”BlogSearch” type=”text” name=”searchBox” value=”” ID=”searchBox” onkeypress=”return KeyPress(event);” maxlength=”50″> 

<input type=”button” value=”Go” onclick=”blogSearch();” ID=”Button1″ NAME=”Button1″></td></tr></table>

<script type=”text/javascript”>

function KeyPress(evt)

{

var keyCode;

if (evt)

keyCode = evt.keyCode ? evt.keyCode : evt.which;

else if (window.event)

keyCode = window.event.keyCode;

if (keyCode == 13)

{ blogSearch(); return false; }

else

return true;

}

function blogSearch()

{

var input = document.getElementById(‘searchBox’);

redirect(input.value);

}

function redirect(terms)

{

if (Trim(terms).length > 0)

window.location = ‘/blogs/rant/search.aspx?q=’ + terms;

}

function Trim(s)

{

// Remove leading spaces and carriage returns

while ((s.substring(0,1) == ‘ ‘) || (s.substring(0,1) == ‘n’) || (s.substring(0,1) == ‘r’))

{

s = s.substring(1,s.length);

}

// Remove trailing spaces and carriage returns

while ((s.substring(s.length-1,s.length) == ‘ ‘) || (s.substring(s.length-1,s.length) == ‘n’) || (s.substring(s.length-1,s.length) == ‘r’))

{

s = s.substring(0,s.length-1);

}

return s;

}

</script>

——-

That about covers it, this solution is not
the most elegant, but it works. Feel free to drop me a comment or email
me with questions or better suggestions.

2 thoughts on “.TEXT Search – How To

Comments are closed.