Pre-Search Facets in MOSS 2007

Search facets offer a powerful entry point into data exploration, especially in cases where data is categorized or tagged effectively.  With modern search and content retrieval mechanisms, such as the search engine in MOSS (and the new FAST search engine for SharePoint), the traditional method of browsing for content in SharePoint using static navigation hierarchies can give way to a whole new approach.

The new thinking behind content storage is to put all artifacts in one large bucket, tag the artifacts, and then leverage search and facets to surface relevant content.  Think about how Google revolutionized mail by discarding the folder structure approach in favor of a search and label paradigm.

Anyone who has played with faceted search in SharePoint probably knows that, aside from commercial tools like BA Insight, the only real free option is the Codeplex Faceted Additions.

The Codeplex offering assumes “post” search faceting, in that the web parts determine the relevant facet headings and counts based on the currently executed result set.  This approach works well for filtering search results and allowing users to drill down with restricted queries, but what about pre-facet browsing, similar to the functionality on sites like Best Buy?

Here is the problem – to provide the user with a dynamic tree view of hierarchical data based on facet categorization, the hierarchy generation method needs to know about all potential facet values ahead of time.  Take the following example:

An organization tags all its documents with a document type and department.  Let’s assume we wanted to provide a dynamic list of departments, from which the user could choose, and then a list of document types available for the selected department.  After selecting the document type, we’d like the user to see all documents of the selected type that originated from the selected department.

Aside from issuing a general search, and then filtering the result set by department and document type, the Codeplex faceted search web parts do not appear to offer a mechanism to provide dynamic table-of-contents-like behavior.
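To make the scenario concrete, here is a minimal sketch of the two-level grouping the tree view would need: departments at the top, document types underneath, each with a count.  The FacetTree class and the sample rows are made up for illustration; in practice the (department, document type) pairs would have to come from the search index, and nothing here touches the SharePoint API.

```csharp
using System;
using System.Collections.Generic;

namespace SearchFacets
{
    /// <summary>
    /// Hypothetical helper: groups (department, document type) pairs
    /// into a sorted two-level tree with a count per document type.
    /// </summary>
    public static class FacetTree
    {
        public static SortedDictionary<string, SortedDictionary<string, int>> Build(
            IEnumerable<KeyValuePair<string, string>> rows)
        {
            var tree = new SortedDictionary<string, SortedDictionary<string, int>>();
            foreach (KeyValuePair<string, string> row in rows)
            {
                // row.Key is the department, row.Value is the document type.
                SortedDictionary<string, int> types;
                if (!tree.TryGetValue(row.Key, out types))
                {
                    types = new SortedDictionary<string, int>();
                    tree.Add(row.Key, types);
                }

                int count;
                types.TryGetValue(row.Value, out count); // count is 0 when the type is new
                types[row.Value] = count + 1;
            }
            return tree;
        }
    }

    /// <summary>
    /// Demo with invented sample data.
    /// </summary>
    public static class FacetTreeDemo
    {
        public static void Main()
        {
            var rows = new List<KeyValuePair<string, string>>
            {
                new KeyValuePair<string, string>("Finance", "Invoice"),
                new KeyValuePair<string, string>("Finance", "Invoice"),
                new KeyValuePair<string, string>("Finance", "Report"),
                new KeyValuePair<string, string>("HR", "Policy")
            };

            // Print each department followed by its document types and counts.
            foreach (var department in FacetTree.Build(rows))
            {
                Console.WriteLine(department.Key);
                foreach (var type in department.Value)
                    Console.WriteLine("  {0} ({1})", type.Key, type.Value);
            }
        }
    }
}
```

The sorted dictionaries are just a convenience so the tree prints in a stable order; any grouping structure would do.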

So I got to thinking – search facets in MOSS are no more than managed properties that exist in the search index.  Surely the object model must give me a way to query distinct values of a given managed property?  It turns out that you can query the search API for this information, and with a little code magic you can obtain the results desired:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Data;
using Microsoft.Office.Server.Search;
using Microsoft.SharePoint;
using Microsoft.Office.Server.Search.Query;

namespace SearchFacets
{
    /// <summary>
    /// Faceted search querying.
    /// </summary>
    class Program
    {
        static readonly string SRCURL = "http://server/";

        /// <summary>
        /// Entry point.
        /// </summary>
        /// <param name="args"></param>
        static void Main(string[] args)
        {
            using (SPSite srcSite = new SPSite(SRCURL))
            {
                // The inner double quotes around SCOPE must be escaped inside the C# string literal.
                string query = "SELECT DocumentType FROM Scope() WHERE \"SCOPE\" = 'My Scope Documents'";
                string[] values = GetDistinctSearchResults(srcSite, query, 1000);
                foreach (string s in values)
                    Console.WriteLine(s);
            }
        }

        /// <summary>
        /// Get some search results using full text.
        /// </summary>
        /// <param name="context">Site.</param>
        /// <param name="searchQuery">Query.</param>
        /// <param name="searchLimit">Limit results.</param>
        /// <returns>Results.</returns>
        static string[] GetDistinctSearchResults(SPSite context, string searchQuery, int searchLimit)
        {
            using (var fullTextQuery = new FullTextSqlQuery(context))
            {
                fullTextQuery.ResultTypes = ResultType.RelevantResults;
                fullTextQuery.QueryText = searchQuery;
                fullTextQuery.KeywordInclusion = KeywordInclusion.AnyKeyword;
                fullTextQuery.EnableStemming = false;
                fullTextQuery.TrimDuplicates = false;
                fullTextQuery.RowLimit = searchLimit;

                ResultTableCollection resultsCollection = fullTextQuery.Execute();
                ResultTable resultsTable = resultsCollection[ResultType.RelevantResults];
                return ReturnDistinct(resultsTable);
            }
        }

        /// <summary>
        /// Return distinct list.
        /// </summary>
        /// <param name="rtWins">Result set.</param>
        /// <returns>Distinct values.</returns>
        static string[] ReturnDistinct(ResultTable rtWins)
        {
            DataTable dtWins = null;
            Dictionary<String, int> pairs = new Dictionary<string, int>();
            List<String> lstWins = new List<string>();
            dtWins = new DataTable("dtWINS");
            dtWins.Load(rtWins);

            foreach (DataRow drWin in dtWins.Rows)
            {
                string fieldName = drWin[0].ToString();
                if (pairs.ContainsKey(fieldName))
                    pairs[fieldName]++;
                else
                    pairs.Add(fieldName, 1); // first occurrence counts as one hit
            }

            foreach (KeyValuePair<String, int> pair in pairs)
                lstWins.Add(String.Format("{0} ({1})", pair.Key, pair.Value));
            return lstWins.ToArray();
        }
    }
}

You might be thinking “Hey, you’re just executing a search”, and you’d be right.  Since the facet values (managed property values map to crawled properties) live in the search indexes we have no choice but to perform a search to get at these values.

The key in the above code is to limit the number of search results returned (1000 in the above case) and take advantage of relevancy.  In all likelihood, results beyond the first 1000 hits will not produce facet values that map to many results of value to the end user.

Clearly, the above code is just a starting point and has potential for many improvements, such as caching, making use of parent-child relationships, etc., but you get the idea…
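As a rough idea of what the caching improvement might look like, here is a minimal sketch of a time-based cache keyed by query text.  FacetCache and its members are hypothetical names, not part of the SharePoint object model, and the loader delegate in the demo merely stands in for a call to GetDistinctSearchResults.

```csharp
using System;
using System.Collections.Generic;

namespace SearchFacets
{
    /// <summary>
    /// Hypothetical in-memory cache for facet values, keyed by query text.
    /// </summary>
    public class FacetCache
    {
        private class Entry
        {
            public string[] Values;
            public DateTime ExpiresUtc;
        }

        private readonly Dictionary<string, Entry> entries = new Dictionary<string, Entry>();
        private readonly object sync = new object();
        private readonly TimeSpan lifetime;

        public FacetCache(TimeSpan lifetime)
        {
            this.lifetime = lifetime;
        }

        /// <summary>
        /// Returns cached values for the query, calling the loader only
        /// when the entry is missing or has expired.
        /// </summary>
        public string[] GetOrLoad(string query, Func<string, string[]> loader)
        {
            // The lock is held while loading, so concurrent callers wait
            // rather than issuing the same search twice.
            lock (sync)
            {
                Entry entry;
                if (entries.TryGetValue(query, out entry) && entry.ExpiresUtc > DateTime.UtcNow)
                    return entry.Values;

                entry = new Entry();
                entry.Values = loader(query);
                entry.ExpiresUtc = DateTime.UtcNow.Add(lifetime);
                entries[query] = entry;
                return entry.Values;
            }
        }
    }

    /// <summary>
    /// Demo: the second lookup is served from the cache.
    /// </summary>
    public static class FacetCacheDemo
    {
        public static void Main()
        {
            int searches = 0;
            var cache = new FacetCache(TimeSpan.FromMinutes(10));

            // Stand-in for the real search call; returns invented values.
            Func<string, string[]> loader = delegate(string q)
            {
                searches++;
                return new string[] { "Invoice (2)", "Report (1)" };
            };

            cache.GetOrLoad("sample query", loader);
            cache.GetOrLoad("sample query", loader);
            Console.WriteLine("Searches executed: {0}", searches);
        }
    }
}
```

The ten-minute lifetime is arbitrary; a sensible value would depend on how often the index is crawled.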

Site Collection Restore Error

If you ever run into the following error when performing a site collection restore, via STSADM, and you know space is not an issue, try the steps below before diving deep into troubleshooting mode:

The site collection could not be restored. If this problem persists, please make sure the content databases are available and have sufficient free space.

  • Stop and start the SharePoint Timer Service
  • IISRESET
  • Restore to a new content database

Balsamiq Mockups makes for easy UI design

We’ve been using Balsamiq Mockups for some time at my day job, but until recently I hadn’t used the tool heavily on any of the projects I’d been working on.  Today I needed to sketch out an example search results page for a project I am architecting in SharePoint.

Think of Balsamiq Mockups as Visio for the layman – it’s light, easy to use, not cluttered with unnecessary functionality, and runs on Adobe AIR.  The presentation is something akin to what you might mock up on a whiteboard in the office and the end result, although a signature of the Balsamiq development team, is crisp and ideal for any document deliverable.

What I like about the tool most is that I was able to complete a mockup, which is functional for discussion purposes and doubles for architecture documentation, and it took me a fraction of the time I’d have spent in Visio – this gave me time to write this blog post.

At a price of $79, the product is a steal for the time it’ll save you.

Check out my finished page mockup:

Global Search Results

Top SharePoint

Check out http://www.topsharepoint.com for the latest and greatest list of sites implemented on the SharePoint platform. 

A number of the sites listed were implemented by my current employer – Portal Solutions – who are also listed because our site is hosted in SharePoint.  A couple of the sites were managed and architected by yours truly 🙂

“TopSharepoint.com showcases some of the world’s best designed SharePoint based web sites from around the world. We carefully select SharePoint web sites based on their usability, design, creativity and ability to incorporate modern techniques. Anyone can submit a SharePoint web site for free and the only requirements we ask for is that your web site is built on SharePoint platform, well designed and original. So, if you developed, or found a SharePoint web site that might meet the above requirements we want to know about it! Please go ahead and submit your website to the TopSharepoint showcase gallery.”

Here’s my offspring, listed in the top 5:

http://www.topsharepoint.com/?s=conservation

Working from the Cloud

My employer uses laptops pretty much exclusively (as I do at home) for all employees, with the exception of a few, to promote flexibility and portability in our work environment.  I made the mistake yesterday of forgetting my laptop and turning up to the office with no computer to work with.  I was debating turning my car around (I got all the way to the office) and heading back home when I got to thinking.  In today’s connected environment, did I really have a dependency on a single computer to work?  The short answer is no.

When I thought about it some more, having no laptop didn’t mean I couldn’t continue my work day as normal.  I spoke nicely to one of our IT people and asked for a temporary laptop for the morning, hooked it up to the network, logged in, and continued working as normal – how?

The answer is in the tools that I use.  Granted, I’ve moved on from localized development and no longer require a host of specialized tools to work, which makes life easier. Also, I’ve always had a healthy paranoia about keeping work files on portable devices that may inadvertently fall in the parking lot and break into a million pieces, so I wove redundancy into my personal workspace some time ago, meaning I was already in great shape for using another computer for work.

With Internet speeds getting faster and online storage becoming cheaper, there is a definite shift in mentality to store files in the cloud.  I realized this about a year ago.  The following is a list of applications and approaches I use to enable portability in my day-to-day work:

Hosted Virtual Machines

My job involves SharePoint development, so I cannot escape the need for a development environment.  Many of us still develop on Virtual Machine images using portable devices.  Fortunately, my employer saw this as a non-scalable solution and set up virtual servers for all our development.  Our IT infrastructure includes backups, and I can access the servers from any location using secure VPN.

Outlook Web Access and Gmail

All my company email sits on an Exchange server, which comes complete with a web client for accessing my email from any web browser.  If I insist on the thick client, Outlook is installed on most of the company laptops and configuration of my account is a five-minute exercise.  I use Gmail for all my personal mail and never have to worry about losing my email or servers going down.  On the rare occasion that the company Exchange server goes down, I have my personal email to fall back on if I need to (who doesn’t?).

GTD with ClearContext

I use ClearContext to arrange my inbox within Outlook.  CC uses folders within my inbox, so I don’t have to worry about carting around backups of my settings.  If I choose not to install CC on a loaner laptop, I can still work with my email because filed messages live in Exchange folders and I can put aside new inbox email for filing later when I get back to my laptop – left at home.

Evernote

I am never ever caught out talking to a client without notes from previous meetings.  I know a lot of people like to use OneNote, but if you use EN your notes are available on the web, phone, or any other computer (Windows and Mac) on which you choose to install the application.  My notes synchronize in a few minutes and I’m up and running.

Dropbox

A well thought out product that synchronizes files across computers and in the cloud.  I use this application on all my computers, and the UI is a simple folder on my desktop – I drag all my files to the special Dropbox folder and have peace of mind that my files are available on all other computers, or via the web interface.

IM

Using both Communicator (corporate) and MSN (personal), I am able to stay in touch with clients, colleagues and friends.  Both applications install in minutes and require no setup for me to get back online.

SharePoint and Colligo Contributor

My work primarily involves SharePoint, so I would be remiss if I didn’t eat my own dog food.  My employer has a nice extranet where I can always access client work in progress, RFP work, etc. – it’s policy that all work is stored on our extranet.  With Colligo Contributor – an application that works much like Groove, only better – I can keep a cached version of files on any PC, so if I lose network access I can carry on working on a local copy of my files stored in SharePoint.

Pandora

Say what?

A work day in the office is a little dull if I cannot listen to my favorite tunes whilst working.  Using Pandora – an Internet streaming radio service – I can continue listening from any web enabled computer.

X-Lite

X-Lite is a SIP VOIP client, and my employer uses VOIP.  So if I want to take a call from Starbucks, the road (using mobile broadband), abroad, or a client office, it’s no big deal.  The recipient of my call doesn’t know I’m not calling from the office.

So… Flexibility in a nutshell.  If you’ve not done so already, it’s time to cut the cord from your working computer and get into a portable mentality.  You’ll need support from your employer (something to consider asking about in your next job interview), but if you can convince them and it’ll make you more productive – it’s worth any overhead.

SharePoint 2010 Posts Coming

It’s been quiet on my blog for some time, the usual excuses blah blah.  The good news is that I intend to post plenty about SharePoint Server 2010, which was recently released to partners as a private beta. 

I am currently honoring my NDA with Microsoft, so no posts until the beta goes public later in the year, but that’s not to say I won’t be busy writing in the background ready for the announcement date.

Watch my blog for more details…

Ctrl-F Crash in Visual Studio

I have ReSharper 4 installed in Visual Studio 2008.  On 64-bit systems the Ctrl-F functionality crashes the application, which has been driving me nuts.  My colleague Anand posted a solution to our company intranet, so I stole his post for my blog for future reference.

Thanks Anand 😉

“Visual Studio might crash when using the Find feature on a 64-bit system. This MSDN article explains the issue: KB947841. I uncovered this problem after installing ReSharper on a 64-bit system; however, it is not related to ReSharper. Installing this add-in simply uncovers this bug in Visual Studio. You need to request this hotfix.”

Blog moved to WordPress.com

On a whim in the middle of yesterday evening, I decided to move my blog from Community Server 2.1 to WordPress.com.

Please update syndication URL to http://blog.robgarrett.com/feed/

Community Server and the team at Data Research Group, where my blog was hosted, have been great, and I thank both DRG for their support and free hosting and the Community Server guys for the wonderful platform I’ve been using for the past 3-4 years.

My decision to move last night wasn’t an agonizing one (hence “whim”) and had nothing to do with the CS platform or hosting; I made it because I am moving my life in the direction of “less maintenance for Rob.”

I chose to move RobGarrett.com to WordPress.com because WP offers a clean, slick, easy to use interface – and the best part, I don’t have to maintain it.  It’s taken me a while to comprehend that the more services one is responsible for, the more headaches one has to deal with (not that my blog was ever a huge burden).  WP affords me the ability to concentrate on blog posting, and never do I have to worry about backing up data, checking in on site errors, or making changes in line with infrastructure changes at my hosting org.

I did consider several other blog engines, especially SharePoint, since this is the focus of my career, but settled on WP because it was free, they offer 3GB of space, and configuration is simple.

The following is a list of pros and cons I have evaluated in the 24 hours since I moved to WP:

Pros

-  Easy to use administration interface
-  Stock templates – get bored with look and feel, I can just change it
-  iPhone application available
-  Never going away (hopefully), infrastructure maintained by WP team
-  Never have to worry about backups again
-  3GB of storage space (can pay for additional)
-  Stable platform, should never error out

Cons

-  Limited customization ability
-  No Google ads
-  No Google analytics
-  Have to pay extra for custom CSS

Moving my blog posts from CS 2.1 wasn’t as painful as I thought it would be.  I followed a great post from Rob Walling, which led me to use the CS BLOGML export tool from CodePlex, to export all my posts to BLOGML.  Once I exported my content, I was then able to massage the content, convert to WordPress.com WXR format using Damien G’s XSLT (and Visual Studio 2008), and then import the content directly into WP – presto, posts and comments.

The above process did require some hand-holding.  Trawling the web, I found some claims of tools developed to do the complete migration in one step, but never found a so-called solution that worked.  With some knowledge of ASP.NET (debugging the CS export tool) and XSLT (for the WXR conversion) I was able to weed out posts causing difficulty in the conversion process and pull over a clean set.

I’m not sure if WP has fixed the importer recently, but I read many exasperated complaints about the WXR importer timing out.  I was able to import 300+ posts (about a 2MB file) with no issue.

So… enjoy the new location, and send me feedback about anything you see broken, something you don’t like, or praise for the move 😉