Sunday, 28 November 2010

FAST Search for SharePoint 2010 JDBC Database Connector

Searching Database Content with Fast Search for SharePoint 2010 – Part 2
FAST Search for SharePoint 2010 JDBC Database Connector

FAST Search has a JDBC Database Connector that can be used to crawl and search across data from a database. As explained in Part 1, this connector is a port from FAST ESP. You can also use the BCS with FAST Search to crawl database content, which will be detailed in Part 3.

This examples was produced on the 2010 Information Worker Demonstration and Evaluation Virtual Machine (RTM) downloaded from - http://www.microsoft.com/downloads/en/details.aspx?FamilyID=751fa0d1-356c-4002-9c60-d539896c66ce&displaylang=en

If you don’t have your own FAST environment setup, and you want to run through these examples, download the IW Demo VM as it contains a completely functioning FAST Search server.

The IW VM also has a series of SQL Databases that you can have a play with. To work through these examples I’ve selected the Contoso_Retail_DW database, this database contains a table called DimProduct with 2517 rows of product data. The Products fall into the following main categories:

  • Audio
  • TV and Video
  • Computers
  • Cameras and camcorders
  • Cell phones
  • Music, Movies and Audio Books
  • Games and Toys
  • Home Appliances
For search purposes I always like to create a denormalised SQL View or SQL Statement. The following SQL Statement only returns the columns of data I want to use for this demonstration, it’s typically worth creating a stored procedure or SQL View at this point.

SELECT
p.ProductKey,
p.ProductName,
p.ProductDescription,
p.BrandName,
p.ClassName,
p.ColorName,
p.UnitPrice,
psc.ProductSubcategoryName,
pc.ProductCategoryDescription
FROM Contoso_Retail_DW.dbo.DimProduct p
LEFT OUTER JOIN Contoso_Retail_DW.dbo.DimProductSubcategory psc
on p.ProductSubcategoryKey = psc.ProductSubcategoryKey
LEFT OUTER JOIN Contoso_Retail_DW.dbo.DimProductCategory pc
on psc.ProductCategoryKey = pc.ProductCategoryKey

The data is quite handy for demoing FAST Search capabilities, as we should be able to setup some nice Refiners for Brand, Class, Colour and Price.

So now we have our SQL Statement, the next step involves getting FAST to execute this SQL and then feed it into its search Index so we can search over it.

Create a New Content Collection

This step isn’t mandatory, you can choose to use the default sp content collection that is setup by default in FAST Search for SharePoint 2010. When crawling content outside of SharePoint using a different connector, I like to create a new Content Collection, as you can clear this Content Collection and still leave the default content collection intact. You may have millions of items already crawled in the sp content collection.

To create a new Content Collection use the New-FASTSearchContentCollectionPowershell cmdlet in the FAST Search Powershell console.
Note: FAST Search has its own Powershell console for executing FAST Search related cmdlets, make sure you use the Microsoft FAST Search Server 2010 for SharePoint shell for all FAST Search related configuration.

PS C:\FASTSearch\bin> New-FASTSearchContentCollection -name ContosoProducts -Description "A collection for Contoso Retail Product content from the Contoso_Retail_DW SQL Server 2008 database"

Name : ContosoProducts

Created : 2010-11-20T23:06:59

Cleared : 2010-11-20T23:06:59

LastInput :

DocumentCount : 0

Description : A collection for Contoso Retail Product content from the Contoso_Retail_DW SQL Server 2008 database

Pipeline : Office14 (webcluster)



Note that the maximum name length for a Content Collection is only 16.

The next step involves creating a JDBC Connector configuration file. Navigate to the \etc folder in your FAST install directory. This is C:\FAST\etc in the IW VM. The etc folder contains a series of configuration files for configuring FAST Search, there are a few example files that can be used for connectors, the one we are after is called jdbctemplate.xml. Make a copy of this file and rename it, eg. ContosoProduct_jdbctemplate.xml.

Now edit the file in your XML editor of choice. Some of the settings you want to change are:

The connector can connect to a variety of database servers, enter the Microsoft SQL Server specific driver in the JDBCDriver setting.

JDBCDriver com.microsoft.sqlserver.jdbc.SQLServerDriver

Enter the DB Server and Database in the JDBCUrl setting

JDBCURL jdbc:sqlserver://DEMO2010A:1433;database=Contoso_Retail_DW

Enter the SQL Select statement for the JDBCSQL setting.

JDBCSQL

p.ProductKey,

p.ProductName,

p.ProductDescription,

p.Manufacturer,

p.BrandName,

p.ClassName,

p.ColorName,

p.UnitCost,

p.UnitPrice,

psc.ProductSubcategoryName,

pc.ProductCategoryDescription

FROM Contoso_Retail_DW.dbo.DimProduct p

LEFT OUTER JOIN Contoso_Retail_DW.dbo.DimProductSubcategory psc

on p.ProductSubcategoryKey = psc.ProductSubcategoryKey

LEFT OUTER JOIN Contoso_Retail_DW.dbo.DimProductCategory pc

on psc.ProductCategoryKey = pc.ProductCategoryKey

]]>


Now specify the primary key column to use

JDBCPrimaryKeyField ProductKey

and the content collection we created earlier.

Collection ContosoProducts

For testing purposes you can also make the connector output FAST XML files containing the data of the content crawled by setting ExportToXML to true.

ExportToXML True


Now run the connector by executing the jdbcconnector.bat file as follows:

PS C:\FASTSearch\bin> jdbcconnector start -f ../etc/ContosoProduct_jdbctemplate.xml

Copyright (c) Microsoft Corporation. All rights reserved.

23:46:03,516 INFO [JDBCConnector] Starting the connector!

23:46:03,523 INFO [JDBCConnector] Validating config.......

23:46:03,742 WARN [JDBCConnector] Empty parameter Input/JDBCPassword in config

file

23:46:03,926 INFO [JDBCConnector] Testing connections to external systems

23:46:03,961 WARN [JDBCConnector] Empty parameter Input/JDBCPassword in config

file

23:46:03,994 INFO [JDBCConnector] Checking if connections to source and target

work....

23:46:04,617 INFO [JDBCAdapter] Opened JDBC Connection

23:46:04,621 INFO [JDBCConnector] Connection made to source system

23:46:04,632 INFO [CCTKDocumentFeeder] Publisher :Initializing: com.fastsearch.

esp.cctk.publishers.CCTKDocumentFeeder

23:46:04,750 INFO [CCTKDocumentFeeder] Using Content Distributor(s): demo2010a.

contoso.com:13390

23:46:04,764 INFO [FASTXMLOutputHandler] Creating output Directory: ..\var\jdbc

connector\ContosoProducts\fastxml

23:46:05,121 INFO [CCTKDocumentFeeder] Initiated CCTK Document Feeder:

23:46:05,122 INFO [CCTKDocumentFeeder] FASTSearchSubmit/ActuallyPublish:true

23:46:05,123 INFO [CCTKDocumentFeeder] XMLExport/ExportToXML:true

23:46:05,125 INFO [CCTKDocumentFeeder] FASTSearchSubmit/Collection:ContosoProdu

cts

23:46:05,128 INFO [CCTKDocumentFeeder] FASTSearchSubmit/MaxActiveDocuments(MB):

10

23:46:05,130 INFO [CCTKDocumentFeeder] FASTSearchSubmit/MaxBatchSize(KB):1000

23:46:05,132 INFO [CCTKDocumentFeeder] FASTSearchSubmit/BatchSubmissionTimeoutS

ec:30

23:46:05,134 INFO [CCTKDocumentFeeder] FASTSearchSubmit/TimeOut:60

23:46:05,136 INFO [CCTKDocumentFeeder] FASTSearchSubmit/MaxDocsInBatch:100

23:46:05,138 INFO [CCTKDocumentFeeder] FASTSearchSubmit/MaxRetries:3

23:46:05,139 INFO [CCTKDocumentFeeder] FASTSearchSubmit/ContentDistributors:dem

o2010a.contoso.com:13390

23:46:05,143 INFO [CCTKDocumentFeeder] FASTSearchSubmit/CallbackType:Secured

23:46:05,145 INFO [JDBCConnector] Connection made to target system

23:46:05,147 INFO [JDBCConnector] Initializing.......

23:46:05,195 INFO [JDBCConnector] Starting crawl.......

23:46:05,199 INFO [JDBCConnector] Will start up publishers......

23:46:05,202 INFO [CCTKDocumentFeeder] Publisher :Initializing: com.fastsearch.

esp.cctk.publishers.CCTKDocumentFeeder

23:46:05,229 INFO [CCTKDocumentFeeder] Using Content Distributor(s): demo2010a.

contoso.com:13390

23:46:05,239 INFO [FASTXMLOutputHandler] Creating output Directory: ..\var\jdbc

connector\ContosoProducts\fastxml

23:46:05,338 INFO [CCTKDocumentFeeder] Initiated CCTK Document Feeder:

23:46:05,339 INFO [CCTKDocumentFeeder] FASTSearchSubmit/ActuallyPublish:true

23:46:05,341 INFO [CCTKDocumentFeeder] XMLExport/ExportToXML:true

23:46:05,342 INFO [CCTKDocumentFeeder] FASTSearchSubmit/Collection:ContosoProdu

cts

23:46:05,345 INFO [CCTKDocumentFeeder] FASTSearchSubmit/MaxActiveDocuments(MB):

5

23:46:05,348 INFO [CCTKDocumentFeeder] FASTSearchSubmit/MaxBatchSize(KB):1000

23:46:05,350 INFO [CCTKDocumentFeeder] FASTSearchSubmit/BatchSubmissionTimeoutS

ec:30

23:46:05,353 INFO [CCTKDocumentFeeder] FASTSearchSubmit/TimeOut:60

23:46:05,355 INFO [CCTKDocumentFeeder] FASTSearchSubmit/MaxDocsInBatch:100

23:46:05,356 INFO [CCTKDocumentFeeder] FASTSearchSubmit/MaxRetries:3

23:46:05,358 INFO [CCTKDocumentFeeder] FASTSearchSubmit/ContentDistributors:dem

o2010a.contoso.com:13390

23:46:05,360 INFO [CCTKDocumentFeeder] FASTSearchSubmit/CallbackType:Secured

23:46:05,362 INFO [JDBCConnector] Starting publisher thread 0

23:46:05,390 INFO [JDBCAdapter] Opened JDBC Connection

23:46:05,411 INFO [JDBCAdapter] Running PRESQL

23:46:05,413 INFO [JDBCAdapter] Finished running PRESQL

23:46:05,440 INFO [JDBCAdapter] Running SQL: SELECT

p.ProductKey,

p.ProductName,

p.ProductDescription,

p.Manufacturer,

p.BrandName,

p.ClassName,

p.ColorName,

p.UnitCost,

p.UnitPrice,

psc.ProductSubcategoryName,

pc.ProductCategoryDescription

FROM Contoso_Retail_DW.dbo.DimProduct p

LEFT OUTER JOIN Contoso_Retail_DW.dbo.DimProductSubcategory psc

on p.ProductSubcategoryKey = psc.ProductSubcategoryKey

LEFT OUTER JOIN Contoso_Retail_DW.dbo.DimProductCategory pc

on psc.ProductCategoryKey = pc.ProductCategoryKey

23:46:05,465 INFO [JDBCAdapter] The time taken to execute SQL is : 0m 0s

23:46:09,089 INFO [CCTKDocumentFeeder] Publisher instance: 0 published: 1000 (3

33.33334 docs/sec)

23:46:09,097 INFO [FASTXMLOutputHandler] Publishing 1000 documents to: fastxml5

969913899608785081.xml

23:46:15,557 INFO [CCTKDocumentFeeder] Publisher instance: 0 published: 2000 (2

00.0 docs/sec)

23:46:15,564 INFO [FASTXMLOutputHandler] Publishing 1000 documents to: fastxml7

360425706418517311.xml

23:46:21,197 INFO [JDBCAdapter] Running postSQL

23:46:21,199 INFO [JDBCAdapter] Finished running postSQL

23:46:21,201 INFO [JDBCAdapter] Closed JDBC connection

23:46:21,621 INFO [CCTKDocumentFeeder] Publisher 0: Last document in queue has

been read

23:46:21,622 INFO [CCTKDocumentFeeder] Publisher 0: Shutting down: com.fastsear

ch.esp.cctk.publishers.CCTKDocumentFeeder

23:46:21,625 INFO [CCTKDocumentFeeder] Waiting for FAST Search feeder to comple

te.....

23:47:45,411 INFO [JDBCConnector] All tasks done. Telling publishers to stop wa

iting for items....

23:50:19,078 INFO [CCTKDocumentFeeder] Feeding to FAST Search completed.....

23:50:19,376 INFO [FASTXMLOutputHandler] Publishing 517 documents to: fastxml65

8648051932195231.xml

23:50:19,657 INFO [CCTKDocumentFeeder] At shutdown, the following statistics ha

ve been collected:

23:50:19,660 INFO [CCTKDocumentFeeder] Total number of extracted documents...:2

517

23:50:19,663 INFO [CCTKDocumentFeeder] Rate of extracted documents...........:9

.890447

23:50:19,665 INFO [CCTKDocumentFeeder] Total number of published documents...:2

517

23:50:19,668 INFO [CCTKDocumentFeeder] Rate of published documents...........:9

.890252

23:50:19,671 INFO [CCTKDocumentFeeder] Total number of successful callbacks..:2

517

23:50:19,674 INFO [CCTKDocumentFeeder] Total number of failed callbacks......:0


23:51:05,415 INFO [JDBCConnector] Calling shutdown method on adapters

23:51:05,417 INFO [JDBCConnector] All adapters have completed

23:51:05,419 INFO [JDBCConnector] Stop listening thread has completed

23:51:05,422 INFO [JDBCConnector] Crawl finished

23:51:05,425 INFO [JDBCConnector] Finished running the connector!

PS C:\FASTSearch\bin>


That should be it folks, based on the output the crawling of the database has completed successfully. Now look for the Output folder setting in the XML file and inspect the output folder.

C:\FASTSearch\var\jdbcconnector\ContosoProducts\log
Will contain a log file with similar output to the Powershell console.

C:\FASTSearch\var\jdbcconnector\ContosoProducts\fastxml
Will contain the FIXML files with crawled content.


Inspecting these files shows the data being successfully crawled.

You can also check the content collection for documents by executing the Get-FASTContentCollection Powershell command

PS C:\FASTSearch\bin> Get-FASTSearchContentCollection -Name ContosoProducts


Name : ContosoProducts

Created : 2010-11-20T23:06:59

Cleared : 2010-11-20T23:06:59

LastInput : 11/20/2010 11:50:10 PM

DocumentCount : 2517

Description : A collection for Contoso Retail Product content from the Contos

o_Retail_DW SQL Server 2008 database

Pipeline : Office14 (webcluster)


On inspection of the ContosoProducts Content Collection we now have 2517 items in it, that’s great news!

Now lets perform a searches to see if the DB data is returned in a FAST Search Centre.





So it looks like our database data has been crawled and indexed, and it’s being returned in search results in a FAST Search Centre.

The job’s not done. The search experience could do with a bit of work:

The ProductKey primary key is being used for the Title field in search results.

The Search interface could do with Refiners

Search result URL targets are also of the following format http://intranet.contoso.com/search/Pages/2401 you could resolve this by adding column to your sql query to create the desired URL to a product display page. This is another benefit of the BCS which helps generate this for you.

Mapping Managed Properties to Crawled Properties.
To map properties and create refiners, go to FAST Query SSA Service Application in SharePoint Central Administration, Click on FAST Search Administration in the left hand menu, if you get an error at this stage, the central administration application pool account might not have FAST Administrative priveliges and the account needs to be added to the FASTSearchAdministrators group on the FAST Search Server.

First we need to map a crawled property to the Title managed property such as the Product Name. Find the Title managed property in the list and edit it.

Map a crawled property by clicking the 'add mapping' button.



If you select the crawled property category and perform a search it makes it easier to find the right property.

Now that the Title property is mapped, map the productdescription crawled property to the Description managed property. This will improve search relevance as the managed property has stemming enabled.

next we will create new Managed Properties for category, subcategory, brandname, colour & price.





Be sure to tick the Refiner Property and Deep Refiner options for these Managed Properties so you can use them as refiners.

After creating the properties and setting the mappings, execute the connector again for the mappings to be picked up.

The next step is to navigate back to your FAST search center. Refiners do not automatically show up in a search center, they need to be added to the refinement panel web part.

So edit the page and sitting in the Left Zone edit the properties of the Refinement Panel web part.



It's important to untick the 'use default configuration' option if editing these web parts for you changes to take place.







Edit the Filter Category Definition XML.

You can copy existing refiner definitions and changed the MappedProperty setting to your own. The following Category nodes were added to the XML and the page was checked in.



Now when searches are performed, refiners based on the Managed Properties and the Crawled Properties they've mapped to will display in search results.

Below are some test searches and screenshots of the custom Refiners in action!










Searching Database Content with Fast Search for SharePoint 2010 – Part 1

Searching Database Content with Fast Search for SharePoint 2010 – Part 1

The search story is much better in 2010 for SharePoint. There’s a new Search Engine available deploy with SharePoint Server 2010 – FAST Search for SharePoint 2010 (FS4SP). I really could have used FAST Search for SharePoint in my past Enterprise SharePoint Search projects. It produces significantly better relevant results than the MOSS 2007 search engine. It has many more options available for tweaking the search UI and search engine. It also has better options available for developers developing and extending custom search solutions.

Why is this quality search engine now available to deploy with SharePoint? Microsoft acquired FAST Search & Transfer and its FAST ESP product for $US 1.2 Billion in early 2008. This is a common Microsoft approach at getting ahead in a market, acquire an innovative company with a quality product. The same happened with MCMS 2002 (the forerunner to SharePoint WCM) which was based on a product gained from the NCompass Labs acquisition.

Microsoft Announces Offer to Acquire Fast Search & Transfer.

Microsoft Completes Tender Offer for FAST Search & Transfer

Artistitic re-enactment of the acquisition.

Since the FAST acquisition, Microsoft has had the time to bake in a version of FAST ESP into the next version of SharePoint, and it’s time to make use of this awesome new enhancement to the platform.

Options for Searching Database Content with FAST Search for SharePoint 2010

So you want to search custom SQL Server databases using FAST?

If you have never indexed anything other than SharePoint Sites with FS4SP, I recommend first checking out this page on TechNet:

Plan and design for content collection (FAST Search Server 2010 for SharePoint)

It gives a great summary of the options available to you, and links to further TechNet pages containing step-by-step details on connecting FAST to File Shares, Exchange, People Profiles, Web sites, Lotus Notes and more.

This page on TechNet lists two options for using FS4SP to crawl and search databases:

Business Data Catalog-based indexing connectors

  • Use if the preferred configuration method is using the Microsoft SharePoint Designer 2010.
  • Use when you want to use time stamp based change detection for incremental database crawls.
  • Use when the preferred operation method is using the Microsoft SharePoint Server 2010 Central Administration.
  • Use when you want to enable crawling based on the change log. This can be achieved by directly modifying the connector model file and creating a stored procedure in the database.

FAST Search database connector

  • Use when the preferred configuration method is using SQL queries.
  • Use when you want advanced data joining operation options through SQL queries.
  • Use when you want to use advanced incremental update features. FAST Search database connector uses checksum based change detection for incremental crawls if there is no update information available. The connector also supports time stamp based change detection and change detection based on update and delete flags.

That’s right, you’ve got not just one, but two approaches that you can use with FAST Search for SharePoint to make this possible. Why are there two ways to do this?

The preferred method of integrating database content and LOB systems with SharePoint 2010 is using the BCS. You can also use the BCS to crawl databases to use in SharePoint 2010 Search and FS4SP. It also provides extra goodies for use in SharePoint such as external lists. It also provides a nicer user interface to get the job done.

The second option available to use is a port of the FAST ESP JDBC database connector. FAST ESP had to have a way to connect to databases before a port of FAST ESP was implemented in SharePoint, it had the JDBC connector. In comparison to the BCS, the JDBC connector doesn’t have as nice a gui to setup the connection and crawling. It also requires installation of the 32 bit Java JRE, That’s right folks, you need to have a JVM on your FAST server to get the JDBC connector to work! A few other ‘FAST Search specific’ connectors include the web connector and Lotus Notes connector.

More on these 'FAST Search Specific' connectors:

Plan to deploy FAST Search specific connectors (FAST Search Server 2010 for SharePoint)



Parts 2 of this series details steps to get FS4SP to Index a SQL Server 2008 products database using the FAST Search JDBC Database Connector.

Part 3 of this series detail steps to get FS4SP to Index a SQL Server 2008 products database using the BCS Connector.

Part 4 Comparison of the BCS and FAST Search database connector. [stay tuned...]

Friday, 19 November 2010

Working on FAST Search for SharePoint 2010 - Watch this space

I've been busy on a number of FAST Search for SharePoint 2010 (FS4SP) projects recently and I'm planning on blogging about some of my experiences in the comming weeks and months, so stay tuned folks.

Monday, 14 June 2010

Perth SharePoint User Group Webcasts!


We've been recording our recent Perth SharePoint User Group meeting presentations.

Jeremy Thake is encoding and uploading these to the Internet for the world to see what we're getting up to in our user group here in Perth, Western Australia.

Upgrading from MOSS 2007 to SharePoint Server 2010 Sezai Komur

Build SharePoint 2010 Solutions with Visual Studio 2010 Michael Hanes

Annual Leave Form - InfoPath 2010 BCS SPD 2010 Workflow and SP2010 Matt Menezes

SharePoint 2007 and HP TRIM 7 Integration Overview Jeremy Thake

You can view a variety of other web casts on the SharePointDevWiki here: http://www.sharepointdevwiki.com/display/SPPodCasts/Home


Wednesday, 28 April 2010

My New Job at CSG

I've just started a new job as a SharePoint Solutions Architect working at CSG http://www.csg.com.au

CSG is a large Australian IT company and I'm looking forward to implementing some big SharePoint Server 2010 projects here.

I'll also try and focus more time on blogging and community stuff which I've been neglecting in recent months.

Wish me luck!

Tuesday, 15 September 2009

Perth SharePoint User Group - September Presentation

I presented at the Perth SharePoint User Group September meeting.

The PowerPoint for the presentation can be downloaded here.

It was a blast! thanks to everyone who attended and the questions about SPDocGen, Database Management, Creating Explicit Managed Paths, Database Autogrowth settings and SharePoint Object Model Disposal.

Thursday, 3 September 2009

SPDocGen - SharePoint Documentation Generator - Full Source Code Uploaded

I've done some minor tweaks to SPDocGen, added some console output and a little error handling.

I've also uploaded the full source code for the solution, so feel free to modify it as you need.

SPDocGen - SharePoint Documentation Generator

Here's some ideas I have for further improving and building on the solution:

1. Implement command line parameters to let users specify what is output.
2. Create a winforms UI to allow users to specify what is output, and specify the XML, XSLT and DOC filenames.
3. Document even more Farm level information...
4. SPDocGen currently produces Farm Level documentation, the next step is to produce detailed Site Collection level documentation with detailed reporting of Site Collection specific info such as Content Types, Site Columns, Lists etc. etc.

Post a comment here if you're interested in being involved and collaborating on this further.

Tuesday, 1 September 2009

SPDocGen - SharePoint Documentation Generator

I've just created this project on CodePlex http://spdocgen.codeplex.com/


SPDocGen aka. SharePoint Documentation Generator is a console app that generates WSS 3.0 and MOSS 2007 farm documentation or reports using the SharePoint object model api. It produces an XML file, which is then transformed into a .doc file using XSLT and WordML.

I started this project to save myself time, because I've been tasked with documenting SharePoint Deployments quite a few times, and its usually a time consuming process of clicking through Central Admin, running batch commands of STSADM or using SharePoint Manager then copy and pasting the info into a word document. Service Pack 2 also has a preupgradecheck command that can produce some documentation for you. It might also be worth having a look at the SharePoint Admin Toolkit, SPSReport, SPDiag or MOSSRAP

Hope it saves everyone some documentation time so you can get back to spending your time on more worthwhile tasks like SharePoint dev or SharePoint admin.

Get SPDocGen

Tuesday, 14 July 2009

SharePoint 2010 Sneak Peek

Microsoft have released some sneak peek info on SharePoint 2010!

Go check out the movies and info here: http://sharepoint.microsoft.com/2010/Sneak_Peek

Microsoft plan to release a lot of info at the SharePoint conference in Vegas in October, http://www.mssharepointconference.com it's nice to get this sneak peek info while we wait for the conference.