Lucene.Net.IndexProvider

The Lucene Index Provider is a simple service that helps abstract common operations when interacting with Lucene.NET indexes. The index provider exposes methods for managing indexes as well as full CRUD operations. It tries to enforce proper usage of Lucene by providing a read-only mode along with a read/write mode. Search operations are also abstracted into a fluent search term builder, which helps to ease the pain of building complex queries.

How to Install

dotnet add package Lucene.Net.IndexProvider

For the full source and contribution history, visit the GitHub repository.


Setup

Wire up the required services using dependency injection:

services.AddLuceneDocumentMapper();
services.AddLuceneProvider();
services.AddScoped<ILocalIndexPathFactory, IndexLocalPathFactory>();
services.AddScoped<ILuceneDirectoryFactory, LuceneDirectoryFactory>();

Note: ILocalIndexPathFactory is slated for deprecation. It is currently used only when swapping indexes, and because some indexes live in memory or in blob storage, the interface will eventually become obsolete.

After wiring up the services, configure each index (or one shared configuration for all) on application startup:

var luceneConfig = new LuceneConfig
{
  BatchSize = 10000,
  Indexes = new List<string>()
  {
      nameof(Document),
      nameof(BlogPost),
      nameof(Category),
      nameof(Tag),
      nameof(WebPage)
  },
  LuceneVersion = LuceneVersion.LUCENE_48,
  ReadOnly = true
};

_indexConfigurationManager.AddConfiguration(luceneConfig);

Implement ILuceneDirectoryFactory to control where index data is stored. The example below connects to Azure Blob Storage, with support for both managed identity and connection string authentication:

public class LuceneDirectoryFactory : ILuceneDirectoryFactory
{
  private readonly IConfiguration _configuration;

  public LuceneDirectoryFactory(IConfiguration configuration)
  {
      _configuration = configuration;
  }

  public Directory GetIndexDirectory(string indexName)
  {
      var managedIdentityClientId = _configuration["ManagedIdentity:ClientId"];
      BlobServiceClient blobServiceClient;
      if (!string.IsNullOrEmpty(managedIdentityClientId))
      {
          // Managed identity: the storage URL is only needed on this path,
          // so it is read here rather than unconditionally.
          var storageUrl = new Uri(_configuration["Storage:Url"]);
          blobServiceClient = new BlobServiceClient(storageUrl, new ManagedIdentityCredential(managedIdentityClientId));
      }
      else
      {
          // Fallback: authenticate with a connection string.
          string connStr = _configuration["Storage:ConnectionString"];
          blobServiceClient = new BlobServiceClient(connStr);
      }

      AzureDirectory azureDirectory = new AzureDirectory(blobServiceClient, $"indexes/{indexName}", new RAMDirectory());
      return azureDirectory;
  }
}

Searching

The index provider uses a fluent builder pattern for constructing search queries. A basic search looks like this:

var listResult =
  await _indexProvider.Search()
      .Must(() => new TermQuery(new Term("Tags.Name", "my-test-tag")))
      .ListResult<BlogPost>();

You can chain as many Must, Should, and MustNot clauses as needed to build up complex boolean queries.
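
For orientation, these three clause types correspond to the occurrence flags of a plain Lucene.NET BooleanQuery. The sketch below builds such a query by hand and does not use the index provider at all. The class name, field names, and values are made up for illustration, and how the provider combines clauses internally is an assumption rather than something taken from its source.

```csharp
using Lucene.Net.Index;
using Lucene.Net.Search;

public static class BooleanQuerySketch
{
    // Hand-built counterpart of a Must + Should + MustNot chain.
    // Field names and values are hypothetical.
    public static BooleanQuery Build()
    {
        var query = new BooleanQuery();

        // MUST: a document has to match this clause to be returned.
        query.Add(new TermQuery(new Term("Tags.Name", "my-test-tag")), Occur.MUST);

        // SHOULD: optional here; documents that match it score higher.
        query.Add(new TermQuery(new Term("Category.Name", "news")), Occur.SHOULD);

        // MUST_NOT: documents matching this clause are excluded.
        query.Add(new TermQuery(new Term("IsPublished", "False")), Occur.MUST_NOT);

        return query;
    }
}
```

Keeping the mapping in mind helps when a chained search returns unexpected results: a Should clause next to a Must clause affects ranking only, not which documents match.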

Sorting and paging are also supported as part of the fluent chain:

var blogPosts =
  (await _indexProvider.Search()
      .Must(() => new TermQuery(new Term(nameof(BlogPost.IsPublished), Boolean.TrueString)))
      .Sort(() => new SortField(nameof(BlogPost.PublishedDate), SortFieldType.STRING, true))
      .Paged(page, pageSize)
      .ListResult<BlogPost>());

Because the search is a builder, you can split query construction across multiple statements for more complex scenarios, for example building fuzzy multi-field queries per search term:

var documentSearch = _indexProvider.Search();
var blogSearch = _indexProvider.Search();
if (model.QueryParams.TryGetValue("q", out var query) && !string.IsNullOrWhiteSpace(query))
{
  var terms = query.ToLowerInvariant().Split(' ', StringSplitOptions.RemoveEmptyEntries);

  var documentBooleanQuery = new BooleanQuery();
  var blogPostBooleanQuery = new BooleanQuery();

  foreach (var term in terms)
  {
      documentBooleanQuery.Add(new FuzzyQuery(new Term(nameof(Document.Body), term), maxEdits: 2), Occur.SHOULD);
      documentBooleanQuery.Add(new FuzzyQuery(new Term(nameof(Document.Name), term), maxEdits: 2), Occur.SHOULD);

      blogPostBooleanQuery.Add(new FuzzyQuery(new Term(nameof(BlogPost.Body), term), maxEdits: 2), Occur.SHOULD);
      blogPostBooleanQuery.Add(new FuzzyQuery(new Term(nameof(BlogPost.Name), term), maxEdits: 2), Occur.SHOULD);
  }

  documentSearch.Must(() => documentBooleanQuery);
  blogSearch.Must(() => blogPostBooleanQuery);
}

// ListResult is asynchronous, as in the earlier examples, so await both searches.
var documents = await documentSearch.ListResult<Document>();
var blogPosts = await blogSearch.ListResult<BlogPost>();

CRUD Operations

When in READ/WRITE mode, the IndexSessionManager creates and manages both a SearcherManager and an IndexWriter.

After any Delete, Add, or Update operation, you must call Commit on the IndexSessionManager to persist changes. Once committed, MaybeRefresh is automatically called on the SearcherManager to keep reads consistent.
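
The commit-then-refresh sequence can be reproduced with plain Lucene.NET types. This is a minimal sketch of the underlying pattern, not the IndexSessionManager implementation. It assumes an in-memory RAMDirectory, a StandardAnalyzer (from the Lucene.Net.Analysis.Common package), and a made-up Id field; the class and method names are hypothetical.

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Lucene.Net.Util;

public static class CommitRefreshSketch
{
    // Adds one document, commits, refreshes the searcher, and returns the hit count.
    public static int AddCommitAndCount()
    {
        using var directory = new RAMDirectory();
        using var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48);
        using var writer = new IndexWriter(directory, new IndexWriterConfig(LuceneVersion.LUCENE_48, analyzer));
        using var searcherManager = new SearcherManager(writer, true, null);

        // Write path: add a document, then commit to persist it.
        var doc = new Document { new StringField("Id", "42", Field.Store.YES) };
        writer.AddDocument(doc);
        writer.Commit();

        // Read path: ask the manager to reopen its searcher if the index changed.
        searcherManager.MaybeRefresh();

        var searcher = searcherManager.Acquire();
        try
        {
            return searcher.Search(new TermQuery(new Term("Id", "42")), 1).TotalHits;
        }
        finally
        {
            // Every Acquire must be paired with a Release.
            searcherManager.Release(searcher);
        }
    }
}
```

The Acquire/Release pairing is the part that is easy to get wrong by hand, which is one reason to leave it to a session manager.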

If you run two separate instances (one READ and one WRITE), writing works the same as above, but the read instance needs an explicit mechanism to refresh the index when it goes stale. The following example shows a webhook endpoint that handles this:

private readonly IIndexSessionManager _indexSessionManager;
private readonly IDirectoryManager _directoryManager;

public WebHookController(IIndexSessionManager indexSessionManager, IDirectoryManager directoryManager)
{
    _indexSessionManager = indexSessionManager;
    _directoryManager = directoryManager;
}

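[Route(HttpVerbs.Put, "/sync")]
public Task Put()
{
    foreach (var contextSession in _indexSessionManager.ContextSessions)
    {
        // Block reads for this index while its session and directory are torn down.
        _indexSessionManager.AddLock(contextSession.Key);
        try
        {
            _indexSessionManager.CloseSession(contextSession.Key);
            _directoryManager.DisposeDirectory(contextSession.Key);
        }
        finally
        {
            // Always release the lock, even if closing or disposing throws.
            _indexSessionManager.ReleaseLock(contextSession.Key);
        }
    }

    // Nothing is awaited, so return a completed task instead of marking the method async.
    return Task.CompletedTask;
}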

It is important to acquire a lock before closing and refreshing the index. The lock forces any in-flight reads to wait until the refresh is complete, preventing reads against a partially refreshed or disposed index.
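
The effect of such a lock can be illustrated with a standard ReaderWriterLockSlim. This is only a sketch of the general idea under the assumption that reads may run concurrently while a refresh needs exclusive access. GuardedIndex and its members are hypothetical and are not how AddLock and ReleaseLock are implemented in the library.

```csharp
using System.Threading;

// Hypothetical stand-in for a per-index lock; not the provider's real code.
public sealed class GuardedIndex
{
    private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();
    private string _snapshot = "v1";

    // Readers share the lock, so concurrent searches do not block each other.
    public string Read()
    {
        _lock.EnterReadLock();
        try { return _snapshot; }
        finally { _lock.ExitReadLock(); }
    }

    // A refresh takes the lock exclusively: it waits for in-flight reads
    // and makes new reads wait until the swap has finished.
    public void Refresh(string newSnapshot)
    {
        _lock.EnterWriteLock();
        try { _snapshot = newSnapshot; }
        finally { _lock.ExitWriteLock(); }
    }
}
```

A reader therefore sees either the old snapshot or the new one, never a half-replaced state.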