Lucene’s Field options (Store and Index), aka RTFM :)

To make a long story short, if you want a field to act as a database primary key (so you can update/delete a document), create the field with these parameters:

document.add(new Field("id", object.getId(), Store.YES, Index.NOT_ANALYZED));

This was a big RTFM moment for me. I lost a lot of time on this one, and the fix was really simple. So I guess you are wondering why I didn’t check this :) Well, in another project the field used Index.NO instead of Index.NOT_ANALYZED, I thought that was OK, and I checked EVERYTHING else except the line you see above this text.

So, let’s read the manual together. First, the Store options:

  • Store.YES – Value is stored in index, so it can be retrieved by an IndexReader.
  • Store.NO – Not stored :)

Now the Index options:

  • Index.NO – not indexed, so not searchable (… and in my case, if you don’t index the “ID” field, then it can’t be searched, which means the document can’t be deleted and/or updated)
  • Index.ANALYZED – Field will be indexed, and it will be analyzed (saved as tokens that will be searchable)
  • Index.NOT_ANALYZED – Field will be indexed, but in its original form (good for things that should be searchable in original form, ID anybody? :))
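
To make the difference between the three Index options a bit more concrete, here is a toy sketch in plain Java. It is not Lucene: the class `IndexModeSketch`, its `Mode` enum and the lowercase-and-split "analyzer" are all made up for illustration, and real Lucene analyzers tokenize differently. It only shows why an exact lookup by ID can fail unless the whole value was indexed as one term.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy model only: this is NOT Lucene, just a map from term to document numbers.
class IndexModeSketch {
    enum Mode { NO, ANALYZED, NOT_ANALYZED }

    private final Map<String, List<Integer>> terms = new HashMap<>();

    // Adds one field value for document docId, according to the chosen mode.
    void add(int docId, String value, Mode mode) {
        switch (mode) {
            case NO:
                break; // nothing enters the index, so nothing can ever match
            case ANALYZED:
                // Crude stand-in for an analyzer: lowercase, split on non-alphanumerics.
                for (String token : value.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
                    if (!token.isEmpty()) {
                        post(token, docId);
                    }
                }
                break;
            case NOT_ANALYZED:
                post(value, docId); // the whole value becomes a single term
                break;
        }
    }

    private void post(String term, int docId) {
        terms.computeIfAbsent(term, k -> new ArrayList<>()).add(docId);
    }

    // Exact term lookup, the kind of match a delete or update by ID relies on.
    List<Integer> lookup(String term) {
        return terms.getOrDefault(term, new ArrayList<>());
    }

    public static void main(String[] args) {
        for (Mode mode : Mode.values()) {
            IndexModeSketch index = new IndexModeSketch();
            index.add(7, "Post-42", mode);
            System.out.println(mode + " -> " + index.lookup("Post-42"));
        }
    }
}
```

In this toy model only the NOT_ANALYZED run finds document 7 when looking up "Post-42". With NO there is nothing to look up, and with ANALYZED the value was split into "post" and "42", so the original string is no longer a term.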

So, if I got everything right, here are some good combinations:

Store.YES + Index.ANALYZED = Stored in index and analyzed, good for not so big content, such as a title and some short intro text (the first few lines you see in a blog, before you go to that post)

Store.NO + Index.ANALYZED = Not stored in index, but analyzed and searchable, so pretty good for big text (content)

Store.YES + Index.NOT_ANALYZED = Stored in index in its original form, great for IDs :)
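
The combinations above boil down to "can I get the value back?" and "can I search on it?" being two separate questions. Here is a small plain-Java sketch of that idea. Again, this is not Lucene: `StoreVsIndexSketch`, its boolean flags and the whitespace "analyzer" are assumptions made up for this example, and the field values are just sample data.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy model only, not Lucene: stored values and searchable terms live in separate maps.
class StoreVsIndexSketch {
    private final Map<Integer, Map<String, String>> stored = new HashMap<>();
    private final Map<String, Set<Integer>> terms = new HashMap<>();

    // store    ~ Store.YES (true) / Store.NO (false)
    // analyzed ~ Index.ANALYZED (true) / Index.NOT_ANALYZED (false)
    void addField(int docId, String name, String value, boolean store, boolean analyzed) {
        if (store) {
            stored.computeIfAbsent(docId, k -> new HashMap<>()).put(name, value);
        }
        // Crude stand-in for an analyzer: lowercase and split on whitespace.
        String[] tokens = analyzed
                ? value.toLowerCase(Locale.ROOT).split("\\s+")
                : new String[] { value };
        for (String token : tokens) {
            terms.computeIfAbsent(name + ":" + token, k -> new HashSet<>()).add(docId);
        }
    }

    // True when the given term was indexed for this field of this document.
    boolean matches(int docId, String name, String term) {
        return terms.getOrDefault(name + ":" + term, new HashSet<>()).contains(docId);
    }

    // Returns the original value, or null when the field was not stored.
    String storedValue(int docId, String name) {
        return stored.getOrDefault(docId, new HashMap<>()).get(name);
    }

    public static void main(String[] args) {
        StoreVsIndexSketch index = new StoreVsIndexSketch();
        index.addField(1, "id", "Post-42", true, false);                       // Store.YES + NOT_ANALYZED
        index.addField(1, "title", "Lucene Field Options", true, true);        // Store.YES + ANALYZED
        index.addField(1, "content", "a very long body of text", false, true); // Store.NO + ANALYZED

        System.out.println(index.matches(1, "id", "Post-42"));    // true
        System.out.println(index.storedValue(1, "title"));        // Lucene Field Options
        System.out.println(index.matches(1, "content", "body"));  // true
        System.out.println(index.storedValue(1, "content"));      // null
    }
}
```

The "content" field is the interesting one: it can be found by the word "body", but its text cannot be read back, which is the trade-off that makes Store.NO attractive for big text.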

If I got something wrong, feel free to point that out (in comments or hate mail). Also, if you have Lucene tips, please share :D

3 comments

  1. document.add(new Field(field.getKeyword(), value, Field.Store.NO, Field.Index.ANALYZED));

    We are indexing some keyword fields (but sometimes the keyword values are in a language other than English), hence the search results are not retrieving them. Any suggestions on what can be done?

  2. The Field constructor in Lucene 4.2.1 is now deprecated in favor of “sugar subclasses: IntField, LongField, FloatField, DoubleField, BinaryDocValuesField, NumericDocValuesField, SortedDocValuesField, StringField, TextField, StoredField” (taken literally from the Javadoc). What would be the correct Store.YES+Index.NOT_ANALYZED equivalent class to make a field act as a database primary key?

    Thanks!

