Search This Blog

Wednesday, 11 February 2015

Indexes and MongoDB

I have never given too  much thought to database indexing. Its a concept that works silently in the background of database tables and views. Most databases provide indexes by default for the primary key and unique keys.
If it comes to performance concerns, a DBA steps in and does his Index magic. As a java developer it wasn't something I had to worry about. So now that I am learning MongoDB at my own sweet pace, I decided to look at indexing.
So what is an index ? That is from where I decided to start. From the SQL Server docs:
An index is an on-disk structure associated with a table or view that speeds retrieval 
of rows from the table or view. An index contains keys built from one or more columns 
in the table or view. These keys are stored in a structure (B-tree) that enables SQL 
Server to find the row or rows associated with the key values quickly and efficiently.
This is more or less how all databases are using indexes (I think.) I tried to go all diagram on it
For this database I have created a simple table of user. I have also added an index on the firstname property. All this was done via the database IDE. At a higher level.
At the lower (physical) level we are dealing with files and binary data. The index table for firstname (This data is in a file, I am guessing) provides the database with a short cut to reach the records quickly. So if a query involved something like "where firstname like 'John' ", the database looks up the index, locates the entry for 'John'. The associated value for 'John' tells the database to look at the record No 3 in the data file. A quick find of the query result. In the absence of the indexes, the database would have to scan all the records in users and located the result.
MongoDB also has indexes. Big data without indexing would be a nightmare ! In fact the _id attribute of the document has an index associated with it by default. We can also create additional indexes to our collection.
Here I first executed the getIndexes method on the collection to get the available indexes.As seen here for the fruits collection only the _id column has an index. I then added an index using the ensureIndex method. The -1 value indicates the sort order to be used. -1 implies descending sort while 1 is for ascending. There are additional settings available for index creation (detailed here.)
By default, creating an index blocks all other operations on a database. When building an index on a 
collection, the database that holds the collection is unavailable for read or write operations until 
the index build completes. Any operation that requires a read or write lock on all databases 
(e.g. listDatabases) will wait for the foreground index build to complete.
For potentially long running index building operations, consider the background operation so that the 
MongoDB database remains available during the index building operation.
Accordingly I decided to build an index in the background:
>db.Players.ensureIndex( { name: 1}, {background: true,unique:false} )
The unique property if passed as true will treat the property as a unique one. Its false by default.
Background indexing operations run in the background so that other database operations can run 
while creating the index. However, the mongo shell session or connection where you are creating 
the index will block until the index build is complete. To continue issuing commands to the database,
open another connection or mongo instance.
This pretty much renders my cmd screen useless. ( The perfect excuse for me to stop my exploring too :P )
What I have gone over is just the basics on indexing in MongoDB. Its whole power hasn't been explored yet. The above create index/ view indexes operations can also be done in Java:
public static void main(String[] args) throws UnknownHostException {
      Mongo mongoClient = new Mongo();
      DB targetDB = mongoClient.getDB(DB_NAME);

      DBCollection collection = targetDB.getCollection(COLLECTION_NAME);
      List<DBObject> indexes = collection.getIndexInfo();
      System.out.println("Indexes on " + collection.getFullName() + " are : ");
      for (DBObject index : indexes) {
         System.out.println(index);
      }     
   }
The output is :
Indexes on fruits.fruits are : 
{ "v" : 1 , "key" : { "_id" : 1} , "ns" : "fruits.fruits" , "name" : "_id_"}
{ "v" : 1 , "key" : { "name" : -1.0} , "ns" : "fruits.fruits" , "name" : "name_-1"}
To create an index is also pretty straight forward.
public static void main(String[] args) throws UnknownHostException {
      Mongo mongoClient = new Mongo();
      DB targetDB = mongoClient.getDB(DB_NAME);

      DBCollection collection = targetDB.getCollection(COLLECTION_NAME);
      collection.createIndex(new BasicDBObject("rating", 1).append("unique", true));  // create unique index, ascending
      List<DBObject> indexes = collection.getIndexInfo();
      System.out.println("Indexes on " + collection.getFullName() + " are : ");
      for (DBObject index : indexes) {
         System.out.println(index);
      }
      
   }
If I run this code than:
Indexes on fruits.fruits are : 
{ "v" : 1 , "key" : { "_id" : 1} , "ns" : "fruits.fruits" , "name" : "_id_"}
{ "v" : 1 , "key" : { "name" : -1.0} , "ns" : "fruits.fruits" , "name" : "name_-1"}
{ "v" : 1 , "key" : { "rating" : 1 , "unique" : true} , "ns" : "fruits.fruits" , "name" : "rating_1_unique_"}
As indexing is a means of speeding up query execution, this snippet from the MongoDB docs is something to keep in mind when writing queries:
Some query operations are not selective. These operations cannot use indexes  effectively 
or cannot use indexes at all. The inequality operators $nin and $ne are not very selective, 
as they often match a large portion of the index. As a result, in most cases, a $nin or $ne 
query with an index may perform no better than a $nin or $ne query that must scan all 
documents in a collection. Queries that specify regular expressions, with inline JavaScript 
regular expressions or $regex operator expressions, cannot use an index with one exception. 
Queries that specify regular expression with anchors at the beginning of a string can use an 
index. 

1 comment:

  1. BlueHost is ultimately one of the best hosting company with plans for any hosting requirements.

    ReplyDelete