Hibernate.org Community Documentation

Chapter 5. Datastores

5.1. Infinispan
5.1.1. Configure Infinispan
5.1.2. Manage data size
5.1.3. Clustering: deploy multiple Infinispan nodes
5.1.4. Transactions
5.1.5. Storing a Lucene index in Infinispan
5.2. Ehcache
5.2.1. Configure Ehcache
5.2.2. Transactions
5.3. MongoDB
5.3.1. Configuring MongoDB
5.3.2. Storage principles
5.3.3. Transactions
5.3.4. Queries
5.4. Neo4j
5.4.1. How to add Neo4j integration
5.4.2. Configuring Neo4j
5.4.3. Storage principles
5.4.4. Transactions
5.5. CouchDB
5.5.1. Configuring CouchDB
5.5.2. Storage principles
5.5.3. Transactions
5.5.4. Queries

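Currently Hibernate OGM supports the following datastores:

  • Map (included in the Hibernate OGM engine module)
  • Infinispan
  • Ehcache
  • MongoDB
  • Neo4j
  • CouchDB

More are planned; if you are interested, come talk to us (see Chapter 1, How to get help and contribute on Hibernate OGM).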

Hibernate OGM interacts with NoSQL datastores via two contracts:

The main thing you need to do is to configure which datastore provider you want to use. This is done via the hibernate.ogm.datastore.provider option. Possible values are

Note

When bootstrapping a session factory or entity manager factory programmatically, you should use the constants declared on OgmProperties to specify configuration properties such as hibernate.ogm.datastore.provider.

In this case you can also specify the provider as the class object of a datastore provider type, or pass an instance of a datastore provider type:

Map<String, Object> properties = new HashMap<String, Object>();

// pass the type
properties.put( OgmProperties.DATASTORE_PROVIDER, MyDatastoreProvider.class );
// or an instance
properties.put( OgmProperties.DATASTORE_PROVIDER, new MyDatastoreProvider() );

EntityManagerFactory emf = Persistence.createEntityManagerFactory( "my-pu", properties );

You also need to add the relevant Hibernate OGM module to your classpath. In Maven that would look like:


<dependency>
    <groupId>org.hibernate.ogm</groupId>
    <artifactId>hibernate-ogm-infinispan</artifactId>
    <version>4.0.0-SNAPSHOT</version>
</dependency>

The module names are hibernate-ogm-infinispan, hibernate-ogm-ehcache, hibernate-ogm-mongodb, hibernate-ogm-neo4j and hibernate-ogm-couchdb. The map datastore is included in the Hibernate OGM engine module.

By default, a datastore provider chooses the best grid dialect transparently but you can manually override that setting with the hibernate.ogm.datastore.grid_dialect option. Use the fully qualified class name of the GridDialect implementation. Most users should ignore this setting entirely and live happy.

Infinispan is an open source in-memory data grid focusing on high performance. As a data grid, you can deploy it on multiple servers - referred to as nodes - and connect to it as if it were a single storage engine: it will cleverly distribute both the computation effort and the data storage.

It is trivial to set up on a single node, in your local JVM, so you can easily try Hibernate OGM. But Infinispan really shines in multiple node deployments: you will need to configure some networking details but nothing changes in terms of application behaviour, while performance and data size can scale linearly.

From all its features we’ll only describe those relevant to Hibernate OGM; for a complete description of all its capabilities and configuration options, refer to the Infinispan project documentation at infinispan.org.

Two steps basically:

Hibernate OGM does not use a single Cache but three, each for a different purpose, so that you can configure the Cache meant for each role separately.

We’ll explain in the following paragraphs how you can take advantage of this and which aspects of Infinispan you’re likely to want to reconfigure from their defaults. All attributes and elements from Infinispan which we don’t mention are safe to ignore. Refer to the Infinispan User Guide for the guru level performance tuning and customizations.

An Infinispan configuration file is an XML file complying with the Infinispan schema; the basic structure is shown in the following example:


The global section contains elements which affect the whole instance; mainly of interest for Hibernate OGM users is the transport element in which we’ll set JGroups configuration overrides.

In the namedCache section (or in default if we want to affect all named caches) we’ll likely want to configure clustering modes, eviction policies and CacheStores.

In its default configuration Infinispan stores all data in the heap of the JVM; in this barebone mode it is conceptually not very different from using a HashMap: the data must fit in the heap of your VM, and stopping, killing or crashing your application loses all data with no way to recover it.

To store data permanently (out of the JVM memory) a CacheStore should be enabled. The infinispan-core.jar includes a simple implementation able to store data in simple binary files, on any read/write mounted filesystem; this is an easy starting point, but the real stuff is to be found in the additional modules found in the Infinispan distribution. Here you can find many more implementations to store your data in anything from JDBC connected relational databases, other NoSQL engines, to cloud storage services or other Infinispan clusters. Finally, implementing a custom CacheStore is a trivial programming exercise.

To limit the consumption of precious heap space, you can activate a passivation or an eviction policy. There are several strategies to play with; for now, just consider that you will likely need one to avoid running out of memory when storing too many entries in the bounded JVM memory space. You don’t need to choose one while experimenting with limited data sizes: enabling such a strategy has no impact on the functionality of your Hibernate OGM application other than performance (entries stored in the Infinispan in-memory space are accessed much more quickly than entries in any CacheStore).

A CacheStore can be configured as write-through, committing all changes to the CacheStore before returning (and in the same transaction) or as write-behind. A write-behind configuration is normally not encouraged in storage engines, as a failure of the node implies some data might be lost without receiving any notification about it, but this problem is mitigated in Infinispan because of its capability to combine CacheStore write-behind with a synchronous replication to other Infinispan nodes.


In this example we enabled both eviction and a CacheStore (the loader element). LIRS is one of the choices we have for eviction strategies. Here it is configured to keep (approximately) 2000 entries in live memory and evict the remaining as a memory usage control strategy.

The CacheStore is enabling passivation, which means that the entries which are evicted are stored on the filesystem.
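The difference between plain eviction and eviction with passivation can be illustrated with a small, self-contained model. The following is only a sketch under stated assumptions: it uses a simple least-recently-used policy from the JDK rather than LIRS, a HashMap stands in for the CacheStore, and the class name PassivatingCache is hypothetical. It does not use or reproduce any Infinispan API.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model: a bounded in-memory map that passivates evicted entries to a backing store. */
final class PassivatingCache<K, V> {

    // Stands in for a CacheStore (a file, a database, ...); hypothetical simplification.
    private final Map<K, V> store = new HashMap<>();
    private final LinkedHashMap<K, V> memory;

    PassivatingCache(final int maxEntries) {
        // accessOrder = true makes the LinkedHashMap track least-recently-used order (LRU, not LIRS)
        this.memory = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                if (size() > maxEntries) {
                    // Passivation: the evicted entry is written to the store instead of being lost
                    store.put(eldest.getKey(), eldest.getValue());
                    return true;
                }
                return false;
            }
        };
    }

    void put(K key, V value) {
        store.remove(key); // drop any stale passivated copy
        memory.put(key, value);
    }

    V get(K key) {
        V value = memory.get(key);
        if (value == null) {
            // Activation: load the entry back from the store into memory
            value = store.remove(key);
            if (value != null) {
                memory.put(key, value);
            }
        }
        return value;
    }

    int inMemorySize() {
        return memory.size();
    }

    int passivatedSize() {
        return store.size();
    }
}
```

If the store.put call in removeEldestEntry were removed, evicted entries would simply disappear, which corresponds to eviction without a passivating CacheStore.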

Warning

You could configure an eviction strategy while not configuring a passivating CacheStore! That is a valid configuration for Infinispan but will have the evictor permanently remove entries. Hibernate OGM will break in such a configuration.

Tip

Currently with Infinispan 5.1, the FileCacheStore is neither very fast nor very efficient: we picked it for ease of setup. For a production system it’s worth looking at the large collection of high performance and cloud friendly cachestores provided by the Infinispan distribution.

The best thing about Infinispan is that all nodes are treated equally and it requires almost no beforehand capacity planning: to add more nodes to the cluster you just have to start new JVMs, on the same or different physical server, having your same Infinispan configuration and your same application.

Infinispan supports several clustering cache modes; each mode provides the same API and functionality but with different performance, scalability and availability options:

To use the replication or distribution cache modes Infinispan will use JGroups to discover and connect to the other nodes.

In the default configuration, JGroups will attempt to autodetect peer nodes using a multicast socket; this works out of the box in most network environments but will require some extra configuration in cloud environments (which often block multicast packets) or in case of strict firewalls. See the JGroups reference documentation, specifically look for Discovery Protocols to customize the detection of peer nodes.

Nowadays, the JVM defaults to the IPv6 network stack; this works fine with JGroups, but only if IPv6 is configured correctly. It is often useful to force the JVM to use IPv4.

It is also useful to let JGroups know which networking interface you want to use; especially if you have multiple interfaces it might not guess correctly.
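Both settings are ordinary JVM system properties, so they are usually passed on the command line with -D flags. As a minimal sketch, the same properties can be set programmatically, assuming this happens very early, before any networking or JGroups classes are initialized. The class name ClusterNetworkSettings and the address used in the usage note are hypothetical.

```java
/** Sets the JVM system properties discussed above; must run before JGroups starts. */
final class ClusterNetworkSettings {

    static void apply(String bindAddress) {
        // Standard JVM switch: prefer the IPv4 stack over IPv6
        System.setProperty("java.net.preferIPv4Stack", "true");
        // Placeholder read by the JGroups configuration to select the network interface
        System.setProperty("jgroups.bind_addr", bindAddress);
    }
}
```

For example, calling ClusterNetworkSettings.apply("192.168.122.1") at the very start of the application would be roughly equivalent to launching the JVM with -Djava.net.preferIPv4Stack=true -Djgroups.bind_addr=192.168.122.1.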


Note

You don’t need to use IPv4: JGroups is compatible with IPv6 provided you have routing properly configured and valid addresses assigned.

The jgroups.bind_addr needs to match a placeholder name in your JGroups configuration in case you don’t use the default one.

The default configuration uses distribution as cache mode and uses the jgroups-tcp.xml configuration for JGroups, which is contained in the Infinispan jar as the default configuration for Infinispan users. Let’s see how to reconfigure this:


In the example above we specify a custom JGroups configuration file and set the cache mode for the default cache to distribution; this is going to be inherited by the ENTITIES and the ASSOCIATIONS caches. But for IDENTIFIERS we have chosen (for the sake of this example) to use replication.

Now that you have clustering configured, start the service on multiple nodes. Each node will need the same configuration and jars.

Tip

We have just shown how to override the clustering mode and the networking stack for the sake of completeness, but you don’t have to!

Start with the default configuration and see if that fits you. You can fine tune these settings when you are closer to going into production.

Hibernate Search, which can be used for advanced query capabilities (see Chapter 7, Query your entities), needs some place to store the indexes for its embedded Apache Lucene engine.

A common place to store these indexes is the filesystem, which is the default for Hibernate Search; however, if your goal is to scale your NoSQL engine on multiple nodes you need to share this index. Network-shared filesystems are a possibility but we don’t recommend that. Often the best option is to store the index in whatever NoSQL database you are using (or a different dedicated one).

Tip

You might find this section useful even if you don’t intend to store your data in Infinispan.

The Infinispan project provides an adaptor to plug into Apache Lucene, so that it writes the indexes in Infinispan and searches data in it. Since Infinispan can be used as an application cache to other NoSQL storage engines by using a CacheStore (see Section 5.1.2, “Manage data size”) you can use this adaptor to store the Lucene indexes in any NoSQL store supported by Infinispan:

  • Cassandra
  • Filesystem (but locked correctly at the Infinispan level)
  • MongoDB
  • HBase
  • JDBC databases
  • JDBM
  • BDBJE
  • A secondary (independent) Infinispan grid
  • Any Cloud storage service supported by JClouds

How to configure it? Here is a simple cheat sheet to get you started with this type of setup:

  • Add org.hibernate:hibernate-search-infinispan:4.5.0.CR1 to your dependencies
  • Set these configuration properties:

    • hibernate.search.default.directory_provider = infinispan
    • hibernate.search.default.exclusive_index_use = false
    • hibernate.search.infinispan.configuration_resourcename = [infinispan configuration filename]
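When bootstrapping programmatically, the three properties above can be collected in a map. This is a small sketch that uses only the JDK; the class name InfinispanIndexSettings is hypothetical, and the configuration file name is whatever Infinispan configuration you provide.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Collects the Hibernate Search settings from the cheat sheet above. */
final class InfinispanIndexSettings {

    static Map<String, String> forConfiguration(String infinispanConfigFile) {
        Map<String, String> settings = new LinkedHashMap<>();
        // Store the Lucene index in Infinispan rather than on the filesystem
        settings.put("hibernate.search.default.directory_provider", "infinispan");
        // The index is shared between nodes, so no node may assume exclusive use
        settings.put("hibernate.search.default.exclusive_index_use", "false");
        // Name of the Infinispan configuration resource to load
        settings.put("hibernate.search.infinispan.configuration_resourcename", infinispanConfigFile);
        return settings;
    }
}
```

The resulting entries would typically be added to the properties map that is passed when creating the session factory or entity manager factory.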

The referenced Infinispan configuration should define a CacheStore to load/store the index in the NoSQL engine of choice. It should also define three cache names:


This configuration is not going to scale well on write operations: for that, you should read about the master/slave and sharding options in Hibernate Search. The complete explanation and configuration options can be found in the Hibernate Search Reference Guide.

Some NoSQL stores support storing Lucene indexes directly, in which case you might skip the Infinispan Lucene integration by implementing a custom DirectoryProvider for Hibernate Search. You’re very welcome to share the code and have it merged in Hibernate Search for others to use, inspect, improve and maintain.

When combined with Hibernate ORM, Ehcache is commonly used as a 2nd level cache, that is, to cache data which is stored in a relational database. When used with Hibernate OGM it is not "just a cache" but is the main storage engine for your data.

This is not the reference manual for Ehcache itself: we’re going to list only how Hibernate OGM should be configured to use Ehcache; for all the tuning and advanced options please refer to the Ehcache Documentation.

Two steps:

MongoDB is a document oriented datastore written in C++ with strong emphasis on ease of use.

This implementation is based upon the MongoDB Java driver. The currently supported version is 2.10.1.

The following properties are available to configure MongoDB support:

MongoDB datastore configuration properties

hibernate.ogm.datastore.provider
To use MongoDB as a datastore provider, this property must be set to mongodb
hibernate.ogm.option.configurator
The fully-qualified class name or an instance of a programmatic option configurator (see Section 5.3.1.2, “Programmatic configuration”)
hibernate.ogm.datastore.host
The hostname of the MongoDB instance. The default value is 127.0.0.1.
hibernate.ogm.datastore.port
The port used by the MongoDB instance. The default value is 27017.
hibernate.ogm.datastore.database
The database to connect to. This property has no default value.
hibernate.ogm.datastore.username
The username used when connecting to the MongoDB server. This property has no default value.
hibernate.ogm.datastore.password
The password used to connect to the MongoDB server. This property has no default value. This property is ignored if the username isn’t specified.
hibernate.ogm.mongodb.connection_timeout
Defines the timeout used by the driver when the connection to the MongoDB instance is initiated. This configuration is expressed in milliseconds. The default value is 5000.
hibernate.ogm.datastore.document.association_storage
Defines the way OGM stores association information in MongoDB. The following two strategies exist (values of the org.hibernate.ogm.datastore.document.options.AssociationStorageType enum): IN_ENTITY (store association information within the entity) and ASSOCIATION_DOCUMENT (store association information in a dedicated document per association). IN_ENTITY is the default and recommended option unless the association navigation data is much bigger than the core of the document and leads to performance degradation.
hibernate.ogm.mongodb.association_document_storage

Defines how to store association documents (applies only if the ASSOCIATION_DOCUMENT association storage strategy is used). Possible strategies are (values of the org.hibernate.ogm.datastore.mongodb.options.AssociationDocumentType enum):

  • GLOBAL_COLLECTION (default): stores the association information in a unique MongoDB collection for all associations
  • COLLECTION_PER_ASSOCIATION: stores the association in a dedicated MongoDB collection per association
hibernate.ogm.mongodb.write_concern
Defines the write concern setting to be applied when issuing writes against the MongoDB datastore. Possible settings are (values of the com.mongodb.WriteConcern enum): ERRORS_IGNORED, ACKNOWLEDGED, UNACKNOWLEDGED, FSYNCED, JOURNALED, NONE, NORMAL, SAFE, MAJORITY, FSYNC_SAFE, JOURNAL_SAFE, REPLICAS_SAFE. For more information, please refer to the official documentation. This option is case insensitive and the default value is ACKNOWLEDGED.

Note

When bootstrapping a session factory or entity manager factory programmatically, you should use the constants accessible via MongoDBProperties when specifying the configuration properties listed above. Common properties shared between (document) stores are declared on OgmProperties and DocumentStoreProperties, respectively. To ease migration between stores, it is recommended to reference these constants directly from there.
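To make the list of properties more concrete, the following sketch assembles a typical set of MongoDB settings in a plain map. It uses the literal property names only to stay free of dependencies; in real bootstrap code the constants mentioned in the note are preferable. The class name MongoDbSettings is hypothetical, and passing the enum values as strings is an assumption of this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Builds a property map for a MongoDB-backed setup, using the property names listed above. */
final class MongoDbSettings {

    static Map<String, Object> forDatabase(String database) {
        Map<String, Object> settings = new LinkedHashMap<>();
        settings.put("hibernate.ogm.datastore.provider", "mongodb");
        // Host and port are shown with their documented defaults
        settings.put("hibernate.ogm.datastore.host", "127.0.0.1");
        settings.put("hibernate.ogm.datastore.port", "27017");
        // The database has no default and must be given
        settings.put("hibernate.ogm.datastore.database", database);
        // Store associations in dedicated documents, one collection per association
        settings.put("hibernate.ogm.datastore.document.association_storage", "ASSOCIATION_DOCUMENT");
        settings.put("hibernate.ogm.mongodb.association_document_storage", "COLLECTION_PER_ASSOCIATION");
        // One of the documented write concern values
        settings.put("hibernate.ogm.mongodb.write_concern", "JOURNALED");
        return settings;
    }
}
```

Such a map could be passed as the second argument of Persistence.createEntityManagerFactory, in the same way as the provider setting shown earlier in this chapter.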

In addition to the annotation mechanism, Hibernate OGM also provides a programmatic API for applying store-specific configuration options. This can be useful if you can’t modify certain entity types or don’t want to add store-specific configuration annotations to them. The API allows you to set options in a type-safe fashion on the global, entity and property levels.

When working with MongoDB, you can currently configure the following options using the API:

To set these options via the API, you need to create an OptionConfigurator implementation as shown in the following example:


The call to configureOptionsFor(), passing the store-specific identifier type MongoDB, provides the entry point into the API. Following the fluent API pattern, you then can configure global options and navigate to single entities or properties to apply options specific to these.

Options given on the property level take precedence over entity-level options. For example, the animals association of the Zoo class would be stored using the in-entity strategy, while all other associations of the Zoo entity would be stored using separate association documents.

Similarly, entity-level options take precedence over options given on the global level. Global-level options specified via the API complement the settings given via configuration properties. In case a setting is given via a configuration property and the API at the same time, the latter takes precedence.

Note that for a given level (property, entity, global), an option set via annotations is overridden by the same option set programmatically. This allows you to change settings in a more flexible way if required.
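The precedence rules described above can be summarized as "the most specific setting that is present wins". The following sketch models this with plain strings. It is not how Hibernate OGM implements option resolution; the class name OptionPrecedence is hypothetical, and the exact interleaving of annotation and API sources across levels is an assumption derived from the rules stated in this section.

```java
/** Models option resolution: property level before entity level before global level. */
final class OptionPrecedence {

    /**
     * Returns the first value that is set (non-null), checked from most to least specific.
     * Within one level, the programmatic (API) value is checked before the annotation value.
     */
    static String resolve(String propertyApi, String propertyAnnotation,
                          String entityApi, String entityAnnotation,
                          String globalApi, String configurationProperty,
                          String defaultValue) {
        String[] candidates = {
            propertyApi, propertyAnnotation,
            entityApi, entityAnnotation,
            globalApi, configurationProperty
        };
        for (String candidate : candidates) {
            if (candidate != null) {
                return candidate;
            }
        }
        return defaultValue;
    }
}
```

With ASSOCIATION_DOCUMENT set on the Zoo entity and IN_ENTITY set on its animals property, resolving the option for animals yields IN_ENTITY, while any other association of Zoo (no property-level value) resolves to ASSOCIATION_DOCUMENT.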

To register an option configurator, specify its class name using the hibernate.ogm.option.configurator property. When bootstrapping a session factory or entity manager factory programmatically, you can also pass in an OptionConfigurator instance or the class object representing the configurator type.

Hibernate OGM tries to make the mapping to the underlying datastore as natural as possible so that third party applications not using Hibernate OGM can still read and update the same datastore. We worked particularly hard on the MongoDB model to offer various classic mappings between your object model and the MongoDB documents.

Entities are stored as MongoDB documents and not as BLOBs which means each entity property will be translated into a document field. You can use the name property of the @Table and @Column annotations to rename the collections and the document’s fields if you need to.

Note that embedded objects are mapped as nested documents.


Hibernate OGM MongoDB offers three strategies to store navigation information for associations. To switch between these strategies, either use the @AssociationStorage and @AssociationDocumentStorage annotations (see Section 5.3.1.1, “Annotation based configuration”), the API for programmatic configuration (see Section 5.3.1.2, “Programmatic configuration”) or specify a default strategy via the hibernate.ogm.datastore.document.association_storage and hibernate.ogm.mongodb.association_document_storage configuration properties.

The three possible strategies are:

  • IN_ENTITY (default)
  • ASSOCIATION_DOCUMENT, using a global collection for all associations
  • ASSOCIATION_DOCUMENT, using a dedicated collection for each association

You can express queries in a few different ways:

Hibernate OGM supports native queries for MongoDB with some limitations:

If your use case meets these restrictions you can execute a native query like in the following example:



Note

The method name Session#createSQLQuery(…) might look misleading since we are not running a SQL query. However, the Session API was initially designed for relational databases, and we decided it was simpler to reuse the same method than to invent something new that could add to the confusion.

The result of the query is a managed entity or a list of managed entities, just like you would get from a JP-QL query.

Native queries can also be created using the @NamedNativeQuery annotation:


Hibernate OGM stores data in a natural way, so you can still execute queries using the MongoDB driver. The main drawback is that the results are going to be raw MongoDB documents and not managed entities.

Neo4j is a robust (fully ACID) transactional property graph database. This kind of database is suited to problems that can be represented as a graph, such as social relationships or road maps.

At the moment OGM only supports embedded Neo4j.

This is our first version and a bit experimental. In particular we plan on using node navigation much more than index lookup in a future version.

CouchDB is a document-oriented datastore which stores your data in form of JSON documents and exposes its API via HTTP based on REST principles. It is thus very easy to access from a wide range of languages and applications.

Note

Support for CouchDB is considered an EXPERIMENTAL feature as of this release. In particular you should be prepared for possible changes to the persistent representation of mapped objects in future releases. Should you find any bugs or have feature requests for this dialect, then please open a ticket in the OGM issue tracker.

Hibernate OGM uses the excellent RESTEasy library to talk to CouchDB stores, so there is no need to include any of the Java client libraries for CouchDB in your classpath.

The following properties are available to configure CouchDB support in Hibernate OGM:

CouchDB datastore configuration properties

hibernate.ogm.datastore.provider
To use CouchDB as a datastore provider, this property must be set to couchdb
hibernate.ogm.option.configurator
The fully-qualified class name or an instance of a programmatic option configurator (see Section 5.5.1.2, “Programmatic configuration”)
hibernate.ogm.datastore.host
The hostname of the CouchDB instance. The default value is 127.0.0.1.
hibernate.ogm.datastore.port
The port used by the CouchDB instance. The default value is 5984.
hibernate.ogm.datastore.database
The database to connect to. This property has no default value.
hibernate.ogm.datastore.create_database
Whether to create the specified database in case it does not exist or not. Can be true or false (default). Note that the specified user must have the right to create databases if set to true.
hibernate.ogm.datastore.username
The username used when connecting to the CouchDB server. Note that this user must have the right to create design documents in the chosen database. This property has no default value. Hibernate OGM currently does not support accessing CouchDB via HTTPS; if you’re interested in such functionality, let us know.
hibernate.ogm.datastore.password
The password used to connect to the CouchDB server. This property has no default value. This property is ignored if the username isn’t specified.
hibernate.ogm.datastore.document.association_storage
Defines the way OGM stores association information in CouchDB. The following two strategies exist (values of the org.hibernate.ogm.datastore.document.options.AssociationStorageType enum): IN_ENTITY (store association information within the entity) and ASSOCIATION_DOCUMENT (store association information in a dedicated document per association). IN_ENTITY is the default and recommended option unless the association navigation data is much bigger than the core of the document and leads to performance degradation.

Note

When bootstrapping a session factory or entity manager factory programmatically, you should use the constants accessible via CouchDBProperties when specifying the configuration properties listed above. Common properties shared between (document) stores are declared on OgmProperties and DocumentStoreProperties, respectively. To ease migration between stores, it is recommended to reference these constants directly from there.

In addition to the annotation mechanism, Hibernate OGM also provides a programmatic API for applying store-specific configuration options. This can be useful if you can’t modify certain entity types or don’t want to add store-specific configuration annotations to them. The API allows you to set options in a type-safe fashion on the global, entity and property levels.

When working with CouchDB, you can currently configure the following options using the API:

To set this option via the API, you need to create an OptionConfigurator implementation as shown in the following example:


The call to configureOptionsFor(), passing the store-specific identifier type CouchDB, provides the entry point into the API. Following the fluent API pattern, you then can configure global options and navigate to single entities or properties to apply options specific to these.

Options given on the property level take precedence over entity-level options. For example, the visitors association of the Zoo class would be stored using the in-entity strategy, while all other associations of the Zoo entity would be stored using separate association documents.

Similarly, entity-level options take precedence over options given on the global level. Global-level options specified via the API complement the settings given via configuration properties. In case a setting is given via a configuration property and the API at the same time, the latter takes precedence.

Note that for a given level (property, entity, global), an option set via annotations is overridden by the same option set programmatically. This allows you to change settings in a more flexible way if required.

To register an option configurator, specify its class name using the hibernate.ogm.option.configurator property. When bootstrapping a session factory or entity manager factory programmatically, you can also pass in an OptionConfigurator instance or the class object representing the configurator type.

Hibernate OGM tries to make the mapping to the underlying datastore as natural as possible so that third party applications not using Hibernate OGM can still read and update the same datastore. The following describes how entities and associations are mapped to CouchDB documents by Hibernate OGM.

Entities are stored as CouchDB documents and not as BLOBs which means each entity property will be translated into a document field. You can use the name property of the @Table and @Column annotations to rename the collections and the document’s fields if you need to.

CouchDB provides a built-in mechanism for detecting concurrent updates to one and the same document. For that purpose each document has an attribute named _rev (for "revision") which is to be passed back to the store when doing an update. So when writing back a document and the document’s revision has been altered by another writer in parallel, CouchDB will raise an optimistic locking error (you could then e.g. re-read the current document version and try another update).
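The revision check can be pictured with a small in-memory model. This is only an illustrative sketch: it does not talk to CouchDB, revisions are simplified to integers (real CouchDB revisions are opaque strings), and the class name RevisionedStore is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of revision-based optimistic locking, in the spirit of the _rev attribute. */
final class RevisionedStore {

    static final class Document {
        final int rev;
        final String body;

        Document(int rev, String body) {
            this.rev = rev;
            this.body = body;
        }
    }

    private final Map<String, Document> documents = new HashMap<>();

    /**
     * Writes a document. The caller passes back the revision it last read
     * (0 for a document that does not exist yet); a mismatch means another
     * writer got there first and the write is rejected.
     */
    int put(String id, int expectedRev, String body) {
        Document current = documents.get(id);
        int currentRev = (current == null) ? 0 : current.rev;
        if (expectedRev != currentRev) {
            throw new IllegalStateException(
                "Conflict on '" + id + "': expected revision " + expectedRev + " but found " + currentRev);
        }
        int newRev = currentRev + 1;
        documents.put(id, new Document(newRev, body));
        return newRev;
    }

    Document get(String id) {
        return documents.get(id);
    }
}
```

A writer that receives the conflict would typically re-read the document with get, obtain the current revision and retry its update.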

For this mechanism to work, you need to declare a property for the _rev attribute in all your entity types and mark it with the @Version and @Generated annotations. The first marks it as a property used for optimistic locking, while the latter advises Hibernate OGM to refresh that property after writes since its value is managed by the datastore.

The following shows an example of an entity and its persistent representation in CouchDB.


Note that CouchDB doesn’t have a concept of "tables" or "collections" as e.g. MongoDB does; instead all documents are stored in one large bucket. Thus Hibernate OGM needs to add two additional attributes: $type which contains the type of a document (entity vs. association documents) and $table which specifies the entity name as derived from the type or given via the @Table annotation.

Note

Attributes whose name starts with the "$" character are managed by Hibernate OGM and thus should not be modified manually. Also it is not recommended to start the names of your attributes with the "$" character to avoid collisions with attributes possibly introduced by Hibernate OGM in future releases.

Embedded objects are mapped as nested documents. The following listing shows an example:


Hibernate OGM CouchDB provides two strategies to store navigation information for associations. To switch between these strategies, either use the @AssociationStorage annotation (see Section 5.5.1.1, “Annotation based configuration”), the API for programmatic configuration (see Section 5.5.1.2, “Programmatic configuration”) or specify a global default strategy via the hibernate.ogm.datastore.document.association_storage configuration property.

The possible strategies are IN_ENTITY (default) and ASSOCIATION_DOCUMENT.