HyperGraphDB is a general purpose, extensible, portable, distributed, embeddable, open-source
data storage mechanism. It is a graph database designed specifically for artificial intelligence
and semantic web projects, but it can also be used as an embedded object-oriented
database for projects of all sizes.
The system is reliable and in production use in
several projects, including a search engine and our own Seco scripting IDE,
where most of the runtime environment is automatically saved as a hypergraph.
HyperGraphDB is primarily what its carefully chosen name implies: a database for storing hypergraphs.
While it falls into the general family of graph databases, it is hard to categorize HyperGraphDB
as yet another database because much of its design revolves
around providing the means to manage structure-rich information with arbitrary
layers of complexity. For instance, a relational as well as an object-oriented style
of data management can be emulated. As a graph database, HyperGraphDB doesn't impose any
constraints and offers much more generality than all other graph databases we've come across.
The design is minimalistic at its core and
the end-goal is to evolve a set of concepts and practices, combining structure and
interpretation in such a way as to allow future software to meet the complexities
of the real world better than it can now.
Key Facts
- The mathematical definition of a hypergraph is an extension of the standard graph
concept that allows an edge to point to more than two nodes. HyperGraphDB extends this even
further by allowing edges to point to other edges as well, and by letting every node or edge carry
an arbitrary value as payload.
- The original requirements that triggered the development of the system came
from the OpenCog project, which is an attempt
at building an AGI (Artificial General Intelligence) system based on self-modifying probabilistic
hypergraphs.
- The basic unit of storage in HyperGraphDB is called an atom. Each atom is typed, has
an arbitrary value and can point to zero or more other atoms.
- Data types are managed by a general, extensible type system that is itself embedded as a hypergraph
structure. Types are themselves atoms like everything else, but with a particular role (well, like everything
else too).
- The storage scheme is platform independent and can thus be accessed by any programming
language from any platform. Low-level storage is currently based on BerkeleyDB from
Sleepycat Software.
- Size limitations are virtually non-existent. There is no software limit on the size of the
graph managed by a HyperGraphDB instance. Each individual value's size is limited by the underlying
storage, i.e. by BerkeleyDB's 2GB limit. However, the architecture allows bypassing BerkeleyDB
for particular types of atoms if one so desires.
- The current implementation is Java only. It offers an automatic mapping of idiomatic
Java types to a HyperGraphDB data schema, which makes HyperGraphDB an object-oriented database
suitable for regular business applications. A C++ implementation has been frequently
contemplated, but never started due to lack of manpower. Note that because the storage scheme is open and
precisely specified, all languages and platforms can share the same data.
- Embedded in-process: the database comes in the form of a software library to be used directly
through its API.
- A P2P framework for distributed processing has been implemented for replication/data partitioning
algorithms as well as client-server style computing.
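The atom model described in the facts above can be made concrete with a small, self-contained sketch. This is a toy in-memory illustration, not the HyperGraphDB API: the class `AtomSketch`, its integer handles, and the methods `addType`, `add`, `get` and `incidenceSet` are all hypothetical names chosen for the example. It shows only the structural ideas: every atom has a type that is itself an atom, carries an arbitrary value, and may point to zero or more other atoms, including atoms that are themselves links.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy in-memory model of typed atoms. Hypothetical names; NOT the HyperGraphDB API. */
public class AtomSketch {

    /** An atom: a type (the handle of another atom), a payload value, and a target list. */
    static final class Atom {
        final int type;
        final Object value;
        final List<Integer> targets; // empty for a plain node, non-empty for a link

        Atom(int type, Object value, List<Integer> targets) {
            this.type = type;
            this.value = value;
            this.targets = targets;
        }
    }

    private final Map<Integer, Atom> atoms = new LinkedHashMap<>();
    private int nextHandle = 0;

    /** Handle of the root type; in this sketch it is simply declared to be its own type. */
    final int topType;

    AtomSketch() {
        topType = nextHandle++;
        atoms.put(topType, new Atom(topType, "Top", List.of()));
    }

    /** Types are stored as ordinary atoms whose own type is the root type. */
    int addType(String name) {
        return add(topType, name);
    }

    /** Adds an atom with the given type, value and zero or more targets; returns its handle. */
    int add(int type, Object value, Integer... targets) {
        if (!atoms.containsKey(type)) {
            throw new IllegalArgumentException("unknown type handle " + type);
        }
        for (Integer target : targets) {
            if (!atoms.containsKey(target)) {
                throw new IllegalArgumentException("unknown target handle " + target);
            }
        }
        int handle = nextHandle++;
        atoms.put(handle, new Atom(type, value, List.of(targets)));
        return handle;
    }

    Atom get(int handle) {
        return atoms.get(handle);
    }

    /** Handles of all atoms that point to the given atom (a linear scan, for clarity only). */
    List<Integer> incidenceSet(int handle) {
        List<Integer> result = new ArrayList<>();
        for (Map.Entry<Integer, Atom> entry : atoms.entrySet()) {
            if (entry.getValue().targets.contains(handle)) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        AtomSketch graph = new AtomSketch();
        int person = graph.addType("Person");
        int meeting = graph.addType("Meeting");
        int note = graph.addType("Note");

        int alice = graph.add(person, "Alice");
        int bob = graph.add(person, "Bob");
        int carol = graph.add(person, "Carol");

        // A single edge that points to three nodes at once.
        int review = graph.add(meeting, "design review", alice, bob, carol);
        // An edge that points to another edge.
        int remark = graph.add(note, "rescheduled", review);

        System.out.println("arity of review: " + graph.get(review).targets.size());
        System.out.println("atoms pointing to review: " + graph.incidenceSet(review));
        System.out.println("remark handle: " + remark);
    }
}
```

A real store would index incidence sets and persist atoms rather than scanning a map; the sketch keeps everything in memory so that the shape of the data model stays visible.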
Possible Usage Scenarios
In a server-side Java application, the standard setup relies on an RDBMS together with a set of business
components and a presentation tier. If you've kept up with the latest industry advances, you have a good
O/R mapping tool such as Hibernate to transparently and non-intrusively convert your object structure to/from
database tables. Recently, there has been a noticeable trend to replace RDBMSs, especially for smaller applications,
with embedded in-memory databases that have less sophisticated, but typically much faster querying capabilities.
In a desktop Java application, programmers frequently rely on a large set of configuration files to store
user preferences and other persistent application state. A large amount of time is devoted to the management of
configuration data, and end-users are frequently not allowed to configure even simple application behavior because
programmers don't have the time to make "everything" configurable and have to guess which parameters
are most likely to interest users. With HyperGraphDB, all beans that have to do with configuration can simply
be added as atoms and they will be managed from there on.
Bioinformatics projects form a category of fairly complex software that not only can benefit from a data
management piece like HyperGraphDB, but also constitute a very natural fit for it. Frequently, such projects need to manage
highly complex descriptive information based on structured taxonomies (or ontologies), together with large sets of
experimental data. In addition, sophisticated algorithms operate on both experimental and ontological data in order
to infer interaction networks at various levels of biological organization. HyperGraphDB is designed to facilitate
all those activities.
Semantic Web projects are an obvious domain of application for HyperGraphDB. The so-called "conceptual graphs" or RDF
graphs and even the more advanced modeling practices utilizing higher-order relationships have a straightforward and natural
expression within the HyperGraphDB framework.
Network research can benefit from the capacity of HyperGraphDB to store very large, distributed graphs and
to run computationally intensive pattern-mining algorithms on them.