Discussion:
Conceptual questions on key-value databases for RDBMs users
(too old to reply)
unknown
2010-11-02 20:14:42 UTC
Permalink
This is a message I sent to the nosql-discussion
discussion group. I thought it might be interesting to send it
erlang-questions, so here we go...


Hi all!

I have used SQL RDBMSs for some time. I've never used any very advanced
feature, but I know enough of it to make database applications.

Nowadays, I decided it would be interesting to learn some NoSQL
databases concepts. So I decided to pick up some Erlang and Mnesia, its
native key-value database. More than scalability itself, the most
valuable feature for me is the possibility of replication and
synchronization between nodes.

But all pros have cons as companion. The lack of a relationship model
is difficult for who is used to RDBMSs. So, my question is:

* Is there any guide, tutorial, book, whatever, that tries to introduce
NoSQL databases to SQL users?

* Key-value databases are surprising simple. I know you solve
relationship by denormalizing data. What data should be normalized?
What shouldn't? How do you update denormalized data?

Sorry for such simple and general questions. Things were simple up to
the moment that I realized that it would be easily solved with a JOIN
SQL statement. :-)

Thank you very much!
--
Silas Silva
unknown
2010-11-02 20:34:46 UTC
Permalink
qlc is an erlang query interface you can use with mnesia to do things like
joins.

http://www.erlang.org/doc/man/qlc.html

Joe Armstrong's book explains qlc with equivalent SQL statements.

http://www.pragprog.com/titles/jaerlang/programming-erlang

Kenny
Post by unknown
This is a message I sent to the nosql-discussion
discussion group. I thought it might be interesting to send it
erlang-questions, so here we go...
Hi all!
I have used SQL RDBMSs for some time. I've never used any very advanced
feature, but I know enough of it to make database applications.
Nowadays, I decided it would be interesting to learn some NoSQL
databases concepts. So I decided to pick up some Erlang and Mnesia, its
native key-value database. More than scalability itself, the most
valuable feature for me is the possibility of replication and
synchronization between nodes.
But all pros have cons as companion. The lack of a relationship model
* Is there any guide, tutorial, book, whatever, that tries to introduce
NoSQL databases to SQL users?
* Key-value databases are surprising simple. I know you solve
relationship by denormalizing data. What data should be normalized?
What shouldn't? How do you update denormalized data?
Sorry for such simple and general questions. Things were simple up to
the moment that I realized that it would be easily solved with a JOIN
SQL statement. :-)
Thank you very much!
--
Silas Silva
________________________________________________________________
erlang-questions (at) erlang.org mailing list.
See http://www.erlang.org/faq.html
To unsubscribe; mailto:erlang-questions-unsubscribe
unknown
2010-11-02 20:57:36 UTC
Permalink
Hi there.
Post by unknown
qlc is an erlang query interface you can use with mnesia to do things like
joins.
http://www.erlang.org/doc/man/qlc.html
Joe Armstrong's book explains qlc with equivalent SQL statements.
http://www.pragprog.com/titles/jaerlang/programming-erlang
Well, I've read first chapters of Armstrong's book and also the chapter
about Mnesia. Some could use relationship with Mnesia, but it would
rather have to do some job that a RDBMS do behind the scenes, like
setting a secondary index (which could be done adding another "table" in
Mnesia), right?

Anyway, in Mnesia applications, is there any case wher denormalization
is recommended rather than using qlc's joins-like?

Thanks.
--
Silas Silva
unknown
2010-11-02 21:52:06 UTC
Permalink
I have used SQL RDBMSs for some time. ?I've never used any very advanced
feature, but I know enough of it to make database applications.
Nowadays, I decided it would be interesting to learn some NoSQL
databases concepts. ?So I decided to pick up some Erlang and Mnesia, its
native key-value database. ?More than scalability itself, the most
valuable feature for me is the possibility of replication and
synchronization between nodes.
I'll rant. Beware. SQL is a declarative language, like QLC. It is
quite domain-specific, but it has one distinct advantage: You can
query-optimize on it. Most databases that use SQL relies on a heavily
relational model in which you idealize normalized data. The advantage
is that this support ad-hoc queries very well. If you can dream up the
appropriate SQL-query, you have the answer, though it may take some
time before it comes to you, depending on the power of optimization
and the complexity of the query.

Enter the web. The basic premise is that we do not want users to do
ad-hoc queries on data. Partly due to security and partly to
(inadvertently) denying other user service. So many systems backed by
RDBMS systems artificially lock down the allowed queries to a few
blessed. Now enter Google. Google has one main thing they need to
serve, which is inverted indexes for words. This is a very specific
problem with an interesting property: you can shard the keyspace of
words into multiple machines for good distribution and parallelism.
The limitation is that you just locked down ad-hoc query to specific
queries of specific data, but you can get really good query speed on
those. You also got the sharding capability and the system is not hard
to implement. Data mining can be achieved by batch-runs of map/reduce
over all data. It is in some sense slow, but if your query fits the
M/R scheme, it parallelizes easily. In a certain sense, the M/R gave
you some of the ad-hoc query capabilities back. The final key concept
is that triggers easily work on sharding models. Upon insertion, you
run hooks which can in turn update specific query indexes, drive
full-text-search engines and so on.

Now, most web services out there have modest data storage needs. The
amount of services using the RDBMS as a glorified file system is
abundant and pervasive to an extent which feels pervertedly sick. Such
systems never really had the need for all the niceties of an SQL
system and their queries are simple. Also, the mapping from SQL into
the language of choice is not easy, especially if said language is
object oriented.

Next, the war drums begin to play. The CAP theorem is proven and this
changes the game. Now whenever you do a database system, there is a
tradeoff which must be made. Much misunderstanding of the CAP theorem
is out there. We don't care about he halting theorem that much either,
even though it is very real and there. But it does pose some
limitations to what one can achieve.

Personally, I think the K/V stores grew out of the Google train.
Google did them because no database system was ready for that kind of
scalability. But that should not be taken as if you can never make
SQL, QLC or the like support sharded queries. We have more stores of
the "NoSQL" kind popping up every month which is great for the
innovation. But time will show which of these will actually fly and
which will thud to the ground.
--
J.
unknown
2010-11-03 07:00:50 UTC
Permalink
Post by unknown
* Is there any guide, tutorial, book, whatever, that tries to introduce
NoSQL databases to SQL users?
Not really, although the Mnesia User's Guide does try to use some
RDBMS terminology.

http://www.erlang.org/doc/apps/mnesia/Mnesia_chap2.html#id58308
Post by unknown
* Key-value databases are surprising simple. I know you solve
relationship by denormalizing data. What data should be normalized?
What shouldn't? How do you update denormalized data?
The thing to keep in mind is that mnesia was not designed to provide
a better way to handle relational data sets. It was originally designed
to be a real-time, in-memory database for telecom apps, and most of
the data there is well suited to simple primary-key access and few
enough relations that they can be hard-coded.

The actual updating of data is perhaps not so terribly different, but
mnesia will not do much checking for you, so you have to keep track
of relations and dependent updates in the code. If this gets to be a
great burden in your application, it is probably a sign that you need
an RDBMS after all. They really excel in certain application domains.
Post by unknown
Sorry for such simple and general questions. Things were simple up to
the moment that I realized that it would be easily solved with a JOIN
SQL statement. :-)
QLC offers some support for basic joins. You need to be a bit more careful
about how you word the queries than with SQL, since QLC only does some
rudimentary optimizations.

BR,
Ulf W

Ulf Wiger, CTO, Erlang Solutions, Ltd.
http://erlang-solutions.com
unknown
2010-11-03 08:30:49 UTC
Permalink
Post by unknown
* Is there any guide, tutorial, book, whatever, that tries to introduce
NoSQL databases to SQL users?
The CouchDB book is very good for making the mental transition from SQL to
NoSQL in general..

http://guide.couchdb.org/draft/index.html

I think a lot of the data organisation concepts described therein are
transferable in principle to Mnesia (JSON --> Records, JavaScript -->
Erlang).
Post by unknown
* Key-value databases are surprising simple. I know you solve
relationship by denormalizing data. What data should be normalized?
What shouldn't?
Ahh, yes, the relationship question! There is no one-size fits all answer
to this. There are various NoSQL approaches to relationship modeling with
different levels of denormalisation and having different advantages and
disadvantages. You might find this article helpful...

http://wiki.apache.org/couchdb/EntityRelationship

Again. I believe in principle, the techniques described are transferable
and actively used by Mnesia developers (I'm not an Mnesia user myself.)
Post by unknown
How do you update denormalized data?
This normally involves a trade-off between having a separate record that
represents related data and updating that once, or completely
denormalising and updating all related records (see wiki link above.) I
usually only denormalise data for which I'm unlikely to need to update
several copies of.

For example, for a receipt, I'd have an embedded copy of a customer
record. When I update the customer record, I won't update related receipts
coz I don't DON'T want the receipts to be updated anyway since a receipt
should be an exact unmodified snapshot of history.

In contrast, for an order, I'd have a embedded only a reference to a
customer record, because if someone changes the customers address, I'd
want all related orders to reflect that without needing to map through and
change them (though for some other cases I might do just that.)4

It's very case-by-case. Other factors like replication and query
performance come in to play too. From my experience, it's something that
you kinda figure out as you go changing your mind here and there.

- Edmond -
Post by unknown
This is a message I sent to the nosql-discussion
discussion group. I thought it might be interesting to send it
erlang-questions, so here we go...
Hi all!
I have used SQL RDBMSs for some time. I've never used any very advanced
feature, but I know enough of it to make database applications.
Nowadays, I decided it would be interesting to learn some NoSQL
databases concepts. So I decided to pick up some Erlang and Mnesia, its
native key-value database. More than scalability itself, the most
valuable feature for me is the possibility of replication and
synchronization between nodes.
But all pros have cons as companion. The lack of a relationship model
* Is there any guide, tutorial, book, whatever, that tries to introduce
NoSQL databases to SQL users?
* Key-value databases are surprising simple. I know you solve
relationship by denormalizing data. What data should be normalized?
What shouldn't? How do you update denormalized data?
Sorry for such simple and general questions. Things were simple up to
the moment that I realized that it would be easily solved with a JOIN
SQL statement. :-)
Thank you very much!
--
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
unknown
2010-11-04 22:01:30 UTC
Permalink
Post by unknown
This is a message I sent to the nosql-discussion
discussion group. ?I thought it might be interesting to send it
erlang-questions, so here we go...
Hi all!
I have used SQL RDBMSs for some time. ?I've never used any very advanced
feature, but I know enough of it to make database applications.
Nowadays, I decided it would be interesting to learn some NoSQL
databases concepts. ?So I decided to pick up some Erlang and Mnesia, its
native key-value database. ?More than scalability itself, the most
valuable feature for me is the possibility of replication and
synchronization between nodes.
But all pros have cons as companion. ?The lack of a relationship model
* Is there any guide, tutorial, book, whatever, that tries to introduce
?NoSQL databases to SQL users?
* Key-value databases are surprising simple. ?I know you solve
?relationship by denormalizing data. ?What data should be normalized?
?What shouldn't? ?How do you update denormalized data?
I'm no database expert so don't quote me here ...

As far as I am concerned traditional databases like SQL suffer
from the fact that the data stored in an individual column is incredible
simple - I can store an integer/string/... in a column but these are
incredibly simple data structures. I want to store and retrieve incredibly
complicated things - how do I store an XML parse tree in a single cell
of a database? - How do I store a database in a database ...

In a decent K-V database the value of the key can be *anything* - an
entire database for example, a compiler, ... no worries

Then when I analyse my problem I start thinking "I can store and retrieve any
complex object that keys do I need to solve my problem?"

I'm not thinking in terms of joins and normalizing things - the thought process
is different - so far I haven't met any problems that don't map onto key-values
queries.

It seems to my that SQL provides you with the ability do to complex
queries on simple things. K-V dbs can do simple queries on complex
things.

/Joe
Post by unknown
Sorry for such simple and general questions. ?Things were simple up to
the moment that I realized that it would be easily solved with a JOIN
SQL statement. ?:-)
Thank you very much!
--
Silas Silva
________________________________________________________________
erlang-questions (at) erlang.org mailing list.
See http://www.erlang.org/faq.html
To unsubscribe; mailto:erlang-questions-unsubscribe
unknown
2010-11-04 22:11:39 UTC
Permalink
Hi Joe, All
I'm far from being a database expert too, so feel free to correct me , but
i believe the assumption of storing simple thing is not a general rule and
should not be associated with SQL. For example, postgresql provides more
complex data types and even user defined ones.
Post by unknown
Post by unknown
This is a message I sent to the nosql-discussion
discussion group. I thought it might be interesting to send it
erlang-questions, so here we go...
Hi all!
I have used SQL RDBMSs for some time. I've never used any very advanced
feature, but I know enough of it to make database applications.
Nowadays, I decided it would be interesting to learn some NoSQL
databases concepts. So I decided to pick up some Erlang and Mnesia, its
native key-value database. More than scalability itself, the most
valuable feature for me is the possibility of replication and
synchronization between nodes.
But all pros have cons as companion. The lack of a relationship model
* Is there any guide, tutorial, book, whatever, that tries to introduce
NoSQL databases to SQL users?
* Key-value databases are surprising simple. I know you solve
relationship by denormalizing data. What data should be normalized?
What shouldn't? How do you update denormalized data?
I'm no database expert so don't quote me here ...
As far as I am concerned traditional databases like SQL suffer
from the fact that the data stored in an individual column is incredible
simple - I can store an integer/string/... in a column but these are
incredibly simple data structures. I want to store and retrieve incredibly
complicated things - how do I store an XML parse tree in a single cell
of a database? - How do I store a database in a database ...
In a decent K-V database the value of the key can be *anything* - an
entire database for example, a compiler, ... no worries
Then when I analyse my problem I start thinking "I can store and retrieve any
complex object that keys do I need to solve my problem?"
I'm not thinking in terms of joins and normalizing things - the thought process
is different - so far I haven't met any problems that don't map onto key-values
queries.
It seems to my that SQL provides you with the ability do to complex
queries on simple things. K-V dbs can do simple queries on complex
things.
/Joe
Post by unknown
Sorry for such simple and general questions. Things were simple up to
the moment that I realized that it would be easily solved with a JOIN
SQL statement. :-)
Thank you very much!
--
Silas Silva
________________________________________________________________
erlang-questions (at) erlang.org mailing list.
See http://www.erlang.org/faq.html
To unsubscribe; mailto:erlang-questions-unsubscribe
________________________________________________________________
erlang-questions (at) erlang.org mailing list.
See http://www.erlang.org/faq.html
To unsubscribe; mailto:erlang-questions-unsubscribe
--
Best Regards,
- Ahmed Omar
http://nl.linkedin.com/in/adiaa
Follow me on twitter
@spawn_think <http://twitter.com/#!/spawn_think>
unknown
2010-11-04 22:57:22 UTC
Permalink
Post by unknown
Hi Joe, All
I'm far from being a database expert too, so feel free to correct me , but
i believe the assumption of storing simple thing is not a general rule and
should not be associated with SQL. For example, postgresql provides more
complex data types and even user defined ones.
As Joe must be aware, it is a tenet of the relational model that column
values be 'atomic' in the sense that they do not encode structure that
would be more profitably decomposed into relations.

--Toby
Post by unknown
Post by unknown
Post by unknown
This is a message I sent to the nosql-discussion
discussion group. I thought it might be interesting to send it
erlang-questions, so here we go...
Hi all!
I have used SQL RDBMSs for some time. ...
* Key-value databases are surprising simple. I know you solve
relationship by denormalizing data. What data should be normalized?
What shouldn't? How do you update denormalized data?
I'm no database expert so don't quote me here ...
As far as I am concerned traditional databases like SQL suffer
from the fact that the data stored in an individual column is incredible
simple - I can store an integer/string/... in a column but these are
incredibly simple data structures. I want to store and retrieve incredibly
complicated things - how do I store an XML parse tree in a single cell
of a database? - How do I store a database in a database ...
...
It seems to my that SQL provides you with the ability do to complex
queries on simple things. K-V dbs can do simple queries on complex
things.
/Joe
Post by unknown
Sorry for such simple and general questions. Things were simple up to
the moment that I realized that it would be easily solved with a JOIN
SQL statement. :-)
Thank you very much!
--
Silas Silva
________________________________________________________________
erlang-questions (at) erlang.org mailing list.
See http://www.erlang.org/faq.html
To unsubscribe; mailto:erlang-questions-unsubscribe
________________________________________________________________
erlang-questions (at) erlang.org mailing list.
See http://www.erlang.org/faq.html
To unsubscribe; mailto:erlang-questions-unsubscribe
unknown
2010-11-08 14:00:39 UTC
Permalink
Hello Ahmed, Toby...
it is a tenet of the relational model that column values
be 'atomic' in the sense that they do not encode structure that
would be more profitably decomposed into relations.
This is very well put. The "SQL-way" of modeling complex data is through
multiple tables and relationships/joins. Other techniques are exceptional.
Post by unknown
i believe the assumption of storing simple thing is not a general rule and
should not be associated with SQL. For example, postgresql provides more
complex data types and even user defined ones.
I'd hardly describe Postgres, a database engine that features things like
table-inheritance, as a typical SQL-RDBMS. In many ways, Posgres has
always been cutting edge and distinctly atypical in it's field. But you're
right, using orders->items as an example, with Postgres, you can store the
items in a column of an orders table. If you then want to query all orders
that contain X-Box, you could then write something similar to...

SELECT Orders.* FROM Orders WHERE (Orders.Item).Desc = "X-Box";

This would be somewhat similar to how a NoSQL/kv-db would look at it
(though I'm not sure how you would go about tallying the item totals for
each order, but knowing Posgres, there must be a way.) However, this is
NOT typically how an SQL database would be organised, even with Postgres
that has that capability. SQL people tend to think in terms of joins and
relationships. Instead, you'd have two tables, one with orders and another
with items, then a similar (but not exactly the same) query would look
something like...

SELECT Orders.*, Items.* FROM Orders, Items INNER JOIN Orders ON
(Orders.Id = Items.OrderId) WHERE Items.Desc = "X-Box";

Complex data in the SQL world is normally modeled using
relationships/joins. IMO, this is both SQL-RDBMS biggest strength and
biggest weakness. Strength -- you can define "rules" for what's valid and
invalid in your database. Weakness -- when the complex data you're trying
to model is wide and deep, the database gets insanely complex, very
quickly, resulting in a lot of head-scratching when querying.

Contrast with the NoSQL approach. Strength -- modeling wide and deep
complex data is much easier and tends to map more directly to the
application's idea of the data. Weakness -- auto-guarding against bad data
is more difficult so you must trust the programmers.

- Edmond -

On Fri, 05 Nov 2010 09:57:22 +1100, Toby Thain <toby>
Post by unknown
Hi Joe, All
I'm far from being a database expert too, so feel free to correct me ,
but
i believe the assumption of storing simple thing is not a general rule and
should not be associated with SQL. For example, postgresql provides more
complex data types and even user defined ones.
As Joe must be aware, it is a tenet of the relational model that column
values be 'atomic' in the sense that they do not encode structure that
would be more profitably decomposed into relations.
--Toby
Post by unknown
Post by unknown
Post by unknown
This is a message I sent to the nosql-discussion
discussion group. I thought it might be interesting to send it
erlang-questions, so here we go...
Hi all!
I have used SQL RDBMSs for some time. ...
* Key-value databases are surprising simple. I know you solve
relationship by denormalizing data. What data should be normalized?
What shouldn't? How do you update denormalized data?
I'm no database expert so don't quote me here ...
As far as I am concerned traditional databases like SQL suffer
from the fact that the data stored in an individual column is incredible
simple - I can store an integer/string/... in a column but these are
incredibly simple data structures. I want to store and retrieve incredibly
complicated things - how do I store an XML parse tree in a single cell
of a database? - How do I store a database in a database ...
...
It seems to my that SQL provides you with the ability do to complex
queries on simple things. K-V dbs can do simple queries on complex
things.
/Joe
Post by unknown
Sorry for such simple and general questions. Things were simple up to
the moment that I realized that it would be easily solved with a JOIN
SQL statement. :-)
Thank you very much!
--
Silas Silva
________________________________________________________________
erlang-questions (at) erlang.org mailing list.
See http://www.erlang.org/faq.html
To unsubscribe; mailto:erlang-questions-unsubscribe
________________________________________________________________
erlang-questions (at) erlang.org mailing list.
See http://www.erlang.org/faq.html
To unsubscribe; mailto:erlang-questions-unsubscribe
________________________________________________________________
erlang-questions (at) erlang.org mailing list.
See http://www.erlang.org/faq.html
To unsubscribe; mailto:erlang-questions-unsubscribe
--
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
unknown
2010-11-04 23:26:31 UTC
Permalink
Post by unknown
I'm no database expert so don't quote me here ...
As far as I am concerned traditional databases like SQL suffer
from the fact that the data stored in an individual column is incredible
simple - I can store an integer/string/... in a column but these are
incredibly simple data structures.
See Date & Darwen "The Third Manifesto".

They are nearly as incandescently cross about SQL as E. F. Codd was.
Indeed, one of their criteria for a good relational data base is
"not SQL".

The *relational model* makes sense with arbitrarily complex data
values in attributes. (Date & Darwen explain how this statement
is compatible with 1NF.)
Post by unknown
I want to store and retrieve incredibly
complicated things - how do I store an XML parse tree in a single cell
of a database? - How do I store a database in a database ...
There was a lot of work done on "nested relational databases" at
RMIT. At least three (Atlas, Titan, Titan+) were developed. I
have no idea what happened to them. For that matter, I don't know
what Alan Kent is up to these days. One of the systems was used
to hold and efficiently search large amounts of SGML data.

I note that http://xml.apache.org/xindice/ describes
"Apache Xindice ... a database designed from the ground up to store XML
data". So SQL and K-V are not the only alternatives around.
(I'm *sure* I remember another free XML database...)
unknown
2010-11-05 14:23:40 UTC
Permalink
Great comment Joe, I couldn't help taking the liberty to quote you on twitter.

Best regards,

Bob
Post by unknown
It seems to my that SQL provides you with the ability do to complex
queries on simple things. K-V dbs can do simple queries on complex
things.
/Joe
unknown
2010-11-07 12:09:04 UTC
Permalink
Hi Joe,

I am not sure I agree with your statement that traditional databases "suffer" from the fact that columns have simple types. You can easily (within reason) make them store complex types. They are at the most basic level also KV stores. The most basic abstraction for a "Table" is a set of tuples stored in a certain way. There is no reason why you cannot define a table with an arbitrary key and an arbitrary "blob" as value column and then storing your parse tree in that blob field. (Obviously you need to serialise/deserialise it - as you would need to do for any storage abstraction).
Post by unknown
I'm not thinking in terms of joins and normalizing things - the thought process
is different - so far I haven't met any problems that don't map onto key-values
queries.
I agree with you. If you throw distribution, map-reduce and lots of indices into the equation you can solve most issues this way. Except for one thing - invariance between multiple records and record types. There are very few examples where it is useful to live without invariance - searching being one example.

Invariance is a first order concept in any RDBMS system and essential for many types of systems. However, in my experience, you can achieve application-level invariance in other ways. But it requires substantially more work to approximate first order invariance using KV-stores. It is sometimes simpler to have your data properly normalised I think.

Rudolph van Graan
Post by unknown
Post by unknown
This is a message I sent to the nosql-discussion
discussion group. I thought it might be interesting to send it
erlang-questions, so here we go...
Hi all!
I have used SQL RDBMSs for some time. I've never used any very advanced
feature, but I know enough of it to make database applications.
Nowadays, I decided it would be interesting to learn some NoSQL
databases concepts. So I decided to pick up some Erlang and Mnesia, its
native key-value database. More than scalability itself, the most
valuable feature for me is the possibility of replication and
synchronization between nodes.
But all pros have cons as companion. The lack of a relationship model
* Is there any guide, tutorial, book, whatever, that tries to introduce
NoSQL databases to SQL users?
* Key-value databases are surprising simple. I know you solve
relationship by denormalizing data. What data should be normalized?
What shouldn't? How do you update denormalized data?
I'm no database expert so don't quote me here ...
As far as I am concerned traditional databases like SQL suffer
from the fact that the data stored in an individual column is incredible
simple - I can store an integer/string/... in a column but these are
incredibly simple data structures. I want to store and retrieve incredibly
complicated things - how do I store an XML parse tree in a single cell
of a database? - How do I store a database in a database ...
In a decent K-V database the value of the key can be *anything* - an
entire database for example, a compiler, ... no worries
Then when I analyse my problem I start thinking "I can store and retrieve any
complex object that keys do I need to solve my problem?"
I'm not thinking in terms of joins and normalizing things - the thought process
is different - so far I haven't met any problems that don't map onto key-values
queries.
It seems to my that SQL provides you with the ability do to complex
queries on simple things. K-V dbs can do simple queries on complex
things.
/Joe
Post by unknown
Sorry for such simple and general questions. Things were simple up to
the moment that I realized that it would be easily solved with a JOIN
SQL statement. :-)
Thank you very much!
--
Silas Silva
________________________________________________________________
erlang-questions (at) erlang.org mailing list.
See http://www.erlang.org/faq.html
To unsubscribe; mailto:erlang-questions-unsubscribe
________________________________________________________________
erlang-questions (at) erlang.org mailing list.
See http://www.erlang.org/faq.html
To unsubscribe; mailto:erlang-questions-unsubscribe
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 3822 bytes
Desc: not available
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20101107/08760727/attachment.bin>
unknown
2010-11-07 15:06:03 UTC
Permalink
Invariance is a first order concept in any RDBMS system and essential for many types of systems. However, in my experience, you can achieve application-level invariance in other ways. But it requires substantially more work to approximate first order invariance using KV-stores. ?It is sometimes simpler to have your data properly normalised I think.
"not an expert" caveat applies to me also...

It appears to me that this discussion is another expression of the
'strong vs weak/dynamic vs static type' discussion.

Some observations/connections that seem relevant to me:
- The underlying nature of SQL DBs is revealed by the fact that
queries by anything except the primary key usually need additional
work (sequences/indexes) to be respectably efficient.
- C.J.Date uses the term tuple to refer to a "row in a table" (cf also
Mnesia).
- Date also notes that "objects" can always be mapped directly on to a
relational model.
- Mnesia is interesting in that it encourages a table to be confined
to a single record type, and allows indexing of record tuple members
(cf column indexing in SQL databases).
- The process of normalization suggests very strong typing of the
data.

...it makes me suspect that an imperative and strongly-typed language
paradigm has been a very strong motivator in the evolution of SQL
databases; and perhaps the popularity of NoSQL/NotSQL is an expression/
outcome of the rise of recent trends in programming language uptake.

/s
unknown
2010-11-08 10:16:40 UTC
Permalink
I agree - I'd go further. For most of what I want to do the weaker the
type the better - even dynamic types are too structured. A lot of what
I want can be done with unstructured text and full-text indexing
(which has even less structure that a weak type)

I find that in all the applications I've programmed so far I want the
following (In priority order)

0) Easy of setup
I'm lazy - but I don't want to struggle through hour long sessions of
Googling for bug-fixes to get stuff working.
1) Key - Value lookup/retrieval
What I put in I can get out - the key is simple the Value can be
very complex
2) Full text searches
I want to search strings for free text
3) Encapsulation
I like all the database in a single file - so I can send it in a
message move it around etc.
4) Replication
I want stuff to be stored forever
5) Scalability
I want it to scale


Most apps I can do with 1 and 2 - I really like 3 for distributing the
result of something.
Ets+Dets is pretty ok for 1+3+0. Mnesia for 4. For 5 I guess
riak/couchDB are good
but I haven't felt the need yet.

I've actually used poor-man's replication, this is pure KISS stuff.
Put the database in a dets file
and replicate using dropbox.

I feel that 2) is the big thing that missing at the moment - dets/ets
with in-built full text
indexing would be *wonderful* - I've made a full-text indexing engine
(See https://github.com/joearms/elib1/tree/master/supported/indexer/)
But it's not integrated with ets/dets.

Does anybody know of any pure-erlang add-ons to ets/dets that do
full-text indexing?
the tricky but seems to be word extraction and not
making/sorting/building the index
also building an incremental index might be difficult (I haven't done this).

I have never seen a "one-size-fits-all" database - and I haven't even mentioned
security availability ...

I just don't believe that one paradigm can suit all needs - what I
want is a toolkit of
pluggable components that do single things and do them well - ets and
dets are two such
components - missing components are the full-text-indexer, the
replication manager, ...

A number of components that can be glued together is what I want not a
pre-packaged solution.

Right now what I want is dets/ets with full-text indexing and search ...

/Joe





On Sun, Nov 7, 2010 at 4:06 PM, Steve Davis
Post by unknown
Invariance is a first order concept in any RDBMS system and essential for many types of systems. However, in my experience, you can achieve application-level invariance in other ways. But it requires substantially more work to approximate first order invariance using KV-stores. ?It is sometimes simpler to have your data properly normalised I think.
"not an expert" caveat applies to me also...
It appears to me that this discussion is another expression of the
'strong vs weak/dynamic vs static type' discussion.
- The underlying nature of SQL DBs is revealed by the fact that
queries by anything except the primary key usually need additional
work (sequences/indexes) to be respectably efficient.
- C.J.Date uses the term tuple to refer to a "row in a table" (cf also
Mnesia).
- Date also notes that "objects" can always be mapped directly on to a
relational model.
- Mnesia is interesting in that it encourages a table to be confined
to a single record type, and allows indexing of record tuple members
(cf column indexing in SQL databases).
- The process of normalization suggests very strong typing of the
data.
...it makes me suspect that an imperative and strongly-typed language
paradigm has been a very strong motivator in the evolution of SQL
databases; and perhaps the popularity of NoSQL/NotSQL is an expression/
outcome of the rise of recent trends in programming language uptake.
/s
________________________________________________________________
erlang-questions (at) erlang.org mailing list.
See http://www.erlang.org/faq.html
To unsubscribe; mailto:erlang-questions-unsubscribe
unknown
2010-11-09 05:19:16 UTC
Permalink
Post by unknown
Does anybody know of any pure-erlang add-ons to ets/dets that do
full-text indexing?
http://jungerl.cvs.sourceforge.net/viewvc/jungerl/jungerl/lib/rdbms/src/

The rdbms_wsearch*.erl modules.

They were never actually integrated into RDBMS, as they needed a form
of higher-order indexing level (aggregated index). By the time I had figured
out how to do that, I got side-tracked...

The author of the modules is Hans Nilsson, and they implement the
Porter Stemming algorithm.

BR,
Ulf W

Ulf Wiger, CTO, Erlang Solutions, Ltd.
http://erlang-solutions.com
unknown
2010-11-09 01:59:11 UTC
Permalink
Post by unknown
It appears to me that this discussion is another expression of the
'strong vs weak/dynamic vs static type' discussion.
...it makes me suspect that an imperative and strongly-typed language
paradigm has been a very strong motivator in the evolution of SQL
databases; and perhaps the popularity of NoSQL/NotSQL is an expression/
outcome of the rise of recent trends in programming language uptake.
You *cannot* call the types in classic SQL "strong".
Numbers, strings, and byte strings for everything is what Joe is complaining
of and he is right to do so. Encoding something as a string or blob basically
results in the data base itself having no clue about the structure or meaning
of the data.

It is important to understand that SQL is not a good example of a relational
system. A move away from SQL *could* be a move towards relational!

One of the core concepts in relational systems is that of
enforced integrity constraints, via primary keys and foreign keys.
If you don't have any integrity constraints that you care to tell the
data base about, you probably don't need a relational system.
unknown
2010-11-09 09:43:14 UTC
Permalink
Post by unknown
One of the core concepts in relational systems is that of
enforced integrity constraints, via primary keys and foreign keys.
If you don't have any integrity constraints that you care to tell the
data base about, you probably don't need a relational system.
It is important to understand that SQL is not a good example of a relational system.
Yes - SQL is a functional language where you state the solution in terms of relational concepts and ask the RDMBS to solve it and present you with an answer. Some of the Non-SQL systems uses search terms "Name = 'John'" or jscript for the same purpose. QLC is also an example (in the case of mnesia).
Post by unknown
I think a big reason kv-stores are winning over a lot of us long-time RDBMSs users is they allow us to model things-that-have-things-inside-them in the database much closer to how they are modeled in our applications. Orders/receipts/invoices with their items, users with their permissions, all these map nicely in kv-stores from db to application. This allows us to model relationships only when WE REALLY WANT RELATIONSHIPS (this receipt belongs to that invoice). That fact alone won me over and I've never looked back.
However, it is only simple to store things this way (the example of Orders/Receipts/Invoices with items "inside" them), if your only interest in the data is that the "outer" or container object encapsulates the items. Typically you want to read or write the whole thing in one operation. In real life, (and in my experience), you will pretty soon find that someone wants to know how many orders during the last 60 days included item X with quantities larger than say 6.

If your design decision was to store this whole thing (the order and its items) as one big document (my term for it), the only way to retrieve this data is to literally open up every order, filter out the items you want and aggregate them. The only way to make this fast is to avoid reading every single document and processing it over and over. And to do this optimisation, you need an index, or more likely several indices - on dates, on items types, etc. Indices require that you understand what is inside your "document" (in this case line items). By definition, this implies a relationship - orders have among other things - lines. Completely independent of the fact that you are storing the items inside the document/order.

So as a summary from my side - all data has some sort of structure, be it words within documents, or line items within orders. You can represent this any way you want.

In the distant past we wrote all the items on a single piece of paper called an order. It was all on one physical page. The page contained all the information. For the same reason it is difficult to query pieces of paper (you need to either index them or summarise them in another way), in the same way it is difficult to query data with implied relations stored in a single "thing" (blob/object/values).

- It is very difficult to enforce invariance in KV stores
- It is very difficult to index KV stores
- It is hard work to query KV stores.
- It is trivial to read from or write into KV stores
- It is hard to read from or write to database (drivers, SQL, ...)
- RDMBS systems are hard to scale
- KV stores scale easily

Rudolph van Graan
Post by unknown
Post by unknown
It appears to me that this discussion is another expression of the
'strong vs weak/dynamic vs static type' discussion.
...it makes me suspect that an imperative and strongly-typed language
paradigm has been a very strong motivator in the evolution of SQL
databases; and perhaps the popularity of NoSQL/NotSQL is an expression/
outcome of the rise of recent trends in programming language uptake.
You *cannot* call the types in classic SQL "strong".
Numbers, strings, and byte strings for everything is what Joe is complaining
of and he is right to do so. Encoding something as a string or blob basically
results in the data base itself having no clue about the structure or meaning
of the data.
It is important to understand that SQL is not a good example of a relational
system. A move away from SQL *could* be a move towards relational!
such as
One of the core concepts in relational systems is that of
enforced integrity constraints, via primary keys and foreign keys.
If you don't have any integrity constraints that you care to tell the
data base about, you probably don't need a relational system.
________________________________________________________________
erlang-questions (at) erlang.org mailing list.
See http://www.erlang.org/faq.html
To unsubscribe; mailto:erlang-questions-unsubscribe
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 3822 bytes
Desc: not available
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20101109/c3294e5b/attachment.bin>
unknown
2010-11-09 13:38:37 UTC
Permalink
Hello Rudolph,

Some comments...

On Tue, 09 Nov 2010 20:43:14 +1100, Rudolph van Graan
Post by unknown
One of the core concepts in relational systems is that of
enforced integrity constraints, via primary keys and foreign keys.
If you don't have any integrity constraints that you care to tell the
data base about, you probably don't need a relational system.
This was exactly what I meant with "invariance" earlier. If you have
(strong) requirements for invariance in your model - an RDBMS is almost
the only solution as integrity constraints etc are all first class
concepts. If you don't have a requirement for invariance (as in Edmund's
In many cases (at least for a migrating RDBMS user), invariance is
*usually* indeed very important. And as you pointed out earlier there are
ways of faking this in the emerging class of NoSQL/kv-db, the absence of
which means tougher checks on code that's writing data (the db is unlikely
to do this for you.)

I've found that when one is making the transition (from SQL-RDBMS to
NoSQL/Kv-db), the benefits of easier mapping of db-object to
application-object normally out-weight the loss of things like
constraints. I guess you have to trust more that programmers know what
they are doing and won't write bad data! Some people cringe when hearing
this, but I've found it's as bad as it sounds!

However, I think as this newer class of DB start to attract more and more
RDBMS migrants, they will have to figure out a ways of solving these
issues. Some have already stared (CouchDB has some concept of data
validation.)
Post by unknown
It is important to understand that SQL is not a good example of a relational system.
Yes - SQL is a functional language where you state the solution in terms
of relational concepts and ask the RDMBS to solve it and present you
with an answer. Some of the Non-SQL systems uses search terms "Name =
'John'" or jscript for the same purpose. QLC is also an example (in the
case of mnesia).
I'd go further and say SQL has evolved into a primary API for accessing
todays RDBMSs. You can do much more that just query for data. You can
design/redesign the db, add/remove indicies. Many vendors add elaborate
extensions like Microsoft's Data Shaping for hierarchical data. In many
ways SQL is like the non-standard standard shell language for many
databases.
Post by unknown
I think a big reason kv-stores are winning over a lot of us long-time
RDBMSs users is they allow us to model
things-that-have-things-inside-them in the database much closer to how
they are modeled in our applications. Orders/receipts/invoices with
their items, users with their permissions, all these map nicely in
kv-stores from db to application. This allows us to model relationships
only when WE REALLY WANT RELATIONSHIPS (this receipt belongs to that
invoice). That fact alone won me over and I've never looked back.
However, it is only simple to store things this way (the example of
Orders/Receipts/Invoices with items "inside" them),
I disagree, I store things this way not because it's simple for the
database but because that's how my applications understand those things
(BTW, those were *REAL* examples I was using orders/receipts/invoices from
real code in development. Code previously written against SQL-RDBMSs and
being re-writing against CouchDB. The latter is proving to be a much more
pleasant experience!)

I've seen many SQL-RDBMS developers try to achieve a similar end-result
using object-relational mapping. You know, so when they create an instance
of a user class in say Java, a new user is automatically added the users
table in the db and corresponding entries are made in the permissions
table. I've never agreed with that way of doing things but it *does*
illustrate that there is a disconnect between how developers want to
organise data and how SQL-RDBMSs want them organise data.
if your only interest in the data is that the "outer" or container
object encapsulates the items.
This isn't true. At least not with the class of kv-database I'm referring
to. CouchDB (and I assume Mnesia) allows you to access the 'inner' objects
(to whatever depth) inside your queries. You can very easily access the
items inside an invoice.
Typically you want to read or write the whole thing in one operation. In
real life, (and in my experience), you will pretty soon find that
someone wants to know how many orders during the last 60 days included
item X with quantities larger than say 6.
Easy. I do that sort of thing everywhere (no accounting tool would be very
helpful if it couldn't do those kind of complex queries).

I don't know about query engines for other kv-dbs like Mnesia, but CouchDB
handles that sort of thing very well. Just create a view that gathers the
inner data according to the criteria you are looking for (map step) then
do the necessary tallying (reduce step).
If your design decision was to store this whole thing (the order and its
items) as one big document (my term for it), the only way to retrieve
this data is to literally open up every order, filter out the items you
want and aggregate them. The only way to make this fast is to avoid
reading every single document and processing it over and over.
Couch is very smart at ensuring that any lengthy read/map-reduce
operations are cached in an internal data structure (B-tree I think) and
tries it's best to keep the cache update-to-date. So you normally don't
have to worry too much about the whole vs partial read/write operations --
you just get on with business of writing your application using whatever
structures make sense to you. But if the query is one that is rarely
executed and lengthly, you can run it at startup so the next time it runs
it's faster.

This is one of the things I mean when I say we have to make clear the
class of kv-store that the original poster should look into. An RDBMS user
looking for a suitable alternative isn't looking for just a plain
kv-store. You're normally looking for something that has a nice well
thought-out query engine on top and has figured a lot of these issues out.
The Couch team have done an *excellent* job here.
And to do this optimisation, you need an index, or more likely several
indices - on dates, on items types, etc.
Correct. For couch's case, when you're creating a query (a view in couch
lingo) what you're essentially doing at the map step is asking couch to
create an index for you. Couch tries it's best to be very smart at how it
handles/optimises that index.
Indices require that you understand what is inside your "document" (in
this case line items).
Correct. You can look at couch's map step as providing the query engine an
understanding of the documents being stored, including any contained or
deeply hierarchical* data.

*YAY! I've _ALWAYS_ wanted this and tried desperately to simulate it in
SQL-RDBMSs with varying degrees of success.
By definition, this implies a relationship - orders have among other
things - lines. Completely independent of the fact that you are storing
the items inside the document/order.
Okay, I see what you're saying here. Indeed...

http://wiki.apache.org/couchdb/EntityRelationship

... you can view embedded data as related data. I hadn't thought of it
that way.

Let me make my argument clear by referring to what SQL-RDBMS folk tend to
think of as a relationship -- that is -- a constraint based on a
foreign-key. The kv-db equivalent of this would be a member of the
document that is merely a key pointing to another document rather than
meaningful data in it's own right. Now, using *that* definition...

For Couch's case, the map-reduce paradigm can be effectively used so that
you *DON'T HAVE TO CREATE RELATIONSHIPS (of the external key variety) IF
YOU DON'T WANT THEM*

This is my major argument against SQL-RDMBSs and in favor of NoSQL/kv-dbs
like Couch -- give me containers when I want containers and give me
relationships when I specifically want relationships.

I don't want my orders to *relate* to lines elsewhere, this is not how my
application looks at orders. I want orders to *contain* items. On the
other-hand, I want invoices to *relate* to orders. Only in the latter case
does it make sense for me to have and foreign-key/id reference stored in
the orders. I want data in my db to look like data in my application.

The db should fit my application, my application shouldn't have to fit the
db. I always found I would invest considerable time with SQL-RDBMSs, first
trying to force the db to understand how I want my data organised, then
giving up and just adapting the code of my app to think like the db -- in
terms of tables and foreign keys.

I've found NoSQL dbs like Couch treat me more like a grown up. Give me the
freedom to define and enforce my own rules in a way that makes sense to me
not in some generic way the *supposedly* adapts to every situation. Sure,
the RDBMS style may adapt to a wide array of situations, but from my
experience, this is never without pain.
So as a summary from my side - all data has some sort of structure, be
it words within documents, or line items within orders. You can
represent this any way you want.
True, but with most SQL-RDBMSs, things can get particularly hard. Aside
from my relationship rant, a lot of data just doesn't want to be stored in
tables, and when you force it, exponential complexity arises.
In the distant past we wrote all the items on a single piece of paper
called an order. It was all on one physical page. The page contained all
the information. For the same reason it is difficult to query pieces of
paper (you need to either index them or summarise them in another way),
in the same way it is difficult to query data with implied relations
stored in a single "thing" (blob/object/values).
- It is very difficult to enforce invariance in KV stores
I agree. And constraints in general are painful.
- It is very difficult to index KV stores
This depends. For couch, map-reduce is a form of indexing -- a very
effective and surprisingly flexible one at that. Doors that are normally
closed in SQL-RDBMS world are suddenly opened wide in Couch.
- It is hard work to query KV stores.
I disagree. I haven't found a query for *my* data yet that I cannot
express in a couch view. I'd go further and say I've found it easier than
SQL because I can organise data in a way that makes more sense to my
application. This somehow always translates to easier query construction
for the application domain.

Perhaps it's a question of learning curve. SQL-RDBMS users are so used to
thinking in terms of SELECTS and JOINS, but that doesn't make SQL easy! It
just makes it easy once you know how. I think the same applies to many
kv-dbs.
- It is trivial to read from or write into KV stores
- It is hard to read from or write to database (drivers, SQL, ...)
- RDMBS systems are hard to scale
I agree with all.
- KV stores scale easily
Largely true. But sometimes it's not as easy as they promise. Some things
work well locally and break when you replicate. And you only realise this
when you try and scale out. There is some element of false advertising
when NoSQL folk promise simple scaling, there should be a big red
disclaimer attached.

- Edmond -
Rudolph van Graan
Post by unknown
Post by unknown
It appears to me that this discussion is another expression of the
'strong vs weak/dynamic vs static type' discussion.
...it makes me suspect that an imperative and strongly-typed language
paradigm has been a very strong motivator in the evolution of SQL
databases; and perhaps the popularity of NoSQL/NotSQL is an expression/
outcome of the rise of recent trends in programming language uptake.
You *cannot* call the types in classic SQL "strong".
Numbers, strings, and byte strings for everything is what Joe is complaining
of and he is right to do so. Encoding something as a string or blob basically
results in the data base itself having no clue about the structure or meaning
of the data.
It is important to understand that SQL is not a good example of a relational
system. A move away from SQL *could* be a move towards relational!
such as
One of the core concepts in relational systems is that of
enforced integrity constraints, via primary keys and foreign keys.
If you don't have any integrity constraints that you care to tell the
data base about, you probably don't need a relational system.
________________________________________________________________
erlang-questions (at) erlang.org mailing list.
See http://www.erlang.org/faq.html
To unsubscribe; mailto:erlang-questions-unsubscribe
--
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
unknown
2010-11-09 16:01:08 UTC
Permalink
Another observation...

I think a lot of it has to do with conceptual difference between SQL-RDBMS
and NoSQL/kv-db/free-structure in terms of how both camps view the *role*
of the database.

I've found that SQL-RDMBSs tend to think of the database as central to the
design+development of an application. So naturally things must follow the
db. My issue with this is that as an app developer, I found myself always
begging and pleading with the database. The conversation would go
something like...

Me: "Please, please store this data for me so I can query it later."

DB: "Not so fast. First, split that into three tables. Add a constraint
here, foreign key there. Replace that with a type I can understand then
I'll see what I can do. BTW, you'll have to have another conversation with
my bro, the SQL query engine later. Go on... Off you go!"

The other camp seems to view the database role as more of a supplementary
one in the design+development of an application. A helper. A tool. The
conversation for me has gone more like...

Me: "Oi, Couch! Come over here. Hang onto this for me. I'll need that
back."

DB: "Sure thing boss!"

- Edmond -

On Wed, 10 Nov 2010 00:38:37 +1100, Edmond Begumisa
Post by unknown
Hello Rudolph,
Some comments...
On Tue, 09 Nov 2010 20:43:14 +1100, Rudolph van Graan
Post by unknown
One of the core concepts in relational systems is that of
enforced integrity constraints, via primary keys and foreign keys.
If you don't have any integrity constraints that you care to tell the
data base about, you probably don't need a relational system.
This was exactly what I meant with "invariance" earlier. If you have
(strong) requirements for invariance in your model - an RDBMS is almost
the only solution as integrity constraints etc are all first class
concepts. If you don't have a requirement for invariance (as in
In many cases (at least for a migrating RDBMS user), invariance is
*usually* indeed very important. And as you pointed out earlier there
are ways of faking this in the emerging class of NoSQL/kv-db, the
absence of which means tougher checks on code that's writing data (the
db is unlikely to do this for you.)
I've found that when one is making the transition (from SQL-RDBMS to
NoSQL/Kv-db), the benefits of easier mapping of db-object to
application-object normally out-weight the loss of things like
constraints. I guess you have to trust more that programmers know what
they are doing and won't write bad data! Some people cringe when hearing
this, but I've found it's as bad as it sounds!
However, I think as this newer class of DB start to attract more and
more RDBMS migrants, they will have to figure out a ways of solving
these issues. Some have already stared (CouchDB has some concept of data
validation.)
Post by unknown
It is important to understand that SQL is not a good example of a relational system.
Yes - SQL is a functional language where you state the solution in
terms of relational concepts and ask the RDMBS to solve it and present
you with an answer. Some of the Non-SQL systems uses search terms "Name
= 'John'" or jscript for the same purpose. QLC is also an example (in
the case of mnesia).
I'd go further and say SQL has evolved into a primary API for accessing
todays RDBMSs. You can do much more that just query for data. You can
design/redesign the db, add/remove indicies. Many vendors add elaborate
extensions like Microsoft's Data Shaping for hierarchical data. In many
ways SQL is like the non-standard standard shell language for many
databases.
Post by unknown
I think a big reason kv-stores are winning over a lot of us long-time
RDBMSs users is they allow us to model
things-that-have-things-inside-them in the database much closer to how
they are modeled in our applications. Orders/receipts/invoices with
their items, users with their permissions, all these map nicely in
kv-stores from db to application. This allows us to model
relationships only when WE REALLY WANT RELATIONSHIPS (this receipt
belongs to that invoice). That fact alone won me over and I've never
looked back.
However, it is only simple to store things this way (the example of
Orders/Receipts/Invoices with items "inside" them),
I disagree, I store things this way not because it's simple for the
database but because that's how my applications understand those things
(BTW, those were *REAL* examples I was using orders/receipts/invoices
from real code in development. Code previously written against
SQL-RDBMSs and being re-writing against CouchDB. The latter is proving
to be a much more pleasant experience!)
I've seen many SQL-RDBMS developers try to achieve a similar end-result
using object-relational mapping. You know, so when they create an
instance of a user class in say Java, a new user is automatically added
the users table in the db and corresponding entries are made in the
permissions table. I've never agreed with that way of doing things but
it *does* illustrate that there is a disconnect between how developers
want to organise data and how SQL-RDBMSs want them organise data.
if your only interest in the data is that the "outer" or container
object encapsulates the items.
This isn't true. At least not with the class of kv-database I'm
referring to. CouchDB (and I assume Mnesia) allows you to access the
'inner' objects (to whatever depth) inside your queries. You can very
easily access the items inside an invoice.
Typically you want to read or write the whole thing in one operation.
In real life, (and in my experience), you will pretty soon find that
someone wants to know how many orders during the last 60 days included
item X with quantities larger than say 6.
Easy. I do that sort of thing everywhere (no accounting tool would be
very helpful if it couldn't do those kind of complex queries).
I don't know about query engines for other kv-dbs like Mnesia, but
CouchDB handles that sort of thing very well. Just create a view that
gathers the inner data according to the criteria you are looking for
(map step) then do the necessary tallying (reduce step).
If your design decision was to store this whole thing (the order and
its items) as one big document (my term for it), the only way to
retrieve this data is to literally open up every order, filter out the
items you want and aggregate them. The only way to make this fast is to
avoid reading every single document and processing it over and over.
Couch is very smart at ensuring that any lengthy read/map-reduce
operations are cached in an internal data structure (B-tree I think) and
tries it's best to keep the cache update-to-date. So you normally don't
have to worry too much about the whole vs partial read/write operations
-- you just get on with business of writing your application using
whatever structures make sense to you. But if the query is one that is
rarely executed and lengthly, you can run it at startup so the next time
it runs it's faster.
This is one of the things I mean when I say we have to make clear the
class of kv-store that the original poster should look into. An RDBMS
user looking for a suitable alternative isn't looking for just a plain
kv-store. You're normally looking for something that has a nice well
thought-out query engine on top and has figured a lot of these issues
out. The Couch team have done an *excellent* job here.
And to do this optimisation, you need an index, or more likely several
indices - on dates, on items types, etc.
Correct. For couch's case, when you're creating a query (a view in couch
lingo) what you're essentially doing at the map step is asking couch to
create an index for you. Couch tries it's best to be very smart at how
it handles/optimises that index.
Indices require that you understand what is inside your "document" (in
this case line items).
Correct. You can look at couch's map step as providing the query engine
an understanding of the documents being stored, including any contained
or deeply hierarchical* data.
*YAY! I've _ALWAYS_ wanted this and tried desperately to simulate it in
SQL-RDBMSs with varying degrees of success.
By definition, this implies a relationship - orders have among other
things - lines. Completely independent of the fact that you are storing
the items inside the document/order.
Okay, I see what you're saying here. Indeed...
http://wiki.apache.org/couchdb/EntityRelationship
... you can view embedded data as related data. I hadn't thought of it
that way.
Let me make my argument clear by referring to what SQL-RDBMS folk tend
to think of as a relationship -- that is -- a constraint based on a
foreign-key. The kv-db equivalent of this would be a member of the
document that is merely a key pointing to another document rather than
meaningful data in it's own right. Now, using *that* definition...
For Couch's case, the map-reduce paradigm can be effectively used so
that you *DON'T HAVE TO CREATE RELATIONSHIPS (of the external key
variety) IF YOU DON'T WANT THEM*
This is my major argument against SQL-RDMBSs and in favor of
NoSQL/kv-dbs like Couch -- give me containers when I want containers and
give me relationships when I specifically want relationships.
I don't want my orders to *relate* to lines elsewhere, this is not how
my application looks at orders. I want orders to *contain* items. On the
other-hand, I want invoices to *relate* to orders. Only in the latter
case does it make sense for me to have and foreign-key/id reference
stored in the orders. I want data in my db to look like data in my
application.
The db should fit my application, my application shouldn't have to fit
the db. I always found I would invest considerable time with SQL-RDBMSs,
first trying to force the db to understand how I want my data organised,
then giving up and just adapting the code of my app to think like the db
-- in terms of tables and foreign keys.
I've found NoSQL dbs like Couch treat me more like a grown up. Give me
the freedom to define and enforce my own rules in a way that makes sense
to me not in some generic way the *supposedly* adapts to every
situation. Sure, the RDBMS style may adapt to a wide array of
situations, but from my experience, this is never without pain.
So as a summary from my side - all data has some sort of structure, be
it words within documents, or line items within orders. You can
represent this any way you want.
True, but with most SQL-RDBMSs, things can get particularly hard. Aside
from my relationship rant, a lot of data just doesn't want to be stored
in tables, and when you force it, exponential complexity arises.
In the distant past we wrote all the items on a single piece of paper
called an order. It was all on one physical page. The page contained
all the information. For the same reason it is difficult to query
pieces of paper (you need to either index them or summarise them in
another way), in the same way it is difficult to query data with
implied relations stored in a single "thing" (blob/object/values).
- It is very difficult to enforce invariance in KV stores
I agree. And constraints in general are painful.
- It is very difficult to index KV stores
This depends. For couch, map-reduce is a form of indexing -- a very
effective and surprisingly flexible one at that. Doors that are normally
closed in SQL-RDBMS world are suddenly opened wide in Couch.
- It is hard work to query KV stores.
I disagree. I haven't found a query for *my* data yet that I cannot
express in a couch view. I'd go further and say I've found it easier
than SQL because I can organise data in a way that makes more sense to
my application. This somehow always translates to easier query
construction for the application domain.
Perhaps it's a question of learning curve. SQL-RDBMS users are so used
to thinking in terms of SELECTS and JOINS, but that doesn't make SQL
easy! It just makes it easy once you know how. I think the same applies
to many kv-dbs.
- It is trivial to read from or write into KV stores
- It is hard to read from or write to database (drivers, SQL, ...)
- RDMBS systems are hard to scale
I agree with all.
- KV stores scale easily
Largely true. But sometimes it's not as easy as they promise. Some
things work well locally and break when you replicate. And you only
realise this when you try and scale out. There is some element of false
advertising when NoSQL folk promise simple scaling, there should be a
big red disclaimer attached.
- Edmond -
Rudolph van Graan
Post by unknown
Post by unknown
It appears to me that this discussion is another expression of the
'strong vs weak/dynamic vs static type' discussion.
...it makes me suspect that an imperative and strongly-typed language
paradigm has been a very strong motivator in the evolution of SQL
databases; and perhaps the popularity of NoSQL/NotSQL is an
expression/
outcome of the rise of recent trends in programming language uptake.
You *cannot* call the types in classic SQL "strong".
Numbers, strings, and byte strings for everything is what Joe is complaining
of and he is right to do so. Encoding something as a string or blob basically
results in the data base itself having no clue about the structure or meaning
of the data.
It is important to understand that SQL is not a good example of a relational
system. A move away from SQL *could* be a move towards relational!
such as
One of the core concepts in relational systems is that of
enforced integrity constraints, via primary keys and foreign keys.
If you don't have any integrity constraints that you care to tell the
data base about, you probably don't need a relational system.
________________________________________________________________
erlang-questions (at) erlang.org mailing list.
See http://www.erlang.org/faq.html
To unsubscribe; mailto:erlang-questions-unsubscribe
--
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
unknown
2010-11-09 23:23:50 UTC
Permalink
Post by unknown
Another observation...
I think a lot of it has to do with conceptual difference between SQL-RDBMS and NoSQL/kv-db/free-structure in terms of how both camps view the *role* of the database.
This is precisely how Date and Darwen characterise the difference
between the relational and OO data base camps:

Relational: Applications come, applications go. But if anything
goes wrong with the data, we're out of business.

Object: Applications are the centre of the universe.
Databases are just for persistent storage, with
maybe some indexing.

I'd point out that OO databases have been around for much longer than
the modern set of NoSQL or K-V systems. Gemstone is still around,
and Magma is pretty cool. Magma is really easy to use: objects in
the data base materialise in your workspace when you look at them,
modified objects get written back when you commit, and objects become
known to the data base by being reachable from the root object.

If what you need is persistent storage for a single application,
and you don't expect the data to outlive the application, a relational
database probably is not for you.

I've known a student write thousands of lines of Java code talking to
an SQL system in order to deal with data that could have been a simple
flat file or two with far better efficiency (all the data would have
fitted comfortably into memory). I've known another group of students
insist on using SQL for an assignment where the explict point of the
assignment was to develop an in-memory data structure (which for this
application outperformed SQL a thousand-fold).
unknown
2010-11-10 01:20:38 UTC
Permalink
On Wed, 10 Nov 2010 10:23:50 +1100, Richard O'Keefe <ok>
Post by unknown
If what you need is persistent storage for a single application,
and you don't expect the data to outlive the application, a relational
database probably is not for you.
Ditto!

It has taken me 12 years as an SQL-RDBMS user to realise that the problem
was I was using the wrong class of database the whole time! Silly, silly
me. Square peg, round hole. (actually I started suspecting this about 5
years ago.) I wonder how many more silly me's there are out there?

I think a major reason many assume the relational + sql solution as the
default answer is not because they need relational constraints per se, but
because they need complex querying for which it is assumed only SQL can
provide (and the best SQL dbs seem to be relational). My requirement list
was normally like this...

1. Persistent storage for my application that be _might_ later be shared
with other applications, but I'm free to dictate how.
2. Complex querying preferably with indexing.
3. Data unlikely to outlive my application (if it needs to, it can be
migrated).
4. Data won't fit in RAM (without hogging too much of it).
5. (NEW) Data can be replicated.

For a long time I assumed only SQL-RDBMSs could give me these. I've found,
like many, that I was wrong. In retrospect, a lot of it was
do-as-others-do I guess. As you point out OO dbs could have given me many
of these. Some other alternatives were there too. I discovered Btrieve
about five years back and really liked it.

I think what the NoSQL/kv-dbs like Couch are adding to their older OO
counterparts is 1 (HTTP APIs for other apps), 2, & 5.

It's interesting too to observe the increasing popularity of
object-relational mapping layers for popular SQL-RDBMSs. Essentially
trying to turn RDBMSs into something closer to OO dbs. Funny that! Round
and round we go!

- Edmond -
--
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
unknown
2010-11-10 08:31:38 UTC
Permalink
Ooops, big, big correction. My 1st requirement should have been...

1. ACIDic transactional concurrent persistent storage for my application
that be _might_ later be shared with other applications, but I'm free to
dictate how.

Very important distinction.

- Edmond -

On Wed, 10 Nov 2010 12:20:38 +1100, Edmond Begumisa
Post by unknown
On Wed, 10 Nov 2010 10:23:50 +1100, Richard O'Keefe <ok>
Post by unknown
If what you need is persistent storage for a single application,
and you don't expect the data to outlive the application, a relational
database probably is not for you.
Ditto!
It has taken me 12 years as an SQL-RDBMS user to realise that the
problem was I was using the wrong class of database the whole time!
Silly, silly me. Square peg, round hole. (actually I started suspecting
this about 5 years ago.) I wonder how many more silly me's there are out
there?
I think a major reason many assume the relational + sql solution as the
default answer is not because they need relational constraints per se,
but because they need complex querying for which it is assumed only SQL
can provide (and the best SQL dbs seem to be relational). My requirement
list was normally like this...
1. Persistent storage for my application that be _might_ later be shared
with other applications, but I'm free to dictate how.
2. Complex querying preferably with indexing.
3. Data unlikely to outlive my application (if it needs to, it can be
migrated).
4. Data won't fit in RAM (without hogging too much of it).
5. (NEW) Data can be replicated.
For a long time I assumed only SQL-RDBMSs could give me these. I've
found, like many, that I was wrong. In retrospect, a lot of it was
do-as-others-do I guess. As you point out OO dbs could have given me
many of these. Some other alternatives were there too. I discovered
Btrieve about five years back and really liked it.
I think what the NoSQL/kv-dbs like Couch are adding to their older OO
counterparts is 1 (HTTP APIs for other apps), 2, & 5.
It's interesting too to observe the increasing popularity of
object-relational mapping layers for popular SQL-RDBMSs. Essentially
trying to turn RDBMSs into something closer to OO dbs. Funny that! Round
and round we go!
- Edmond -
--
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
unknown
2010-11-12 03:21:22 UTC
Permalink
On Tue, Nov 9, 2010 at 7:08 PM, Edmond Begumisa
Post by unknown
Post by unknown
- KV stores scale easily
Largely true. But sometimes it's not as easy as they promise. Some things
work well locally and break when you replicate. And you only realise this
when you try and scale out. There is some element of false advertising when
NoSQL folk promise simple scaling, there should be a big red disclaimer
attached.
That's quite a revelation, for someone (me) whose exposure to NoSQL
(KV stores) is purely theoretical so far. To me that promise of
"scaling easily" was the single biggest motivation, i.e. scaling out
in a large distributed cluster environment. Could you elaborate on
that point, and share some instances, scenarios, examples of what may
be the scaling pitfalls ?
unknown
2010-11-12 16:29:32 UTC
Permalink
Post by unknown
That's quite a revelation, for someone (me) whose exposure to NoSQL
(KV stores) is purely theoretical so far. To me that promise of
"scaling easily" was the single biggest motivation, i.e. scaling out
in a large distributed cluster environment.
A more accurate description would be "scale easIER" than other camps as
opposed to "scale easILY". This is not just semantics. Scaling databases
is an inherently difficult problem. The NoSQL camp haven't just discovered
some new miracle cure.

Scalability can can be made easier or harder by the design decisions of
_BOTH_ the implementor and user of the db (and there are always trade
offs.) It cannot be made easy (with no trade offs.)
Post by unknown
Could you elaborate on
that point, and share some instances, scenarios, examples of what may
be the scaling pitfalls ?
My issue with the promise of easy scaling is that it is usually worded to
imply that you can take an application that is *designed* for local data
access to a NoSQL db, sprinkle some NoSQL scalability pixie-dust on it,
and suddenly it will run on google's world-wide network. There's usually
no consideration given to the fact that scalability needs to be designed
into the application *itself* from the word go, not just be made available
by the db and there are trade offs involved.

To elaborate...

----
Caveat: The following is based on *my* experience as an SQL-RDBMS ->
NoSQL/kv-db migrant who has tried a number of dbs in both camps. I'll
deliberately not name names here to avoid flames. You can easily
investigate further and imply which stores fall where (according to lay
me). I speak for myself as a db user and make the rather large assumption
that I *may* be speaking for many others like me. I am NOT an expert in
distrusted systems.
----

My understanding of scaling in kv-stores/dbs/document-stores (I don't even
know what to call these anymore) is rooted in three common
requirements/wishes that an RDBMS migrant goes to bed dreaming about and
decides to make the NoSQL boat-trip to the promised land. Usually, you
look into scaling out when you want either...

(A) DISTRIBUTION (don't know if this is the correct term, table-splitting
perhaps)

What you want: The same class of documents/values stored but not
duplicated across several databases (usually on different machines in the
same location). e.g I've got a gazillion orders and they won't fit on one
machine or render the database very ineffective when querying them, so I
want them distributed round-robin or some other way using a cluster of
dbs/machines.

What you expect: When you do this, your application doesn't need to know.
When updating, the db will decide where things should go (say, based on
some configuration setting). Queries just get executed across the
different databases and the result is given to you. Your code doesn't have
to change (I think Mnesia can do this with fragmented tables?)

What you get: Very few give you the full Mnesia style. Some give you none.
Some give you more. Many CANNOT provide this 'table-splitting' feature AT
ALL and instead offer clone replication (below) which is different. Worst
case, you have to do extra work to expressly decide where to save data and
run the queries across the different databases then manually combine the
results. Your code normally has to change. Your code normally gets more
complicated.

(B) REPLICATION

What you want: the same class of documents/values stored and
duplicated+synced across several databases (usually on different machines
in different locations) e.g I've got several branches of my business and I
want a receipt that appears at my Melbourne branch to appear in my Sydney
head office so I can export it to the central general ledger.

What you don't expect: That you are ignorant on how the CAP Theorem and
strong vs eventual consistency will impact on how you write your code.

What you get: Code that's written in absence of the informed design
decisions required to make eventual consistency work for you. This code
has to be re-thought and re-written when you finally decide to replicate.
This could be very large chunks of your application especially for someone
coming from the SQL-RDBMS world who is used to ACID/release consistency
and awakens to the fact that several assumptions in your application
suddenly break when one moves to BASE/eventual consistency. ACID(ic) local
+ BASE(ic) remote = big an impact on your application design+coding. You
can't just paste this into your code later.

(C) MORE BEEF

What you want: queries take forever, your server is way too stressed.

What you expect: Throw more hardware at the problem (more cores, memory,
etc). The more you throw the better things get.

What you get: This seems to be the one area where NoSQL dbs consistently
deliver. Some exhibit almost linear improvement. I haven't scientifically
benchmarked this myself, but I have noticed with rudimentary tests using
different machines I have at my disposal. I understand though that you do
arrive to a saturation point where the law of diminishing returns sets in
(so I've read), someone else would be better at explaining why. That is
something people neglect to mention though.

IN THEIR DEFENSE:

Expectations are probably not the fault of the NoSQL database. It's just
something that one is led into believing by the NoSQL chatter on the
internet. That things just magically scale with little effort. That all
the well-documented and well-researched problems with distrusted
programming/storage are just "somehow taken care of" by the brilliance of
NoSQL. The interpretation is that the application programmer doesn't need
to do anything. Doesn't need rethink local vs distributed/replicated data
for his application. I've found this to be misleading.

It's certainly less effort than the alternatives, and certainly works
*far* *far* better than alternatives once you get it working, BUT effort
and planning *is* required. These things are probably obvious for someone
used to working in that world so are rarely mentioned.

There's no scalability check-box that you just flip. Things have to be
well thought out and consequences have to be planned for and code written
in awareness and many times even to implement some aspects. SQL-RDBMS
migrants are rarely prepared for this. They are used to the db making all
the decisions for them. "Dig in then just scale up later when you need to"
is unlikely work.

Light reading up on elementary CAP theorem, eventual consistency and the
associated trade-offs is essential for an SQL-RDBMS -> NoSQL migrant who
is unlikely to have even considered/needed to consider these things in the
past. These will greatly affect how you do your part and harmonise it with
how the db in question does it's part to get the best (but _never_ ideal)
situation.

THE DISCLAIMER:

Should read: "EasIER scaling BUT the onus is on you to ensure you
understand the impact of the CAP theorem and eventual consistency many
have and investigate any extra work that may be required for the database
in question BEFORE you design let alone code your application".

For the NoSQL camp it might be be like "Duh!" But IMO, these things are
not obvious for the RDBMS migrant who has been promised easy scaling from
a bullet list of features. This renders scalability not easy, just more
achievable, with trade-offs.

- Edmond -
--
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/



On Fri, 12 Nov 2010 14:21:22 +1100, Icarus Alive <icarus.alive>
Post by unknown
On Tue, Nov 9, 2010 at 7:08 PM, Edmond Begumisa
Post by unknown
Post by unknown
- KV stores scale easily
Largely true. But sometimes it's not as easy as they promise. Some things
work well locally and break when you replicate. And you only realise this
when you try and scale out. There is some element of false advertising when
NoSQL folk promise simple scaling, there should be a big red disclaimer
attached.
That's quite a revelation, for someone (me) whose exposure to NoSQL
(KV stores) is purely theoretical so far. To me that promise of
"scaling easily" was the single biggest motivation, i.e. scaling out
in a large distributed cluster environment. Could you elaborate on
that point, and share some instances, scenarios, examples of what may
be the scaling pitfalls ?
________________________________________________________________
erlang-questions (at) erlang.org mailing list.
See http://www.erlang.org/faq.html
To unsubscribe; mailto:erlang-questions-unsubscribe
--
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
unknown
2010-11-09 21:15:02 UTC
Permalink
Sorry, I missed this bit...
Post by unknown
In the distant past we wrote all the items on a single piece of paper
called an order. It was all on one physical page. The page contained all
the information.
I'm not advocating for moving towards data models that are more physical.
I just want data models that are closer to my application's world-view. It
just so happens that in this case, the model that make sense to my
application is also close to the physical. Someone else with a different
coding style might write the very same application completely differently.
He should then have more freedom than SQL-RDBMSs typically gives for
adapting the database to speak his language.
Post by unknown
For the same reason it is difficult to query pieces of paper (you need
to either index them or summarise them in another way), in the same way
it is difficult to query data with implied relations stored in a single
"thing" (blob/object/values).
I really don't see the difference. Whether you query line items that are
inside the document or outside the document the human work is the same.
The data still has to be pulled out (when it's inside the document you do
this later when you need it, when it's outside you do this before-hand
EVEN THOUGH YOU MIGHT NOT NEED IT). Either way you have to write that bit
only once. And either way it probalby needs to be indexed.

My experience has been this: when I'm given the option of implied
relations as you put them (I call these containers), I find my database
design greatly simplified firstly because a good chunk of relationships
devoted to containers are erased, and secondly because this is a concept I
find I use a lot within my applications.

Using the order example, from *my application view*, it pulls the document
from the db and displays an order on the screen, lines items are
added/edited are edited. The save button is pressed, I then write this to
the db in one big doc as you describe it to the db. This happens to fit
very nicely in the way the app is designed. I want it that way in the db
*by design*

With the SQL-RDBMS two-table solution to this, I'd have to force my code
to think of this in two steps (when the address is edited thats one table,
when a line is added thats another table, grrrrr!) I've gotten around this
in the past using various techniques, hierarchical data modeling being the
most advanced -- a structure that is aware that lines belong to the order
and can have various levels of depth but that has always been a lot of
work.

The second stage, querying. My user wants all orders for the X-Box 360
before July 30. No problem, I'd have a couch view for this sort of thing.
Yes, sure the view essentially goes through all the orders but...

1) this is exactly what the my app needs (it is what the user has asked
for)
2) Couch is smart about how it caches this
3) a SQL-RDBMS would essentially do the same thing (it would just walk
though several tables instead and cache/index)
4) Couch would give me more control over the indexing

But here's the thing. You *don't* have to do it that way. That's just the
way I usually approach it. I might be a terrible way. But it's *my*
terrible way -- DB design constraints haven't forced things on me. With
SQL-RDBMSs I always felt I was being nudged -- being told what was good
for me. I never felt complete freedom.

And I might change my mind when I find orders contain thousands of lines
and do it some other way, maybe the equivalent to the two-table RDBMS
solution. I just find the NoSQL/kv-db more flexible in allowing the
application developer to choose how he/she wants to do things. If I want
to hang myself give me rope. That's my business. You're just the db. Don't
be too smart.

- Edmond -
--
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
unknown
2010-11-09 11:23:33 UTC
Permalink
On Tue, 09 Nov 2010 12:59:11 +1100, Richard O'Keefe <ok>
Post by unknown
It is important to understand that SQL is not a good example of a relational
system. A move away from SQL *could* be a move towards relational!
I see what you mean. However, most RDBMSs today (if there is such a thing)
are accessed via SQL interfaces. So there is a tendency to use the terms
SQL and RDBMS (wrongly) interchangeably. I think SQL-RDBMS is the best
description of this class of tabular DB that seems to dominate the sector.
Optional integrity-checks/primary/foreign-keys with an SQL interface on
top.
Post by unknown
One of the core concepts in relational systems is that of
enforced integrity constraints, via primary keys and foreign keys.
If you don't have any integrity constraints that you care to tell the
data base about, you probably don't need a relational system.
I agree. In fact, this has been one of my main issues with SQL-RDBMSs:
many times one is forced to create 1:N, 1:1, N:N relationships (using
integrity constraints, primary keys, foreign keys) or simulate them with
JOINS, to do things that are not really modeling related data from the
eyes of the application (relationship bloat). The concept is stretched too
far.

The nice thing about this increasingly popular other class of database
(Couch/Mnesia/Mongo), is that you have more control over this bloat and
can choose when and how you want relationships. This normally comes at the
cost of not having enforced constraints. In my view, the trade off is
usually worth it.

- Edmond -
--
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
unknown
2010-11-07 18:15:05 UTC
Permalink
Hello Rudolph,

Small comment. I think one has to be careful not to suggest that blobs in
typical RDBMSs and values in typical kv-stores are similar. I've heard
that comparison being made before and forgive me, but I just don't see it
(if that is indeed what what you were suggesting.)

In SQL-RDBMSs, blob fields are normally opaque to the db. Storing blobs is
the exception rather than the norm so querying against the content of
these is usually poorly catered for (if catered for at all) by the query
engine. The result is you normally can't use blobs to store anything
complex because querying against them is a pain (i.e you'd only store
things in a blob that you don't intend on using in the criteria of an SQL
statement.)

Where as for (I think most) kv-stores, the 'values' are not opaque.
Storying complex data in the values is the norm, so most kv-stores are
aware of the structural content of those values (e.g. JSON in CouchDB,
Erlang-Terms in Mnesia.) The result is that kv-store query-engines have a
mechanism for querying against the complex values being stored (views for
Couch, qlc for Mnesia).

I would think this renders the use-cases for blobs in RDBM's and values in
kv-stores in-comparable. For couch's case, the equivalent to a blob would
be more like an attachment. For Mnesia, a blob would be closer to a binary
member of a tuple/record.
The issue is that the information in the "blob" is not useful. Neither
stored in the RDBMS, nor in a plain KV-store.
That said, I don't really a agree with this statement. In a typical
kv-store the 'value' is useful because the kv-store query engine would
understand how to parse it. A typical SQL query-engine would not.

I'm no Mnesia user, but I would think that you could easily query against
parts of an Erlang-term parse tree stored in the database. To my
knowledge, the only way to get the equivalent in an SQL-database would be
to either...

a) Spread the parse tree across different tables/records (hierarchical
data modeling difficulty with tables non-withstanding)
b) Use an atypical SQL database like Postgres/Oracle that supports
complex/user-defined types
c) Cheat -- write some sort of plugin that you can from a stored-procedure

- Edmond -

--
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/


On Sun, 07 Nov 2010 23:09:04 +1100, Rudolph van Graan
Hi Joe,
I am not sure I agree with your statement that traditional databases
"suffer" from the fact that columns have simple types. You can easily
(within reason) make them store complex types. They are at the most
basic level also KV stores. The most basic abstraction for a "Table" is
a set of tuples stored in a certain way. There is no reason why you
cannot define a table with an arbitrary key and an arbitrary "blob" as
value column and then storing your parse tree in that blob field.
(Obviously you need to serialise/deserialise it - as you would need to
do for any storage abstraction).
The issue is that the information in the "blob" is not useful. Neither
stored in the RDBMS, nor in a plain KV-store. You have to make it useful
before you can do something with it or extract information from it. I
Post by unknown
I'm not thinking in terms of joins and normalizing things - the thought process
is different - so far I haven't met any problems that don't map onto key-values
queries.
I agree with you. If you throw distribution, map-reduce and lots of
indices into the equation you can solve most issues this way. Except for
one thing - invariance between multiple records and record types. There
are very few examples where it is useful to live without invariance -
searching being one example.
Invariance is a first order concept in any RDBMS system and essential
for many types of systems. However, in my experience, you can achieve
application-level invariance in other ways. But it requires
substantially more work to approximate first order invariance using
KV-stores. It is sometimes simpler to have your data properly
normalised I think.
Rudolph van Graan
Post by unknown
Post by unknown
This is a message I sent to the nosql-discussion
discussion group. I thought it might be interesting to send it
erlang-questions, so here we go...
Hi all!
I have used SQL RDBMSs for some time. I've never used any very advanced
feature, but I know enough of it to make database applications.
Nowadays, I decided it would be interesting to learn some NoSQL
databases concepts. So I decided to pick up some Erlang and Mnesia, its
native key-value database. More than scalability itself, the most
valuable feature for me is the possibility of replication and
synchronization between nodes.
But all pros have cons as companion. The lack of a relationship model
* Is there any guide, tutorial, book, whatever, that tries to introduce
NoSQL databases to SQL users?
* Key-value databases are surprising simple. I know you solve
relationship by denormalizing data. What data should be normalized?
What shouldn't? How do you update denormalized data?
I'm no database expert so don't quote me here ...
As far as I am concerned traditional databases like SQL suffer
from the fact that the data stored in an individual column is incredible
simple - I can store an integer/string/... in a column but these are
incredibly simple data structures. I want to store and retrieve incredibly
complicated things - how do I store an XML parse tree in a single cell
of a database? - How do I store a database in a database ...
In a decent K-V database the value of the key can be *anything* - an
entire database for example, a compiler, ... no worries
Then when I analyse my problem I start thinking "I can store and retrieve any
complex object that keys do I need to solve my problem?"
I'm not thinking in terms of joins and normalizing things - the thought process
is different - so far I haven't met any problems that don't map onto key-values
queries.
It seems to my that SQL provides you with the ability do to complex
queries on simple things. K-V dbs can do simple queries on complex
things.
/Joe
Post by unknown
Sorry for such simple and general questions. Things were simple up to
the moment that I realized that it would be easily solved with a JOIN
SQL statement. :-)
Thank you very much!
--
Silas Silva
________________________________________________________________
erlang-questions (at) erlang.org mailing list.
See http://www.erlang.org/faq.html
To unsubscribe; mailto:erlang-questions-unsubscribe
________________________________________________________________
erlang-questions (at) erlang.org mailing list.
See http://www.erlang.org/faq.html
To unsubscribe; mailto:erlang-questions-unsubscribe
--
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
unknown
2010-11-07 21:09:44 UTC
Permalink
Hi,

Edmond, you talk about datastores that can peek inside the values
instead of regarding them as opaque binaries.
But are those datastores really kv-stores?
I would claim that, by definition, kv-stores can only do get, put,
delete and maybe some batch insertions and retrievals, but not more.
Couchdb and mnesia should probably not be named kv-stores, but rather
something more descriptive of their capabilities.


Also, no-sql is not the same as key-value.
Part of the problem is that no-sql is too loose and broad a category.

Morten.


inside th eI agree with your points about data stores that can look
inside the values
Post by unknown
Hello Rudolph,
Small comment. I think one has to be careful not to suggest that blobs
in typical RDBMSs and values in typical kv-stores are similar. I've
heard that comparison being made before and forgive me, but I just
don't see it (if that is indeed what what you were suggesting.)
In SQL-RDBMSs, blob fields are normally opaque to the db. Storing
blobs is the exception rather than the norm so querying against the
content of these is usually poorly catered for (if catered for at all)
by the query engine. The result is you normally can't use blobs to
store anything complex because querying against them is a pain (i.e
you'd only store things in a blob that you don't intend on using in
the criteria of an SQL statement.)
Where as for (I think most) kv-stores, the 'values' are not opaque.
Storying complex data in the values is the norm, so most kv-stores are
aware of the structural content of those values (e.g. JSON in CouchDB,
Erlang-Terms in Mnesia.) The result is that kv-store query-engines
have a mechanism for querying against the complex values being stored
(views for Couch, qlc for Mnesia).
I would think this renders the use-cases for blobs in RDBM's and
values in kv-stores in-comparable. For couch's case, the equivalent to
a blob would be more like an attachment. For Mnesia, a blob would be
closer to a binary member of a tuple/record.
The issue is that the information in the "blob" is not useful.
Neither stored in the RDBMS, nor in a plain KV-store.
That said, I don't really a agree with this statement. In a typical
kv-store the 'value' is useful because the kv-store query engine would
understand how to parse it. A typical SQL query-engine would not.
I'm no Mnesia user, but I would think that you could easily query
against parts of an Erlang-term parse tree stored in the database. To
my knowledge, the only way to get the equivalent in an SQL-database
would be to either...
a) Spread the parse tree across different tables/records (hierarchical
data modeling difficulty with tables non-withstanding)
b) Use an atypical SQL database like Postgres/Oracle that supports
complex/user-defined types
c) Cheat -- write some sort of plugin that you can from a
stored-procedure
- Edmond -
--
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
On Sun, 07 Nov 2010 23:09:04 +1100, Rudolph van Graan
Hi Joe,
I am not sure I agree with your statement that traditional databases
"suffer" from the fact that columns have simple types. You can easily
(within reason) make them store complex types. They are at the most
basic level also KV stores. The most basic abstraction for a "Table"
is a set of tuples stored in a certain way. There is no reason why
you cannot define a table with an arbitrary key and an arbitrary
"blob" as value column and then storing your parse tree in that blob
field. (Obviously you need to serialise/deserialise it - as you would
need to do for any storage abstraction).
The issue is that the information in the "blob" is not useful.
Neither stored in the RDBMS, nor in a plain KV-store. You have to
make it useful before you can do something with it or extract
Post by unknown
I'm not thinking in terms of joins and normalizing things - the thought process
is different - so far I haven't met any problems that don't map onto key-values
queries.
I agree with you. If you throw distribution, map-reduce and lots of
indices into the equation you can solve most issues this way. Except
for one thing - invariance between multiple records and record types.
There are very few examples where it is useful to live without
invariance - searching being one example.
Invariance is a first order concept in any RDBMS system and essential
for many types of systems. However, in my experience, you can achieve
application-level invariance in other ways. But it requires
substantially more work to approximate first order invariance using
KV-stores. It is sometimes simpler to have your data properly
normalised I think.
Rudolph van Graan
Post by unknown
Post by unknown
This is a message I sent to the nosql-discussion
discussion group. I thought it might be interesting to send it
erlang-questions, so here we go...
Hi all!
I have used SQL RDBMSs for some time. I've never used any very advanced
feature, but I know enough of it to make database applications.
Nowadays, I decided it would be interesting to learn some NoSQL
databases concepts. So I decided to pick up some Erlang and Mnesia, its
native key-value database. More than scalability itself, the most
valuable feature for me is the possibility of replication and
synchronization between nodes.
But all pros have cons as companion. The lack of a relationship model
* Is there any guide, tutorial, book, whatever, that tries to introduce
NoSQL databases to SQL users?
* Key-value databases are surprising simple. I know you solve
relationship by denormalizing data. What data should be normalized?
What shouldn't? How do you update denormalized data?
I'm no database expert so don't quote me here ...
As far as I am concerned traditional databases like SQL suffer
from the fact that the data stored in an individual column is incredible
simple - I can store an integer/string/... in a column but these are
incredibly simple data structures. I want to store and retrieve incredibly
complicated things - how do I store an XML parse tree in a single cell
of a database? - How do I store a database in a database ...
In a decent K-V database the value of the key can be *anything* - an
entire database for example, a compiler, ... no worries
Then when I analyse my problem I start thinking "I can store and retrieve any
complex object that keys do I need to solve my problem?"
I'm not thinking in terms of joins and normalizing things - the thought process
is different - so far I haven't met any problems that don't map onto key-values
queries.
It seems to my that SQL provides you with the ability do to complex
queries on simple things. K-V dbs can do simple queries on complex
things.
/Joe
Post by unknown
Sorry for such simple and general questions. Things were simple up to
the moment that I realized that it would be easily solved with a JOIN
SQL statement. :-)
Thank you very much!
--
Silas Silva
________________________________________________________________
erlang-questions (at) erlang.org mailing list.
See http://www.erlang.org/faq.html
To unsubscribe; mailto:erlang-questions-unsubscribe
________________________________________________________________
erlang-questions (at) erlang.org mailing list.
See http://www.erlang.org/faq.html
To unsubscribe; mailto:erlang-questions-unsubscribe
unknown
2010-11-08 07:13:04 UTC
Permalink
I should have probably used the term "key-value database" as the original
poster did to better home in on the category. Specifically, that class of
key-based NoSQL databases that you would list when starting a new project
and looking for an alternative to RDBMSs which is what I assumed the
thread was about.

Mnesia, CouchDB, MongoDB, Cassandra, etc, all have query engines that are
aware of how to access the structures being stored. When querying, you can
go far beyond just "give me the value with this key". This would be too
tedious for the kind of datasets used in RDBMSs that you're looking to
store in NoSQL databases instead. So typically, these kv-databases allow
you to construct queries that can access the contents of the structures
being stored, in-fact this is the norm. You then hand the query over to
the query engine and say "fetch me this data with this criteria".

- Edmond -
Post by unknown
Hi,
Edmond, you talk about datastores that can peek inside the values
instead of regarding them as opaque binaries.
But are those datastores really kv-stores?
I would claim that, by definition, kv-stores can only do get, put,
delete and maybe some batch insertions and retrievals, but not more.
Couchdb and mnesia should probably not be named kv-stores, but rather
something more descriptive of their capabilities.
Also, no-sql is not the same as key-value.
Part of the problem is that no-sql is too loose and broad a category.
Morten.
inside th eI agree with your points about data stores that can look
inside the values
Post by unknown
Hello Rudolph,
Small comment. I think one has to be careful not to suggest that blobs
in typical RDBMSs and values in typical kv-stores are similar. I've
heard that comparison being made before and forgive me, but I just
don't see it (if that is indeed what what you were suggesting.)
In SQL-RDBMSs, blob fields are normally opaque to the db. Storing blobs
is the exception rather than the norm so querying against the content
of these is usually poorly catered for (if catered for at all) by the
query engine. The result is you normally can't use blobs to store
anything complex because querying against them is a pain (i.e you'd
only store things in a blob that you don't intend on using in the
criteria of an SQL statement.)
Where as for (I think most) kv-stores, the 'values' are not opaque.
Storying complex data in the values is the norm, so most kv-stores are
aware of the structural content of those values (e.g. JSON in CouchDB,
Erlang-Terms in Mnesia.) The result is that kv-store query-engines have
a mechanism for querying against the complex values being stored (views
for Couch, qlc for Mnesia).
I would think this renders the use-cases for blobs in RDBM's and values
in kv-stores in-comparable. For couch's case, the equivalent to a blob
would be more like an attachment. For Mnesia, a blob would be closer to
a binary member of a tuple/record.
The issue is that the information in the "blob" is not useful. Neither
stored in the RDBMS, nor in a plain KV-store.
That said, I don't really a agree with this statement. In a typical
kv-store the 'value' is useful because the kv-store query engine would
understand how to parse it. A typical SQL query-engine would not.
I'm no Mnesia user, but I would think that you could easily query
against parts of an Erlang-term parse tree stored in the database. To
my knowledge, the only way to get the equivalent in an SQL-database
would be to either...
a) Spread the parse tree across different tables/records (hierarchical
data modeling difficulty with tables non-withstanding)
b) Use an atypical SQL database like Postgres/Oracle that supports
complex/user-defined types
c) Cheat -- write some sort of plugin that you can from a
stored-procedure
- Edmond -
-- Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
On Sun, 07 Nov 2010 23:09:04 +1100, Rudolph van Graan
Hi Joe,
I am not sure I agree with your statement that traditional databases
"suffer" from the fact that columns have simple types. You can easily
(within reason) make them store complex types. They are at the most
basic level also KV stores. The most basic abstraction for a "Table"
is a set of tuples stored in a certain way. There is no reason why you
cannot define a table with an arbitrary key and an arbitrary "blob" as
value column and then storing your parse tree in that blob field.
(Obviously you need to serialise/deserialise it - as you would need to
do for any storage abstraction).
The issue is that the information in the "blob" is not useful. Neither
stored in the RDBMS, nor in a plain KV-store. You have to make it
useful before you can do something with it or extract information from
Post by unknown
I'm not thinking in terms of joins and normalizing things - the thought process
is different - so far I haven't met any problems that don't map onto key-values
queries.
I agree with you. If you throw distribution, map-reduce and lots of
indices into the equation you can solve most issues this way. Except
for one thing - invariance between multiple records and record types.
There are very few examples where it is useful to live without
invariance - searching being one example.
Invariance is a first order concept in any RDBMS system and essential
for many types of systems. However, in my experience, you can achieve
application-level invariance in other ways. But it requires
substantially more work to approximate first order invariance using
KV-stores. It is sometimes simpler to have your data properly
normalised I think.
Rudolph van Graan
Post by unknown
Post by unknown
This is a message I sent to the nosql-discussion
discussion group. I thought it might be interesting to send it
erlang-questions, so here we go...
Hi all!
I have used SQL RDBMSs for some time. I've never used any very advanced
feature, but I know enough of it to make database applications.
Nowadays, I decided it would be interesting to learn some NoSQL
databases concepts. So I decided to pick up some Erlang and Mnesia, its
native key-value database. More than scalability itself, the most
valuable feature for me is the possibility of replication and
synchronization between nodes.
But all pros have cons as companion. The lack of a relationship model
* Is there any guide, tutorial, book, whatever, that tries to introduce
NoSQL databases to SQL users?
* Key-value databases are surprising simple. I know you solve
relationship by denormalizing data. What data should be normalized?
What shouldn't? How do you update denormalized data?
I'm no database expert so don't quote me here ...
As far as I am concerned traditional databases like SQL suffer
from the fact that the data stored in an individual column is incredible
simple - I can store an integer/string/... in a column but these are
incredibly simple data structures. I want to store and retrieve incredibly
complicated things - how do I store an XML parse tree in a single cell
of a database? - How do I store a database in a database ...
In a decent K-V database the value of the key can be *anything* - an
entire database for example, a compiler, ... no worries
Then when I analyse my problem I start thinking "I can store and retrieve any
complex object that keys do I need to solve my problem?"
I'm not thinking in terms of joins and normalizing things - the thought process
is different - so far I haven't met any problems that don't map onto key-values
queries.
It seems to my that SQL provides you with the ability do to complex
queries on simple things. K-V dbs can do simple queries on complex
things.
/Joe
Post by unknown
Sorry for such simple and general questions. Things were simple up to
the moment that I realized that it would be easily solved with a JOIN
SQL statement. :-)
Thank you very much!
-- Silas Silva
________________________________________________________________
erlang-questions (at) erlang.org mailing list.
See http://www.erlang.org/faq.html
To unsubscribe; mailto:erlang-questions-unsubscribe
________________________________________________________________
erlang-questions (at) erlang.org mailing list.
See http://www.erlang.org/faq.html
To unsubscribe; mailto:erlang-questions-unsubscribe
________________________________________________________________
erlang-questions (at) erlang.org mailing list.
See http://www.erlang.org/faq.html
To unsubscribe; mailto:erlang-questions-unsubscribe
--
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
unknown
2010-11-07 17:16:43 UTC
Permalink
Java's James Gosling seems to share your views...

"... I've never got it when it comes to SQL databases. It's like, why?
Just give me a hash table and a sh*tload of RAM and I'm happy. And then
you do something to deal with failures. And you look at the way things
like the NoSQL movement is. It's various flavors of large scale
distributed hash tables and trying to deal with massive scale and massive
replication, and you can't back up the database because no tape farm is
big enough. And you find scale and reliability can fit together at the
same time..."

http://www.basementcoders.com/transcripts/James_Gosling_Transcript.html

Funny, never thought two co-inventors of two different languages would
actually agree on something :)

The main reason I've switched to NoSQL (CouchDB) after many years of using
SQL-RDBMSs is for modeling hierarchical data. RDBMSs force you to do this
with relationships across different tables even when the data belongs to
the same domain/object -- this feels unnatural and results in the DB
equivalent of spaghetti code with hours staring at relationship diagrams.

Take the simple example of orders that have items. In a typical SQL-RDBMS,
this would normally be modeled using two tables related to each other in
1:many orders->items. A kv-store like Couch allows me to store this in a
more sane manner, just a bunch of orders, each with items embedded inside
them. But where kv-stores really shine is when your data has uneven levels
of hierarchy within the same types of objects. I've always struggled with
this in RDBMSs.

I think a big reason kv-stores are winning over a lot of us long-time
RDBMSs users is they allow us to model things-that-have-things-inside-them
in the database much closer to how they are modeled in our applications.
Orders/receipts/invoices with their items, users with their permissions,
all these map nicely in kv-stores from db to application. This allows us
to model relationships only when WE REALLY WANT RELATIONSHIPS (this
receipt belongs to that invoice). That fact alone won me over and I've
never looked back.


- Edmond -
Post by unknown
Post by unknown
This is a message I sent to the nosql-discussion
discussion group. I thought it might be interesting to send it
erlang-questions, so here we go...
Hi all!
I have used SQL RDBMSs for some time. I've never used any very advanced
feature, but I know enough of it to make database applications.
Nowadays, I decided it would be interesting to learn some NoSQL
databases concepts. So I decided to pick up some Erlang and Mnesia, its
native key-value database. More than scalability itself, the most
valuable feature for me is the possibility of replication and
synchronization between nodes.
But all pros have cons as companion. The lack of a relationship model
* Is there any guide, tutorial, book, whatever, that tries to introduce
NoSQL databases to SQL users?
* Key-value databases are surprising simple. I know you solve
relationship by denormalizing data. What data should be normalized?
What shouldn't? How do you update denormalized data?
I'm no database expert so don't quote me here ...
As far as I am concerned traditional databases like SQL suffer
from the fact that the data stored in an individual column is incredible
simple - I can store an integer/string/... in a column but these are
incredibly simple data structures. I want to store and retrieve incredibly
complicated things - how do I store an XML parse tree in a single cell
of a database? - How do I store a database in a database ...
In a decent K-V database the value of the key can be *anything* - an
entire database for example, a compiler, ... no worries
Then when I analyse my problem I start thinking "I can store and retrieve any
complex object that keys do I need to solve my problem?"
I'm not thinking in terms of joins and normalizing things - the thought process
is different - so far I haven't met any problems that don't map onto key-values
queries.
It seems to my that SQL provides you with the ability do to complex
queries on simple things. K-V dbs can do simple queries on complex
things.
/Joe
Post by unknown
Sorry for such simple and general questions. Things were simple up to
the moment that I realized that it would be easily solved with a JOIN
SQL statement. :-)
Thank you very much!
--
Silas Silva
________________________________________________________________
erlang-questions (at) erlang.org mailing list.
See http://www.erlang.org/faq.html
To unsubscribe; mailto:erlang-questions-unsubscribe
________________________________________________________________
erlang-questions (at) erlang.org mailing list.
See http://www.erlang.org/faq.html
To unsubscribe; mailto:erlang-questions-unsubscribe
--
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
unknown
2010-11-08 10:06:19 UTC
Permalink
On Mon, 08 Nov 2010 04:16:43 +1100, "Edmond Begumisa"
Post by unknown
The main reason I've switched to NoSQL (CouchDB) after many years of
using SQL-RDBMSs is for modeling hierarchical data. RDBMSs force you
to do this with relationships across different tables even when the
data belongs to the same domain/object -- this feels unnatural and
results in the DB equivalent of spaghetti code with hours staring at
relationship diagrams.
The relational model can express hierarchical relations. SQL cannot
express the whole relational algebra, but it does allow to handle
hierarchical relations. You can use self-referencing tables:

CREATE TABLE hierarchy (
id INTEGER PRIMARY KEY,
parent_id INTEGER
REFERENCES hierarchy(id)
ON UPDATE CASCADE
ON DELETE RESTRICT,
...
);

And then you could use common table expressions in queries:

http://en.wikipedia.org/wiki/Common_table_expressions

Funny note: the relational model was developed 40 years ago in
order to provide a declarative and formal method for structuring
data, and overcome the limitations of the DBMSs of the time ---
which were, in fact, mostly hierarchical and document-oriented,
and required a procedural interface in order to perform queries.

As a result, the data in hierarchical DBMSs was structured without
formal analysis, and only considering the immediate requirements of
the application that was going to use it. When other applications
needed to access the DB, or the application itself required updates
which broke the mapping between DB objects and application objects,
then *huge* pains would arise...

Regards,
--
Alceste Scalas <alceste>
unknown
2010-11-08 13:49:35 UTC
Permalink
Alceste,

Very true. Hierarchical data can be expressed even in simple SQL databases
that don't support the self-reference/common-table constructs you
describe. You can do interesting things with indices and almost-circular
SQL-statements. Some DBs even have a whole API devoted to this sort of
thing -- I used Microsoft's "Data Shaping" extension to SQL for quite some
time. Nevertheless, it never feels as natural and is never as easy as it
is in NoSQL/kv-databases.

Your historical note is interesting. It's almost as though we are going
backwards with NoSQL, like in a lot of other areas of development that are
proclaimed as "new" (e.g. I always looked at Ajax/Web 2.0 and thought -
"wait a minute, that's just rich/smart client development that's been
around since the 90s!")

Your observation of data creeping into a database that's not consistent
with the application model is one of the arguments used for RDBMs against
NoSQL. The idea that relationships and constraints can be used to
safeguard bad data. I guess this is one of the things that you
weight/prioritise when choosing either to go NoSQL or SQL.

My problem with relationships is that they tend to be used to implement
too many things and introduce excessive complexity in an attempt to guard
against bad data. From my experience, SQL is uses relationships/joins for
two different things...

1) Containment: Orders contain items --- create a relationship.
2) Actual Relationships: Orders have corresponding invoices -- create a
relationship.

The problem is containment is such a common thing that a good chunk of
your relationships aren't really relationships. This is where relationship
diagrams and SQL queries start to bloat. To the extent that I sometimes
question whether the benefits gained from guarding against bad data are
negated by the complexity introduced.

NoSQL/kv-dbs on the on-the-other-hand tend to give you real containment
when you want it (even deep hierarchies if you need them), and
relationships when you actually want relationships. The cost is less
sophistication in auto-guarding against bad data.

The "SQL-way" is: don't trust the programmer, make sure everything fits
the per-defined data model (even if it is inconvenient/doesn't map nicely
against the application's model.)

The "NoSQL-way" is: trust the programmer, make it easier to model data
that is convenient for/maps nicely against the application's model.

For me, the NoSQL-way makes more sense. The database should work for the
programmer, not the other way around.


- Edmond -

On Mon, 08 Nov 2010 21:06:19 +1100, Alceste Scalas <alceste>
Post by unknown
On Mon, 08 Nov 2010 04:16:43 +1100, "Edmond Begumisa"
Post by unknown
The main reason I've switched to NoSQL (CouchDB) after many years of
using SQL-RDBMSs is for modeling hierarchical data. RDBMSs force you
to do this with relationships across different tables even when the
data belongs to the same domain/object -- this feels unnatural and
results in the DB equivalent of spaghetti code with hours staring at
relationship diagrams.
The relational model can express hierarchical relations. SQL cannot
express the whole relational algebra, but it does allow to handle
CREATE TABLE hierarchy (
id INTEGER PRIMARY KEY,
parent_id INTEGER
REFERENCES hierarchy(id)
ON UPDATE CASCADE
ON DELETE RESTRICT,
...
);
http://en.wikipedia.org/wiki/Common_table_expressions
Funny note: the relational model was developed 40 years ago in
order to provide a declarative and formal method for structuring
data, and overcome the limitations of the DBMSs of the time ---
which were, in fact, mostly hierarchical and document-oriented,
and required a procedural interface in order to perform queries.
As a result, the data in hierarchical DBMSs was structured without
formal analysis, and only considering the immediate requirements of
the application that was going to use it. When other applications
needed to access the DB, or the application itself required updates
which broke the mapping between DB objects and application objects,
then *huge* pains would arise...
Regards,
--
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
unknown
2010-11-08 22:29:34 UTC
Permalink
I've a couple of comments, joining the thread late, sorry.

Regarding the labels of "key-value database" vs. "document-oriented
database", I typically think of KV DBs as considering their value blobs
to be opaque and doc-oriented DBs as understanding all (or at least
part) of the structure of their values.

* Hibari is KV
* Riak is both (the lower parts of the KV store treat blobs as
opaque, but both the map-reduce KV framework and the new Riak
Search app can parse the value blobs as JSON, Erlang terms, or
whatever a Search text analyzer decides to parse)
* CouchDB and MongoDB are doc-oriented

Joe Armstrong <erlang> wrote:

ja> Does anybody know of any pure-erlang add-ons to ets/dets that do
ja> full-text indexing? the tricky but seems to be word extraction and
ja> not making/sorting/building the index also building an incremental
ja> index might be difficult (I haven't done this)

Not that I'm aware of.

In the Riak Search app, IIRC, text analysis (and stemming, etc.) can be
done in Erlang or Java. The inverted indexes managed at each node are
pure Erlang (specifically the merge_index app, see apps/merge_index/src
subdir). Replication of the merge_index indexes are managed by the
riak_core application.

https://github.com/basho/riak_search
https://github.com/basho/riak_core

-Scott
unknown
2010-11-09 10:54:09 UTC
Permalink
On Tue, 09 Nov 2010 09:29:34 +1100, Scott Lystig Fritchie
Post by unknown
Regarding the labels of "key-value database" vs. "document-oriented
database", I typically think of KV DBs as considering their value blobs
to be opaque and doc-oriented DBs as understanding all (or at least
part) of the structure of their values.
* Hibari is KV
* Riak is both (the lower parts of the KV store treat blobs as
opaque, but both the map-reduce KV framework and the new Riak
Search app can parse the value blobs as JSON, Erlang terms, or
whatever a Search text analyzer decides to parse)
* CouchDB and MongoDB are doc-oriented
Hmmmm,

I guess that's one way of interpreting the terminology.

I'd argue that within the *context* of the original post "key-value
databases for RDBMS users", one would be talking more about stores like
CouchDB, MongoDB, Mnesia.

I think that the primary concern for a SQL-RDBMS user like Silas (and
myself a while back), looking for an alternative datastore would be some
reasonably powerful querying so one can ask the kind of complex questions
from the data that any decent SQL-RDBMS could answer. This would
necessitate a query engine that can facilitate peering into the structures
being stored, otherwise I wouldn't see the point in migrating.

- Edmond -
--
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
unknown
2010-11-09 14:07:40 UTC
Permalink
I'm not sure if you picked up on my point. You seem to have focused on
the "native SQL types".

The "strong types" I'm talking about are the complex data structures
mapped across tables using referential constraints to map relationships
(i.e. ORM).

regs,
Steve
Post by unknown
Post by unknown
It appears to me that this discussion is another expression of the
'strong vs weak/dynamic vs static type' discussion.
...it makes me suspect that an imperative and strongly-typed language
paradigm has been a very strong motivator in the evolution of SQL
databases; and perhaps the popularity of NoSQL/NotSQL is an expression/
outcome of the rise of recent trends in programming language uptake.
You *cannot* call the types in classic SQL "strong".
Numbers, strings, and byte strings for everything is what Joe is complaining
of and he is right to do so. Encoding something as a string or blob basically
results in the data base itself having no clue about the structure or meaning
of the data.
It is important to understand that SQL is not a good example of a relational
system. A move away from SQL *could* be a move towards relational!
One of the core concepts in relational systems is that of
enforced integrity constraints, via primary keys and foreign keys.
If you don't have any integrity constraints that you care to tell the
data base about, you probably don't need a relational system.
Continue reading on narkive:
Loading...