[xwiki-devs] [Idea] Rewrite Activity Stream + Stats using Elastic Search
Hi devs,
I think that for data that are both not critical and high volume we should use ElasticSearch instead of saving them in our RDBMS.
So the idea would be to have an embedded ES in XWiki by default (using the permanent directory to store its data), and admins could configure XWiki to use a separate ES instance (very similar to what we do with SOLR).
Whenever a user creates, modifies or deletes documents, performs operations on XObjects, etc., an event is sent to ES.
The Activity Stream (AS) UI queries ES to display the data.
The Stats UI does the same.
Pros:
- scalability
- performance
- extensibility: it’s easy to evolve the schema in ES, and we can easily have several formats (as was proven by the Active Installs code)
I’d like to start a POC in my “free” time.
WDYT?
Thanks
-Vincent
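A minimal Python sketch of the proposed flow, under stated assumptions: `Event`, `EventStore`, `InMemoryEventStore` and `on_document_event` are hypothetical names, not XWiki or ElasticSearch APIs. User operations become event records handed to a store, and the AS UI only ever reads from that store; the in-memory class stands in for either the embedded or the remote ES instance.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional, Protocol


@dataclass(frozen=True)
class Event:
    """One activity-stream entry (hypothetical model, not XWiki's actual event API)."""
    event_type: str              # e.g. "create", "update", "delete", "addObject"
    user: str
    document: str
    timestamp: datetime
    params: Dict[str, str] = field(default_factory=dict)


class EventStore(Protocol):
    """The only thing the AS and Stats UIs would depend on, whatever the backend."""

    def save(self, event: Event) -> None: ...

    def latest(self, limit: int, user: Optional[str] = None) -> List[Event]: ...


class InMemoryEventStore:
    """Stand-in backend; an embedded or remote ES (or Solr) store would expose the same methods."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def save(self, event: Event) -> None:
        self._events.append(event)

    def latest(self, limit: int, user: Optional[str] = None) -> List[Event]:
        matching = [e for e in self._events if user is None or e.user == user]
        # Newest first; ties on timestamp are broken by insertion order (later first).
        ranked = sorted(enumerate(matching), key=lambda p: (p[1].timestamp, p[0]), reverse=True)
        return [event for _, event in ranked[:limit]]


def on_document_event(store: EventStore, event_type: str, user: str,
                      document: str, **params: str) -> None:
    """Hypothetical listener hook: called when a user acts on a document or an XObject."""
    store.save(Event(event_type, user, document, datetime.now(timezone.utc), dict(params)))
```

Keeping the UI behind a small store interface like this would also let the ES vs. Solr choice be made after the POC without touching the callers.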
Hello Vincent,
While I strongly believe that NoSQL-type storage is fundamentally a good idea for storing activity streams, I suspect your preference for ElasticSearch over Solr may rest mostly on superficial grounds.
Most analytics systems are indeed based on NoSQL storage, and ElasticSearch and Solr are examples of such. Other analytics solutions use larger systems such as CouchDB and MongoDB. Almost all of them optimize for the chosen views. My impression is that many people are excited by ElasticSearch because it has fancy UIs, whereas Solr may be better optimized thanks to its very effective caching.
In both cases, building an analytics system involves designing the storage so that the queries expected by its views run efficiently, e.g. the series of page-view counts over recent time periods. I would expect a Solr-based and an ElasticSearch-based Stats module to differ very little.
One thing that is crucial when using a stats system (and, I believe, even when trying to tune the SQL-stored activity stream by doing fewer writes) is that viewers should not expect a perfectly up-to-date, real-time view. ElasticSearch and Solr behave the same way here: real time is only "near real time". The real-time aspect (as done by Google Analytics, for example) should instead be a completely separate view, probably based on in-memory values.
paul
PS: did you consider using HSQLDB for part of this? It can run in memory, and locks certainly hurt much less there. Persistence should be somewhat decoupled...
PPS: schema evolution is never painless, even in a NoSQL system. If a field needs to be merged or split, there is a price to pay, whatever the storage system.
On 21 November 2015 12:01, [email protected] wrote:
Hi devs,
I think that for data that are both not critical and high volume we should use ElasticSearch instead of saving them in our RDBMS.
_______________________________________________ devs mailing list [email protected] http://lists.xwiki.org/mailman/listinfo/devs
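Paul's point about "near real time" versus a separate in-memory real-time view can be sketched in Python as below. `NearRealTimeIndex` only mimics the visibility behaviour he describes (writes become queryable after a refresh); it is not an ES or Solr client, and `LiveCounter` and `record_view` are likewise hypothetical names.

```python
from collections import Counter
from typing import List


class NearRealTimeIndex:
    """Mimics ES/Solr visibility: indexed page views become searchable only after refresh()."""

    def __init__(self) -> None:
        self._pending: List[str] = []          # written, not yet visible to queries
        self._searchable: Counter = Counter()  # page -> view count visible to queries

    def index_view(self, page: str) -> None:
        self._pending.append(page)

    def refresh(self) -> None:
        # ES refreshes periodically and Solr uses soft commits; here it is an explicit call.
        self._searchable.update(self._pending)
        self._pending.clear()

    def views(self, page: str) -> int:
        return self._searchable[page]


class LiveCounter:
    """Separate in-memory view for 'right now' numbers; nothing here is persisted."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()

    def hit(self, page: str) -> None:
        self._counts[page] += 1

    def views(self, page: str) -> int:
        return self._counts[page]


def record_view(page: str, index: NearRealTimeIndex, live: LiveCounter) -> None:
    """Each page view feeds both the lagging index and the live counter."""
    index.index_view(page)
    live.hit(page)
```

After a single `record_view`, the live counter already reports one view while the index still reports zero until `refresh()` runs, which is the lag stats viewers would have to accept.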
On Sat, Nov 21, 2015 at 1:01 PM, [email protected] <[email protected]> wrote:
Hi devs,
I think that for data that are both not critical and high volume we should use ElasticSearch instead of saving them in our RDBMS.
Why ElasticSearch and not Solr or something else? There are many comparisons on the web between these two. I wouldn't choose one or the other without an investigation. I agree that data that are both not critical and high volume could be stored outside our RDBMS. Thanks, Marius
Without jumping into the Solr vs. ElasticSearch discussion, I am quite favorable to the general idea of a big-data store complementing the RDBMS, especially if it allows us to re-enable stats that are more fine-grained than those collected by Piwik. This also opens the door to possibly storing revision histories outside of the DB, which would be a big win for scalability and might even lead to one day recommending an embedded DB for production, a big win for ease of installation. +1 for an investigation.
Caleb
On 23/11/15 10:26, Marius Dumitru Florea wrote:
Why ElasticSearch and not Solr or something else? There are many comparisons on the web between these two. I wouldn't choose one or the other without an investigation.
I agree that data that are both not critical and high volume could be stored outside our RDBMS.
Thanks, Marius
Hi Marius,
On 23 Nov 2015 at 10:26:21, Marius Dumitru Florea ([email protected]) wrote:
Why ElasticSearch and not Solr or something else? There are many comparisons on the web between these two. I wouldn't choose one or the other without an investigation.
Indeed. I mentioned ES because I’ve used it for ActiveInstalls, I know how to use it, and it seems to work fine. But you’re right. Note that even if we were to use SOLR, it would be a separate instance at the logical level, since we need to let our users install it on a separate machine, different from the one hosting the SOLR instance we use for search. I have no experience with SOLR at this stage, so coming up with a POC would mean spending several more days; I’ll leave this to someone else since I don’t have the time ATM. TBH I’m not even sure I have the time for a POC based on ES right now ;) Note 2: my idea was to do a quick POC of coding AS + Stats as extensions, in xwiki-contrib to start with. Of course, the XWiki Dev Team would probably plan to move those extensions into the core, which means it’s better to agree on the technologies to use from the outset...
I agree that data that are both not critical and high volume could be stored outside our RDBMS.
Yes, this is the important part for me: agreeing that we should reimplement AS + Stats using some external store (and, since it’s high volume, a non-RDBMS store). Thanks -Vincent
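Vincent's requirement that the events store be a separate logical instance, embedded by default under the permanent directory but movable to another machine, is mostly a configuration concern. A sketch in Python, assuming a hypothetical `eventstore.url` property (not an existing XWiki configuration key) and a known URL for the search instance:

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class EventStoreLocation:
    embedded_dir: Optional[Path]   # set when running the embedded instance
    remote_url: Optional[str]      # set when an admin points to a separate instance


def resolve_event_store(config: Mapping[str, str], permanent_dir: Path,
                        search_url: Optional[str]) -> EventStoreLocation:
    """Decide where events/stats go ("eventstore.url" is a hypothetical key)."""
    url = config.get("eventstore.url", "").strip()
    if not url:
        # Default: embedded instance keeping its data in the permanent directory.
        return EventStoreLocation(embedded_dir=permanent_dir / "eventstore", remote_url=None)
    if search_url is not None and url.rstrip("/") == search_url.rstrip("/"):
        # The events store must be logically separate from the search index.
        raise ValueError("the events store must not reuse the search index URL")
    return EventStoreLocation(embedded_dir=None, remote_url=url)
```

Rejecting the search URL outright is one possible policy; with Solr, pointing to a different core on the same server would pass this check, since the URLs differ.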
Same here: we have already done a lot of work to deal with Solr, so it seems to me that using Solr would be a better fit for us. We just need to add another core (each core has its own schema).
On Mon, Nov 23, 2015 at 10:26 AM, Marius Dumitru Florea <[email protected]> wrote:
Why ElasticSearch and not Solr or something else? There are many comparisons on the web between these two. I wouldn't choose one or the other without an investigation.
I agree that data that are both not critical and high volume could be stored outside our RDBMS.
Thanks, Marius
-- Thomas Mortagne
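Thomas's suggestion of a dedicated Solr core with its own schema pairs well with loosely structured event parameters, because Solr schemas can declare dynamic fields. The Python sketch below flattens a hypothetical event into a field map, assuming the `*_s`, `*_l`, `*_b` and `*_dt` dynamic-field patterns of Solr's default configset and `id` as the unique key; the `param_` prefix and field names are illustrative, and actually sending the document to Solr is left out.

```python
from datetime import datetime, timezone
from typing import Dict, Mapping, Union

FieldValue = Union[str, int, bool]


def to_solr_document(event_id: str, event_type: str, user: str,
                     when: datetime, params: Mapping[str, object]) -> Dict[str, FieldValue]:
    """Flatten an event into a flat field map for a dedicated events core (illustrative)."""
    doc: Dict[str, FieldValue] = {
        "id": event_id,
        "type_s": event_type,
        "user_s": user,
        # Solr date fields take ISO-8601 instants in UTC, e.g. 2015-11-23T10:26:00Z.
        "date_dt": when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    for name, value in params.items():
        # Pick the dynamic-field suffix from the value type so new parameters need no
        # schema change. bool is tested first because it is a subclass of int.
        if isinstance(value, bool):
            doc[f"param_{name}_b"] = value
        elif isinstance(value, int):
            doc[f"param_{name}_l"] = value
        else:
            doc[f"param_{name}_s"] = str(value)
    return doc
```

New parameters then need no schema change, though, as Paul notes, merging or splitting an existing field would still mean rewriting the events already indexed.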
Note that one reason I started this thread is that we want to rewrite the AS (see http://design.xwiki.org/xwiki/bin/view/Proposal/AcitvityStreamRefactoring62) and, IMO, if we do this we should stop storing the events in the main store (RDBMS). We also know that stats can be a bit slow, and IMO it doesn’t make sense to store them in the main store either. So my main goal is to see if we agree on these 2 points. Thanks -Vincent
On 21 Nov 2015 at 12:01:31, [email protected] ([email protected]) wrote:
Hi devs,
I think that for data that are both not critical and high volume we should use ElasticSearch instead of saving them in our RDBMS.
+1 for the non-RDBMS approach for AS and Stats. It makes sense for transient and possibly loosely structured information (e.g. event parameters). +1 for using Solr with a separate core, unless some technical limitation exists, since I would prefer to avoid bloating XWiki further and increasing its complexity. Thanks, Eduard
On Mon, Nov 23, 2015 at 11:48 AM, [email protected] <[email protected]> wrote:
Note that one reason I started this thread is because we want to rewrite the AS (see http://design.xwiki.org/xwiki/bin/view/Proposal/AcitvityStreamRefactoring62) and IMO if we do this we should not continue to store the events in main store (RDBMS).
We also know that stats can be a bit slow and it also doesn’t make sense IMO to store them in the main store.
So my main goal is to see if we agree on these 2 points.
Thanks -Vincent
+1 for storing stats outside the main database system, either in a NoSQL system or in a (separate) HSQLDB (as suggested by Paul). Thanks, Guillaume
2015-11-23 12:50 GMT+01:00 Eduard Moraru <[email protected]>:
+1 for the non-RDBMS approach for AS and Stats. It makes sense for transient and possibly loosely structured information (e.g. event parameters).
+1 for using Solr with a separate core, unless some technical limitation exists, since I would prefer to avoid bloating XWiki further and increasing its complexity.
Thanks, Eduard
-- Guillaume Delhumeau ([email protected]) Research & Development Engineer at XWiki SAS Committer on the XWiki.org project
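Guillaume's alternative (and Paul's PS) is a separate embedded SQL database for stats. HSQLDB is a Java library, so this Python sketch uses the standard `sqlite3` module as a stand-in; the `page_view` table and the function names are hypothetical. It also shows the kind of query Paul mentions: page-view counts per hour over a recent period.

```python
import sqlite3
from datetime import datetime, timedelta, timezone
from typing import List, Tuple


def open_stats_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open a stats database kept separate from the main wiki database."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS page_view (page TEXT NOT NULL, viewed_at TEXT NOT NULL)")
    conn.execute("CREATE INDEX IF NOT EXISTS page_view_page_time ON page_view (page, viewed_at)")
    return conn


def _utc_text(when: datetime) -> str:
    # UTC "YYYY-MM-DD HH:MM:SS" strings sort and compare correctly as text.
    return when.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


def record_page_view(conn: sqlite3.Connection, page: str, when: datetime) -> None:
    conn.execute("INSERT INTO page_view (page, viewed_at) VALUES (?, ?)", (page, _utc_text(when)))


def hourly_views(conn: sqlite3.Connection, page: str, since: datetime) -> List[Tuple[str, int]]:
    """Page-view counts per hour ("YYYY-MM-DD HH") for one page since a given instant."""
    rows = conn.execute(
        "SELECT substr(viewed_at, 1, 13) AS hour, COUNT(*) FROM page_view "
        "WHERE page = ? AND viewed_at >= ? GROUP BY hour ORDER BY hour",
        (page, _utc_text(since)),
    )
    return [(hour, count) for hour, count in rows]
```

Since this database is opened separately from the main wiki database, its write traffic and locks stay out of the main store.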
participants (7)
- Caleb James DeLisle
- Eduard Moraru
- Guillaume "Louis-Marie" Delhumeau
- Marius Dumitru Florea
- Paul Libbrecht
- Thomas Mortagne
- vincent@massol.net