[xwiki-devs] [Brainstorming] Notifications sending email

Guillaume Delhumeau

31 May 2017

12:15 p.m.

Help me to decide!

TL;DR:

* I need to know whether performing one database query for each user who wants to receive an email with all their notifications is a scalability issue (in a job context).
* If it's not an issue, I can implement the "naïve" solution, which requires less development.

Full message:

Status:
* notifications are displayed in the top menu when you browse the wiki.
* notifications are displayed differently for each individual user according to their preferences (filters on event type, on locations, etc.).
* similar notifications are grouped together into "composite notifications".
* only a few notifications are displayed (5 by default).

Objective:
* send an email periodically (every hour, every day or every week, according to the user preferences) with ALL the events that happened during the last period of time, still filtered according to the user preferences.

Inspiration:
* the watchlist gets ALL events that happened during the last period of time
* then, for each user, remove the events which the user is not interested in
* Benefit: only one query to get the events from the database for all users

Problems:
* in the notifications, I have introduced a NotificationFilter role that makes it possible to inject some SQL into the query that gets the events according to the user preferences. I call this "pre-filters".
** it means we generate a unique request for each individual user, so if we send a mail to 1000 users, we will have 1000 requests to the database.

I wonder whether it's a non-problem or a big scalability issue, because even if the whole job that sends the emails takes ~10 minutes, it does not matter: it's not a real-time thing.

For the record, NotificationFilter has "post-filters" too, which perform checks on the event itself (for example checking the permissions, etc.).
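A minimal sketch of the pre-filter / post-filter split described above, written in Python against an in-memory SQLite database rather than the actual XWiki code. The `events` table, its columns and the `UserPrefs` class are assumptions made up for illustration; they are not the real NotificationFilter API.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class UserPrefs:
    # Hypothetical preference model, not the real XWiki one.
    name: str
    event_types: list = field(default_factory=list)    # used by the pre-filter
    readable_spaces: set = field(default_factory=set)  # used by the post-filter


def make_db():
    # Tiny in-memory event store with four sample events.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, type TEXT, space TEXT)")
    db.executemany(
        "INSERT INTO events (type, space) VALUES (?, ?)",
        [("update", "Main"), ("create", "Main"), ("update", "Private"), ("comment", "Main")],
    )
    return db


def events_for_user(db, prefs):
    if not prefs.event_types:
        return []
    # Pre-filter: the user's preferences become part of the WHERE clause,
    # so every user gets a different query (one query per user).
    placeholders = ", ".join("?" for _ in prefs.event_types)
    sql = f"SELECT id, type, space FROM events WHERE type IN ({placeholders}) ORDER BY id"
    rows = db.execute(sql, prefs.event_types).fetchall()
    # Post-filter: a check on each returned event (a stand-in for permissions).
    return [row for row in rows if row[2] in prefs.readable_spaces]
```

Calling `events_for_user` once per recipient is the "1000 users, 1000 requests" situation: the pre-filter keeps each result small, at the cost of one query per user.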

Alternatives:
* just like the watchlist, perform a very generic query on the database to get all the events that happened during the last period of time
* then, for each user, use only the "post-filters" to remove the events the user doesn't care about

Problem:
* it means the pre-filters that make sense in the notification use case cannot be used for emails. Developers must be aware of this.
* it requires some refactoring of the code that groups similar notifications.
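For comparison, the watchlist-style alternative could be sketched as follows. This is only an illustration under assumptions: `Event`, `generic_query` and the per-user predicates are hypothetical names, and the hard-coded list stands in for the single generic database query.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    id: int
    type: str
    space: str


def generic_query():
    # Stand-in for the single database query that loads every event of the period.
    return [
        Event(1, "update", "Main"),
        Event(2, "create", "Sandbox"),
        Event(3, "comment", "Main"),
    ]


def digest_for_all(users):
    # `users` maps a user name to a post-filter: a predicate on one event.
    events = generic_query()  # one query, shared by all users
    return {
        name: [event.id for event in events if keep(event)]
        for name, keep in users.items()
    }
```

Only one query is issued, but the complete event list stays in memory while every user is processed, and nothing user-specific can be pushed down into the query.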

Question: Should I go with the "naive" solution, i.e. for each user get all notifications and send a mail, or should I go with the "only 1 query to the database for all users" version?

Thanks,

-- Guillaume Delhumeau (guillaume.delhumeau@xwiki.com) Research & Development Engineer at XWiki SAS Committer on the XWiki.org project

Thomas Mortagne

31 May 2017

2:45 p.m.

On Wed, May 31, 2017 at 12:15 PM, Guillaume Delhumeau guillaume.delhumeau@xwiki.com wrote:

> I wonder whether it's a non-problem or a big scalability issue, because even if the whole job that sends the emails takes ~10 minutes, it does not matter: it's not a real-time thing.

I'm not sure yet which is best, but note that what the Watchlist currently does is a big scalability issue as soon as a lot happens between two watchlist runs, even when no user has enabled anything. See http://jira.xwiki.org/browse/XWIKI-10594. So I would really not recommend keeping the same behavior.

In other words, whatever design you choose, make sure it does not involve putting in memory all the events you need to manipulate, because you have no idea how many you will end up with.
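One way to respect this constraint is to stream the events in bounded batches instead of loading them all at once. The following is a sketch only, using Python's `sqlite3` module; the `events` table, its `date` column and the `iter_events` helper are assumptions for illustration, not XWiki code.

```python
import sqlite3


def iter_events(db, since, batch_size=100):
    # Generator yielding events one by one while fetching at most
    # `batch_size` rows from the cursor at a time.
    cursor = db.execute(
        "SELECT id, type FROM events WHERE date >= ? ORDER BY id", (since,)
    )
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        yield from rows
```

A consumer that handles each event as it arrives never needs the full result set in memory, whatever the number of events in the period.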

-- Thomas Mortagne

Vincent Massol

3:59 p.m.

Hi Guillaume,

On 31 May 2017, at 12:15, Guillaume Delhumeau guillaume.delhumeau@xwiki.com wrote:

> I need to know whether performing one database query for each user who wants to receive an email with all their notifications is a scalability issue (in a job context).

Yes, whenever we do a lot of queries to the DB it’s a scalability issue. If we have 100K users then it’s 100K queries, so definitely a scalability issue.

We need to find a way to do a single query (or a small fixed number of queries independent of the # of users).

If not possible then we may need to either:
A) Add some new table in our DB to help do that
B) Use some tool other than the DB, e.g. SOLR, etc.

Thanks -Vincent

Vincent Massol

3:59 p.m.

PS: I forgot to mention that I haven’t read the full message part yet before answering :)

Thanks -Vincent

Guillaume Delhumeau

8:50 p.m.

2017-05-31 15:59 GMT+02:00 Vincent Massol vincent@massol.net:

> Yes, whenever we do a lot of queries to the DB it’s a scalability issue. If we have 100K users then it’s 100K queries, so definitely a scalability issue.

Well, in that case, I don't know if sending 100K emails is scalable either.

-- Guillaume Delhumeau (guillaume.delhumeau@xwiki.com) Research & Development Engineer at XWiki SAS Committer on the XWiki.org project

Vincent Massol

9:16 p.m.

Hi,

On 31 May 2017, at 20:50, Guillaume Delhumeau guillaume.delhumeau@xwiki.com wrote:

> Well, in that case, I don't know if sending 100K emails is scalable either.

The new mail system is made for that. There’s a single mail thread (actually 2, but that’s a detail) and it can send any number of mails without slowing down XWiki. Of course, the only thing not guaranteed is how long it takes to do so. But that can be fixed outside of XWiki by having a proxy mail server which would immediately accept all mails sent by XWiki before forwarding them to some cluster of mail servers. It may not be enough though, and maybe sending 100K mails to the proxy mail server would already take too long. It would be interesting to have some measure of how long it takes to send a single mail. I think I did some computation at some point but I don’t remember the results.

Do you mean that the notification center would execute the DB queries one by one? In that case it could indeed work, and it should be left to the mail module to handle that, by implementing a custom MimeMessageFactory with an iterator. It’s important to delegate this to the mail sender API IMO. See UsersAndGroupsMimeMessageFactory for an example. AFAIR Edy refactored the watchlist to use a MimeMessageFactory.
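The "factory with an iterator" idea can be illustrated with a lazy generator. This is a language-neutral sketch in Python, not the XWiki MimeMessageFactory API: `iter_messages`, `fetch_events` and the `example.org` addresses are hypothetical, and `fetch_events(user)` stands in for the per-user database query.

```python
from email.message import EmailMessage


def iter_messages(users, fetch_events):
    # Lazily yields one message per user. `fetch_events(user)` is only
    # called when the mail sender asks for the next message, so queries
    # run one by one and only one user's events are held at a time.
    for user in users:
        events = fetch_events(user)
        if not events:
            continue  # nothing to report, no mail for this user
        message = EmailMessage()
        message["To"] = f"{user}@example.org"
        message["Subject"] = f"{len(events)} new notification(s)"
        message.set_content("\n".join(events))
        yield message
```

The sender drives the iteration, so the pace of the database queries follows the pace at which mails are actually prepared and sent.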

Thanks -Vincent

Guillaume Delhumeau

1 Jun 2017

9:36 a.m.

2017-05-31 21:16 GMT+02:00 Vincent Massol vincent@massol.net:

> Do you mean that the notification center would execute the DB queries one by one?

Yes, this is what I mean.

-- Guillaume Delhumeau (guillaume.delhumeau@xwiki.com) Research & Development Engineer at XWiki SAS Committer on the XWiki.org project

Vincent Massol

9:58 a.m.

On 1 Jun 2017, at 09:36, Guillaume Delhumeau guillaume.delhumeau@xwiki.com wrote:

> Yes, this is what I mean.

OK, so the issue is how a large company that wants to use XWiki could speed up mail delivery so that it becomes close to real time with 100K users.

Some questions:
* If we do one DB query per user, how long would it take for 100K users?
* How long does it take to prepare 100K emails with a static, fixed string content?
* How long does it take to send the 100K prepared emails IF the receiving mail server responds instantaneously and is on the same local machine?

The last 2 questions are there to evaluate the performance of the mail sender itself. If the answer is acceptable, then the large company could use a proxy mail server as I mentioned earlier, and we need to ensure that the preparation of the emails is very fast on our side. Doing 100K queries would definitely not scale in this case (if each query takes 30ms, that would be 3000 seconds, i.e. 50 minutes, so almost 1 hour). Since we have hourly jobs, the previous job would barely finish before the next one triggers…
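The estimate above is simple linear arithmetic. A small sketch, assuming strictly sequential queries and the 30 ms per query figure used as an example in the message (both function names are made up for illustration):

```python
def total_query_time_seconds(users, ms_per_query):
    # Sequential queries: total time grows linearly with the number of users.
    return users * ms_per_query / 1000


def max_users_per_period(period_seconds, ms_per_query):
    # Largest number of users whose queries fit within one job period.
    return period_seconds * 1000 // ms_per_query


seconds = total_query_time_seconds(100_000, 30)  # 3000.0 seconds
minutes = seconds / 60                           # 50.0 minutes
hourly_limit = max_users_per_period(3600, 30)    # 120000 users
```

Under these assumptions an hourly job saturates at about 120K users, before counting the time needed to render and send the mails.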

And I’m not even considering the real-time use case. For real time, we should do it the other way around anyway, i.e. find who’s subscribed to the related notification(s).

In any case, since the number of users is unbounded and can reach very high values, I don’t think we should iterate over users. There is a limited number of events during a time period, and since there are a lot more reads than writes on a wiki (more than 80% are readers), that number would stay low even when you have 100K users. So I would always try to find the events first and then, in a single query, find all the users who have subscribed to those events.
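The "events first, then subscribers in a single query" idea could look roughly like this. It is a sketch under strong assumptions: a hypothetical `subscriptions` table keyed on event type only, whereas the real preferences (locations, etc.) are richer; all table and column names are invented for the example.

```python
import sqlite3
from itertools import groupby


def events_by_subscriber(db, since):
    # One query joins the period's events with the subscriptions, so the
    # number of queries does not depend on the number of users.
    rows = db.execute(
        """
        SELECT s.subscriber, e.id
        FROM events e
        JOIN subscriptions s ON s.event_type = e.type
        WHERE e.date >= ?
        ORDER BY s.subscriber, e.id
        """,
        (since,),
    )
    # Rows arrive sorted by subscriber, so groupby can split them per user.
    return {
        subscriber: [event_id for _, event_id in group]
        for subscriber, group in groupby(rows, key=lambda row: row[0])
    }
```

Users with no matching event never appear in the result, so the work is proportional to the number of (event, subscriber) pairs rather than to the total number of users.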

Thanks -Vincent

...

...
In this case it could work indeed and it should be left to the mail module to handle that by implementing a custom MimeMessageFactory with an iterator. It’s important to delegate this to the mail sender API IMO. See UsersAndGroupsMimeMessageFactory for an example. AFAIR Edy refactored the watchlist to use a MimeMessageFactory.

Thanks -Vincent

...
We need to find a way to do a single query (or a small fixed number of

...
queries independent of the # of users).

If not possible then we may need to either: A) Add some new table in our DB to help do that B) Use some tool other than the DB, e.g. SOLR, etc

Thanks -Vincent

...

If it's not an issue, I can implement the "naïve" solution, which requires less development.

Full message:

Status:

notifications are displayed on the top menu when you browse the wiki.

notifications are displayed differently for each individual user according to their preferences (filters on event type, on locations, etc.).

similar notifications are grouped together into "composite notifications".

there are only a few notifications displayed (5 by default).

Objective:

send an email periodically (every hour, every day, every week) according to the user preferences with ALL events that happened during the last period of time, but still according to the user preferences.

Inspiration:

the watchlist gets ALL events that happened during the last period of time

then, for each user, remove the events which the user is not interested in

Benefit: only one query to get the events from the database for all users

Problems:

in the notifications, I have introduced a NotificationFilter role to make it possible to inject some SQL in the query to get the events according to the user preferences. I call this "pre-filters".

it means we generate a unique request for each individual user, so if we send a mail to 1000 users, we will have 1000 requests to the database.

I wonder if it's a non-problem or a big scalability issue. Because even if the whole job that sends emails takes ~10 minutes, it does not matter. It's not a realtime thing.

For the record, NotificationFilter has "post-filters" too, that perform checks on the event itself (for example checking the permissions, etc.).

Alternatives:

just like the watchlist, perform a very generic query on the database to get all the events that happened during the last period of time

then for each user, use only the "post-filters" to remove events the user doesn't care about

Problem:

it means the pre-filters that make sense in the notification use-case cannot be used for emails. Developers must be aware of this.

it requires some refactoring of the code that groups similar notifications.

Question: Should I go with the "naive" solution, i.e. for each user get all notifications and send a mail, or should I go with the "only 1 query to the database for all users" version?
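To make the two options in the question concrete, here is a minimal sketch using an in-memory SQLite table. The `events` schema, the `prefs` dictionary and both function names are hypothetical and are not XWiki's event store or its `NotificationFilter` API. The only point is that the first option issues one query per user, while the second issues one query in total and filters in memory.

```python
import sqlite3

# Hypothetical schema: NOT the real XWiki event table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, type TEXT, space TEXT)")
conn.executemany(
    "INSERT INTO events (type, space) VALUES (?, ?)",
    [("update", "Main"), ("create", "Blog"), ("update", "Blog"), ("delete", "Main")],
)

# Hypothetical preferences: the event types each user wants to hear about.
prefs = {"alice": {"update"}, "bob": {"create", "delete"}}


def per_user_queries(conn, prefs):
    """'Naive' option: the preference is injected into the SQL (a pre-filter),
    so the database is queried once per user."""
    result, queries = {}, 0
    for user, types in prefs.items():
        ordered = sorted(types)
        marks = ",".join("?" for _ in ordered)
        rows = conn.execute(
            f"SELECT id FROM events WHERE type IN ({marks}) ORDER BY id", ordered
        ).fetchall()
        queries += 1
        result[user] = [row[0] for row in rows]
    return result, queries


def single_query_then_filter(conn, prefs):
    """Watchlist-like option: one generic query, then in-memory post-filtering."""
    rows = conn.execute("SELECT id, type FROM events ORDER BY id").fetchall()
    result = {
        user: [event_id for event_id, event_type in rows if event_type in types]
        for user, types in prefs.items()
    }
    return result, 1


naive, naive_queries = per_user_queries(conn, prefs)
generic, generic_queries = single_query_then_filter(conn, prefs)
```

Both functions return the same events per user. The difference is the number of round trips to the database and, in the second case, that every event of the period has to be held in memory while the users are processed.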

Thanks,

-- Guillaume Delhumeau (guillaume.delhumeau@xwiki.com) Research & Development Engineer at XWiki SAS Committer on the XWiki.org project


Thomas Mortagne

10:50 a.m.

I don't think the time it takes to send a mail is relevant in this discussion. It's not like there was any choice here: if 100K users have mail notifications enabled you will have to send 100K mails, and that does not have any impact on the design of the notification module side. It sure is interesting for the mail sender module and for system administration, but that's a different subject.

Back to the Notification module: as I said, "let's get everything and then filter it later" is really a bad idea from a memory point of view (and is a big issue in the current watchlist implementation), unless you don't keep the stuff to filter in memory, but then you end up reinventing a database and querying it for each user. So IMO you should go with "let's deal with each user separately". By the way, you might need to consider sending several mails depending on the number of events, to be safe.

That's for the general idea. IMO you should go this way, do tests and optimize what you can in what needs to be done for sending a user mail. It's possible it's actually not that long without doing anything special (a request that only deals with one table with proper indices is usually very fast). Also, since it's a background thread, speed is less critical (unless it takes a ridiculous amount of time of course :)), and certainly far less important than trying not to steal all the available memory from front threads.
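A minimal sketch of the "several mails depending on the number of events" idea, assuming the events of one user arrive as an iterable (for example a database cursor) and that a fixed cap per mail is acceptable. The function name, the cap and the event strings are made up for illustration. Only one batch is held in memory at a time.

```python
from typing import Iterable, Iterator, List


def chunk_events(events: Iterable[str], max_per_mail: int) -> Iterator[List[str]]:
    """Yield successive groups of at most max_per_mail events, one group per mail."""
    if max_per_mail < 1:
        raise ValueError("max_per_mail must be at least 1")
    batch: List[str] = []
    for event in events:
        batch.append(event)
        if len(batch) == max_per_mail:
            yield batch
            batch = []
    if batch:
        # Last, possibly shorter, mail.
        yield batch


# Seven events with a cap of three per mail give three mails.
mails = list(chunk_events((f"event-{i}" for i in range(7)), max_per_mail=3))
```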

On the more technical details Vincent talked about MimeMessageFactory API and I agree implementing a NotificationMimeMessageFactory would probably be a good idea.

On Thu, Jun 1, 2017 at 9:58 AM, Vincent Massol vincent@massol.net wrote:

On 1 Jun 2017, at 09:36, Guillaume Delhumeau guillaume.delhumeau@xwiki.com wrote:

2017-05-31 21:16 GMT+02:00 Vincent Massol vincent@massol.net:

Hi,

On 31 May 2017, at 20:50, Guillaume Delhumeau <guillaume.delhumeau@xwiki.com> wrote:

2017-05-31 15:59 GMT+02:00 Vincent Massol vincent@massol.net:

Hi Guillaume,

On 31 May 2017, at 12:15, Guillaume Delhumeau <guillaume.delhumeau@xwiki.com> wrote:

Help me to decide!

TL;DR:

I need to know if performing a query on the database for each user who wants to receive an email with all the notifications, is a scalability issue (in a job context).

Yes whenever we do a lot of queries to the DB it’s a scalability issue. If we have 100K users then it’s 100K queries, so definitely a scalability issue.

Well, in that case, I don't know if sending 100K emails is scalable too.

The new mail system is made for that. There’s a single mail thread (actually 2 but that’s a detail) and it can send an infinite number of mails without slowing down XWiki. Ofc the only thing not guaranteed is how long it takes to do so. But that can be fixed outside of XWiki by having a proxy mail server which would accept immediately all mails sent by XWiki before forwarding them to some cluster of mail servers. It may not be enough though and maybe sending 100K mails to the proxy mail server would already take too long. Would be interesting to have some measure of how long it takes to send a single mail. I think I did some computation at some point but i don’t remember the results.

Do you mean that the notification center would execute the DB queries one by one?

Yes this is what I mean.

ok so the issue is how a large company who wants to use XWiki could speed up the mail delivery so that it becomes close to realtime with 100K users.

Some questions:

If we do one DB query per user how long would it take for 100K users?

How long does it take to prepare 100K emails with a static fixed string content?

How long does it take to send the 100K prepared emails IF the receiving mail server responds instantaneously and is on the same local machine?

The last 2 questions are to evaluate the mail sender performance itself. If the answer is acceptable then the large company could use a proxy mail server as I mentioned earlier and we need to ensure that the preparation of the emails is very fast on our side. Doing 100K queries would definitely not scale in this case (if each query takes 30ms, then that would be 3000 seconds, which is 50 minutes, which is about 1 hour). Since we have hourly jobs, the previous job might not even finish before the next one triggers…
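The estimate above is simple enough to check. A minimal sketch, assuming (as in the message) a flat 30 ms per query and strictly sequential execution; the 30 ms figure is an assumption, not a measurement, and the function name is made up for illustration.

```python
def sequential_job_seconds(users: int, query_ms: float) -> float:
    """Total time when one query per user runs strictly one after another."""
    return users * query_ms / 1000.0


total_s = sequential_job_seconds(100_000, 30)   # 3000.0 seconds
total_min = total_s / 60                        # 50.0 minutes
fits_in_hourly_window = total_s < 3600          # True, with 10 minutes to spare
```

Under these assumptions the query phase alone takes 50 of the 60 minutes available to an hourly job, before any post-filtering, rendering or sending.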

And I’m not even considering the realtime use case. For realtime, we should do the other way around anyway, ie find who’s subscribed to the related notification(s).

In any case since the number of users is unbounded and can reach very high values, I don’t think we should iterate over users. There is a fixed number of events during a time period and since there are a lot more reads than writes on a wiki (more than 80% are readers), it’s a number that would stay low even when you have 100K users. So I would always try to find events first and then in a single query find all users having subscribed to those events.
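A rough sketch of the "events first" direction: start from the small set of event types seen in the period and resolve their subscribers with one lookup, instead of iterating over users. The `subscriptions` table, its columns and the function name are hypothetical, and the in-memory SQLite database only stands in for the real store.

```python
import sqlite3
from collections import defaultdict

# Hypothetical subscription table: NOT an actual XWiki table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subscriptions (user TEXT, event_type TEXT)")
conn.executemany(
    "INSERT INTO subscriptions VALUES (?, ?)",
    [("alice", "update"), ("bob", "create"), ("carol", "update"), ("dave", "delete")],
)


def subscribers_for(conn, event_types):
    """Resolve all subscribers of the given event types in ONE query."""
    ordered = sorted(set(event_types))
    if not ordered:
        return {}
    marks = ",".join("?" for _ in ordered)
    rows = conn.execute(
        "SELECT event_type, user FROM subscriptions "
        f"WHERE event_type IN ({marks}) ORDER BY event_type, user",
        ordered,
    )
    by_type = defaultdict(list)
    for event_type, user in rows:
        by_type[event_type].append(user)
    return dict(by_type)


# Event types that occurred during the period (duplicates are harmless).
recipients = subscribers_for(conn, ["update", "create", "update"])
```

The number of queries here depends on the number of distinct event types in the period, not on the number of users.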

Thanks -Vincent

In this case it could work indeed and it should be left to the mail module to handle that by implementing a custom MimeMessageFactory with an iterator. It’s important to delegate this to the mail sender API IMO. See UsersAndGroupsMimeMessageFactory for an example. AFAIR Edy refactored the watchlist to use a MimeMessageFactory.

Thanks -Vincent

We need to find a way to do a single query (or a small fixed number of queries independent of the # of users).

If not possible then we may need to either:

A) Add some new table in our DB to help do that

B) Use some tool other than the DB, e.g. SOLR, etc.

Thanks -Vincent


-- Thomas Mortagne

Vincent Massol

11:01 a.m.


On 1 Jun 2017, at 10:50, Thomas Mortagne thomas.mortagne@xwiki.com wrote:

I don't think the time it takes to send a mail is relevant in this discussion. It's not like there was any choice here: if 100K users have mail notifications enabled you will have to send 100K mails, and that does not have any impact on the design of the notification module side.

What’s important and what I tried to convey in my previous messages is that users/companies with large needs can use XWiki and not be blocked. If we do one DB query per user for example they’ll never be able to succeed in sending 100K mails fast. OTOH they can act at the mail server level by having a farm of mail servers. So we need to scale and be fast up to the point when we’re ready to send the mails.

If the mail preparation is too slow we may even need to change our architecture/design too in order to spread the preparation of emails over various XWiki instances or delegate that to some external processes (by putting the requests on a queue, to be handled by those external processes). Or at least have a design that allows companies with large needs to replace our implementation with one that does this. Note that since XWiki is component-based it’s probably already possible to do that, but we need to make it as easy as possible and document it.
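A minimal sketch of the "put the requests on a queue" idea, under loud assumptions: worker threads inside one process stand in for the external processes, and `prepare` is a hypothetical callback that builds one user's mail. None of these names come from XWiki.

```python
import queue
import threading


def run_preparation(users, prepare, workers=4):
    """Put one request per user on a queue and let workers prepare the mails."""
    requests = queue.Queue()
    prepared = []
    lock = threading.Lock()

    def worker():
        while True:
            user = requests.get()
            if user is None:  # sentinel: no more work for this worker
                requests.task_done()
                return
            mail = prepare(user)
            with lock:
                prepared.append(mail)
            requests.task_done()

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for user in users:
        requests.put(user)
    for _ in threads:
        requests.put(None)  # one sentinel per worker
    requests.join()
    for thread in threads:
        thread.join()
    return prepared


mails = run_preparation([f"user{i}" for i in range(10)], lambda u: f"digest for {u}")
```

The producer only enqueues requests, so the preparation work can be scaled by adding consumers. The order of the prepared mails is not deterministic.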

Thanks -Vincent

[snip]

Vincent Massol

31 May

4:01 p.m.


On 31 May 2017, at 12:15, Guillaume Delhumeau guillaume.delhumeau@xwiki.com wrote:

Help me to decide!

TL;DR:

I need to know if performing a query on the database for each user who wants to receive an email with all the notifications, is a scalability issue (in a job context).

If it's not an issue, I can implement the "naïve" solution, which requires less development.

Full message:

Status:

notifications are displayed on the top menu when you browse the wiki.

notifications are displayed differently for each individual user according to their preferences (filters on event type, on locations, etc.).

similar notifications are grouped together into "composite notifications".

there are only a few notifications displayed (5 by default).

Objective:

send an email periodically (every hour, every day, every week) according to the user preferences with ALL events that happened during the last period of time, but still according to the user preferences.

Don’t forget the realtime email notification use case too!

Thanks -Vincent

