My answer:
XWiki is by definition "the wiki with features", and one of its most
important features is free-text search. The only serious text
search solutions are written in Java so the JVM must be a part
of my solution.
I would use two layers of code, the bottom layer being a set of
modules which do not have any means of communication among themselves
so that they cannot become intertwined. The top layer shall be written
in a popular scripting language and must be kept thin.
I have reviewed Jython, JRuby, Groovy, and Node.js and given cursory
attention to Velocity, Scala, Rhino, and Clojure.
I am a fan of Node and considered running a Node process external
to the JVM but decided it is too difficult as it would require C++
to hook the garbage collection of "handles" in Node and clear their
associated Java objects.
Velocity does not have the flexibility to represent models
and controllers and I don't think Groovy has grown a large enough
community.
Scala is very interesting and might come up in the future; after all,
Twitter is using it. There seems to be a solid community according
to one popularity ranking[1] and its syntax is quite beautiful by
my standards. One can write functional or imperative code as one
chooses, so the learning curve is reasonable. It has been faulted for
long compile times. The build must be fast, otherwise developing is a
pain. That said, I will continue to watch it.
We must always remember that even if our core devs like their IDEs,
contributors who want to write one small patch will not want to
spend hours importing XWiki so we must not make an IDE a requirement
for developing.
Clojure and Rhino are interesting but, like Groovy, they don't have
sufficient communities to get me excited. Clojure is very complete
in its vision but for people who are not familiar with functional
programming, it's parenthesis soup.
The Python community is large and Django-on-Jython was a consideration
but Jython is not getting much attention and it is said to perform
poorly.
Ruby performance is bad but it's said to be a reasonably well
thought out language. It is one of the top ranking languages in terms
of popularity so more users will feel at home. The JRuby engine is
getting serious developer attention and the performance should be
okay since most of the heavy lifting should be implemented in Java
modules.
I choose JRuby for the popularity but continue to watch Scala.
For an MVC framework, I thought about Rails because it has the most
community backing but I have been playing with Rails and found it to
be overbuilt and reeking of jar hell. I found Sinatra to be far simpler,
so I would select Sinatra or a similar lightweight framework and
possibly port to Rails later if need be.
A pitch like "how to run your Ruby webapp on the XWiki2 Platform" would
get likely contributors excited who otherwise wouldn't care.
I recently had the pleasure of working with the HAML template engine,
which is a Ruby on Rails favorite. While I am quite impressed with
HAML, I think Jade is even more complete in its vision. Jade has
been ported to Java so it should be faster than the Ruby
implementation.
http://jade-lang.com/
I choose Jade for the views.
Storage
-------
A central part of Ruby on Rails (or Sinatra) is the simplicity of
ActiveRecord. Where Hibernate offers every option one could ever want,
ActiveRecord offers the options that everybody is likely to want in a
simple way.
Using JRuby for the front end, ActiveRecord is the default answer and
JDBC adapters for ActiveRecord on JRuby are available.
Getting the Ruby-defined model to be accessible to the Java code is a
difficult problem but there has been some work on this front.
http://blog.liveramp.com/2011/03/28/bringing-rubys-activerecord-to-java/
The user-defined model is the original killer app of XWiki and since the
heavy normalization used in the current implementation poses some
performance issues, I would have to generate native objects at the
user's request and add and update the database tables on the fly to
handle them. This is a technical challenge but not one which I think
can be avoided.
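A rough sketch of what "add and update the database tables on the fly"
could look like, assuming a relational store and written in Ruby since
that is the proposed top layer. Everything in it is hypothetical: the
field type names, the SQL_TYPES mapping and the helper functions are made
up for illustration, and it only builds SQL strings rather than talking
to any database.

```ruby
# Hypothetical mapping from wiki-level field types to SQL column types.
SQL_TYPES = {
  string:  "VARCHAR(255)",
  text:    "TEXT",
  integer: "INTEGER",
  date:    "DATE"
}.freeze

# Only allow simple identifiers, since these names end up in SQL text.
def ident(name)
  s = name.to_s
  raise ArgumentError, "bad identifier: #{s}" unless s.match?(/\A[a-z_][a-z0-9_]*\z/)
  s
end

def column_sql(name, type)
  "#{ident(name)} #{SQL_TYPES.fetch(type)}"
end

# One native table per user-defined class, one column per field.
def create_table_sql(class_name, fields)
  columns = fields.map { |name, type| column_sql(name, type) }
  "CREATE TABLE #{ident(class_name)} (id INTEGER PRIMARY KEY, #{columns.join(', ')})"
end

# Statements needed when the user adds fields to an existing class.
def add_columns_sql(class_name, old_fields, new_fields)
  added = new_fields.reject { |name, _type| old_fields.key?(name) }
  added.map { |name, type| "ALTER TABLE #{ident(class_name)} ADD COLUMN #{column_sql(name, type)}" }
end

# A user defines a "task" class, then later adds a "due" field to it.
v1 = { title: :string, body: :text }
v2 = v1.merge(due: :date)

puts create_table_sql("task", v1)
puts add_columns_sql("task", v1, v2)
```

Removing or retyping a field is the harder half of the problem and is
not covered by this sketch.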
Where to put the data? MySQL? Integrated HSQL/Derby? Hadoop? Cassandra?
One thing I really don't want to deal with is customers who want to
store the data in a weird backward database and then report bugs in my
code because of problems they brought on themselves.
I have a huge soft spot for Cassandra. It allows the code to scale to
big data, which we all know every enterprise has and just doesn't know
it yet ;) Since it can be integrated in the JVM without very much hacking,
it can keep the stack I test the same as the stack the user runs.
This means getting ActiveRecord to talk JDO or to talk directly to
Cassandra but the potential benefits are enormous:
"The only wiki which can seriously scale"
"Run your Ruby app in your own Cassandra cloud."
Since this exercise didn't come with a budget or a timeline, I'm
going to go long and say "integrated Cassandra node".
My budget/timeline answer would have been MySQL.
What will be Java
-----------------
So far this sounds a bit like a total port to Ruby. What will stay
in Java land? The answer is that heavy lifting and hot codepaths
will be in Java since it's faster but the Java code will be designed
to be generic in nature.
Wiki syntax rendering and Jade templating will be in Java.
Storage in Cassandra, search via Solr (or possibly ElasticSearch which
I just learned about) and doc import and PDF export would all be in
Java (as extensions).
Why will this not become a ball of mud?
---------------------------------------
It should be easy to see why the Java layer will not become a big ball
of mud. Each module has no means of communicating with any other module.
I have concluded that dependency injection is too dangerous to use on
a project. In a well-designed project, DI provides little in the way of
benefits but in a poorly designed project it acts as a Bank of Technical
Debt, sweeping design problems under the carpet and allowing them to
compound until the project becomes truly unmaintainable.
For example: a dependency injector which allows the Search module to
pull in Permissions allows Search to:
A: alter the state of Permissions in a way that causes a bug which can
neither be seen in Search nor in Permissions alone but only in the two
together.
B: alter the state of Permissions in a way that causes a bug when Search
is not present, creating a surprise dependency.
Also, my design would have no XWikiContext, no globally accessible thread
local ExecutionContext or similar. All of these designs provide back
channels through which modules can unintentionally communicate, leading
to scenarios A and B. A module must only work with the tools which it is
given by the code which called it.
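To make "only the tools it is given" concrete, here is a minimal sketch
in Ruby. The Search and Permissions classes and their methods are
invented for illustration (the real modules would be Java). The point
is that Search receives a narrow predicate from its caller and never
holds a reference to Permissions, so it has no way to alter
Permissions' state.

```ruby
# Hypothetical Search module: it holds an index and nothing else.
class Search
  def initialize(index)
    @index = index # word => list of document names
  end

  # `visible` is a predicate supplied by the caller. Search can ask
  # "may this document be shown?" but cannot reach Permissions itself.
  def query(word, visible)
    @index.fetch(word, []).select { |doc| visible.call(doc) }
  end
end

# Hypothetical Permissions module, equally unaware of Search.
class Permissions
  def initialize(grants)
    @grants = grants # user => list of readable documents
  end

  def can_read?(user, doc)
    @grants.fetch(user, []).include?(doc)
  end
end

search = Search.new({ "wiki" => ["Main.Home", "Private.Notes"] })
permissions = Permissions.new({ "alice" => ["Main.Home"] })

# The calling code (a controller, in this design) connects the two.
results = search.query("wiki", ->(doc) { permissions.can_read?("alice", doc) })
puts results.inspect
```

Either module can be tested alone: Search with a lambda that always
returns true, Permissions with no Search in sight.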
Why will the Ruby code not become a ball of mud?
------------------------------------------------
This is harder because it is the part which is designed to be easy to
alter. It is easy to alter because most customization of the look and
feel will be done here.
Step 1: Use a framework, follow best practices for the framework.
MVC, keep the logic in the controllers.
Step 2: When controllers get too big (1000 lines), start finding code
which can be moved down into the lower level. This is hard!
You have a big controller filled with logic that calls modules like
Search, Permissions, Database and Rendering and you have to subdivide
the logic. You have to write new Search APIs for the Search code which
can't touch Permissions, Rendering or Database. You can choose to write
a wrapper module which wraps Search and Database and provides a richer
API but your API must remain generic. If you are just rewriting the
ugly controller in Java, you're pushing the dirt around.
Suppose you don't keep up with it. What you get is a great big
controller which is an eyesore and nobody likes it.
You have obvious technical debt in one place.
This is infinitely better than shoving the code down into some module
which pulls in what it needs (wants) by dependency injection because
then what you get is subtle technical debt accumulating in places all
over the codebase.
In place of dependency injection, I would use a single initialization
script which lies on the boundary between code and configuration. This
script constructs each module, feeding them their dependencies as they
are constructed. This script is expected to look somewhat ugly because
any dependency problems will concentrate in this one place, so
it should be forgiven some ugliness but watched, as it is a metric
of overall project health.
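For illustration, such an initialization script might look something
like the following in Ruby. The module names, the build_app function
and the config keys are placeholders rather than existing XWiki code,
and the Structs stand in for what would be Java modules in this design.

```ruby
# Placeholder modules. Each one receives its dependencies through its
# constructor and has no other way of finding them.
Database   = Struct.new(:location)
Search     = Struct.new(:database)
Renderer   = Struct.new(:syntax)
Controller = Struct.new(:search, :renderer)

# The one place where modules are constructed and connected. If this
# function grows tangled, the dependency graph is growing tangled.
def build_app(config)
  database = Database.new(config.fetch(:storage))
  search   = Search.new(database)
  renderer = Renderer.new(config.fetch(:syntax))
  Controller.new(search, renderer)
end

app = build_app({ storage: "embedded-node", syntax: "wiki-syntax" })
puts app.search.database.location
```

Swapping the storage backend for a test means changing one line here,
and nothing inside Search or Renderer needs to know about it.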
Isn't this what people do with Guice or Dagger when configuring instances?
i.e. the:
bind(TransactionLog.class).to(DatabaseTransactionLog.class);
bind(CreditCardProcessor.class).to(PaypalCreditCardProcessor.class);
bind(BillingService.class).to(RealBillingService.class);
part of the Guice documentation.
I interpret your reservations against DI as reservations against auto-wiring of
components. Is this correct?