Hi Vincent,
On 05/17/2017 09:30 AM, Vincent Massol wrote:
Hi Sergiu,
On 16 May 2017, at 07:36, Sergiu Dumitriu
<sergiu(a)xwiki.org> wrote:
Hi devs,
A while ago GitHub introduced several features that I think can help
boost even more the code quality, reduce the bus factor, and make it
more attractive to contribute to the project.
= Protected branches =
Branches can be configured as protected in several ways:
* Require pull request reviews before merging
** no direct commits are allowed, everything must go through a pull request
** a pull request must be reviewed by at least one other trusted
committer before it is allowed to be merged
* Require status checks to pass before merging
** pull requests must pass automated checks, for example by being
successfully built by a CI service
* Restrict who can push to this branch
** official branches (master + stable-x.y) could be restricted to
approved committers, but we could grant write access to all other
interested developers more easily
= Reviews for pull requests =
Comments on all commits as well on pull requests have been supported
since the early days of GitHub, but more recently it is easier to submit
(and request) actual reviews for pull requests. Instead of simply
commenting, and changing that special field on Jira, an official pull
request status is supported, with three states:
- review required
- changes requested
- approved
PR authors can either await for reviews from anybody, or request someone
in particular for a review.
Committers can filter PRs by their status, and they can also show only
the PRs that require their review.
= Getting more eyeballs on the code =
As Linus' Law states, "given enough eyeballs, all bugs are shallow".
We've been experimenting with mandatory pull requests for a while in my
other XWiki-based project, with great success so far. So here's a
proposal for a different development workflow:
1. Set up 2 teams, one with master (senior) committers, fully vetted
committers that have proven they understand the XWiki code, as well as
the XWiki product, and one with junior committers
How is that different from what we’re currently doing?
Right now we have:
* Committers (they’ve been voted committers because they have proven their knowledge on
our platform + long term dedication to the project). They’re the “senior” of your team
* Contributors, who need to go through PR to get their code in. They’re the “juniors” of
your other team.
It's different because:
- they are part of the XWiki organization, not just some forkers; people
may care about the list of organizations that's displayed on their profile
- they can commit code in the official repository, so everybody else can
see their commits in one place (including through notification emails),
and every other committer can contribute to their branches
2. Protect the
master branch. Require pull requests and reviews before
any code is merged in it. Restrict push only to the master committers.
Main pros for me:
* we can ask for the CI to pass before merging, thus disturbing less other committers.
* easier to converge for releases but *only* if we’re ready to drop features. If we’re
not then it’s the opposite and makes it harder to converge.
Main cons:
* takes a LOT of time before code is merged. You can be sure that it’ll need several
hours before it gets merged and that you’ll have moved on to something else when the
comment comes in. And since your new work has an important chance of being related you
might end up chaining the PRs.
3. Everybody works on separate branches
How does the CI ensure that all branches are merged before the tests are executed?
Why all branches? I'm not sure I understand this...
Jenkins has a GitHubPullRequestBuilder plugin, which can build every
pull request, after every new push to the PR branch, and this should say
if the build will work after merging that particular pull request. It
used to be fragile several years ago, but it's working great now.
Example: I’m working on something and I’m modifying
some @Unstable API. Someone else is working with the previous API. We’ll only discover the
issue late when both PRs are merged.
There's nothing new, the same thing may happen with everybody committing
directly to master. Either there's a conflict, in which case the last
pull request to be merged will have to be rebased and reviewed manually,
or there's no conflict, but in this case how does committing to master
help detect the problem? Other than rebuilding the whole project every
time after every pull, in which case someone could be stuck indefinitely
in a push->failed->pull->build->repeat loop, as long as others keep
pushing while building.
That's why the master auto-builds will stay in place, building each PR
separately isn't a replacement for the current quality check procedures,
but an enhancement.
4. Once the
code is almost done, make a pull request and require reviews
from two junior committers
Why from junior committers?
For their learning, and so that senior committers don't have to waste
time with "this isn't our codestyle" reviews and low hanging bad-fruits.
If the senior committer doesn't make the obvious mistakes, then the
review will be quick. Nobody is saying that every review will reveal at
least 5 serious problems, most reviews _should_ be quick and painless.
So you’d be asking junior devs to review senior devs’s
code? How can they do it and achieve it? While the junior may learn stuff, they’ll most
likely have no knowledge to do a proper review.
That's why the seniors have to do a review as well.
FYI just had lunch with some Docker guys today who had
this setup in place. It was a nightmare and they had to drop it after a few months. Was
just too painful to ask for 2 reviews. They’re now down to 1 reviewer.
The numbers are up for discussion, if you think two juniors plus a
senior is too much, we can go to one junior and one senior, or just one
senior if you really don't believe that early reviews from juniors have
no value. I for one saw how many mistakes I'd have missed were spotted
by other reviewers, and I'm spending less time doing trivial reviews.
How do you require reviews?
It's a GitHub feature, when viewing a pull request there's a
"Reviewers"
field where you can choose members of the project. Someone set as
reviewer will get an email notification, and they can also see which
pull request require their review.
How can you ensure they’ll have the time right now? It
seems this would be painful.
Why right now? Will you rather have buggy code in master than wait a
while for someone else to review?
Not every individual commit needs a separate pull request with its
reviews. If you work on something, then what you'll be working on next
depends on that something, maybe they belong together in the same pull
request? If it's a bugfix to something very recently developed, maybe it
would have been found by a review, in which case it should go in the
pull request that's already open. If it's a bugfix for an old bug, is it
that critical to have it in master in 2 minutes, since it's been
unresolved for a long time so far?
It is painful, indeed, but "no pain, no gain". Is speed that much more
important that no speed impediments are worth a bit of extra quality?
4a. Or,
instead of requesting, let committers volunteer themselves, but
it's less likely that someone will volunteer on time
5. The junior committers must then carefully review the code, and either
require changes, just comment, or approve the pull request
6. Once the junior committers have approved the pull request, the PR
author must request review from a senior committer
wow … so 3 reviewers!
Did I tell you that in PhenoTips we also usually require a review from
the QA team, which involves building from the branch, and manually
testing what the PR is fixing/improving/adding?
7. The senior
committer can either approve and merge the pull request,
or send it back to step 3.
e. We can define allowed exceptions: don't require PRs from senior
committers if the fixes are really small and obvious, like fixing typos,
or small and important bugfixes that must be merged quickly, but these
should really be rare exceptions
While it does sound like more work for an already overworked team,
It’s a lot more work indeed.
this
has many benefits:
* the code will have better quality
* awareness of:
** what's new / changing
** how others are approaching a problem (especially juniors learning
from seniors by being exposed to more code)
** the existing code, since the codebase is large and otherwise people
have few occasions to look at many of the parts of XWiki
* this means a larger bus factor for new code, and slowly increasing it
for existing code that's being touched by one and reviewed by many
* theoretically, less time spent doing reviews, since all committers
should look over every commit anyway, but this way they are explicitly
told when they should look, instead of wasting time reviewing work in
progress
In theory I agree. In practice I have the feeling this means slowing down the pace of
development in favor of more shared knowledge.
Aren't commits supposed to be reviewed anyway? See my last bullet above.
Personally I know that I cannot review any PR if I
don’t understand what the dev is doing and the code alone is almost never enough. I need
to talk to the person to understand the goal. Otherwise I just do cosmetic comments which
don’t help spread the knowledge. That’s unless it’s on a topic that I completely master.
Aren't new features supposed to be discussed before the implementation
starts? As I can see, there are design pages and ML discussions for
every major new feature, so this should ensure that every committer has
at least a general idea of what the goals are.
As for talking with the dev to understand the goal, at PhenoTips we have
a weekly status meeting where we discuss what we're working on, so
everybody in the team knows what everybody else is working on.
And these reviews are supposed to help everybody understand better what
everybody else is doing, so after a while you shouldn't have a problem
with not knowing what others are doing.
And reviewers shouldn't be picked randomly, pick someone who is more
likely to know the domain.
Globally I have the feeling it would slow us down
substantially (or it doesn’t that it wouldn’t bring anything more at the expense of more
constraints). I like the idea of spreading the knowledge but I’m not sure this would work
for us. However, I like to try new stuff and decide based on real feedback.
* larger community, since people can more quickly
become junior
committers instead of having to invest many months of years of forkwork
before committership
This I don’t understand. Contributors can already do PR so I don’t see a difference (see
my point above). Also this would mean defining on what new rule you make someone a
committer and the committership rules would probably need to change (voting, etc).
The difference is where the code is. Right now a non-committer can only
commit in a personal fork, which, as stated above:
- other people can't write to, if they want to collaborate, unless
explicitly granted access to that fork
- doesn't send notifications when new commits are added
Perhaps two examples will make it clearer why it's relevant:
- For bigger features that take a while to be finished, sometimes the
original developer moves to a different task that's more urgent, and
someone else has to step in and take over the code. That wouldn't be
easy with a fork, but it's very simple when the code is in an "origin"
branch (even before the PR is ready to be opened)
- Sometimes it's easier for me to go and commit some changes directly
than to write in the pull request review what needs to be done, which
involves writing the code, copy-paste it to github, then the other
person copy-pastes from github to the IDE, then commits.
Also right now we have xwiki-contrib which is open to
anyone in term of committership and we’re pushing more and more to it so all this might
not be actually needed (except maybe on contrib if too many errors are introduced by code
that is not reviewed - I’m more worried about this than for the xwiki repos TBH).
* easier collaboration on code, since it's
very hard to work on someone
else's fork branches, but easy to work on an origin branch
So, what do you think?
Sounds pretty ambitious to me :)
The part that I prefer is the part about the CI building before merging (see above) but
even that is not so simple. Can we make dependent jobs also build so that all tests are
executed including functional tests in xwiki-enterprise?
I really don’t like the 3 reviews asked (even 2). One would already be a lot IMO. It’d
mean a lot of pinging around to find someone to review your PR :)
I’d be willing to try it out if others are interested but for a limited amount of time
based on these changes:
* Same committership rules as now
* Master is protected and all code needs to go through PR
* Only 1 reviewer is required. This reviewer would do the merge.
* The onus is on the committer to find a reviewer but everyone should play the game. IMO
that’s the hard part and we risk seeing always the same person being asked to review.
* We continue to do time-boxing for releases and we stop merging PRs 2 days before the
release.
* We try this for 3 releases (i.e. 3 months - June-August) and then do a postmortem to
decide if we continue or not
Sounds like a good start.
* We need to define some PR rules and reviewer
"power". What if I comment to an existing committer that they need to split
their PR into several ones so that I can understand it better? Would they be willing to do
that? What if the PR is good but could be better by following better some coding style.
Are we willing to annoy the committer and refuse the PR for that reason? What if the PR
has no tests?
You can review a pull request commit-by-commit, if you find that easier,
and if the author made nice clean commits.
The code rules should stay the same, there's already a requirement that
the TPC should only go up, not down, so if you don't like that tests are
missing, comment about that. The author can negotiate for an exception,
but that doesn't have anything to do with the fact that it's a pull
request: it's a commit like any other commit intended for the master
branch. If he would have done "frowned-upon things" directly on master,
would that be better or worse?
In conclusion I’m pretty sure this would slow down a
lot and we’re already a pretty small team. The real question is whether more knowledge
sharing and higher code quality is more important than pace of development or not. It’s
hard to decide on that (at least for me).
The XWiki community prides in its transparency and democracy, and this
is a proposal up for discussion, let's hear more opinions.
Also I don’t like changing something for the sake of
changing it. Are there any recent experience that make you think that we need this change?
No, nothing bad in XWiki, but:
- I haven't been monitoring commits in a long time, so I wouldn't have
noticed anything anyway
- I did commit some things on master that were not optimal, since I
haven't been keeping up to date with what else has been happening, so I
personally feel that my commits should be reviewed
- I have mostly good things to say about this workflow, which has been
in use for more than a year in the PhenoTips team
Well, I should mention some stats:
- Several PRs are merged within minutes, if they're small enough
- Most take one or two days to be merged
- If everybody is working hard to get their assignments done, PRs can
stay several days / couple of weeks before being reviewed, then comes a
period of reviews and merges
- Sometimes a big feature will be moved to the next cycle
- Pull requests that we're not happy with will stay open for several
months waiting for some free time to be refactored
In any case, thanks for starting this interesting
topic.
Thanks
-Vincent
--
Sergiu Dumitriu
http://purl.org/net/sergiu/