Hi Denis,
On Jun 9, 2011, at 2:11 PM, Denis Gervalle wrote:
Hi committers,
Even if I completely understand and agree with the goal pursued, I
really dislike this way of solving it. It is the responsibility of all
of us to keep the build stable at all times. It should be the concern
of all recent committers to check whether their recent commit breaks the
CI, and to fix it ASAP when needed.
Nobody has suggested changing this, and it is still the rule.
If someone is not ready to do so
in the upcoming hours following their commit, they should simply refrain
from making that commit until they can follow it into the CI. I always follow
these rules (even if most of the time, I commit only stuff that I have
in production on my side), and I should admit that I would have committed
more without them. But this is the necessary trade-off between
evolvability and stability.
Again this has always been the rule and will continue to be the rule.
Having someone to enforce such rules is admitting that some of us
need someone else to remind them of the best practices.
Not at all. I guess the reason you don't see the need is that you've not been active in
helping to maintain our build. I'll list some build issues that happen
frequently:
* Agents stop working. For example, this morning there were 2 jobs stuck because of a failure to
start FF. This happens from time to time. Someone needs to investigate and at the very
least kill the job to free the agent.
* New versions of Jenkins fixing bugs. Someone needs to check and upgrade the build when a
new version has interesting stuff for us and when it fixes some of our issues.
* The most important ones: flickering tests. I can tell for sure that you haven't
written UI tests or you'd have introduced flickering tests for sure :) I have myself
introduced several. Of course we always test locally before committing and I even check
that it works on Jenkins. But flickering tests don't fail immediately: they'll run fine
for 50 or 100 iterations and suddenly start to fail. It's not always easy to find
who's the culprit for flickering tests.
* Various issues with filesystem locks, permissions on agents, memory settings, etc. that
make the build fail.
BTW there are also other build tasks such as:
* Upgrade versions of the Maven plugins we use when there are new versions
* Fix TODOs in pom.xml which are workarounds waiting for Maven issues to be fixed
You seem to think these are done automagically. The reason you think this is because
people like Thomas or myself (and even others but I think Thomas and I have been the most
active on this) are doing them. Personally it's not my role to be the Build Manager and
I'm fed up with doing it. I want to share the workload with everyone.
In order to address these problems you cannot just rely on the good will of everyone to
work on them. You need someone responsible. Rather than having a single person responsible
all the time, I'm proposing that committers take turns to do that.
For me, this is not the philosophy behind an Open-Source project, where
everyone should do their best for the wellness of the project.
Ok so tell me the last time you helped fix failing builds? :) IMO it's a long time ago
which means you don't do the best for the wellness of the project (according to what
you said) ... :)
(Surely you agree that the specific issues I've listed above are nobody's
fault).
So I cannot accept that there
is a real need for this, and I really hope that every one of us will
understand the need to move their cursor towards the stability of the
build.
That's exactly what I'm trying to achieve. Ideally everyone would take care of
the build but that doesn't work. Right now the goal is to create a task force to
stabilize the build again, i.e. make sure it's stable without any false positives. Again
the goal of the Build Manager is NOT to fix all issues himself/herself but to ping others
to help/fix the issues and in general to increase the awareness of having a stable build.
So please guys, take your responsibilities without the need for a build
policeman.
It hasn't worked. Hence this new proposal. At some point we may find that the build
manager has nothing to do anymore which will be great. When that happens we'll be able
to remove that role.
Sorry, but I am -1 on playing the policeman (but if I need to, I will do my
duty), and I vote -0, just because I do not consider myself active
enough to veto.
Let's hope I've provided more info on why we need a build manager for the time
being ;)
Thanks
-Vincent
Denis
On Thursday, June 9, 2011, Thomas Mortagne <thomas.mortagne(a)xwiki.com>
wrote:
On Thu, Jun 9, 2011 at 09:20, Vincent Massol
<vincent(a)massol.net> wrote:
>
> On Jun 9, 2011, at 9:08 AM, Thomas Mortagne wrote:
>
>> On Wed, Jun 8, 2011 at 19:40, Vincent Massol <vincent(a)massol.net> wrote:
>>> Hi committers,
>>>
>>> We're having a hard time stabilizing our build (especially the
>>> functional test part, see my previous mail entitled "[VOTE] Important:
>>> Strategy to fix failing tests and stability"). Now I believe that it's going
>>> to be hard to enforce it and thus I'd like to propose a variation:
>>>
>>> * The Build Manager has the *responsibility* to get the build fixed
>>>   ASAP whenever it's failing. His priority #1 during the week becomes
>>>   monitoring the Build
>>> * By "Build" we mean the CI Build on ci.xwiki.org and by "failing" we
>>>   mean anything that makes the build fail: tests, compilation, clirr, etc.
>>> * Every week we have a different Build Manager chosen amongst the
>>>   Committers
>>
>> A week seems a bit short but on the other hand it will seem pretty
>> long for the Build Manager himself/herself I'm sure ;)
>>
>>> * In order to fix build issues the Build Manager has several
>>>   possibilities:
>>>   - find out who caused the build to break and ask that person to fix it.
>>>     That person cannot refuse that and must consider it his/her priority
>>>     to fix it (or roll back the change that caused the build to fail)
>>>   - roll back the change that caused the build to fail
>>>   - fix it himself/herself
>>>   - find someone knowledgeable in the failing domain and get him/her to
>>>     fix the build.
>>> * At the end of the Week the Build Manager hands over his duty to the
>>>   next Build Manager by contacting him/her.
>>> * We create a Build Manager Roster page on dev.xwiki.org to log past
>>>   Build Managers (and possibly future ones if some have expressed the
>>>   wish to be the Build Manager for a specific week).
>>> * All committers must perform this duty and take turns
>>>
>>> Since I've started doing this this week, I propose to take this role
>>> for the current week. I'm also proposing to log Caleb as having been the
>>> Build Manager for the past week since he's done a lot to stabilize the
>>> build.
>>>
>>> If the vote is passed I'll log this on the Committership page as a
>>> Committer duty (I'll also cross reference it from the Build page).
>>>
>>> Here's my +1
>>
>> +1
>>
>> What do you think about designating the people who broke the build the
>> most for the following week?
>
> An interesting idea...
>
> However:
> 1) for flickering tests it's hard to find the culprit
> 2) it's not so much a problem of breaking the build often, it's more a
> problem of not fixing it immediately when broken
Sure, my real proposal was actually "designate the people who are most
painful for the Build Manager as Build Manager" but I wanted to find a better
metric :)
>
> However I agree that in the Roster we could log information for the past
> week about who broke the build, how many flickering tests were fixed, etc.
>>
>> Thanks
>> -Vincent