The application of animal welfare standards in intensive production systems using the assessment protocols of Welfare Quality®: Fattening pig husbandry in Northwest Germany

The increased requirements for animal welfare have raised the need for a comprehensive on-farm assessment system. This paper is a first step towards analysing the reliability and feasibility of on-farm welfare assessment with regard to the animal-related measures of Welfare Quality® in intensive fattening pig husbandry. Based on the 2009 Welfare Quality® assessment protocols for pigs, six analyses were undertaken by one observer on each of three farms. It became apparent that the system, in essence, fulfils the requirements of a sound assessment of animal welfare under intensive production, with low within-farm variability. The behaviour-based measurements had a higher degree of within-farm variability than the clinical and resource-based measurements, as their assessment involves a greater degree of subjectivity. Some measurements seemed to be of low sensitivity, as there was no or very little variation in many of the indicators being assessed. Despite this, this preliminary study indicates that the assessment system is a reliable and feasible tool for evaluating animal welfare status in the intensive production of fattening pigs.


INTRODUCTION
The welfare of animals used for food production has increasingly become an area of interest at all levels of the value-added chain (Blokhuis et al., 2008). This is due to changes in domestic animal husbandry, which has become more and more specialised and intensified (Temple et al., 2011a; Aparicio Tovar and Vargas Giraldo, 2006; Hughes and Duncan, 1988). Additionally, consumers increasingly demand animal-welfare-friendly products (Tuyttens et al., 2010; Ellis et al., 2009; Carlsson et al., 2007; Harper and Makatouni, 2002; Velde et al., 2002; McGlone, 2001). The food industry has reacted to this situation and is at present discussing the introduction of numerous different labels to guarantee high standards of animal welfare.
Although a large number of approaches have been developed, none of them has been established in practice so far. Against this background, the Welfare Quality® system was developed within the European Union's Sixth Framework Programme on Food Quality and Safety (2006 to 2010). The project involved a total of 44 institutions based in 13 European and 4 Latin American countries. Its aim was to develop reliable standardised methods for the assessment of animal welfare at farm level (Welfare Quality®, 2009).
A first evaluation with a prototype of the Welfare Quality® system for sows and piglets was undertaken by Scott et al. (2009) on 82 pig farms in the UK and the Netherlands, encompassing a wide variety of farming systems. The analysis showed that the incidence of clinical welfare problems as indicated by the system was generally low. The main criticisms involved stereotypical behaviour patterns. Knierim and Winckler (2009) provide a review of the validity, reliability and feasibility of the scoring system. These authors also discussed future perspectives for Welfare Quality® evaluations by looking at the welfare of cattle.
The first step towards the evaluation of the Welfare Quality® assessment protocols in intensive pig farming systems was taken by Temple et al. (2011a, b) on 30 conventional growing pig farms in Spain. Their results showed that the measures varied too little to differentiate between farms using intensive production and could only be used to identify poor welfare levels under such conditions. In addition, the state of welfare at the slaughterhouse was analysed in an overview of the sensitivity and feasibility of the Welfare Quality® protocol for finishing pigs in ten Spanish slaughterhouses (Dalmau et al., 2009).
So far, however, there have been no studies on the reliability of the Welfare Quality® assessment in intensive fattening pig farming, although reliability is an important requirement for establishing an assessment system in practice. The present study takes this into account by evaluating reliability within farms (test-retest reliability). Consistency of welfare assessment requires further attention in the future, particularly if farms are to be certified on their welfare status on the basis of infrequently recorded measurements (Knierim and Winckler, 2009).
Generally, reliability issues of animal welfare assessments have often been neglected so far and require more thorough investigation and discussion. The lack of studies on reliability measures for animal welfare indicators is remarkable (Knierim and Winckler, 2009). Consistent statistical tests for reliability are rare in the literature. While few unequivocal scientific methods or criteria for what counts as "good" or "acceptable" reliability have been established so far, some opinions are given in the literature (Knierim and Winckler, 2009). Often, the correlation coefficient is used. This was not possible in this preliminary study because of the small sample size.
Therefore, another widely used method of analysis (especially in preliminary studies) was chosen: the coefficient of variation (CV). It expresses the experimental error as a percentage of the mean and is a very good index of reliability (Gomez and Gomez, 1984). However, the acceptable CV varies greatly with the type of experiment (Patel et al., 2001). Gomez and Gomez (1984) considered a CV of 8 to 15% to be acceptable.
In the present study, the authors chose a CV of 10% as the acceptable limit for a reliable assessment.
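As an illustration of this criterion, the following minimal Python sketch computes the CV of repeated assessment scores and compares it with the 10% limit. The function names, the use of the sample standard deviation and the six example scores are assumptions made purely for illustration; they are not data or code from this study.

```python
from statistics import mean, stdev


def coefficient_of_variation(scores):
    """Return the CV in percent: standard deviation divided by the mean, times 100.

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    """
    m = mean(scores)
    if m == 0:
        raise ValueError("The CV is undefined for a mean of zero")
    return stdev(scores) / m * 100.0


def is_reliable(scores, limit_pct=10.0):
    """Return True if the CV does not exceed the chosen limit (10% by default)."""
    return coefficient_of_variation(scores) <= limit_pct


# Hypothetical scores of one criterion from six repeated visits to one farm.
stable_visits = [52.0, 55.0, 50.0, 54.0, 53.0, 51.0]
unstable_visits = [20.0, 40.0, 30.0, 50.0, 10.0, 30.0]

print(round(coefficient_of_variation(stable_visits), 2), is_reliable(stable_visits))
print(round(coefficient_of_variation(unstable_visits), 2), is_reliable(unstable_visits))
```

With these invented numbers the first series stays well below the limit and the second far exceeds it, which is the distinction the 10% threshold is meant to draw.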
Besides the question of reliability, the study discusses the feasibility of the assessment in intensive fattening production with a view to its future implementation in practice. To be feasible, an assessment system should be relatively easy to perform and require little input from the farmers (Temple et al., 2011a). Time constraints are also a main concern regarding the feasibility of an assessment system (Knierim and Winckler, 2009). The actual time needed for an adequate assessment of a farm is difficult to gauge (Temple et al., 2011a). Therefore, Knierim and Winckler (2009) argue that the assessment of welfare status on a farm should be carried out during a one-day visit.

Experimental design
Three intensive fattening pig farms (described in the following as Farms A, B and C) were each analysed in six repeats (08.12.2011 to 24.12.2011) at weekly intervals (9:00 a.m. to 2:00 p.m.) using the Welfare Quality® assessment protocols for pigs (Welfare Quality®, 2009). All analyses were done by the same assessor, who had completed an official training course given by the Welfare Quality® consortium to ensure correct application of the Welfare Quality® protocols. The farms were situated in Northwest Germany and represented the typical production system for fattening pigs in this intensive livestock region with respect to herd size, equipment and state of technology. The farm size was 1700 to 2500 fattening pigs (genetic hybrids). The pig houses on all three farms were insulated, had mechanical ventilation and fully slatted concrete floors. Neither sows nor piglets were kept on the farms.
Three different feeding regimes were implemented. The pigs were fed by manual liquid feeders in troughs (limit fed, 4 x per day; Farm A), by automatic liquid feeders (ad libitum; Farm B) or by automatic sensor-controlled liquid feeders in troughs (ad libitum; Farm C) with a feeding place to pig ratio of 1:1 (Farms A and C) or 1:6 (Farm B) (Table 1).

Assessment of growing pigs by Welfare Quality®
The core element of the Welfare Quality® scoring system is an animal-based assessment followed by an evaluation of certain resource- and management-based measures. The final evaluation comprises four principles: "Good feeding", "Good housing", "Good health" and "Appropriate behaviour". These four principles are based on twelve criteria, which are calculated from various indicators. Table 2 shows the three-stage structure of the Welfare Quality® system (Welfare Quality®, 2009; Botreau et al., 2007).
The evaluation was carried out exactly according to the 2009 Welfare Quality® assessment protocol for pigs. At the start of each assessment, the farmer was interviewed about general information concerning the management of feeding and hygiene, the records of production and mortality, the regulation of ambient temperature, castration and tail docking routines, the use of anaesthetics and the prevention of disease.
At the beginning of each investigation, a sketch of the husbandry was made showing each individual pen. Then, the required number of pens was selected at random from the sketch to ensure a representative random sample. Therefore, at every farm visit, a sample reflecting the average age of the animals in the husbandry was analysed. The sample sizes of the observed measures were: QBA = 6 observation points x 25 animals (20 min); coughing and sneezing = 6 observation points x 25 animals; social behaviour and exploratory behaviour = 3 observation points x 50 animals; huddling, shivering, panting, fear of humans, bursitis, absence of manure on body, tail bitten, lameness, pumping, twisted snouts, rectal prolapse, scouring, skin condition, ruptures and hernias = 15 observation points x 10 animals.
Thereafter, the observations started with an assessment of the principle "Appropriate behaviour" (Table 2). "Appropriate behaviour" was evaluated by means of the animals' social behaviour, exploratory behaviour, fear of humans and a qualitative behaviour assessment (QBA). In the QBA, the emotional status of the animals was assessed by discerning the intensity of the occurrence of ten positive and ten negative behaviour patterns within a 20 min period. The positive patterns were active, calm, content, enjoying, happy, lively, playful, positively occupied, relaxed and sociable; the negative patterns were agitated, aimless, bored, distressed, fearful, frustrated, indifferent, irritable, listless and tense.
To undertake the QBA, the assessor entered the room and ensured that all the pigs got up. After waiting five minutes, the observer started the assessment from outside the pen in the passage; the pigs have to be in a partly active state to show behaviour that can be assessed. The pigs were scored on a scale (0 to 120) on the basis of the number of pigs showing the behaviour pattern and the intensity of the behaviour. To evaluate their social and exploratory behaviour, the pigs were scored as active or inactive by scan sampling (five scan samples made at two-minute intervals). The active ones were classified as showing positive social behaviour (sniffing, nosing, licking, moving gently away from another animal), negative behaviour (aggressive behaviour or social behaviour as a response from a disturbed animal), exploratory behaviour or other behaviours (not classified). The pigs' exploratory behaviour was also divided into pen behaviour (sniffing, nosing, licking part of the pen) or other (behaviour patterns not included above) (Temple et al., 2011a; Welfare Quality®, 2009).
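The scan sampling described above can be pictured as a simple tally. The sketch below pools five hypothetical scan samples from one observation point and expresses each behaviour class as a percentage of the active observations. The category labels, the pooling across scans and all counts are assumptions for illustration; the Welfare Quality® protocol itself defines how the recorded scans enter the scoring.

```python
from collections import Counter

# Illustrative category labels (assumed, not taken from the protocol).
CATEGORIES = ("inactive", "positive_social", "negative_social", "exploratory", "other")


def summarise_scans(scans):
    """Pool scan samples and return each active category as % of active observations."""
    totals = Counter()
    for scan in scans:
        for category, count in scan.items():
            if category not in CATEGORIES:
                raise ValueError(f"Unknown category: {category}")
            totals[category] += count
    active_categories = [c for c in CATEGORIES if c != "inactive"]
    active_total = sum(totals[c] for c in active_categories)
    if active_total == 0:
        return {c: 0.0 for c in active_categories}
    return {c: 100.0 * totals[c] / active_total for c in active_categories}


# Five hypothetical scans of 50 pigs each, taken at two-minute intervals.
scans = [
    {"inactive": 30, "positive_social": 8, "negative_social": 2, "exploratory": 6, "other": 4}
    for _ in range(5)
]
summary = summarise_scans(scans)
print(summary)
```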
The assessment of the social and exploratory behaviour should be applied at three different ages of the fattening period if possible: at the beginning but at least one week after being mixed (before the establishment of a social hierarchy), in the middle of the fattening period (around 70 kg live weight), and at the end of the fattening period (Welfare Quality®, 2009).
The indicator for the human-animal relationship (HAR) is based on the fear of humans test, in which the reaction of the pigs to the farmer entering the pen is analysed (the farmer walks very slowly along the passage and waits there for 30 s). For the test, 10 pens were analysed on every visit to each farm. Each pen was analysed as a whole. In the HAR, two reactions are possible: 0 = no panic present; 2 = more than 60% showing panicking behaviour.
The animal-based measures to evaluate "Good feeding", "Good housing" and "Good health" followed the "Appropriate behaviour" assessment. Table 3 shows the indicator assessments and their scoring (Welfare Quality®, 2009).
Most measurements were scored according to a three-point scale (0 to 2): 0 = welfare is good; 1 = welfare is acceptable (compromises made); 2 = poor and unacceptable welfare.In some cases, just the presence or absence of a particular behaviour was scored: 0 (present) or 2 (absent).
The data evaluation of the analysed indicators and the calculation of the algorithm were done by the Welfare Quality® consortium (National Institute for Agricultural Research INRA, France). Here, the overall evaluation (range of scores = 0 to 100) of a farm is given one of four values: excellent, enhanced, acceptable or not classified. In the overall evaluation, the individual criteria within a particular principle do not compensate for each other (that is, a high score in one will not compensate for a low score in another). A farm is considered to be excellent if it scores more than 55 on all principles and more than 80 on two of them. It is considered to be enhanced if it scores more than 20 on all principles and more than 55 on two of them. An acceptable level of animal welfare is obtained with scores of more than 10 on all principles and more than 20 on three of them. If a farm does not reach this minimum standard, it will not be classified (Welfare Quality®, 2009).
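The classification thresholds just described can be written down as a short decision rule. The following Python sketch is an illustration only: the function name and the example scores are hypothetical, "on two of them" is read here as "on at least two of them", and the actual calculation was performed by the Welfare Quality® consortium.

```python
def classify_farm(principle_scores):
    """Map the four principle scores (0 to 100) to an overall welfare category.

    Assumption: "more than X on two (or three) of them" means at least that many.
    """
    scores = list(principle_scores.values())
    if len(scores) != 4:
        raise ValueError("Exactly four principle scores are expected")

    def meets(floor, target, n_target):
        # Every principle must exceed the floor; n_target of them must exceed the target.
        return all(s > floor for s in scores) and sum(s > target for s in scores) >= n_target

    if meets(55, 80, 2):
        return "excellent"
    if meets(20, 55, 2):
        return "enhanced"
    if meets(10, 20, 3):
        return "acceptable"
    return "not classified"


# Hypothetical principle scores for one farm visit.
example = {
    "Good feeding": 30,
    "Good housing": 60,
    "Good health": 70,
    "Appropriate behaviour": 25,
}
print(classify_farm(example))
```

Checking the strictest category first keeps the rule simple: a farm falls through to the next category as soon as one condition fails, so a single very low principle score cannot be offset by high scores elsewhere.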

Statistical analysis
The statistical analysis of the welfare data by the Welfare Quality® consortium was carried out with the software program SPSS, Version 19 (PASW Statistics, SPSS 19 for Windows). The coefficients of variation and the upper and lower limits of the 95% confidence intervals of each farm's observations were calculated to evaluate the reproducibility and variability of the welfare assessment. Additionally, an analysis of variance (Tukey's RM) was performed to analyse the differences between the farms. A value of P ≤ 0.05 was considered statistically significant. A log transformation was done before calculation.
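To make the interval calculation concrete, the sketch below computes a 95% confidence interval for the mean of six repeated scores using Student's t distribution. This is an assumption about the general approach, not a reproduction of the SPSS procedure used in the study: the scores are hypothetical, the log transformation is omitted for simplicity, and the critical t values are hard-coded from standard tables.

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided 95% critical values of Student's t, keyed by degrees of freedom.
T_CRIT_95 = {2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571, 9: 2.262}


def mean_ci95(values):
    """Return (lower, upper) limits of the 95% confidence interval of the mean."""
    n = len(values)
    df = n - 1
    if df not in T_CRIT_95:
        raise ValueError(f"No tabulated t value for {df} degrees of freedom")
    centre = mean(values)
    half_width = T_CRIT_95[df] * stdev(values) / sqrt(n)
    return centre - half_width, centre + half_width


# Hypothetical scores of one criterion from six repeated visits to one farm.
visits = [52.0, 55.0, 50.0, 54.0, 53.0, 51.0]
lower, upper = mean_ci95(visits)
print(round(lower, 2), round(upper, 2))
```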

RESULTS
In the final evaluation according to the Welfare Quality® system, Farms A and B achieved an overall evaluation of "enhanced" at every evaluation. Farm C achieved this only at the first evaluation and was thereafter classed as "acceptable". Table 6 shows the rated values of the criteria, principles and the overall evaluations and the reliability of the assessments within the three farms. The comparison of means demonstrates significant differences between the farms (Table 4).
In the overall assessment, there were no differences between the evaluations of Farms A and B; Farm C was found to be significantly worse. The differences between farms occurred in the criteria absence of prolonged hunger, absence of prolonged thirst, ease of movement and expression of social behaviour. "Good feeding" was assessed as the worst principle of all. In particular, deficits in the criterion absence of prolonged thirst had a significant impact. The criticism concerned not only the number of waterers (Farms A and C: 12 pigs/waterer; Farm B: 30 pigs/waterer) but also their functionality and cleanliness. Farm B in particular was significantly worse in this respect. Farm A scored worse on absence of prolonged hunger because worse body conditions were recorded. Farm B showed a high variation because of the measured absence of prolonged thirst. Farms A and C did not show any variation in this respect.
The principle "Good housing" scored in the middle range owing to the evaluation of the criteria comfort of resting (presence of the indicators manure on body and bursitis) and ease of movement. Farm C in particular scored significantly worse on the criterion ease of movement, because the average minimum space available per animal fell from 2.50 m²/100 kg at the start of the fattening period to only 0.62 m²/100 kg at the end. Therefore, a high variation became apparent because of the changing number of pigs per pen.
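The space allowance per 100 kg live weight follows from simple arithmetic, sketched below in Python. The pen area, the number of pigs and the live weights are hypothetical values chosen only so that the result falls in the range reported above; they are not measurements from Farm C.

```python
def space_per_100kg(pen_area_m2, n_pigs, mean_live_weight_kg):
    """Return the floor space in m² per 100 kg live weight for one pen."""
    if n_pigs <= 0 or mean_live_weight_kg <= 0:
        raise ValueError("Pig number and live weight must be positive")
    total_live_weight_kg = n_pigs * mean_live_weight_kg
    return pen_area_m2 / total_live_weight_kg * 100.0


# Hypothetical pen: 9 m² shared by 12 pigs (0.75 m² per pig).
PEN_AREA_M2 = 9.0
N_PIGS = 12

start = space_per_100kg(PEN_AREA_M2, N_PIGS, 30.0)   # assumed weight at start of fattening
end = space_per_100kg(PEN_AREA_M2, N_PIGS, 121.0)    # assumed weight at end of fattening
print(round(start, 2), round(end, 2))
```

With a fixed pen area, the allowance per 100 kg shrinks as the pigs grow, and it shifts again whenever the number of pigs per pen changes, which is consistent with the variation noted for this criterion.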
The best value in the Welfare Quality® assessment was achieved by the farms on the principle of "Good health". The criterion thermal comfort and the criterion absence of pain induced by management procedures did not show any variation at all. The assessment of absence of pain induced by management procedures gave a uniform value because castration and tail docking were undertaken on all three farms. The use of local anaesthetic during these procedures was, however, considered a positive factor. Within the principle "Appropriate behaviour", only the criterion expression of social behaviour had a positive result. The criterion expression of other behaviours was negative because of the low investigative behaviour shown by the pigs in their pens. The evaluation of "Appropriate behaviour" also had the highest coefficient of variation. Almost all measurements had a high variation. The results of the QBA are shown in Table 5. On average, positive behaviours were found significantly more commonly than negative behaviours on all three farms (Farm A: P-value = 0.009; Farm B: P-value = 0.003; Farm C: P-value = 0.039) (N = 10). There was a strikingly low number of pigs showing the behaviour patterns positively occupied, playful and enjoying. With respect to the negative behaviour patterns, the raised values for tense and agitated indicated that there was a degree of unrest among the pigs. The variation of the QBA was significantly higher than that of the other criteria on Farm A (P-value = 0.0454) and Farm C (P-value = 0.0029), but not on Farm B (P-value = 0.0735).

Feasibility and practicability
The observation takes about 250 min (QBA = 25.0 min, social behaviour and respiratory behaviour = 75.0 min, assessments in pen = 150.0 min). In addition, variations in time were caused by the necessary conversations with the farmer for the analysis and the discussion of the results. Some assessments have to be carried out at certain times. It was very important that the assessment was not carried out during feeding time, during the moving of pigs between pens or when any treatments were being given, because these can influence the results. The Welfare Quality® guidelines also point this out.
Sometimes, the natural curiosity of the pigs complicated the observations and made the evaluation difficult. Poor visibility due to the lighting conditions on one farm (Farm A), in addition to the large number of pigs in the pens, also made observation problematic. Another problem became apparent: sick or injured animals are often taken out of their pens and placed in a hospital pen, so that they can no longer be matched to their original pens and therefore can no longer be used for the assessment. Otherwise, the application of the assessment protocols was easy to perform.
Table 4. The variation in the criteria, principles and the overall assessment of the six analyses on each farm (coefficient of variation and 95% confidence interval) and the differences between the farms (comparison of means). Different letters a, b, c following the means indicate significant differences between farms.

DISCUSSION
Because of the increasing demands regarding on-farm welfare assessment, the Welfare Quality® protocols were developed. The study was a first step to evaluate the reliability and feasibility of the assessment protocols for fattening pigs in intensive production systems. A general criticism identified with the Welfare Quality® protocols was an inadequate water supply. Furthermore, the indicator body condition score was found to be critical in the evaluation of the principle "Good feeding". Signs of malnutrition or dehydration were not visible on the analysed farms. Certain husbandry mistakes also became apparent through the presence of bursitis and manure on body. These deficits in the welfare criterion ease of movement were due to the concrete flooring and stocking density used in the intensive production conditions on the farms (Mouttotou et al., 1999). This criterion is therefore a sensitive and important indicator of animal welfare in intensive production systems (Waiblinger et al., 2001). Scott et al. (2009) found in their study that the clinical welfare problems in sow and piglet production were rather low.
However, the main criticisms of these authors relate to the presence of stereotypical behaviour patterns. The present study confirms their results with the general deficits in the principle "Appropriate behaviour". The consideration of the animals' behaviour has gained importance. It is imperative that in intensive production systems, farmers pay greater attention to the species-specific behaviour of their animals to ensure animal-friendly husbandry.
A reliable assessment is often not easy to achieve in the Welfare Quality® evaluation (Temple et al., 2011a; Knierim and Winckler, 2009). In the present investigation, the coefficient of variation cannot reliably be assessed during a short period of observation (Knierim and Winckler, 2009). Temple et al. (2011a) reported a low rate (<2%) in the occurrence of the indicators panting, pumping, shivering, huddling, wounds on the body, tail biting, lameness, hernia and scouring in growing pigs kept under intensive conditions. In the present study, these measures were likewise observed only minimally. Furthermore, the indicators skin condition, twisted snouts and rectal prolapse were not observed at all. In addition, the indicators castration and tail docking and the criteria absence of pain induced by management procedures and thermal comfort showed no variation in the observations. This lack of variation could partly be due to the narrow scales used in the Welfare Quality® scoring system.
The majority of the measurements were scored according to a three-point scale (0 to 2) or just grouped according to the presence or absence of an indicator. The use of a narrow scale means that the reliability of the Welfare Quality® assessment is increased and that different observers will reach the same results (Knierim and Winckler, 2009). However, with intensive husbandry, the narrow scale might lead to a low degree of sensitivity of the assessment for some indicators. Further studies are necessary to determine whether the merging of scores is a problem in intensive systems for fattening pigs. The present sample sizes allow no representative statement in this respect, but the results seem to be in line with Temple et al. (2010a).
The data collected from slaughterhouses could also be useful for the assessment of animal welfare. However, studies have shown that their reliability and validity are unsatisfactory at present (Bahlmann, 2009). The predictive value (the validity and reliability) of the data collected from slaughtering processes for animal welfare is not without controversy, and better (especially harmonised) assessment procedures at slaughterhouses are necessary. The data could be especially useful for documenting long-term changes in animal health.
Simple arrangements to improve reliability are intensive training for observers and refining the definitions or the data recording design (Knierim and Winckler, 2009). In the case of the Welfare Quality® protocols, both the training by the consortium and the mentoring given after training were judged as good by the authors.

Feasibility and practicability
The time needed can certainly be rated as efficient for a correct evaluation of the on-farm animal welfare situation. In line with the animal-based approach of Welfare Quality®, the animal-based measures take much more time (2/3) than the management- and resource-based ones (1/3). With increasing herd size, the total time needed also increases. However, the time per pig is reduced, as the time needed for the herd analysis, the collection of general information and the discussion with the farmer is similar to that needed for smaller units. The feeding times and other managerial practices need to be taken into consideration and reduce the flexibility regarding when the investigations can take place, but otherwise the system is flexible regarding time. The only part of the assessment that requires the farmer's input is the interview at the beginning. Otherwise, participation from the farmers is not required.
However, a final meeting should be held to explain the results of the assessment and to discuss any recommendations for future practice. The animal-based measurements in particular (especially the behaviour assessments) require some explanation; enough time should be taken to ensure an adequate transfer of knowledge. Knierim and Winckler (2009) even emphasise the high degree of interest of their farmers in the animal-based parameters.
Also, it must be clarified what happens with sick and injured pigs which have been taken out of their original pens and placed in a hospital pen. Such animals can often no longer be assigned exactly to their original pen, making their inclusion in the assessment difficult. Apart from this weakness, from the authors' point of view, the Welfare Quality® system fulfils the requirements of a feasible assessment with a short duration that is easy to perform under intensive production conditions. However, for a valid implementation of this method, training of the observers by the Welfare Quality® consortium is of paramount importance.

CONCLUSIONS
The present study must be considered a preliminary study which analyses the reliability and feasibility of the assessment protocols for pigs in intensive production systems. It is a first step towards an overview of the reliability and feasibility of the assessment protocols in these production systems and does not allow a general statement about improvement measures. The results provide, however, initial indications on which further studies should build. In conclusion, the Welfare Quality® protocols seem to be a step in the right direction in the context of the ongoing discussion regarding a reliable on-farm assessment system for these production systems.

Table 1. Description of the three pig fattening farms.

Table 2. Structure of the three steps of the Welfare Quality® assessment system (Step 1: indicators; Step 2: welfare criteria; Step 3: welfare principles).

Table 3. Measurements and evaluation scores for animal welfare in the Welfare Quality® assessment protocols for pigs.