Background: The Bruininks-Oseretsky test (BOT2) assesses global and fine motor proficiency in healthy children. We evaluated concurrent validity and reliability of the short form (BOT2-SF) and the upper-limb items of the complete form (BOT2-UL) in children with Cerebral Palsy (CP).
Methods: Fifteen CP children with a Manual Ability Classification System (MACS) level ≤4 and a Gross Motor Function Classification System (GMFCS) level ≤3 were evaluated with the BOT2-UL and fifteen with the BOT2-SF.
Results: Excellent inter- (ICC 0.99-UL, 0.95-SF) and intra- (ICC 0.99-UL, 0.98-SF) rater reliability; excellent inverse correlation between the BOT2-UL and the MACS level (ρ=-0.81-UL, -0.64-SF, p< 0.05); no statistically significant correlation between the BOT2-SF and the GMFCS level.
Conclusion: The BOT2-UL and the BOT2-SF are reliable tests to evaluate upper-limb function in CP children with MACS levels 1-4 and GMFCS levels 1-3. Concurrent validity is excellent. Further studies are required to validate the BOT2-SF in this population.
Keywords: Cerebral palsy; BOT2; Bruininks-Oseretsky test of motor proficiency; Validity; Reliability; Evaluation
Abbreviations: BOT2: Bruininks-Oseretsky Test; MACS: Manual Ability Classification System; GMFCS: Gross Motor Function Classification System; CP: Cerebral palsy; ICF: International Classification of Functioning; ICC: Intra-Class Correlation Coefficient; MDC: Minimal Detectable Change; SEM: Standard Error of Measurement
Cerebral palsy (CP) is the leading cause of motor disability in children in developed countries, affecting 2 to 3.5 per 1000 live births worldwide. Since clinical presentation varies widely, with three CP subtypes currently accepted (spastic, dyskinetic and ataxic), it is important to perform a comprehensive and reliable evaluation of motor function according to the framework of the International Classification of Functioning, Disability and Health (ICF), to enable better clinical decision making and follow-up [1-3]. In the activities domain of the ICF, two validated classification systems for gross motor function are the Gross Motor Function Classification System (GMFCS) and the Manual Ability Classification System (MACS). Both provide a quick global picture of the child’s activity limitations, the former focusing on lower-limb and the latter on upper-limb abilities. Despite their fast administration time and their usefulness in classifying gross motor function in CP, they do not detail the areas in which the child is most impaired. This limits their use in follow-up and in the adjustment of therapeutic strategies. Another reference tool to evaluate gross motor function in CP is the Gross Motor Function Measure-66 (GMFM-66) [6,7].
It is a standardised, validated observational instrument designed to measure change in gross motor function in CP children. However, its administration time is up to 60 minutes and it does not assess fine motor proficiency. It is also less well suited to higher-functioning CP children, due to its ceiling effect. Various standardised tools exist to assess both global and fine motor proficiency in healthy children, mainly differing in the target age range. A well-known tool is the Bruininks-Oseretsky test, second edition (BOT2), which assesses global and fine motor proficiency in healthy children aged 4-21 [9,10]. Originally published in 1978, it was revised in 2005. It includes a complete form (BOT2-CF) and a short form (BOT2-SF). It evaluates the activities domain of the ICF and is regularly used in the evaluation and follow-up of CP children but, to our knowledge, has not been validated in this population.
The clinimetric properties of a test should be studied before it is used in clinical routine, according to the COSMIN taxonomy guidelines. Validity corresponds to a test’s ability to measure what it claims to measure. Concurrent validity is studied by comparing the results of a given test to those of another, already validated test that measures the same parameter. Reliability determines whether the test provides the same results on repeated measures in the same subject when applied by the same evaluator (intra-rater reliability) or by two different evaluators (inter-rater reliability). Measurement error and internal consistency must also be studied to evaluate reliability, according to the COSMIN taxonomy. The validity and reliability of the BOT2 have been examined in healthy children [9,14,15] but never in CP. We thus set out to assess the concurrent validity and reliability of the BOT2 in CP children.
Materials and Methods
Two versions of the BOT2 were assessed: the BOT2-CF and the BOT2-SF. The BOT2-CF is divided into 4 motor area composites, each including 2 sub-tests (8 overall), which in turn group various items (46 overall). The 4 motor area composites are fine manual control (d440 Fine hand use), manual coordination (d445 Hand and arm use), body coordination (d415 Maintaining a body position) and strength & agility (d446 Fine foot use). Each composite is scored separately and a global score out of 320 is obtained. Higher scores indicate better motor proficiency. The BOT2-CF is a thorough test; however, its administration time is up to one hour. This can be a limitation for children with attention deficits and in terms of human resources. We thus decided to extract the items that evaluate upper-limb function. We called this section of the test BOT2-CF, upper-limb evaluation (BOT2-UL). The five sub-tests that evaluate upper-limb function are fine motor precision, fine motor integration, manual dexterity, upper-limb coordination and bilateral coordination. The 29 items of the BOT2-UL are summarised in Table 1. A score out of 172 was attributed to each child.
The BOT2-SF is a summary of the CF. It is divided into the same 4 motor area composites, each including the same 2 sub-tests, but only 14 of the 46 original items were selected, thus shortening its administration time to 30 minutes. Items of each sub-test are summarised in Table 2. An overall score out of 88 is calculated.
Fifteen CP children, aged 4-21, were evaluated with the BOT2-UL and fifteen with the BOT2-SF. Children were included if they had a diagnosis of CP, a MACS level ≤4 and a GMFCS level ≤3. They were excluded if they presented other concurrent progressive neurological disorders or severe cognitive impairment preventing them from understanding instructions. Participants were recruited from two specialised schools in Belgium (“Centre Belge d’Education Thérapeutique pour Infirmes Moteurs Cérébraux” (CBIMC) and “Institut Royal d’Accueil pour le Handicap Moteur”). The study was approved by the UCL’s Hospital-Faculty Biomedical Ethics Commission and parents signed an informed consent form. Participants’ characteristics are reported in Table 3. A prospective cohort study was performed.
Children were evaluated twice by two different evaluators (A and B) on day 1, and again by evaluator A at most one week later (day 2, 4.8±1.4 days). Regular activities were carried out as usual between the evaluations. Given that the BOT2 evaluates the dominant side, the dominant upper limb was tested in children with diplegia and quadriplegia. This was determined by presenting a pencil or a tennis ball to the child and recording which hand the child used to take it. However, in hemiplegic children, the affected side was tested, given that the aim of our study was to assess the BOT2-UL and BOT2-SF as tools to evaluate motor deficit. Children were examined in a quiet room, with only the evaluator present. The duration of each test was approximately 30 minutes.
Calculations were performed with the SPSS software (SPSS v220.127.116.11 for Windows; IBM SPSS; Armonk, NY, USA). For each test, the significance level was set at 0.05.
Concurrent Validity: Concurrent validity allows us to confirm that a test measures what it was designed to measure. A nonparametric Spearman correlation was computed between the BOT2-SF results and the MACS and GMFCS levels, and between the BOT2-UL results and the MACS level. We did not evaluate the correlation between the BOT2-UL results and the GMFCS level because the latter refers mainly to lower-limb activity. Correlation was considered good, moderate or poor if the absolute value of the correlation coefficient (|ρ|) was >0.6, between 0.3 and 0.6, or <0.3, respectively.
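The correlation step described above can be sketched in a few lines of Python. This is only an illustration, not the SPSS procedure used in the study: the function names are hypothetical, and the handling of values falling exactly on the 0.3 and 0.6 cut-offs is an assumption, since the text does not specify it.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


def rate_correlation(rho):
    """Apply the good / moderate / poor thresholds to |rho|.

    Assumption: a value exactly on a cut-off falls into the lower category.
    """
    strength = abs(rho)
    if strength > 0.6:
        return "good"
    if strength > 0.3:
        return "moderate"
    return "poor"


# Hypothetical data, not taken from the study: MACS level and test score per child.
macs_levels = [1, 1, 2, 2, 3, 3, 4, 4]
test_scores = [150, 140, 120, 125, 90, 95, 60, 40]
rho = spearman_rho(macs_levels, test_scores)
print(round(rho, 2), rate_correlation(rho))
```

Ranking with averaged ties matters here because MACS and GMFCS levels take only a few distinct values, so many children share the same level.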
Reliability: Two aspects of reliability were studied, according to the COSMIN taxonomy: internal consistency and inter- and intra-rater reliability. Internal consistency corresponds to the degree to which items measure the same construct. Cronbach’s α coefficient was calculated with the results for each sub-test score from the first evaluation, both for the BOT2-UL and for the BOT2-SF. Cronbach’s α coefficient was considered acceptable, good and excellent above 0.7, 0.8 and 0.9, respectively. Inter- and intra-rater reliability were quantified with the Intra-class Correlation Coefficient (ICC) and the Minimal Detectable Change (MDC). We calculated intra-rater reliability by comparing the results obtained by the same evaluator (A) on two different days, at most one week apart, and inter-rater reliability by comparing the results obtained on the first day by the two evaluators (A and B).
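As an illustration of the internal consistency step, Cronbach’s α can be computed from the sub-test scores as sketched below. The function names and the example data are hypothetical, the use of sample variances (n − 1 denominator) is an assumption, and the label returned for values at or below 0.7 is not defined in the text.

```python
from statistics import variance


def cronbach_alpha(subtest_scores):
    """Cronbach's alpha for k sub-tests.

    subtest_scores: list of k lists, each holding one sub-test's score
    for every child (same child order in every list).
    """
    k = len(subtest_scores)
    # Total score per child = sum of that child's sub-test scores.
    totals = [sum(child) for child in zip(*subtest_scores)]
    item_variance_sum = sum(variance(column) for column in subtest_scores)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def rate_alpha(alpha):
    """Apply the acceptable / good / excellent thresholds (0.7, 0.8, 0.9)."""
    if alpha > 0.9:
        return "excellent"
    if alpha > 0.8:
        return "good"
    if alpha > 0.7:
        return "acceptable"
    return "below acceptable"  # label assumed, not given in the text


# Hypothetical scores for three sub-tests in five children.
example = [
    [10, 14, 20, 25, 31],
    [8, 12, 19, 22, 30],
    [11, 13, 22, 24, 29],
]
alpha = cronbach_alpha(example)
print(round(alpha, 3), rate_alpha(alpha))
```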
The ICC relates the variability between subjects to the measurement error (i.e. the within-subject variability across repeated measures). For inter- and intra-rater reliability, ICCs were calculated with a two-way mixed-effects model of the “absolute agreement” and “consistency” types, respectively 5. Reliability was rated as excellent, moderate or poor, with ICC scores >0.75, 0.40–0.75 and <0.40, respectively. The MDC corresponds to the minimal change in score that exceeds the measurement error. The MDC within a 95% confidence interval (MDC95) was calculated as follows:
$$\mathrm{MDC}_{95} = 1.96 \times \sqrt{2} \times \mathrm{SEM}$$
where 1.96 is the z-score corresponding to a two-sided 95% confidence level, and √2 accounts for the variance between 2 measurements. The standard error of measurement (SEM) is related to measurement error throughout repeated measures and was calculated as follows:
$$\mathrm{SEM} = SD_x \times \sqrt{1 - \mathrm{ICC}}$$
where SDx is the standard deviation for all observations from test sessions.
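The reliability indices can be sketched in Python as follows, assuming the usual forms SEM = SDx·√(1 − ICC) and MDC95 = 1.96·√2·SEM. This is a sketch under stated assumptions rather than the SPSS computation used in the study: the single-measure form of the ICC and the sample standard deviation are assumptions, since the text does not state them, and all function names and data are hypothetical.

```python
from math import sqrt
from statistics import stdev


def icc_two_way(ratings, kind="consistency"):
    """Single-measure ICC from a two-way ANOVA decomposition.

    ratings: one row per child, each row holding the k scores to compare
    (e.g. [A1, B1] for inter-rater, [A1, A2] for intra-rater reliability).
    kind: "consistency" or "agreement" (absolute agreement).
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(col) / n for col in zip(*ratings)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ms_rows = ss_rows / (n - 1)                      # between-subject
    ms_cols = ss_cols / (k - 1)                      # between-session
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    if kind == "consistency":
        return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)
    if kind == "agreement":
        return (ms_rows - ms_err) / (
            ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
        )
    raise ValueError("kind must be 'consistency' or 'agreement'")


def sem(all_scores, icc):
    """Standard error of measurement: SDx * sqrt(1 - ICC)."""
    # max() guards against a tiny negative value caused by rounding.
    return stdev(all_scores) * sqrt(max(0.0, 1 - icc))


def mdc95(sem_value):
    """Minimal detectable change at the 95% level: 1.96 * sqrt(2) * SEM."""
    return 1.96 * sqrt(2) * sem_value


# Hypothetical scores: rater A on day 1 and day 2 for six children.
sessions = [[60, 63], [82, 80], [101, 105], [118, 115], [135, 139], [150, 149]]
icc = icc_two_way(sessions, kind="consistency")
pooled = [score for row in sessions for score in row]
print(round(icc, 3), round(mdc95(sem(pooled, icc)), 1))
```

The two ICC types differ only in the denominator: absolute agreement also penalises a systematic shift between sessions or raters, whereas consistency ignores it.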
All subjects were able to perform the three evaluations. All the results are presented in Table 4, illustrated in Figures 1 & 2, and summarised below. We obtained a homogeneous distribution of the participants throughout the scores, as shown in Figure 1.
A1= results from evaluation 1, performed by rater A on day 1. B1= results from evaluation 2, performed by rater B on day 1. A2= results from evaluation 3, performed by rater A on day 2 (4.8±1.4 days later). Median [Q1-Q3]. ρ: correlation coefficient. ICC= intra-class correlation coefficient. MDC95= minimal detectable change. *p-value<0.05.
To assess concurrent validity, we compared the results of the BOT2-UL to the MACS level, and those of the BOT2-SF to the MACS and GMFCS level. Results are presented in Table 4 and Figure 2. An excellent inverse correlation was found between the BOT2-UL results and the MACS level (ρ: -0.81, p-value: 0.001) and a good inverse correlation was found between the BOT2-SF results and the MACS level (ρ: -0.64, p-value: 0.007), meaning that children with a higher MACS level, and therefore more severe manual impairment, obtained lower results both on the BOT2-UL and on the BOT2-SF. No significant correlation was found between the BOT2-SF results and the GMFCS score (ρ: -0.35, p-value: 0.19).
Internal consistency was excellent for the BOT2-UL and good for the BOT2-SF (Cronbach’s α coefficient of 0.94 and 0.89, respectively), thus indicating sufficient homogeneity of both tests. For both tests, intra- and inter-rater reliability were excellent (ICC ≥ 0.95). In other words, the results obtained by the same evaluator at two different times, or by two different evaluators, are comparable. For the BOT2-UL, the MDC95 values for intra- and inter-rater reliability were 8.7 and 8.4, respectively. This indicates that when a patient is assessed before and after a treatment, either by the same evaluator or by two different evaluators, results must differ by around 9 points for the change not to be attributed to measurement error. For the BOT2-SF, the MDC95 values for intra- and inter-rater reliability were 9.5 and 5.8, respectively.
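To make the clinical reading of these values concrete, the sketch below checks whether a change in score exceeds the MDC95 and expresses each reported MDC95 as a percentage of the maximum score (172 for the BOT2-UL, 88 for the BOT2-SF). The function names and the before/after scores are hypothetical; only the MDC95 values and maximum scores come from the text.

```python
def exceeds_mdc(score_before, score_after, mdc95):
    """True if the observed change is larger than the measurement error."""
    return abs(score_after - score_before) > mdc95


def mdc_as_percent(mdc95, max_score):
    """MDC95 expressed as a percentage of the maximum possible score."""
    return 100 * mdc95 / max_score


# MDC95 values and maximum scores as reported in the text.
reported = {
    ("BOT2-UL", "intra-rater"): (8.7, 172),
    ("BOT2-UL", "inter-rater"): (8.4, 172),
    ("BOT2-SF", "intra-rater"): (9.5, 88),
    ("BOT2-SF", "inter-rater"): (5.8, 88),
}

for (test, setting), (mdc, max_score) in reported.items():
    print(test, setting, round(mdc_as_percent(mdc, max_score), 1), "%")

# Hypothetical child re-assessed by the same evaluator on the BOT2-UL.
print(exceeds_mdc(score_before=100, score_after=110, mdc95=8.7))  # change of 10 points
print(exceeds_mdc(score_before=100, score_after=105, mdc95=8.7))  # change of 5 points
```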
The present study is the first to evaluate concurrent validity and reliability of the BOT2-UL and BOT2-SF in CP children. Our results suggest that the BOT2-UL and the BOT2-SF can be used as reliable, valid tools to assess gross motor function in CP children presenting a GMFCS level 1-3 and a MACS level 1-4, as we obtained a good inverse correlation with the MACS level and excellent inter- and intra-rater reliability for both tests. The tests were also feasible, as all children were able to perform them.
We found a good inverse correlation between the BOT2-UL results and the MACS level. For the BOT2-SF, we also found a good inverse correlation with the MACS level and a moderate inverse correlation with the GMFCS level, although the latter was not statistically significant. This is illustrated in Figure 2C and could be explained by our small sample and the fact that it did not include children with GMFCS level ≥4. Moreover, the GMFCS classifies children according to their functional ability based on self-initiated movement, focusing on sitting, transfers and the use of handheld or wheeled mobility devices [4]. Given the large heterogeneity in motor impairment in the CP population, a major limitation in self-initiated movement is not necessarily associated with a major upper-limb impairment [1,17]. This may result in inhomogeneous scores on the BOT2-SF, where only 21% of the items evaluate lower-limb motor function exclusively.
Our results on concurrent validity, both for the BOT2-UL and the BOT2-SF, correspond to Bruininks’ original findings in healthy children. Few other studies validating the BOT2 exist, and they were carried out mainly in healthy children. For instance, Hassan et al. evaluated validity and reliability by comparing the sub-tests to the global score and validated the BOT2-SF in healthy children in the United Arab Emirates. Fransen et al. investigated convergent and discriminant validity of the BOT2-SF in the Flemish population, comparing it to the Körperkoordinationstest für Kinder (KTK). They calculated a Pearson correlation, found a coefficient of 0.61 and validated the test in this population. However, no similar study has been performed in CP children. To sum up, our results show that the BOT2-SF correlates significantly with the MACS level, suggesting that it is a valid tool to evaluate upper-limb activities in CP children with MACS level 1-4. However, further studies are needed to confirm our findings regarding the correlation between the BOT2-SF and the GMFCS level.
Reliability was assessed by calculating the ICC, the MDC and internal consistency. Reliability is defined as the extent to which measurements can be replicated, and the MDC corresponds to the change in score that exceeds measurement error, indicating whether an observed change in score is statistically significant. The internal consistency of the total score was excellent for the BOT2-UL and good for the BOT2-SF. Our results are comparable to those obtained originally by Bruininks in healthy children (Cronbach’s α=0.95, ICC>0.92), as well as those obtained in children with intellectual disabilities (Cronbach’s α=0.92, ICC=0.99) [9,19]. Both the BOT2-UL and the BOT2-SF presented excellent ICC values and low MDC values (about 5% to 11% of the maximum score), both for intra- and inter-rater reliability. Low MDC values indicate greater responsiveness [20]. This could be a useful parameter for clinicians to objectify the progression of a patient while taking the measurement error into account.
The MDC was found to be slightly lower for inter-rater than for intra-rater reliability, which is quite unusual, as we would expect to find greater variability between different evaluators. This may be due to the moment of the week at which the tests were performed: inter-rater reliability was tested on the same day, usually at the beginning of the week, whereas intra-rater reliability was calculated from results obtained at the beginning and the end of the week. Various factors may slightly influence the results, such as participation in physical activities or in rehabilitation sessions [26,27]. Our results, both for ICC and MDC values, were in accordance with those obtained by Bruininks et al. for healthy children and with those obtained by Lucas et al., who studied the BOT2 in children with foetal alcohol spectrum disorder (FASD) and also obtained lower MDC values for inter-rater than for intra-rater reliability [9,27]. Our absolute MDC values are also comparable to those obtained by Wuang et al., who studied the BOT2 in children with intellectual disabilities. To sum up, our results showed low MDC values both for the BOT2-UL and the BOT2-SF, thus suggesting good responsiveness of both tests, making them appropriate for clinical follow-up.
We did not observe a ceiling or floor effect in our sample; however, none of the children obtained scores higher than 75% of the maximum on either the BOT2-UL or the BOT2-SF. We obtained a homogeneous distribution of the participants throughout the scores, as shown in Figure 2. In the study by Wuang et al., ceiling and floor effects concerned less than 15% of the participants, which was considered acceptable.
Limits and Perspectives of the Study
One of the main limitations of our study is the small sample (n=15 for each test), especially regarding the distribution across the different GMFCS levels (1, 2 and 3). However, we obtained very reproducible results, suggesting that our findings are robust. We compared the BOT2 with MACS and GMFCS levels because these two classifications provide a global picture of the motor abilities of the CP child. It is an important starting point, but these findings need to be completed by comparing the BOT2 with the GMFM-66 and with other tests in the different ICF domains. The lower-limb items of the BOT2-CF should be evaluated in a similar study to complete our findings.
Our results suggest that the BOT2-UL and the BOT2-SF are valid, reproducible tools to evaluate upper-limb fine and gross motor function in CP children with a GMFCS level 1-3 and a MACS level 1-4. Both can be implemented in clinical practice and in research for the evaluation and the follow-up of CP children. Further studies are needed to fully validate the BOT2-SF and to evaluate concurrent validity and reliability of the BOT2-CF.
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. Clémence Chéreau, Mathilde Debeker, Lucile Woirin and Marion Zeller contributed to the data collection. This work would not have been possible without their help and the participation of the staff of the CBIMC and IRAHM and of the children with CP attending these schools. We would also like to thank their parents and carers.
- Cans C (2000) Surveillance of cerebral palsy in Europe: a collaboration of cerebral palsy surveys and registers. Developmental Medicine & Child Neurology 42(12): 816-824.
- Santos CA, Franco de Moura RC, Lazzari RD, Dumont AJ, Braun LA, et al. (2015) Upper limb function evaluation scales for individuals with cerebral palsy: a systematic review. Journal of physical therapy science. 27(5): 1617-1620.
- (2007) International Classification of Functioning, Disability, and Health: Children & Youth Version: ICF-CY. World Health Organization.
- Palisano R, Rosenbaum P, Bartlett D, Livingston M (2007) Gross Motor Function Classification System: Expanded and Revised.
- Eliasson AC, Krumlinde Sundholm L, Rösblad B, Beckung E, Arner M, et al. (2006) The Manual Ability Classification System (MACS) for children with cerebral palsy: scale development and evidence of validity and reliability. Developmental medicine and child neurology 48(7): 549-554.
- Lundkvist Josenby A, Jarnlo GB, Gummesson C, Nordmark E (2009) Longitudinal construct validity of the GMFM-88 total score and goal total score and the GMFM-66 score in a 5-year follow-up study. Physical therapy 89(4): 342-350.
- Russell DJ, Rosenbaum PL, Wright M, Avery LM (2002) Gross motor function measure (GMFM-66 & GMFM-88) user’s manual. Vol: 159.
- Avery LM, Russell DJ, Rosenbaum PL (2013) Criterion validity of the GMFM-66 item set and the GMFM66 basal and ceiling approaches for estimating GMFM-66 scores. Developmental Medicine & Child Neurology 55(6): 534-538.
- Bruininks RH (2005) Bruininks-Oseretsky test of motor proficiency. AGS Publishing, Circle Pines, MN.
- Deitz JC, Kartin D, Kopp K (2007) Review of the Bruininks-Oseretsky test of motor proficiency, (BOT-2). Physical & occupational therapy in pediatrics 27(4): 87-102.
- Mokkink LB, Terwee CB, Patrick DL, Alonso J, Stratford PW, et al. (2010) The COSMIN study reached international consensus on taxonomy, terminology, and definitions of measurement properties for health-related patient-reported outcomes. Journal of clinical epidemiology 63(7): 737-745.
- Sivan M, O’Connor RJ, Makower S, Levesley M, Bhakta B (2011) Systematic review of outcome measures used in the evaluation of robot-assisted upper limb exercise in stroke. Journal of rehabilitation medicine 43(3): 181-189.
- Stratford PW, Binkley JM, Riddle DL (1996) Health status measures: strategies and analytic methods for assessing change scores. Physical therapy 76(10): 1109-1123.
- Hassan MM (2001) Validity and reliability for the Bruininks-Oseretsky Test of Motor Proficiency-short form as applied in the United Arab Emirates culture. Perceptual and motor skills 92(1): 157-166.
- Venetsanou F, Kambas A, Aggeloussis N, Serbezis V, Taxildaris K (2007) Use of the Bruininks–Oseretsky Test of Motor Proficiency for identifying children with motor impairment. Developmental Medicine & Child Neurology 49(11): 846-848.
- Gordon AM, Schneider JA, Chinnan A, Charles JR (2007) Efficacy of a hand-arm bimanual intensive therapy (HABIT) in children with hemiplegic cerebral palsy: a randomized control trial. Developmental medicine and child neurology 49(11): 830-838.
- Rosenbaum P, Paneth N, Leviton A, Goldstein M, Bax M, et al. (2007) A report: the definition and classification of cerebral palsy April 2006. Developmental medicine and child neurology Supplement 109: 8-14.
- Andresen EM (2000) Criteria for assessing the tools of disability outcomes research. Archives of physical medicine and rehabilitation 81(12 Suppl 2): S15-20.
- Wuang YP, Lin YH, Su CY (2009) Rasch analysis of the Bruininks-Oseretsky Test of Motor Proficiency Second Edition in intellectual disabilities. Research in developmental disabilities 30(6): 1132-1144.
- Wagner JM, Rhodes JA, Patten C (2008) Reproducibility and minimal detectable change of three-dimensional kinematic analysis of reaching tasks in people with hemiparesis after stroke. Physical therapy 88(5): 652-663.
- de Vet HC, Terwee CB, Knol DL, Bouter LM (2006) When to use agreement versus reliability measures. Journal of clinical epidemiology 59(10): 1033-1039.
- Hallgren KA (2012) Computing inter-rater reliability for observational data: an overview and tutorial. Tutorials in quantitative methods for psychology 8(1): 23-34.
- Gilliaux M, Lejeune TM, Detrembleur C, Sapin J, Dehez B, et al. (2014) Using the robotic device REAplan as a valid, reliable, and sensitive tool to quantify upper limb impairments in stroke patients. Journal of rehabilitation medicine 46(2): 117-125.
- Fransen J, D’Hondt E, Bourgois J, Vaeyens R, Philippaerts RM, et al. (2014) Motor competence assessment in children: convergent and discriminant validity between the BOT-2 Short Form and KTK testing batteries. Research in developmental disabilities 35(6): 1375-1383.
- Koo TK, Li MY (2016) A Guideline of Selecting and Reporting Intraclass Correlation Coefficients for Reliability Research. Journal of chiropractic medicine 15(2): 155-163.
- Bartlett DJ, Chiarello LA, McCoy SW, Palisano RJ, Jeffries L, et al. (2014) Determinants of gross motor function of young children with cerebral palsy: a prospective cohort study. Developmental medicine and child neurology 56(3): 275-282.
- Lucas BR, Latimer J, Doney R, Ferreira ML, Adams R, et al. (2013) The Bruininks-Oseretsky Test of Motor Proficiency-Short Form is reliable in children living in remote Australian Aboriginal communities. BMC pediatrics 13: 135.