Segmented regression
From Wikipedia, the free encyclopedia
Segmented regression is a method in regression analysis in which the independent variable is partitioned into intervals and a separate line segment is fitted to each interval. Segmented regression can also be performed on multivariate data by partitioning the various independent variables. It is useful when the independent variable, divided into different groups, shows a different relationship with the dependent variable in each of these regions. The boundaries between the segments are called breakpoints.
Segmented linear regression is segmented regression whereby the relations in the intervals are obtained by linear regression.
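As an illustration of the definition above, the following minimal sketch fits two independent least-squares lines, one on each side of a candidate breakpoint, and keeps the split with the smallest total sum of squared errors. The function names (`fit_line`, `fit_two_segments`), the `min_points` parameter and the grid search over candidate breakpoints are assumptions made for this example; they are not taken from the cited sources, and the sketch assumes distinct x values and a single breakpoint.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b; returns (a, b, sse)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = my - a * mx
    sse = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse


def fit_two_segments(xs, ys, min_points=3):
    """Try every admissible split and keep the one with the lowest total SSE.

    Hypothetical helper: assumes distinct x values and one breakpoint.
    """
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(min_points, len(pairs) - min_points + 1):
        left, right = pairs[:i], pairs[i:]
        # Candidate breakpoint: midway between the two neighbouring x values.
        bp = (left[-1][0] + right[0][0]) / 2
        a1, b1, sse1 = fit_line(*zip(*left))
        a2, b2, sse2 = fit_line(*zip(*right))
        total = sse1 + sse2
        if best is None or total < best["sse"]:
            best = {"bp": bp, "left": (a1, b1), "right": (a2, b2), "sse": total}
    return best


if __name__ == "__main__":
    xs = list(range(10))
    ys = [2, 2, 2, 2, 2, 15, 18, 21, 24, 27]  # made-up data with a jump after x = 4
    print(fit_two_segments(xs, ys))
```

The two segments here are fitted independently, so they need not meet at the breakpoint. A fit that forces the segments to join would require a constrained formulation.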
Segmented linear regression
Segmented regression can be useful to detect an abrupt change in the response function when an influential factor increases or decreases. The breakpoint can be interpreted as a critical or safe value beyond or below which (un)desired effects occur. The breakpoint can be important in decision making (ref. Oosterbaan 1994 [1]).
The data may show many trends (ref. Oosterbaan et al. 1990 [2], Oosterbaan 2002 [3]); see the figures at the right.
In the determination of the most suitable trend, statistical tests must be performed to ensure that the trends are reliable. When no significant breakpoint can be detected, one must fall back on a regression without breakpoint.
The following statistical tests are used to determine the type of trend:
- significance of the breakpoint (BP), by expressing BP as a function of the regression coefficients A1 and A2, the means Y1 and Y2 of y, and the means X1 and X2 of x (to the left and right of BP), using the laws of accumulation of errors in additions and multiplications to compute the standard error (SE) of BP, and applying Student's t-test
- significance of A1 and A2, applying Student's t-distribution and the SE of A1 and A2
- significance of the difference between A1 and A2, applying Student's t-distribution using the SE of the difference
- significance of the difference between Y1 and Y2, applying Student's t-distribution using the SE of the difference
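Two of the quantities in the list above can be sketched as follows. The sketch assumes that the breakpoint is the intersection of the two fitted lines, each written through its own mean point as y = Y + A(x - X), and that the SE of the slope difference is the root of the summed squared slope SEs. Both are common formulations chosen for illustration, not necessarily the exact procedure of the cited sources, and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean


def slope_and_se(xs, ys):
    """Least-squares slope and its standard error for one segment."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    b = my - a * mx
    sse = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    se = sqrt(sse / (n - 2) / sxx)  # residual variance divided by Sxx
    return a, se


def breakpoint_from_segments(a1, x1, y1, a2, x2, y2):
    """Intersection of y = y1 + a1*(x - x1) and y = y2 + a2*(x - x2).

    Assumption: BP is taken as the intersection; requires a1 != a2.
    """
    return (y2 - y1 + a1 * x1 - a2 * x2) / (a1 - a2)


def slope_difference_t(left, right):
    """t statistic and degrees of freedom for A1 - A2 (unpooled variances)."""
    a1, se1 = slope_and_se(*left)
    a2, se2 = slope_and_se(*right)
    t = (a1 - a2) / sqrt(se1 ** 2 + se2 ** 2)
    dof = len(left[0]) + len(right[0]) - 4  # two parameters per segment
    return t, dof


if __name__ == "__main__":
    # Made-up, slightly noisy data for the two segments.
    left = ([0, 1, 2, 3, 4], [2.1, 1.9, 2.0, 2.2, 1.8])
    right = ([5, 6, 7, 8, 9], [15.2, 17.9, 21.1, 23.8, 27.0])
    print(slope_difference_t(left, right))
```

The resulting t value is compared with the critical value of Student's t-distribution for the given degrees of freedom (for example 2.447 for 6 degrees of freedom at the two-sided 5% level).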
In addition, use is made of the correlation coefficient, the coefficient of determination (explanation), confidence intervals of the regression functions and analysis of variance (ANOVA) (ref. Oosterbaan 2002 [4]).
References
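A minimal sketch of how the coefficient of determination and a variance-ratio (F) comparison between a regression without breakpoint and a segmented regression might be computed is given below. The parameter counts (2 for a single line, 4 for two independent lines) and the function names are assumptions made for this example. Because the breakpoint itself is usually estimated from the data, the F value should be read as approximate.

```python
def r_squared(ys, predicted):
    """Coefficient of determination: 1 - SSE / SST."""
    my = sum(ys) / len(ys)
    sst = sum((y - my) ** 2 for y in ys)
    sse = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    return 1 - sse / sst


def f_statistic(sse_simple, sse_segmented, n, p_simple=2, p_segmented=4):
    """F ratio for the reduction in residual sum of squares.

    sse_simple:    SSE of the regression without breakpoint
    sse_segmented: SSE of the segmented regression
    n:             number of observations
    p_*:           assumed number of fitted parameters in each model
    """
    explained = (sse_simple - sse_segmented) / (p_segmented - p_simple)
    residual = sse_segmented / (n - p_segmented)
    return explained / residual


if __name__ == "__main__":
    # Made-up sums of squares for 10 observations.
    print(f_statistic(10.0, 2.0, 10))
```

A large F value relative to the critical value of the F-distribution with (p_segmented - p_simple, n - p_segmented) degrees of freedom suggests that the segmented model explains significantly more variance than the regression without breakpoint.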
- [1] R.J. Oosterbaan, 1994, Frequency and Regression Analysis. In: H.P. Ritzema (ed.), Drainage Principles and Applications, p. 175-224, ILRI publ. 16, Wageningen, The Netherlands. ISBN 90 70754 33 9. Free download from ILRI-Alterra.
- [2] R.J. Oosterbaan, D.P. Sharma, K.N. Singh and K.V.G.K. Rao, 1990, Crop production and soil salinity: evaluation of field data from India by segmented linear regression. In: Proceedings of the Symposium on Land Drainage for Salinity Control in Arid and Semi-Arid Regions, February 25th to March 2nd, 1990, Cairo, Egypt, Vol. 3, Session V, p. 373-383. Free download from www.waterlog.info.
- [3] R.J. Oosterbaan, 2002, Data analysis in drainage research. Free download from www.waterlog.info.
- [4] R.J. Oosterbaan, 2002, Statistical significance of segmented linear regression with break-point using variance analysis and F-tests. On website www.waterlog.info.
External links
- SegReg, free download of software for segmented linear regression

