Software

CART Analysis: Classification and Regression Trees

It’s a machine learning tool that makes analyzing multiple variables user-friendly.

By Eric Hayler
Image Source: Laurence Dutton / E+ / Getty Images

January 21, 2025

I have been teaching Lean and Six Sigma tools to Black Belt and Green Belt candidates for about 20 years. For many students, the most difficult parts of the training are the more intensive statistical methods such as ANOVA (analysis of variance) and regression. A typical Black Belt class is 16 to 20 days long and covers a variety of Six Sigma, Lean, and project management tools. The statistical tools are taught and practiced with example data sets and data generated from in-class exercises. Students often need help from a coach the first time they apply the ANOVA general linear model or multiple regression analysis in practice. Collecting and formatting the data, verifying that the data assumptions are met, successfully running the analysis, and interpreting the results can be tricky.

Data comes in two varieties: continuous (also called variable) data and attribute data. Continuous data can be divided into smaller and smaller pieces down to the resolution of the measurement system. Examples include length, weight, and time. Attribute data falls into discrete buckets. Examples are geographic region, material, and supplier. In graphing and analyzing data there are input (X) variables and output (Y) variables, each of which may be continuous or attribute. This gives rise to four different combinations, which in turn give rise to the many types of graphs and analytical methods that are available. Tools that relate one input to one output are simpler and easier to think about than tools that relate multiple inputs to multiple outputs. Additional complexity is added when sets of inputs or outputs contain both continuous and attribute variables. The more complex the analysis, the more potential caveats exist when interpreting the results. A skilled practitioner is needed to understand these caveats and create useful models that help people make better decisions. This is part of the art of problem-solving.

In the years before easy access to computers and user-friendly statistical software packages, running a complex analysis required many hours of study and practice, not to mention the time needed to perform the necessary calculations. Hardware and software advances have made these analyses accessible to a much larger audience. AI (artificial intelligence) promises another quantum leap in accessibility to complex analyses. The latest wave of tools involves several machine learning methods, including classification and regression trees, or CART analysis. CART regression analysis relates one continuous output variable to multiple input variables. CART classification relates one categorical output variable to multiple input variables. In both cases the input variables may be continuous or categorical (attribute).
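The distinction between the two kinds of tree can be made concrete by looking at how each one scores a node. The sketch below is illustrative only: it uses two commonly used measures (sum of squared errors for a continuous output, Gini impurity for a categorical output), and the article does not state which criteria the software applies. The function names and the sample values are made up for illustration.

```python
from collections import Counter


def sse(values):
    """Sum of squared deviations from the node mean (used for a continuous output)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def gini(labels):
    """Gini impurity, 1 - sum(p_k^2), of a node (used for a categorical output)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


# Made-up values, for illustration only
pass_rates = [90.0, 92.0, 88.0]               # continuous output -> regression tree
outcomes = ["pass", "pass", "fail", "fail"]   # categorical output -> classification tree

node_sse = sse(pass_rates)
node_gini = gini(outcomes)
```

In both cases a lower score means a more homogeneous node, and a tree-building routine looks for splits that lower the score of the resulting child nodes.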

Consider a painting process where the goal is to reduce defects of all types and maximize the pass rate. Important input variables, as determined by the problem-solving team, include air pressure, ambient temperature, paint viscosity, production shift, part type, and paint supplier. Pressure, temperature, and viscosity are continuous predictors (inputs). Shift, part, and supplier are attribute predictors. A few rows of the data are displayed in figure 1. The variable input dialog box is shown in figure 2. The application used for the analysis is Minitab version 21.4. Purists will note that in this example pass rate is treated as a continuous variable, although percentages resulting from count data are technically attribute.

Figure 1. A few rows of the painting process data.

Figure 2. The variable input dialog box.


The optimal tree diagram from the CART regression analysis is shown in figure 3. Node 1 at the top of the tree diagram shows the average and standard deviation for all 145 pass rates. The analysis determines which input variables most influence pass rate, as well as the split points that produce the largest difference between the resulting groups. The first split was made by shift. In node 2, the 24 pass rates from shifts 1 and 2 average 79.9%, versus an average of 90.3% for the 121 pass rates from shift 3 shown in node 4. Note that the overall average pass rate is 88.6%, as seen in node 1.
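As a quick consistency check on the first split, the overall average in node 1 should equal the count-weighted average of its two child nodes. The short sketch below uses only the counts and averages quoted above; the helper name `pooled_mean` is an illustrative choice.

```python
def pooled_mean(groups):
    """Count-weighted mean of child nodes, given (count, mean) pairs."""
    total = sum(n for n, _ in groups)
    return sum(n * m for n, m in groups) / total


node2 = (24, 79.9)    # shifts 1 and 2
node4 = (121, 90.3)   # shift 3

overall = pooled_mean([node2, node4])
print(round(overall, 1))  # 88.6, matching node 1
```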

Node 2 is next split by viscosity. The largest differential occurs at a viscosity of 17.2 sec. The 12 pass rates with viscosity below 17.2 sec average 83.9%. This group is not split further in the analysis and is labeled terminal node 1. Node 3 contains the 12 pass rates with viscosity above 17.2 sec and is split further by shift. The average for shift 2 is 72.6% (terminal node 2) and the average for shift 1 is 78.4% (terminal node 3).
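One common way to find a split point on a continuous predictor such as viscosity is to try every midpoint between adjacent observed values and keep the one that minimizes the combined squared error of the two resulting groups. The sketch below shows that idea under stated assumptions: the article does not describe the exact search the software performs, and the six data points are made up for illustration, not taken from the study.

```python
def sse(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, score) minimizing the summed SSE of the two child groups.

    Candidate thresholds are midpoints between adjacent distinct x values.
    Returns None if x has fewer than two distinct values.
    """
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        low_x, high_x = pairs[i - 1][0], pairs[i][0]
        if low_x == high_x:
            continue  # no boundary between identical x values
        threshold = (low_x + high_x) / 2
        left = [yy for xx, yy in pairs if xx <= threshold]
        right = [yy for xx, yy in pairs if xx > threshold]
        score = sse(left) + sse(right)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical data, for illustration only
viscosity = [16.8, 17.0, 17.1, 17.3, 17.5, 17.6]
pass_rate = [84.0, 85.0, 83.0, 76.0, 75.0, 77.0]

threshold, score = best_split(viscosity, pass_rate)
```

With these made-up numbers the search lands on a threshold of about 17.2, because that boundary separates the high pass rates from the low ones most cleanly.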

Returning to node 4, which holds the pass rates for shift 3, the data is next split by viscosity using a split point of 17.0 sec. The higher pass rates in node 5 are further split by viscosity using a split point of 16.5 sec. Comparing all the terminal nodes, the largest average pass rate is 93.3% in terminal node 4 and the smallest is 72.6% in terminal node 2, a difference of 20.7 percentage points. The best pass rates are found in the data from shift 3 with low viscosity. The worst pass rates are found in the data from shift 2 with high viscosity. The overall Relative Variable Importance is shown in figure 4.
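Reading a tree diagram amounts to following a chain of if/else decisions until a terminal node is reached, then using that node's average as the prediction. The sketch below encodes only the part of the tree whose averages are quoted in the text. Two things are assumptions: a viscosity of exactly 17.2 sec is sent to the lower branch, and for shift 3 the function stops at node 4 and returns its average, because the averages of all the shift 3 terminal nodes are not given in the text. The function name is illustrative.

```python
def predict_pass_rate(shift, viscosity):
    """Predict pass rate (%) by walking the portion of the tree described in the text."""
    if shift in (1, 2):                       # node 2
        if viscosity <= 17.2:                 # terminal node 1 (boundary handling assumed)
            return 83.9
        # node 3: viscosity above 17.2 sec, split again by shift
        return 72.6 if shift == 2 else 78.4   # terminal nodes 2 and 3
    if shift == 3:
        # node 4 average; the finer viscosity splits at 17.0 and 16.5 sec are omitted
        return 90.3
    raise ValueError(f"unknown shift: {shift}")


# Spread between the best and worst terminal nodes quoted in the text
spread = round(93.3 - 72.6, 1)  # 20.7 percentage points
```

For example, `predict_pass_rate(2, 18.0)` follows shift 2, then high viscosity, and returns the terminal node 2 average of 72.6.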

Figure 3. The optimal tree diagram from the CART regression analysis.

Figure 4. Relative Variable Importance.


A more conventional approach would be to perform a multiple regression analysis using both continuous and categorical predictors and go through a series of refinements. Alternatively, an ANOVA general linear model approach could be taken using a combination of categorical factors and continuous covariates. While it is possible for these methods to produce several different competitive models, this author used both approaches and arrived at the same model. The resulting Analysis of Variance table and regression equations are shown in figure 5. The low p-values in the ANOVA table show viscosity and shift as important drivers of pass rate, the same conclusion as from the CART analysis. The three regression equations confirm that shift 3 has the highest relative pass rates (constant of 190.8) and shift 2 the lowest (constant of 179.0). The negative viscosity coefficients reflect that lower viscosity results in higher pass rates. Again, this is the same conclusion as the CART analysis.
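The structure of those equations can be sketched as follows, assuming each shift has its own constant and all three equations share a single viscosity coefficient. That shared-slope form is an assumption, since figure 5 is not reproduced here. The symbols are illustrative: $\hat{y}_s$ is the predicted pass rate for shift $s$, $v$ is viscosity, $c_s$ is the shift constant, and $\beta$ is the viscosity coefficient. The shift 1 constant and the value of $\beta$ are not stated in the text, so they are left symbolic.

```latex
\begin{align*}
\hat{y}_s &= c_s + \beta\, v, \qquad c_3 = 190.8,\quad c_2 = 179.0,\quad \beta < 0 \\
\hat{y}_3 - \hat{y}_2 &= c_3 - c_2 = 190.8 - 179.0 = 11.8
\end{align*}
```

Under that assumption, at any given viscosity the model predicts a shift 3 pass rate about 11.8 percentage points higher than shift 2.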

Figure 5. Analysis of Variance table and regression equations.


For the CART analysis, once the data is collected and formatted, the user only needs to know whether each variable is an input or an output and whether it is continuous or attribute data. To understand the results, the user needs to know how to read the optimal tree diagram, which is not greatly different from other types of tree diagrams. In the background the regression equations are calculated so the user can easily predict the pass rate for any set of conditions. To run the multiple regression analysis and the ANOVA general linear model, the user needs to be able to run multiple refinements, understand several goodness-of-fit criteria, run the tests for data assumptions, understand the assumption caveats, and finally interpret the results.

To summarize, CART (Classification and Regression Trees) is a machine learning tool that can handle both continuous and attribute data to identify important variables and assess their impact. It is more accessible to users than traditional approaches.

Eric Hayler is the Principal of the Hayler Group and a Lean Six Sigma Master Black Belt. He has led Continuous Improvement efforts at BMW Manufacturing and Amazon. Eric was the 2017 ASQ (American Society for Quality) Chair of the Board of Directors. He is a graduate of Rutgers University, holds a PhD in Solid State Inorganic Chemistry, and is currently Adjunct Professor of Business Analytics at the University of South Carolina Upstate.
