This thesis develops Cost-Sensitive Ranking, supervised learning that directly minimizes the costs and risks of real industrial resource deployment policies, supported by two case studies with the power utility BC Hydro: Cost-Sensitive Learning to Rank and Risk Clearance. Our first solution is motivated by the need to guide resource deployment toward the top-k highest predicted-impact events in a set of possible threats/items. For example, storms in British Columbia can leave hundreds of thousands of customers without power, and global yearly storm outage losses exceed billions of dollars. To prevent the most costly power outages before a storm, BC Hydro could apply data analytics models to recommend the top-k highest-damage areas, so that it can act in those areas given its limited response resources. Though Regression could return the top-k areas with the highest predicted outage damage, Regression's emphasis on numeric prediction accuracy, even on ``low damage events'', can compromise the ranking and thus the resulting preventative actions. Similarly, existing ranking solutions in Learning to Rank focus on web page search, a domain with vastly different priorities from deploying preventative resources, as shown in our extensive studies. We thus define Cost-Sensitive Learning to Rank, the problem of directly maximizing the cost-saving a company obtains from a ranking model; our suite of Cost-Sensitive Adaptations adapts most existing ranking models to this new problem, achieving up to 20% more cost-saving than competitors in identifying forest fires, crimes, quality-assurance failures, and outages. While outage prevention was limited by available resources, our second problem is constrained by meeting regulatory standards while minimizing expensive actions.
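The gap between numeric accuracy and top-k cost-saving can be sketched as follows; this is a minimal illustration with made-up damage values, not the thesis's models or data. A model with lower mean squared error can still produce a worse top-k ranking, and therefore lower cost-saving:

```python
def cost_saving(true_costs, scores, k):
    """Cost-saving from acting on the k items with the highest predicted
    score, assuming acting on an area prevents its full damage cost."""
    top_k = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sum(true_costs[i] for i in top_k)

def mse(true_costs, scores):
    """Mean squared error of numeric predictions."""
    return sum((t - s) ** 2 for t, s in zip(true_costs, scores)) / len(scores)

# Hypothetical true outage damage per area (in $1000s).
true_costs = [500, 300, 10, 5, 2]
# Model A: numerically close overall, but swaps the top two areas.
model_a = [290, 310, 10, 5, 2]
# Model B: numerically far off, but preserves the correct order.
model_b = [100, 50, 5, 3, 1]

assert mse(true_costs, model_a) < mse(true_costs, model_b)  # A "wins" on MSE
print(cost_saving(true_costs, model_a, 1))  # 300: A deploys to the wrong area
print(cost_saving(true_costs, model_b, 1))  # 500: B saves more despite worse MSE
```

With one response team (k = 1), the lower-MSE model directs it to the second-worst area, losing $200,000 of potential cost-saving relative to the correctly ordered model.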
For example, BC Hydro's asset management for toxic electrical transformers must balance the cost of identification, breaking equipment worth $100,000s to test for contaminated oil, against the risk of leaving a hazard unidentified, the health and environmental damage of an unlikely toxic transformer leak. Because of the extreme difficulty of estimating health and environmental risks as a dollar value, company decision-making is instead guided by the ``clearance threshold'', a maximum acceptable probability of toxicity for transformers with unknown outcomes. Such ``under threshold'' transformers are within safety guidelines and are cleared from expensive tests. Other domains, such as regulations on the maximum tolerable risk of device failures, rely on similar clearance thresholds to guide recalls or other actions. We therefore develop data analytics via clearance thresholds: Categorical Risk Clearance, which clears many future low-risk items to conserve expensive actions given a binary training set of hazards and non-hazards, and Numeric Risk Clearance, clearance with a numeric data set where an item is a hazard only if a numeric value exceeds a maximum, such as 50 parts per million of toxins. The Categorical version considers only the all-or-nothing hazard/non-hazard label, while the Numeric version can exploit the more informative numeric toxicity (or ``badness'') values. Because current Classification and Regression solutions assume inapplicable costs or fail to adapt to the specific goals of varying thresholds, we develop clearance-guided solutions, CUT+ for Categorical Risk Clearance and RiskClear for Numeric Risk Clearance; in experiments, our methods are more cost-effective in 56 of 57 tests across varying thresholds.
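The clearance-threshold decision rule itself can be sketched in a few lines; the probabilities below are illustrative, and this is only the final thresholding step, not the thesis's CUT+ or RiskClear methods, which concern how such probabilities are learned:

```python
def clear_items(probs, threshold):
    """Split items by the clearance rule: items whose predicted probability
    of toxicity is at or below the maximum acceptable threshold are cleared
    (no expensive test needed); the rest are flagged for testing."""
    cleared = [i for i, p in enumerate(probs) if p <= threshold]
    flagged = [i for i, p in enumerate(probs) if p > threshold]
    return cleared, flagged

# Hypothetical predicted toxicity probabilities for four transformers.
probs = [0.02, 0.40, 0.01, 0.09]
cleared, flagged = clear_items(probs, threshold=0.10)
print(cleared)  # [0, 2, 3]: within safety guidelines, tests conserved
print(flagged)  # [1]: must be broken open and tested
```

Raising the threshold clears more items and saves more testing cost, at the price of a higher chance that a hazard goes unidentified; this trade-off is why solutions must adapt as the threshold varies.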
Copyright is held by the author.
This thesis may be printed or downloaded for non-commercial research and scholarly purposes.
Thesis advisor: Wang, Ke