运筹学常用优化方法（运筹学应用之顶刊论文综述）

威哥 2023-05-19 07:36:48 773

运筹学常用优化方法（运筹学应用之顶刊论文综述）1. 动态契约中的最优监督设计Adverse events are harmful to a firm or to the society. In many occasions better effort in safeguarding a system can reduce the chance of such events. Consider the scenario in which a company (i.e. “principal”) hires a subcontractor (i.e. “agent”) to fulfill the duty of a safeguard. The agent’s effort is unobservable and the principal may from time to time conduct on-site moni

优化｜运筹学应用之顶刊Operations Research论文综述(68(5)期)

运筹学常用优化方法（运筹学应用之顶刊论文综述）(1)

『运筹OR帷幄』原创

作者：陈宇文（牛津大学工程系在读博士）翁欣（清华大学在读博士）

编者按

运筹学遍布世界的各个角落，从数学到计算机，从控制论到自动化，从机器学习到深度学习，从数字化到社会治理，我们都可以看到运筹学方法的影子。本篇推送我们从运筹优化的角度出发，为大家总结了19篇优秀的运筹学与其他学科交叉的论文的摘要供大家阅读。所有摘要均有中英文翻译，并附有原文链接。

OR 68(5) 期

Optimal Monitoring Schedule in Dynamic Contracts

Adverse events are harmful to a firm or to the society. In many occasions better effort in safeguarding a system can reduce the chance of such events. Consider the scenario in which a company (i.e. “principal”) hires a subcontractor (i.e. “agent”) to fulfill the duty of a safeguard. The agent’s effort is unobservable and the principal may from time to time conduct on-site monitoring. However monitoring is often costly to the principal. In a dynamic environment in which the principal can use both monetary payments and monitoring to induce effort how to optimally schedule them is a challenging and important problem. In “Optimal Monitoring Schedule in Dynamic Contracts ” Chen Sun and Xiao provide theoretical guidance on designing the optimal monitoring and payment schedules that always induce full effort from an agent. The authors formulate the contract design problem as a stochastic optimal control model and provide a complete characterization of the optimal solution. Their analysis suggests that the optimal dynamic contracts are simple to describe easy to compute and implement and intuitive to explain.

1. 动态契约中的最优监督设计

不良事件对企业或社会都是有害的。在许多情况下，我们可以采取一些系统保障措施可以减少此类事件的发生。考虑一个公司（“委托人”）雇佣一个分包商（“代理人”）来履行保障责任的情形：代理人的努力是难以观测的，委托人可能会不时地进行现场监督。然而，监督对委托人来说通常是成本昂贵的。在一个动态环境中，委托人可以使用金钱支付和监督这两种方式来促进代理人努力，如何最优地组合这两种方式是一个具有挑战性的重要问题。本文对如何设计使代理方始终尽最大努力的最优监督和支付计划提供了理论指导，将契约设计问题表示为随机最优控制模型，并给出最优解的完整描述。分析表明，最优动态契约是易于描述、容易计算实现、并且能够直观解释的。

作者：Mingliu Chen Peng Sun Yongbo Xiao

原文：

https://pubsonline.informs.org/doi/abs/10.1287/opre.2019.1968

Improving Ambulance Response Times in Developing Urban Centers

The lack of emergency medical transportation is viewed as the main barrier to the access and availability of emergency medical care in low- and middle-income countries (LMICs). Designing emergency response systems for urban centers in LMICs presents several unique challenges not encountered in high-income countries. For example it is not the cultural norm and often not physically possible because of congestion for motorists to yield for emergency vehicles. In “Ambulance Emergency Response Optimization in Developing Countries ” Boutilier and Chan develop a prediction-optimization framework tailored for emergency response optimization in developing urban centers and provide an in-depth investigation into various policy questions. To do this the authors traveled to Dhaka Bangladesh one of the largest cities in the world to conduct field research and collect data. The data were initially used to develop machine learning models that estimate the spatial-temporal demand and travel times through the city—a first in a developing urban center. The predictions were then leveraged to create uncertainty sets as part of a robust-optimization model that provides a unified framework for emergency response optimization under travel time and demand uncertainty. Their policy experiments found that the performance of the current system can realized with one third of the current resources and that a fleet of small motorcycle-style ambulances can capture three times more demand and reduce response times by up to 35%.

2. 发展中国家救护车应急响应优化

缺乏紧急医疗交通被视为中低收入国家提供紧急医疗服务的主要障碍。在中低收入国家的城市中心设计应急系统面临着在高收入国家没有的一些独特挑战。例如，文化规范没有要求驾驶员在交通拥挤时为紧急车辆让路（实际上在交通拥堵时也很难做到这一点）。本文针对发展中国家城市中心的应急响应优化开发了一个预测-优化框架，并对各种政策问题进行了深入调查。为了实现这一目标，作者们前往世界上最大的城市之一——孟加拉国首都达卡，进行实地研究并收集数据。这些数据最初被用于开发第一个针对发展中国家城市中心的、能估计城市的时空需求和出行时间的机器学习模型。然后，这些预测被用于创建作为鲁棒优化模型一部分的不确定性集：该模型为在出行时间和需求不确定性下的应急响应优化提供了统一框架。实验结果发现，当前系统的性能可以用当前资源的三分之一来实现，而一支小型摩托车样式的救护车车队可以捕获三倍以上的需求，并将响应时间最高减少35%。

作者：Justin J. Boutilier Timothy C. Y. Chan

原文：

https://pubsonline.informs.org/doi/abs/10.1287/opre.2019.1969

Dynamic Inventory and Price Controls Involving Unknown Demand on Discrete Nonperishable Items

When a new product has just been introduced or the economy has just entered a new phase a firm is often at a loss as to what the underlying demand pattern has become let alone how best to respond to it. A natural idea is to manage ordering and pricing activities while simultaneously learning about the demand pattern. In “Dynamic Inventory and Price Controls Involving Unknown Demand on Discrete Nonperishable Items ” Katehakis Yang and Zhou formalize this idea. They design control policies that adapt with demand observations made over time. A policy’s regret is the lag of its performance behind that of an allknowing one tailor-made for a specific demand pattern. For the proposed policies the worst regrets over large ambiguity sets of unknown demand patterns are found to grow reasonably slowly over time.

3 包含非腐败物品未知需求的动态库存与价格控制

在刚推出新产品或经济进入新阶段时，企业往往对潜在的需求模式会感到茫然无措，更不用说如何最好地应对它了。一个自然的想法是在管理订购和定价活动的同时，了解需求模式。本文就是基于这一想法构建的：作者根据随着时间推移进行的需求观察来设计控制策略。通常来说，一项策略的损失函数是用来描述策略表现的滞后性（相比于针对特定需求模式量身定制的策略的滞后性）。根据作者提出的策略，在未知需求模式具有大量不确定性的条件下的最坏损失随着时间的推移保持在合理的缓慢增长速度。

作者：Michael N. Katehakis Jian Yang Tingting Zhou

原文：

https://pubsonline.informs.org/doi/abs/10.1287/opre.2019.1974

Waterfall and Agile Product Development Approaches: Disjunctive Stochastic Programming Formulations

When engaging in the development of new products the primary objective of start-up companies is to generate a specified return level quickly and with high confidence. Achieving this goal is complicated because of uncertainties in projects’ returns and durations. In “Technical Note—Waterfall and Agile Product Development Approaches: Disjunctive Stochastic Programming Formulations ” Kettunen and Lejeune develop new disjunctive chance-constrained programming models that capture this goal. The first static model reflects the traditional waterfall product development process whereas the second one is dynamic and depicts the agile product development process. Kettunen and Lejeune design a novel reformulation method and a decomposition algorithm and use them on a new product development problem encountered by a U.S.-based software start-up company. The results reveal that high confidence in reaching a certain return can be achieved by investing in projects with a longer development time and higher risk. Additionally overlooking the capability to make dynamic decisions as allowed by the agile approach leads to overestimating the time needed to obtain the targeted return.

4. 瀑布式和敏捷式产品开发方法:分离随机规划建模

在从事新产品开发时，初创公司的首要目标是快速确定一个具有大概率可能性的特定回报水平。由于项目收益和时长的不确定性，实现这一目标很复杂。本文开发了新的分离机会约束规划模型来实现这个目标。第一个静态模型反映了传统的瀑布式产品开发过程，而第二个模型则是动态的，并且描述了敏捷式产品开发过程。作者设计了一种新的重塑方法和分解算法，并将它们用于一家美国软件初创公司遇到的新产品开发问题。获得的结果是反直觉的。结果表明，通过对开发时间较长、风险较高的项目进行投资，可以实现获取一定回报的高可靠性。此外，忽视敏捷方法所允许的动态决策能力，会导致高估获得目标回报所需的时间。

作者：Janne Kettunen Miguel A. Lejeune

原文：

https://pubsonline.informs.org/doi/abs/10.1287/opre.2019.1977

Pricing and Prioritization in a Duopoly with Self-Selecting Heterogeneous Time-Sensitive Customers Under Low Utilization

Research in service operations management generally considers prioritization beneficial for service providers (SPs) as they can better differentiate their customers. The healthcare entertainment restaurant and airline industries use prioritization in some form for their service offerings. However what happens when SPs compete? How will they provide service? In “Technical Note—Pricing and Prioritization in a Duopoly with Self-Selecting Heterogeneous Time-Sensitive Customers Under Low Utilization ” Sainathan answers these questions by analyzing a three-stage game between two SPs. In the first stage they decide whether to prioritize; in the second stage they set their prices; and in the third stage customers self-select and purchase their best service options. The author identifies six possible equilibria: high price low price regular extreme reduced class and differentiated. He also derives conditions for each of them to be the overall equilibrium. In particular the author shows there are many situations in which at least one SP chooses not to prioritize.

5. 在低利用率下的具有自选择、异构、时间敏感的客户的双头垄断市场定价和优先排序

服务运营管理的研究通常认为设置优先级对服务提供商有益，因为他们可以更好地区分客户。医疗、娱乐、餐饮和航空业都以某种形式为其服务设置优先级。然而，当服务提供商之间竞争时会发生什么呢？他们将如何提供服务？本文通过分析两个服务提供商之间的三阶段博弈来回答这些问题。在第一阶段，他们决定是否设置优先级；在第二阶段，他们确定价格；在第三阶段，顾客可以自行选择并购买自己最喜欢的服务。作者确定了六种可能的均衡：高价格均衡、低价格均衡、普通均衡、极端均衡、简化均衡和差异化均衡，还推导出了每一个成为总体均衡的条件。同时，作者特别指出在许多情况下，至少有一个服务提供商选择不设置优先级。

作者：Arvind Sainathan

原文：

https://pubsonline.informs.org/doi/abs/10.1287/opre.2020.1983

A Stochastic Integer Programming Approach to Air Traffic Scheduling and Operations

To mitigate flight delays air traffic managers have access to two main levers: ground-holding programs (operational interventions at the tactical level) and demand management (scheduling interventions at the strategic level). In “A Stochastic Integer Programming Approach to Air Traffic Scheduling and Operations ” Wang and Jacquillat optimize these two sets of decisions in a network of airports under weather-related uncertainty. The problem is formulated as a two-stage stochastic program with integer recourse. To solve it the authors propose a decomposition algorithm based on new optimality cuts (leveraging the dual relaxation of the second-stage problem) and accelerate it by using neighborhood constraints (switching from exploration to exploitation in later iterations). Results suggest that limited scheduling interventions can lead to large delay reductions and that significant benefits can be achieved through scale integration (by optimizing scheduling and operating decisions in the entire network) and scope integration (by optimizing scheduling and operating decisions together).

6. 一种用于航空交通调度与运营的随机整数规划方法

为了减少航班延误，航空交通管理人员可以使用两种主要手段：地面等待（战术层面的运营干预）和需求管理（战略层面的调度干预）。本文中，作者在与天气相关的不确定性的条件下优化了机场网络中的这两组决策。该问题被建模为具有整数搜索的两阶段随机规划。为了求解上述模型，作者提出了一种基于新的最优分割（利用第二阶段问题的对偶松弛）的分解算法，并通过使用邻域约束（在后续迭代中从探索转换为利用）来进行算法加速。结果表明，通过规模集成（通过优化整个网络的调度和运营决策）和范围集成（通过同时优化调度和运营决策），有限的调度干预可以减少大量延迟。

作者：Kai Wang Alexandre Jacquillat

原文：

https://pubsonline.informs.org/doi/abs/10.1287/opre.2020.1985

Adaptive Matching for Expert Systems with Uncertain Task Types

Two-sided online markets often propose matches based on imperfect knowledge of the characteristics of the two parties to be matched. Such uncertainty may result in inferior matches and may in turn incur negative externalities: A matched resource becomes unavailable to a more suitable party at least for a while. For example in labour platforms if an expert is matched to a task which does not meet his or her expertise then the other tasks which meet the expert’s expertise may suffer. Similarly in hospitality platforms an economical accommodation becomes unavailable to a financially constrained customer if it is matched to a flexible customer. Thus to optimize the long-term efficiency a platform needs to optimize exploration-exploitation trade tradeoff while accounting for the resource usage. In “Adaptive Matching for Expert Systems with Uncertain Task Types ” Shah Gulikers Massoulié and Vojnovi´c address this challenge by developing a new model and optimal algorithm for a task-expert matching system where a task is matched to an expert using not only the prior information about the task but also the feedback obtained from the past matches.

7. 具有不确定任务类型的专家系统自适应匹配

双边在线市场通常在不完全了解待匹配双方特征的基础上提出匹配。这种不确定性可能会导致较差的匹配结果，并进而导致负外部性：至少在一段时间内，更合适的一方无法获得所需资源。例如，在劳务平台中，如果一名专家负责一项不符合其专长的任务，则符合该专家专长的其他任务可能会受到影响。同样地，在酒店服务平台中，如果与没有财务限制的客户相匹配，经济实惠的住宿对于财务拮据的客户来说就变得不可能了。因此，为了优化长期效率，平台需要在优化探索-利用平衡的同时考虑资源利用情况。本文中，作者提出一种用于任务-专家匹配系统的新模型和优化算法，当该系统将一个任务分配给一名专家时，不仅会使用关于该任务的先验信息，也考虑来源于历史匹配的反馈。

作者：Virag Shah Lennart Gulikers Laurent Massoulié Milan Vojnović

原文：

https://pubsonline.informs.org/doi/abs/10.1287/opre.2019.1954

Production Scheduling for Strategic Open Pit Mine Planning: A Mixed-Integer Programming Approach

Production scheduling is a large-scale optimization problem that must be solved on a yearly basis by every open pit mining project throughout the world. Surprisingly however this problem has only recently started to receive much attention from the operations research community. In “Production Scheduling for Strategic Open Pit Mine Planning: A Mixed-Integer Programming Approach ” Rivera Espinoza Goycoolea Moreno and Muñoz propose an integer programming methodology for tackling this problem that combines new classes of preprocessing schemes cutting planes heuristics and branching mechanisms. This methodology is shown to compute near-optimal solutions on a number of real-world planning problems whose complexity is beyond the capabilities of preexisting approaches.

8. 有策略性的露天矿生产计划：混合整数编程方法

生产计划是一个大规模的优化问题，全世界每个露天采矿项目的生产计划都必须按年为时间单位做规划。但是，令人惊讶的是，这个问题直到最近才开始受到运筹学界的广泛关注。本文中，作者提出了一种整数规划方法来解决此问题，该方法结合了新的预处理方案、切割平面、启发式方法，以及分支机制。该方法被证明可以对许多现实世界中的规划问题计算出接近最优的解决方案，而这些问题的复杂性超出了现有方法的解决能力。

作者：Orlando Rivera Letelier Daniel Espinoza Marcos Goycoolea Eduardo Moreno Gonzalo Muñoz

原文：

https://pubsonline.informs.org/doi/10.1287/opre.2019.1965

Data-Based Dynamic Pricing and Inventory Control with Censored Demand and Limited Price Changes

Pricing and inventory replenishment are important operations decisions for firms such as retailers. To make these decisions effectively a firm needs to know the demand distribution and its dependency on selling price which is usually estimated using sales data at various testing price levels. Although more testing prices can lead to a better estimation of the demand–price relationship frequent price changes are costly and come with adverse effect such as customers’ negative perception. In “Technical Note—Data-Based Dynamic Pricing and Inventory Control with Censored Demand and Limited Price Changes ” by Chen Chao and Wang data-driven algorithms are developed that learn the demand structure with constraints on the number of price changes. These algorithms are shown to converge to the optimal *clairvoyant* solution and the convergence rates are the best possible in terms of profit loss.

9. 需求和价格变动受限的基于数据的动态定价和库存控制

定价和库存补充是零售商之类公司的重要运营决策。为了有效地做出这些决策，公司需要了解需求分布及其对售价的依赖性，这通常是通过使用各种测试价格水平下的销售数据来估算需求的分布。尽管更多样的测试价格可以更好地估计需求价格关系，但频繁的价格变动会带来很高的成本，并会随之带来例如客户负面评价等的负面影响。~~ ~~在本文中，作者提出一种新的数据驱动算法，该算法可在带价格变化次数的约束条件下学习需求结构。这些算法被证明可以收敛到最优结果，并且收敛速度就利润损失而言是最好的。

作者：Boxiao Chen Xiuli Chao Yining Wang

原文：

https://pubsonline.informs.org/doi/10.1287/opre.2020.1993

Robustness of Linear Contracts Under Moral Hazard

Linear contracts and their variants are quite popular in practice for example salesforce incentives and chief executive officer compensation. However agency theory typically stipulates complex contract forms. In “Robust Contract Designs: Linear Contracts and Moral Hazard ” Yu and Kong provide an alternative explanation for the popularity of linear contracts: The robustness to model uncertainty renders the linear or generalized linear forms of the contracts under moral hazard. They adopt the worst case decision criterion and robust incentive compatibility to ensure that the agent always behaves. The results are robust to general effort-contingent distributions and the risk-averse agent. These findings also shed light on how to design robust contracts when firms are facing model uncertainty or incomplete model information.

10. 道德风险下线性契约的鲁棒性

线性契约及其变体的应用在现实生活中非常常见，例如销售人员激励和首席执行官薪酬。但是，代理理论通常规定了复杂的合同形式。本文中，作者为线性契约的流行提供了另一种解释：模型不确定性的鲁棒性使道德风险下的契约具有线性或广义线性形式。作者采用最坏情况的决策标准和具有鲁棒性的激励兼容机制来确保代理人始终表现良好，结果显示此方法对于一般的基于努力程度的分布和对风险规避的代理人都具有鲁棒性。同时，这些发现还阐明了当公司面临模型不确定性或模型信息不完全时，如何设计契约使之具有鲁棒性。

作者：Yimin Yu Xiangyin Kong

原文：

https://pubsonline.informs.org/doi/10.1287/opre.2020.1994

Interior-Point-Based Online Stochastic Bin Packing

Static bin packing is the problem of partitioning a set of items (with scalar sizes) into identical bins of a given capacity under the constraint that the total size of no partition exceeds the bin capacity. In the online variant the list of items is revealed one at a time—an item must be assigned irrevocably to a partition at the time it is revealed. In “Interior-Point Based Online Stochastic Bin Packing ” Gupta and Radovanovi´c look at the online variant under the assumption that the item sizes are independent samples from a common unknown distribution. The authors show that a greedy heuristic that assigns items so as to minimize a carefully chosen penalized Lagrangian of the offline math program is asymptotically optimal. Since the proposed heuristic is greedy in nature it extends readily to nonstationary item size sequences for which the authors prove a novel family of performance bounds.

11. 基于内点法的在线静态装箱问题

静态装箱问题是在物品总体积超过箱容量的约束条件下，将一组物品划分到具有相同容量的箱子中的问题。在线装箱问题中，每次更新只显示一个新的物品-这件物品必须在显示物品时立即进行分配装箱且之后不可更改。本文中，作者在假设商品大小是来自共同的未知分布的独立样本的前提下研究了在线装箱问题。作者表明，一种贪婪的启发式方法是渐进最优的（以最小化一种精心设计的离线数学规划的惩罚拉格朗日法来进行物品分配）。由于提出的启发式方法本质上是贪婪的，因此该方法能够很容易地拓展到非平稳物品大小的序列的情形；针对这种情形，作者给出了一系列的性能边界结果。

作者：Varun Gupta Ana Radovanović

原文：

https://pubsonline.informs.org/doi/10.1287/opre.2019.1914

Nonstationary Bandits with Habituation and Recovery Dynamics

In many sequential decision-making settings where there is uncertainty about the reward of each action frequent selection of specific actions may reduce expected reward while choosing less frequently selected actions could lead to an increase. These effects are commonly observed in settings ranging from personalized healthcare interventions and targeted online advertising. In “Nonstationary Bandits with Habituation and Recovery Dynamics ” Mintz Aswani Kaminsky Flowers and Fukuoka address this problem. The authors propose a new class of models called ROGUE (reducing or gaining unknown efficacy) multiarmed bandits. They present a maximum likelihood approach to estimate the parameters of these models and we show that these estimates can be used to construct upper confidence bound algorithms and epsilon-greedy algorithms for optimizing these models with strong theoretical guarantees. The authors conclude with a simulation study to show that these algorithms perform better than current nonstationary bandit algorithms in terms of both cumulative regret and average reward.

12. 具有惯性和恢复动态的非平稳老虎机

在许多顺序决策环境中，每个动作的回报都不确定，频繁选择特定动作可能会降低预期回报，而选择不那么频繁的动作反而可能得到奖励增加。这样的效果通常在个性化医疗保健干预措施和有针对性的在线广告等环境中可以观察到。本文中，作者解决了这个问题。作者提出了一类新的模型，称为ROGUE（减少或获得未知功效的）多臂老虎机。同时，他们提出了一种基于最大似然方法来估计这些模型的参数的办法，并且从理论上证明了这些估计值可用于构造上置信界算法和epsilon-greedy算法来优化模型。作者通过仿真研究得出结论，表明在累积遗憾和平均回报方面，这些算法的性能要优于当前的非平稳老虎机算法。

作者：Yonatan Mintz Anil Aswani Philip Kaminsky Elena Flowers Yoshimi Fukuoka

原文：

https://pubsonline.informs.org/doi/10.1287/opre.2019.1918

Fast Best-Subset Selection: Coordinate Descent and Local Combinatorial Optimization Algorithms

In several scientific and industrial applications it is desirable to build compact interpretable learning models where the output depends on a small number of input features. Recent work has shown that such best-subset selection-type problems can be solved with modern mixed integer optimization solvers. Despite their promise such solvers often come at a steep computational price when compared with open-source efficient specialized solvers based on convex optimization and greedy heuristics. In “Fast Best-Subset Selection: Coordinate Descent and Local Combinatorial Optimization Algorithms ” Hazimeh and Mazumder push the frontiers of computation for best subset-type problems. Their algorithms deliver near-optimal solutions for problems with up to a million features—in times comparable with the fast convex solvers. The author’s work suggests that principled optimization methods play a key role in devising tools central to interpretable machine learning which can help in gaining a deeper understanding of their statistical properties.

13. 快速最佳子集选择：坐标下降和局部组合优化算法

在一些科学和工业应用中，需要构建紧凑的，可解释的学习模型，同时我们期望模型的输出取决于少量的输入功能。最近的研究表明，此类最佳子集选择类型问题可以使用现代混合整数优化求解器来解决。尽管有其前景，但与基于凸优化和贪婪启发式的开源高效专业求解器相比，此类求解器通常要付出高昂的计算成本。在论文“快速的最佳子集选择：坐标下降和局部组合优化算法”中，作者将最佳子集类型问题的计算能力进行了提升。他们的算法可以为多达一百万个特征的问题提供近乎最优的解决方案，同时其时间可与快速凸型求解器相媲美。作者的工作表明，有原则的优化方法(principled optimization methods)在设计对可解释机器学习工具方面起着关键作用，可以帮助人们加深对其统计特性的理解。

作者：Hussein Hazimeh Rahul Mazumder

原文：

https://pubsonline.informs.org/doi/10.1287/opre.2019.1919

Optimal Online Learning for Nonlinear Belief Models Using Discrete Priors

Sequential decision-making problems with high measurement noise usually involve nonconcave value of information that is the value of information is not concave in the amount of information collected. In such cases existing single-period lookahead policies may be ineffective. In “Optimal Online Learning for Nonlinear Belief Models Using Discrete Priors ” Han and Powell propose multiperiod lookahead policies that are effective in measuring the value of information and yet computationally tractable. They also establish consistency for the proposed policies a rare result for online learning. Finally they demonstrate the effectiveness of the approach in three applications: a health setting where one makes medical decisions to maximize healthcare response a dynamic pricing setting where one makes pricing decisions to maximize the cumulative revenue and a clinical pharmacology setting where one makes dosage controls to minimize the deviation between actual and target effects.

14. 基于离散先验的非线性信念模型的最优在线学习

具有高测量噪声的顺序决策问题通常包含信息的非凹值，即信息的值在收集的信息量中非凹。在这种情况下，现有的单周期前瞻策略可能无效。Han和Powell在论文“使用离散先验的非线性信念模型的最佳在线学习”中提出了多周期的超前策略，这些策略可有效地测量信息价值，但在计算上却易于处理。他们还为提出的策略建立了一致性，这是在线学习中很罕见的。最后，他们在三种应用中证明了该方法的有效性：一是健康医疗场景下，决策者做出医疗诊断来最大化健康医疗响应；二是动态定价场景下，决策者做出定价决定，以最大化累积收益；三是一个临床药理学场景下，决策者通过剂量控制以最小化实际效果与目标效果之间的偏差。

作者：Weidong Han Warren B. Powell

原文：

https://pubsonline.informs.org/doi/10.1287/opre.2019.1921

Central Limit Theorems for Estimated Functions at Estimated Points

The need to estimate a function value from sample data at a point that is itself estimated from the same data set arises in many application settings. Such applications include value at risk conditional value-at-risk and other so-called distortion risk measures. In “Technical Note—Central Limit Theorems for Estimated Functions at Estimated Points ” Glynn Fan Fu Hu and Peng provide a simple proof for a central limit theorem for such estimators and provide an accompanying batching/sectioning methodology that can be used to construct large-sample confidence intervals in the presence of such estimators.

15. 估计函数在估计点的中心极限定理

在许多应用设置中都有这样的需求：根据一个点的样本数据估计函数值，而该点样本数据是从同一数据集估计得到的。此类应用包括风险价值(VaR)，有条件的风险价值(CVaR)和其他所谓的失真风险衡量。在本文中，作者提供了一个有关此类估计器的中心极限定理的简单证明，并提供了相应的批处理/切片方法，能够在此类估计器下构造大样本置信区间。

作者：Peter W. Glynn Lin Fan Michael C. Fu Jian-Qiang Hu Yijie Peng

原文：

https://pubsonline.informs.org/doi/10.1287/opre.2019.1922

Decision Making When Things Are Only a Matter of Time

Suppose that “risk” does not concern “what” may happen but “when” something may happen. In “Decision Making When Things Are Only a Matter of Time ” Ebert analyzes time risk preferences: that is preferences toward the risk of something happening sooner or later. The author defines and characterizes “prudence” and “temperance” for discount functions in analogy to the corresponding concepts for utility functions. Prudent discounting and temperate discounting matter for time risk preferences and thus decisions in which an important event is only a matter of time. For example the disutility from climate change—which is only a matter of time—is underestimated when ignoring the uncertainty about when its undesired consequences come into effect.

16. 当事物仅与时间有关时的决策

本文假设“风险”与可能发生的“什么”无关而与可能发生的“何时”有关。作者在本文中分析了时间风险偏好：即对事物早晚发生的风险的偏好。作者利用了类似于效用函数(utility functions)中的对应概念来定义和表征折现函数(discount functions)中的“谨慎(prudence)”和“节制(temperance)”。谨慎和节制的折现对时间风险偏好很重要，也因此，它们对仅与时间相关因素的重要事件的决策同样重要。例如，当忽略负面后果何时生效的不确定性时，（只与时间有关的）气候变化的负效用被低估了。

作者：Sebastian Ebert

原文：

https://pubsonline.informs.org/doi/10.1287/opre.2019.1923

Time Inconsistency of Optimal Policies of Distributionally Robust Inventory Models

In “Technical Note—Time Inconsistency of Optimal Policies of Distributionally Robust Inventory Models ” Shapiro and Xin extend previous studies of time inconsistency to risk averse (distributionally robust) inventory models and show that time inconsistency is not unique to robust multistage decision making but may happen for a large class of risk averse/distributionally robust settings. In particular the authors demonstrate that if the respective risk measures are not strictly monotone then there may exist infinitely many optimal policies which are not base-stock and not time consistent. This is in a sharp contrast with the risk neutral formulation of the inventory model where all optimal policies are base-stock and time consistent.

17. 分布式鲁棒库存模型的最优策略的时间不一致问题

本文中，作者将过往关于时间不一致性的研究扩展到了规避风险（分布式鲁棒）的库存模型，并表明时间不一致并不是鲁棒的多阶段决策所独有的，还可能发生在一大类风险规避/分布式鲁棒的环境中。特别是，作者证明了，如果相应的风险度量不是严格单调的，那么可能存在无限多种没有基本库存(base-stock)且时间不一致的最优政策。这与库存模型的风险中性公式形成鲜明对比，在该模型中，所有最佳策略都是有基本库存且时间一致的。

作者：Alexander Shapiro Linwei Xin

原文：

https://pubsonline.informs.org/doi/10.1287/opre.2019.1932

Learning in Combinatorial Optimization: What and How to Explore

When moving from the traditional to combinatorial multiarmed bandit setting addressing the classical exploration versus exploitation trade-off is a challenging task. In “Learning in Combinatorial Optimization: What and How to Explore ” Modaresi Sauré and Vielma show that the combinatorial setting has salient features that distinguish it from the traditional bandit. In particular combinatorial structure induces correlation between cost of different solutions thus raising the questions of what parameters to estimate and how to collect and combine information. The authors answer such questions by developing a novel optimization problem called the lower-bound problem (LBP). They establish a fundamental limit on asymptotic performance of any admissible policy and propose near-optimal LBP-based policies. Because LBP is likely intractable in practice they propose policies that instead solve a proxy for LBP which they call the optimality cover problem(OCP). They provide strong evidence of practical tractability of OCP and illustrate the markedly superior performance of OCP-based policies numerically.

18. 组合优化中的学习：怎样去探索，如何去探索

从传统的多臂老虎机问题转变为考虑组合多臂老虎机问题时，解决经典的探索vs.利用策略平衡是一项艰巨的工作。作者指出，组合问题具有明显的、使其具有区别于传统的老虎机的特征。特别是，组合结构引起了不同解决方案成本之间的相关性，从而引发了有关估计哪些参数以及如何收集和组合信息的问题。作者通过设计一种称为下界问题（LBP）的新型优化问题来回答此类问题。他们对任何可接受的策略的渐近性能建立了基本的限制，并提出了基于LBP的近乎最优的策略。由于LBP在实践中可能很棘手，因此作者提出了解决LBP的替代策略，他们将其称为最优覆盖问题（OCP）。作者提供了OCP实用易处理性的有力证据，并从数值计算上说明了基于OCP的策略的显着优越性能。

作者：Sajad Modaresi Denis Sauré Juan Pablo Vielma

原文：

https://pubsonline.informs.org/doi/10.1287/opre.2019.1926

Optimization of Tree Ensembles

Predictive models based on ensembles of trees such as random forests and gradient boosted trees are widely used in machine learning and data science. In many applications the features that these models use are controllable and can be regarded as decision variables. This leads to a natural prescriptive analytics problem: How should these features be set so as to maximize the value predicted by the tree ensemble model? In “Optimization of Tree Ensembles” Misic proposes a mixed-integer optimization (MIO) model of this problem proposes a hierarchy of approximations to this formulation based on truncating the trees at a particular depth and develops two specialized constraint generation methods for solving the problem at scale. Using real data sets including two detailed case studies in drug design and customized pricing the author shows how this approach can efficiently solve large-scale problem instances to full or near optimality and outperforms solutions obtained by heuristic approaches.

19. 树集成模型的优化

基于树的集合（例如随机森林和梯度增强树）的预测模型已广泛用于机器学习和数据科学中。在许多应用中，这些模型使用的特征是可控制的，因此可以视为决策变量。这导致了一个自然的规范分析问题：应如何设置这些功能，以使树集成模型预测的价值最大化？作者针对这一问题提出了一种混合整数优化（MIO）模型，基于在特定深度处截断树型，提出了对此公式的近似层次，并开发了两种专门的约束生成方法来大规模解决问题。利用真实的数据集，包括两个针对药物设计和定制定价的详细案例研究，作者展示了该方法如何有效地解决大规模问题实例，使最终结果能达到或接近最优、并且优于通过启发式方法获得的解决方案。

作者：Velibor V. Mišić

原文：

https://pubsonline.informs.org/doi/10.1287/opre.2019.1928

网站首页

返回栏目

运筹学常用优化方法（运筹学应用之顶刊论文综述）

猜您喜欢：

相关文章