About

1. Introduction

StarCraft II is a real time strategy game where players manage an economy, research technology, and build an army with the eventual goal of defeating an opponent. At any given time a player may be faced with 15 different ways to invest their cash flow, and a single game may be comprised of over 100 of these consecutive decisions. A significant challenge in StarCraft decision making is determining the optimal line of play to achieve a desired army as quickly as possible.

Professional players will use a series of heuristics, developed over tens of thousands of games, to determine what action to take. However, even with a professional’s experience, this process can be challenging. StarCraft does not lend itself to a simple ROI calculation; the payout of an action may not be fully realized until 40 actions later, and the optimal decision to execute is highly dependent upon what actions will be taken in the future.

ProBO Engine attempts to solve the following problem: Minimize the makespan required to achieve a given set of Protoss units. This was accomplished by decoupling the search space into two regimes: economic decisions and military decisions. The military regime was solved using a global branch and bound technique with several significant reframing’s of the search space, whereas the economic portion was solved using a heuristic-based local search assisted by a neural network trained on simulated data. These two regimes were iterated through repeatedly until convergence was achieved.

Previous work has applied genetic algorithms to this area of problems. Evolution Chamber, http://www.teamliquid.net/forum/starcraft-2/160231-zerg-build-order-optimizer, followed by SCFusion, http://www.teamliquid.net/forum/starcraft-2/168348-scfusion-wol-hots-and-lotv-build-order-optimizer are two examples. These have been particularly strong at solving basic strategies and handling nuanced conditions; however, the search time grows rapidly with builds over 40 actions and genetic algorithms tend to perform poorly at optimizing more subtle elements such as chrono boost. The proposed search algorithm is able to identify near optimal solutions, and often times optimal solutions, in under 5 seconds of computation time on a single pc for a wide range of inputs.

The following sections discuss the problem statement, separation of the military and economic regimes, optimization of the military regime, optimization of the economic regime, summary, and future areas of work.

2. Problem Statement

The problem in question can be framed as a resource constrained scheduling problem with the following characteristics:

11 renewable resources: nexus, production facilities, upgrade facilities
3 nonrenewable resources: minerals, vespene gas, chrono boost
Y required tasks, where Y is the number of units given + precedence items for these units
An additional set of 10 optional tasks that may each be performed an unlimited number of times: Probe, Nexus, Pylon, Assimilator, more of each production facility
The nonrenewable resource cost of a given task is paid for in full at the start of its execution
The renewable resource cost of a given task is paid for evenly across its entire duration

And the following distinctions:

The availability of each renewable and nonrenewable resource is a function of past decisions
The duration of each task is a function of past decisions, due to chrono boost and warp gates

A given action affects the current state through a complex function generated by the StarCraft game engine. Consequently, the search space is highly non-differentiable and discrete optimization techniques must be employed. The total search space for a standard set of units is on the order of 10⁷⁰.

3. Separation of the Military and Economic Regimes

The intuition for decoupling the decision space into two components is first explained, followed by a walkthrough of how this technique was implemented in StarCraft II.

3.1 Intuition

Consider the following hypothetical. A nation is entering into a war that is expected to last for 10 years and has two advisers: an economic adviser and military adviser. The best military policy is highly dependent upon how the economy will develop over the next ten years, and the best economic policy is highly dependent upon how many resources will be required by the military to win the war. In other words, both advisors need to know the other’s policy and forecasts in order to form their own policy. In this situation the military adviser may start with a vague idea of how many resources will be available, and generate a military policy under these vague economic projections. The economic adviser can then take this military policy and develop the economic policy best suited to meet the military’s needs. The military adviser can then repeat the process, now with a more refined economic policy. The process repeats until neither of the adviser’s policy’s change with respect to the other adviser’s policy, at which point a local optimum has been achieved.

The advantage of this problem architecture over attempting to develop a single policy is twofold:

Decisions made by each advisor can incorporate future projections by the other advisor, and are therefore more likely to be optimal.
The search space is exponentially reduced. If each advisor has 35 binary decisions in a given policy, the total search space for both advisors is 2*2³⁵ per iteration. However, a single collective policy would invoke a search space of 2⁷⁰.

The downside of this problem architecture is that convergence between the two advisors is not guaranteed to result in a global optimum. This can be largely addressed by restarting the algorithm with a different initial economic forecast, and ensuring that these initial forecasts cover a wide enough range of the total search space.

3.2 Implementation in StarCraft

The technique discussed above is implemented in StarCraft by separating events into two regimes:

Military Decisions. This includes units, production buildings, technology buildings, and upgrades, all of which may require or impose precedence constraints.
Economic Decisions. This includes probes, assimilators, nexi, and pylons, none of which require or impose precedence constrains.

The following algorithm is followed until the solution converges:

Develop an initial forecast of the availability of minerals, vespene gas, and chrono boost over the first 500 seconds of gameplay.
Given the defined forecast of nonrenewable resources, solve for the optimal configuration of all military decisions.
Given the ordering of military decisions, solve for the optimal configuration of all economic decisions
Determine the resulting economy from the fully developed build order.
Repeat step 2 with the new economic approximation from step 4 until the solution converges.

While this technique greatly reduces the search space, each configuration must still be processed by a simulated version of the game engine. This process is relatively time intensive, and therefore further reduction and intelligent search techniques must be applied to provide an optimal solution in a reasonable time. These techniques, applied to steps two and three of the above algorithm, are discussed in the following sections.

4. Optimization of the Military Regime

A basic search algorithm is first outlined to build a basic level of intuition behind the nature of the military search space. It should be noted that this algorithm was not directly used. Its inefficiencies are discussed, followed by a description of how the final implemented algorithm successfully reduces the search space to a manageable size.

4.1 Basic Search Algorithm

First recall that all economic decisions (probes, pylons, assimilators, and nexi) are not presently considered, and are instead replaced by the current iteration’s approximation of available minerals, gas, and chrono boost at each point in time. The military search space can be navigated with the following algorithm:

Determine the list of available actions based on the current state
Select a given action and perform at earliest time allowed by resources
Update current state to reflect time and resource change.
Repeat step one until available list of actions is zero
Record final time and repeat process with a different line of actions

An example of this process is illustrated below based on the build order “2 zealots and 5 stalkers”.

Initially a pylon is the only available option.
Pylon is selected, and performed when “available minerals > costPylon”
Available minerals is reduced by costPylon and time is updated to purchase point
Step one is repeated, now with gateway being the only option

In the next iteration of step 1 gateway, zealot, and cybernetics core are the available options. Assuming cybernetics core is selected, the future options are now gateway, zealot, and stalker. At this point the process iterates with these three options until all required units have been produced.

The primary challenge of this search procedure is that the search space grows exponentially. Consider a build order of 20 stalkers, 4 immortals, and 1 void ray. Once the basic technology structures have been built, a player is faced with six decisions to make for about 26 iterations, resulting in 6²⁶ total build order combinations. Chrono boost complicates this procedure further.

Chrono boost is a resource provided about every 60 seconds by the nexus that allows a player to accelerate the production rate of a building by 50% for 20 seconds. If chrono boost is selected as the next option, it may not be optimal to perform once it is immediately available. If the future time space is discretized into 100 nodes, then any available chrono boost results in 100 new branches of each existing build order. A standard build order may contain about 8 chrono boosts. The above search space is therefore in the neighborhood of 6²⁶ *100⁸.

This exponential explosion of the search space makes global search techniques appear out of the question. However, there is one principle that when used with constraint programming techniques significantly reduces the search space:

Acquired nonrenewable resources do not decay over time in StarCraft II.

When systematically employed, this principle allows for a powerful reduction technique that eliminates over 99% of the search space. Its implications and why it makes certain global search techniques feasible are later discussed. First, however, the constraint space is more formally defined to lay a foundation for the preceding topic.

4.2 Constraint Space Definition

Any input to the optimization problem in question will have a series of required tasks and optional tasks. Only the required tasks are considered for the time being. Assume that a single build iteration has been performed and generated a list of n tasks ordered by start time T₁…T_n. Task i can be referenced by T_i. If T_i also uses a renewable resource (production facility), it can also be referenced by R_i. R₁…R_j represents all tasks in order that used the same renewable resource.

The intent of this notation is to be able to quickly reference the successor and predecessor on the same renewable resource R and the successor and predecessor on the general event timeline T.

t_T_i = time task T_i starts

C_T_i = cost of task T_i (vector with two terms-minerals & vespene gas)

D_T_i= duration of task T_i

P_T_i = List of all tasks that must be completed before T_i can start

Q_Ti= List of all tasks that require T_i to be done before they may start

M = makespan of total build

A(mineralCost, gasCost) = function that returns earliest time total cost can be afforded

Given this notation, a series of constraints can be written around the allowable values of t_Ti:

(1)	t_Ti <= M - D_Ti	General makespan constraint
(2)	t_Ti <= P_Ti - D_Ti	Precedence constraint
(3)	t_Ri-1+ D_Ri-1 <= t_Ri<=t_Ri+1 - D_Ri	Renewable resource constraint
(4)	t_Ti >= A(_Th)	Nonrenewable resource constraint

Now assume that the value of t is fixed for all tasks except for one, T_i. Combining the four constraints listed above, t_Ti can take on any value that meets the following condition:

Max(A(_Th), t_Ri-1+ D_Ri-1) <= t_Ti <= Min(M - D_Ti, P_Ti - D_Ti, t_Ri+1 - D_Ri)

Altering t_Ti within this range has no direct effect on M, but it does affect the constraints imposed on the other tasks. This effect is outlined in the following table.

Decrease t_Ti	Increase t_Ti
Constraint 2 relaxes for Q_Ti	Constraint 2 restricts for Q_Ti
Constraint 2 restricts for P_Ti	Constraint 2 relaxes for P_Ti
Constraint 3 relaxes for t_Ri+1	Constraint 3 restricts for t_Ri+1
Constraint 3 restricts for t_Ri-1	Constraint 3 relaxes for t_Ri-1
Constraint 4 is constant or restricts for all tasks	Constraint 4 is constant or relaxes for all tasks

The final row entries are justified as follows. The summation within A() only covers all tasks with start times before the selected task i. As t_Tidecreases, other tasks begin to have their start times after t_Ti, in which case C_T_ienters that task’s A() function, and the constraint restricts for that task. On the other hand, if t_Tiincreases, t_Tiis guaranteed to grow faster than A() because all nonrenewable resources do not decay with time. It is this factor that breaks the symmetry and allows the system to be solved.

4.3 Solution Algorithm

The final built order is solved in reverse by starting at an arbitrary end time M and working backwards through time. To build the intuition for how this is done, first consider a sequence that results in a makespan of M. Imagine that the goal exists to reduce M by an arbitrarily small amount to M*. Note that for any task T_i, increasing t_Tiwill cause Constraint 2 to restrict for Q_Ti and t_Ri+1. However, Q_Tiis null if T_iis not a precedence constraint for any other tasks, and t_Ri+1is null if T_iis the last unit produced out from its respective renewable resource.

Given that troops in StarCraft do not impose precedence constraints on other tasks, and each renewable resource must have a final task that uses it, there will always be a task were these two restrictions are null. If this task’s end time is not M, its start time is increased until t_Ti = M* - D_Ti, where M* is a time arbitrarily smaller than M. The following algorithm can perform this iteration until M has been successfully reduced to M*:

(To complete write up as time allows)