
Patrolling Games on General Graphs with Time-Dependent Node Values*

Abdolmajid Yolmeh¹ and Melike Baykal-Gursoy*†¹

¹Industrial and Systems Engineering Department, Rutgers University, 96 Frelinghuysen Rd, Piscataway, NJ 08854

Abstract

Scheduling and deployment of patrols is an important operational decision in safeguarding an area against adversarial invasion or illicit activity. Patrolling game models are useful when dealing with strategic adversaries. Most models of patrolling games on graphs assume that all nodes have the same value or that their values are fixed throughout time. In addition, these models often take the time needed to initiate a successful attack on a node, called the attack time, to be independent of the node. However, in reality, the nodes may have different values and these values may change over time. In this paper, we consider a general patrolling game model on a general graph with time-dependent node values and node-specific attack times. The model allows for multiple attackers and patrollers controlled by a single adversary and a single defender, respectively. This assumption leads to an exponentially large number of possible pure strategies for both the defender and the adversary. In order to tackle this problem, we propose a solution approach that depends on both column and row generation. We then run numerical experiments to investigate the efficiency of the proposed approach and to gain insight about the value of the patrolling game. Finally, we apply our method to a real case of an urban rail network in a major US city. Our results show the efficiency of the proposed solution approach. They also demonstrate a diminishing returns effect when increasing the number of patrollers.

*This material is based upon work supported by the National Science Foundation under Grant No. 1436288.
†Email address: [email protected]; corresponding author


Keywords: patrolling games, time-dependent node values, multiple patrollers, multiple attackers, zero-sum games

1 INTRODUCTION

One of the most important issues in homeland security is protecting critical infrastructures against terrorist attacks (Moteff, 2005). Among these infrastructures, transportation systems, serving 32 million passengers daily in the United States, are critical for supporting the national security and economic well-being. Public surface transportation systems such as trains, metros, subways and buses offer terrorists easy access to crowds of people. This makes them especially attractive to terrorists seeking high body counts. Such open systems are considered to be soft targets by the terrorists. Bombings in Brussels and Istanbul along with many other cases indicate that terrorists tend to target such large crowds to cause mass human casualties in addition to panic and chaos. Therefore, it is important to protect such infrastructures.

Analyzing the risk associated with an attack on each infrastructure component, mitigation planning and designing efficient response policies could substantially reduce the threat to these infrastructures. One component of such planning is designing efficient patrols to secure vulnerable areas. One of the challenges involved in designing patrol schedules to safeguard open mass transit systems and other soft targets is time-dependent node values. Because the adversary’s primary objective is to inflict human casualties, the node values depend on the number of people present at those nodes. These numbers change over time and the terrorists tend to time their attacks according to these changes (Jenkins and Trella, 2012). Another challenge is to develop efficient methods to design patrols for a general network. In this paper, we try to address these challenges in a patrolling game setting.

Patrolling problems arise in many situations in real life. Police officers patrol cities; security officers patrol terminals at airports and transportation centers; security guards patrol museums and shopping malls. Patrolling problems involve decisions on how to route a patroller through many locations in order to safeguard the area from adversarial intruders or illicit activity. With recent advancements in technology, patrolling decisions arise even more frequently with applications in routing unmanned aerial vehicles and robots.

Patrolling problems have been studied since the 1970s. Several studies have focused on allocating patrols to different areas to optimize performance measures such as patrol delays, average waiting time and total response time (Chaiken and Dormont, 1978; Chelst, 1978; Larson, 1972; Olson and Wright, 1975). These studies assume that the crime frequency in different regions remains fixed and known to the patroller. However, this is not a realistic assumption due to the strategic behaviour of the adversaries. In other words, the adversaries can change their strategy in response to the patroller’s strategy. Therefore, game theoretic analysis of such problems yields more realistic results.

Basilico et al. (2012) introduce a two-player multi-stage security game with an underlying infinite horizon setting in which there are potentially infinitely many decision nodes. In this model, the attacker has complete knowledge of the strategy to which the patroller has committed. The attacker can also observe the location and movements of the patroller at any time and chooses his best attack strategy based on this information. They study Markovian strategies of different orders for this problem and show that, even though first order Markovian strategies may not always be optimal, they have comparable quality with respect to higher order Markovian strategies. Basilico et al. (2017) consider a similar patrolling game model with the patroller employing spatially uncertain alarm signals. They prove that this problem is NP-hard for a general graph; they also show that for special graphs, like paths or cycle graphs, the optimal strategy can be found in polynomial time. The infinite horizon nature of the games studied in Basilico et al. (2012) and Basilico et al. (2017) leads to the application of stationary Markovian strategies by the patroller. This means that the timing of attacks becomes irrelevant in such games. However, this may not be valid in realistic situations, for example, when node values change over time.

Alpern et al. (2011) introduce a finite horizon patrolling game played on a graph, $Q$, with $n$ nodes. The game has two players, a defender patrolling a set of nodes on $Q$, and an adversary targeting a node to attack. The adversary needs $m$ consecutive periods, uninterrupted by the defender, to successfully damage the node. He aims to maximize the probability of a successful attack, while the defender tries to minimize this probability. Hence, the proposed model is a zero-sum game and the solution to this game is called a saddle point (Fudenberg and Tirole, 1991). Papadaki et al. (2016) study the same patrolling problem on a line graph. They solve this patrolling game for any values of $m$ and $n$, to find a saddle point.


Both Alpern et al. (2011) and Papadaki et al. (2016) assume that all nodes have the same value; they also assume that the attack time, $m$, is fixed and does not depend on the node under attack. However, as we have discussed earlier in this section, these assumptions may not be valid in reality. Especially in public transportation systems and other soft targets, where node values represent the number of people present, called the occupancy level, different nodes may have different values and these values may change over time. Moreover, some nodes may be harder to attack than others; therefore, it may take more time to launch a successful attack on them.

Lin et al. (2013) attempt to address this gap by studying patrolling models with different node values and attack time distributions. They consider both random and strategic attackers. A random attacker uses a fixed and known probability distribution to launch attacks on nodes, while a strategic attacker plays a zero-sum game with the patroller. The authors develop linear programming models to find the optimal solutions for both players. They also propose index-based heuristics to solve the problems of larger size. Lin et al. (2014) extend the model of Lin et al. (2013) by allowing imperfect detection. In other words, there is a possibility of observing a false negative when the patroller inspects a node. They introduce efficient index-based heuristics to obtain near optimal policies in a reasonable amount of time. Although Lin et al. (2013, 2014) resolve some of the shortcomings of the previous models, their models still do not consider time-dependent node values and the importance of the timing of attacks.

There are a number of studies accommodating multiple patrollers in their models. Jain et al. (2010) study Stackelberg security games with arbitrary schedules and multiple patrollers. They develop a branch and price algorithm to efficiently solve this game. Their algorithm involves a column generation step that exploits a novel network flow representation avoiding the combinatorial explosion of schedule assignments. Korzhyk et al. (2011) investigate the case with multiple defenders and multiple attackers where the attacker can attack multiple targets. They propose a polynomial time algorithm to find the Nash equilibrium for this game. Hochbaum et al. (2014) consider a patrolling problem with multiple patrollers (vehicles) on a network with edges targeted by a strategic adversary. They present a novel decomposition approach that requires the solution of a multivehicle rural Chinese postman problem (Guan, 1962). Lou et al. (2017) model a security game with multiple decentralized defenders in charge of defending disjoint subsets of, possibly interdependent, targets. They analyze the existence of a Nash equilibrium for this game under various conditions. Lagos et al. (2017) study a Stackelberg security problem with multiple patrols and multiple targets. They propose a branch and price approach to efficiently solve this problem. McGrath and Lin (2017) investigate a patrol problem with multiple patrollers and dispersed heterogeneous attack locations. Their model accounts for the travel time between nodes, and includes node-specific features such as the inspection time, the time required for the adversary to carry out an attack, and the cost of a successful attack. They show that, for the case of a single patroller, the optimal solution can be obtained by solving a linear program. For the case with multiple patrollers, they propose heuristic solutions based on shortest paths and set partitions.

The majority of the papers in the literature on patrolling games assume that a single adversary chooses a target to attack and that the target values are fixed over time; some even assume that all targets are indistinguishable, i.e., they all have the same value. However, this is not the case in many realistic situations. For example, at a transportation facility, the number of people at each location, its occupancy level, may be considered as the value of that location. Moreover, occupancy levels may change over time: during rush hours, the occupancy levels are expected to be higher than during normal hours. In this paper, we study a patrolling game model with time-dependent node values, node-specific attack times, multiple patrollers and multiple attackers. We propose a solution approach to efficiently solve the game under general graphs. The computational results show the efficiency of the proposed approach. The rest of this paper is organized as follows. In section 2, the problem under consideration is described. The proposed solution approach is explained in section 3. Section 4 presents the computational results. Finally, the main conclusion of the paper and suggestions for future research are presented in section 5.

2 PROBLEM DESCRIPTION

In this section, we describe the problem under consideration. Our work extends the model of Alpern et al. (2011) by considering different and time-dependent node values, node-specific attack times, multiple patrollers and multiple attackers. Here is a list of parameters of the model:

• $N$ : Number of nodes.

• $\mathcal{N} = \{1, 2, \ldots, N\}$ : Set of nodes.

• $i, j \in \mathcal{N}$ : Node indices.

• $d \in \mathcal{D}$ : Index of patrollers.

• $a \in \mathcal{A}$ : Index of attackers.

• $T$ : Number of patrolling time periods.

• $\mathcal{T} = \{0, 1, \ldots, T-1\}$ : Set of time periods in the time horizon.

• $t, \tau \in \mathcal{T}$ : Time period indices.

• $m_i$ : Attack time, the number of consecutive periods needed to attack node $i$. Let $m = (m_1, m_2, \ldots, m_N)$.

• $\mathcal{E}$ : Set of edges, where $(i, j) \in \mathcal{E}$ if there is an edge between nodes $i$ and $j$.

• $c_{it}$ : Value of node $i$ at time $t$. Let $c = [c_{it}]$ be an $N \times T$ matrix containing all of the $c_{it}$ values, with the element in row $i$ and column $t$ being $c_{it}$.

The patrolling game $G = G(Q, T, m, c)$ introduced in this paper is a zero-sum game between a defender (she) and an adversary (he). The defender controls a set of patrollers $\mathcal{D}$. The adversary controls a set of attackers $\mathcal{A}$. The game is played on a connected graph $Q = (\mathcal{N}, \mathcal{E})$ with the set of nodes $\mathcal{N}$ and the set of edges $\mathcal{E}$ over the time horizon $\mathcal{T}$.

A pure strategy for the adversary is to select a pair $(i_a, I_a)$ for each attacker $a \in \mathcal{A}$, where $i_a \in \mathcal{N}$ is the target node and $I_a$ is the attack interval, which defines the beginning, $\tau_a$, and the end of the attack. The attack interval is a set of $m_j$ consecutive time periods, i.e., $I_a = \{\tau_a, \tau_a + 1, \ldots, \tau_a + m_j - 1\} \subseteq \mathcal{T}$, where $j = i_a$. We can also represent an attack strategy as $(i_a, \tau_a)$. Note that, for each attacker $a$, the start of the attack interval, $\tau_a$, should be early enough for the attack interval to finish before the end of the time horizon $T$, i.e., $\tau_a \le T - m_j$, where $j = i_a$. We assume that the adversary cannot assign an attack pair to more than one attacker.
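The adversary's pure strategy space described above can be enumerated directly on small instances. The following is a minimal sketch, not part of the original paper: the function names are hypothetical, nodes are indexed from 0 rather than 1, and attackers are treated as interchangeable, so a pure strategy is an unordered set of distinct attack pairs.

```python
from itertools import combinations

def feasible_attack_pairs(T, m):
    """All attack pairs (i, tau) with tau <= T - m[i] (nodes are 0-indexed here)."""
    return [(i, tau) for i in range(len(m)) for tau in range(T - m[i] + 1)]

def pure_attack_strategies(T, m, num_attackers):
    """Unordered selections of distinct attack pairs, one per attacker (an assumption)."""
    return list(combinations(feasible_attack_pairs(T, m), num_attackers))

# Illustrative instance: T = 5 periods, two nodes with attack times 3 and 2.
pairs = feasible_attack_pairs(T=5, m=[3, 2])
strategies = pure_attack_strategies(T=5, m=[3, 2], num_attackers=2)
print(len(pairs), len(strategies))
```

Even in this tiny instance the number of pure strategies grows combinatorially with the number of attackers, which is the motivation for generating strategies on demand.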

We define a patrol as a walk $P : \mathcal{T} \to Q$ on graph $Q$ during the time horizon $\mathcal{T}$. A pure strategy for the defender is to select a patrol $P^d$ for each patroller $d \in \mathcal{D}$. If $i_a \in P^d(I_a)$ for some $d \in \mathcal{D}$, i.e., patroller $d$ interrupts attacker $a$, the attack will be unsuccessful. Otherwise, if $i_a \notin P^d(I_a)\ \forall d \in \mathcal{D}$, attacker $a$ successfully damages node $i_a$, the adversary gains a payoff of $c_{j,\tau_a+m_j-1}$, where $j = i_a$, and the defender loses a payoff of $c_{j,\tau_a+m_j-1}$. The defender aims to minimize the expected total damage incurred from all attackers and the adversary wants to maximize it.
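The interruption rule and the payoff can be made concrete with a short sketch. This is an illustration under stated assumptions, not the authors' code: a patrol is represented as a list holding the node occupied in each period, nodes are 0-indexed, and the instance data are invented for the example.

```python
def interrupts(patrol, node, tau, m):
    """True if the patrol is at `node` at some period of [tau, tau + m[node] - 1]."""
    return node in patrol[tau:tau + m[node]]

def total_damage(patrols, attacks, m, c):
    """Sum of c[i][tau + m[i] - 1] over attack pairs (i, tau) that no patrol interrupts."""
    damage = 0
    for node, tau in attacks:
        if not any(interrupts(p, node, tau, m) for p in patrols):
            damage += c[node][tau + m[node] - 1]
    return damage

# Hypothetical instance: 3 nodes, T = 4, attack time 2 everywhere.
m = [2, 2, 2]
c = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
patrols = [[0, 1, 2, 1]]          # one patroller walking 0 -> 1 -> 2 -> 1
attacks = [(0, 1), (2, 2)]        # attack node 0 from period 1, node 2 from period 2
damage = total_damage(patrols, attacks, m, c)
print(damage)
```

Here the attack on node 2 is interrupted at period 2, while the attack on node 0 succeeds and is valued at the node's value in its final attack period.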

The players play a zero-sum matrix game with the defender playing as the row player and the set of all possible defense strategies constituting the rows of the matrix. The adversary plays as the column player, with the set of all possible attack strategies constituting the columns of the game matrix. We use $\mathcal{K}$ to denote the set of all possible defense strategies and $k$ to index them. Let $x_k$ be the probability of using defense strategy $k$ in the defender’s mixed strategy. Hence $x = (x_1, x_2, \ldots, x_{|\mathcal{K}|})$ represents a mixed strategy of the defender, where $|\mathcal{K}|$ denotes the cardinality of $\mathcal{K}$, $x_k \ge 0\ \forall k \in \mathcal{K}$ and $\sum_{k=1}^{|\mathcal{K}|} x_k = 1$. Similarly, we use $\mathcal{L}$ to denote the set of all possible attack strategies and index them by $l$. Let $y_l$ denote the probability of using attack strategy $l$. Hence, a mixed strategy of the adversary is denoted as $y = (y_1, y_2, \ldots, y_{|\mathcal{L}|})$, with $y_l \ge 0\ \forall l \in \mathcal{L}$ and $\sum_{l=1}^{|\mathcal{L}|} y_l = 1$. The saddle point (Nash equilibrium) of the game is a point $(x^*, y^*)$ at which the following inequalities hold for all mixed strategies $x$ and $y$:

$$v(x^*, y) \le v(x^*, y^*) \le v(x, y^*),$$

where $v(x, y)$ is the expected damage if the defender and the adversary use mixed strategies $x$ and $y$, respectively.

Although our model is a generalization of the model proposed by Alpern et al. (2011), some of their results are still valid for our model. We will directly use the following lemma, which has been proved in Alpern et al. (2011), since the proof does not depend on the node values.

Lemma 1. Suppose $Q$ is connected, $T \ge 3$ and $m_i \ge 2, \forall i$. Then patrols that stay on any node for three consecutive periods are dominated.

The game can be solved by generating all of the possible strategies for both players; however, this may not be efficient for games of larger size. In the next section, we develop a solution approach to obtain a saddle-point equilibrium for this game.

3 SOLUTION PROCEDURE

In this section, a solution algorithm based on column and row generation (Riedel et al., 2012; Muter et al., 2013) is developed to obtain a saddle point for the patrolling game described in the previous section. The main challenge that may arise when developing a column and row generation algorithm is that the structure of the sub-problems may be destroyed due to the addition of new rows (Barnhart et al., 1998). However, in our case, since the new rows only affect the objective function coefficients, this difficulty does not arise. The solution method can also be described as a modification of the algorithm proposed by Godinho and Dias (2010, 2013).

Because this is a zero-sum game, a linear program (LP) can be developed to obtain a saddle-point equilibrium of this game. To formulate the LP for this game, we use the following binary parameters:

• $w^k_{i\tau} = \begin{cases} 1 & \text{if defense strategy } k \text{ interrupts attack pair } (i, \tau), \\ 0 & \text{otherwise.} \end{cases}$

• $z^l_{i\tau} = \begin{cases} 1 & \text{if attack strategy } l \text{ involves attack pair } (i, \tau), \\ 0 & \text{otherwise.} \end{cases}$

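Using this notation, the following LP formulation can be developed to obtain a saddle-point equilibrium for this game:

\begin{align*}
\text{Minimize} \quad & u \\
\text{subject to} \quad & u \ge \sum_{k \in \mathcal{K}} \sum_{i,\tau} c_{i,\tau+m_i-1}\, z^l_{i\tau} (1 - w^k_{i\tau})\, x_k, && \forall l \in \mathcal{L}, \\
& \sum_{k \in \mathcal{K}} x_k = 1, \\
& x_k \ge 0, && \forall k \in \mathcal{K}.
\end{align*}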

This problem is called the linear programming master (LPM). In this formulation, $x_k$ is a decision variable representing the probability of using defense strategy $k \in \mathcal{K}$ in the defender’s mixed strategy. In general, the sets $\mathcal{K}$ and $\mathcal{L}$ may be exponentially large; however, the number of strategies actually used in equilibrium is expected to be much smaller. The proposed solution algorithm exploits this idea: it starts with small subsets $\mathcal{K}' \subset \mathcal{K}$ and $\mathcal{L}' \subset \mathcal{L}$ of defense and attack strategies and generates new strategies as needed. In other words, we generate the defense strategies (columns) and attack strategies (rows) on the fly. The starting subsets $\mathcal{K}'$ and $\mathcal{L}'$ could be any sets of strategies. Using the restricted sets of strategies $\mathcal{K}'$ and $\mathcal{L}'$, we obtain the following LP:

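\begin{align}
\text{Minimize} \quad & u \tag{1} \\
\text{subject to} \quad & u \ge \sum_{k \in \mathcal{K}'} \sum_{i,\tau} c_{i,\tau+m_i-1}\, z^l_{i\tau} (1 - w^k_{i\tau})\, x_k, && \forall l \in \mathcal{L}', \tag{2} \\
& \sum_{k \in \mathcal{K}'} x_k = 1, \tag{3} \\
& x_k \ge 0, && \forall k \in \mathcal{K}'. \tag{4}
\end{align}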

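This problem is called the Restricted LPM (RLPM). The dual of RLPM is:

\begin{align}
\text{Maximize} \quad & v \tag{5} \\
\text{subject to} \quad & v \le \sum_{l \in \mathcal{L}'} \sum_{i,\tau} c_{i,\tau+m_i-1}\, z^l_{i\tau} (1 - w^k_{i\tau})\, y_l, && \forall k \in \mathcal{K}', \tag{6} \\
& \sum_{l \in \mathcal{L}'} y_l = 1, \tag{7} \\
& y_l \ge 0, && \forall l \in \mathcal{L}'. \tag{8}
\end{align}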

In this formulation, $y_l$ is the dual variable corresponding to constraint (2) in RLPM. This variable represents the probability of using attack strategy $l$ in the adversary’s mixed strategy. Moreover, $v$ is the dual variable corresponding to constraint (3), which represents the minimum expected damage. The next step is to find new strategies in $\mathcal{K} \setminus \mathcal{K}'$ and $\mathcal{L} \setminus \mathcal{L}'$ that could improve the current optimal solution for the corresponding players. Given the optimal dual solution $y_l$ of RLPM, the reduced cost of defense strategy $k \in \mathcal{K} \setminus \mathcal{K}'$ is given by $\sum_{l \in \mathcal{L}'} \sum_{i,\tau} c_{i,\tau+m_i-1}\, z^l_{i\tau} (1 - w^k_{i\tau})\, y_l - v$. Based on the concept of duality in linear programming, optimality of RLPM is equivalent to the feasibility of its dual. Therefore, defense strategies that violate the constraint $v \le \sum_{l \in \mathcal{L}'} \sum_{i,\tau} c_{i,\tau+m_i-1}\, z^l_{i\tau} (1 - w^k_{i\tau})\, y_l$ can improve the current optimal solution. Thus, with the $y_l$’s fixed, one should look for a defense strategy $k$ with $w^k_{i\tau}$ such that $v > \sum_{l \in \mathcal{L}'} \sum_{i,\tau} c_{i,\tau+m_i-1}\, z^l_{i\tau} (1 - w^k_{i\tau})\, y_l$. In other words, one looks for a new defense strategy $k$ that leads to a smaller expected total damage, $\sum_{l \in \mathcal{L}'} \sum_{i,\tau} c_{i,\tau+m_i-1}\, z^l_{i\tau} (1 - w^k_{i\tau})\, y_l$, than the current expected total damage, $v$.

To obtain an improving attack strategy for the adversary, consider RLPM. Given the optimal solution $x_k$ of RLPM, the current total expected damage is $u$. The total expected damage incurred by using attack strategy $l$ is $\sum_{k \in \mathcal{K}'} \sum_{i,\tau} c_{i,\tau+m_i-1}\, z^l_{i\tau} (1 - w^k_{i\tau})\, x_k$. Therefore, for fixed values of $x_k$, one should find a new attack strategy $l$ with $z^l_{i\tau}$ such that $\sum_{k \in \mathcal{K}'} \sum_{i,\tau} c_{i,\tau+m_i-1}\, z^l_{i\tau} (1 - w^k_{i\tau})\, x_k > u$.

Subsection 3.1 develops mathematical programs to solve the defender’s sub-problem. Subsection 3.2 discusses the adversary’s sub-problem and subsection 3.3 describes the overall solution algorithm.
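The two improvement tests above can be sketched in a few lines. This is a hedged illustration rather than the paper's implementation: each defense strategy is represented by the set of attack pairs it interrupts (the pairs with $w^k_{i\tau} = 1$), each attack strategy by the set of attack pairs it uses (the pairs with $z^l_{i\tau} = 1$), and `val[(i, tau)]` stands for $c_{i,\tau+m_i-1}$. All names and data are hypothetical.

```python
def payoff(W_k, Z_l, val):
    """Damage when defense strategy k meets attack strategy l: uninterrupted pairs only."""
    return sum(val[pair] for pair in Z_l if pair not in W_k)

def defense_improves(W_new, Z, y, val, v):
    """Candidate defense strategy violates dual constraint (6): its expected damage is below v."""
    return v > sum(payoff(W_new, Z_l, val) * y_l for Z_l, y_l in zip(Z, y))

def attack_improves(Z_new, W, x, val, u):
    """Candidate attack strategy violates primal constraint (2): its expected damage exceeds u."""
    return sum(payoff(W_k, Z_new, val) * x_k for W_k, x_k in zip(W, x)) > u

# Hypothetical restricted game with one strategy per player.
a, b = (0, 0), (1, 0)                 # two attack pairs
val = {a: 4, b: 4}
W, x = [{a}], [1.0]                   # the only defense strategy interrupts pair a
Z, y = [{a}], [1.0]                   # the only attack strategy uses pair a
u = v = payoff(W[0], Z[0], val)       # restricted game value

new_attack_helps = attack_improves({b}, W, x, val, u)
new_defense_helps = defense_improves({b}, Z, y, val, v)
print(u, new_attack_helps, new_defense_helps)
```

In this toy instance the adversary gains by switching to the unprotected pair, while a defense strategy that abandons the attacked pair does not help the defender.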

3.1 Mathematical Formulations for the Defender’s Sub-problem

In this section, we present two mathematical formulations to solve the defender’s sub-problem: a hop-type formulation and a flow-type formulation. We will compare the performance of these formulations numerically in section 4. Here is a list of binary variables used to formulate the defender’s sub-problem:

• $v^d_{it} = \begin{cases} 1 & \text{if patroller } d \text{ visits node } i \text{ at time } t, \\ 0 & \text{otherwise.} \end{cases}$

• $w_{i\tau} = \begin{cases} 1 & \text{if a patroller visits node } i \text{ during the time interval } [\tau, \tau + m_i - 1], \\ 0 & \text{otherwise.} \end{cases}$

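Using this notation, the following hop-type formulation can be developed for the defender’s sub-problem:

\begin{align}
\text{Maximize} \quad & \sum_{i,\tau} c_{i,\tau+m_i-1}\, w_{i\tau} \sum_{l \in \mathcal{L}'} y_l z^l_{i\tau} \tag{9} \\
\text{subject to} \quad & w_{i\tau} \le \sum_{d \in \mathcal{D}} \sum_{t=\tau}^{\tau+m_i-1} v^d_{it}, && \forall i, \tau, \tag{10} \\
& \sum_{i=1}^{N} v^d_{it} = 1, && \forall t, d, \tag{11} \\
& v^d_{it} + v^d_{j,t+1} \le 1, && \forall i, j, t, d \mid i \ne j,\ (i, j) \notin \mathcal{E}, \tag{12} \\
& w_{i\tau} \in \{0, 1\}, && \forall i, \tau, \tag{13} \\
& v^d_{it} \in \{0, 1\}, && \forall i, t, d. \tag{14}
\end{align}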

In this formulation, equation (9) represents the objective function, which corresponds to minimizing the expected damage. Note that the expected damage is equal to

$$\sum_{l \in \mathcal{L}'} \sum_{i,\tau} c_{i,\tau+m_i-1}\, z^l_{i\tau} (1 - w_{i\tau})\, y_l = \sum_{l \in \mathcal{L}'} \sum_{i,\tau} c_{i,\tau+m_i-1}\, z^l_{i\tau}\, y_l - \sum_{l \in \mathcal{L}'} \sum_{i,\tau} c_{i,\tau+m_i-1}\, z^l_{i\tau}\, w_{i\tau}\, y_l,$$

where the first term is constant; hence, minimizing the expected damage is equivalent to maximizing

$$\sum_{l \in \mathcal{L}'} \sum_{i,\tau} c_{i,\tau+m_i-1}\, z^l_{i\tau}\, w_{i\tau}\, y_l = \sum_{i,\tau} c_{i,\tau+m_i-1}\, w_{i\tau} \sum_{l \in \mathcal{L}'} y_l z^l_{i\tau}.$$

The term $\sum_{l \in \mathcal{L}'} y_l z^l_{i\tau}$ in the objective function can be interpreted as the probability of using attack pair $(i, \tau)$ by the adversary. Constraint (10) ensures that, if no patroller interrupts attack pair $(i, \tau)$, then $w_{i\tau}$ is equal to zero. Constraint (11) indicates that each patroller $d$ is at exactly one node at each time $t$. Constraint (12) ensures that a patroller cannot move from node $i$ to node $j$ if there is no edge between these nodes. Constraints (13) and (14) are the integrality constraints for variables $w_{i\tau}$ and $v^d_{it}$.

Note that Lemma 1 can be used to incorporate a new constraint into this formulation. Specifically, the constraint $v^d_{i,t} + v^d_{i,t+1} + v^d_{i,t+2} \le 2,\ \forall i, t, d$ can be added to the formulation to eliminate the patrols that stay in the same node for three consecutive time periods. In the numerical experiments section, we will study the effect of this constraint on the overall performance of the algorithm.
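For very small instances with a single patroller, the hop-type sub-problem can be solved by exhaustive enumeration, which makes objective (9), the movement constraint (12) and the optional Lemma 1 constraint easy to inspect. The sketch below is only an illustration under these assumptions (the paper solves the formulation with an integer programming solver); `best_patrol`, its arguments and the instance are hypothetical, and `p[(i, tau)]` stands for $\sum_{l \in \mathcal{L}'} y_l z^l_{i\tau}$.

```python
from itertools import product

def best_patrol(adj, T, m, c, p, use_lemma1=False):
    """Enumerate all walks of length T for one patroller and maximize objective (9).

    adj: dict node -> set of neighbours; staying in place is always allowed.
    p:   dict (i, tau) -> probability that the adversary uses attack pair (i, tau).
    """
    best_value, best_walk = -1.0, None
    for walk in product(sorted(adj), repeat=T):
        # Constraint (12): consecutive distinct nodes must be joined by an edge.
        if any(b != a and b not in adj[a] for a, b in zip(walk, walk[1:])):
            continue
        # Optional Lemma 1 cut: never stay on a node for three consecutive periods.
        if use_lemma1 and any(a == b == d for a, b, d in zip(walk, walk[1:], walk[2:])):
            continue
        # w_{i,tau} = 1 exactly when the walk visits i during [tau, tau + m_i - 1].
        value = sum(c[i][tau + m[i] - 1] * prob
                    for (i, tau), prob in p.items()
                    if i in walk[tau:tau + m[i]])
        if value > best_value:
            best_value, best_walk = value, walk
    return best_value, best_walk

# Hypothetical instance: path 0 - 1 - 2, T = 3, attack time 2 at every node.
adj = {0: {1}, 1: {0, 2}, 2: {1}}
c = [[1, 1, 1], [1, 1, 1], [1, 1, 5]]
p = {(0, 0): 0.5, (2, 1): 0.5}
result = best_patrol(adj, T=3, m=[2, 2, 2], c=c, p=p)
result_with_cut = best_patrol(adj, T=3, m=[2, 2, 2], c=c, p=p, use_lemma1=True)
print(result, result_with_cut)
```

The enumeration grows as $N^T$, so it is only meant for checking a solver's output on toy graphs.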

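A flow-type mathematical formulation can also be developed to solve the defender’s sub-problem. The sub-problem is formulated as follows:

\begin{align}
\text{Maximize} \quad & \sum_{i,\tau} c_{i,\tau+m_i-1}\, w_{i\tau} \sum_{l \in \mathcal{L}'} y_l z^l_{i\tau} \tag{15} \\
\text{subject to} \quad & w_{i\tau} \le \sum_{t=\tau}^{\tau+m_i-1} \sum_{j} f^t_{ij}, && \forall i, \tau, \tag{16} \\
& \sum_{i,j} f^0_{ij} = |\mathcal{D}|, \tag{17} \\
& \sum_{i} f^t_{ij} = \sum_{l} f^{t+1}_{jl}, && \forall j, t, \tag{18} \\
& w_{i\tau} \in \{0, 1\}, && \forall i, \tau, \tag{19} \\
& f^t_{ij} \in \mathbb{Z}_+, && \forall i, j, t. \tag{20}
\end{align}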

In this formulation, $f^t_{ij}$ is an integer variable that represents the flow of patrollers from node $i$ to node $j$ at time $t$. Equation (15) presents the objective function, which is identical to the objective function in equation (9). Constraint (16) ensures that, if no patroller interrupts attack pair $(i, \tau)$, then $w_{i\tau}$ is equal to zero. Constraint (17) indicates that the initial flow of patrollers should be equal to the number of available patrollers, i.e., $|\mathcal{D}|$. Constraint (18) is the flow conservation constraint. It ensures that, for each node $j$, the total incoming flow at time $t$ is equal to the total outgoing flow at time $t + 1$. Constraints (19) and (20) are the integrality constraints for variables $w_{i\tau}$ and $f^t_{ij}$, respectively. The flows obtained from this formulation can be transformed into patrols using the well-known flow decomposition algorithm (Orlin et al., 1993).
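The conversion from flows to patrols can be illustrated with a simple greedy decomposition. This is a sketch under stated assumptions, not the cited algorithm verbatim: flows are given as one dictionary per time step, mapping an arc `(i, j)` to the number of patrollers moving from `i` to `j`, and they are assumed to satisfy the conservation constraint (18); the function name and data are hypothetical.

```python
def decompose(flows):
    """Split integer arc flows into individual patrols (lists of visited nodes).

    flows[t][(i, j)] is the number of patrollers moving from i to j at step t.
    Assumes flow conservation, so a continuing arc always exists.
    """
    flows = [dict(f) for f in flows]          # work on a copy
    patrols = []
    while any(n > 0 for n in flows[0].values()):
        # Peel off one unit of flow starting from any arc still in use at step 0.
        i, j = next(arc for arc, n in flows[0].items() if n > 0)
        flows[0][(i, j)] -= 1
        walk = [i, j]
        for f in flows[1:]:
            arc = next(a for a, n in f.items() if n > 0 and a[0] == walk[-1])
            f[arc] -= 1
            walk.append(arc[1])
        patrols.append(walk)
    return patrols

# Two patrollers over two steps: one moves 0 -> 1 -> 2, the other stays at 2, then moves to 1.
flows = [{(0, 1): 1, (2, 2): 1}, {(1, 2): 1, (2, 1): 1}]
patrols = decompose(flows)
print(patrols)
```

Because patrollers are interchangeable, different decompositions of the same flow interrupt exactly the same attack pairs.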

Theorem 1. The defender’s sub-problem is NP-hard.

Proof. See Appendix A.

Remark 1. Theorem 1 is valid even if $c_{i\tau} = 1,\ \forall i, \tau$ and the $m_i$’s are equal to each other. In other words, even if we solve the model proposed by Alpern et al. (2011) using our proposed solution method, under a general graph, the defender’s sub-problem will still remain NP-hard.

Even though Theorem 1 indicates that the defender’s sub-problems are hard to solve, the computational results show that, for problems with up to 30 nodes, the solution algorithm is able to find the Nash equilibrium by directly solving the sub-problem formulations.

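3.2 Mathematical Formulation for the Adversary’s Sub-problem

The adversary’s sub-problem is formulated as follows:

\begin{align}
\text{Maximize} \quad & \sum_{i,\tau} c_{i,\tau+m_i-1}\, z_{i\tau} \sum_{k \in \mathcal{K}'} (1 - w^k_{i\tau})\, x_k \tag{21} \\
\text{subject to} \quad & \sum_{i,\tau} z_{i\tau} \le |\mathcal{A}|, \tag{22} \\
& z_{i\tau} \in \{0, 1\}, && \forall i, \tau. \tag{23}
\end{align}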

In this formulation, $z_{i\tau}$ is a binary variable equal to 1 if an attacker is assigned attack pair $(i, \tau)$. Equation (21) presents the objective function, which is maximizing the expected damage. In this expression, the term $\sum_{k \in \mathcal{K}'} (1 - w^k_{i\tau})\, x_k$ can be interpreted as the probability of not interrupting attack pair $(i, \tau)$. Constraint (22) indicates that at most $|\mathcal{A}|$ attack pairs can be chosen and assigned to the attackers. Finally, constraint (23) is the integrality constraint for variable $z_{i\tau}$.

The attacker’s sub-problem is a special case of the 0-1 knapsack problem with unit item weights. Note that this problem can be solved in polynomial time by sorting the attack pairs $(i, \tau)$ in non-increasing order of $c_{i,\tau+m_i-1} \sum_{k \in \mathcal{K}'} (1 - w^k_{i\tau})\, x_k$ and choosing the first $|\mathcal{A}|$ attack pairs.


3.3 Overall Solution Procedure

Algorithm 1 provides the pseudo-code for the overall solution procedure. The algorithm starts by randomly generating a set of initial strategies. Then, using this set of strategies, the RLPM is solved to obtain a solution $x$ and a vector of dual values $y$. Dual values $y$ are then used in the defender’s sub-problem to generate a new defense strategy. If a new defense strategy with a smaller expected damage is obtained, it is added to $\mathcal{K}'$. Then the adversary’s sub-problem is solved to generate a new attack strategy. If a new attack strategy with a greater expected damage is obtained, it is added to $\mathcal{L}'$. If, during the last two steps, either $\mathcal{K}'$ or $\mathcal{L}'$ has been updated, then the process is repeated; otherwise the procedure terminates. Because the number of possible strategies for both players is finite, the algorithm terminates after a finite number of iterations. Moreover, when the algorithm terminates, no player can improve the expected damage in their own favor by changing their strategies. Therefore, by definition, the algorithm returns a saddle point upon termination.

Algorithm 1: Pseudo-code for the overall solution algorithm

1  Initialize sets $\mathcal{K}'$ and $\mathcal{L}'$.
2  Solve RLPM. Let $x = [x_k]$, $y = [y_l]$ and $u$ be the optimal primal solution, dual solution and objective function value, respectively.
3  Solve the defender’s sub-problem using $y$ as dual values and let $w^* = [w^*_{i\tau}]$ denote the optimal solution.
4  if $v > \sum_{l \in \mathcal{L}'} \sum_{i,\tau} c_{i,\tau+m_i-1}\, z^l_{i\tau} (1 - w^*_{i\tau})\, y_l$ then
5      Add the new defense strategy $w^*$ to $\mathcal{K}'$.
6  end
7  Solve the attacker’s sub-problem using $x$ as primal values and let $z^* = [z^*_{i\tau}]$ be the optimal solution.
8  if $\sum_{k \in \mathcal{K}'} \sum_{i,\tau} c_{i,\tau+m_i-1}\, z^*_{i\tau} (1 - w^k_{i\tau})\, x_k > v$ then
9      Add the new attack strategy $z^*$ to $\mathcal{L}'$.
10 end
11 if $\mathcal{K}'$ or $\mathcal{L}'$ has been updated then
12     Go to Line 2.
13 else
14     Return $v$ as the value of the game.
15     Terminate the procedure.
16 end


4 NUMERICAL EXPERIMENTS

In this section, we perform computational experiments to investigate the efficiency of the proposed solution approaches and gain insight into some of their properties. The algorithms are coded in C++ and the CPLEX 12.6 solver is used to solve the LPs and the defender’s sub-problems. A computer with a 2.4 GHz processor and 4 GB of RAM is used to run the numerical experiments.

Our base set of test instances consists of randomly generated instances with underlying graph types including paths, cycles, grids and general planar graphs. To generate general planar graphs, an expected edge density of 15% is used, where the edge density is measured as $\frac{|\mathcal{E}|}{|\mathcal{N}|(|\mathcal{N}|-1)}$ and self-loop edges are not considered in calculating it. The number of nodes, $N$, ranges from 10 to 30. In generating general graphs, we first start with a random tree and then add random edges (such that the graph remains planar) until the edge density reaches 15%.
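The tree-plus-random-edges construction can be sketched as follows. This is a simplified illustration only: the planarity check used in the experiments is omitted here, each undirected edge is counted once in $|\mathcal{E}|$ (an assumption about how the density is computed), and the function name, seed and parameters are hypothetical.

```python
import random

def random_connected_graph(n, density, seed=0):
    """Random tree on n nodes plus random extra edges until |E| / (n (n - 1)) >= density.

    Note: unlike the procedure described in the text, planarity is NOT enforced.
    """
    rng = random.Random(seed)
    edges = set()
    for v in range(1, n):
        # Attach each new node to a random earlier node, which yields a random tree.
        edges.add((rng.randrange(v), v))
    candidates = [(i, j) for i in range(n) for j in range(i + 1, n)
                  if (i, j) not in edges]
    rng.shuffle(candidates)
    while density > len(edges) / (n * (n - 1)) and candidates:
        edges.add(candidates.pop())
    return edges

edges = random_connected_graph(10, 0.15)
print(len(edges), len(edges) / (10 * 9))
```

With 10 nodes the tree contributes 9 edges and the loop stops at the first edge count whose density reaches the target.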

In our first experiment, we compare the performances of different sub-problem formulations for the defender. Specifically, we consider three cases: the hop-type formulation (9) to (14) without the use of Lemma 1 (HT), the hop-type formulation (9) to (14) with the use of Lemma 1 (HTL) and the flow-type formulation (15) to (20) (FT). We consider 45 instances for each problem size, with various values of $T \in \{5, 6, \ldots, 9\}$, $|\mathcal{D}| \in \{1, 2, 3\}$ and $|\mathcal{A}| \in \{1, 2, 3\}$. Tables 1 and 2 show the average computation times observed for these three approaches for paths, cycles, general planar graphs and grids. In these tables, the best results are highlighted in bold. As seen in the tables, the flow-type formulation performs significantly better than the other formulations. Moreover, for most of the instances, the use of Lemma 1 is not helpful and leads to higher CPU times than when the lemma is not used. In the remaining experiments in this section, we will always use the FT formulation to solve the defender’s sub-problem, as it is the most efficient of the three formulations. Figure 1 shows the convergence trajectory of the solution algorithm for the general planar graphs under various values of $N$. In this figure, the vertical axis denotes the relative percent deviation (RPD) from the expected damage in equilibrium. One can see that the expected damage values stabilize well before the algorithm terminates, implying that, once the expected damage values stabilize, we can terminate the algorithm without undermining the solution quality drastically.


Table 1: Obtained CPU times (in seconds) for paths and cycles

              Path                         Cycle
 N       HT      HTL       FT        HT      HTL       FT
10    32.83    29.94    11.71     38.61    41.44    29.44
15   120.03   119.11    25.91    301.74   332.30    35.91
20   252.39   408.31    30.12    402.73   436.94    36.11
25   435.88   464.59    46.97    412.95   427.83    65.92
30   496.84   529.23    61.98    542.12   572.56    65.13

Table 2: Obtained CPU times (in seconds) for general graphs and grids

             General                        Grid
 N       HT      HTL       FT        HT      HTL       FT
10    18.64    18.84     8.22     22.37    23.83    10.30
15   207.52   221.02    37.49    153.42   162.17    34.98
20   404.96   431.14   100.41    396.91   463.22    41.79
25   539.08   574.49   124.97    543.61   573.61    68.41
30   700.48   742.31   341.36    690.21   714.65    93.46

[Figure 1 plot: RPD (vertical axis, -60 to 60) versus iteration (horizontal axis, 0 to 60), one curve each for N = 10, 15, 20, 25, 30.]

Figure 1: Convergence of the solution algorithm

Next, we study the effect of the number of patrollers and attackers on the expected damage in equilibrium. An experiment is designed with general planar graphs, $|\mathcal{D}| \in \{1, 2, \ldots, 10\}$ and $|\mathcal{A}| \in \{1, 2, \ldots, 5\}$. Figure 2 exhibits the effect of the number of defenders, $|\mathcal{D}|$, on the equilibrium expected damage for various numbers of attackers, $|\mathcal{A}|$. Figure 2 demonstrates that, as the number of defenders increases, the expected damage decreases. Moreover, a diminishing returns effect is visible in the reduction in expected damage for each unit increment in $|\mathcal{D}|$.


[Plot: expected damage (vertical axis, 0 to 70) against |D| (horizontal axis, 1 to 10), with one curve for each of |A| = 1, 2, 3, 4 and 5.]

Figure 2: The effect of number of defenders on the expected damage in equilibrium

Next, we study the size of the patrol portfolio for the case of |D| = |A| = 1. As mentioned earlier, the total number of defense strategies may be exponentially large; however, the number of defense strategies used in the saddle-point equilibrium is expected to be much smaller. In fact, when |D| = |A| = 1, the number of defense strategies used in the equilibrium with positive probability is at most the number of constraints in the LPM, which is bounded above by N × T + 1. Figure 3 displays the size of the patrol portfolio as a percentage of N × T + 1 for different values of N and T. As seen in this figure, the actual size of the patrol portfolio is significantly smaller than N × T + 1: it is always less than 50 percent of N × T + 1 and can be as low as 10 percent. Moreover, as T increases, the percentage decreases. No clear pattern emerges, however, for the effect of N on the patrol portfolio size as a percentage of N × T + 1.
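The quantity plotted in Figure 3 can be computed from an equilibrium mixed strategy as sketched below. The sketch assumes that the defender's mixed strategy is available as a list of probabilities, one per generated patrol, and that probabilities below a small tolerance count as zero; the function names and the sample strategy are hypothetical.

```python
def portfolio_bound(n_nodes, horizon):
    """Upper bound N * T + 1 on the number of patrols used with positive probability."""
    return n_nodes * horizon + 1


def portfolio_share(probabilities, n_nodes, horizon, tol=1e-9):
    """Size of the patrol portfolio as a percentage of N * T + 1."""
    size = sum(1 for p in probabilities if p > tol)
    return 100.0 * size / portfolio_bound(n_nodes, horizon)


# Hypothetical mixed strategy: 12 patrols with positive probability, 8 unused.
probabilities = [1.0 / 12] * 12 + [0.0] * 8
share = portfolio_share(probabilities, n_nodes=10, horizon=5)
```

For N = 10 and T = 5 the bound is 51, so a portfolio of 12 patrols corresponds to roughly 23.5 percent.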

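[Plot: patrol portfolio size as a percentage of N × T + 1 (vertical axis, 0 to 40) against T (horizontal axis, 5 to 9), with one series for each of N = 10, 15, 20, 25 and 30.]

Figure 3: Effect of N and T on the size of patrol portfolio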

In our next experiment, we demonstrate that one of the results obtained in Alpern et al. (2011) is not valid for our more general model. Specifically, Alpern et al. (2011) prove that if all nodes have the same fixed value, then attacks on penultimate nodes are dominated. A penultimate node is defined as a non-leaf node adjacent to a leaf node. We show that, if the node values are different, then attacks on penultimate nodes may not be dominated. To this end, we use an instance of a game played on a line graph with T = 5, N = 8, |D| = |A| = 1 and $m_i = 3$, ∀i ∈ N. Node values are assumed to be fixed over the time horizon. Figure 4 shows the graph of this game with the corresponding node values shown above each node. As seen in this figure, node 2 is a penultimate node with value c; all other nodes have a value of 1 unit.

[Diagram: line graph with nodes 1 through 8 in order; node 2 has value c and every other node has value 1.]

Figure 4: Patrolling game example on a line graph

We solve this game for different values of c under two cases: the unconstrained case, where the attacker is free to attack any node, and the constrained case, where the attacker cannot attack node 2. Figure 5 shows the results for various values of c. As seen in this figure, the unconstrained attacker can cause more damage than the constrained attacker. Moreover, as the value of c increases, the gap between the constrained and unconstrained cases widens and the attacker has more incentive to attack node 2. In other words, being able to attack node 2 lets the attacker increase the expected damage. If attacks on node 2 were dominated, excluding them would leave the expected damage in equilibrium unchanged; the gap between the two cases shows that this is not so. Therefore, attacks on penultimate nodes may not be dominated.

[Plot: expected damage (vertical axis, 0.6 to 1) against c (horizontal axis, 0 to 50), with one curve for the unconstrained case and one for the constrained case.]

Figure 5: Comparison of constrained and unconstrained cases

In our final set of experiments, we study a real case of an urban rail network with 51 nodes used by Yolmeh and Baykal-Gursoy (2018). In this case, the nodes in the network represent the stations and the edges represent the connections among these stations. The rail network consists of two main lines that are connected by a free interchange point. Node values represent the time-dependent occupancy levels in each station. We consider a 12-hour work shift, starting at 5:00 AM and ending at 5:00 PM. For more details about this case, please refer to Yolmeh and Baykal-Gursoy (2018). We used our proposed solution approach to solve this problem for |A| = 3 and |D| = 10. For this instance, the algorithm terminates after 88 minutes. Figure 6 shows the expected damage obtained in the first 100 iterations of the solution algorithm. As seen in this figure, after around 90 iterations, the expected damage value stabilizes and does not change drastically afterwards. This observation is in line with our previous experiments.

[Plot: expected damage (vertical axis, 0 to 7000) against iteration (horizontal axis, 0 to 100).]

Figure 6: Convergence of the solution algorithm for the case study

Next, we study the distribution of patroller visits across different stations. Figure 7 shows the expected number of visits in a 1-month period for the 5 most important stations for the case with 10 patrollers and 3 attackers. As seen in this figure, for most stations, the visits are almost equally distributed throughout the time horizon, with a noticeable valley in the beginning hours and two slight peaks: one from around 7:00 AM to around 9:00 AM and another from around 2:00 PM to around 4:00 PM.
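Expected visit counts of this kind follow directly from the defender's mixed strategy. The sketch below assumes that a joint patrol is a list of routes (one per patroller), that a route lists the station occupied at each time step, that a patrol is drawn independently every day, and that a month has 30 days. These modelling choices, the function name and the sample strategy are assumptions for illustration, not details taken from the case study.

```python
from collections import defaultdict


def expected_visits(mixed_strategy, days=30):
    """Expected number of visits per (station, time step) over `days` days.

    mixed_strategy: list of (probability, routes) pairs, where `routes`
    holds one route per patroller and route[t] is the station at step t.
    """
    visits = defaultdict(float)
    for probability, routes in mixed_strategy:
        for route in routes:
            for t, station in enumerate(route):
                visits[(station, t)] += days * probability
    return dict(visits)


# Hypothetical mixed strategy: two joint patrols for two patrollers, three time steps.
strategy = [
    (0.6, [[1, 2, 3], [4, 4, 5]]),
    (0.4, [[1, 1, 2], [5, 4, 4]]),
]
visits = expected_visits(strategy)
```

In this toy example station 1 is visited at step 0 under both patrols, so its expected count over the month is 30, while station 2 at step 1 is visited only under the first patrol and gets 18.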


[Plot: expected number of visits (vertical axis, 0 to 30) against time of day (horizontal axis, hourly from 5:00 AM to 4:00 PM), with one series for each of Stations 41, 38, 6, 17 and 24.]

Figure 7: Expected number of visits for 5 most visited stations

Next, we study the distribution of expected damage across the most vulnerable stations. Figure 8 shows the distribution of expected damage over the time horizon for the 10 most vulnerable stations, for the case with 10 patrollers and 3 attackers. As seen in this figure, for most stations, the expected damage concentrates in two time intervals: one from around 7:00 AM to around 9:00 AM and another from around 1:00 PM to around 4:00 PM.

[Plot: expected damage (vertical axis, 0 to 20) against time of day (horizontal axis, hourly from 5:00 AM to 4:00 PM), with one series for each of Stations 34, 47, 17, 22, 6, 38, 27, 44, 13 and 10.]

Figure 8: Distribution of expected damage for 10 stations with highest expected damage values

Next, we study the effect of the number of patrollers and attackers on the expected damage. Figure 9 shows the expected damage in equilibrium for different numbers of patrollers, |D|, and attackers, |A|. As seen in this figure, as the number of patrollers increases, the expected damage decreases. There is also a visible diminishing returns effect: as the number of patrollers increases, the reduction in expected damage from adding one more patroller decreases.


[Plot: expected damage (vertical axis, 0 to 6000) against |D| (horizontal axis, 1 to 10), with one curve for each of |A| = 1, 2, 3, 4 and 5.]

Figure 9: The effect of number of patrollers

5 CONCLUSIONS AND FUTURE RESEARCH

In this paper, we propose a patrolling game model on general graphs with time-dependent node values and node-dependent attack times. We also consider multiple patrollers and attackers in our model. We then develop an efficient algorithm to obtain a Nash equilibrium for this game. Our computational results show the efficiency of the proposed solution approach. Our results also show a diminishing returns effect when increasing the number of patrollers.

This paper addresses a gap in the literature by considering a patrolling game model with time-dependent node values, node-based attack times, multiple patrols and multiple attackers. It also develops a solution algorithm to solve the problem on general graphs. However, there are still some limitations that need to be addressed in future research. Patrolling games under incomplete information, where the defender does not know the attacker’s valuation of the targets or the number of attackers, are a possible avenue for future research. Moreover, extending the model to accommodate uncertainty in parameters, such as attack times and node values, is another avenue for future research.

References

Steve Alpern, Alec Morton, and Katerina Papadaki. Patrolling games. Operations Research, 59(5):1246–1257, 2011.

Cynthia Barnhart, Ellis L Johnson, George L Nemhauser, Martin WP Savelsbergh, and Pamela H Vance. Branch-and-price: Column generation for solving huge integer programs. Operations Research, 46(3):316–329, 1998.

Nicola Basilico, Nicola Gatti, and Francesco Amigoni. Patrolling security games: Definition and algorithms for solving large instances with single patroller and single intruder. Artificial Intelligence, 184:78–123, 2012.

Nicola Basilico, Giuseppe De Nittis, and Nicola Gatti. Adversarial patrolling with spatially uncertain alarm signals. Artificial Intelligence, 246:220–257, 2017.

Jan M Chaiken and Peter Dormont. A patrol car allocation model: Capabilities and algorithms. Management Science, 24(12):1291–1300, 1978.

Kenneth Chelst. An algorithm for deploying a crime directed (tactical) patrol force. Management Science, 24(12):1314–1327, 1978.

Drew Fudenberg and Jean Tirole. Game theory. MIT Press, Cambridge, Massachusetts, 1991.

Michael R Garey and David S Johnson. Computers and intractability: A guide to the theory of NP-completeness, 1979.

Pedro Godinho and Joana Dias. A two-player competitive discrete location model with simultaneous decisions. European Journal of Operational Research, 207(3):1419–1432, 2010.

Pedro Godinho and Joana Dias. Two-player simultaneous location game: Preferential rights and overbidding. European Journal of Operational Research, 229(3):663–672, 2013.

Meigu Guan. Graphic programming using odd and even points. Chinese Math., 1:237–277, 1962.

Dorit S Hochbaum, Cheng Lyu, and Fernando Ordonez. Security routing games with multivehicle Chinese postman problem. Networks, 64(3):181–191, 2014.

Manish Jain, Erim Kardes, Christopher Kiekintveld, Fernando Ordonez, and Milind Tambe. Security games with arbitrary schedules: A branch and price approach. In AAAI, 2010.

Brian Michael Jenkins and Joseph Trella. Carnage interrupted: An analysis of fifteen terrorist plots against public surface transportation. Technical report, 2012.

Dmytro Korzhyk, Vincent Conitzer, and Ronald Parr. Security games with multiple attacker resources. In IJCAI Proceedings-International Joint Conference on Artificial Intelligence, volume 22, page 273, 2011.

Felipe Lagos, Fernando Ordonez, and Martine Labbe. A branch and price algorithm for a Stackelberg security game. Computers & Industrial Engineering, 111:216–227, 2017.

Richard C Larson. Urban police patrol analysis, volume 28. MIT Press Cambridge, MA, 1972.

Kyle Y Lin, Michael P Atkinson, Timothy H Chung, and Kevin D Glazebrook. A graph patrol problem with random attack times. Operations Research, 61(3):694–710, 2013.

Kyle Y Lin, Michael P Atkinson, and Kevin D Glazebrook. Optimal patrol to uncover threats in time when detection is imperfect. Naval Research Logistics (NRL), 61(8):557–576, 2014.

J. Lou, A. M. Smith, and Y. Vorobeychik. Multidefender security games. IEEE Intelligent Systems, 32(1):50–60, Jan 2017. ISSN 1541-1672. doi: 10.1109/MIS.2017.11.

Richard G McGrath and Kyle Y Lin. Robust patrol strategies against attacks at dispersed heterogeneous locations. International Journal of Operational Research, 30(3):340–359, 2017.

John Moteff. Risk management and critical infrastructure protection: Assessing, integrating, and managing threats, vulnerabilities and consequences. DTIC Document, 2005.

Ibrahim Muter, S Ilker Birbil, and Kerem Bulbul. Simultaneous column-and-row generation for large-scale linear programs with column-dependent-rows. Mathematical Programming, 142(1-2):47–82, 2013.

David G Olson and Gordon P Wright. Models for allocating police preventive patrol effort. Journal of the Operational Research Society, 26(4):703–715, 1975.

JB Orlin, RK Ahuja, and TL Magnanti. Network flows, theory, algorithms and applications. Prentice Hall, 1993.

Katerina Papadaki, Steve Alpern, Thomas Lidbetter, and Alec Morton. Patrolling a border. Operations Research, 64(6):1256–1269, 2016.

Sebastian Riedel, David Smith, and Andrew McCallum. Parse, price and cut: delayed column and row generation for graph based parsers. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 732–743. Association for Computational Linguistics, 2012.

Abdolmajid Yolmeh and Melike Baykal-Gursoy. Urban rail patrolling: a game theoretic approach. Journal of Transportation Security, 11(1-2):23–40, 2018.

A Proof of Theorem 1

Proof. Our proof is similar to the NP-hardness proof of the DET-STRAT problem (Theorem 4.4) in Basilico et al. (2012). We first prove the result for the case with |D| = |A| = 1. Once the NP-hardness of this case is established, NP-hardness of the case with multiple patrollers and attackers follows, because it contains the single-patroller, single-attacker case as a special case. To prove NP-hardness, we prove that the corresponding decision problem is NP-complete. The decision problem is as follows: given a patrolling game $G = G(Q, T, m, c)$ and dual values $[y_{i\tau}]$ (note that we assumed there is only one attacker, and $y_{i\tau}$ is the probability that this single attacker uses attack pair $(i, \tau)$), is there a patrol with $[w_{i\tau}]$ such that:

\[
\sum_{i=1}^{N} \sum_{\tau=0}^{T-m_i} y_{i\tau}\, c_{i,\tau+m_i-1}\, w_{i\tau} \ge L?
\]

For future reference, we call this problem DP. It is easy to see that DP is in NP: a non-deterministic machine could guess a patrol with $[w_{i\tau}]$ and check in polynomial time whether $\sum_{i=1}^{N} \sum_{\tau=0}^{T-m_i} y_{i\tau}\, c_{i,\tau+m_i-1}\, w_{i\tau} \ge L$ holds. Its NP-completeness can be shown by reducing the Hamiltonian path (HP) problem to DP. HP is a well-known NP-complete problem (Garey and Johnson, 1979). It is the problem of determining whether a Hamiltonian path, i.e., a path that visits each vertex exactly once, exists on a given graph. Following Basilico et al. (2012), let us consider a generic HP problem given by a graph $G^h = (V^h, A^h)$, where $V^h$ is the set of vertices and $A^h$ is the set of edges. From this graph, an instance of the DP problem with $G = G(Q, T, m, c)$, $L$ and $[y_{i\tau}]$ can be constructed in polynomial time by setting $Q = G^h$, $T = |V^h|$, $m_i = |V^h|$, $y_{i\tau} = 1, \forall i, \tau$, $c_{i\tau} = 1, \forall i, \tau$ and $L = |V^h|$. It is easy to see that a solution to DP with $G = G(Q, T, m, c)$, $L$ and $[y_{i\tau}]$, if one exists, is a Hamiltonian path. To achieve $L = |V^h|$, every node should be fully covered by the patrol, i.e., all attack pairs $(i, \tau)$ should be caught by the patrol. Note that, for every node $i$, the only admissible attack start time is $\tau = 0$. Therefore, there are $|V^h|$ attack pairs and, to achieve $L = |V^h|$, all of them should be interrupted. To interrupt attack pair $(i, 0)$, it is enough to visit node $i$ during the time interval $[0, T-1]$. Because the length of the time horizon is equal to the number of nodes, the patrol should visit each node exactly once; otherwise, at least one node will not be covered. Conversely, a Hamiltonian path of $G^h$, followed as a patrol, visits every node within the time horizon and attains $L = |V^h|$. Therefore, computing a solution for DP with $G = G(Q, T, m, c)$, $L$ and $[y_{i\tau}]$ provides a solution for the HP problem with $G^h$. Thus the HP problem can be reduced to DP. This proves that DP is NP-complete.
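The reduction can be checked by brute force on small graphs. The sketch below builds the DP instance described in the proof (T = |V^h|, m_i = |V^h|, all y and c equal to 1, L = |V^h|), enumerates all patrols, and compares the answer with a direct Hamiltonian path search. It assumes that a patrol is a walk in which the patroller either stays put or moves along an edge at each step, and that w_{iτ} equals 1 exactly when the patrol visits node i during the attack window; the function names and the two example graphs are hypothetical, and the enumeration is exponential and meant only as an illustration.

```python
from itertools import permutations


def intercepted(patrol, node, start, attack_time):
    """w_{i,tau}: 1 if the patrol is at `node` at some time in
    [start, start + attack_time - 1], else 0."""
    return 1 if node in patrol[start : start + attack_time] else 0


def dp_objective(patrol, nodes, horizon, m, y, c):
    """Sum over attack pairs (i, tau) of y * c * w, as in problem DP."""
    total = 0
    for i in nodes:
        for tau in range(horizon - m[i] + 1):
            w = intercepted(patrol, i, tau, m[i])
            total += y[i, tau] * c[i, tau + m[i] - 1] * w
    return total


def dp_has_solution(adj):
    """Build the DP instance of the reduction and search all patrols."""
    nodes = sorted(adj)
    n = len(nodes)
    horizon = n                                   # T = |V^h|
    m = {i: n for i in nodes}                     # m_i = |V^h|
    y = {(i, t): 1 for i in nodes for t in range(horizon)}
    c = {(i, t): 1 for i in nodes for t in range(horizon)}
    target = n                                    # L = |V^h|

    def extend(patrol):
        if len(patrol) == horizon:
            return dp_objective(patrol, nodes, horizon, m, y, c) >= target
        last = patrol[-1]
        # Assumption: the patroller may stay put or move along an edge.
        return any(extend(patrol + [nxt]) for nxt in sorted(adj[last] | {last}))

    return any(extend([start]) for start in nodes)


def has_hamiltonian_path(adj):
    """Direct check: some ordering of all nodes follows edges only."""
    return any(
        all(b in adj[a] for a, b in zip(order, order[1:]))
        for order in permutations(sorted(adj))
    )


# Hypothetical test graphs, given as adjacency sets.
PATH = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
STAR = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
```

On the path graph both checks succeed, and on the star graph both fail, in line with the equivalence used in the proof.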
