Cristin-resultat-ID: 915144
Sist endret: 29. november 2013, 09:03
NVI-rapporteringsår: 2012
Resultat
Vitenskapelig artikkel
2013

Efficient local search on the GPU-Investigations on the vehicle routing problem

Bidragsytere:
  • Christian Ferdinand Schulz

Tidsskrift

Journal of Parallel and Distributed Computing
ISSN 0743-7315
e-ISSN 1096-0848
NVI-nivå 2

Om resultatet

Vitenskapelig artikkel
Publiseringsår: 2013
Publisert online: 2012
Trykket: 2013
Volum: 73
Hefte: 1
Sider: 14 - 31

Importkilder

Scopus-ID: 2-s2.0-84869496633
Isi-ID: 000311921300003

Klassifisering

Vitenskapsdisipliner

Informasjons- og kommunikasjonsvitenskap

Emneord

Optimering • GPGPU

Beskrivelse Beskrivelse

Tittel

Efficient local search on the GPU-Investigations on the vehicle routing problem

Sammendrag

We study how to implement local search efficiently on data parallel accelerators such as Graphics Processing Units. The Distance-constrained Capacitated Vehicle Routing Problem, a computationally very hard discrete optimization problem with high industrial relevance, is the selected vehicle for our investigations. More precisely, we investigate local search with the Best Improving strategy for the 2-opt and 3-opt operators on a giant tour representation. Resource extension functions are used for constant time move evaluation. Using CUDA, a basic implementation called The Benchmark Version has been developed and deployed on a Fermi architecture Graphics Processing Unit. Both neighborhood setup and evaluation are performed entirely on the device. The Benchmark Version is the initial step of an incremental improvement process where a number of important implementation aspects have been systematically studied. Ten well-known test instances from the literature are used in computational experiments, and profiling tools are used to identify bottlenecks. In the final version, the device is fully saturated, given a large enough problem instance. A speedup of almost an order of magnitude relative to The Benchmark Version is observed. We conclude that, with some effort, local search may be implemented very efficiently on Graphics Processing Units. Our experiments show that a maximum efficiency, however, requires a neighborhood cardinality of at least one million. Full exploration of a billion neighbors takes a few seconds and may be deemed too expensive with the current technology. Reduced neighborhoods through filtering is an obvious remedy. Experiments on simple models of neighborhood filtering indicate, however, that the speedup effect is limited on data parallel accelerators. We believe these insights are valuable in the design of new metaheuristics that fully utilize modern, heterogeneous processors.

Bidragsytere

Christian Ferdinand Schulz

  • Tilknyttet:
    Forfatter
    ved Mathematics and Cybernetics ved SINTEF AS
1 - 1 av 1