In the field of computer architecture, speedup is a metric for the relative performance improvement obtained when executing a task. The notion of speedup was established by Amdahl's law, which was formulated in the context of parallel processing, but speedup can be used more generally to show the effect of any performance enhancement.
Speedup can be defined for two different types of quantities: throughput and latency.[1] Throughput is given in the general form of completions per unit of time; in computer architecture the common throughput metric is instructions per cycle, denoted IPC. Its reciprocal, cycles per instruction or CPI, is a latency quantity, because it measures the length of time between successive completions: a processor averaging an IPC of 2, for example, has a CPI of 1/2 = 0.5. Speedup is defined differently for each type of quantity so that the metric remains consistent. One of the most common measurements in computer architecture, the execution time of a program, can be seen as a latency quantity because its units are seconds per program.
For latency values, speedup is defined by the following formula:[2]

    S = L_old / L_new

where:

- S is the resultant speedup;
- L_old is the latency (for example, the execution time) without the enhancement;
- L_new is the latency with the enhancement.
For throughput values, which are also called performance quantities, the enhanced value goes in the numerator and the original value in the denominator:[3]

    S = Q_new / Q_old

Notice that speedup is a unitless quantity: the units cancel because it is a relative quantity comparing two specific instances of execution. Speedup is only meaningful when both measurements are taken on the same system, differing only in the enhancement being tested.
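As a minimal sketch of the two definitions (the function names below are ours for illustration, not standard terminology):

```python
import math

def speedup_from_latency(l_old, l_new):
    """Speedup from latency quantities (execution time, CPI, ...):
    the original value goes in the numerator."""
    return l_old / l_new

def speedup_from_throughput(q_old, q_new):
    """Speedup from throughput quantities (IPC, ...):
    the enhanced value goes in the numerator."""
    return q_new / q_old

# Throughput is the reciprocal of latency, so the two forms agree:
assert math.isclose(speedup_from_latency(2.25, 1.50),
                    speedup_from_throughput(1 / 2.25, 1 / 1.50))
```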
In the parallel case, the speedup attainable from parallelizing part of a task is given by Amdahl's law.
As an example, suppose we are testing the effectiveness of a branch predictor on the execution of a program. First, we execute the program with the standard branch predictor on the processor, which yields an execution time of 2.25 seconds. Next, we execute the program with our modified (and hopefully improved) branch predictor on the same processor, which produces an execution time of 1.50 seconds. Since execution time is a latency quantity, the speedup formula gives

    S = 2.25 s / 1.50 s = 1.5
Our new branch predictor has provided a 1.5x speedup over the original.
We have the same situation as above, but we are measuring CPI instead. First, we execute the program with the standard branch predictor, which yields a CPI of 3. Next, we execute the program with our modified branch predictor, which yields a CPI of 2. CPI is also a latency quantity, so

    S = 3 / 2 = 1.5
We achieve the same 1.5x speedup, though we measured different quantities.
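Both worked examples reduce to the same arithmetic; a quick check in Python:

```python
# Execution time is a latency quantity: original over enhanced.
print(2.25 / 1.50)  # 1.5

# CPI is also a latency quantity, so the same rule applies.
print(3 / 2)        # 1.5
```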
Let S_p be the speedup when using p processors. Linear speedup or ideal speedup is obtained when S_p = p. When running an algorithm with linear speedup, doubling the number of processors doubles the speed. As this is ideal, it is considered very good scalability.
Efficiency is a performance metric defined as

    E_p = S_p / p
It is a value, typically between zero and one, estimating how well-utilized the processors are in solving the problem, compared to how much effort is wasted in communication and synchronization. Algorithms with linear speedup and algorithms running on a single processor have an efficiency of 1, while many difficult-to-parallelize algorithms have efficiency such as 1/ln(p)[citation needed] that approaches zero as the number of processors increases.
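As an illustrative sketch, efficiency can be tabulated for a linearly scaling algorithm and for one whose speedup grows only as p/ln(p), the decay rate used as the example above:

```python
import math

def efficiency(speedup, p):
    """E_p = S_p / p for p processors."""
    return speedup / p

for p in (4, 16, 64, 256):
    s_linear = p              # ideal speedup: S_p = p, so E_p = 1
    s_slow = p / math.log(p)  # speedup whose efficiency is 1/ln(p)
    print(p, efficiency(s_linear, p), round(efficiency(s_slow, p), 3))
```

The efficiency column for the second algorithm is exactly 1/ln(p), which tends to zero as p grows.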
In engineering contexts, efficiency curves are more often used for graphs than speedup curves, since

- all of the area in the graph is useful (whereas in speedup curves half of the space is wasted);
- it is easy to see how well the improvement of the processing is working;
- there is no need to plot a "perfect speedup" curve.
In marketing contexts, speedup curves are more often used, largely because they go up and to the right and thus appear better to the less-informed.
Sometimes a speedup of more than p when using p processors is observed in parallel computing, which is called super-linear speedup. Super-linear speedup rarely happens and often confuses beginners, who believe the theoretical maximum speedup should be p when p processors are used.
One possible reason for super-linear speedup is the cache effect resulting from the memory hierarchies of a modern computer: in parallel computing, not only does the number of processors change, but so does the size of the accumulated cache contributed by the different processors. With the larger accumulated cache size, more of (or even all of) the working set can fit into caches, and memory access time drops dramatically, which causes extra speedup in addition to that from the actual computation.[4]
An analogous situation occurs when searching large datasets, such as the genomic data searched by BLAST implementations. There the accumulated RAM from each of the nodes in a cluster enables the dataset to move from disk into RAM, thereby drastically reducing the time required by, for example, mpiBLAST to search it.
Super-linear speedups can also occur when performing backtracking in parallel: an exception in one thread can cause several other threads to backtrack early, before they reach the exception themselves.[citation needed]
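The cache effect above can be illustrated with a deliberately crude two-level memory model; every number here is invented for illustration, not measured:

```python
def run_time_ns(p, working_set_mb, cache_per_proc_mb=8,
                cache_ns=1, dram_ns=100, accesses=10**8):
    """Toy model: work splits evenly across p processors, and an access
    is fast only once the aggregate cache holds the whole working set."""
    fits_in_cache = working_set_mb <= p * cache_per_proc_mb
    ns_per_access = cache_ns if fits_in_cache else dram_ns
    return (accesses / p) * ns_per_access

t1 = run_time_ns(1, working_set_mb=32)
for p in (2, 4, 8):
    print(p, t1 / run_time_ns(p, working_set_mb=32))
# At p = 4 the aggregate cache (4 x 8 MB) finally covers the 32 MB
# working set, and the modeled speedup jumps to 400, far above p.
```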