Glossary of Terms Used in FORTRAN Tuning Co-Guide

address: the numerical location of code or data in (virtual) memory (not normally visible to the Fortran programmer)

alias: a possible hidden dependency, as when 2 arrays overlap or may do so
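
A small C sketch makes the hazard concrete. The function name double_into is hypothetical. Called with overlapping arrays, the same loop gives a different answer than it does with separate arrays, so a C compiler must allow for overlap unless told otherwise. Fortran's argument rules generally let the compiler assume that a dummy argument which is modified does not overlap another one.

```c
#include <stddef.h>

/* Hypothetical helper: dst[i] = 2 * src[i].
   Without restrict, the compiler must give the answer this loop gives
   even when dst and src overlap, which limits reordering of loads/stores. */
void double_into(double *dst, const double *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = 2.0 * src[i];
}
```

With x = {1, 1, 1, 1}, the call double_into(x + 1, x, 3) leaves x = {1, 2, 4, 8}, because each store feeds the next load. With separate arrays every result would be 2.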

alignment: placing data or code at addresses which are multiples of 2 or of a higher power of 2 (typically the size of the data item, or of a cache line), to facilitate hardware performance
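
Here is a minimal C sketch of the usual power-of-2 arithmetic. It assumes the alignment is a power of 2. The helper names is_aligned and align_up are hypothetical. Because a power of 2 has a single bit set, align - 1 is a mask of the low-order bits that must be zero.

```c
#include <stdint.h>

/* Hypothetical helper: nonzero if addr is a multiple of align.
   Assumes align is a power of 2. */
int is_aligned(uintptr_t addr, uintptr_t align)
{
    return (addr & (align - 1)) == 0;
}

/* Hypothetical helper: round addr up to the next multiple of align
   (a power of 2) by adding align-1 and clearing the low-order bits. */
uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    return (addr + align - 1) & ~(align - 1);
}
```

For example, align_up(13, 8) is 16, and 72 is 8-byte aligned but not 16-byte aligned.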

alpha: an architecture originated by Digital Equipment as the successor to VAX; production rights were sold to Intel in 1997 in a package deal settling Digital's patent infringement suit against Intel; future development presumably depends on Compaq

branch: an instruction which may cause the order of execution of compiled instructions to depart from sequential

branch not taken: a conditional branch which does not cause the order of execution to depart from sequential; a common condition for forward branches (branches which skip ahead)

C9X: a future C standard, presented for public comment in 1998. Among other things, it aims to make C more competitive with FORTRAN. It includes control of floating point rounding modes, more standard math library functions, and the ability to declare non-aliased data areas (the restrict qualifier). In turn, some of the new features should show up in gnu and f2000.
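
The following C sketch shows the non-aliased declaration: the C9X restrict qualifier (standardized in C99) on a hypothetical axpy_restrict routine. The qualifiers are a promise made by the programmer, not something the compiler checks. Passing overlapping x and y would make the behavior undefined.

```c
#include <stddef.h>

/* Hypothetical routine: y := a*x + y.
   restrict promises that x and y do not overlap, the same assumption a
   Fortran compiler may generally make about dummy arguments, so the
   compiler is free to reorder or vectorize the loads and stores. */
void axpy_restrict(size_t n, double a,
                   const double *restrict x, double *restrict y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}
```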

cache: Most modern computers have a multiple-level system of progressively larger and slower data and program storage. Often there is a Level 1 cache, which may be on the CPU chip, with perhaps 8K bytes each of instruction and data cache, and a Level 2 cache of 256K to 4M bytes. The cache is set up to speed access to data which have been used recently, or which are located in main memory immediately following a previous access.

cache line: the block of data transferred into or out of cache

cache miss: when the data required by the program are not already in the cache, and must be copied from a slower place in the memory system

clean-up loop: supposing that a loop is unrolled so that it executes a fixed number of loop iterations in each pass, a clean-up loop is needed to take care of the remainder iterations
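
Here is a C sketch of a sum unrolled by 4 with its clean-up loop. The name sum_unrolled4 is hypothetical. The main loop handles the largest multiple of 4 iterations, and the clean-up loop handles the 0 to 3 left over.

```c
#include <stddef.h>

/* Hypothetical helper: sum of n values, unrolled by 4. */
long sum_unrolled4(const long *a, size_t n)
{
    long s = 0;
    size_t i = 0;
    size_t n4 = n - n % 4;      /* largest multiple of 4 not above n */
    for (; i < n4; i += 4)      /* unrolled main loop: 4 iterations per pass */
        s += a[i] + a[i + 1] + a[i + 2] + a[i + 3];
    for (; i < n; i++)          /* clean-up loop: the 0..3 leftover iterations */
        s += a[i];
    return s;
}
```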

Cray, Seymour: the late founder of a once famous company, now owned by SGI, which developed the vector super-computer market and much of the FORTRAN compiler optimization and f90 implementation techniques; before that, involved at Control Data in originating many modern floating point hardware implementation techniques

directive: a comment line which contains suggestions to the compiler, e.g.:
CDIR$ IVDEP (Cray style: ignore possible aliasing dependencies)
C*$* UNROLL(3) (Kuck style: unroll next loop by 3)

direct mapped cache: one where the location of data in cache is determined uniquely by their storage location in main memory; may be fast, but prone to conflicts when several active cache lines map to the same small part of the cache and knock each other out of cache
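
Below is a toy direct mapped cache simulator in C. It assumes a hypothetical cache of 4 lines of 64 bytes, far smaller than any real cache, to keep the arithmetic visible. The name dm_misses is made up. Each memory line has exactly one slot, so two addresses 256 bytes apart fight over the same slot even while the rest of the cache sits empty.

```c
#include <stddef.h>
#include <stdint.h>

#define DM_LINE_BYTES 64   /* hypothetical cache line size */
#define DM_NUM_LINES  4    /* hypothetical (tiny) number of cache lines */

/* Hypothetical helper: count misses for a sequence of byte addresses
   in a direct mapped cache that starts empty. */
size_t dm_misses(const uintptr_t *addr, size_t n)
{
    uintptr_t tag[DM_NUM_LINES];
    int valid[DM_NUM_LINES] = {0};
    size_t misses = 0;
    for (size_t k = 0; k < n; k++) {
        uintptr_t line = addr[k] / DM_LINE_BYTES;  /* memory line number */
        size_t slot = line % DM_NUM_LINES;         /* the only slot it can use */
        if (!valid[slot] || tag[slot] != line) {   /* miss: load, evicting old line */
            misses++;
            valid[slot] = 1;
            tag[slot] = line;
        }
    }
    return misses;
}
```

Alternating between addresses 0 and 256 misses on every access. Alternating between 0 and 64 misses only on the first touch of each line.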

dynamic: something which changes during the course of execution of a program, according to the data being analyzed or the subroutines currently activated

egcs (experimental gnu compiler system): a compiler suite including gcc and g77 development versions, with open participation in development

EPC (Edinburgh Portable Compilers): a company prominent in f90 compilers

extended precision: a floating point format which carries extra width in both mantissa and exponent, in order to protect against overflow/underflow of intermediate results or loss of precision. IEEE double precision serves as extended single, although the minimum total width IEEE requires for single extended is 43 bits, not 64. Intel, and that endangered species, the 68k series, have an 80-bit double extended format.
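
This C sketch shows why extra width in intermediate results matters. The function names are hypothetical. Adding 1e-8 to 1.0 a thousand times leaves a float sum at exactly 1.0, because each increment is less than half a unit in the last place of a float near 1.0. A double sum reaches about 1.00001.

```c
#include <stddef.h>

/* Hypothetical helper: running sum kept in float.  volatile forces each
   sum to be stored, and so rounded to float width, even on x87-style
   hardware that would otherwise keep extra precision in registers. */
float accumulate_float(float inc, size_t times)
{
    volatile float s = 1.0f;
    for (size_t i = 0; i < times; i++)
        s = s + inc;
    return s;
}

/* Hypothetical helper: the same running sum kept in double. */
double accumulate_double(double inc, size_t times)
{
    volatile double s = 1.0;
    for (size_t i = 0; i < times; i++)
        s = s + inc;
    return s;
}
```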

f2c: a compiler/translator, derived from the original Bell Labs f77 compiler, which translates f77 input to C output

f66: the original (10 years late) standard for the FORTRAN programming language

f77: 1978 ANSI/1980 ISO standard FORTRAN, superseded by f90

f90: short-lived standard, superseded by f95 while still immature

f95: standard issued at the end of 1997

fused MAC: a combined floating point multiply and add or subtract instruction, often implemented without intermediate rounding

gnu: a family of Unix-like software sponsored by the Free Software Foundation

g77: an f77 compiler, originally part of the gnu project of the Free Software Foundation; also part of the egcs development system; the most widely available FORTRAN

HP-PA: a series of computer architectures developed by HP

HPUX: Hewlett-Packard version of Unix operating system

IEEE: Institute of Electrical and Electronics Engineers: among its many functions, a computer hardware standards making organization

Instruction Level Parallelism: the ability of a processor to work on more than one instruction of a single instruction stream at a time by overlapping instructions, typically by a combination of issuing multiple instructions per clock cycle and letting instructions continue to execute while subsequent instructions are issued, with hardware synchronization to stall dependent instructions until "resources" are available. The degree of parallelism may be characterized in terms such as the number of independent arithmetic units times latency divided by the rate of instruction execution.
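
One common way to expose instruction level parallelism to the hardware is to split a reduction into several independent chains. Here is a C sketch of that technique. The names are hypothetical, and 4 chains is an arbitrary choice; the useful number depends on add latency and the number of adders.

```c
#include <stddef.h>

/* Hypothetical helper: one accumulator.  Each addition must wait for the
   previous one to finish, so speed is limited by floating point add latency. */
double sum_one_chain(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Hypothetical helper: four independent accumulators, so up to four
   additions can be in the pipeline at once.  This changes the order of
   rounding, so for general data the result may differ in the last bits. */
double sum_four_chains(const double *a, size_t n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; i++)          /* clean-up for n not a multiple of 4 */
        s0 += a[i];
    return (s0 + s1) + (s2 + s3);
}
```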

Intel: the earliest and biggest player in microprocessor design and manufacture; also a series of architectures designed in accordance with IEEE extended precision, with a so-called Complex Instruction Set: a small number of program-accessible registers and a large number of specialized operations and immediate constants (contained in the instruction)

Kuck, David: founder of a company known for selling optimizers and optimizing pre-processors for FORTRAN and C compilers

latency: number of clock cycles required for an operation, from when the operands are available, until the results are available for another operation

linux: a low-priced family of somewhat Unix-like operating systems; often refers to the Intel-only version

Livermore: a USA government-sponsored laboratory (in Livermore, CA) known for a benchmark which tests the execution speed of various "kernel" loops.

loop fission: splitting a loop into 2 or more loops

loop fusion: combining 2 or more loops
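
Here is a C sketch of the same computation before and after fusion. The function names are hypothetical, and a, b and c are assumed not to overlap. Read in reverse, the same pair illustrates fission. Fusion saves a second pass through memory over b. Fission can help when a single loop body needs more registers than the machine has.

```c
#include <stddef.h>

/* Hypothetical helper, fissioned form: b is written in one pass over
   memory and read back in a second pass. */
void two_loops(size_t n, const double *a, double *b, double *c)
{
    for (size_t i = 0; i < n; i++)
        b[i] = a[i] * 2.0;
    for (size_t i = 0; i < n; i++)
        c[i] = b[i] + 1.0;
}

/* Hypothetical helper, fused form: b[i] is reused while still in a
   register or cache.  Legal here because iteration i of the second loop
   needs only b[i], which iteration i of the first loop produced. */
void fused_loop(size_t n, const double *a, double *b, double *c)
{
    for (size_t i = 0; i < n; i++) {
        b[i] = a[i] * 2.0;
        c[i] = b[i] + 1.0;
    }
}
```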

MIPS: a computer architecture owned by Silicon Graphics Inc

NT: a Microsoft operating system, oriented toward networks of single-user computers, somewhat portable but with abortive support on non-Intel architectures

optimization: result of analysis by a compiler to determine an efficient instruction sequence to implement a program

out-of-order execution: ability of a processor to move ahead and execute instructions even though some instructions are being stalled until dependencies are resolved

pipelining: the characteristic of being able to begin the execution of instructions before previous instructions have completed

PowerPC: an architecture sponsored by IBM, Motorola, and Apple

predication: the practice of calculating a result speculatively and keeping it in reserve until the program has determined whether to use or discard it
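
The following is a source-level C sketch of the idea. The name abs_diff_select is hypothetical. Both candidate results are computed, and a mask derived from the comparison selects one, with no branch. Whether a given compiler turns this, or a plain if/else, into predicated or conditional-move instructions depends on the compiler and target. The sketch assumes a - b and b - a do not overflow.

```c
/* Hypothetical helper: |a - b| computed without a branch. */
int abs_diff_select(int a, int b)
{
    int d1 = a - b;            /* result kept in reserve for a >= b */
    int d2 = b - a;            /* result kept in reserve for a <  b */
    int mask = -(a >= b);      /* all ones if a >= b, else 0 */
    return (d1 & mask) | (d2 & ~mask);
}
```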

pre-fetch: the practice of initiating data transfer in anticipation of later requirements -- a very old scheme as applied to reading sequential data files, now used in filling cache
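
This C sketch shows software pre-fetch using the GCC/Clang extension __builtin_prefetch, which is not standard C; other compilers have directives or intrinsics of their own. The prefetch distance of 16 elements is an arbitrary assumption that would need tuning for each machine. The function name is hypothetical.

```c
#include <stddef.h>

#define PREFETCH_AHEAD 16   /* hypothetical distance in elements; tune per machine */

/* Hypothetical helper: sum an array while asking the hardware to start
   fetching data PREFETCH_AHEAD elements ahead of use.  The prefetch is
   only a hint and never changes the result. */
double sum_with_prefetch(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n)   /* never form a pointer past the array */
            __builtin_prefetch(&a[i + PREFETCH_AHEAD], 0, 3);  /* 0 = read, 3 = high locality */
        s += a[i];
    }
    return s;
}
```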

ratfor: a translator which converts ratfor, a semi-structured sort of cross between Fortran and C, into Fortran.

recursion: a sequence of operations which depends on the results of predecessor sequences, i.e. the results of one loop iteration are needed in the next
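
Here is a first-order linear recurrence in C, as a sketch. The name linear_recurrence is hypothetical. Each x[i] needs x[i-1], so the loop cannot be vectorized in the straightforward way.

```c
#include <stddef.h>

/* Hypothetical helper: x[i] = a*x[i-1] + b[i] for i = 1..n-1.
   x[0] must be set by the caller; b[0] is not used. */
void linear_recurrence(size_t n, double a, const double *b, double *x)
{
    for (size_t i = 1; i < n; i++)
        x[i] = a * x[i - 1] + b[i];
}
```

With x[0] = 1, a = 2 and every b[i] = 1, the results are 1, 3, 7, 15.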

register: fast temporary data storage location, directly accessible to the corresponding arithmetic processor

register remapping: ability of a processor to find another register to substitute for one specified by compiled code, to avoid stalling while that register is busy

set associative cache: a cache which has several possible locations in cache for a given cache line from main memory, to reduce the frequency of cache mapping conflicts
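
For contrast with a direct mapped cache, here is a toy 2-way set associative simulator in C with least-recently-used replacement. It assumes a hypothetical cache of 2 sets of 2 lines of 64 bytes each, for a total of 4 lines. The name sa_misses is made up.

```c
#include <stddef.h>
#include <stdint.h>

#define SA_LINE_BYTES 64   /* hypothetical cache line size */
#define SA_NUM_SETS   2    /* hypothetical number of sets */
#define SA_WAYS       2    /* lines per set (2-way set associative) */

/* Hypothetical helper: count misses in a set associative cache that
   starts empty, with least-recently-used replacement within each set. */
size_t sa_misses(const uintptr_t *addr, size_t n)
{
    uintptr_t tag[SA_NUM_SETS][SA_WAYS];
    int valid[SA_NUM_SETS][SA_WAYS] = {{0}};
    size_t last_use[SA_NUM_SETS][SA_WAYS] = {{0}};
    size_t misses = 0;
    for (size_t k = 0; k < n; k++) {
        uintptr_t line = addr[k] / SA_LINE_BYTES;
        size_t set = line % SA_NUM_SETS;   /* line may go in any way of this set */
        int hit = -1;
        for (int w = 0; w < SA_WAYS; w++)
            if (valid[set][w] && tag[set][w] == line)
                hit = w;
        if (hit < 0) {                     /* miss: use an empty way, else evict LRU */
            misses++;
            int victim = 0;
            for (int w = 0; w < SA_WAYS; w++) {
                if (!valid[set][w]) { victim = w; break; }
                if (last_use[set][w] < last_use[set][victim]) victim = w;
            }
            valid[set][victim] = 1;
            tag[set][victim] = line;
            hit = victim;
        }
        last_use[set][hit] = k + 1;        /* time of most recent use */
    }
    return misses;
}
```

Alternating between addresses 0 and 256 now misses only twice, since both lines fit in their set. Three lines competing for one set (0, 128, 256, then 0 again) still miss every time.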

shadow register: a register which is assigned dynamically to substitute for one specified by a compiled program; a way for hardware to resolve dependencies arising from a program trying to use a register for more than one purpose

shift: a low-level integer instruction which moves the bits over a specified number of positions; under certain circumstances, equivalent to multiplication or division by a power of 2. In some architectures, there are combined shift and add instructions to assist in low-level expansion of multiplication by a constant.
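
Here is a C sketch of both uses. The function names are hypothetical. Unsigned arithmetic is used so that every shift is well defined.

```c
/* Hypothetical helper: 10*x expanded as 8*x + 2*x with shifts and an add. */
unsigned times10(unsigned x)
{
    return (x << 3) + (x << 1);
}

/* Hypothetical helper: x / 8 as a right shift; exact for unsigned x.
   For negative signed values a right shift usually rounds toward minus
   infinity while C division truncates toward zero, e.g. -7 / 2 == -3
   but -7 >> 1 is usually -4, so the two are not interchangeable there. */
unsigned div8(unsigned x)
{
    return x >> 3;
}
```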

SPARC: a computer architecture sponsored by Sun Microsystems

speculative execution: executing code before determining whether the result will be used

spill: when a program wants to use more registers than are available, it has to store some temporarily in memory; particularly burdensome with out-of-order execution and register remapping

static: something which is a fixed feature of a program, not depending on what data set is executed

stride: size of increment in array subscripts, measured in storage units; since Fortran stores arrays column by column, incrementing the second subscript by one gives a stride of the size of the first dimension
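
This C sketch shows the column-major address arithmetic behind this. The helper fortran_offset is hypothetical. For an array declared A(LDA,*), stepping i moves one element and stepping j moves LDA elements, which is why the innermost DO loop should normally run over the first subscript.

```c
#include <stddef.h>

/* Hypothetical helper: offset, in elements, of Fortran element A(i,j)
   of an array declared A(LDA,*), using 1-based subscripts.
   i has stride 1; j has stride lda. */
size_t fortran_offset(size_t i, size_t j, size_t lda)
{
    return (i - 1) + (j - 1) * lda;
}
```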

struct: a translator which takes f66 input and makes ratfor output; part of Unix V7; may be used to restructure obsolescent style Fortran code

tlb (Translation Look-aside Buffer): largely mysterious to us FORTRAN programmers, a small hardware table which caches recent translations from virtual to physical memory addresses (one entry per page), backed by page tables kept by the operating system; a potential source of serious performance problems when a program touches more pages at once than the tlb has entries for, as with large strides through a big array

Unix: a family of portable operating systems, written mostly in C, originated at Bell Labs

unrolling: a technique employed by the programmer or the compiler, where multiple copies of a section of code are written out; in the simplest case, rather than a loop which executes 2 or 3 times, write out straight-line code
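
Here is a C sketch of the simplest case. The names are hypothetical. It shows a 3-iteration loop and the same work as straight-line code, with no loop counter or branch. The additions are done in the same order.

```c
/* Hypothetical helper: 3-iteration loop with a counter and a branch. */
double dot3_loop(const double *x, const double *y)
{
    double s = 0.0;
    for (int k = 0; k < 3; k++)
        s += x[k] * y[k];
    return s;
}

/* Hypothetical helper: the same computation fully unrolled. */
double dot3_unrolled(const double *x, const double *y)
{
    return x[0] * y[0] + x[1] * y[1] + x[2] * y[2];
}
```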

VAX: a computer architecture, now obsolescent, originated by Digital Equipment Corporation

vectorizable: a sequence of identical operations on strings of data, with no dependency of any sequence on its predecessor; terminology from the Cray 1 super-computer era; resembles the criteria for f90 array assignments.
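
Below is a C sketch of a vectorizable loop, equivalent to the f90 array assignment C = A + S*B. The name add_scaled is hypothetical. No iteration depends on another. In C the compiler must also be able to rule out overlap between c and the inputs, or check for it at run time, before it vectorizes.

```c
#include <stddef.h>

/* Hypothetical helper: c(i) = a(i) + s*b(i) for every i.  Each iteration
   is independent of the others, unlike a recurrence such as
   x[i] = a*x[i-1] + b[i], where each iteration needs its predecessor. */
void add_scaled(size_t n, double s, const double *a,
                const double *b, double *c)
{
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + s * b[i];
}
```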