GCOBOL(1) | 1 (gcc cobol compiler) | GCOBOL(1) |
gcobol - GCC COBOL Front-end
gcobol [-D name[=value]] [-E] [-fdefaultbyte=value] [-fsyntax-only]
       [-I copybook-path] [-fmax-errors=nerror]
       [-nomain | -main filename | -main=filename | -main=filename:program-id]
       [-fcobol-exceptions exception[,exception...]] [-copyext ext]
       [-ffixed-form | -ffree-form] [-findicator-column] [-finternal-ebcdic]
       [-dialect dialect-name] [-include filename]
       [-preprocess preprocess-filter] [-fflex-debug] [-fyacc-debug]
       filename [...]
gcobol compiles COBOL source code to object code, and optionally produces an executable binary or shared object. As a GCC component, it accepts all options that affect code-generation and linking. Options specific to COBOL are listed below.
-main filename
    gcobol will generate a main() function as an entry point calling the first PROGRAM-ID in filename. -main is the default. When none of -nomain, -c, or -shared is present, an implicit -main is inserted into the command line ahead of the first source file name.
-main=filename
    Generate a main() entry point calling the first PROGRAM-ID in filename.
-main=filename:program-id
    Generate a main() entry point that calls the program-id entry point.
-nomain
    No main() entry point will be generated by this compilation. The -nomain option is incompatible with -main, and is implied by -shared. It is also implied by -c when there is no -main present.

    See below for examples showing the use of -main and -nomain.
-D name[=expr]
-E
    Unlike the C compiler, this option does not prevent compilation. To prevent compilation, use the option -fsyntax-only.
-fdefaultbyte=value
-fsyntax-only
-copyext ext
    The -copyext option extends the names searched to include ext. If ext is all uppercase or all lowercase, both forms are tried, with preference given to the one supplied. If ext is mixed-case, only that version is tried. For example, with -copyext .abc gcobol will add to possible names searched ‘name.abc’ and ‘name.ABC’ in that order.
-ffixed-form
-ffree-form
    A ‘*’ at the start of a line acts as a comment marker. Equivalent to -findicator-column 0.
-findicator-column
    Although reference format, strictly speaking, ignores data after column 72, with this option gcobol accepts long COBOL lines, sometimes known as extended source format. Text past column 72 is treated as ordinary COBOL text. (Line continuation remains in effect, however, provided no text appears past column 72.)

    There is no maximum line length. Regardless of source code format, the entire program could appear on one line.

    By default, gcobol auto-detects the source code format by examining the line that contains the text "program-id". When there are characters past column 72 on that line, the file is assumed to be in extended source format, with the indicator area in column 7. Otherwise, columns 1-6 are examined. If those characters are all digits or blanks, the file is assumed to be in fixed-form reference format, also with the indicator in column 7. If not auto-detected as fixed-form reference format or extended source format, the file is assumed to be in free-form reference format.
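The auto-detection rules above can be sketched in a few lines of Python. This is only an illustration of the documented heuristic, not gcobol's implementation: the function name is hypothetical, "characters past column 72" is assumed to mean non-blank characters, and falling back to free-form when no "program-id" line exists is also an assumption.

```python
def detect_source_format(lines):
    """Classify COBOL source as 'extended', 'fixed', or 'free'.

    Follows the documented heuristic: only the line containing
    "program-id" is examined.
    """
    for line in lines:
        text = line.rstrip("\r\n")
        if "program-id" not in text.lower():
            continue
        # Assumption: only non-blank text past column 72 counts.
        if text[72:].strip():
            return "extended"
        # Columns 1-6 are the sequence-number area in fixed form.
        if all(ch.isdigit() or ch == " " for ch in text[:6]):
            return "fixed"
        return "free"
    # Assumption: with no program-id line, fall back to free form.
    return "free"


if __name__ == "__main__":
    print(detect_source_format(["       PROGRAM-ID. HELLO."]))
    print(detect_source_format(["program-id. hello."]))
```

A line whose first six columns hold a sequence number such as 000100 is classified the same way as one whose first six columns are blank.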
-fcobol-exceptions exception[,exception...]
    The value of exception is a Level 1, 2, or 3 exception condition name, as described by ISO/IEC 1989:2023. ‘EC-ALL’ means enable all exceptions.

    The -fno-cobol-exceptions form turns off exception.

    Not all exception conditions are implemented. Any that are not produce a warning message.
-fmax-errors=nerror
    Without this option, gcobol attempts to recover from a syntax error by resuming compilation at the next statement, continuing until end-of-file. With it, gcobol counts the messages as they're produced, and stops when nerror is reached.
-fstatic-call, -fno-static-call
    With -fno-static-call, gcobol never uses static linking for CALL program. With -fstatic-call, if program is an alphanumeric literal, gcobol uses static linkage, meaning the compiler produces an external symbol program for the linker to resolve. (In the future, that will work with CONSTANT data items, too.) With static linkage, if program is not supplied by the source code module or another object file or library at build time, the linker will produce an “unresolved symbol” error. With -fno-static-call, gcobol always uses dynamic linking.

    This option affects the CALL statement for literals only. If program is a non-constant data item, it is always resolved using dynamic linking, with dlsym(3), because its value is determined at run time.
-dialect dialect-name
    gcobol accepts COBOL syntax as defined by ISO/IEC 1989:2023, with some extensions for backward compatibility with COBOL-85. To make the compiler more generally useful, some additional syntax is supported by this option.

    One value of dialect-name, ibm, is described below.

    Only a few such non-standard constructs are accepted, and gcobol makes no claim to emulate other compilers. But to the extent that a feature is popular but nonstandard, this option provides a way to support it, or add it.
-include filename
    If multiple -include options are given, the files are included in the order they appear on the command line.
-preprocess preprocess-filter
    To supply options to preprocess-filter, use a comma-separated string, similar to how linker options are supplied to -Wl. (Do not put any spaces after the commas, because the shell will treat a space as an option separator.) gcobol replaces each comma with a space when preprocess-filter is invoked. For example, -preprocess tee,output.cbl invokes tee with the argument output.cbl.

    gcobol searches the current working directory and the PATH environment variable directories for an executable file whose name matches preprocess-filter. The first one found is used. If none is found, an error is reported and the compiler is not invoked.

    The -preprocess option may appear more than once on the command line. Each preprocess-filter is applied in turn, in order of appearance.

The preprocess-filter should return a zero exit status, indicating success. If it returns a nonzero exit status, an error is reported and the compiler is not invoked.
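As a rough illustration of the -preprocess rules above, the following Python sketch splits the comma-separated option value into an argument vector and looks the filter up in the current working directory and then PATH. The function names are hypothetical, and this is not how gcobol itself is written; it only mirrors the documented behavior.

```python
import os
import shutil


def filter_argv(option_value):
    """Turn 'tee,output.cbl' into ['tee', 'output.cbl'].

    Each comma is treated as an argument separator.
    """
    return option_value.split(",")


def find_filter(name, cwd, path_value):
    """Search cwd first, then each PATH directory, for an executable."""
    dirs = [cwd] + [d for d in path_value.split(os.pathsep) if d]
    return shutil.which(name, path=os.pathsep.join(dirs))


if __name__ == "__main__":
    argv = filter_argv("tee,output.cbl")
    print(argv)
    print(find_filter(argv[0], os.getcwd(), os.environ.get("PATH", "")))
```

When find_filter returns None, the documented behavior is to report an error without invoking the compiler.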
-fflex-debug, -fyacc-debug
    The -fflex-debug option prints the tokenized input stream. The -fyacc-debug option shows the shift and reduce actions taken by the parser.

Compiling xyz.cob with -main and compiling it with no -main option are equivalent. The xyz.cob code is compiled and a main() function is inserted that calls the first PROGRAM-ID in the xyz.cob source file.

The -nomain option prevents a main() function from being generated by the gcobol compiler. A main() entry point must be present in the elsewhere.o module; without it the linker will report a “missing main” error.

A main() function is generated that calls the first PROGRAM-ID in the aaa.cob module.

With -main ccc.cob, the main() entry point calls the first PROGRAM-ID in ccc.cob.

When three modules are compiled separately, one compilation can supply a main() entry point that calls the first PROGRAM-ID in bbb. The fourth command links the three .o files into an a.out.
The -finternal-ebcdic option is useful when working with mainframe COBOL programs intended for EBCDIC-encoded files. With this option, while the COBOL text remains in ASCII, the character literals and field initial values produce EBCDIC strings in the compiled binary, and any character data read from a file are interpreted as EBCDIC data. The file data are not converted; rather, the file is assumed to use EBCDIC representation. String literals in the COBOL text are converted, so that they can be compared meaningfully with data in the file.
Only file data and character literals are affected. Data read from and written to the environment, or taken from the command line, are interpreted according to the locale(7) in force during execution. The same is true of ACCEPT and DISPLAY. Names known to the operating system, such as file names and the names of environment variables, are processed verbatim.
At the present time, this is an all-or-nothing setting. Support for USAGE and CODESET, which would allow conversion between encodings, remains a future goal.
See also Feature-set Variables, below.
Per ISO, an item that REDEFINES another may not be larger than the item it redefines, unless that item has LEVEL 01 and is not EXTERNAL. In gcobol, using -dialect ibm, this rule is relaxed for REDEFINES with USAGE POINTER whose redefined member is a 4-byte USAGE COMP-5 (usually PIC S9(8)), or vice-versa. In that case, the redefined member is re-sized to be 8 bytes, to accommodate the pointer. This feature allows pointer arithmetic on a 64-bit system with source code targeted at a 32-bit system.
See also Feature-set Variables, below.
gcobol is a gcc compiler, and follows gcc conventions where applicable. Sometimes those conventions (and user expectations) conflict with common Mainframe practice. Unless required of the compiler by the ISO specification, any such conflicts are resolved in favor of gcc.
Unlike C, the COBOL CALL statement implies dynamic linking, because the program it names may be a data item whose value is determined at run time. gcobol supports static linking where possible, unless defeated by -fno-static-call. If the parameter value is known at compile time, the compiler produces an external reference to be resolved by the linker. The referenced program is normally supplied via an object module, a static library, or a shared object. If it is not supplied, the linker will report an “unresolved symbol” error, either at build time or, if using a shared object, when the program is executed. This feature informs the programmer of the error at the earliest opportunity.

Programs that are expected to execute correctly in the presence of an unresolved symbol (perhaps because the program logic won't require that particular CALL) can use the -fno-static-call option. That forces all CALL statements to be resolved dynamically, at runtime.
By default, per ISO, no EC is enabled. Implemented ECs may be enabled on the command line or via the TURN directive. Any attempt to enable an EC that is not implemented is treated as an error.
An enabled EC not handled by a DECLARATIVE is written to the system log and to standard error. (The authors intend to make that an option.) A fatal EC not handled with RESUME ends with a call to abort(3) and process termination.
Not all Exception Conditions are implemented. Any attempt to enable an EC that is not implemented produces a warning message. The following are implemented:
As of this writing, no COBOL compiler documents a complete implementation of ISO/IEC 1989:2023 Exception Conditions. gcobol will give priority to those ECs that the user community deems most valuable.
Standard COBOL has no provision for environment variables as defined by Unix and Windows, or command-line arguments. gcobol supports them using syntax similar to that of GnuCOBOL. ISO and IBM also define incompatible ways to return the program's exit status to the operating system. gcobol supports IBM syntax.
To read an environment variable:

    ACCEPT target FROM ENVIRONMENT envar

where target is a data item defined in DATA DIVISION, and envar names an environment variable. envar may be a string literal or alphanumeric data item whose value is the name of an environment variable. The value of the named environment variable is moved to target. The rules are the same as for MOVE.

To write an environment variable:

    SET ENVIRONMENT envar TO source

where source is a data item defined in DATA DIVISION, and envar names an environment variable. envar again may be a string literal or alphanumeric data item whose value is the name of an environment variable. The value of the named environment variable is set to the value of source.
To read command-line arguments, use the registers COMMAND-LINE and COMMAND-LINE-COUNT in an ACCEPT statement (only). Used without a subscript, COMMAND-LINE returns the whole command line as a single string. With a subscript, COMMAND-LINE is a table of command-line arguments. For example, if the program is invoked as

    prog -i input output

then

    ACCEPT target FROM COMMAND-LINE(3)

moves input into target. The program name is the first thing in the whole command line and is found in COMMAND-LINE(1).

To discover how many arguments were provided on the command line, use

    ACCEPT target FROM COMMAND-LINE-COUNT
If ACCEPT refers to a nonexistent environment variable or command-line argument, the target is set to LOW-VALUES.
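The ACCEPT rules above can be modeled with a short Python sketch. It is an illustration under stated assumptions rather than a description of libgcobol: the function names are hypothetical, targets are treated as fixed-size alphanumeric items (left-justified, space-padded, truncated on the right, as for an alphanumeric MOVE), and the whole command line is assumed to be the arguments joined by single spaces.

```python
LOW_VALUE = "\x00"


def move_alnum(value, size):
    """Alphanumeric MOVE: truncate on the right, pad with spaces."""
    return value[:size].ljust(size)


def accept_from_environment(envar, size, environ):
    """Return the variable's value, or LOW-VALUES if it does not exist."""
    value = environ.get(envar)
    if value is None:
        return LOW_VALUE * size
    return move_alnum(value, size)


def accept_from_command_line(argv, size, index=None):
    """COMMAND-LINE subscripts start at 1, where 1 is the program name."""
    if index is None:
        return move_alnum(" ".join(argv), size)
    if not 1 <= index <= len(argv):
        return LOW_VALUE * size
    return move_alnum(argv[index - 1], size)


if __name__ == "__main__":
    argv = ["prog", "-i", "input", "output"]  # hypothetical invocation
    print(repr(accept_from_command_line(argv, 8, 3)))
    print(repr(accept_from_environment("NO_SUCH_VAR", 4, {})))
```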
The system command line parameters can also be accessed through the LINKAGE SECTION in the program where execution starts. The data structure looks like this:
    linkage section.
    01   argc pic 999.
    01   argv.
      02 argv-table occurs 1 to 100 times depending on argc.
        03 argv-element pointer.
    01   argv-string pic x(100).

    procedure division using by value argc by reference argv.
        set address of argv-string to argv-element(3)
        display argv-string
The parser accepts lines in the form
In the phrase
filename may appear in quotes or not. If quoted, it represents a filename as known to the operating system. If unquoted, it names either a data element or an environment variable containing the name of a file. If filename matches the name of a data element, that element is used. If not, resolution of filename is deferred until runtime, when the name must appear in the program's environment.
gcobol supports the following USAGE IS clauses:

gcobol supports ISO integer BINARY-<type> types, most of which alias COMP-5.
COMP-5 Picture | Compatible BINARY Type      | Bytes | Value
               | BINARY-CHAR [UNSIGNED]      | 1     | 0 to 255
S9(1...4)      | BINARY-CHAR SIGNED          | 1     | -128 to +127
9(1...4)       | BINARY-SHORT [UNSIGNED]     | 2     | 0 to 65535
S9(1...4)      | BINARY-SHORT SIGNED         | 2     | -32768 to +32767
9(5...9)       | BINARY-LONG [UNSIGNED]      | 4     | 0 to 4,294,967,295
S9(5...9)      | BINARY-LONG SIGNED          | 4     | -2,147,483,648 to +2,147,483,647
9(10...18)     | BINARY-LONG-LONG [UNSIGNED] | 8     | 0 to 18,446,744,073,709,551,615
S9(10...18)    | BINARY-LONG-LONG SIGNED     | 8     | -9,223,372,036,854,775,808 to +9,223,372,036,854,775,807
These define a size (in bytes) and cannot be used with a PICTURE clause. Per the ISO standard, SIGNED is the default for the BINARY-type aliases.
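The value ranges in the table follow directly from the byte sizes. A small Python sketch (the function name is hypothetical) computes them, assuming two's-complement signed integers:

```python
def binary_range(nbytes, signed):
    """Return (lowest, highest) for an integer of nbytes bytes."""
    bits = 8 * nbytes
    if signed:
        # Two's complement: one more negative value than positive.
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1


if __name__ == "__main__":
    for name, nbytes in (("BINARY-CHAR", 1), ("BINARY-SHORT", 2),
                         ("BINARY-LONG", 4), ("BINARY-LONG-LONG", 8)):
        print(name, binary_range(nbytes, False), binary_range(nbytes, True))
```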
All computation — both integer and floating point — is done using 128-bit intermediate forms.
In gcobol, DISPLAY UPON maps SYSOUT and STDOUT to standard output, and SYSPUNCH, SYSPCH and STDERR to standard error.

gcobol supports the ISO syntax for returning an exit status to the operating system.

In addition, gcobol also supports the IBM syntax for returning an exit status to the operating system. Use the RETURN-CODE register:

    MOVE ZERO TO RETURN-CODE.
    GOBACK.

The RETURN-CODE register is defined as a 4-byte binary integer.
The CDF should be used with caution because no comprehensive test suite has been identified.
gcobol looks first for an environment variable named copybook and, if found, uses the contents of that variable as the name of the copybook file. If that file does not exist, it continues looking for a file under a list of candidate names, tried in order. It looks first in the same directory as the source code file, and then in any copybook-path named with the -I option.

copybook-path may (like the shell's PATH variable) be a colon-separated list. The -I option may occur multiple times on the command line. Each successive copybook-path is concatenated to previous ones. Relative paths (having no leading ‘/’) are searched relative to the compiler's current working directory.

For example, -I /usr/local/include:include searches first /usr/local/include and then the include directory relative to the directory from which gcobol was invoked.
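The directory search order described above can be sketched in Python. This is an illustrative model, not the compiler's code: the function name and the example paths are hypothetical, and POSIX path rules are assumed.

```python
import posixpath


def copybook_dirs(source_file, include_options, cwd):
    """List the directories searched for a copybook, in order.

    source_file and relative -I entries are resolved against cwd,
    the compiler's current working directory.
    """
    # The directory of the source file is always searched first.
    dirs = [posixpath.dirname(posixpath.join(cwd, source_file))]
    # Each -I value may itself be a colon-separated list.
    for option in include_options:
        for entry in option.split(":"):
            if entry:
                dirs.append(posixpath.normpath(posixpath.join(cwd, entry)))
    return dirs


if __name__ == "__main__":
    print(copybook_dirs("src/prog.cbl",
                        ["/usr/local/include:include"],
                        "/home/user/project"))
```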
For the REPLACING phrase, both the modern pseudo-text and the COBOL/85 forms are recognized. (The older forms are used in the NIST CCVS/85 test suite.)
gcobol supports the full ISO REPLACE syntax.

>>IF ... >>END-IF
    Names may be defined with the -D command-line parameter. If true, the COBOL text text is compiled. If false, else-text, if present, is compiled. [IS [NOT]] DEFINED is supported. Boolean literals are not supported.

>>TURN ... CHECKING {[ON] [[WITH] LOCATION] | OFF}
    With LOCATION, gcobol reports at runtime the source filename and line number of the statement that triggered the exception condition.

Some command-line options affect CDF feature-set variables that are special to gcobol. They can be set and tested using >>DEFINE and >>IF, and are distinguished by a leading ‘%’ in the name, which is otherwise invalid in a COBOL identifier. They correspond to the options -finternal-ebcdic and -dialect ibm.

To set a feature-set variable, use

    >>SET feature [AS] {ON | OFF}

To test a feature-set variable, use

    >>IF feature DEFINED
gcobol implements all intrinsic functions defined by ISO/IEC 1989:2023, plus a few others. They are listed alphabetically below.
How the DISPLAY presents binary floating point numbers depends on the value.
When a value has six or fewer decimal digits to the left of the decimal point, it is expressed as 123456.789....
When a value is less than 1 and has no more than three zeroes to the right of the decimal point, it is expressed as 0.0001234....
Otherwise, exponential notation is used: 1.23456E+7.
In all cases, trailing zeroes on the right of the number are removed from the displayed value.
Those digit counts are consistent with the IEEE 754 requirements for information interchange. As one example, Wikipedia describes COMP-2 (binary64) values this way:
If an IEEE 754 double-precision number is converted to a decimal string with at least 17 significant digits, and then converted back to double-precision representation, the final result must match the original number.
17 digits was chosen so that the DISPLAY statement shows the contents of a COMP-2 variable without hiding any information.
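The 17-digit round-trip property quoted above is easy to check with any IEEE 754 binary64 implementation. The following Python sketch (Python floats are binary64; the function name is hypothetical) formats a value with 17 significant digits and converts it back:

```python
def round_trips(x, digits=17):
    """True if x survives conversion to a decimal string and back."""
    return float(format(x, ".%dg" % digits)) == x


if __name__ == "__main__":
    for value in (1.001, 55.11, 0.1 + 0.2, 123456.789):
        print(value, round_trips(value), round_trips(value, 15))
```

With fewer digits the property can fail: 0.1 + 0.2 formatted with 15 significant digits reads back as 0.3, which is a different binary64 value.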
During a MOVE statement, a floating-point value may be truncated. It will not be unusual for Numeric Display values to be altered when moved through a floating-point value.
This program:
    01 PICV999 PIC 9999V999.
    01 COMP2 COMP-2.
    PROCEDURE DIVISION.
        MOVE 1.001 to PICV999
        MOVE PICV999 TO COMP2
        DISPLAY "The result of MOVE " PICV999 " TO COMP2 is " COMP2
        MOVE COMP2 to PICV999
        DISPLAY "The result of MOVE COMP2 TO PICV999 is " PICV999

generates this result:

    The result of MOVE 0001.001 TO COMP2 is 1.00099999999999989
    The result of MOVE COMP2 TO PICV999 is 0001.000
However, the internal implementation can produce results that might seem surprising:

    The result of MOVE 0055.110 TO COMP2 is 55.1099999999999994
    The result of MOVE COMP2 TO PICV999 is 0055.110
The source of this inconsistency is the way gcobol stores and converts numbers. Converting the floating-point value to the numeric display value 0055110 is done by multiplying 55.109999... by 1,000 and then truncating the result to an integer. And it turns out that even though 55.11 can't be represented in floating-point as an exact value, the product of the multiplication, 55110, is an exact value.
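The multiply-then-truncate behavior can be reproduced with ordinary binary64 arithmetic. The Python sketch below is an illustration only: the function name is hypothetical, and it assumes the multiplication is carried out in binary64, which reproduces the results shown above but is not claimed to be gcobol's exact code path. The product is rounded to the nearest representable value before truncation, which is why 55.11 lands on 55110 while 1.001 falls just short of 1001.

```python
def to_picv999(x):
    """Scale by 1,000 and truncate, as for a MOVE to PIC 9999V999."""
    return int(x * 1000)


if __name__ == "__main__":
    for value in (1.001, 55.11):
        scaled = to_picv999(value)
        # Show the stored digits with the implied decimal point.
        print(value, "->", "%04d.%03d" % divmod(scaled, 1000))
```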
In cases where it is important for conversions to have predictable results, we need to be able to apply rounding, which can be done with an arithmetic statement:
    MOVE 1.001 to PICV999
    MOVE PICV999 TO COMP2
    DISPLAY "The result of MOVE " PICV999 " TO COMP2 is " COMP2
    MOVE COMP2 to PICV999
    DISPLAY "The result of MOVE COMP2 TO PICV999 is " PICV999
    ADD COMP2 to ZERO GIVING PICV999 ROUNDED
    DISPLAY "The result of ADD COMP2 to ZERO GIVING PICV999 ROUNDED is " PICV999

    The result of MOVE 0001.001 TO COMP2 is 1.00099999999999989
    The result of MOVE COMP2 TO PICV999 is 0001.000
    The result of ADD COMP2 to ZERO GIVING PICV999 ROUNDED is 0001.001
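The difference between truncation and ROUNDED for the value 1.001 can be modeled with Python's decimal module. This sketch works on the exact decimal expansion of the binary64 value rather than on gcobol's internal steps; the function name is hypothetical, and ROUND_HALF_UP is used on the assumption that ROUNDED means nearest, with ties away from zero.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP


def store_v999(x, rounded):
    """Store a binary64 value into three decimal places."""
    mode = ROUND_HALF_UP if rounded else ROUND_DOWN
    # Decimal(x) is the exact value of the float, e.g. 1.000999...
    return Decimal(x).quantize(Decimal("0.001"), rounding=mode)


if __name__ == "__main__":
    print(store_v999(1.001, rounded=False))
    print(store_v999(1.001, rounded=True))
```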
gcobol attempts to do internal computations using binary integers when possible. Thus, simple arithmetic between binary values and numeric display values concludes with binary intermediate results.
If a floating-point value gets included in the mix of variables specified for a calculation, then the intermediate result becomes a 128-bit floating-point value.
The cardinal rule when doing comparisons involving floating-point values: Never, ever, test for equality. It's just not worth the hassle.
For example:
    WORKING-STORAGE SECTION.
    01 COMP1 COMP-1 VALUE 555.11.
    01 COMP2 COMP-2 VALUE 555.11.
    PROCEDURE DIVISION.
        DISPLAY "COMPARE " COMP1 " with " COMP2
        IF COMP1 EQUAL COMP2
            DISPLAY "Equal"
        ELSE
            DISPLAY "Not equal"
        END-IF
        MOVE COMP1 to COMP2
        DISPLAY "COMPARE " COMP1 " with " COMP2
        IF COMP1 EQUAL COMP2
            DISPLAY "Equal"
        ELSE
            DISPLAY "Not equal"
        END-IF

the results:

    COMPARE 555.1099854 with 555.110000000000014
    Not equal
    COMPARE 555.1099854 with 555.1099853515625
    Equal
Why? Again, it has to do with the internals of gcobol. When differently sized floating-point values need to be compared, they are first converted to 128-bit floats. And it turns out that when a COMP1 is moved to a COMP2, and they are both converted to FLOAT-EXTENDED, the two resulting values are (probably) equal.
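The same effect can be seen outside COBOL. In this Python sketch, struct is used to round a value to IEEE 754 binary32 (the usual representation of COMP-1) and widen it back to binary64 (COMP-2); the helper name is hypothetical. Widening is exact, so the comparison succeeds only after the single-precision value has been copied into the double.

```python
import struct


def to_float32(x):
    """Round x to binary32 and return it widened back to binary64."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


if __name__ == "__main__":
    comp1 = to_float32(555.11)   # like COMP-1 VALUE 555.11
    comp2 = 555.11               # like COMP-2 VALUE 555.11
    print(comp1 == comp2)        # values differ past the 7th digit
    comp2 = comp1                # like MOVE COMP1 TO COMP2
    print(comp1 == comp2)
```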
Avoid testing for equality unless you really know what you are doing and you really test the code. And then avoid it anyway.
Finally, it is observably the case that the gcobol implementations of floating-point conversions and comparisons don't precisely match the behavior of other COBOL compilers.
You have been warned.
COBPATH
    Used by the gcobol runtime library, libgcobol.so, to locate shared objects. Like LD_LIBRARY_PATH, it may contain several directory names separated by a colon (‘:’). COBPATH is searched first, followed by LD_LIBRARY_PATH.

    Each directory is searched for files whose name ends in ‘.so’. For each such file, dlopen(3) is attempted, and, if successful, dlsym(3). No relationship is defined between the symbol's name and the filename.

    Without COBPATH, binaries produced by gcobol behave as one might expect of any program compiled with gcc. Any shared objects needed by the program are mentioned on the command line with a -l library option, and are found by following the executable's RPATH or otherwise per the configuration of the runtime linker, ld.so(8).
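The search order for COBPATH can be sketched as follows. The Python below only builds the ordered list of candidate files on which dlopen(3) would be attempted; it does not load anything. The function name is hypothetical, and sorting the files within a directory is an assumption, since the order within a directory is not specified above.

```python
import glob
import os


def candidate_shared_objects(environ):
    """List *.so files from COBPATH, then LD_LIBRARY_PATH, in order."""
    dirs = []
    for var in ("COBPATH", "LD_LIBRARY_PATH"):
        dirs += [d for d in environ.get(var, "").split(":") if d]
    found = []
    for directory in dirs:
        # Assumption: files within one directory are tried in sorted order.
        found += sorted(glob.glob(os.path.join(directory, "*.so")))
    return found


if __name__ == "__main__":
    for path in candidate_shared_objects(os.environ):
        print(path)
```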
UPSI
    In gcobol, the settings are denoted UPSI-0 through UPSI-7, where 0-7 indicates a bit position. The value of the UPSI switches is taken from the UPSI environment variable, whose value is a string of up to eight 1's and 0's. The first character represents the value of UPSI-0, and missing values are assigned 0. For example, UPSI=1000011 in the environment sets bits 0, 5, and 6 on, which means that UPSI-0, UPSI-5, and UPSI-6 are on.
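The mapping from the UPSI string to switch settings can be illustrated with a short Python sketch. The function name is hypothetical; the logic follows the rule that the first character is UPSI-0 and missing positions are 0.

```python
def parse_upsi(value):
    """Return a list of eight booleans for UPSI-0 through UPSI-7."""
    bits = [ch == "1" for ch in value[:8]]
    # Positions not supplied default to off.
    return bits + [False] * (8 - len(bits))


if __name__ == "__main__":
    switches = parse_upsi("1000011")
    print([i for i, on in enumerate(switches) if on])
```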
GCOBOL_TEMPDIR
    If the value of GCOBOL_TEMPDIR is just “/”, the effect is special: each copybook read is reported on standard error. This feature is meant to help diagnose mysterious copybook errors.

Executables produced by gcobol require the runtime support library libgcobol, which is provided both as a static library and as a shared object.
The ISO standard leaves the default file organization up to the implementation; in gcobol, the default is SEQUENTIAL.

Any ability to use files produced by other COBOL compilers, or for those compilers to use files produced by gcobol, is the product of luck and intuition. Various compilers interpret the ISO standard differently, and the standard's text is not always definitive.
For ORGANIZATION IS LINE SEQUENTIAL files (explicitly or by default), gcobol, absent specific direction, produces an ordinary Linux text file: for each WRITE, the data are written, followed by an ASCII NL (hex 0A) character. On READ, the record is read up to the size of the specified record or NL, whichever comes first. The NL is not included in the data brought into the record buffer; it serves only as an on-disk record-termination marker. Consequently, SEQUENTIAL and LINE SEQUENTIAL files work the same way: the COBOL program never sees the record terminator.

When READ and WRITE are used with ADVANCING, however, the game changes. If ADVANCING is used with LINE SEQUENTIAL files, it is honored by gcobol.
Other compilers may not do likewise. According to ISO, in WRITE (14.9.47.3 General rules) ADVANCING is ignored for files for which “the physical file does not support vertical positioning”. It further states that, in the absence of ADVANCING, WRITE proceeds as if “the user has specified AFTER ADVANCING 1 LINE”. Some other implementations interpret that to mean that the first WRITE to a LINE SEQUENTIAL file results in a leading NL on the first line, and no trailing NL on the last line. Some furthermore prohibit the use of ADVANCING with LINE SEQUENTIAL files.
The reference standard for gcobol is ISO/IEC 1989:2023.

If gcobol compiles code consistent with that standard, the resulting program should execute correctly; any other result is a bug.

If gcobol compiles code that does not comply with that standard, but runs correctly according to some other specification, that represents a non-standard extension. One day, the -pedantic option will produce diagnostic messages for such code.

If gcobol rejects code consistent with that standard, that represents an aspect of COBOL that is (or is not) on the To Do list. If you would like to see it compile, please get in touch with the developers.

Where gcobol passes 100% of the tests in a module, we exclude the (few) tests for obsolete features. The authors regard features that were obsolete in 1985 to be well and truly obsolete today, and did not implement them.
CCVS-85 modules not marked above with any status (CM and SG) are on the “hard maybe” list, meaning they await an interested party with real code using the feature.
gcobol does not implement Report Writer or Screen Section.

gcobol increasingly implements ISO/IEC 1989:2023. For example, DECLARATIVES is not tested by CCVS-85, but is implemented by gcobol. Similarly, Exception Conditions were not defined in 1985, and gcobol contains a growing number of them.
The authors are well aware that a complete, pure COBOL-85 compiler won't compile most existing COBOL code. Every vendor offered (and offers) extensions, and most environments rely on a variety of preprocessors and ancillary systems defined outside the standard. The express goal of adding an ISO COBOL front-end to GCC is to establish a foundation on which any needed extensions can be built.
COBOL, the language, may well be older than the reader. To the author's knowledge, free COBOL compilers first began to appear in 2000. Around that time an earlier COBOL for GCC project, cobolforgcc, met with some success, but was never officially merged into GCC.

This compiler, gcobol, was begun by COBOLworx in the fall of 2021. The project announced a complete implementation of the core language features in December 2022.
gcobol has been tested only on x64 and Apple M1 processors running Linux in 64-bit mode.

The I/O support of gcobol is not compatible with that of any other COBOL compiler. Enhancements to the I/O support will be readily available to the paying customer.

February 2025 | Linux |