analyses
The analyses argument is the name of the file specifying the analyses to
perform. See section ANALYSES for details.
Simrisc was originally designed around 2010 by Marcel Greuter at the University Medical Center Groningen, and thereafter modified in 2015 by Chris de Jonge.
Short options are provided between parentheses, immediately following
their long option equivalents. Several parameters specify the path-names of
files produced by simrisc. If a path-name starts with a tilde character (~) then
the tilde is replaced by the user's home directory. An initial + is replaced
by the program's base directory (see option base). When an analysis uses
multiple iterations then `$' characters in filename specifications are
replaced by the analysis' iteration index.
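The substitution rules above can be sketched in a few lines of Python. This is only an illustration of the described behavior, not simrisc's own code: the function name expand_filename and its parameters are hypothetical, and the base directory is passed in explicitly.

```python
from pathlib import Path


def expand_filename(spec: str, base_dir: str, iteration: int) -> str:
    """Expand a filename specification following the rules described above.

    Hypothetical helper: simrisc performs these substitutions internally.
    """
    if spec.startswith("~"):
        # An initial tilde is replaced by the user's home directory.
        spec = str(Path.home()) + spec[1:]
    elif spec.startswith("+"):
        # An initial + is replaced by the program's base directory.
        spec = base_dir.rstrip("/") + "/" + spec[1:].lstrip("/")
    # Every $ is replaced by the analysis' iteration index.
    return spec.replace("$", str(iteration))


print(expand_filename("+data-$.txt", "/tmp/run", 2))   # /tmp/run/data-2.txt
```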
All single-letter options referring to filesystem entries (directories, filenames) are capitalized, all other single-letter options are lowercase.
basedir (-B)./. If basedir doesn't exist it is created by the program. If
the directory cannot be created an exception is thrown, terminating
the program. A basedir specification may specify a relative or
an absolute directory location;
type (-c)type) can be specified as breast to
perform breast cancer simulations. Breast cancer simulations are
performed by default when the --cancer option is not
specified. Alternatively, to perform lung cancer simulations type
must be specified as either male or female, simulating male or
female cases, respectively.
Be advised that the default configuration file specifies Screening
Mammo rounds, which must be changed to CT either in locally used
configuration files or in Analysis: sections (see section
ANALYSES below);
path (-C)~/.config/simrisc' is used;
path (-D)! (mnemonic: the logical not operator, i.e.,
--data !). See section OUTPUT for a description of the
generated data;
age (-a)tumor-age, and is mutually
exclusive with the case option;
nCases (-l)nCases cases have been analyzed and only
write the data for the final case to the data file. The rounds and
sensitivity files contain the summarized results of all nCases
analyzed cases;
begin end fname [set]begin thru end on the file
fname. This includes generated random values. Random values
generated in several contexts can be suppressed. Specify
Note that begin end defines an inclusive range. To log the process
for a single case (e.g. case 100) ignoring the L, U, and V generated
values specify --log "100 100 fname LUV". Also note that only
a single argument is passed to --log. Therefore its argument
must be surrounded by quotes;
label
specifications. See section ANALYSES for details;
path (-P)--base (-B) option was specified then path is written in the
base directory if path does not contain a slash (/) (use
./path to write the parameters file in the current directory if
--base was specified);
path (-R)! (i.e., --rounds !). See section
OUTPUT for a description of the generated summary info;
path (-s)spread: true is specified (default:
'<base>/spread-$.txt'). If this file should not be written specify
! (mnemonic: the logical not operator, i.e., --spread !). If
a parameter doesn't use spreading then the 'using' part is
omitted. See section OUTPUT for a sample of its content;
path (-S)! (i.e.,
--sensitivity !). See section OUTPUT for a description of the
produced sensitivity summary;
TNM showing the (0-based)
TNM categories for cases having developed tumors (see also the
simriscparams(7) man-page). By default this column remains empty
for cases not having developed tumors, which may be inconvenient when
processing the data (e.g., to perform statistical analyses). When
specifying the --tnm option cases not having developed tumors
receive TNM column entries -1,0 to avoid missing data;
age (-t)death-age, and is mutually
exclusive with the case option;
Unless the --one-analysis option is used the program's first and only
required argument is the name of a file providing the details of the analyses
to perform. These files are called analysis files. They must be
standard ASCII text files, i.e., they can only contain 7-bit ASCII printable
and white-space characters. Identifiers used in analysis files and in
configuration files are interpreted case sensitively.
Configuration specifications starting with uppercase letters (like
Scenario: and Costs:) specify (sub)sections and don't contain
additional specifications. Specifications starting with lowercase letters
(like ageGroup:) are followed by actual parameter values. For a complete
overview refer to the simriscparams(7) man-page.
Analysis files may define multiple analyses. Each analysis specification must begin with a line containing
Analysis:
At each Analysis: specification the program's initial configuration is
reset.
Options specified on the command-line cannot be specified in
Analysis: sections and remain active while simrisc is running. The default
option values are reset at each separate Analysis: unless an option has
been specified on the command-line, in which case those option values are used
throughout the simrisc run.
Analysis: lines are followed by the characteristics of the analysis,
which can be specified for each Analysis: specification in the
following order:
label: lines, when used, must immediately
follow Analysis: lines. The text following label: is
written at the top of the output files;
base: /tmp/
last-case: 20
All specifications in Analysis: sections are optional. An
Analysis: section merely containing the line Analysis: defines
an analysis using the explicitly specified command-line options or the default
program options and using the parameter specifications provided in the
configuration file.
Empty lines, initial and trailing white-space, and all characters on lines
starting at the hash-mark (#) are ignored and may be used anywhere in
analysis files.
Lines not conforming to the above description result in error messages, causing simrisc to end.
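The line-level rules above (comments, blank lines, and Analysis: lines starting a new analysis) can be illustrated with a small Python sketch. It is not simrisc's parser: the function name split_analyses is hypothetical, and rejecting specifications that precede the first Analysis: line is an assumption based on the requirement that each analysis begins with such a line.

```python
def split_analyses(text: str) -> list[list[str]]:
    """Return one list of specification lines per Analysis: section."""
    analyses: list[list[str]] = []
    for raw in text.splitlines():
        # Drop everything from the hash-mark on, then the surrounding white-space.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue                      # empty lines are ignored
        if line == "Analysis:":
            analyses.append([])           # a new analysis starts here
        elif not analyses:
            # Assumption: specifications before the first Analysis: are errors.
            raise ValueError(f"specification before first Analysis: {line!r}")
        else:
            analyses[-1].append(line)
    return analyses


sample = """
# two analyses
Analysis:
    label: first run     # trailing comment
    Scenario:
        iterations: 3

Analysis:
"""
print(split_analyses(sample))
# [['label: first run', 'Scenario:', 'iterations: 3'], []]
```

The second, empty list corresponds to an Analysis: section that merely contains the line Analysis: and therefore runs with the default specifications.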
Filenames specified in Analysis: sections may start with a tilde character
(~) which is replaced by the user's home directory, or they may start with an
initial + character, which is replaced by the program's base directory (see
option base). When an analysis performs multiple iterations then `$'
characters in filename specifications are replaced by the analysis' iteration
index.
Multiple analysis sections should not specify identically named output files, as the output files are (re)written for each separate analysis.
Analysis sections are commonly used to alter the default specifications of the configuration file. E.g., the default number of iterations equals 1. By specifying
Scenario:
iterations: 3
the analysis performs 3 iterations.
Parameters are either read from the configuration file or they are redefined
in Analysis: sections. E.g., in the provided configuration file screening
rounds use two-year intervals between the ages of 50 and 74. To use screening
rounds at 5-year intervals between the ages of 50 and 65 instead, an
Analysis: specification could contain, e.g.,
Screening:
round: 50 Mammo MRI
round: 55 Mammo MRI
round: 60 Mammo MRI
round: 65 Mammo MRI
When the --one-analysis option is used parameters are modified by
providing comma-separated parameter specifications as program command-line
arguments. E.g., to perform one analysis, writing the data file to
/tmp/data, simulating 1000 cases, and using 20 as seed for the random
number generator the command
simrisc -D /tmp/data -o Scenario:, cases: 1000, seed: 20
can be used. Note that when using the one-analysis option parameter
section names must precede parameter specifications. E.g., since the
parameters cases and seed are defined in the `Scenario' section
(cf. simriscparams(7)) they must be preceded by the Scenario:
specification.
When an Analysis: specification modifies parameters, then subsequent
Analysis: sections start from the unmodified option and parameter
specifications.
Here is an example of an analysis file specifying two analyses:
Analysis:
base: 1
cancer: male
parameters: +params.txt
Scenario:
cases: 10
Screening:
round: 50 CT
round: 55 CT
Analysis:
base: 2
config: ~/src/simrisc/stdconfig/lung
parameters: +params.txt
cancer: breast
Scenario:
cases: 20
spread: true
Screening:
round: 50 Mammo MRI
round: 55 Mammo MRI
round: 60 Mammo MRI
round: 65 Mammo MRI
The first lines of the generated files contain time stamps showing the date
and time when the files were written and the used SimRisc version. Here is
an example, following the RFC 2822 format for the timestamp:
Fri, 23 Jan 2026 10:30:26 +0100 (SimRisc V. 16.06.00)
If label: lines are used then the time stamp is followed by the label
specifications, which is then followed by an empty line. After this header the
file's specific data are shown.
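When post-processing output files, the first header line can be split into its timestamp and version. The Python sketch below assumes that the line has exactly the layout of the example shown above; the function name parse_header and the regular expression are illustrative and not part of simrisc.

```python
import re
from email.utils import parsedate_to_datetime

# Assumed layout: '<RFC 2822 timestamp> (SimRisc V. <version>)'
HEADER = re.compile(r"^(?P<stamp>.+?)\s+\(SimRisc V\. (?P<version>[\d.]+)\)$")


def parse_header(line: str):
    """Return (timezone-aware datetime, version string) for a header line."""
    match = HEADER.match(line.strip())
    if match is None:
        raise ValueError(f"unexpected header line: {line!r}")
    return parsedate_to_datetime(match["stamp"]), match["version"]


when, version = parse_header(
    "Fri, 23 Jan 2026 10:30:26 +0100 (SimRisc V. 16.06.00)")
print(when.isoformat(), version)   # 2026-01-23T10:30:26+01:00 16.06.00
```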
The data in all files (except for the file listing the actually used
parameters (option --parameters (-P))) are written using the standard
comma-separated format (cf. RFC 4180). The initial lines contain table
headings and column labels documenting the meanings of the various
columns. Likewise there is a final line ending the tables.
Data of simulated cases (data-X.txt files)
Below the date and label lines the legend of the death-status (see below at
item death status) is shown. Each iteration generates a separate
data-X.txt file, where X indicates the iteration index (starting at
value 0).
For each simulated case the values of the following variables are written to file (one line of comma-separated values per simulated case):
case: the (1-based) case-index;
cause of death: either Natural or Tumor;
death age: the case's age of death;
natural death age: the case's natural age of death (if no tumor
occurs);
death status: a numeric index specifying how and at what stage the
case died (its legend is shown at the top of the data file);
tumor present: Yes if the simulation resulted in a tumor, No
if no tumor occurred;
tumor detected: Yes if the tumor was detected, No if not;
interval tumor: Yes if the tumor was an interval tumor, No if
not;
tumor diameter: the tumor's diameter in mm when it was detected. 0.00
is shown if no tumor occurred. In the exceptional case where the
simulation produced a tumor whose diameter exceeded 1000 mm the value
1001 is shown.
tumor doubling days: the time (in days) it takes for the tumor to
double its size;
tumor preclinical period: the age at which the tumor is potentially
detectable by screening. -1.00 is shown if there's no pre-clinical
period;
tumor onset age: the age at which the tumor first occurred. -1.00 is
shown if no tumor developed;
tumor self-detect age: the age at which the tumor was
self-detected. This age is the result of the simulation, and may
exceed the case's actual death age (if so, the case's data report that
no tumor is present). -1.00 is shown if there's no self-detect age;
tumor death age: the age at which the tumor caused or would have
caused the case's death. The simulation process uses ages ranging from
0 through 100. If the age at which the tumor causes the case's death
exceeds 100, then 100.00 is reported. 0.00 is shown if no tumor
developed;
costs screening: the case's screening and (if applicable) treatment
costs;
costs biop: the costs of a performed biopsy;
detected self: 1 if the tumor was self-detected, 0 if not
(also if there's no tumor);
detected by: the modalities defined in the configuration file are
associated with subsequent powers of two id-numbers. The numbers used
for the various modalities are listed in a legend just above the
beginning of the data table. E.g.,
'detected by' legend:
1: Mammo
2: Tomo
4: MRI
The `detected by' column shows the id-number of the modality that
detected the tumor. Value 0 is used when no tumor was present or was
not detected by a modality (e.g., the tumor was self-detected or the
case died before the tumor was detected).
detected round: round number at which a tumor was detected, using
the following values (assuming N screening rounds were
specified):
0: no tumor was detected;
-1: tumor was self-detected before the 1st screening round;
1: tumor was detected during/after the 1st screening round
(and before the 2nd round);
*1: tumor was self-detected after the 1st screening round
(which was not attended);
...
3: tumor was detected during/after the 3rd screening round;
*3: tumor was self-detected after the 3rd screening round
(which was not attended);
...
N: tumor was detected during/after the Nth screening round;
*N: tumor was self-detected after the Nth screening round
(which was not attended).
screening rounds: this column shows which screening rounds
were attended by the simulated cases, and if so whether false
negative or false positive diagnoses were made. The following digits
are used:
0: the case did not attend this screening round;
1: the case did attend this screening round.
There are as many digits as screening rounds. The leftmost digit refers to the first screening round, the rightmost digit to the last screening round. E.g., using 12 screening rounds the following indicators could be obtained:
001101111000
Using screening round indices (which are also used to refer to rounds
in the rounds-$.txt files), this case did not attend screening
rounds 1, 2, 5, 10, 11, and 12.
TNM: The TNM status/stage of the tumor (see also the
simriscparams(7) man-page). It specifies the row/column index of the
table of probabilities of positive lymph nodes and metastasis, given
the size of the tumor (table S3, cf. Goldstraw et al., Journal of
thoracic oncology, 2016, 11(1): 39-51; Yuan et al., Scientific
Reports, 2016, 6(1): 1-9).
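Two of the columns described above, `detected by' and `screening rounds', use compact encodings that are easy to decode when processing the data. The Python sketch below is illustrative only: the function names are hypothetical, and the legend is the example legend shown above (the actual legend depends on the configured modalities). Whether simrisc ever writes sums of id-numbers is not stated here; the bit test merely handles that case as well as single id-numbers.

```python
def decode_detected_by(value: int, legend: dict[int, str]) -> list[str]:
    """Names of the modalities whose id-number bit is set in value."""
    return [name for bit, name in sorted(legend.items()) if value & bit]


def unattended_rounds(digits: str) -> list[int]:
    """1-based indices of the screening rounds marked 0 (not attended)."""
    return [idx for idx, digit in enumerate(digits, start=1) if digit == "0"]


# Example legend, copied from the description of the 'detected by' column.
legend = {1: "Mammo", 2: "Tomo", 4: "MRI"}

print(decode_detected_by(4, legend))        # ['MRI']
print(decode_detected_by(0, legend))        # []
print(unattended_rounds("001101111000"))    # [1, 2, 5, 10, 11, 12]
```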
Actually used spread-values
When spread: true is specified then by default the actually used and
original parameter values are written to the file spread-$.txt, where
$ is replaced by the loop's iteration index. Here is a sample from the
content of such a file, showing the values of the Tumor: DoublingTime:
agegroups parameters:
Tumor:
DoublingTime
ageGroup: 1 - 50 configured: 4.38, using: 3.41972
ageGroup: 50 - 70 configured: 5.06, using: 4.83591
ageGroup: 70 - * configured: 5.24, using: 5.30492
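Lines like the ones in this sample can be read back with a regular expression, e.g., to compare configured and actually used values. The Python sketch below assumes the exact layout of the sample; the name parse_spread_line is hypothetical. The `using' part is treated as optional because it is omitted for parameters that do not use spreading.

```python
import re

# Assumed layout, taken from the sample above; '*' marks an open-ended group.
SPREAD_LINE = re.compile(
    r"ageGroup:\s*(?P<begin>\d+)\s*-\s*(?P<end>\d+|\*)\s+"
    r"configured:\s*(?P<configured>[-\d.]+)"
    r"(?:,\s*using:\s*(?P<using>[-\d.]+))?"
)


def parse_spread_line(line: str):
    """Return (begin, end, configured, using) or None if the line differs.

    end is None for an open-ended age group, using is None when omitted.
    """
    m = SPREAD_LINE.search(line)
    if m is None:
        return None
    end = None if m["end"] == "*" else int(m["end"])
    using = None if m["using"] is None else float(m["using"])
    return int(m["begin"]), end, float(m["configured"]), using


print(parse_spread_line("ageGroup: 70 - * configured: 5.24, using: 5.30492"))
# (70, None, 5.24, 5.30492)
```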
The rounds-X.txt files
Each iteration generates a separate rounds-X.txt file, where X
indicates the iteration index (starting at value 0).
The rounds-files summarize the results of the simulations for each separate screening round. They contain at least eight columns plus a column for each used modality:
round: the round number (starting at 1);
false pos: the number of false positive screening results;
false neg: the number of false negative screening results;
number of tumors: the number of detected real tumors;
number of interval: the number of self-detected (interval) tumors;
number of trueInt: the number of self-detected (interval) real tumors;
screening costs: the total screening costs;
screening biop: the total costs of performed biopsies;
number of [modality]: the number of times each modality was used.
Configuration files
~/.config/simrisc: the default location of the program's
configuration file;
simrisc-VERSION/stdconfig/simrisc, where VERSION is
replaced by simrisc's actual release version;
.deb files) the default configuration file is commonly available
as /usr/share/doc/simrisc/simrisc.gz
simriscparams(7)
Versions before version 15.03.00 should not be used for lung cancer simulations.