

15 Utilities

With the details you already have, you should be able to conduct complicated simulation scenarios with various models and population sets, and to produce reports that help analyze the simulations. Yet sometimes there is a need to go beyond the results of a single simulation, or to take the data outside the GUI. To support such manipulation, the system ships with a set of Python utilities. The following text explores these utilities and their uses by subject.

The utilities are Python scripts that allow the user to perform special advanced tasks. Here is a brief list of these scripts:

  1. ConvertDataToCode.py - converts a data definitions file to Python code.
  2. MultiRunSimulation.py - runs multiple simulations outside the GUI.
  3. MultiRunCombinedReport.py - generates a textual report from multiple results.
  4. MultiRunSimulationStatisticsAsCSV.py - generates spreadsheet (CSV) reports and statistics from multiple results.
  5. AssembleReportCSV.py - assembles CSV reports from multiple scenarios.
  6. CreatePlotsFromCSV.py - creates graphical plots from a CSV report.
  7. SlurmRun.py - runs simulations and reports in parallel on a computer cluster.
  8. MultiRunExportResultsAsCSV.py - extracts results as CSV for external processing.

15.1 Invoking Utility Scripts

All these scripts are invoked in a similar manner. For explanation purposes we will therefore refer to the script name, including the .py extension, as PythonScript.py. Whenever the name PythonScript.py is encountered, it should be replaced with the script name of interest.

All these scripts are started from the command prompt / terminal window. In Linux you can open a terminal window. In Windows you can select the Command Prompt under the Accessories program group after clicking the Windows Start button in the lower left corner. Alternatively, on the Windows Start menu you can select Run, type cmd, and press Enter to launch the command prompt.

Once you have opened the terminal, change directory to your working directory by typing:

cd WorkingDirectoryFullPath

Recall that your working directory is the directory in which you installed IEST, and WorkingDirectoryFullPath means its full path name. To type the full directory name you can use the tab completion feature, or, on Windows, drag and drop a file into the command prompt window and correct the name that appears. Note that the directory separator on Windows is the backslash character \ while on Linux it is the slash character / .

Once you are in the correct directory, you can invoke the script PythonScript.py by typing:

On Linux:

python PythonScript.py

On Windows:

c:\Python27\python PythonScript.py

Note that if you added the Python installation directory to the Windows path, you can just type the script name PythonScript.py in the Windows console. For further information on how to invoke Python on Windows, see the Python documentation.

For purposes of this tutorial we will always use the Linux form of invoking the program: python PythonScript.py

The invoked script will show usage information and list its input variables. The program will then ask you to enter input through the console, prompting for a single input at a time. You can follow these prompts to run the script.

It is also possible to invoke the scripts with all their inputs on the command line and avoid prompting the user for additional input. To do this, just add the input values after the script name:

python PythonScript.py InputVariable1 InputVariable2 ...

Note that each script requests different input variables, and that in many cases there may be defaults for some variables, making them optional. Optional variables are displayed in brackets [] in the usage information shown when the script is invoked with no variables.

We will now continue to discuss each utility script separately.

15.2 Conversion of Data to Code

The script in focus for this topic is ConvertDataToCode.py.

If you think about the way you work with the system, entities are created in a certain order and reference each other. The order of entity creation is important to satisfy certain dependencies. For example, you need to define a state before you include it in a process, you need to create a model before you use it in a project, and you need to create a parameter before you use it in an expression. It is somewhat similar to building a house: you first build the foundations, then the main body, and only then the roof, in that order. And just like in a house, after it is built it is sometimes difficult to make a correction in the foundations. This analogy of building a house will be helpful later on; for now we will get back to our system and the GUI.

Each time you create a new entity the system adds it to the database. This database can be saved and loaded by the system as a zip file. This file is often referred to as the data definitions file, since it holds the entire database of entities and enables us to save and load our work. It can also contain simulation results on top of the project that created them. Think of adding entities to the system as analogous to adding bricks to the house, and think of the database as a snapshot of the entire house.

Think about a situation where, instead of clicking your way through the system forms and entering data in a certain order, you could write down sentences that describe what you are doing in the form of instructions. Such a set of instructions can be used to create the database from scratch; it constitutes a program that can reconstruct the database. With analogy to the house, think about this as a plan with detailed instructions telling a quick builder how to build the house.

Now suppose you already have a database zip file and want the system to figure out the set of instructions that created it. The system can do just that using the utility called ConvertDataToCode.py.

This utility takes a database zip file as input and creates Python code that reconstructs this database. With analogy to the house, think about it as looking at a snapshot of a house and automatically deriving the plans for the house as instructions to the builder.

The main input parameter to the reconstruction program is the database zip file, which we will denote as DataDefinitionsFileName.zip. Typically the script will be invoked in the following way:

python ConvertDataToCode.py DataDefinitionsFileName.zip

This avoids asking the user questions and just performs the conversion with default values, which are recommended for most cases. By default, the set of instructions to create the database will be saved as a reconstruction Python program under the file name TheGeneratedDataCode.py.

With regards to our analogy, think about this file as the plan containing instructions for the builder to build the house.

If you open this file, you will find instructions that create your database, ordered so that each entity is created before the entities that depend on it.

Unless you specifically request it, simulation results will not be converted; if requested, they will appear at the end of the code.

At the very end of the code, you will find a line that creates a new zip file from the code under the default file name TheGeneratedDataCode_out.zip. So if you run the Python reconstruction program TheGeneratedDataCode.py, it will create a new database zip file named TheGeneratedDataCode_out.zip that is equivalent to the database file you converted to code.

To run the conversion from code back to data, use the command:

python TheGeneratedDataCode.py

If there are no changes to the Python reconstruction program, then this allows a circular path between code and data that can be followed in either direction. In other words, this allows transfer from data definitions to code and vice versa, so that code and data definitions are interchangeable. With analogy to the house, think about it as having the ability to build a house from a plan containing building instructions, and the ability to take a snapshot of an existing house and convert it back into building plans. This is a powerful mechanism that allows the user to make complicated changes easily.

The most useful task that can be performed through code is making changes while avoiding dependencies. For example, if the user wants to change the name of a parameter from Diabetes to Type2Diabetes, then once Diabetes is in use, the system will not allow this change through the Graphical User Interface (GUI) since it would violate dependencies. Yet it is possible to do this using a find and replace operation in the code file and then reconstructing a new data file. Note that the user should be careful to make the changes in all places, and to avoid name clashes and changes to other variable names that contain the word Diabetes. If the changes the user made in the code are reasonable, then once the code is executed a new database file will be created. If the changes create conflicts or are otherwise invalid, the system will not be able to reconstruct the data file. With analogy to the house, think about it as taking a snapshot of an existing house, converting it into plans, changing the plans of the foundation, and then rebuilding the entire house from the modified plan.
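As a minimal sketch of such a rename, assuming the generated code file is named TheGeneratedDataCode.py as above, the following Python snippet uses a word-boundary regular expression so that longer names that merely contain the word Diabetes are left untouched:

     # Rename the parameter Diabetes to Type2Diabetes in the generated code
     import re

     with open('TheGeneratedDataCode.py') as InFile:
         Code = InFile.read()

     # \b matches only at word boundaries, so names such as PreDiabetes
     # or DiabetesRisk are not modified
     NewCode = re.sub(r'\bDiabetes\b', 'Type2Diabetes', Code)

     with open('TheGeneratedDataCode.py', 'w') as OutFile:
         OutFile.write(NewCode)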

Note that this type of operation is intended for the advanced user, and the user is responsible for making intelligent changes in the code. However, the system will make validity checks when converting the code back to data. With analogy to the house, it is up to the designer to make a proper change to the foundation in the plan; otherwise the builders will either not be able to build the house, or, if the house is built, it may be faulty due to a bad change in the foundations.

There are other uses of this powerful capability, including:

  1. Merging different model versions by selecting wanted code lines from each version.
  2. Conversion of code into a document or spreadsheet tables by replacing delimiting text with table separation characters and importing into a spreadsheet or a word processor application.
  3. Finding changes between data definition files by comparing their code representation.

There may be other uses for this powerful capability. Yet again, it is important to understand that it is not recommended for non-advanced users; if not used properly, it can cause much confusion. Nevertheless, it is a very useful tool.

As an example, it is recommended to run the following command:

python ConvertDataToCode.py Testing.zip

This will convert the testing data definitions to code in the file TheGeneratedDataCode.py that can be inspected by the user or executed to regenerate the data definitions.

15.3 Running Multiple Simulations

The script in focus for this topic is MultiRunSimulation.py.

Using the GUI it is possible to define and run a simulation by pressing the Run Simulation Button in the simulation screen. Each time a simulation is launched there is a need to wait for it to finish. Once done, simulation results are accessible.

However, since we typically run a Monte-Carlo simulation, we expect different results each time we run the simulation. If we want a good understanding of the distribution of results, there is a need to run many repetitions of the same simulation. This can be done by defining a large number of repetitions for a project. However, for practical reasons this may not be the most efficient approach:

  1. Running the simulation for a very large number of population repetitions, such as 100,000 or more, may be required for some models to reach stable results, yet waiting for those results may take much time.
  2. Keeping simulation results in memory may not be practical, as it may require larger machines and is prone to interruption of the simulation.
  3. We sometimes want the population size to match the study size to allow better comparison of results.
  4. Sometimes the user may want to run the simulations outside the GUI - perhaps as a batch job.

To resolve these issues and offer further flexibility, the system provides a mechanism to run simulations outside the GUI using the MultiRunSimulation.py script. When the script is invoked, it will ask for the following parameters in this order:

  1. The data definitions zip file holding the project to simulate.
  2. The project to run, given as a zero-based index, so that 0 indicates the first project in the file.
  3. The number of times to repeat the simulation.

As an example, it is recommended to run the following command:

python MultiRunSimulation.py Testing.zip 0 3

This will run the first project in the file 3 times and will generate the files Testing_0.zip, Testing_1.zip, Testing_2.zip, each holding simulation results for the first project. You can then load these files through the GUI and inspect the results in each file.

Note that the simulations will be conducted sequentially, one after the other, on the same machine and the same CPU core, so using the MultiRunSimulation.py script in this form does not save simulation time. However, this script allows avoiding memory limit violations, and it allows practical flexibility in conducting simulations by manipulating the simulation defaults and scaling the simulation result sizes after definition. These capabilities can be utilized manually by the user; however, they are best utilized by the system to provide parallel computing capabilities, as will be discussed later.
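For instance, here is a minimal Python sketch of a batch job that invokes the script repeatedly; the arguments follow the example above, while the Runs list itself is hypothetical and should be adapted to your data:

     # Hypothetical batch driver invoking MultiRunSimulation.py several times,
     # waiting for each run to finish before starting the next
     import subprocess

     # (data file, project index, number of repetitions) - adapt as needed
     Runs = [('Testing.zip', 0, 3)]

     for (FileName, ProjectIndex, Repetitions) in Runs:
         subprocess.check_call(['python', 'MultiRunSimulation.py',
                                FileName, str(ProjectIndex), str(Repetitions)])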

15.4 Generating Textual Reports from Multiple Results

The script in focus for this topic is MultiRunCombinedReport.py.

Using the GUI it is possible to generate a report for a single simulation result set. However, even within the GUI it is possible to run several simulations for the same project, each time creating a new result set, while the report is per simulation result set - not per project. Moreover, if simulations for the same project were generated using MultiRunSimulation.py, then results exist in multiple files and it is hard to compose a report covering all of them together.

The MultiRunCombinedReport.py script allows pulling together several result sets from multiple files and creating a single report that combines them. It is up to the user to make sure that the result sets are compatible.

When this script is invoked, it will ask the user a few questions as input. It is possible to answer the questions by hand, or to prepare a file with the answers and run the script with this file as input, as depicted in the usage information.

The inputs requested are:

  1. A list of data file names from which results will be collected, each on a separate line, with a blank line to indicate the end of the list.
  2. A list of simulation result ID numbers, each on a separate line, with a blank line to end the list. These ID values will be searched for in each file mentioned above to create the report. Typically, however, if there are multiple ID numbers defined, then there will be only one results file, and vice versa.
  3. An optional list of format options, provided as line pairs of OptionName and OptionValue. A blank line indicates the end of the format options list. Note that an easy way to obtain this list is to save the format options from the GUI results form into an .opt file and copy the contents of this file.
  4. The optional output report file name. If unspecified, the report name will be Report.txt.

As an example that demonstrates the capabilities of this utility, we will build upon the results created by MultiRunSimulation.py in the previous example. Invoke the program in the following manner:

python MultiRunCombinedReport.py

Then provide the following answers, where (Press Enter for Blank Line) stands for an empty line:

     Testing_0.zip
     Testing_1.zip
     Testing_2.zip
     (Press Enter for Blank Line)
     1
     (Press Enter for Blank Line)
     DetailLevel
     1
     (Press Enter for Blank Line)
     (Press Enter for Blank Line)

These inputs can also be saved into a file that is provided as a parameter to the script on the command line when it is invoked.
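For example, the answers above could be saved to a hypothetical file named Answers.txt, in which every (Press Enter for Blank Line) becomes an actual empty line:

     Testing_0.zip
     Testing_1.zip
     Testing_2.zip

     1

     DetailLevel
     1


The script would then be invoked as: python MultiRunCombinedReport.py Answers.txt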

Once the script has finished running, you can open the file Report.txt and find a detailed report combining the results from all 3 simulations in the 3 files created previously. Note that the record count is 3000 rather than 1000. Also note that the file names are presented at the top of the report.

The MultiRunCombinedReport.py script, in combination with the previous MultiRunSimulation.py script, allows overcoming memory limitations by chopping a large simulation down to smaller chunks. This is one way to get better statistics while running a report. However, processing the report may be very time consuming, especially if there are many files, since this is done sequentially. Moreover, the report combines all individuals into a single report, so the number of individuals in the report may not match the study size. Finally, the report is textual. The system provides other tools that offer further flexibility in reporting results, discussed next.

15.5 Generating Spreadsheet Reports from Multiple Results

The script in focus for this topic is MultiRunSimulationStatisticsAsCSV.py.

Previous reports were textual with fixed-width tables, yet since most reports in the system are tabular it makes sense to create the report as a spreadsheet. A common method to represent such reports textually is the CSV format, which stands for Comma Separated Values. In this format, each cell in the spreadsheet is separated from its neighbors by commas, and a new line indicates a new row in the spreadsheet. Spreadsheet applications can open such a file and the user can then manipulate it further if needed.

The script in focus is able to generate such a CSV report from a data definitions zip file with results. Moreover, it can do this for multiple files generated by MultiRunSimulation.py and generate additional statistics in a summary report. Furthermore, this script allows processing this information in parallel, cutting down computation time significantly if computing power is available.

It is possible to invoke the script without input parameters in the command line and enter them manually. Yet it is usually invoked from the command line as follows:

python MultiRunSimulationStatisticsAsCSV.py FilePattern ResultsID OptFile OutPrefix

Note that the last three command line parameters are optional and can be omitted. Here is a description of these inputs:

  1. FilePattern is a file name or a file name pattern, such as "Testing_*.zip", matching the result files to process. Quoting the pattern prevents the shell from expanding it.
  2. ResultsID is the ID number of the simulation result set to report on within each file.
  3. OptFile is a report format options file, such as an .opt file saved from the GUI results form.
  4. OutPrefix is a prefix for the names of the generated output files.

This script enables processing of reports for multiple result files and can be invoked on a machine with a single CPU or in a parallel processing environment. Here are examples that build upon the results created by MultiRunSimulation.py in the previous example:

Example for running simulation statistics in serial:

python MultiRunSimulationStatisticsAsCSV.py "Testing_*.zip"

This will generate 8 CSV files: Testing_0.csv, Testing_1.csv, Testing_2.csv, Testing_Max.csv, Testing_Mean.csv, Testing_Median.csv, Testing_Min.csv, Testing_STD.csv. The first 3 files contain a report of the results from the corresponding zip file. The last 5 files gather information from these 3 files and calculate a specific statistic over them using the functions Max, Mean, Median, Min, and STD.

Note that a CSV report will look rotated compared to a textual report, since columns become rows and vice versa. In the generated CSV reports, each row represents a different parameter and calculation, and each column represents a different time step within a stratification cell. If there are several stratification cells, these will appear as column blocks, each starting with a mostly blank column defining the stratification. Note that the first few columns/rows contain headers. The statistics files also contain an additional row at the end giving the number of repetitions from which the information was extracted; this is helpful for figuring out how much information was available to construct the statistics.

Note that in contrast to MultiRunCombinedReport.py, which combines all results and then generates a textual report on the combined population size, MultiRunSimulationStatisticsAsCSV.py generates a separate CSV report for each result set using the originally specified population size, and provides statistics on what happens when the simulation is repeated several times.

The same example above can be repeated by running the script several times in parallel with different input parameters. To do this, the following commands should be run in parallel; this can be simulated by running the commands from multiple console/terminal windows:

python MultiRunSimulationStatisticsAsCSV.py Testing_0.zip
python MultiRunSimulationStatisticsAsCSV.py Testing_1.zip
python MultiRunSimulationStatisticsAsCSV.py Testing_2.zip

Once all the above scripts have finished, run the collection script:

python MultiRunSimulationStatisticsAsCSV.py "Testing_*.csv"

The first 3 commands will each create a single CSV report for the corresponding zip file, while the last command will create the 5 summary statistics CSV files from the single-report CSV files. The results are similar to running the computation in the serial case, while gaining the advantage of utilizing computing power to cut down overall computation time. This advantage is significant in a High Performance Computing (HPC) environment where this script is executed on a cluster, as will be shown later on.
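In the meantime, here is a minimal Python sketch of launching the above commands in parallel on a single machine, instead of from multiple terminal windows; it assumes the commands behave exactly as typed above:

     # Launch the three per-file report commands in parallel
     import subprocess

     Commands = [['python', 'MultiRunSimulationStatisticsAsCSV.py', FileName]
                 for FileName in ['Testing_0.zip', 'Testing_1.zip', 'Testing_2.zip']]
     Processes = [subprocess.Popen(Command) for Command in Commands]
     for Process in Processes:
         Process.wait()

     # Once all have finished, run the collection pass over the CSV files
     subprocess.check_call(['python', 'MultiRunSimulationStatisticsAsCSV.py',
                            'Testing_*.csv'])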

15.6 Assembling CSV Reports from Multiple Scenarios

The script in focus for this topic is AssembleReportCSV.py.

Typically, simulations reproduce a few different scenarios that should be compared. For example, the results of a control group need to be compared to the results of an intervention group in a simulated clinical study. Once results are available, the user will want to see the results near each other in the same report using similar terminology. Alternatively, a user may want to compare simulation results to the actual results obtained from a clinical trial. Also, the user may just want to narrow down the amount of information from a single CSV report file, to compare specific time frames and stratifications in a certain order from a much larger list.

The system provides some support to accommodate such comparison and visualization through the AssembleReportCSV.py utility.

The AssembleReportCSV.py utility assumes that MultiRunSimulationStatisticsAsCSV.py created summary simulation reports as CSV files, and that these files are to be combined into a single file that compares specific columns from those CSV files, possibly including reference columns from other files with a similar format.

The script is always invoked from the command line in the following format:

python AssembleReportCSV.py AssemblySequence OutputFileName

The report generated is very similar to previous CSV reports, with the difference that it can extract columns from multiple files and provides a title for each such column. The output file contains the following information for each column:

  1. The user specified title.
  2. The file name from which the column was extracted, for reference.
  3. The stratification requested by the user.
  4. The project name that generated the results.
  5. The model name used in the project.
  6. The population set name used in the project.
  7. The start step of the interval.
  8. The end step of the interval.
  9. Many rows with parameter statistics.
  10. The repetitions count.

To make the report readable, it is recommended to extract the first two header columns by including the following tuples at the beginning of the sequence: ('FileName','',''), ('FileName','Start Step','End Step'). Note that this assumes that <Header> was selected as the first parameter in the report options file, which is the default.
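Judging from the example below, a data column entry appears to take the five-element form (FileName, StartStep, EndStep, Stratification, Title), while header entries use the shorter three-element form shown above. This is an inferred reading of the example, not a formal specification:

     # Inferred layout of one data column entry in the assembly sequence:
     ('Testing_0.csv',        # CSV file from which to extract the column
      '0',                    # start step of the reported interval
      '0',                    # end step of the reported interval
      '',                     # stratification, '' meaning none requested
      'Simulation 1 result')  # user supplied title for the column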

Here is an example that builds again on the simulations we conducted using MultiRunSimulation.py and on reports we created using MultiRunSimulationStatisticsAsCSV.py beforehand.

Type in the following command:

python AssembleReportCSV.py "[('Testing_Mean.csv','',''), ('Testing_Mean.csv','Start Step','End Step'), ('Testing_0.csv','0','0','','Simulation 1 result'), ('Testing_1.csv','0','0','','Simulation 2 result'), ('Testing_2.csv','0','0','','Simulation 3 result'), ('Testing_Mean.csv','0','0','','Mean of 3 simulations') , ('Testing_STD.csv','0','0','','STD of 3 simulations'), ('Testing_0.csv','1','1','','Simulation 1 result'), ('Testing_1.csv','1','1','','Simulation 2 result'), ('Testing_2.csv','1','1','','Simulation 3 result'), ('Testing_Mean.csv','1','1','','Mean of 3 simulations') , ('Testing_STD.csv','1','1','','STD of 3 simulations'), ('Testing_0.csv','2','2','','Simulation 1 result'), ('Testing_1.csv','2','2','','Simulation 2 result'), ('Testing_2.csv','2','2','','Simulation 3 result'), ('Testing_Mean.csv','2','2','','Mean of 3 simulations') , ('Testing_STD.csv','2','2','','STD of 3 simulations'), ('Testing_0.csv','3','3','','Simulation 1 result'), ('Testing_1.csv','3','3','','Simulation 2 result'), ('Testing_2.csv','3','3','','Simulation 3 result'), ('Testing_Mean.csv','3','3','','Mean of 3 simulations') , ('Testing_STD.csv','3','3','','STD of 3 simulations')]" Testing_Out.csv

This example demonstrates the use of this script to compare the results from each of the 3 simulations, side by side, for all simulated years. It also compares these to the Mean and STD statistics extracted from those 3 simulations.

Note that the user can specify a reference CSV file from which to include specific columns. Also note that the system will not check whether the rows match; it just selects columns from multiple files and assembles them together. It is up to the user to make sure the columns and their definitions match between files. With good organization of the data, CSV reports can now be read by a human or reused to create graphical plots, as described hereafter.

15.7 Creating Graphical Plots

The script in focus for this topic is CreatePlotsFromCSV.py.

Once a CSV report is assembled, it is possible to use a spreadsheet application to plot graphs using external tools. However, in many cases there is a need to create the same plot repeatedly, in an automated way, without manipulating the CSV file after its creation. To support such a method, the system provides the utility CreatePlotsFromCSV.py.

This utility relies on the format produced by AssembleReportCSV.py, since it expects the first row to contain a title and the first two columns in the file to contain header columns with the parameter and calculation method. Basically, the script produces a plot where the X and Y axis values are selected by the user by specifying a parameter and a calculation method. The script is sensitive to the titles provided in the first row and treats these as different series with different legends in the plots. It can also generate several plots together.

This script is invoked with the following command line:

python CreatePlotsFromCSV.py InputFileName OutputFileName PlotSequence

The next example will demonstrate plot generation from the CSV file previously created by the AssembleReportCSV.py example.

python CreatePlotsFromCSV.py Testing_Out.csv Testing_Out.pdf "[ [('','Start Step',''), ('Alive','Sum All',''),[ ('Age','Avg All',''), ('Alive','Sum All',''), ('Dead','Sum All','')]] , ['Simulation 1 result', 'Simulation 2 result', 'Simulation 3 result', 'Mean of 3 simulations'] , ['r-','g-','b-','k-', 'r--','g--','b--','k--'] ]"

This command will create a PDF file with two plots. The first will show the number of alive people per year for each simulation and for the average of the 3 simulations. The second plot will also show the number of deaths, on a plot where the X axis is age.
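For readability, here is the same PlotSequence argument pretty-printed as a Python literal with comments; this reading of the structure is inferred from the example and the plot description above, not from a formal specification:

     # The PlotSequence from the command above, reformatted for readability
     PlotSequence = [
         # Plot definitions: the first tuple of each plot selects the X axis
         [('', 'Start Step', ''),       # plot 1 X axis: the time step (year)
          ('Alive', 'Sum All', ''),     # plot 1 Y axis: number of people alive
          [('Age', 'Avg All', ''),      # plot 2 X axis: average age
           ('Alive', 'Sum All', ''),    # plot 2 Y axis: number of people alive
           ('Dead', 'Sum All', '')]],   # plot 2 Y axis: number of deaths
         # Series titles matching the column titles in the CSV file
         ['Simulation 1 result', 'Simulation 2 result',
          'Simulation 3 result', 'Mean of 3 simulations'],
         # Line styles, one per plotted series, in matplotlib-style notation
         ['r-', 'g-', 'b-', 'k-', 'r--', 'g--', 'b--', 'k--'],
     ]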

This plot script can be included in other scripts to build elaborate graphical reports as will be demonstrated later.

15.8 Running Simulations and Reports in Parallel on a Computer Cluster

The script in focus for this topic is SlurmRun.py.

The utility scripts above can be used to conduct simulations, generate reports, and even create graphical plots. These utilities run on both Linux and Windows, and they can also work in a High Performance Computing (HPC) environment where they are executed on a cluster of computers. Although the system can potentially run in several HPC environments, the HPC environment of choice for the system is SLURM. More information on SLURM can be found on the SLURM web site.

If you have SLURM installed on a computing cluster that also has all the required packages installed, the system provides the SlurmRun.py script, which executes a complete simulation and reporting mechanism.

Note, however, that contrary to the other scripts, which receive input parameters when run and should not be changed, this script is a Python program that should be modified by the user to adapt it to their needs. It is therefore assumed that the user has at least a basic understanding of Python and programming; an online Python tutorial may be helpful for getting acquainted with the language.

SlurmRun.py starts with a set of definitions that are intended to be changed by the user. After these are defined, the system will run the simulation in parallel in 3 main phases, each composed of several sub-phases.

Using these phases, the system can run many simulations in parallel and collect results from many scenario variations. To control the simulation, the user changes the parameters in the scenario definition section at the top of the script.

15.9 Extracting Results For External Processing

The script in focus for this topic is MultiRunExportResultsAsCSV.py.

The user may wish to process simulation results using different calculation techniques than those provided so far, or may wish to store the results within a database that can be read by other systems. To provide such capabilities, the system provides a script that converts simulation results from the internal zip form to CSV files that can be read by many systems, including spreadsheets and database applications.

It is possible to invoke the script without input parameters in the command line and enter them manually. Yet it is usually invoked from the command line as follows:

python MultiRunExportResultsAsCSV.py FileNamePattern ResultsID ColumnName1 ColumnName2 ...

Where:

  1. FileNamePattern is a file name or a file name pattern, such as "Testing_*.zip", matching the result zip files to export.
  2. ResultsID is the ID number of the simulation result set to export from each file.
  3. ColumnName1 ColumnName2 ... are the names of the result columns to export, in the requested order.

The output file name for each file matching the FileNamePattern will be the same as the input file name, with the .zip ending replaced by Results.csv. The first line of this output file will contain the parameter names, to allow easier visualization and import into spreadsheet and database applications.

To demonstrate this script, here is an example that is based on the results from the previously described example of MultiRunSimulation.py. Try running the script with the following line:

python MultiRunExportResultsAsCSV.py "Testing_*.zip" 1 IndividualID Repetition Time Age Alive Dead

The system will create 3 files: Testing_0Results.csv, Testing_1Results.csv, Testing_2Results.csv. Each file will contain 6 columns corresponding to the list provided by the user. These files are easily opened with a spreadsheet application, and the results can be manipulated further by the user to create their own reports.
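As a minimal sketch of such external processing, the following Python snippet reads one of these files with the standard csv module and counts how many records are alive at each time step; it assumes the Alive column holds a numeric 0/1 indicator:

     # Count alive records per time step in an exported results file
     import csv
     from collections import defaultdict

     AliveCounts = defaultdict(int)
     with open('Testing_0Results.csv') as ResultsFile:
         for Row in csv.DictReader(ResultsFile):
             # Alive is assumed to be a 0/1 indicator in this sketch
             if float(Row['Alive']) == 1:
                 AliveCounts[Row['Time']] += 1

     for Time in sorted(AliveCounts, key=float):
         print (Time, AliveCounts[Time])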