There is a very useful script called runparallel, which distributes input arriving on STDIN across multiple parallel instances of a process. The output of these processes (if they produce any and you are interested in it) is then combined into a single stream, in the same order as if the input had been processed sequentially by a single process.
Let’s look at an example: imagine we have a custom script, get_name.sh, that reads ADO IDs on STDIN and prints each ADO ID together with its longname. It does this by looping through the ADO IDs and calling some Formula Engine code to look up each longname:
acdba@acbox:~$ echo C0.ISS.1 | ./get_name.sh
C0.ISS.1,TERRAFINO INDUSTRIES COR-7.25% PFD
acdba@acbox:~$
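The real get_name.sh is not shown here, so the following is only a minimal sketch of the shape such a filter could take. The function names get_names and lookup_longname are hypothetical, and lookup_longname is a stand-in that returns a dummy string where the actual Formula Engine call would go.

```shell
#!/bin/bash
# Hypothetical sketch of a get_name.sh-style filter; not the original script.

# Placeholder for the real Formula Engine lookup, which is not shown in the
# article. It returns a dummy string so that the sketch runs anywhere.
lookup_longname() {
    printf 'LONGNAME-OF-%s' "$1"
}

# Read one ADO ID per line from STDIN and print "ID,longname" for each.
get_names() {
    local ado_id
    while IFS= read -r ado_id; do
        [ -n "$ado_id" ] || continue    # skip blank lines
        printf '%s,%s\n' "$ado_id" "$(lookup_longname "$ado_id")"
    done
}

# When saving this as a standalone script, end the file with a call to:
#   get_names
```

The point of the sketch is the one-lookup-per-ID loop: the total runtime grows with the number of IDs, which is what makes the script a good candidate for parallel execution.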
For a single ADO this is quick, and even for 100 ADOs the wait is tolerable. But things start to take noticeably longer once we run this for a few thousand ADOs.
This is where runparallel can help:
acdba@acbox:~$ echo "select symbol from fundmstr order by 1" | ac_bl -qs - | runparallel -l10 -n8 ./get_name.sh
+++INFO+++ 20190405_22:12:08 runparallel: Starting: runparallel -l10 -n8 ./get_name.sh
C0.ISS.1,TERRAFINO INDUSTRIES COR-7.25% PFD
C0.ISS.2,TERRAFINO INDUSTRIES-7.25% NOTES
...
C0.ISS.99998,TERRAFINO INDUSTRIES CORP
C0.ISS.99999,TERRAFINO INDUSTRIES CORP-CL E
+++INFO+++ 20190405_22:12:09 runparallel: Finished runparallel
acdba@acbox:~$
The echo ... | ac_bl -qs - to the left of runparallel returns all ADO IDs in our system. Then we use -l10 and -n8 to instruct runparallel to divide the input into batches of 10 and run 8 instances of get_name.sh in parallel.
So rather than a single process collecting all the ADO longnames, we now have 8 working in parallel, and we expect this to be much faster.
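To make the idea of batching, parallel execution and ordered recombination more concrete, here is a rough sketch built only from standard tools (split, background jobs, wait, cat). It is not how runparallel is implemented, which this article does not show; the function name parallel_ordered and its argument order are made up for illustration.

```shell
#!/bin/bash
# Illustrative sketch only: NOT the implementation of runparallel.
# Usage: parallel_ordered BATCH_SIZE WORKERS COMMAND [ARGS...]
parallel_ordered() {
    local batch_size=$1 workers=$2
    shift 2
    local tmpdir chunk running=0
    tmpdir=$(mktemp -d) || return 1

    # 1. Cut STDIN into files of at most batch_size lines each. The
    #    six-letter suffixes (aaaaaa, aaaaab, ...) sort in input order.
    split -l "$batch_size" -a 6 - "$tmpdir/in."

    # 2. Run the command once per batch, at most $workers at a time.
    for chunk in "$tmpdir"/in.*; do
        [ -e "$chunk" ] || continue     # glob did not match: no input
        "$@" < "$chunk" > "$tmpdir/out.${chunk##*/in.}" &
        running=$((running + 1))
        if [ "$running" -ge "$workers" ]; then
            wait                        # crude throttle: wait for the whole group
            running=0
        fi
    done
    wait                                # wait for the last, possibly partial, group

    # 3. Concatenate the per-batch outputs in batch order.
    for chunk in "$tmpdir"/out.*; do
        if [ -e "$chunk" ]; then
            cat "$chunk"
        fi
    done
    rm -rf "$tmpdir"
}
```

With this sketch, something like `seq 1 25 | parallel_ordered 10 3 sed 's/^/x/'` prints the same lines, in the same order, as the plain `seq 1 25 | sed 's/^/x/'`. Because each batch writes to its own file and the files are only concatenated at the end, the output order does not depend on which worker finishes first. The group-wise wait is a simplification: a slow batch holds up the start of the next group, which a real job pool would avoid.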
By the way, runparallel is also very useful in connection with this approach for effectively applying an FE function to one ADO at a time.