markit Excel plugin and slow spreadsheet loading recalculations
IHS Markit is one of the premium data providers for quants around the globe. One of the mechanism that this data can be accessed is via an Excel plug-in.
An interesting problem can arise with any asynchronous plug-in (not just Markit) that can lead to Excel spreadsheets not loading correctly or recalcuations that never complete. Let's explore the situation that I encountered !
The idea behind this Excel plugin is not that different or unique to Markit. The idea is to make available a new function that the user can code in a cell. Here, I will just name the function RJP to avoid using anything that might be protected by Markit and create a really simple example.
Suppose you need cell A1 to hold the date stock symbol "ABC" last paid a dividend on. In cell A1, you would code something like =RJP("lastDividendPaidDate","ABC") and if all works as designed, the cell will be populated with a date that the last dividend was paid for symbol "ABC" . Make a note here: To Excel, a date is just a number formatted to look like a date.
However, these calls are only triggered when Excel performs a worksheet recalcuation and determines that the cell has "changed". And when the cell has been marked as "changed", Excel makes every effort to ensure that all the dependent cells are up todate.
Now, we have two independent computers (the workstation and the Markit central servers) with the internet with unknown speed between them. So, the plugin when called performs an asynchronous operation. It sends the request for data off to the Markit central server, it records that it is expecting a response and records what cell (A1 in this example) that response will be for. The plug-in doesnt know what the result will be nor does it know when it will get it back. So, just to keep things moving along it returns "#PENDING#" back to Excel. Excel gladly puts that string in cell A1 and marks the cell as "up-to-date". Note that it is supposed to a number, but it isn't. (Well not yet).
Recalculation of all changed cells continues by Excel. Then, from the internet - here comes back the result from our earlier request. The plugin can't interupt Excel, so it does the next best thing which is to mark the cell associated with the result (A1) as "changed" and waits for Excel to call the plug-in during a recalculation cycle.
Eventually, Excel will call the plug-in to get the updated value for cell A1 because that cell has "changed". The plug-in now knows what value to return and does so. And because it is a number, and the cell is formatted to be a date - all looks good to our quant.
So, what can go wrong?
Suppose in cell A2, we want to know the closing price of symbol "ABC" on the "lastDividendPaidDate". We might code something like =RJP("priceClose","ABC",A1). Note that we are using the date from cell A1, creating a dependency. This means we cant calculate A2 until we have number representing a date in cell A1.
Now, Excel is smart enough to determine there is a dendency and takes this into account when it determines the order in which to recalculate cells. Specifically, Excel needs to calculate cell A1 first before calcuating cell A2.
Suppose we start all over with two formula/cell spreadsheet. As before, cell A1 is calculated first and the value "#PENDING#" is returned. Excel is fine with that, and now recalculates cell A2. The plug-in isnt going to like this because it is expecting a number. Now, depending on how the plug-in is written, this check may be local or require a call to the central server. But, the plugin has to return something so it will return a string like #PENDING# and then later it will have to mark the cell as "changed" because it knows that isnt the right answer.
At this point we likely have two calls pending with the central server, and at some point they come back and triggering recaculation of cells A1 and A2. Note the order of the returns in unkown to us. We could get back the error message on cell A2 before the date that goes in cell A1 and have to start that over.
Eventually, we get cell A1 response back, trigger a recalc of cell A1 and we get the date we want. Excel is smart enough to determine that cell A2 is "dependent" on cell A1. When cell A1 changes, it will trigger a recalc of cell A2. Which, now will for sure result in a call to the central server. And, when that request comes back, we will finally have the desired result in cell A2.
Now suppose you have spreadsheet with several thousand rows and a dozen columns each containing a plugin call where there are long chains of dependent cells. Cell A10 depends on cell A9 depends on A8 depends on A7....etc.
This creates a cascading wave of recalculations and traffic to/from the central server. Depending on how fast the network is versus the speed of the workstation, the requests/responses can come in faster then Excel can keep up. Assuming you have enough memory, Excel should eventually work thru all of this but you could be looking at 30 minutes or more plus running into data limits imposed by the central server.
How to solve?
To address this requires a modification of the formulas when the plugin requires information from a cell that is also making a plug-in call. In our simple example, we would change the formula in cell A2:
A2 Before: =RJP("priceClose","ABC",A1)
A2 After: =IF(ISNUM(A1),RJP("priceClose","ABC",A1),NA())
Before we call the Excel plugin, we make sure that cell A1 is numeric. If its not numeric, we just mark the cell as "Not Available". This prevents a call to the plug-in, and it keep the recalculation logic internal to the spreadsheet under Excel's control. Only when cell A1 has a valid number do we call the external plugin.
This will reduce the load time dramatically because now we are only going to make an call to the central server when we know the plugin has the input data in the format required. There is far less Excel overhead to just putting a NA() in a cell versus going thru pages of plug-in logic and possibly interacting with an external logic.
If you have any comments or experience on this, please feel free to ContactMe .