Help: Inferring outcomes based on previous data

algorithm
programming

#1

I have a rather simple problem that’s somewhat outside my field of expertise; any help would be appreciated. :slight_smile:

I have a data set similar to this:

+-----------+-----+-----+-----+-----+-----+-----+
|   TIME    | BV  | C1  | C2  | C3  | C4  | ... |
+-----------+-----+-----+-----+-----+-----+-----+
|           |     |     |     |     |     |     |
| timestamp |     | 2   | 6   | 10  | 9   | ..  |
|           |     |     |     |     |     |     |
| timestamp |     | -6  | -10 | -17 | -4  | ... |
|           |     |     |     |     |     |     |
| timestamp |     | -7  | -15 | -14 | -12 | ... |
|           |     |     |     |     |     |     |
| timestamp |     | 11  | 16  | 12  | 9   | ... |
|           |     |     |     |     |     |     |
| ...       | ... | ... | ... | ... | ... | ... |
+-----------+-----+-----+-----+-----+-----+-----+

These datasets run to tens of millions of rows. There are roughly 200 more columns, though often 90%+ of them are NULL (meaning no data was collected).

BV is the base value, and each CN column is the percentage difference between that collector’s reported value and the base value.
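To make that concrete, here’s roughly how a CN row is derived (a tiny sketch with made-up readings, not real data; the actual set has ~200 collectors and NULLs where a collector reported nothing):

```python
# Hypothetical raw readings; BV is what the main collector reported.
base_value = 50.0
collector_readings = [51.0, 53.0, 55.0, 54.5]

# Each CN value is the percentage difference from the base value.
cn = [round((r - base_value) / base_value * 100, 2) for r in collector_readings]
print(cn)  # → [2.0, 6.0, 10.0, 9.0]
```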

What I want to do: given 3 rows of data, predict, based on the whole dataset, the likely next N rows we’ll collect. I plan to start by predicting just the next row, but extending out to 3 or 4 would be good.

It seems to me that using Bayesian inference to model probable outcomes would be ideal, but I’m looking for others’ thoughts before I set off on this project.
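To show the kind of thing I have in mind: discretize the rows into coarse states, then count which next state follows each 3-row context, which gives the empirical conditional probabilities a Bayesian update would use. This is only a rough sketch (the bin size, the `discretize` scheme, and all names here are placeholders I made up, not a real implementation):

```python
from collections import Counter, defaultdict

def discretize(row, bin_size=5):
    """Bucket each percentage value so similar rows map to the same state."""
    return tuple(None if v is None else round(v / bin_size) for v in row)

def build_model(rows, context=3):
    """Count which next-row state follows each `context`-row window,
    giving an empirical conditional distribution P(next | last 3 rows)."""
    model = defaultdict(Counter)
    states = [discretize(r) for r in rows]
    for i in range(len(states) - context):
        key = tuple(states[i:i + context])
        model[key][states[i + context]] += 1
    return model

def predict(model, last3):
    """Most probable next state given the last 3 observed rows,
    or None if this context never occurred in the training data."""
    counts = model.get(tuple(discretize(r) for r in last3))
    return counts.most_common(1)[0][0] if counts else None

# Example using the four sample rows from the table above:
rows = [[2, 6, 10, 9], [-6, -10, -17, -4], [-7, -15, -14, -12], [11, 16, 12, 9]]
model = build_model(rows)
print(predict(model, rows[0:3]))  # → (2, 3, 2, 2)
```

With tens of millions of rows the contexts would need much coarser binning (or a proper probabilistic model) to avoid every 3-row window being unique, which is exactly the part I’d like input on.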

Thanks for reading; I’m looking forward to any other approaches you may have.