I have a rather simple problem that's somewhat outside my field of expertise; any help would be appreciated.
I have a data set similar to this:
+-----------+-----+-----+-----+-----+-----+-----+
| TIME      | BV  | C1  | C2  | C3  | C4  | ... |
+-----------+-----+-----+-----+-----+-----+-----+
| timestamp |     |   2 |   6 |  10 |   9 | ... |
| timestamp |     |  -6 | -10 | -17 |  -4 | ... |
| timestamp |     |  -7 | -15 | -14 | -12 | ... |
| timestamp |     |  11 |  16 |  12 |   9 | ... |
| ...       | ... | ... | ... | ... | ... | ... |
+-----------+-----+-----+-----+-----+-----+-----+
These datasets run to tens of millions of rows. There are roughly 200 more columns, though often 90%+ of them are NULL (meaning no data was collected for that collector).
BV is the base value, and the CN columns are the percentage differences the other collectors reported relative to that base value.
What I want to do is, given 3 rows of data, predict (based on the whole dataset) the likely next N rows we'll collect. I plan to start by predicting just the next row, but extending out to 3 or 4 would be good.
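To make the setup concrete, here's a minimal sketch of how I'm picturing the sliding-window framing, assuming the data sits in a pandas DataFrame (the column names and values here are placeholders, not my real schema):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: TIME plus a couple of collector columns.
# The real data has ~200 C-columns and tens of millions of rows.
df = pd.DataFrame({
    "TIME": pd.date_range("2024-01-01", periods=8, freq="min"),
    "C1": [2, -6, -7, 11, 3, -2, 5, 8],
    "C2": [6, -10, -15, 16, 4, -1, 7, 9],
})

WINDOW = 3   # rows of history used as input
HORIZON = 1  # rows to predict

values = df[["C1", "C2"]].to_numpy(dtype=float)
features, targets = [], []
for i in range(len(values) - WINDOW - HORIZON + 1):
    features.append(values[i : i + WINDOW].ravel())                    # last 3 rows, flattened
    targets.append(values[i + WINDOW : i + WINDOW + HORIZON].ravel())  # next row(s)

X = np.asarray(features)  # shape: (samples, WINDOW * n_columns)
y = np.asarray(targets)   # shape: (samples, HORIZON * n_columns)
```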
It seems to me that using Bayesian inference to model probable outcomes would be ideal, but I'm looking for others' thoughts before I set off on this project.
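If the Bayesian route makes sense, one low-effort baseline I've been considering is scikit-learn's BayesianRidge fit per target column on the windowed features from the sketch above; it returns a predictive mean and standard deviation rather than a point estimate. This is only a sketch under the assumption that a linear model over the last 3 rows is a reasonable starting point:

```python
from sklearn.linear_model import BayesianRidge

# Continues from the X, y, values, WINDOW arrays built in the previous sketch.
# BayesianRidge is single-output, so fit one model per target column.
models = []
for col in range(y.shape[1]):
    m = BayesianRidge()
    m.fit(X, y[:, col])
    models.append(m)

# Predict the next row given the 3 most recent rows, with uncertainty.
latest = values[-WINDOW:].ravel().reshape(1, -1)
for col, m in enumerate(models):
    mean, std = m.predict(latest, return_std=True)
    print(f"target column {col}: {mean[0]:.2f} +/- {std[0]:.2f}")
```

I realise this ignores the NULL-heavy columns and any cross-row dynamics beyond the 3-row window, so I'm very open to other approaches.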
Thanks for reading; looking forward to any other approaches anyone may have.