Doctor of Philosophy (PhD)
Time series analysis is an essential tool in modern statistical analysis: a myriad of real data problems have temporal components that must be studied to understand the temporal dependence structure in the data. For example, in the stock market, it is important to identify the ups and downs of stock prices, for which time series analysis is crucial. Most of the existing literature on time series deals with linear time series or relies on a Gaussianity assumption. However, there are many instances where the time series shows nonlinear trends or where the underlying error structure is non-Gaussian. In such instances, nonlinear time series analysis is essential. It can be carried out using a nonlinear parametric structure or nonparametric approaches.
In Chapter 2, we propose a quadratic prediction procedure that provides better prediction accuracy when the time series is nonlinear or non-Gaussian, together with a quantification of the prediction gain obtained from the quadratic prediction. We also characterize, in terms of the bispectra of the underlying process, the processes for which quadratic prediction always gives a better result than linear prediction. We provide extensive simulation studies and two real data analyses to substantiate the theoretical results. Chapter 3 deals with polyspectral means, a higher-order version of spectral means, which give important insights into a time series in the presence of nonlinearity. We propose an estimate of the polyspectral mean and derive its asymptotic distribution. We also propose a linearity test based on the resulting asymptotic normality result. Finally, we provide a simulation study and a real-world data analysis to illustrate possible applications of polyspectral means.
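As a rough illustration of the quadratic prediction idea in Chapter 2, the sketch below compares a linear and a quadratic least-squares predictor on a simulated nonlinear series. It is not the procedure developed in the thesis: the simulated process, the two-lag window, the coefficient `b`, and the function names `lagged_design` and `out_of_sample_mse` are all assumptions made for this example. The process x[t] = e[t] + b e[t-1] e[t-2] is serially uncorrelated, so a linear predictor gains essentially nothing, while its third-order moments are non-zero, so products of lagged values carry predictive information.

```python
import numpy as np


def lagged_design(x, quadratic):
    """Build regressors for predicting x[t] from x[t-1] and x[t-2]."""
    x1, x2 = x[1:-1], x[:-2]          # x[t-1] and x[t-2], aligned with x[2:]
    cols = [np.ones_like(x1), x1, x2]
    if quadratic:
        # Add all second-order products of the two lags.
        cols += [x1 * x1, x1 * x2, x2 * x2]
    return np.column_stack(cols), x[2:]


def out_of_sample_mse(x, quadratic):
    """Fit by least squares on the first half, score on the second half."""
    X, y = lagged_design(x, quadratic)
    half = len(y) // 2
    coef, *_ = np.linalg.lstsq(X[:half], y[:half], rcond=None)
    resid = y[half:] - X[half:] @ coef
    return float(np.mean(resid ** 2))


# Hypothetical nonlinear process (an assumption for this sketch).
rng = np.random.default_rng(0)
n, b = 20000, 0.8
e = rng.standard_normal(n + 2)
x = e[2:] + b * e[1:-1] * e[:-2]      # x[t] = e[t] + b * e[t-1] * e[t-2]

mse_lin = out_of_sample_mse(x, quadratic=False)
mse_quad = out_of_sample_mse(x, quadratic=True)
print(f"linear MSE: {mse_lin:.3f}, quadratic MSE: {mse_quad:.3f}")
```

In this simulation the quadratic predictor's gain comes from the cross term x[t-1] x[t-2]; for a linear Gaussian process the same comparison would be expected to show no systematic gain.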
The next part of the thesis deals with real data analysis. Chapter 4 is devoted to an election-prediction algorithm that utilizes hashtag information and the dynamic network structure in social media data, together with opinion polls. We propose two algorithms, one using the network structure (THANOS) and one without it (THOS). Both methods performed better than existing election prediction algorithms. Moreover, for closely fought elections, the one using the network structure gave much closer predictions than the one without. Chapter 5 proposes a bot-detection algorithm for social media data. Inorganic accounts, commonly known as bots, are used extensively to spread malicious information and false propaganda, and it is important to identify them as quickly as possible. We extract several temporal and semantic features and use established machine learning algorithms to identify the inorganic accounts.
The final chapter deals with the bootstrap in extreme value analysis. Efron’s bootstrap is known to be inconsistent in the extreme value setting. It is known that the m out of n bootstrap works in this scenario when m = o(n). However, not much work has been done on finding the optimal choice of m in the m out of n bootstrap. In Chapter 6, we propose an optimal choice of m, chosen to give the best convergence rate of the bootstrap approximation. We also present a real-world data analysis using the air quality index (AQI) levels of several cities around the world.
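The failure of the full-size bootstrap for extremes can be seen in a small simulation. The sketch below is only an illustration under stated assumptions: the data are simulated Uniform(0, 1) draws, the helper name `bootstrap_maxima` is hypothetical, and m = sqrt(n) is an arbitrary choice satisfying m = o(n), not the optimal m proposed in Chapter 6. With resamples of size n, the resampled maximum equals the observed maximum with probability 1 - (1 - 1/n)^n, which tends to 1 - 1/e (about 0.632), so the bootstrap distribution keeps a large point mass that the true limiting distribution does not have. With m much smaller than n, this point mass becomes small.

```python
import numpy as np


def bootstrap_maxima(x, m, n_boot=2000, seed=0):
    """Maxima of n_boot resamples of size m, drawn with replacement from x."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    idx = rng.integers(0, x.size, size=(n_boot, m))
    return x[idx].max(axis=1)


# Simulated data (an assumption for this sketch, not the thesis data).
rng = np.random.default_rng(1)
n = 1000
x = rng.uniform(0.0, 1.0, size=n)
sample_max = x.max()

m_small = int(np.sqrt(n))             # illustrative m = o(n), not the optimal m
full = bootstrap_maxima(x, m=n)       # Efron's bootstrap: resample size n
small = bootstrap_maxima(x, m=m_small)

# Fraction of resamples whose maximum ties with the observed maximum.
tie_full = float(np.mean(full == sample_max))
tie_small = float(np.mean(small == sample_max))
print(f"n out of n: {tie_full:.3f}, m out of n (m={m_small}): {tie_small:.3f}")
```

The tie fraction is used here only as a simple diagnostic of the point mass; it does not by itself measure how well either bootstrap approximates the limiting distribution.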
Ghosh, Dhrubajyoti, "Contribution to Data Science: Time Series, Uncertainty Quantification and Applications" (2022). Arts & Sciences Electronic Theses and Dissertations. 2773.