Time series foundation models (TSFMs) are pre-trained models that forecast, classify, impute, and detect anomalies in time series data without requiring a separate model for every dataset or industry. TSFMs use transformer-based architectures and large-scale time-series datasets to generalize across domains such as finance, retail, energy, and healthcare.
This article covers the architecture, use cases, industry adoption, benefits, and challenges of time series foundation models, and how they compare with existing models:
What are Time Series Foundation Models?
TSFMs apply foundation model training methods to sequential numerical data, using architectures that can learn temporal patterns from large collections of time series. Instead of training a separate model for each forecasting problem, they can be adapted to tasks such as forecasting, anomaly detection, imputation, and classification.
Leading TSFMs are:
Amazon Chronos-2
Amazon Chronos-2 is an encoder-only model derived from the T5 encoder architecture. It has reached tens of millions of downloads on Hugging Face.1
Salesforce Moirai-2
Salesforce Moirai-2 uses a decoder-only transformer architecture trained on the LOTSA dataset, which contains 27 billion observations.
Sundial
Sundial, developed by researchers at Tsinghua University, reports leading results on the TimeBench dataset.
TimesFM-2.5
TimesFM-2.5 is Google’s latest model in the TimesFM series. It is a pretrained model with ~200M parameters and a 16k context length, trained on a corpus of real-world time series data points.2 Compared with large language models (LLMs), it offers a compact size, fast inference, and a focus on time series data.
Architecture and training
TimesFM borrows the decoder-only transformer architecture from language models: stacked causal self-attention and feedforward layers generate the next output conditioned on past context.
Unlike text, the model represents a sequence as patches of contiguous time points; each patch is embedded (via an MLP residual block plus positional encodings) and treated as a token. A key design choice is to predict a longer output patch length than the input patch, which reduces iterative steps at inference and limits error accumulation on long horizons.
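The effect of patching and of a longer output patch can be illustrated with a small sketch. The patch lengths, context size, and function names below are illustrative assumptions, not values or code taken from the TimesFM implementation.

```python
import math


def make_patches(series, patch_len):
    """Split a series into non-overlapping patches of contiguous points.

    Trailing points that do not fill a whole patch are dropped here for
    simplicity; a real model would typically pad instead.
    """
    n_patches = len(series) // patch_len
    return [series[i * patch_len:(i + 1) * patch_len] for i in range(n_patches)]


def decode_steps(horizon, output_patch_len):
    """Number of autoregressive steps needed to cover the forecast horizon."""
    return math.ceil(horizon / output_patch_len)


context = list(range(512))            # toy context of 512 time points
patches = make_patches(context, 32)   # each patch plays the role of a token

print(len(patches))             # 16
print(decode_steps(256, 32))    # 8 steps if the output patch equals the input patch
print(decode_steps(256, 128))   # 2 steps with a longer output patch
```

Fewer decoding steps means fewer occasions on which the model feeds its own predictions back in as context, which is why a longer output patch can limit error accumulation on long horizons.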
For model training, Google mixes synthetic data (to teach basic temporal “grammar”) with a large, diverse dataset of real series (e.g., Google Trends and Wikipedia Pageviews) to improve transfer. The total pretraining scale is on the order of 100B time points.
Figure 1: Graph showing TimesFM’s architecture.3
Evaluation and results
Google evaluated TimesFM in pure zero-shot mode across public benchmarks. On the Monash Forecasting Archive, TimesFM outperforms most statistical models (e.g., ARIMA, ETS) and matches or exceeds several deep learning baselines trained on the target series.
On long-horizon tasks (e.g., ETT datasets), TimesFM’s zero-shot accuracy rivals supervised baselines (e.g., PatchTST trained per dataset) and beats prompt-based LLM forecasters. Metrics include scaled MAE and geometric-mean summaries across datasets.4
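As a rough illustration of how such metrics can be computed, the sketch below scales each dataset's MAE by the MAE of a baseline forecast and then summarizes the ratios with a geometric mean. The choice of baseline, the dataset names, and the numbers are assumptions made for the example; they are not the evaluation code or results from the TimesFM study.

```python
import math


def mae(actual, forecast):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def scaled_mae(actual, forecast, baseline_forecast):
    """MAE of the model divided by the MAE of a baseline on the same data."""
    return mae(actual, forecast) / mae(actual, baseline_forecast)


def geometric_mean(values):
    """Geometric mean of positive values, computed in log space."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical per-dataset data: (actual, model forecast, baseline forecast)
datasets = {
    "dataset_a": ([10, 12, 14], [11, 12, 13], [10, 10, 10]),
    "dataset_b": ([100, 90, 80], [95, 90, 85], [110, 110, 110]),
}

scores = {name: scaled_mae(*values) for name, values in datasets.items()}
summary = geometric_mean(list(scores.values()))
print(scores, summary)
```

A scaled MAE below 1 means the model beats the baseline on that dataset. The geometric mean is a common choice for summarizing ratios because a single dataset with a large error scale cannot dominate it.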
Key characteristics and architecture of TSFMs
TSFMs’ transformer architecture uses self-attention, residual connections, and linear layers to model long-range dependencies and seasonality patterns. Input patches are transformed via a multilayer perceptron into embeddings, while positional encodings preserve temporal order.
Compared to other foundation models, these architectures are adapted for forecasting tasks, rather than text or image processing.
Figure 2: Diagram showing different adaptation techniques.5
What are the primary use cases?
Forecasting
TSFMs use historical time series values and, when supported by the model, additional inputs such as weather, promotions, holidays, or other variables. This allows them to model relationships across multiple signals rather than relying on a single target series.
Classification
TSFMs can be adapted to assign labels to a series or a segment of it by recognizing characteristic structures, such as arrhythmias in medical data or unusual demand peaks in retail.
Imputation
TSFMs reconstruct missing intervals by leveraging patterns learned from diverse datasets during unified training.
Unlike simple interpolation, they retain consistency with seasonality and trends. Applications include filling gaps in energy usage logs or medical monitoring data, where missing information can affect downstream forecasting tasks.
Anomaly detection
TSFMs can support anomaly detection by converting their learned time series representations or reconstruction outputs into anomaly scores. For example, MOMENT uses a reconstruction-based setup in which the mean squared error between the observed and predicted time series is used as the anomaly criterion.6
This approach can reduce the need for task-specific anomaly labels, but it should still be benchmarked against traditional anomaly detection methods for each dataset.
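The reconstruction-based idea can be sketched as follows. The `reconstruct` function is a hypothetical stand-in (a centered rolling median) for whatever reconstruction a model would produce, and the threshold is an arbitrary value chosen for the toy data; neither comes from MOMENT.

```python
from statistics import median


def reconstruct(series, window=5):
    """Hypothetical stand-in for a model reconstruction: centered rolling median."""
    half = window // 2
    out = []
    for i in range(len(series)):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        out.append(median(series[lo:hi]))
    return out


def anomaly_scores(observed, reconstructed):
    """Point-wise squared error between observed and reconstructed values."""
    return [(o - r) ** 2 for o, r in zip(observed, reconstructed)]


def flag_anomalies(scores, threshold):
    """Indices whose score exceeds the threshold."""
    return [i for i, s in enumerate(scores) if s > threshold]


observed = [10, 11, 10, 12, 11, 50, 11, 10, 12, 11]
scores = anomaly_scores(observed, reconstruct(observed))
print(flag_anomalies(scores, threshold=100.0))  # [5]
```

Averaging these squared errors over a window gives the mean squared error used as the anomaly criterion. In practice the threshold would be tuned per dataset, which is one reason benchmarking against traditional detectors remains necessary.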
Industries adopting TSFMs
Retail
In retail, TSFMs are primarily relevant for SKU- or store-level demand forecasting, where sales patterns can change due to holidays, pricing, promotions, stockouts, and regional seasonality.
Their usefulness depends on whether the model supports external variables. For example, TimeGPT can use exogenous variables such as prices, promotions, and holiday indicators.7
Another example, Lag-Llama, is designed as a univariate probabilistic forecasting model. This means TSFMs should not be described as a single class that always incorporates retail-specific drivers.8
A more practical retail use case is to test TSFMs as reusable forecasting baselines on demand datasets, then compare them with existing statistical, machine learning, or domain-specific forecasting models before deployment.
Finance
In financial time series, TSFMs are most relevant for tasks where historical data is limited, noisy, or affected by regime changes. These include forecasting newly listed assets, estimating short-term volatility, and identifying unusual transactions or market patterns.
Single-market models such as ARIMA, GARCH, or LSTM-based forecasters can become less reliable when the data distribution changes after interest rate shifts, liquidity shocks, macroeconomic announcements, or market stress events. TSFMs can mitigate this limitation by transferring patterns learned from broader time series datasets, but their outputs still require backtesting because financial data is highly non-stationary.
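A minimal sketch of what such backtesting can look like is a rolling-origin evaluation, where the forecaster only ever sees data before each forecast origin. The naive forecaster, the price series, and the function names are illustrative assumptions; any TSFM wrapped in a function with the same `(history, horizon)` signature could be substituted.

```python
def naive_forecast(history, horizon):
    """Baseline: repeat the last observed value."""
    return [history[-1]] * horizon


def rolling_origin_backtest(series, forecaster, horizon, min_train):
    """Average MAE over successive origins, using only past data at each origin."""
    errors = []
    for origin in range(min_train, len(series) - horizon + 1):
        history = series[:origin]
        actual = series[origin:origin + horizon]
        forecast = forecaster(history, horizon)
        errors.append(sum(abs(a - f) for a, f in zip(actual, forecast)) / horizon)
    return sum(errors) / len(errors)


# Toy price series, invented for the example
prices = [100, 101, 103, 102, 105, 107, 106, 110]
score = rolling_origin_backtest(prices, naive_forecast, horizon=2, min_train=4)
print(score)
```

Running the same loop for a candidate model and for a simple baseline over periods that include regime changes gives a more honest picture than a single train/test split.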
Potential finance use cases include asset price forecasting, volatility forecasting, portfolio risk monitoring, and fraud or transaction anomaly detection.
Healthcare
TSFMs can learn from both clinical and synthetic data, which can support early warning systems that adapt to patient-specific baselines. Beyond monitoring, they can support research in drug trials by identifying subtle temporal patterns across large datasets.
Energy
Unlike traditional methods that assume fixed seasonal patterns, TSFMs handle variable conditions such as renewable generation.
Models that support covariates can combine consumption histories with exogenous variables such as temperature and wind speed, producing probabilistic forecasts for grid balancing. Computational efficiency is relevant here, as compact models such as Tiny Time Mixers can provide localized predictions at lower cost.
Transportation
TSFMs trained on diverse datasets can transfer across regions with minimal fine-tuning. Real-world examples include congestion forecasting in urban areas and optimizing delivery routes in logistics.
Manufacturing
TSFMs can model long-range dependencies across sensors and production cycles, which can improve early fault detection.
When fine-tuned with facility-specific data, they can detect faults more accurately, which helps reduce downtime and supports quality control.
Weather and climate
Weather and climate modeling requires managing multiple forecast horizons, from hours to years. Statistical models and traditional methods often fail to capture multi-scale variability.
TSFMs, through their transformer architecture and self-attention mechanisms, can model both local and global dependencies. Examples include short-term precipitation forecasting and climate cycle predictions. Probabilistic time series forecasting helps quantify uncertainty in these outputs.
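One common way to work with probabilistic forecasts is to summarize sampled forecasts as quantiles and to score each quantile with the pinball (quantile) loss. The sketch below shows both under simple assumptions: the sample values are invented, and the rank-based quantile is a deliberately basic estimator rather than the method of any particular model.

```python
def pinball_loss(actual, predicted, q):
    """Quantile (pinball) loss for a single observation at quantile level q."""
    diff = actual - predicted
    return q * diff if diff >= 0 else (q - 1) * diff


def empirical_quantile(samples, q):
    """Simple rank-based empirical quantile of a list of sampled forecasts."""
    ordered = sorted(samples)
    index = min(len(ordered) - 1, int(q * len(ordered)))
    return ordered[index]


# Hypothetical sampled forecasts for one future time step
samples = [18.0, 20.0, 21.0, 22.0, 19.0, 25.0, 23.0, 20.5, 21.5, 24.0]
p10 = empirical_quantile(samples, 0.1)
p50 = empirical_quantile(samples, 0.5)
p90 = empirical_quantile(samples, 0.9)
print(p10, p50, p90)

# At q = 0.9, under-predicting by 2 costs more than over-predicting by 2.
print(pinball_loss(22.0, 20.0, 0.9), pinball_loss(18.0, 20.0, 0.9))
```

The asymmetry of the loss is what makes a 90th-percentile forecast sit above most outcomes: errors where the actual value exceeds the prediction are penalized nine times as heavily as errors in the other direction.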
Benefits of time series foundation models
Key advantages of TSFMs compared to existing models include:
- Zero-shot performance: Delivering strong results on unseen datasets without fine-tuning.
- Reduced training costs: Reuse of one model across domains instead of training separate models.
- Domain generalization: A model adapts to varied contexts through transfer learning and few-shot learning.
- Computational efficiency: Smaller than large foundation models in NLP while still delivering improved performance.
- Versatility: Handling diverse forecast horizons, granularities, and output patch lengths.
Challenges of TSFMs
Technical challenges
- Training data scarcity: Unlike text for language models, the available public datasets for time series data are smaller. However, there are now datasets such as the Large-scale Open Time Series Archive (LOTSA) with billions of observations across multiple domains.9
- Lack of universal structure: No equivalent of vocabulary or grammar.
- Complex temporal dynamics: Diverse seasonality patterns and histories.
- Domain specificity: Different sampling rates and behaviors across industries.
Practical challenges
- Privacy concerns in collecting diverse datasets.
- High computational requirements for model training.
- Distribution shift in evolving environments.
- Interpretability and transparency in real-world applications.
- Integration into legacy systems and existing pipelines.
Time series foundation models: Development and design factors
Time series foundation models: Outcomes and operational factors
Differences from other foundation models
TSFMs differ from language models and vision foundation models in several ways:
- Data modality: Sequential numeric data rather than text or images.
- Architecture: Adapted transformer-based architectures with patching and normalization (e.g., reversible instance normalization).
- Training approach: Incorporating both synthetic data and real-world corpora, such as the Google Trends and Wikipedia Pageviews data used for TimesFM.
- Scale: Smaller in size than large foundation models, yet delivering high-quality point forecasts.
- Evaluation: Benchmarked on forecasting tasks, anomaly detection, and imputation instead of text understanding.
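The normalization step mentioned in the list above can be sketched in a few lines. This is a simplified version of reversible instance normalization: it omits the learnable scale and shift used in the original technique, and the placeholder "model" that repeats the last value is purely an assumption for illustration.

```python
from statistics import mean, pstdev


def revin_normalize(context, eps=1e-5):
    """Normalize one series by its own mean and standard deviation."""
    mu = mean(context)
    sigma = pstdev(context) + eps  # eps avoids division by zero on flat series
    normalized = [(x - mu) / sigma for x in context]
    return normalized, (mu, sigma)


def revin_denormalize(values, stats):
    """Map model outputs back to the original scale of the series."""
    mu, sigma = stats
    return [v * sigma + mu for v in values]


context = [200.0, 210.0, 205.0, 215.0, 220.0]
normalized, stats = revin_normalize(context)

# Placeholder for a model: "forecast" the last normalized value twice.
normalized_forecast = [normalized[-1]] * 2
forecast = revin_denormalize(normalized_forecast, stats)
print(forecast)
```

Because the statistics are computed per series and reversed on the output, a single model can see inputs on a comparable scale whether the raw values are in the tens or the millions.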
Conclusion
Time series foundation models represent a shift from domain-specific statistical models, regression models, and supervised deep learning toward a unified model for time series. By applying transformer-based architectures and leveraging pre-trained models, they offer scalable solutions for forecasting tasks, anomaly detection, and other applications across industries.
While challenges remain in training data availability, interpretability, and integration into existing workflows, the advantages in zero-shot forecasting, transfer learning, and cross-domain adaptability position TSFMs as a key step toward general-purpose forecasting.
Cite this research
@misc{ermut2026,
author = {Ermut, Sıla},
title = {{Time Series Foundation Models: Use Cases \& Benefits}},
year = {2026},
month = jun,
howpublished = {\url{https://aimultiple.com/time-series-foundation-models}},
note = {AIMultiple. Retrieved June 12, 2026}
}
