API Documentation
Overview
TabH2O is a foundation model for tabular prediction. Send labeled training data and unlabeled test data, get back predictions. Supports classification, regression (including timeseries forecasting), clustering, and imputation. Clustering and imputation are available on paid plans — contact sales for access.
Quickstart
Get your first prediction in under a minute.
Authentication
- Sign in with your LinkedIn or Google account
- Go to your dashboard to create an API key
- Include the key in every request as a Bearer token:
Authorization: Bearer tabh2o_live_...
Base URL
https://tabh2o.h2oai.com/api/v1
Your first prediction
Send a classification request with a small inline dataset:
curl -X POST https://tabh2o.h2oai.com/api/v1/predict \
-H "Authorization: Bearer tabh2o_live_..." \
-H "Content-Type: application/json" \
-d '{
"train": {
"data": [
[25, 50000, 1, "Yes"],
[30, 60000, 3, "No"],
[22, 45000, 0, "Yes"],
[35, 70000, 8, "No"],
[28, 52000, 2, "Yes"],
[40, 85000, 10, "No"]
],
"columns": ["age", "income", "experience", "purchased"]
},
"test": {
"data": [[27, 53000, 2], [38, 78000, 7]],
"columns": ["age", "income", "experience"]
},
"target_column": "purchased",
"task": "classification"
}'
Response:
{
"predictions": ["Yes", "No"],
"probabilities": [[0.051431, 0.948569], [0.926749, 0.073251]],
"metadata": {
"task": "classification",
"model": "tabh2o_v1_20260408",
"train_rows": 6,
"test_rows": 2,
"columns": 4,
"time_ms": 245
}
}
Making predictions
All predictions go through a single endpoint. You choose the task type and input format.
POST https://tabh2o.h2oai.com/api/v1/predict
Task types
| Task | Description | Required fields |
|---|---|---|
| classification | Predict categorical labels | target_column |
| regression | Predict continuous values (add time_column for timeseries) | target_column |
| clustering | Group similar rows (unsupervised). Paid plan only. | none |
| imputation | Fill missing values (unsupervised). Paid plan only. | none |
JSON request body
Send data inline as JSON arrays. Set Content-Type: application/json.
| Field | Type | Description |
|---|---|---|
| train.data | array of arrays | Labeled rows including target column |
| train.columns | array of strings | Column names (including target) |
| test.data | array of arrays | Unlabeled rows to predict |
| test.columns | array of strings | Column names (without target) |
| task | string | One of: classification, regression, clustering, imputation (clustering & imputation require paid plan) |
| target_column | string | Name of the target column. Required for classification and regression. |
| time_column | string | Name of the time/date column. Enables timeseries forecasting (regression only). Accepts any format that pandas can parse with pd.to_datetime(). |
| n_clusters | integer | Number of clusters (2-1000). Required for kmeans, ignored for dbscan. |
| cluster_method | string | Clustering algorithm: kmeans (default) or dbscan. |
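The field table above can be turned into a small helper that assembles the JSON body. This is an illustrative sketch, not an official client; the helper name and its defaults are assumptions:

```python
# Sketch: assemble the JSON request body described in the table above.
# build_predict_body is a hypothetical helper, not part of the API.

def build_predict_body(train_data, train_columns, test_data, test_columns,
                       task, target_column=None, time_column=None,
                       n_clusters=None, cluster_method=None):
    body = {
        "train": {"data": train_data, "columns": train_columns},
        "test": {"data": test_data, "columns": test_columns},
        "task": task,
    }
    # Optional fields are only included when set
    if target_column is not None:
        body["target_column"] = target_column
    if time_column is not None:
        body["time_column"] = time_column
    if n_clusters is not None:
        body["n_clusters"] = n_clusters
    if cluster_method is not None:
        body["cluster_method"] = cluster_method
    return body

body = build_predict_body(
    train_data=[[25, 50000, 1, "Yes"], [30, 60000, 3, "No"]],
    train_columns=["age", "income", "experience", "purchased"],
    test_data=[[27, 53000, 2]],
    test_columns=["age", "income", "experience"],
    task="classification",
    target_column="purchased",
)
```

The resulting dict can be passed directly as the json= argument of an HTTP POST.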
CSV file upload
Instead of JSON, send a CSV file via multipart/form-data. Same endpoint, same auth. Rows where the target column is empty or null are treated as test rows; the rest are training rows.
| Field | Type | Description |
|---|---|---|
| file | file | CSV file with header row. Include all columns including target. Leave target empty for rows you want predicted. |
| task | string | One of: classification, regression, clustering, imputation (clustering & imputation require paid plan) |
| target_column | string | Name of the target column. Required for classification and regression. |
| time_column | string | Name of the time/date column. Enables timeseries forecasting (regression only). Accepts any format that pandas can parse with pd.to_datetime(). |
| n_clusters | integer | Number of clusters (2-1000). Required for kmeans, ignored for dbscan. |
| cluster_method | string | Clustering algorithm: kmeans (default) or dbscan. |
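The empty-target convention can be mimicked client-side to preview which rows the server will treat as training vs. test. A minimal sketch using Python's csv module, with hypothetical inline data:

```python
import csv
import io

# Sketch of the server-side convention for CSV uploads: rows with an
# empty target become test rows, the rest become training rows.
csv_text = """age,income,purchased
25,50000,Yes
30,60000,No
28,55000,
"""

rows = list(csv.DictReader(io.StringIO(csv_text)))
train = [r for r in rows if r["purchased"].strip()]  # labeled rows
test = [r for r in rows if not r["purchased"].strip()]  # rows to predict
```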
Classification
Predict categorical labels. Provide labeled training rows and unlabeled test rows. Requires target_column and task: "classification".
Example data
Rows with an empty target are predicted:
age,income,experience,purchased
25,50000,1,Yes
30,60000,3,No
22,45000,0,Yes
35,70000,8,No
28,55000,2,
38,75000,7,
Request
curl -X POST https://tabh2o.h2oai.com/api/v1/predict \
-H "Authorization: Bearer tabh2o_live_..." \
-F file=@data.csv \
-F target_column=purchased \
-F task=classification
Response
{
"predictions": ["No", "No"],
"probabilities": [[0.62023, 0.37977], [0.847015, 0.152985]],
"metadata": {
"task": "classification",
"model": "tabh2o_v1_20260408",
"train_rows": 4,
"test_rows": 2,
"columns": 4,
"time_ms": 594
}
}
probabilities contains per-class confidence scores for each test row. All floats are rounded to 6 decimal places.
Regression
Predict continuous numeric values. Same structure as classification but with task: "regression".
Example data
sqft,bedrooms,garage,price
1200,2,0,250000
1800,3,1,350000
2400,4,1,500000
900,1,0,180000
1500,3,1,310000
1600,3,1,
2000,4,0,
Request
curl -X POST https://tabh2o.h2oai.com/api/v1/predict \
-H "Authorization: Bearer tabh2o_live_..." \
-F file=@houses.csv \
-F target_column=price \
-F task=regression
Response
{
"predictions": [328540.891023, 421730.336712],
"confidence_intervals": [[295012.5, 362069.282045], [389103.170489, 454357.502935]],
"metadata": {
"task": "regression",
"model": "tabh2o_v1_20260408",
"train_rows": 5,
"test_rows": 2,
"columns": 4,
"time_ms": 312
}
}
confidence_intervals contains [lower, upper] bounds for each prediction. All floats are rounded to 6 decimal places.
Timeseries forecasting
Forecast future values by providing historical data with a time column. Uses task: "regression" with an additional time_column field. Train rows contain known values; test rows contain future dates to predict.
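Future test rows can be generated programmatically before writing the CSV. A sketch using only the standard library; the (date, store, sales) column layout matches the example below, and the row format is an assumption for illustration:

```python
from datetime import date, timedelta

# Sketch: generate empty-target rows for the horizon you want forecast.
last_known = date(2025, 1, 5)  # last date with a known sales value
horizon = 2                    # number of future days to predict

future_rows = []
for i in range(1, horizon + 1):
    d = last_known + timedelta(days=i)
    future_rows.append([d.isoformat(), "A", ""])  # empty sales = predict me
```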
Example data
date,store,sales
2025-01-01,A,120
2025-01-02,A,135
2025-01-03,A,98
2025-01-04,A,142
2025-01-05,A,155
2025-01-06,A,
2025-01-07,A,
Request
curl -X POST https://tabh2o.h2oai.com/api/v1/predict \
-H "Authorization: Bearer tabh2o_live_..." \
-F file=@sales.csv \
-F target_column=sales \
-F time_column=date \
-F task=regression
Response
{
"predictions": [161.234018, 148.709452],
"confidence_intervals": [[142.017839, 180.450197], [128.503211, 168.915693]],
"metadata": {
"task": "regression",
"model": "tabh2o_v1_20260408",
"train_rows": 5,
"test_rows": 2,
"columns": 3,
"time_ms": 380
}
}
Multi-series forecasting
To forecast multiple time series at once (e.g., sales per store, metrics per device), add an item_id column that identifies each individual series. The model will learn from all series jointly and return predictions grouped by item_id. For single time series data (no item_id column), the API treats the entire dataset as one series — no changes needed.
Example data (multi-series)
item_id,date,store_type,sales
store_A,2025-01-01,mall,120
store_A,2025-01-02,mall,135
store_A,2025-01-03,mall,98
store_A,2025-01-04,mall,
store_A,2025-01-05,mall,
store_B,2025-01-01,street,60
store_B,2025-01-02,street,72
store_B,2025-01-03,street,55
store_B,2025-01-04,street,
store_B,2025-01-05,street,
Request (multi-series)
curl -X POST https://tabh2o.h2oai.com/api/v1/predict \
-H "Authorization: Bearer tabh2o_live_..." \
-F file=@multi_sales.csv \
-F target_column=sales \
-F time_column=date \
-F task=regression
Response
{
"predictions": [142.518734, 138.042591, 68.317205, 61.703448],
"confidence_intervals": [[125.003217, 160.034251], [120.508312, 155.576870], [52.011483, 84.622927], [45.400219, 78.006677]],
"metadata": {
"task": "regression",
"model": "tabh2o_v1_20260408",
"train_rows": 6,
"test_rows": 4,
"columns": 4,
"time_ms": 520
}
}
Predictions are returned in the same order as the test rows in the CSV. The item_id column can contain any string or integer values.
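Since predictions come back in test-row order, they can be re-grouped by series on the client. A sketch pairing each test row's item_id with its prediction, using the values from the example above:

```python
from collections import defaultdict

# Predictions arrive in the same order as the test rows, so zipping them
# with each test row's item_id reconstructs the per-series forecasts.
test_item_ids = ["store_A", "store_A", "store_B", "store_B"]
predictions = [142.518734, 138.042591, 68.317205, 61.703448]

by_series = defaultdict(list)
for item_id, pred in zip(test_item_ids, predictions):
    by_series[item_id].append(pred)
```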
Clustering (paid plan)
Clustering is available on paid plans. Contact sales to get access.
Group similar rows without a target column. No target_column is needed. Supports two algorithms: kmeans (default, requires n_clusters) and dbscan (density-based, determines cluster count automatically).
Example data
age,income,spending_score
19,15000,39
21,15000,81
20,16000,6
23,16000,77
31,17000,40
22,17000,76
35,18000,6
23,18000,94
64,19000,3
30,19000,72
Request
curl -X POST https://tabh2o.h2oai.com/api/v1/predict \
-H "Authorization: Bearer tabh2o_live_..." \
-F file=@customers.csv \
-F task=clustering \
-F n_clusters=3
Response
{
"predictions": [1, 0, 2, 0, 1, 0, 2, 0, 2, 1],
"metadata": {
"task": "clustering",
"model": "tabh2o_v1_20260408",
"algorithm": "kmeans",
"train_rows": 10,
"test_rows": 10,
"columns": 3,
"n_clusters": 3,
"time_ms": 520
}
}
Each value in predictions is a cluster ID (integer starting from 0).
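A client-side sanity check of the documented clustering parameters can save a round trip. This helper is an illustrative sketch (the API performs its own validation); the function name is an assumption:

```python
# Sketch: validate clustering parameters per the task-type table above.
# kmeans requires n_clusters in the documented 2-1000 range; dbscan
# determines the cluster count automatically and ignores n_clusters.

def check_clustering_params(cluster_method="kmeans", n_clusters=None):
    if cluster_method not in ("kmeans", "dbscan"):
        raise ValueError("cluster_method must be 'kmeans' or 'dbscan'")
    if cluster_method == "kmeans":
        if n_clusters is None or not (2 <= n_clusters <= 1000):
            raise ValueError("kmeans needs n_clusters between 2 and 1000")
    return True
```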
Imputation (paid plan)
Imputation is available on paid plans. Contact sales to get access.
Fill in missing values in your dataset. No target_column needed. The response includes the complete dataset, the column names, and a boolean mask showing which values were imputed.
Example data
Cells with missing values will be filled:
age,income,satisfaction,city
25,50000,8,NYC
30,,7,LA
,45000,9,NYC
28,55000,,LA
35,70000,6,
22,40000,8,NYC
Request
curl -X POST https://tabh2o.h2oai.com/api/v1/predict \
-H "Authorization: Bearer tabh2o_live_..." \
-F file=@survey.csv \
-F task=imputation
Response
{
"imputed_data": [
[25, 50000, 8, "NYC"],
[30, 51823.417209, 7, "LA"],
[26.831405, 45000, 9, "NYC"],
[28, 55000, 7.214038, "LA"],
[35, 70000, 6, "NYC"],
[22, 40000, 8, "NYC"]
],
"imputed_columns": ["age", "income", "satisfaction", "city"],
"imputed_mask": [
[false, false, false, false],
[false, true, false, false],
[true, false, false, false],
[false, false, true, false],
[false, false, false, true],
[false, false, false, false]
],
"metadata": {
"task": "imputation",
"model": "tabh2o_v1_20260408",
"train_rows": 2,
"test_rows": 6,
"columns": 4,
"columns_imputed": 4,
"time_ms": 890
}
}
The imputed_mask is true where a value was filled in, so you can see exactly what changed. Works with both numeric and categorical columns.
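The mask lines up element-wise with imputed_data, so the changed cells can be listed directly. A sketch using the mask from the example above (Python booleans stand in for the JSON true/false values):

```python
# Sketch: use imputed_mask to list exactly which cells were filled in.
imputed_columns = ["age", "income", "satisfaction", "city"]
imputed_mask = [
    [False, False, False, False],
    [False, True, False, False],
    [True, False, False, False],
    [False, False, True, False],
    [False, False, False, True],
    [False, False, False, False],
]

# Collect (row index, column name) for every imputed cell
filled = [
    (row_idx, imputed_columns[col_idx])
    for row_idx, row in enumerate(imputed_mask)
    for col_idx, was_imputed in enumerate(row)
    if was_imputed
]
```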
Agentic use
TabH2O works as a tool for AI agents. Your agent can call the API directly whenever it needs tabular predictions.
How it works
- Agent identifies the task: classification or regression (clustering and imputation available on paid plans)
- Formats training and test data into the API schema
- Calls POST /api/v1/predict and gets predictions back
- Uses the results in its workflow
Skill file for agents
We provide a machine-readable SKILL.md following the Agent Skills standard. It has the full API schema, examples, error codes, and limits.
Installing the skill
Agent harnesses like pi and Claude Code load skills from a folder. Install project-level (one project) or globally (all projects):
mkdir -p .pi/skills/tabh2o-predict
curl -o .pi/skills/tabh2o-predict/SKILL.md \
https://tabh2o.h2oai.com/tabh2o-predict/SKILL.md
Once installed, agents discover the skill automatically and can use it whenever they encounter tabular prediction tasks. You can also invoke it explicitly:
/skill:tabh2o-predict
Integration examples
You can also pass SKILL.md to your agent via system prompt or tool definitions.
As an OpenAI function tool
tools = [{
"type": "function",
"function": {
"name": "tabh2o_predict",
"description": "Predict on tabular data via TabH2O. "
"Send labeled training rows and unlabeled test rows.",
"parameters": {
"type": "object",
"properties": {
"train_data": {
"type": "array",
"description": "Labeled training rows (each row is an array of values)"
},
"train_columns": {
"type": "array",
"description": "Column names including target"
},
"test_data": {
"type": "array",
"description": "Unlabeled test rows to predict"
},
"test_columns": {
"type": "array",
"description": "Column names without target"
},
"target_column": {"type": "string"},
"task": {"type": "string", "enum": ["classification", "regression", "clustering", "imputation"]}, // clustering & imputation require paid plan
"time_column": {"type": "string", "description": "Time/date column name (enables timeseries forecasting)"},
"n_clusters": {"type": "integer", "description": "Number of clusters (required for kmeans)"},
"cluster_method": {"type": "string", "enum": ["kmeans", "dbscan"], "description": "Clustering algorithm (default: kmeans)"}
},
"required": ["train_data", "train_columns", "test_data",
"test_columns", "task"]
}
}
}]
Tool handler
import requests
TABH2O_KEY = "tabh2o_live_..."
def handle_tabh2o_predict(args):
    body = {
        "train": {
            "data": args["train_data"],
            "columns": args["train_columns"],
        },
        "test": {
            "data": args["test_data"],
            "columns": args["test_columns"],
        },
        "task": args["task"],
    }
    # Optional fields: include only when the agent supplied them
    # (e.g. target_column is absent for clustering and imputation)
    for key in ("target_column", "time_column", "n_clusters", "cluster_method"):
        if key in args:
            body[key] = args[key]
    resp = requests.post(
        "https://tabh2o.h2oai.com/api/v1/predict",
        headers={"Authorization": f"Bearer {TABH2O_KEY}"},
        json=body,
    )
    resp.raise_for_status()
    return resp.json()
In a system prompt
Or paste SKILL.md into the system prompt directly:
system_prompt = """You are a data analysis assistant.
When the user asks you to make predictions on tabular data,
use the TabH2O API. Here is the specification:
""" + open("SKILL.md").read()Tip: The skill file at https://tabh2o.h2oai.com/tabh2o-predict/SKILL.md stays current with the latest API spec. Point your agent there for automatic updates.
Plugins
TabH2O plugins bring predictions directly into your spreadsheet. No code, no CSV exports — select your data, pick a target column, and get predictions written back into the sheet.
How it works
- Enter your TabH2O API key
- Load a sheet and optionally select a range
- Pick the target column (rows with empty values become predictions)
- Choose classification or regression
- Click Predict — results are written back into the empty cells
Note: The plugins currently support classification and regression. Forecasting and other task types are coming soon.
Excel · Installation
The Excel Add-in runs as a side panel inside Excel Desktop and Excel Online. It reads your sheet data and writes predictions back.
Excel Online / Microsoft 365 Cloud
Open a workbook at excel.cloud.microsoft and follow the steps below.
- On the Home ribbon, click Add-ins, then Advanced... — or — Add-ins → More Add-ins (depending on your Microsoft account type)
- Choose Upload My Add-in (if you used More Add-ins: switch to the MY ADD-INS tab, open the Manage My Add-ins dropdown first)
- Select the downloaded manifest.xml file
- Back on the Home ribbon, click Open TabH2O (appears next to Add-ins)
Windows
- Create a folder (e.g. C:\AddinManifests), right-click it → Properties → Sharing → Share, and note the network path (e.g. \\YourPC\AddinManifests)
- Place the downloaded manifest.xml in that shared folder
- In Excel, go to File → Options → Trust Center → Trust Center Settings → Trusted Add-in Catalogs
- Enter the network path in Catalog Url, click Add catalog, and check Show in Menu
- Click OK to close both dialogs, then restart Excel
- On the Home ribbon, click Add-ins → More Add-ins, switch to the SHARED FOLDER tab, select TabH2O, and click Add
- Back on the Home ribbon, click Open TabH2O (appears next to Add-ins)
macOS
- Make sure Excel is not running: pkill -9 "Microsoft Excel" in Terminal
- If it doesn't already exist, create Excel's sideload folder: ~/Library/Containers/com.microsoft.Excel/Data/Documents/wef/
- Copy the downloaded manifest.xml into that folder
- In Excel, on the Home menu click Add-ins → My Add-ins, switch to the Developer Add-ins tab, select TabH2O, and click Add
Google Sheets · Installation
The Google Sheets extension adds a TabH2O menu to your spreadsheet with the same side panel experience as the Excel Add-in.
- Open your Google Sheet and go to Extensions → Apps Script
- In the left sidebar, click the + next to Services, select Google Sheets API (leave the identifier as Sheets), and click Add
- Replace the contents of Code.gs with the script below
Code.gs
Data selection
- If no range is selected (or a trivial single row/column), the entire sheet is used
- Header row (row 1) is always used as column names — empty headers are skipped
- Columns with no data in the selected rows are excluded
- Rows where all selected columns are empty are skipped
How predictions work
- Rows with a filled target column → training data
- Rows with an empty target column → predictions are made for these
- You need at least two labeled rows and at least one unlabeled row
- Predictions are written back in purple bold
Troubleshooting
No predictable target found
Every column is fully filled — there are no empty cells to predict. Leave the target cells blank for the rows you want predicted.
No empty target cells to predict
The chosen target column has no blank cells. Clear the cells you want the model to predict.
Prediction failed with a network error
The browser or your network is blocking the request to the TabH2O API. This is typically caused by a corporate firewall or proxy. Allow access to tabh2o.h2oai.com or consult your network administrator.
Sheet load failed: [..] PERMISSION_DENIED
Usually caused by being logged into multiple Google accounts. Sign out of all accounts and log in with only the account that owns the spreadsheet. It can also mean the script hasn't been granted access to the spreadsheet yet — go to Extensions → Apps Script, click Run on any function to retrigger the authorization dialog. If that doesn't help, remove the script's access under Google Account → Security → Third-party apps and reauthorize.
Error codes
| Status | Error | Description |
|---|---|---|
| 401 | invalid_api_key | Missing, invalid, or revoked API key |
| 403 | task_not_available | Task not available on your plan (e.g. clustering on free tier) |
| 422 | validation_error | Invalid request body, data too large, or bad format |
| 429 | rate_limit_exceeded | Too many requests per minute. Slow down. |
| 429 | quota_exceeded | Daily or monthly quota reached. Contact sales. |
| 503 | service_unavailable | Inference backend is temporarily down |
| 504 | timeout | Inference timed out. Try a smaller dataset. |
Rate limit headers
Every API response includes these headers:
X-RateLimit-Limit: 2
X-RateLimit-Remaining: 1
X-Quota-Limit-Day: 20
X-Quota-Remaining-Day: 12
X-Quota-Limit-Month: 500
X-Quota-Remaining-Month: 387
When the per-minute rate limit kicks in, the 429 response includes a standard Retry-After header with the number of seconds until the next window opens. Sleep at least that long before retrying — don't roll your own exponential backoff, and don't look for retry_after in the JSON body (it isn't there).
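That retry logic can be sketched as follows. The request function and sleep are injected so the loop is easy to test; all names here are illustrative rather than part of an official client:

```python
import time

# Sketch: honor the Retry-After header on 429 responses.
# send() is any callable returning (status, headers, body).

def post_with_retry(send, max_attempts=3, sleep=time.sleep):
    status, headers, body = send()
    for _ in range(max_attempts - 1):
        if status != 429:
            break
        # Sleep exactly as long as the server asks, then retry
        sleep(int(headers.get("Retry-After", "1")))
        status, headers, body = send()
    return status, body
```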
HTTP/1.1 429 Too Many Requests
Retry-After: 25
X-RateLimit-Limit: 2
X-RateLimit-Remaining: 0
{"error": "rate_limit_exceeded", "message": "Too many requests. Please slow down."}Free tier limits
| Limit | Value |
|---|---|
| Requests per minute | 2 |
| Requests per day | 20 |
| Requests per month | 500 |
| Available tasks | Classification, Regression |
| Max rows per request | 100,000 |
| Max columns | 100 |
| Max payload | 50 MB |
Need higher limits? Contact sales →
Privacy & anonymization
Your data is never stored. Each request is processed in memory and discarded immediately after the response is returned. Nothing is logged, cached, or used for model training.
If your data is sensitive, you can fully anonymize it before sending — with zero impact on prediction quality:
- Column names are arbitrary. Rename columns to c1, c2, etc. The only column name that matters is target_column, which just needs to match the header.
- Categorical values can be mapped to integers. Replace "red" → 0, "blue" → 1, etc. TabH2O is a foundation model — it learns from statistical patterns, not label semantics.
- Numeric values need no changes. Keep them as-is for best results.
Example: A dataset with columns customer_name, city, revenue, churned can be sent as c1, c2, c3, c4 with city names mapped to integers. Predictions will be identical.
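The anonymization steps above can be sketched in a few lines. The column names and data are taken from the example; the row-index pseudo-ID for customer_name is an illustrative assumption:

```python
# Sketch: anonymize a dataset before sending it to the API.
# Generic column names plus integer codes for categorical values.
columns = ["customer_name", "city", "revenue", "churned"]
rows = [
    ["Alice", "NYC", 1200, "Yes"],
    ["Bob", "LA", 800, "No"],
]

anon_columns = [f"c{i + 1}" for i in range(len(columns))]  # c1, c2, ...

city_codes = {}  # stable integer code per distinct city
anon_rows = []
for i, (name, city, revenue, churned) in enumerate(rows):
    code = city_codes.setdefault(city, len(city_codes))
    # Row index stands in for the customer name; numeric values unchanged
    anon_rows.append([i, code, revenue, churned])
```

With this mapping, target_column would be sent as "c4" to match the renamed header.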
Output usage
There are no restrictions on how you use the predictions returned by the API. Commercial use is permitted. Automated calls from pipelines, agents, and production systems are permitted (within the rate limits of your tier). You own your outputs.