TabH2O

API Documentation

Overview

TabH2O is a foundation model for tabular prediction. Send labeled training data and unlabeled test data, get back predictions. Supports classification, regression (including timeseries forecasting), clustering, and imputation. Clustering and imputation are available on paid plans — contact sales for access.

Quickstart

Get your first prediction in under a minute.

Authentication

  1. Sign in with your LinkedIn or Google account
  2. Go to your dashboard to create an API key
  3. Include the key in every request as a Bearer token:
Header
Authorization: Bearer tabh2o_live_...

Base URL

Base URL
https://tabh2o.h2oai.com/api/v1

Your first prediction

Send a classification request with a small inline dataset:

curl -X POST https://tabh2o.h2oai.com/api/v1/predict \
  -H "Authorization: Bearer tabh2o_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "train": {
      "data": [
        [25, 50000, 1, "Yes"],
        [30, 60000, 3, "No"],
        [22, 45000, 0, "Yes"],
        [35, 70000, 8, "No"],
        [28, 52000, 2, "Yes"],
        [40, 85000, 10, "No"]
      ],
      "columns": ["age", "income", "experience", "purchased"]
    },
    "test": {
      "data": [[27, 53000, 2], [38, 78000, 7]],
      "columns": ["age", "income", "experience"]
    },
    "target_column": "purchased",
    "task": "classification"
  }'

Response:

json
{
  "predictions": ["Yes", "No"],
  "probabilities": [[0.051431, 0.948569], [0.926749, 0.073251]],
  "metadata": {
    "task": "classification",
    "model": "tabh2o_v1_20260408",
    "train_rows": 6,
    "test_rows": 2,
    "columns": 4,
    "time_ms": 245
  }
}
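The same quickstart request can be made from Python. Below is a stdlib-only sketch (no third-party dependencies); `build_payload` and `predict` are illustrative helper names, not part of any official client, and the API key is a placeholder:

```python
import json
import urllib.request

API_KEY = "tabh2o_live_..."  # placeholder -- substitute your own key
BASE_URL = "https://tabh2o.h2oai.com/api/v1"

def build_payload(train_rows, train_cols, test_rows, test_cols, target, task):
    """Assemble the JSON body expected by POST /predict."""
    return {
        "train": {"data": train_rows, "columns": train_cols},
        "test": {"data": test_rows, "columns": test_cols},
        "target_column": target,
        "task": task,
    }

def predict(payload):
    """POST the payload and return the decoded JSON response."""
    req = urllib.request.Request(
        f"{BASE_URL}/predict",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_payload(
    train_rows=[[25, 50000, 1, "Yes"], [30, 60000, 3, "No"]],
    train_cols=["age", "income", "experience", "purchased"],
    test_rows=[[27, 53000, 2]],
    test_cols=["age", "income", "experience"],
    target="purchased",
    task="classification",
)
# result = predict(payload)  # requires a valid API key
```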

Making predictions

All predictions go through a single endpoint. You choose the task type and input format.

http
POST https://tabh2o.h2oai.com/api/v1/predict

Task types

| Task | Description | Required fields |
| --- | --- | --- |
| classification | Predict categorical labels | target_column |
| regression | Predict continuous values (add time_column for timeseries) | target_column |
| clustering | Group similar rows (unsupervised; paid plan) | none |
| imputation | Fill missing values (unsupervised; paid plan) | none |

JSON request body

Send data inline as JSON arrays. Set Content-Type: application/json.

| Field | Type | Description |
| --- | --- | --- |
| train.data | array of arrays | Labeled rows including the target column |
| train.columns | array of strings | Column names (including target) |
| test.data | array of arrays | Unlabeled rows to predict |
| test.columns | array of strings | Column names (without target) |
| task | string | One of: classification, regression, clustering, imputation (clustering and imputation require a paid plan) |
| target_column | string | Name of the target column. Required for classification and regression. |
| time_column | string | Name of the time/date column. Enables timeseries forecasting (regression only). Accepts any format that pandas can parse with pd.to_datetime(). |
| n_clusters | integer | Number of clusters (2-1000). Required for kmeans, ignored for dbscan. |
| cluster_method | string | Clustering algorithm: kmeans (default) or dbscan. |
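The per-task field rules can be checked client-side so malformed requests fail fast. A minimal sketch (`validate_body` is an illustrative helper; the server's own validation is authoritative):

```python
# Mirror the field rules: supervised tasks need target_column,
# kmeans clustering needs n_clusters in [2, 1000].
SUPERVISED = {"classification", "regression"}
ALL_TASKS = {"classification", "regression", "clustering", "imputation"}

def validate_body(body: dict) -> None:
    task = body.get("task")
    if task not in ALL_TASKS:
        raise ValueError(f"unknown task: {task!r}")
    if task in SUPERVISED and not body.get("target_column"):
        raise ValueError(f"{task} requires target_column")
    if task == "clustering":
        method = body.get("cluster_method", "kmeans")
        if method == "kmeans" and not 2 <= body.get("n_clusters", 0) <= 1000:
            raise ValueError("kmeans requires n_clusters between 2 and 1000")

validate_body({"task": "regression", "target_column": "price"})  # passes silently
```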

CSV file upload

Instead of JSON, send a CSV file via multipart/form-data. Same endpoint, same auth. Rows where the target column is empty or null are treated as test rows; the rest are training rows.

| Field | Type | Description |
| --- | --- | --- |
| file | file | CSV file with header row. Include all columns, including the target. Leave the target empty for rows you want predicted. |
| task | string | One of: classification, regression, clustering, imputation (clustering and imputation require a paid plan) |
| target_column | string | Name of the target column. Required for classification and regression. |
| time_column | string | Name of the time/date column. Enables timeseries forecasting (regression only). Accepts any format that pandas can parse with pd.to_datetime(). |
| n_clusters | integer | Number of clusters (2-1000). Required for kmeans, ignored for dbscan. |
| cluster_method | string | Clustering algorithm: kmeans (default) or dbscan. |
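Because the train/test split for CSV uploads is implicit, it can help to reproduce it locally and sanity-check row counts before uploading. A stdlib-only sketch (`split_rows` is an illustrative helper):

```python
import csv
import io

def split_rows(csv_text: str, target_column: str):
    """Split CSV rows the way the API does: empty target -> test row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    train, test = [], []
    for row in reader:
        (test if row[target_column] in ("", None) else train).append(row)
    return train, test

sample = """age,income,purchased
25,50000,Yes
30,60000,No
28,55000,
"""
train, test = split_rows(sample, "purchased")
print(len(train), len(test))  # 2 1
```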

Classification

Predict categorical labels. Provide labeled training rows and unlabeled test rows. Requires target_column and task: "classification".

Example data

Rows with an empty target are predicted:

data.csv
age,income,experience,purchased
25,50000,1,Yes
30,60000,3,No
22,45000,0,Yes
35,70000,8,No
28,55000,2,
38,75000,7,

Request

curl -X POST https://tabh2o.h2oai.com/api/v1/predict \
  -H "Authorization: Bearer tabh2o_live_..." \
  -F file=@data.csv \
  -F target_column=purchased \
  -F task=classification

Response

json
{
  "predictions": ["No", "No"],
  "probabilities": [[0.62023, 0.37977], [0.847015, 0.152985]],
  "metadata": {
    "task": "classification",
    "model": "tabh2o_v1_20260408",
    "train_rows": 4,
    "test_rows": 2,
    "columns": 4,
    "time_ms": 594
  }
}

probabilities contains per-class confidence scores for each test row. All floats are rounded to 6 decimal places.
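Since probabilities line up with predictions index-for-index, a common pattern is to attach a confidence score to each label. The sketch below assumes (as is conventional, though not stated in the example) that the predicted label carries the highest per-class score:

```python
# Pair each predicted label with its top class probability.
response = {
    "predictions": ["No", "No"],
    "probabilities": [[0.62023, 0.37977], [0.847015, 0.152985]],
}

results = [
    {"label": label, "confidence": max(probs)}
    for label, probs in zip(response["predictions"], response["probabilities"])
]
print(results[0])  # {'label': 'No', 'confidence': 0.62023}
```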

Regression

Predict continuous numeric values. Same structure as classification but with task: "regression".

Example data

houses.csv
sqft,bedrooms,garage,price
1200,2,0,250000
1800,3,1,350000
2400,4,1,500000
900,1,0,180000
1500,3,1,310000
1600,3,1,
2000,4,0,

Request

curl -X POST https://tabh2o.h2oai.com/api/v1/predict \
  -H "Authorization: Bearer tabh2o_live_..." \
  -F file=@houses.csv \
  -F target_column=price \
  -F task=regression

Response

json
{
  "predictions": [328540.891023, 421730.336712],
  "confidence_intervals": [[295012.5, 362069.282045], [389103.170489, 454357.502935]],
  "metadata": {
    "task": "regression",
    "model": "tabh2o_v1_20260408",
    "train_rows": 5,
    "test_rows": 2,
    "columns": 4,
    "time_ms": 312
  }
}

confidence_intervals contains [lower, upper] bounds for each prediction. All floats are rounded to 6 decimal places.
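The intervals line up with predictions index-for-index, so a quick consistency check and width report is a one-line zip (a sketch over the example response above):

```python
# Verify each point prediction falls inside its interval and show widths.
predictions = [328540.891023, 421730.336712]
intervals = [[295012.5, 362069.282045], [389103.170489, 454357.502935]]

for pred, (lo, hi) in zip(predictions, intervals):
    assert lo <= pred <= hi
    print(f"prediction {pred:.0f}, interval width {hi - lo:.0f}")
```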

Timeseries forecasting

Forecast future values by providing historical data with a time column. Uses task: "regression" with an additional time_column field. Train rows contain known values; test rows contain future dates to predict.

Example data

sales.csv
date,store,sales
2025-01-01,A,120
2025-01-02,A,135
2025-01-03,A,98
2025-01-04,A,142
2025-01-05,A,155
2025-01-06,A,
2025-01-07,A,

Request

curl -X POST https://tabh2o.h2oai.com/api/v1/predict \
  -H "Authorization: Bearer tabh2o_live_..." \
  -F file=@sales.csv \
  -F target_column=sales \
  -F time_column=date \
  -F task=regression

Response

json
{
  "predictions": [161.234018, 148.709452],
  "confidence_intervals": [[142.017839, 180.450197], [128.503211, 168.915693]],
  "metadata": {
    "task": "regression",
    "model": "tabh2o_v1_20260408",
    "train_rows": 5,
    "test_rows": 2,
    "columns": 3,
    "time_ms": 380
  }
}

Multi-series forecasting

To forecast multiple time series at once (e.g., sales per store, metrics per device), add an item_id column that identifies each individual series. The model will learn from all series jointly and return predictions grouped by item_id. For single time series data (no item_id column), the API treats the entire dataset as one series — no changes needed.

Example data (multi-series)

multi_sales.csv
item_id,date,store_type,sales
store_A,2025-01-01,mall,120
store_A,2025-01-02,mall,135
store_A,2025-01-03,mall,98
store_A,2025-01-04,mall,
store_A,2025-01-05,mall,
store_B,2025-01-01,street,60
store_B,2025-01-02,street,72
store_B,2025-01-03,street,55
store_B,2025-01-04,street,
store_B,2025-01-05,street,

Request (multi-series)

curl -X POST https://tabh2o.h2oai.com/api/v1/predict \
  -H "Authorization: Bearer tabh2o_live_..." \
  -F file=@multi_sales.csv \
  -F target_column=sales \
  -F time_column=date \
  -F task=regression

Response

json
{
  "predictions": [142.518734, 138.042591, 68.317205, 61.703448],
  "confidence_intervals": [[125.003217, 160.034251], [120.508312, 155.576870], [52.011483, 84.622927], [45.400219, 78.006677]],
  "metadata": {
    "task": "regression",
    "model": "tabh2o_v1_20260408",
    "train_rows": 6,
    "test_rows": 4,
    "columns": 4,
    "time_ms": 520
  }
}

Predictions are returned in the same order as the test rows in the CSV. The item_id column can contain any string or integer values.
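Because predictions follow test-row order, you can zip them back onto the (item_id, date) pairs of the empty-target rows. A stdlib sketch over the example above (only the empty-target rows are shown here for brevity; a real file would also contain the training rows):

```python
import csv
import io

csv_text = """item_id,date,sales
store_A,2025-01-04,
store_A,2025-01-05,
store_B,2025-01-04,
store_B,2025-01-05,
"""
predictions = [142.518734, 138.042591, 68.317205, 61.703448]

# Collect (item_id, date) keys for rows with an empty target, in file order.
test_keys = [
    (r["item_id"], r["date"])
    for r in csv.DictReader(io.StringIO(csv_text))
    if r["sales"] == ""
]
forecast = dict(zip(test_keys, predictions))
print(forecast[("store_B", "2025-01-04")])  # 68.317205
```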

Clustering (paid plan)

Clustering is available on paid plans. Contact sales to get access.

Group similar rows without a target column. No target_column is needed. Supports two algorithms: kmeans (default, requires n_clusters) and dbscan (density-based, determines cluster count automatically).

Example data

customers.csv
age,income,spending_score
19,15000,39
21,15000,81
20,16000,6
23,16000,77
31,17000,40
22,17000,76
35,18000,6
23,18000,94
64,19000,3
30,19000,72

Request

curl -X POST https://tabh2o.h2oai.com/api/v1/predict \
  -H "Authorization: Bearer tabh2o_live_..." \
  -F file=@customers.csv \
  -F task=clustering \
  -F n_clusters=3

Response

json
{
  "predictions": [1, 0, 2, 0, 1, 0, 2, 0, 2, 1],
  "metadata": {
    "task": "clustering",
    "model": "tabh2o_v1_20260408",
    "algorithm": "kmeans",
    "train_rows": 10,
    "test_rows": 10,
    "columns": 3,
    "n_clusters": 3,
    "time_ms": 520
  }
}

Each value in predictions is a cluster ID (integer starting from 0).
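Cluster IDs map one-to-one onto input rows, so grouping rows by cluster is a simple dictionary fold (a sketch over the example response above):

```python
from collections import defaultdict

rows = [
    [19, 15000, 39], [21, 15000, 81], [20, 16000, 6], [23, 16000, 77],
    [31, 17000, 40], [22, 17000, 76], [35, 18000, 6], [23, 18000, 94],
    [64, 19000, 3], [30, 19000, 72],
]
cluster_ids = [1, 0, 2, 0, 1, 0, 2, 0, 2, 1]

# Bucket each row under its cluster ID.
clusters = defaultdict(list)
for row, cid in zip(rows, cluster_ids):
    clusters[cid].append(row)

print(sorted(len(v) for v in clusters.values()))  # [3, 3, 4]
```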

Imputation (paid plan)

Imputation is available on paid plans. Contact sales to get access.

Fill in missing values in your dataset. No target_column needed. The response includes the complete dataset, the column names, and a boolean mask showing which values were imputed.

Example data

Cells with missing values will be filled:

survey.csv
age,income,satisfaction,city
25,50000,8,NYC
30,,7,LA
,45000,9,NYC
28,55000,,LA
35,70000,6,
22,40000,8,NYC

Request

curl -X POST https://tabh2o.h2oai.com/api/v1/predict \
  -H "Authorization: Bearer tabh2o_live_..." \
  -F file=@survey.csv \
  -F task=imputation

Response

json
{
  "imputed_data": [
    [25, 50000, 8, "NYC"],
    [30, 51823.417209, 7, "LA"],
    [26.831405, 45000, 9, "NYC"],
    [28, 55000, 7.214038, "LA"],
    [35, 70000, 6, "NYC"],
    [22, 40000, 8, "NYC"]
  ],
  "imputed_columns": ["age", "income", "satisfaction", "city"],
  "imputed_mask": [
    [false, false, false, false],
    [false, true, false, false],
    [true, false, false, false],
    [false, false, true, false],
    [false, false, false, true],
    [false, false, false, false]
  ],
  "metadata": {
    "task": "imputation",
    "model": "tabh2o_v1_20260408",
    "train_rows": 2,
    "test_rows": 6,
    "columns": 4,
    "columns_imputed": 4,
    "time_ms": 890
  }
}

The imputed_mask is true where a value was filled in, so you can see exactly what changed. Works with both numeric and categorical columns.
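With the mask it is easy to extract just the filled-in cells as (row index, column, value) triples (a sketch over the example response above):

```python
imputed_data = [
    [25, 50000, 8, "NYC"],
    [30, 51823.417209, 7, "LA"],
    [26.831405, 45000, 9, "NYC"],
    [28, 55000, 7.214038, "LA"],
    [35, 70000, 6, "NYC"],
    [22, 40000, 8, "NYC"],
]
columns = ["age", "income", "satisfaction", "city"]
mask = [
    [False, False, False, False],
    [False, True, False, False],
    [True, False, False, False],
    [False, False, True, False],
    [False, False, False, True],
    [False, False, False, False],
]

# Keep only cells where the mask is True, i.e. values the model filled in.
filled = [
    (i, columns[j], imputed_data[i][j])
    for i, row in enumerate(mask)
    for j, was_imputed in enumerate(row)
    if was_imputed
]
print(filled)
# [(1, 'income', 51823.417209), (2, 'age', 26.831405),
#  (3, 'satisfaction', 7.214038), (4, 'city', 'NYC')]
```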

Agentic use

TabH2O works as a tool for AI agents. Your agent can call the API directly whenever it needs tabular predictions.

How it works

  1. Agent identifies the task: classification or regression (clustering and imputation available on paid plans)
  2. Formats training and test data into the API schema
  3. Calls POST /api/v1/predict, gets predictions back
  4. Uses the results in its workflow

Skill file for agents

We provide a machine-readable SKILL.md following the Agent Skills standard. It has the full API schema, examples, error codes, and limits.

Installing the skill

Agent harnesses like pi and Claude Code load skills from a folder. Install project-level (one project) or globally (all projects):

mkdir -p .pi/skills/tabh2o-predict
curl -o .pi/skills/tabh2o-predict/SKILL.md \
  https://tabh2o.h2oai.com/tabh2o-predict/SKILL.md

Once installed, agents discover the skill automatically and can use it whenever they encounter tabular prediction tasks. You can also invoke it explicitly:

/skill:tabh2o-predict

Integration examples

You can also pass SKILL.md to your agent via system prompt or tool definitions.

As an OpenAI function tool

python
tools = [{
    "type": "function",
    "function": {
        "name": "tabh2o_predict",
        "description": "Predict on tabular data via TabH2O. "
            "Send labeled training rows and unlabeled test rows.",
        "parameters": {
            "type": "object",
            "properties": {
                "train_data": {
                    "type": "array",
                    "description": "Labeled training rows (each row is an array of values)"
                },
                "train_columns": {
                    "type": "array",
                    "description": "Column names including target"
                },
                "test_data": {
                    "type": "array",
                    "description": "Unlabeled test rows to predict"
                },
                "test_columns": {
                    "type": "array",
                    "description": "Column names without target"
                },
                "target_column": {"type": "string"},
                "task": {"type": "string", "enum": ["classification", "regression", "clustering", "imputation"]},  // clustering & imputation require paid plan
                "time_column": {"type": "string", "description": "Time/date column name (enables timeseries forecasting)"},
                "n_clusters": {"type": "integer", "description": "Number of clusters (required for kmeans)"},
                "cluster_method": {"type": "string", "enum": ["kmeans", "dbscan"], "description": "Clustering algorithm (default: kmeans)"}
            },
            "required": ["train_data", "train_columns", "test_data",
                         "test_columns", "task"]
        }
    }
}]

Tool handler

python
import requests

TABH2O_KEY = "tabh2o_live_..."

def handle_tabh2o_predict(args):
    body = {
        "train": {
            "data": args["train_data"],
            "columns": args["train_columns"],
        },
        "test": {
            "data": args["test_data"],
            "columns": args["test_columns"],
        },
        "task": args["task"],
    }
    # Forward optional fields only when the model supplied them
    # (e.g. target_column is absent for clustering and imputation).
    for key in ("target_column", "time_column", "n_clusters", "cluster_method"):
        if key in args:
            body[key] = args[key]
    resp = requests.post(
        "https://tabh2o.h2oai.com/api/v1/predict",
        headers={"Authorization": f"Bearer {TABH2O_KEY}"},
        json=body,
    )
    resp.raise_for_status()
    return resp.json()

In a system prompt

Or paste SKILL.md into the system prompt directly:

python
system_prompt = """You are a data analysis assistant.

When the user asks you to make predictions on tabular data,
use the TabH2O API. Here is the specification:

""" + open("SKILL.md").read()

Tip: The skill file at https://tabh2o.h2oai.com/tabh2o-predict/SKILL.md stays current with the latest API spec. Point your agent there for automatic updates.

Plugins

TabH2O plugins bring predictions directly into your spreadsheet. No code, no CSV exports — select your data, pick a target column, and get predictions written back into the sheet.

How it works

  1. Enter your TabH2O API key
  2. Load a sheet and optionally select a range
  3. Pick the target column (rows with empty values become predictions)
  4. Choose classification or regression
  5. Click Predict — results are written back into the empty cells

Note: The plugins currently support classification and regression. Forecasting and other task types are coming soon.

Excel · Installation

The Excel Add-in runs as a side panel inside Excel Desktop and Excel Online. It reads your sheet data and writes predictions back.

Excel Online / Microsoft 365 Cloud

Open a workbook at excel.cloud.microsoft and follow the steps below.

  1. On the Home ribbon, click Add-ins, then Advanced... (or Add-ins → More Add-ins, depending on your Microsoft account type)
  2. Choose Upload My Add-in (if you used More Add-ins: switch to the MY ADD-INS tab and open the Manage My Add-ins dropdown first)
  3. Select the downloaded manifest.xml file
  4. Back on the Home ribbon, click Open TabH2O (appears next to Add-ins)

Windows

  1. Create a folder (e.g. C:\AddinManifests), right-click it → Properties → Sharing → Share, and note the network path (e.g. \\YourPC\AddinManifests)
  2. Place the downloaded manifest.xml in that shared folder
  3. In Excel, go to File → Options → Trust Center → Trust Center Settings → Trusted Add-in Catalogs
  4. Enter the network path in Catalog Url, click Add catalog, and check Show in Menu
  5. Click OK to close both dialogs, then restart Excel
  6. On the Home ribbon, click Add-ins → More Add-ins, switch to the SHARED FOLDER tab, select TabH2O, and click Add
  7. Back on the Home ribbon, click Open TabH2O (appears next to Add-ins)

macOS

  1. Make sure Excel is not running: pkill -9 "Microsoft Excel" in Terminal
  2. If it doesn't already exist, create Excel's sideload folder: ~/Library/Containers/com.microsoft.Excel/Data/Documents/wef/
  3. Copy the downloaded manifest.xml into that folder
  4. In Excel, on the Home menu click Add-ins → My Add-ins, switch to the Developer Add-ins tab, select TabH2O, and click Add

manifest.xml

Download manifest.xml · View raw →

Google Sheets · Installation

The Google Sheets extension adds a TabH2O menu to your spreadsheet with the same side panel experience as the Excel Add-in.

  1. Open your Google Sheet and go to Extensions → Apps Script
  2. In the left sidebar, click the + next to Services, select Google Sheets API (leave the identifier as Sheets), and click Add
  3. Replace the contents of Code.gs with the script below
  4. Save the project, reload the spreadsheet, and open TabH2O → Open from the menu (grant access when prompted on first run)

Code.gs

Download Code.gs · View raw →

Data selection

  • If no range is selected (or a trivial single row/column), the entire sheet is used
  • Header row (row 1) is always used as column names — empty headers are skipped
  • Columns with no data in the selected rows are excluded
  • Rows where all selected columns are empty are skipped

How predictions work

  • Rows with a filled target column → training data
  • Rows with an empty target column → predictions are made for these
  • You need at least two labeled rows and at least one unlabeled row
  • Predictions are written back in purple bold

Troubleshooting

No predictable target found

Every column is fully filled — there are no empty cells to predict. Leave the target cells blank for the rows you want predicted.

No empty target cells to predict

The chosen target column has no blank cells. Clear the cells you want the model to predict.

Prediction failed with a network error

The browser or your network is blocking the request to the TabH2O API. This is typically caused by a corporate firewall or proxy. Allow access to tabh2o.h2oai.com or consult your network administrator.

Sheet load failed: [..] PERMISSION_DENIED

Usually caused by being logged into multiple Google accounts. Sign out of all accounts and log in with only the account that owns the spreadsheet. It can also mean the script hasn't been granted access to the spreadsheet yet — go to Extensions → Apps Script, click Run on any function to retrigger the authorization dialog. If that doesn't help, remove the script's access under Google Account → Security → Third-party apps and reauthorize.

Error codes

| Status | Error | Description |
| --- | --- | --- |
| 401 | invalid_api_key | Missing, invalid, or revoked API key |
| 403 | task_not_available | Task not available on your plan (e.g. clustering on the free tier) |
| 422 | validation_error | Invalid request body, data too large, or bad format |
| 429 | rate_limit_exceeded | Too many requests per minute. Slow down. |
| 429 | quota_exceeded | Daily or monthly quota reached. Contact sales. |
| 503 | service_unavailable | Inference backend is temporarily down |
| 504 | timeout | Inference timed out. Try a smaller dataset. |

Rate limit headers

Every API response includes these headers:

http
X-RateLimit-Limit: 2
X-RateLimit-Remaining: 1
X-Quota-Limit-Day: 20
X-Quota-Remaining-Day: 14
X-Quota-Limit-Month: 500
X-Quota-Remaining-Month: 387

When the per-minute rate limit kicks in, the 429 response includes a standard Retry-After header with the number of seconds until the next window opens. Sleep at least that long before retrying; don't roll your own exponential backoff, and don't look for retry_after in the JSON body (it isn't there).

http
HTTP/1.1 429 Too Many Requests
Retry-After: 25
X-RateLimit-Limit: 2
X-RateLimit-Remaining: 0

{"error": "rate_limit_exceeded", "message": "Too many requests. Please slow down."}

Free tier limits

| Limit | Value |
| --- | --- |
| Requests per minute | 2 |
| Requests per day | 20 |
| Requests per month | 500 |
| Max rows per request | 100,000 |
| Max columns | 100 |
| Max payload | 50 MB |
| Available tasks | Classification, Regression |

Need higher limits? Contact sales →

Privacy & anonymization

Your data is never stored. Each request is processed in memory and discarded immediately after the response is returned. Nothing is logged, cached, or used for model training.

If your data is sensitive, you can fully anonymize it before sending — with zero impact on prediction quality:

  • Column names are arbitrary. Rename columns to c1, c2, etc. The only column name that matters is target_column, which just needs to match the header.
  • Categorical values can be mapped to integers. Replace "red" → 0, "blue" → 1, etc. TabH2O is a foundation model — it learns from statistical patterns, not label semantics.
  • Numeric values need no changes. Keep them as-is for best results.

Example: A dataset with columns customer_name, city, revenue, churned can be sent as c1, c2, c3, c4 with city names mapped to integers. Predictions will be identical.
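The anonymization recipe above can be sketched in a few lines (`anonymize` is an illustrative helper): rename columns to c1..cN and map each categorical string to an integer, keeping a per-column codebook so you can translate predictions back. Remember to set target_column to the renamed header (here, c4).

```python
def anonymize(rows, columns):
    """Rename columns to c1..cN and encode string values as integers."""
    new_cols = [f"c{i + 1}" for i in range(len(columns))]
    codebooks = {}  # per-column mapping of original value -> integer code
    out = []
    for row in rows:
        new_row = []
        for col, val in zip(columns, row):
            if isinstance(val, str):
                book = codebooks.setdefault(col, {})
                val = book.setdefault(val, len(book))
            new_row.append(val)
        out.append(new_row)
    return out, new_cols, codebooks

rows = [["Alice", "NYC", 1200, "Yes"], ["Bob", "LA", 900, "No"]]
cols = ["customer_name", "city", "revenue", "churned"]
anon_rows, anon_cols, books = anonymize(rows, cols)
print(anon_rows)  # [[0, 0, 1200, 0], [1, 1, 900, 1]]
print(anon_cols)  # ['c1', 'c2', 'c3', 'c4']
```

The codebooks let you map predicted integer labels back to the original strings (e.g. invert books["churned"]).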

Output usage

There are no restrictions on how you use the predictions returned by the API. Commercial use is permitted. Automated calls from pipelines, agents, and production systems are permitted (within the rate limits of your tier). You own your outputs.