TabH2O

API Documentation

Overview

TabH2O is a foundation model for tabular prediction. Send labeled training data and unlabeled test data, get back predictions. Supports classification, regression (including timeseries forecasting), clustering, and imputation. Clustering and imputation are available on paid plans — contact sales for access.

Quickstart

Get your first prediction in under a minute.

Authentication

  1. Sign in with your LinkedIn or Google account
  2. Go to your dashboard to create an API key
  3. Include the key in every request as a Bearer token:
Header
Authorization: Bearer tabh2o_live_...

Base URL

Base URL
https://tabh2o.h2oai.com/api/v1

Your first prediction

Send a classification request with a small inline dataset:

curl -X POST https://tabh2o.h2oai.com/api/v1/predict \
  -H "Authorization: Bearer tabh2o_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "train": {
      "data": [
        [25, 50000, 1, "Yes"],
        [30, 60000, 3, "No"],
        [22, 45000, 0, "Yes"],
        [35, 70000, 8, "No"],
        [28, 52000, 2, "Yes"],
        [40, 85000, 10, "No"]
      ],
      "columns": ["age", "income", "experience", "purchased"]
    },
    "test": {
      "data": [[27, 53000, 2], [38, 78000, 7]],
      "columns": ["age", "income", "experience"]
    },
    "target_column": "purchased",
    "task": "classification"
  }'

Response:

json
{
  "predictions": ["Yes", "No"],
  "probabilities": [[0.051431, 0.948569], [0.926749, 0.073251]],
  "metadata": {
    "task": "classification",
    "model": "tabh2o_v1_20260408",
    "train_rows": 6,
    "test_rows": 2,
    "columns": 4,
    "time_ms": 245
  }
}
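The same quickstart request can be made from Python. Below is a stdlib-only sketch (no third-party dependencies); `build_payload` and `predict` are illustrative helper names, not part of any official client, and the API key is a placeholder:

```python
import json
import urllib.request

API_KEY = "tabh2o_live_..."  # placeholder -- substitute your own key
BASE_URL = "https://tabh2o.h2oai.com/api/v1"

def build_payload(train_rows, train_cols, test_rows, test_cols, target, task):
    """Assemble the JSON body expected by POST /predict."""
    return {
        "train": {"data": train_rows, "columns": train_cols},
        "test": {"data": test_rows, "columns": test_cols},
        "target_column": target,
        "task": task,
    }

def predict(payload):
    """POST the payload and return the decoded JSON response."""
    req = urllib.request.Request(
        f"{BASE_URL}/predict",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_payload(
    train_rows=[[25, 50000, 1, "Yes"], [30, 60000, 3, "No"]],
    train_cols=["age", "income", "experience", "purchased"],
    test_rows=[[27, 53000, 2]],
    test_cols=["age", "income", "experience"],
    target="purchased",
    task="classification",
)
# result = predict(payload)  # requires a valid API key
```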

Making predictions

All predictions go through a single endpoint. You choose the task type and input format.

http
POST https://tabh2o.h2oai.com/api/v1/predict

Task types

| Task | Description | Required fields |
| --- | --- | --- |
| classification | Predict categorical labels | target_column |
| regression | Predict continuous values (add time_column for timeseries) | target_column |
| clustering | Group similar rows (unsupervised; paid plan) | none |
| imputation | Fill missing values (unsupervised; paid plan) | none |

JSON request body

Send data inline as JSON arrays. Set Content-Type: application/json.

| Field | Type | Description |
| --- | --- | --- |
| train.data | array of arrays | Labeled rows including the target column |
| train.columns | array of strings | Column names (including target) |
| test.data | array of arrays | Unlabeled rows to predict |
| test.columns | array of strings | Column names (without target) |
| task | string | One of: classification, regression, clustering, imputation (clustering and imputation require a paid plan) |
| target_column | string | Name of the target column. Required for classification and regression. |
| time_column | string | Name of the time/date column. Enables timeseries forecasting (regression only). Accepts any format that pandas can parse with pd.to_datetime(). |
| n_clusters | integer | Number of clusters (2-1000). Required for kmeans, ignored for dbscan. |
| cluster_method | string | Clustering algorithm: kmeans (default) or dbscan. |
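The per-task field rules can be checked client-side so malformed requests fail fast. A minimal sketch (`validate_body` is an illustrative helper; the server's own validation is authoritative):

```python
# Mirror the field rules: supervised tasks need target_column,
# kmeans clustering needs n_clusters in [2, 1000].
SUPERVISED = {"classification", "regression"}
ALL_TASKS = {"classification", "regression", "clustering", "imputation"}

def validate_body(body: dict) -> None:
    task = body.get("task")
    if task not in ALL_TASKS:
        raise ValueError(f"unknown task: {task!r}")
    if task in SUPERVISED and not body.get("target_column"):
        raise ValueError(f"{task} requires target_column")
    if task == "clustering":
        method = body.get("cluster_method", "kmeans")
        if method == "kmeans" and not 2 <= body.get("n_clusters", 0) <= 1000:
            raise ValueError("kmeans requires n_clusters between 2 and 1000")

validate_body({"task": "regression", "target_column": "price"})  # passes silently
```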

CSV file upload

Instead of JSON, send a CSV file via multipart/form-data. Same endpoint, same auth. Rows where the target column is empty or null are treated as test rows; the rest are training rows.

| Field | Type | Description |
| --- | --- | --- |
| file | file | CSV file with header row. Include all columns, including the target. Leave the target empty for rows you want predicted. |
| task | string | One of: classification, regression, clustering, imputation (clustering and imputation require a paid plan) |
| target_column | string | Name of the target column. Required for classification and regression. |
| time_column | string | Name of the time/date column. Enables timeseries forecasting (regression only). Accepts any format that pandas can parse with pd.to_datetime(). |
| n_clusters | integer | Number of clusters (2-1000). Required for kmeans, ignored for dbscan. |
| cluster_method | string | Clustering algorithm: kmeans (default) or dbscan. |
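Because the train/test split for CSV uploads is implicit, it can help to reproduce it locally and sanity-check row counts before uploading. A stdlib-only sketch (`split_rows` is an illustrative helper):

```python
import csv
import io

def split_rows(csv_text: str, target_column: str):
    """Split CSV rows the way the API does: empty target -> test row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    train, test = [], []
    for row in reader:
        (test if row[target_column] in ("", None) else train).append(row)
    return train, test

sample = """age,income,purchased
25,50000,Yes
30,60000,No
28,55000,
"""
train, test = split_rows(sample, "purchased")
print(len(train), len(test))  # 2 1
```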

Classification

Predict categorical labels. Provide labeled training rows and unlabeled test rows. Requires target_column and task: "classification".

Example data

Rows with an empty target are predicted:

data.csv
age,income,experience,purchased
25,50000,1,Yes
30,60000,3,No
22,45000,0,Yes
35,70000,8,No
28,55000,2,
38,75000,7,

Request

curl -X POST https://tabh2o.h2oai.com/api/v1/predict \
  -H "Authorization: Bearer tabh2o_live_..." \
  -F file=@data.csv \
  -F target_column=purchased \
  -F task=classification

Response

json
{
  "predictions": ["No", "No"],
  "probabilities": [[0.62023, 0.37977], [0.847015, 0.152985]],
  "metadata": {
    "task": "classification",
    "model": "tabh2o_v1_20260408",
    "train_rows": 4,
    "test_rows": 2,
    "columns": 4,
    "time_ms": 594
  }
}

probabilities contains per-class confidence scores for each test row. All floats are rounded to 6 decimal places.
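Since probabilities line up with predictions index-for-index, a common pattern is to attach a confidence score to each label. The sketch below assumes (as is conventional, though not stated in the example) that the predicted label carries the highest per-class score:

```python
# Pair each predicted label with its top class probability.
response = {
    "predictions": ["No", "No"],
    "probabilities": [[0.62023, 0.37977], [0.847015, 0.152985]],
}

results = [
    {"label": label, "confidence": max(probs)}
    for label, probs in zip(response["predictions"], response["probabilities"])
]
print(results[0])  # {'label': 'No', 'confidence': 0.62023}
```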

Regression

Predict continuous numeric values. Same structure as classification but with task: "regression".

Example data

houses.csv
sqft,bedrooms,garage,price
1200,2,0,250000
1800,3,1,350000
2400,4,1,500000
900,1,0,180000
1500,3,1,310000
1600,3,1,
2000,4,0,

Request

curl -X POST https://tabh2o.h2oai.com/api/v1/predict \
  -H "Authorization: Bearer tabh2o_live_..." \
  -F file=@houses.csv \
  -F target_column=price \
  -F task=regression

Response

json
{
  "predictions": [328540.891023, 421730.336712],
  "confidence_intervals": [[295012.5, 362069.282045], [389103.170489, 454357.502935]],
  "metadata": {
    "task": "regression",
    "model": "tabh2o_v1_20260408",
    "train_rows": 5,
    "test_rows": 2,
    "columns": 4,
    "time_ms": 312
  }
}

confidence_intervals contains [lower, upper] bounds for each prediction. All floats are rounded to 6 decimal places.
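The intervals line up with predictions index-for-index, so a quick consistency check and width report is a one-line zip (a sketch over the example response above):

```python
# Verify each point prediction falls inside its interval and show widths.
predictions = [328540.891023, 421730.336712]
intervals = [[295012.5, 362069.282045], [389103.170489, 454357.502935]]

for pred, (lo, hi) in zip(predictions, intervals):
    assert lo <= pred <= hi
    print(f"prediction {pred:.0f}, interval width {hi - lo:.0f}")
```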

Timeseries forecasting

Forecast future values by providing historical data with a time column. Uses task: "regression" with an additional time_column field. Train rows contain known values; test rows contain future dates to predict.

Example data

sales.csv
date,store,sales
2025-01-01,A,120
2025-01-02,A,135
2025-01-03,A,98
2025-01-04,A,142
2025-01-05,A,155
2025-01-06,A,
2025-01-07,A,

Request

curl -X POST https://tabh2o.h2oai.com/api/v1/predict \
  -H "Authorization: Bearer tabh2o_live_..." \
  -F file=@sales.csv \
  -F target_column=sales \
  -F time_column=date \
  -F task=regression

Response

json
{
  "predictions": [161.234018, 148.709452],
  "confidence_intervals": [[142.017839, 180.450197], [128.503211, 168.915693]],
  "metadata": {
    "task": "regression",
    "model": "tabh2o_v1_20260408",
    "train_rows": 5,
    "test_rows": 2,
    "columns": 3,
    "time_ms": 380
  }
}

Multi-series forecasting

To forecast multiple time series at once (e.g., sales per store, metrics per device), add an item_id column that identifies each individual series. The model will learn from all series jointly and return predictions grouped by item_id. For single time series data (no item_id column), the API treats the entire dataset as one series — no changes needed.

Example data (multi-series)

multi_sales.csv
item_id,date,store_type,sales
store_A,2025-01-01,mall,120
store_A,2025-01-02,mall,135
store_A,2025-01-03,mall,98
store_A,2025-01-04,mall,
store_A,2025-01-05,mall,
store_B,2025-01-01,street,60
store_B,2025-01-02,street,72
store_B,2025-01-03,street,55
store_B,2025-01-04,street,
store_B,2025-01-05,street,

Request (multi-series)

curl -X POST https://tabh2o.h2oai.com/api/v1/predict \
  -H "Authorization: Bearer tabh2o_live_..." \
  -F file=@multi_sales.csv \
  -F target_column=sales \
  -F time_column=date \
  -F task=regression

Response

json
{
  "predictions": [142.518734, 138.042591, 68.317205, 61.703448],
  "confidence_intervals": [[125.003217, 160.034251], [120.508312, 155.576870], [52.011483, 84.622927], [45.400219, 78.006677]],
  "metadata": {
    "task": "regression",
    "model": "tabh2o_v1_20260408",
    "train_rows": 6,
    "test_rows": 4,
    "columns": 4,
    "time_ms": 520
  }
}

Predictions are returned in the same order as the test rows in the CSV. The item_id column can contain any string or integer values.
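Because predictions follow test-row order, you can zip them back onto the (item_id, date) pairs of the empty-target rows. A stdlib sketch over the example above (only the empty-target rows are shown here for brevity; a real file would also contain the training rows):

```python
import csv
import io

csv_text = """item_id,date,sales
store_A,2025-01-04,
store_A,2025-01-05,
store_B,2025-01-04,
store_B,2025-01-05,
"""
predictions = [142.518734, 138.042591, 68.317205, 61.703448]

# Collect (item_id, date) keys for rows with an empty target, in file order.
test_keys = [
    (r["item_id"], r["date"])
    for r in csv.DictReader(io.StringIO(csv_text))
    if r["sales"] == ""
]
forecast = dict(zip(test_keys, predictions))
print(forecast[("store_B", "2025-01-04")])  # 68.317205
```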

Clustering (paid plan)

Clustering is available on paid plans. Contact sales to get access.

Group similar rows without a target column. No target_column is needed. Supports two algorithms: kmeans (default, requires n_clusters) and dbscan (density-based, determines cluster count automatically).

Example data

customers.csv
age,income,spending_score
19,15000,39
21,15000,81
20,16000,6
23,16000,77
31,17000,40
22,17000,76
35,18000,6
23,18000,94
64,19000,3
30,19000,72

Request

curl -X POST https://tabh2o.h2oai.com/api/v1/predict \
  -H "Authorization: Bearer tabh2o_live_..." \
  -F file=@customers.csv \
  -F task=clustering \
  -F n_clusters=3

Response

json
{
  "predictions": [1, 0, 2, 0, 1, 0, 2, 0, 2, 1],
  "metadata": {
    "task": "clustering",
    "model": "tabh2o_v1_20260408",
    "algorithm": "kmeans",
    "train_rows": 10,
    "test_rows": 10,
    "columns": 3,
    "n_clusters": 3,
    "time_ms": 520
  }
}

Each value in predictions is a cluster ID (integer starting from 0).
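Cluster IDs map one-to-one onto input rows, so grouping rows by cluster is a simple dictionary fold (a sketch over the example response above):

```python
from collections import defaultdict

rows = [
    [19, 15000, 39], [21, 15000, 81], [20, 16000, 6], [23, 16000, 77],
    [31, 17000, 40], [22, 17000, 76], [35, 18000, 6], [23, 18000, 94],
    [64, 19000, 3], [30, 19000, 72],
]
cluster_ids = [1, 0, 2, 0, 1, 0, 2, 0, 2, 1]

# Bucket each row under its cluster ID.
clusters = defaultdict(list)
for row, cid in zip(rows, cluster_ids):
    clusters[cid].append(row)

print(sorted(len(v) for v in clusters.values()))  # [3, 3, 4]
```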

Imputation (paid plan)

Imputation is available on paid plans. Contact sales to get access.

Fill in missing values in your dataset. No target_column needed. The response includes the complete dataset, the column names, and a boolean mask showing which values were imputed.

Example data

Cells with missing values will be filled:

survey.csv
age,income,satisfaction,city
25,50000,8,NYC
30,,7,LA
,45000,9,NYC
28,55000,,LA
35,70000,6,
22,40000,8,NYC

Request

curl -X POST https://tabh2o.h2oai.com/api/v1/predict \
  -H "Authorization: Bearer tabh2o_live_..." \
  -F file=@survey.csv \
  -F task=imputation

Response

json
{
  "imputed_data": [
    [25, 50000, 8, "NYC"],
    [30, 51823.417209, 7, "LA"],
    [26.831405, 45000, 9, "NYC"],
    [28, 55000, 7.214038, "LA"],
    [35, 70000, 6, "NYC"],
    [22, 40000, 8, "NYC"]
  ],
  "imputed_columns": ["age", "income", "satisfaction", "city"],
  "imputed_mask": [
    [false, false, false, false],
    [false, true, false, false],
    [true, false, false, false],
    [false, false, true, false],
    [false, false, false, true],
    [false, false, false, false]
  ],
  "metadata": {
    "task": "imputation",
    "model": "tabh2o_v1_20260408",
    "train_rows": 2,
    "test_rows": 6,
    "columns": 4,
    "columns_imputed": 4,
    "time_ms": 890
  }
}

The imputed_mask is true where a value was filled in, so you can see exactly what changed. Works with both numeric and categorical columns.
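With the mask it is easy to extract just the filled-in cells as (row index, column, value) triples (a sketch over the example response above):

```python
imputed_data = [
    [25, 50000, 8, "NYC"],
    [30, 51823.417209, 7, "LA"],
    [26.831405, 45000, 9, "NYC"],
    [28, 55000, 7.214038, "LA"],
    [35, 70000, 6, "NYC"],
    [22, 40000, 8, "NYC"],
]
columns = ["age", "income", "satisfaction", "city"]
mask = [
    [False, False, False, False],
    [False, True, False, False],
    [True, False, False, False],
    [False, False, True, False],
    [False, False, False, True],
    [False, False, False, False],
]

# Keep only cells where the mask is True, i.e. values the model filled in.
filled = [
    (i, columns[j], imputed_data[i][j])
    for i, row in enumerate(mask)
    for j, was_imputed in enumerate(row)
    if was_imputed
]
print(filled)
# [(1, 'income', 51823.417209), (2, 'age', 26.831405),
#  (3, 'satisfaction', 7.214038), (4, 'city', 'NYC')]
```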

Agentic use

TabH2O works as a tool for AI agents. Your agent can call the API directly whenever it needs tabular predictions.

How it works

  1. Agent identifies the task: classification or regression (clustering and imputation available on paid plans)
  2. Formats training and test data into the API schema
  3. Calls POST /api/v1/predict, gets predictions back
  4. Uses the results in its workflow

Skill file for agents

We provide a machine-readable SKILL.md following the Agent Skills standard. It has the full API schema, examples, error codes, and limits.

Installing the skill

Agent harnesses like pi and Claude Code load skills from a folder. Install project-level (one project) or globally (all projects):

mkdir -p .pi/skills/tabh2o-predict
curl -o .pi/skills/tabh2o-predict/SKILL.md \
  https://tabh2o.h2oai.com/tabh2o-predict/SKILL.md

Once installed, agents discover the skill automatically and can use it whenever they encounter tabular prediction tasks. You can also invoke it explicitly:

/skill:tabh2o-predict

Integration examples

You can also pass SKILL.md to your agent via system prompt or tool definitions.

As an OpenAI function tool

python
tools = [{
    "type": "function",
    "function": {
        "name": "tabh2o_predict",
        "description": "Predict on tabular data via TabH2O. "
            "Send labeled training rows and unlabeled test rows.",
        "parameters": {
            "type": "object",
            "properties": {
                "train_data": {
                    "type": "array",
                    "description": "Labeled training rows (each row is an array of values)"
                },
                "train_columns": {
                    "type": "array",
                    "description": "Column names including target"
                },
                "test_data": {
                    "type": "array",
                    "description": "Unlabeled test rows to predict"
                },
                "test_columns": {
                    "type": "array",
                    "description": "Column names without target"
                },
                "target_column": {"type": "string"},
                "task": {"type": "string", "enum": ["classification", "regression", "clustering", "imputation"]},  // clustering & imputation require paid plan
                "time_column": {"type": "string", "description": "Time/date column name (enables timeseries forecasting)"},
                "n_clusters": {"type": "integer", "description": "Number of clusters (required for kmeans)"},
                "cluster_method": {"type": "string", "enum": ["kmeans", "dbscan"], "description": "Clustering algorithm (default: kmeans)"}
            },
            "required": ["train_data", "train_columns", "test_data",
                         "test_columns", "task"]
        }
    }
}]

Tool handler

python
import requests

TABH2O_KEY = "tabh2o_live_..."

def handle_tabh2o_predict(args):
    body = {
        "train": {
            "data": args["train_data"],
            "columns": args["train_columns"],
        },
        "test": {
            "data": args["test_data"],
            "columns": args["test_columns"],
        },
        "task": args["task"],
    }
    # Forward optional fields only when the model supplied them
    # (e.g. target_column is absent for clustering and imputation).
    for key in ("target_column", "time_column", "n_clusters", "cluster_method"):
        if key in args:
            body[key] = args[key]
    resp = requests.post(
        "https://tabh2o.h2oai.com/api/v1/predict",
        headers={"Authorization": f"Bearer {TABH2O_KEY}"},
        json=body,
    )
    resp.raise_for_status()
    return resp.json()

In a system prompt

Or paste SKILL.md into the system prompt directly:

python
system_prompt = """You are a data analysis assistant.

When the user asks you to make predictions on tabular data,
use the TabH2O API. Here is the specification:

""" + open("SKILL.md").read()

Tip: The skill file at https://tabh2o.h2oai.com/tabh2o-predict/SKILL.md stays current with the latest API spec. Point your agent there for automatic updates.

Plugins

TabH2O plugins bring predictions directly into your spreadsheet. No code, no CSV exports — select your data, pick a target column, and get predictions written back into the sheet.

How it works

  1. Enter your TabH2O API key
  2. Load a sheet and optionally select a range
  3. Pick the target column (rows with empty values become predictions)
  4. Choose classification or regression
  5. Click Predict — results are written back into the empty cells

Note: The plugins currently support classification and regression. Forecasting and other task types are coming soon.

Excel · Installation

The Excel Add-in runs as a side panel inside Excel Desktop and Excel Online. It reads your sheet data and writes predictions back.

Excel Online / Microsoft 365 Cloud

Open a workbook at excel.cloud.microsoft and follow the steps below.

  1. On the Home ribbon, click Add-ins, then Advanced... (or Add-ins → More Add-ins, depending on your Microsoft account type)
  2. Choose Upload My Add-in (if you used More Add-ins: switch to the MY ADD-INS tab and open the Manage My Add-ins dropdown first)
  3. Select the downloaded manifest.xml file
  4. Back on the Home ribbon, click Open TabH2O (appears next to Add-ins)

Windows

  1. Create a folder (e.g. C:\AddinManifests), right-click it → Properties → Sharing → Share, and note the network path (e.g. \\YourPC\AddinManifests)
  2. Place the downloaded manifest.xml in that shared folder
  3. In Excel, go to File → Options → Trust Center → Trust Center Settings → Trusted Add-in Catalogs
  4. Enter the network path in Catalog Url, click Add catalog, and check Show in Menu
  5. Click OK to close both dialogs, then restart Excel
  6. On the Home ribbon, click Add-ins → More Add-ins, switch to the SHARED FOLDER tab, select TabH2O, and click Add
  7. Back on the Home ribbon, click Open TabH2O (appears next to Add-ins)

macOS

  1. Make sure Excel is not running: pkill -9 "Microsoft Excel" in Terminal
  2. If it doesn't already exist, create Excel's sideload folder: ~/Library/Containers/com.microsoft.Excel/Data/Documents/wef/
  3. Copy the downloaded manifest.xml into that folder
  4. In Excel, on the Home menu click Add-ins → My Add-ins, switch to the Developer Add-ins tab, select TabH2O, and click Add

manifest.xml

Download manifest.xml · View raw →

Google Sheets · Installation

The Google Sheets extension adds a TabH2O menu to your spreadsheet with the same side panel experience as the Excel Add-in.

  1. Open your Google Sheet and go to Extensions → Apps Script
  2. In the left sidebar, click the + next to Services, select Google Sheets API (leave the identifier as Sheets), and click Add
  3. Replace the contents of Code.gs with the script below
  4. Save the project, reload the spreadsheet, and open TabH2O → Open from the menu (grant access when prompted on first run)

Code.gs

Download Code.gs · View raw →

Data selection

  • If no range is selected (or a trivial single row/column), the entire sheet is used
  • Header row (row 1) is always used as column names — empty headers are skipped
  • Columns with no data in the selected rows are excluded
  • Rows where all selected columns are empty are skipped

How predictions work

  • Rows with a filled target column → training data
  • Rows with an empty target column → predictions are made for these
  • You need at least two labeled rows and at least one unlabeled row
  • Predictions are written back in purple bold

Troubleshooting

No predictable target found

Every column is fully filled — there are no empty cells to predict. Leave the target cells blank for the rows you want predicted.

No empty target cells to predict

The chosen target column has no blank cells. Clear the cells you want the model to predict.

Prediction failed with a network error

The browser or your network is blocking the request to the TabH2O API. This is typically caused by a corporate firewall or proxy. Allow access to tabh2o.h2oai.com or consult your network administrator.

Sheet load failed: [..] PERMISSION_DENIED

Usually caused by being logged into multiple Google accounts. Sign out of all accounts and log in with only the account that owns the spreadsheet. It can also mean the script hasn't been granted access to the spreadsheet yet — go to Extensions → Apps Script, click Run on any function to retrigger the authorization dialog. If that doesn't help, remove the script's access under Google Account → Security → Third-party apps and reauthorize.

Error codes

| Status | Error | Description |
| --- | --- | --- |
| 401 | invalid_api_key | Missing, invalid, or revoked API key |
| 403 | task_not_available | Task not available on your plan (e.g. clustering on the free tier) |
| 422 | validation_error | Invalid request body, data too large, or bad format |
| 429 | rate_limit_exceeded | Too many requests per minute. Slow down. |
| 429 | quota_exceeded | Daily or monthly quota reached. Contact sales. |
| 503 | service_unavailable | Inference backend is temporarily down |
| 504 | timeout | Inference timed out. Try a smaller dataset. |

Rate limit headers

Every API response includes these headers:

http
X-RateLimit-Limit: 2
X-RateLimit-Remaining: 1
X-Quota-Limit-Day: 20
X-Quota-Remaining-Day: 14
X-Quota-Limit-Month: 500
X-Quota-Remaining-Month: 387

When the per-minute rate limit kicks in, the 429 response includes a standard Retry-After header with the number of seconds until the next window opens. Sleep at least that long before retrying; don't roll your own exponential backoff, and don't look for retry_after in the JSON body (it isn't there).

http
HTTP/1.1 429 Too Many Requests
Retry-After: 25
X-RateLimit-Limit: 2
X-RateLimit-Remaining: 0

{"error": "rate_limit_exceeded", "message": "Too many requests. Please slow down."}

Free tier limits

| Limit | Value |
| --- | --- |
| Requests per minute | 2 |
| Requests per day | 20 |
| Requests per month | 500 |
| Max rows per request | 100,000 |
| Max columns | 100 |
| Max payload | 50 MB |
| Available tasks | Classification, Regression |

Need higher limits? Contact sales →

Privacy & anonymization

Your data is never stored. Each request is processed in memory and discarded immediately after the response is returned. Nothing is logged, cached, or used for model training.

If your data is sensitive, you can fully anonymize it before sending — with zero impact on prediction quality:

  • Column names are arbitrary. Rename columns to c1, c2, etc. The only column name that matters is target_column, which just needs to match the header.
  • Categorical values can be mapped to integers. Replace "red" → 0, "blue" → 1, etc. TabH2O is a foundation model — it learns from statistical patterns, not label semantics.
  • Numeric values need no changes. Keep them as-is for best results.

Example: A dataset with columns customer_name, city, revenue, churned can be sent as c1, c2, c3, c4 with city names mapped to integers. Predictions will be identical.
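The anonymization recipe above can be sketched in a few lines (`anonymize` is an illustrative helper): rename columns to c1..cN and map each categorical string to an integer, keeping a per-column codebook so you can translate predictions back. Remember to set target_column to the renamed header (here, c4).

```python
def anonymize(rows, columns):
    """Rename columns to c1..cN and encode string values as integers."""
    new_cols = [f"c{i + 1}" for i in range(len(columns))]
    codebooks = {}  # per-column mapping of original value -> integer code
    out = []
    for row in rows:
        new_row = []
        for col, val in zip(columns, row):
            if isinstance(val, str):
                book = codebooks.setdefault(col, {})
                val = book.setdefault(val, len(book))
            new_row.append(val)
        out.append(new_row)
    return out, new_cols, codebooks

rows = [["Alice", "NYC", 1200, "Yes"], ["Bob", "LA", 900, "No"]]
cols = ["customer_name", "city", "revenue", "churned"]
anon_rows, anon_cols, books = anonymize(rows, cols)
print(anon_rows)  # [[0, 0, 1200, 0], [1, 1, 900, 1]]
print(anon_cols)  # ['c1', 'c2', 'c3', 'c4']
```

The codebooks let you map predicted integer labels back to the original strings (e.g. invert books["churned"]).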

Output usage

There are no restrictions on how you use the predictions returned by the API. Commercial use is permitted. Automated calls from pipelines, agents, and production systems are permitted (within the rate limits of your tier). You own your outputs.