Comprehensive Smoke Test Script For 32 Tables
Hey guys! Let's dive into creating a comprehensive smoke test script for all 32 tables. This is super important because, as it turns out, just testing a few tables can lead to some major issues slipping through the cracks. We're talking about making sure everything works smoothly before we declare any milestones complete. So, buckle up, and let's get started!
The Problem: Why We Need This
So, here's the deal. Previously, we were only validating 4 out of 32 tables. That's like, a tiny 12.5% sample! This led to some, shall we say, optimistic claims of "perfect parity" when, in reality, there were widespread failures lurking beneath the surface. The root problem: no comprehensive smoke test was in place to systematically validate all 32 tables.
Current State of Affairs
Right now, we've got a few tables that are playing nice:
- ✅ test_basic.simple_table
- ✅ test_collections.collection_table
- ✅ test_timeseries.sensor_data
- ✅ test_wide_rows.wide_partition_table
But, and this is a big but, we have a whopping 28 tables that haven't been thoroughly tested. Let's break it down:
- 8 in test_basic (composite_key_table, compression_test_table, counters, and more).
- 7 in test_collections (including collection_clustering_table, collections_with_udts, etc.).
- 8 in test_timeseries (such as app_metrics, event_store, log_entries, you get the gist).
- 7 in test_wide_rows (like chat_messages, document_versions, the whole shebang).
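If you want to sanity-check that count of 32 yourself, enumeration is a short function over the dataset layout. The `<root>/sstables/<keyspace>/<table>-<uuid>/` layout is an assumption inferred from the discovery loop in the smoke test script below; the fixture directories here are fabricated for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Enumerate "keyspace.table" pairs from the assumed dataset layout:
#   <root>/sstables/<keyspace>/<table>-<uuid>/
list_tables() {
  local root="$1" table_dir
  for table_dir in "$root"/sstables/*/*/; do
    printf '%s.%s\n' \
      "$(basename "$(dirname "$table_dir")")" \
      "$(basename "$table_dir" | cut -d'-' -f1)"
  done | sort
}

# Fabricated fixture for illustration; point at the real dataset root instead.
root="$(mktemp -d)"
mkdir -p "$root/sstables/test_basic/simple_table-3a1f" \
         "$root/sstables/test_timeseries/sensor_data-9c2e"
list_tables "$root"
list_tables "$root" | wc -l   # table count: should be 32 on the real dataset
```

On the real dataset root, piping the list through `wc -l` is the quickest way to confirm nothing has been silently skipped.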
Known Culprits
And guess what? We already know about some troublemakers:
- ❌ static_columns_table - This one's throwing an "Unknown magic number 0xC0515C00" error.
- ❌ uncompressed_table - Another magic number mystery: 0x0010045E.
- ❌ ttl_test_table - And yet another magic number issue: 0xEA220000.
The Mission: Our Objective
Our main goal is crystal clear: we need to create an automated smoke test that does the following:
- Loads every single one of those 32 tables.
- Validates basic operations like schema extraction and row iteration. You know, the fundamental stuff.
- Compares row counts against our JSONL fixtures. We need to make sure the numbers match up.
- Reports a clear pass/fail status. No ambiguity here!
- Runs in our CI pipeline. This is crucial for preventing regressions down the line. We need to catch issues early and often.
The Plan of Attack: Tasks
Alright, let's break down the tasks to get this smoke test rolling.
1. Crafting the Smoke Test Script
File: test-data/scripts/smoke-test-all-tables.sh
#!/usr/bin/env bash
set -euo pipefail

DATASETS_ROOT="${CQLITE_DATASETS_ROOT:-$PWD/test-data/datasets}"
RESULTS_FILE="smoke-test-results.txt"
FAILED_TABLES=()
PASSED_TABLES=()

# Mirror all output into the results file so CI can upload it on failure.
exec > >(tee "$RESULTS_FILE") 2>&1

echo "=== CQLite Smoke Test: All Tables ==="
echo "Dataset root: $DATASETS_ROOT"
echo ""

# Discover all tables
for table_dir in "$DATASETS_ROOT"/sstables/*/*/; do
    table_path=$(basename "$table_dir")
    table_name=$(echo "$table_path" | cut -d'-' -f1)
    keyspace=$(basename "$(dirname "$table_dir")")

    echo -n "Testing $keyspace.$table_name ... "

    # Test 1: Load SSTable without errors
    if ! cargo run --quiet --bin cqlite -- \
        read-sstable "$table_dir" --output json > "/tmp/smoke_$table_name.json" 2>&1; then
        echo "FAIL (load error)"
        FAILED_TABLES+=("$keyspace.$table_name")
        continue
    fi

    # Test 2: Validate row count (if JSONL exists)
    jsonl_file="$table_dir/../${table_name}.jsonl"
    if [ -f "$jsonl_file" ]; then
        expected_rows=$(wc -l < "$jsonl_file")
        actual_rows=$(wc -l < "/tmp/smoke_$table_name.json")
        if [ "$expected_rows" -ne "$actual_rows" ]; then
            echo "FAIL (row count: expected $expected_rows, got $actual_rows)"
            FAILED_TABLES+=("$keyspace.$table_name")
            continue
        fi
    fi

    # Test 3: Validate schema extraction
    if ! grep -q '"column_name"' "/tmp/smoke_$table_name.json" 2>/dev/null; then
        echo "FAIL (schema extraction)"
        FAILED_TABLES+=("$keyspace.$table_name")
        continue
    fi

    echo "PASS"
    PASSED_TABLES+=("$keyspace.$table_name")
done

# Summary
echo ""
echo "=== Smoke Test Summary ==="
echo "Passed: ${#PASSED_TABLES[@]}/32"
echo "Failed: ${#FAILED_TABLES[@]}/32"

if [ ${#FAILED_TABLES[@]} -gt 0 ]; then
    echo ""
    echo "Failed Tables:"
    for table in "${FAILED_TABLES[@]}"; do
        echo "  - $table"
    done
    exit 1
fi

echo ""
echo "✅ All tables passed smoke test"
This script is our workhorse. It systematically goes through each table, tries to load it, checks the row counts, and validates the schema extraction. If anything goes wrong, it flags the table as a failure. At the end, it gives us a nice summary of which tables passed and failed.
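One fragile spot worth flagging: the row-count check assumes the CLI emits NDJSON (one JSON object per line), so `wc -l` would undercount if cqlite ever produced a single JSON array instead. A hedged alternative that handles both shapes, assuming `jq` is available:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Count rows whether the file is NDJSON (one object per line) or one JSON
# array. Requires jq.
count_rows() {
  local f="$1"
  if jq -e 'type == "array"' "$f" >/dev/null 2>&1; then
    jq 'length' "$f"            # single JSON array: count elements
  else
    wc -l < "$f" | tr -d ' '    # NDJSON: one row per line
  fi
}

# Demo on a throwaway NDJSON file.
tmp="$(mktemp)"
printf '{"id":1}\n{"id":2}\n' > "$tmp"
count_rows "$tmp"   # → 2
```

Swapping `count_rows` in for the two `wc -l` calls in Test 2 would make the comparison robust to either output format.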
2. Adding a CLI Smoke Test Variant
File: test-data/scripts/smoke-test-cli-queries.sh
This script takes a different approach. It tests query execution on all tables. The idea here is to make sure we can actually query the data.
#!/usr/bin/env bash
# For each table, run:
# cqlite --schema <schema.cql> --data-dir <dir> --query "SELECT * FROM <table>" --output json
Basically, for every table, we're going to run a SELECT * FROM <table> query and see if it works. This gives us another layer of confidence that things are working as expected.
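As a sketch of what that loop could look like: the flag names below mirror the comment above but are assumptions about the eventual CLI, and the `schema.cql` location inside the table directory is likewise an assumption, not a confirmed layout.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Compose the per-table query command (a sketch of the intended CLI, not its
# final shape; the schema.cql path is an assumed location).
build_query_cmd() {
  local table_dir="${1%/}" table_name keyspace
  table_name="$(basename "$table_dir" | cut -d'-' -f1)"
  keyspace="$(basename "$(dirname "$table_dir")")"
  printf 'cqlite --schema %s --data-dir %s --query "SELECT * FROM %s.%s" --output json\n' \
    "$table_dir/schema.cql" "$table_dir" "$keyspace" "$table_name"
}

# Demo on a fabricated table directory.
build_query_cmd "/data/sstables/test_basic/simple_table-3a1f/"
```

The real script would wrap this in the same discovery loop as `smoke-test-all-tables.sh` and execute the command instead of printing it, flagging any table whose query fails or returns no rows.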
3. Setting Up a CI Job
File: .github/workflows/smoke-tests.yml
Now, this is where things get really cool. We're going to set up a Continuous Integration (CI) job that automatically runs our smoke tests whenever we push code or create a pull request. This is crucial for catching regressions early.
name: Smoke Tests

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  smoke-test-all-tables:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup Rust
        uses: dtolnay/rust-toolchain@stable

      - name: Build CQLite
        run: cargo build --release --bin cqlite

      - name: Run Smoke Tests
        env:
          CQLITE_DATASETS_ROOT: ${{ github.workspace }}/test-data/datasets
        run: |
          chmod +x test-data/scripts/smoke-test-all-tables.sh
          ./test-data/scripts/smoke-test-all-tables.sh

      - name: Upload Results
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: smoke-test-results
          path: smoke-test-results.txt
This YAML file defines a CI job that:
- Checks out our code.
- Sets up Rust (because we're using Rust, duh!).
- Builds our cqlite binary.
- Runs the smoke test script.
- If the tests fail, it uploads the results as an artifact so we can investigate.
4. Creating a Detailed Validation Matrix
File: test-data/validation-matrix.md
We need a clear way to track the status of each table. That's where the validation matrix comes in. This is a Markdown file that gives us a table-based view of the smoke test results.
| **Keyspace** | **Table** | **Rows** | **Schema** | **Magic** | **Load** | **Query** | **Parity** |
|---|---|---|---|---|---|---|---|
| test_basic | simple_table | 1000 | ✅ | ✅ | ✅ | ✅ | ✅ |
| test_basic | static_columns_table | 100 | ❌ | ❌ 0xC0515C00 | ❌ | ❌ | ❌ |
| test_basic | uncompressed_table | 100 | ❌ | ❌ 0x0010045E | ❌ | ❌ | ❌ |
| ... | ... | ... | ... | ... | ... | ... | ... |
Each row represents a table, and the columns show the results of different validation checks (row counts, schema extraction, magic number checks, etc.). This gives us a comprehensive overview of the status of each table.
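Filling that matrix by hand gets old fast. At minimum, the Load column can be derived mechanically from the smoke-test output. A sketch, assuming the `Testing <keyspace>.<table> ... PASS|FAIL (...)` line format from the script above; the other columns are left as `?` placeholders since they need their own probes:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Convert smoke-test output lines into validation-matrix rows.
# Only the Load column is derived here; the rest stay as "?".
matrix_rows() {
  awk '
    /^Testing / {
      split($2, p, ".")                       # keyspace.table
      status = ($4 == "PASS") ? "✅" : "❌"
      printf "| %s | %s | ? | ? | ? | %s | ? | ? |\n", p[1], p[2], status
    }'
}

# Demo on one output line.
printf 'Testing test_basic.simple_table ... PASS\n' | matrix_rows
```

Usage would be `./smoke-test-all-tables.sh | matrix_rows >> validation-matrix.md`, then the remaining columns get filled in as the per-check probes land.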
5. Adding Issue Cross-References
Finally, we need to connect any smoke test failures to the issues that are causing them. This helps us prioritize our work and track progress.
## Smoke Test Failures → Blocking Issues
- static_columns_table, uncompressed_table, ttl_test_table → Issue #194 (Magic numbers)
- Tables with schema extraction failures → Issue #195 (SerializationHeader)
- Tables with partial results → Issue #196 (V5 parser iteration)
This section of our documentation will map specific table failures to the relevant issues in our issue tracker. This makes it much easier to understand why a test is failing and what needs to be done to fix it.
The Finish Line: Acceptance Criteria
To make sure we've nailed it, here's what we need to have in place:
- [ ] A smoke test script that exists and is executable. Obvious, right?
- [ ] The script tests all 32 tables in our test suite. No exceptions!
- [ ] Clear pass/fail output for each table. No more guessing games.
- [ ] Row count validation against JSONL fixtures. Accurate numbers are crucial.
- [ ] A CI job that runs smoke tests on every PR. Automation is our friend.
- [ ] A validation matrix that documents the status of all tables. We need a clear overview.
- [ ] Failures cross-referenced to blocking issues. Let's connect the dots.
Measuring Success: Success Metrics
We're going to track our progress in three phases:
Phase 1 (Baseline):
- Document the current pass rate: X/32 tables. Where are we starting from?
- Identify all failure modes (magic numbers, schema, parsing). What are the common problems?
Phase 2 (After P0 Fixes):
- Target: 32/32 tables pass smoke test. We want a perfect score!
- Zero "Unknown magic number" errors. Let's get rid of those pesky magic numbers.
- Zero schema extraction failures. We need to be able to read the schema correctly.
- Row counts match expected. Accurate data is key.
Phase 3 (Continuous):
- CI enforces: No PR merged if smoke test fails. This is our safety net.
- Regression tests prevent reintroduction of failures. We don't want to backslide.
Proving It Works: Validation
To run the smoke test locally, you can use these commands:
# Run smoke test locally
$ chmod +x test-data/scripts/smoke-test-all-tables.sh
$ env CQLITE_DATASETS_ROOT=$PWD/test-data/datasets \
./test-data/scripts/smoke-test-all-tables.sh
# Expected output:
Testing test_basic.simple_table ... PASS
Testing test_basic.static_columns_table ... FAIL (load error)
Testing test_basic.uncompressed_table ... FAIL (load error)
...
Passed: X/32
Failed: Y/32
This will run the smoke test script and give you a summary of the results.
Resources: References
- Unified Readiness Report: "Validation Plan - Phase 1: Smoke Test All Tables"
- Related: Issues #194, #195, #196 (P0 blockers preventing smoke test success)
Why This Matters: Priority Justification
This is a P1 - High Priority task. Why? Because it's the validation framework we need to prove that our P0 fixes are actually working. We can't claim milestone completion without comprehensive validation, and this smoke test is the key. The good news is that we can implement this in parallel with those P0 fixes, so we're not blocked.
So there you have it, guys! A comprehensive smoke test script for all 32 tables. Let's get this done and make sure our data is solid!