PDF Vector

Word Extraction API for developers

Extract structured data from Word documents using AI and JSON Schema. Perfect for contracts, forms, reports, and any structured Word content.

  • Schema-Based Extraction: Define your data structure with JSON Schema and extract consistently from Word documents
  • AI-Powered Understanding: Intelligent extraction that adapts to document variations while maintaining your schema
  • Type-Safe Results: Get validated, structured JSON output ready for your database or application

Easy to use APIs

Use our simple APIs directly or our TypeScript SDK with just a few lines of code.

Word Extract

API Docs
import { readFile } from "fs/promises";
import { PDFVector } from "pdfvector";

const client = new PDFVector({
  apiKey: "pdfvector_xxxxxxx", // replace with your own API key
});

// From URL: the service fetches the document itself
const urlResults = await client.extract({
  url: "https://example.com/invoice.docx",
  prompt: "Extract all invoice details",
  schema: {
    type: "object",
    properties: {
      invoiceNumber: { type: "string" },
      date: { type: "string" },
      totalAmount: { type: "number" },
    },
  },
});

// From file: send the raw bytes along with the .docx MIME type
const fileResults = await client.extract({
  data: await readFile("document.docx"),
  contentType: "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
  prompt: "Extract all invoice details",
  schema: {
    type: "object",
    properties: {
      invoiceNumber: { type: "string" },
      date: { type: "string" },
      totalAmount: { type: "number" },
    },
  },
});

What people are saying

See how PDF Vector is helping teams improve their document processing workflows

Abdo El-Mobayad

Can't recommend PDF Vector enough! It boosts your AI workflow accuracy to 100% while dropping your costs! Especially if you're a T4 Org in the $150/m spend range!

Trent

Gotta give a shoutout to PDF Vector team for helping me set up PDF Vector for a project. They even delivered on a feature request before I purchased. Incredible customer service. 👏

Praneeth Pike

Been implementing RAG and changing a lot of things under the hood for @rabbitholesai, came across PDF Vector and it was a huge time saver. I got a document parsing solution for the rag pipeline within minutes! one less thing to worry about

Structured Data from Word Documents

Transform unstructured Word documents into clean, validated data. Define your schema once and extract consistently from all your documents.

JSON Schema Definition

Use industry-standard JSON Schema to define exactly what data you need. Get predictable, type-safe results every time.

Intelligent Extraction

AI understands Word document structure and semantics, extracting data accurately even when document layouts vary.

Application-Ready Data

Receive clean, validated JSON that's ready for your database or business logic. No manual processing required.
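
Before writing extracted data to a database, it can still be worth checking it on your side. The sketch below is a deliberately small checker for flat object schemas like the invoice example on this page. The names `FlatSchema` and `checkFlat` are hypothetical and not part of the PDF Vector SDK, and the checker covers only `required`, primitive `type`, and `additionalProperties: false`. A full JSON Schema validator such as Ajv would be the more complete choice in practice.

```typescript
// Illustrative only: a tiny checker for flat object schemas.
// This is NOT a full JSON Schema validator and NOT part of the PDF Vector SDK.
type PrimitiveType = "string" | "number" | "boolean";

interface FlatSchema {
  type: "object";
  properties: Record<string, { type: PrimitiveType }>;
  required?: string[];
  additionalProperties?: boolean;
}

// Returns a list of human-readable problems; an empty list means the data passed.
function checkFlat(data: Record<string, unknown>, schema: FlatSchema): string[] {
  const problems: string[] = [];

  // Every required key must be present.
  for (const key of schema.required ?? []) {
    if (!(key in data)) problems.push(`missing required field: ${key}`);
  }

  // Every present key must be declared (when additionalProperties is false)
  // and must have the declared primitive type.
  for (const [key, value] of Object.entries(data)) {
    const spec = schema.properties[key];
    if (!spec) {
      if (schema.additionalProperties === false) {
        problems.push(`unexpected field: ${key}`);
      }
      continue;
    }
    if (typeof value !== spec.type) {
      problems.push(`${key} should be ${spec.type}`);
    }
  }
  return problems;
}

// Schema modeled on the invoice example used elsewhere on this page.
const flatInvoiceSchema: FlatSchema = {
  type: "object",
  properties: {
    invoiceNumber: { type: "string" },
    totalAmount: { type: "number" },
  },
  required: ["invoiceNumber", "totalAmount"],
  additionalProperties: false,
};
```

Calling `checkFlat(extracted, flatInvoiceSchema)` and refusing to persist the record when the returned list is non-empty keeps malformed rows out of downstream tables.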

Complex Document Support

Extract from contracts, forms, reports, proposals, and more. Handles tables, lists, and nested structures effortlessly.
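
As an illustration of what a schema for tables and nested structures might look like, here is a sketch using standard JSON Schema `object` and `array` types. The field names (`customer`, `lineItems`, `unitPrice`, and so on) are hypothetical, chosen for an invoice with a line-item table. Whether a given document yields every field depends on its content.

```typescript
// Hypothetical schema: an invoice with a nested customer object and a
// table of line items expressed as an array of objects.
const invoiceWithItemsSchema = {
  type: "object",
  properties: {
    invoiceNumber: { type: "string" },
    customer: {
      type: "object",
      properties: {
        name: { type: "string" },
        address: { type: "string" },
      },
    },
    lineItems: {
      type: "array",
      items: {
        type: "object",
        properties: {
          description: { type: "string" },
          quantity: { type: "number" },
          unitPrice: { type: "number" },
        },
        required: ["description"],
      },
    },
  },
  required: ["invoiceNumber", "lineItems"],
};

// A TypeScript shape that mirrors the schema above, for use in application code.
interface InvoiceWithItems {
  invoiceNumber: string;
  customer?: { name?: string; address?: string };
  lineItems: Array<{
    description: string;
    quantity?: number;
    unitPrice?: number;
  }>;
}

// Example value that conforms to the shape (made-up data for illustration).
const exampleInvoice: InvoiceWithItems = {
  invoiceNumber: "INV-001",
  lineItems: [{ description: "Consulting", quantity: 2, unitPrice: 100 }],
};
```

Keeping the schema and the TypeScript interface side by side makes it easier to notice when one changes without the other.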

Example Output

Real examples of structured data extracted from Word documents

Original Document

Invoice

Output

AI-generated answer to your question

Question

Extract all invoice details

Schema

{
  "type": "object",
  "properties": {
    "invoiceNumber": {
      "type": "string"
    },
    "totalAmount": {
      "type": "number"
    },
    "Basic Fee wmView": {
      "type": "string"
    }
  },
  "required": [
    "invoiceNumber",
    "totalAmount"
  ],
  "additionalProperties": false
}

Answer

{
  "data": {
    "invoiceNumber": "123100401",
    "totalAmount": 453.53,
    "Basic Fee wmView": "130,00 €"
  },
  "pageCount": 3,
  "creditCount": 9
}
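
The answer above wraps the extracted fields in an envelope with `data`, `pageCount`, and `creditCount`. A minimal sketch of typing and guarding that envelope in TypeScript follows. The names `ExtractResponse`, `InvoiceData`, and `isInvoiceResponse` are hypothetical and inferred only from this sample output, not taken from the SDK's published types.

```typescript
// Envelope shape inferred from the sample answer above (an assumption,
// not the SDK's official type definitions).
interface ExtractResponse<T> {
  data: T;
  pageCount: number;
  creditCount: number;
}

// Fields from the example schema; only the two required ones are mandatory.
interface InvoiceData {
  invoiceNumber: string;
  totalAmount: number;
  "Basic Fee wmView"?: string;
}

// Runtime type guard: narrows an unknown value to the typed envelope.
function isInvoiceResponse(value: unknown): value is ExtractResponse<InvoiceData> {
  if (typeof value !== "object" || value === null) return false;
  const envelope = value as Record<string, unknown>;
  if (typeof envelope["pageCount"] !== "number") return false;
  if (typeof envelope["creditCount"] !== "number") return false;

  const inner = envelope["data"];
  if (typeof inner !== "object" || inner === null) return false;
  const data = inner as Record<string, unknown>;
  return (
    typeof data["invoiceNumber"] === "string" &&
    typeof data["totalAmount"] === "number"
  );
}

// The sample answer from this page, parsed as untyped JSON.
const sampleResponse: unknown = JSON.parse(`{
  "data": {
    "invoiceNumber": "123100401",
    "totalAmount": 453.53,
    "Basic Fee wmView": "130,00 €"
  },
  "pageCount": 3,
  "creditCount": 9
}`);
```

After `isInvoiceResponse(sampleResponse)` returns true, the compiler treats `sampleResponse.data.totalAmount` as a `number`, so the rest of the code needs no casts.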

One subscription, all APIs

Start for free, then scale as you grow. No hidden fees.

Save one month

Free

$0

Credit Card Required

Perfect for testing and small projects

  • Access to all APIs
  • 100 credits

Basic

$23/month

$275 billed annually

Great for personal projects and small businesses

  • Access to all APIs
  • 3,000 credits

Pro

$89/month

$1067 billed annually

Most popular plan for growing businesses

  • Access to all APIs
  • 100,000 credits

Enterprise

$457/month

$5489 billed annually

For large-scale applications and enterprises

  • Access to all APIs
  • 500,000 credits

Ready to Structure Your Word Data?

Turn your Word documents into structured, validated data with our extraction API. Define your schema once and reuse it across every document.

No setup fees • Integrate in minutes • Cancel anytime

Frequently asked questions