Transformation Examples

This page provides practical examples of transforming documents using the Graphora client library.

Basic Document Transformation

The following example demonstrates how to transform a set of documents using a registered ontology:

from graphora import GraphoraClient

# Initialize the client with user ID (required)
client = GraphoraClient(
    base_url="https://api.graphora.io",
    user_id="your-user-id",  # Required for all API calls
    api_key="your-api-key"
)

# Use a previously registered ontology
ontology_id = "ont_123456789"

# Prepare a list of document files to transform
document_paths = [
    "document1.pdf",
    "document2.docx",
    "document3.txt"
]

# Transform the documents
response = client.transform(ontology_id, document_paths)
print(f"Transformation started with ID: {response.id}")

# Check the status of the transformation
status = client.get_transform_status(response.id)
print(f"Transformation status: {status.status}")
print(f"Progress: {status.progress:.2%}")

Transforming Documents with Metadata

You can also provide metadata for each document to enrich the transformation process:

from graphora import GraphoraClient, DocumentMetadata, DocumentType

# Initialize the client with user ID (required)
client = GraphoraClient(
    base_url="https://api.graphora.io",
    user_id="your-user-id",  # Required for all API calls
    api_key="your-api-key"
)

# Use a previously registered ontology
ontology_id = "ont_123456789"

# Prepare document paths and metadata
document_paths = [
    "document1.pdf",
    "document2.docx"
]

metadata = [
    DocumentMetadata(
        source="Annual Report 2024",
        document_type=DocumentType.PDF,
        tags=["finance", "annual", "report"],
        custom_metadata={
            "department": "Finance",
            "confidentiality": "Internal"
        }
    ),
    DocumentMetadata(
        source="Product Specifications",
        document_type=DocumentType.DOCX,
        tags=["product", "specifications", "technical"],
        custom_metadata={
            "department": "Engineering",
            "product_id": "PRD-2024-001"
        }
    )
]

# Transform the documents with metadata
response = client.transform(ontology_id, document_paths, metadata=metadata)
print(f"Transformation started with ID: {response.id}")

Monitoring Transformation Progress

For long-running transformations, you can monitor the progress:

from graphora import GraphoraClient
import time

# Initialize the client with user ID (required)
client = GraphoraClient(
    base_url="https://api.graphora.io",
    user_id="your-user-id",  # Required for all API calls
    api_key="your-api-key"
)

# Start a transformation (assuming you have already set up the ontology and documents)
transform_response = client.transform(ontology_id, document_paths)
transform_id = transform_response.id

# Monitor the transformation progress
completed = False
while not completed:
    status = client.get_transform_status(transform_id)
    print(f"Status: {status.status}, Progress: {status.progress:.2%}")
    
    if status.status in ["COMPLETED", "FAILED"]:
        completed = True
        if status.status == "COMPLETED":
            print("Transformation completed successfully!")
        else:
            print(f"Transformation failed: {status.error}")
    else:
        # Wait for 5 seconds before checking again
        time.sleep(5)

# If transformation was successful, you can proceed to merging
if status.status == "COMPLETED":
    print("Ready to merge the transformed data into a graph")

Using the Wait Helper Method

You can also use the built-in wait method for convenience:

from graphora import GraphoraClient

# Initialize the client with user ID (required)
client = GraphoraClient(
    base_url="https://api.graphora.io",
    user_id="your-user-id",  # Required for all API calls
    api_key="your-api-key"
)

# Start a transformation
transform_response = client.transform(ontology_id, document_paths)
transform_id = transform_response.id

# Wait for completion (with timeout)
try:
    final_status = client.wait_for_transform(transform_id, timeout=300)  # 5 minutes
    print(f"Transformation completed with status: {final_status.status}")
except Exception as e:
    print(f"Transformation failed or timed out: {e}")

Handling Different Document Types

Graphora supports various document types. Here’s how to transform a mix of document types:

from graphora import GraphoraClient

# Initialize the client with user ID (required)
client = GraphoraClient(
    base_url="https://api.graphora.io",
    user_id="your-user-id",  # Required for all API calls
    api_key="your-api-key"
)

# Use a previously registered ontology
ontology_id = "ont_123456789"

# Prepare a list of different document types
documents = [
    "text_document.txt",      # Plain text
    "word_document.docx",     # Microsoft Word
    "pdf_document.pdf",       # PDF
    "data.csv",               # CSV
    "config.json",            # JSON
    "schema.yaml"             # YAML
]

# Transform the documents
response = client.transform(ontology_id, documents)
print(f"Transformation started with ID: {response.id}")

These examples demonstrate the basic operations for transforming documents with the Graphora client library. For more detailed information about transformation concepts and options, see the Transformation Concepts page.