Transformation Examples
This page provides practical examples of transforming documents using the Graphora client library.
The following example demonstrates how to transform a set of documents using a registered ontology:
from graphora import GraphoraClient
# Initialize the client with user ID (required)
client = GraphoraClient(
    base_url="https://api.graphora.io",
    user_id="your-user-id",  # Required for all API calls
    api_key="your-api-key"
)
# Use a previously registered ontology
ontology_id = "ont_123456789"
# Prepare a list of document files to transform
document_paths = [
    "document1.pdf",
    "document2.docx",
    "document3.txt"
]
# Transform the documents
response = client.transform(ontology_id, document_paths)
print(f"Transformation started with ID: {response.id}")
# Check the status of the transformation
status = client.get_transform_status(response.id)
print(f"Transformation status: {status.status}")
print(f"Progress: {status.progress:.2%}")
You can also provide metadata for each document to enrich the transformation process:
from graphora import GraphoraClient, DocumentMetadata, DocumentType
# Initialize the client with user ID (required)
client = GraphoraClient(
    base_url="https://api.graphora.io",
    user_id="your-user-id",  # Required for all API calls
    api_key="your-api-key"
)
# Use a previously registered ontology
ontology_id = "ont_123456789"
# Prepare document paths and metadata
document_paths = [
    "document1.pdf",
    "document2.docx"
]
metadata = [
    DocumentMetadata(
        source="Annual Report 2024",
        document_type=DocumentType.PDF,
        tags=["finance", "annual", "report"],
        custom_metadata={
            "department": "Finance",
            "confidentiality": "Internal"
        }
    ),
    DocumentMetadata(
        source="Product Specifications",
        document_type=DocumentType.DOCX,
        tags=["product", "specifications", "technical"],
        custom_metadata={
            "department": "Engineering",
            "product_id": "PRD-2024-001"
        }
    )
]
# Transform the documents with metadata
response = client.transform(ontology_id, document_paths, metadata=metadata)
print(f"Transformation started with ID: {response.id}")
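The example above passes one metadata entry per document, in the same order as the paths. Assuming the pairing is positional (the example suggests this but does not state it), a small check before calling `transform` can catch a mismatched list early. The helper name `pair_documents_with_metadata` is hypothetical and not part of the Graphora library; plain dictionaries stand in for `DocumentMetadata` objects so the sketch runs on its own.

```python
def pair_documents_with_metadata(document_paths, metadata):
    """Pair each path with its metadata entry, failing fast on a length mismatch.

    Hypothetical helper: assumes metadata is matched to documents by position.
    """
    if len(document_paths) != len(metadata):
        raise ValueError(
            f"expected {len(document_paths)} metadata entries, got {len(metadata)}"
        )
    return list(zip(document_paths, metadata))


# Plain dicts stand in for DocumentMetadata objects in this sketch
pairs = pair_documents_with_metadata(
    ["document1.pdf", "document2.docx"],
    [{"source": "Annual Report 2024"}, {"source": "Product Specifications"}],
)
for path, meta in pairs:
    print(path, "->", meta["source"])
```

Running the check before submission keeps a wrong-length list from silently attaching metadata to the wrong file.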
For long-running transformations, you can monitor the progress:
from graphora import GraphoraClient
import time
# Initialize the client with user ID (required)
client = GraphoraClient(
    base_url="https://api.graphora.io",
    user_id="your-user-id",  # Required for all API calls
    api_key="your-api-key"
)
# Start a transformation; the ontology ID and document list below are placeholders
ontology_id = "ont_123456789"
document_paths = ["document1.pdf", "document2.docx"]
transform_response = client.transform(ontology_id, document_paths)
transform_id = transform_response.id
# Monitor the transformation progress
completed = False
while not completed:
    status = client.get_transform_status(transform_id)
    print(f"Status: {status.status}, Progress: {status.progress:.2%}")
    if status.status in ["COMPLETED", "FAILED"]:
        completed = True
        if status.status == "COMPLETED":
            print("Transformation completed successfully!")
        else:
            print(f"Transformation failed: {status.error}")
    else:
        # Wait for 5 seconds before checking again
        time.sleep(5)
# If transformation was successful, you can proceed to merging
if status.status == "COMPLETED":
    print("Ready to merge the transformed data into a graph")
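The loop above polls at a fixed interval and never gives up. As a minimal sketch, the same idea can be wrapped in a reusable function with an overall deadline and a capped, doubling delay. The function name `poll_until_terminal` and its parameters are hypothetical, and the non-terminal status names used in the demo (`PENDING`, `RUNNING`) are assumptions; only `COMPLETED` and `FAILED` appear in the examples on this page. A stub status source replaces the Graphora client so the block runs on its own.

```python
import time

# Terminal states taken from the polling example on this page
TERMINAL_STATES = {"COMPLETED", "FAILED"}


def poll_until_terminal(get_status, timeout=300.0, initial_delay=1.0,
                        max_delay=30.0, sleep=time.sleep, clock=time.monotonic):
    """Call get_status() until it returns a terminal state or the timeout expires.

    get_status is any zero-argument callable returning a status string.
    The delay between calls doubles each round, capped at max_delay.
    """
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError(f"no terminal status within {timeout} seconds")
        # Never sleep past the deadline
        sleep(min(delay, remaining))
        delay = min(delay * 2, max_delay)


# Demo with a stub status source; the recorded delays show the backoff
states = iter(["PENDING", "RUNNING", "COMPLETED"])
delays = []
result = poll_until_terminal(lambda: next(states), sleep=delays.append)
print(result, delays)
```

With a real client, the callable could be something like `lambda: client.get_transform_status(transform_id).status`, reusing the call shown in the loop above.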
Using the Wait Helper Method
You can also use the built-in wait method for convenience:
from graphora import GraphoraClient
# Initialize the client with user ID (required)
client = GraphoraClient(
    base_url="https://api.graphora.io",
    user_id="your-user-id",  # Required for all API calls
    api_key="your-api-key"
)
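# Start a transformation; the ontology ID and document list below are placeholders
ontology_id = "ont_123456789"
document_paths = ["document1.pdf", "document2.docx"]
transform_response = client.transform(ontology_id, document_paths)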
transform_id = transform_response.id
# Wait for completion (with timeout)
try:
    final_status = client.wait_for_transform(transform_id, timeout=300)  # 5 minutes
    print(f"Transformation completed with status: {final_status.status}")
except Exception as e:
    print(f"Transformation failed or timed out: {e}")
Handling Different Document Types
Graphora supports various document types. Here’s how to transform a mix of document types:
from graphora import GraphoraClient
# Initialize the client with user ID (required)
client = GraphoraClient(
    base_url="https://api.graphora.io",
    user_id="your-user-id",  # Required for all API calls
    api_key="your-api-key"
)
# Use a previously registered ontology
ontology_id = "ont_123456789"
# Prepare a list of different document types
documents = [
    "text_document.txt",   # Plain text
    "word_document.docx",  # Microsoft Word
    "pdf_document.pdf",    # PDF
    "data.csv",            # CSV
    "config.json",         # JSON
    "schema.yaml"          # YAML
]
# Transform the documents
response = client.transform(ontology_id, documents)
print(f"Transformation started with ID: {response.id}")
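When a batch mixes many file types, it can help to separate recognized files from unrecognized ones before submitting. The sketch below does that by file extension. The extension set is copied from the example above and is an assumption, not an authoritative list of what Graphora accepts; the helper name `split_by_extension` is hypothetical.

```python
from pathlib import Path

# Extensions taken from the example above; not an authoritative list
EXAMPLE_EXTENSIONS = {".txt", ".docx", ".pdf", ".csv", ".json", ".yaml"}


def split_by_extension(paths, allowed=EXAMPLE_EXTENSIONS):
    """Split paths into (accepted, rejected) by case-insensitive file extension."""
    accepted, rejected = [], []
    for path in paths:
        if Path(path).suffix.lower() in allowed:
            accepted.append(path)
        else:
            rejected.append(path)
    return accepted, rejected


accepted, rejected = split_by_extension(["report.PDF", "notes.txt", "tool.exe"])
print("accepted:", accepted)
print("rejected:", rejected)
```

Only the accepted list would then be passed to `client.transform`, and the rejected list can be logged or reported back to the user.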
These examples demonstrate the basic operations for transforming documents with the Graphora client library. For more detailed information about transformation concepts and options, see the Transformation Concepts page.