Transformation
Understanding the document transformation process in Graphora
Document Transformation
Document transformation is the process of converting unstructured documents into structured graph data based on your ontology. This is a core capability of Graphora that allows you to extract meaningful entities and relationships from your documents.
The Transformation Process
When you submit documents to Graphora for transformation, the following steps occur:
1. Document Upload: Documents are uploaded to the Graphora platform and validated.
2. Parsing: Documents are parsed into a format that can be processed by the extraction pipeline.
3. Entity Extraction: Entities defined in your ontology are identified and extracted from the documents.
4. Relationship Extraction: Relationships between entities are identified based on the ontology definitions.
5. Validation: Extracted entities and relationships are validated against the ontology constraints.
6. Graph Construction: A graph is constructed from the validated entities and relationships.
Supported Document Types
Graphora supports a variety of document types for transformation:
- Text files (`.txt`)
- PDF documents (`.pdf`)
- PPT documents (`.ppt`)
- PPTX documents (`.pptx`)
- Word documents (`.doc`)
- Word documents (`.docx`)
- CSV files (`.csv`)
- JSON files (`.json`)
- YAML files (`.yaml`)
- XML files (`.xml`)
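Before submitting a batch, it can help to separate out files the platform will not accept. The following is a minimal sketch of such a check, not part of the Graphora client: `SUPPORTED_EXTENSIONS` and `split_by_support` are illustrative names, and the case-insensitive comparison is an assumption about how extensions are matched.

```python
from pathlib import Path

# Extensions copied from the supported-types list; the constant and helper
# names are illustrative and not part of the Graphora client.
SUPPORTED_EXTENSIONS = frozenset({
    ".txt", ".pdf", ".ppt", ".pptx", ".doc", ".docx",
    ".csv", ".json", ".yaml", ".xml",
})


def split_by_support(paths):
    """Return (supported, unsupported) lists of Path objects.

    Suffixes are compared case-insensitively (an assumption), and files
    without a suffix are treated as unsupported.
    """
    supported, unsupported = [], []
    for raw in paths:
        path = Path(raw)
        if path.suffix.lower() in SUPPORTED_EXTENSIONS:
            supported.append(path)
        else:
            unsupported.append(path)
    return supported, unsupported
```

Only the extensions listed here are included, so a `.yml` file, for example, would land in the unsupported list.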
Transformation in the Client Library
The Graphora client library provides methods for transforming documents and monitoring the transformation process:
Transforming Documents
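The exact client call is documented in the API Reference. As a minimal sketch, assume the client exposes a `transform` method that takes an ontology ID and a list of file paths and returns a transform ID. The method name, its parameters, and the `submit_documents` helper are all assumptions made for illustration.

```python
from pathlib import Path
from typing import Protocol, Sequence


class TransformClient(Protocol):
    """Assumed client shape; the method name and parameters are hypothetical."""

    def transform(self, ontology_id: str, files: Sequence[str]) -> str:
        """Submit files for transformation and return a transform ID."""
        ...


def submit_documents(client: TransformClient, ontology_id: str,
                     paths: Sequence[str]) -> str:
    """Check that every input file exists, then submit the batch in one call."""
    missing = [p for p in paths if not Path(p).is_file()]
    if missing:
        raise FileNotFoundError(f"Missing input files: {missing}")
    return client.transform(ontology_id=ontology_id, files=list(paths))
```

Failing fast on missing files keeps an avoidable local mistake from surfacing later as a failed upload.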
Checking Transformation Status
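Status checks are typically done by polling. The sketch below stays independent of the real client by accepting any callable that returns the current status as a mapping. The `stage` key is an assumption, as is the `wait_for_transform` name; only the `error` field and the `COMPLETED` stage name come from this guide.

```python
import time
from typing import Any, Callable, Mapping


def wait_for_transform(
    fetch_status: Callable[[], Mapping[str, Any]],
    poll_interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Mapping[str, Any]:
    """Poll until the status reports COMPLETED or carries an error.

    Returns the final status mapping. Raises TimeoutError if neither
    condition is met before `timeout` seconds have elapsed.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        # "stage" is a hypothetical key; "error" is named in this guide.
        if status.get("error") or status.get("stage") == "COMPLETED":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("transformation did not finish in time")
        sleep(poll_interval)
```

In use, `fetch_status` would wrap whatever status call the client provides, for example a lambda that closes over the transform ID.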
Retrieving the Transformed Graph
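Once a transformation has completed, the resulting graph can be fetched and inspected. The shape of the returned graph is not described in this section, so the sketch below assumes a mapping with `nodes` and `edges` lists whose items carry a `type` field. That layout and the `summarize_graph` helper are hypothetical; see the Graph data model page for the actual structure.

```python
from collections import Counter
from typing import Any, Mapping


def summarize_graph(graph: Mapping[str, Any]) -> dict:
    """Count nodes and edges overall and by their (assumed) 'type' field."""
    nodes = graph.get("nodes", [])
    edges = graph.get("edges", [])
    return {
        "node_count": len(nodes),
        "edge_count": len(edges),
        "nodes_by_type": dict(Counter(n.get("type", "UNKNOWN") for n in nodes)),
        "edges_by_type": dict(Counter(e.get("type", "UNKNOWN") for e in edges)),
    }
```

A quick summary like this is a cheap way to confirm that the entity and relationship types from your ontology actually appear in the output.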
Transformation Stages
The transformation process goes through several stages, which you can monitor through the `stage_progress` field in the transformation status:
Stage | Description |
---|---|
UPLOAD | Documents are being uploaded to the platform |
PARSING | Documents are being parsed into a processable format |
EXTRACTION | Entities and relationships are being extracted |
VALIDATION | Extracted data is being validated against the ontology |
INDEXING | Data is being indexed for efficient querying |
COMPLETED | Transformation is complete |
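For logging or progress display, the stage names in the table can be modeled as an enum. This is a sketch under two assumptions: that stages run in the order listed, and that you have the current stage name as a string. The structure of `stage_progress` itself is not specified here, and `Stage` and `describe_progress` are illustrative names.

```python
from enum import Enum


class Stage(Enum):
    """Stage names from the table, in the order they are listed."""
    UPLOAD = "UPLOAD"
    PARSING = "PARSING"
    EXTRACTION = "EXTRACTION"
    VALIDATION = "VALIDATION"
    INDEXING = "INDEXING"
    COMPLETED = "COMPLETED"


# Iterating an Enum yields members in definition order.
STAGE_ORDER = list(Stage)


def describe_progress(current: str) -> str:
    """Format a stage name with its position, e.g. 'PARSING (stage 2 of 6)'.

    Raises ValueError for a name that is not in the table.
    """
    stage = Stage(current)
    position = STAGE_ORDER.index(stage) + 1
    return f"{stage.value} (stage {position} of {len(STAGE_ORDER)})"
```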
Handling Transformation Errors
If a transformation fails, you can check the `error` field in the transformation status.
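One way to surface failures is to convert a populated `error` field into an exception. The sketch below assumes the status is available as a mapping; if the client returns an object instead, the same check would read an attribute. `TransformFailed` and `raise_if_failed` are hypothetical names.

```python
from typing import Any, Mapping


class TransformFailed(RuntimeError):
    """Raised when a transformation status carries an error."""


def raise_if_failed(status: Mapping[str, Any]) -> None:
    """Raise TransformFailed if the status has a non-empty 'error' field."""
    error = status.get("error")
    if error:
        raise TransformFailed(f"Transformation failed: {error}")
```

Raising a dedicated exception type lets callers distinguish a failed transformation from network or programming errors.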
Best Practices for Document Transformation
To get the best results from document transformation:
- Ensure your ontology is well-defined: The quality of extraction depends on your ontology
- Use appropriate document types: Different document types may yield different results
- Monitor transformation progress: Long documents may take time to process
- Handle errors gracefully: Check for and handle transformation errors
- Clean up after completion: Use `cleanup_transform` to remove temporary files
Example: Complete Transformation Workflow
Here’s a complete example of transforming documents and handling the results:
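The sketch below ties the steps together: submit, poll, check for errors, fetch the graph, and always clean up. Of the client methods used, only `cleanup_transform` is named in this guide, and its argument is assumed. `transform`, `get_status`, `get_graph`, and the `stage` key are hypothetical stand-ins, so check the API Reference for the real names.

```python
import time
from typing import Any, Mapping, Protocol, Sequence


class GraphClient(Protocol):
    """Assumed client shape. Only cleanup_transform is named in this guide."""

    def transform(self, ontology_id: str, files: Sequence[str]) -> str: ...
    def get_status(self, transform_id: str) -> Mapping[str, Any]: ...
    def get_graph(self, transform_id: str) -> Mapping[str, Any]: ...
    def cleanup_transform(self, transform_id: str) -> None: ...


def run_transformation(
    client: GraphClient,
    ontology_id: str,
    files: Sequence[str],
    poll_interval: float = 5.0,
    max_polls: int = 120,
) -> Mapping[str, Any]:
    """Submit files, wait for completion, and return the resulting graph."""
    transform_id = client.transform(ontology_id=ontology_id, files=list(files))
    try:
        for _ in range(max_polls):
            status = client.get_status(transform_id)
            if status.get("error"):
                raise RuntimeError(f"Transformation failed: {status['error']}")
            if status.get("stage") == "COMPLETED":
                return client.get_graph(transform_id)
            time.sleep(poll_interval)
        raise TimeoutError(f"No result after {max_polls} status checks")
    finally:
        # Runs on success, failure, and timeout alike.
        client.cleanup_transform(transform_id)
```

Placing the cleanup in a `finally` block means temporary files are removed whether the transformation succeeds, fails, or times out.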
Next Steps
- Learn about Merging extracted data
- Explore the Graph data model
- Check out the API Reference for detailed information