neosqlite.collection.temporary_table_aggregation package

Module contents

class neosqlite.collection.temporary_table_aggregation.TemporaryTableAggregationProcessor(collection, query_engine=None)

Bases: OperatorsMixin

__init__(collection, query_engine=None)

Initialize the TemporaryTableAggregationProcessor with a collection.

Parameters:

collection – The NeoSQLite collection to process aggregation pipelines on. This collection provides the database connection and document loading functionality needed for pipeline processing.
query_engine – Optional QueryEngine instance for accessing helpers. If not provided, text search in match stages will use simplified processing.

process_pipeline(pipeline: list[dict[str, Any]], is_count: bool = False, count_field: str | None = None, batch_size: int = 101) → list[dict[str, Any]]

Process an aggregation pipeline using temporary tables for intermediate results.

This method implements a temporary table approach for processing complex aggregation pipelines that cannot be optimized into a single SQL query by the current NeoSQLite implementation. It works by:

Generating a deterministic pipeline ID based on the pipeline content
Using the aggregation_pipeline_context for atomicity and cleanup
Creating temporary tables for each stage or group of compatible stages
Processing pipeline stages in an optimized order (grouping compatible stages)
Returning the final results from the last temporary table

The method supports these pipeline stages:

$match: For filtering documents
$unwind: For deconstructing array fields
$lookup: For joining documents from different collections
$sort, $skip, $limit: For sorting and pagination
$addFields: For adding fields to documents
$count: For counting documents (optimized to use SQL COUNT)

Parameters:

pipeline (list[dict[str, Any]]) – A list of aggregation pipeline stages to process

Returns:

A list of result documents after processing the pipeline

Return type:

list[dict[str, Any]]

Raises:

NotImplementedError – If the pipeline contains unsupported stages
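The stage-by-stage approach described above can be sketched with Python's standard sqlite3 module. This is a simplified, hypothetical illustration rather than NeoSQLite's implementation: it assumes documents live in a table with id and data (JSON text) columns, handles only flat equality $match, $limit, and $count, and leaves out the savepoint and cleanup that aggregation_pipeline_context provides. The function name run_pipeline_sketch and the temp_<id>_<n> naming scheme are assumptions.

```python
import hashlib
import json
import sqlite3
from typing import Any


def run_pipeline_sketch(
    conn: sqlite3.Connection, source: str, pipeline: list[dict[str, Any]]
) -> list[dict[str, Any]]:
    """Hypothetical executor: each stage materializes its output in a TEMP table."""
    # Deterministic pipeline ID: hash of the pipeline's canonical JSON form
    pipeline_id = hashlib.sha256(
        json.dumps(pipeline, sort_keys=True).encode("utf-8")
    ).hexdigest()[:8]
    current = source
    for i, stage in enumerate(pipeline):
        ((op, spec),) = stage.items()
        if op == "$count":
            # $count becomes a single SQL COUNT(*) over the previous table
            (n,) = conn.execute(f"SELECT COUNT(*) FROM {current}").fetchone()
            return [{spec: n}]
        if op == "$match":
            # Only flat equality filters are handled in this sketch
            where = " AND ".join(f"json_extract(data, '$.{f}') = ?" for f in spec)
            sql = f"SELECT id, data FROM {current} WHERE {where}"
            params = list(spec.values())
        elif op == "$limit":
            sql = f"SELECT id, data FROM {current} ORDER BY id LIMIT ?"
            params = [spec]
        else:
            raise NotImplementedError(f"stage {op} is not handled by this sketch")
        table = f"temp_{pipeline_id}_{i}"
        conn.execute(f"CREATE TEMP TABLE {table} AS {sql}", params)
        current = table  # the next stage reads from this stage's output
    rows = conn.execute(f"SELECT data FROM {current} ORDER BY id").fetchall()
    return [json.loads(data) for (data,) in rows]
```

Because each stage only reads the previous table, stages can be added or regrouped independently; the cost is one materialized table per stage.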

class neosqlite.collection.temporary_table_aggregation.DeterministicTempTableManager(pipeline_id: str)

Bases: object

Manager for deterministic temporary table names.

This class generates unique but deterministic temporary table names based on pipeline stages and a pipeline ID. It ensures that the same pipeline stage will always generate the same table name within the same pipeline execution, which is useful for caching and optimization purposes.

__init__(pipeline_id: str)

Initialize the DeterministicTempTableManager with a pipeline ID for generating unique table names.

Parameters:

pipeline_id (str) – A unique identifier for the pipeline, used to ensure table names are deterministic and unique across different pipeline executions.

make_temp_table_name(stage: dict[str, Any], name_suffix: str = '') → str

Generate a deterministic temporary table name based on the pipeline stage and pipeline ID.

This method creates a unique but deterministic name for a temporary table by:

1. Creating a canonical representation of the stage
2. Hashing the stage to create a short, unique suffix
3. Combining the pipeline ID, stage type, and hash to form a base name
4. Ensuring uniqueness by tracking name usage within the pipeline

Parameters:

stage (dict[str, Any]) – The pipeline stage dictionary used to generate the table name
name_suffix (str, optional) – An additional suffix to append to the table name. Defaults to '' (empty string).

Returns:

A deterministic temporary table name unique to this stage and pipeline

Return type:

str
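The four naming steps above can be sketched as follows. This is a hypothetical stand-in (TempNameSketch is not a NeoSQLite class), and the SHA-256 hash, 8-character truncation, and temp_ prefix are assumptions. Under this reading, "deterministic" means a fresh manager with the same pipeline ID, given the same sequence of stages, produces the same names, while repeated identical stages within one pipeline get a numeric suffix.

```python
import hashlib
import json
from typing import Any


class TempNameSketch:
    """Hypothetical sketch of deterministic temp-table naming."""

    def __init__(self, pipeline_id: str) -> None:
        self.pipeline_id = pipeline_id
        self.name_counts: dict[str, int] = {}

    def make_temp_table_name(self, stage: dict[str, Any], name_suffix: str = "") -> str:
        # 1. Canonical representation: sorted keys, compact separators
        canonical = json.dumps(stage, sort_keys=True, separators=(",", ":"), default=str)
        # 2. Short hash of the canonical form
        digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:8]
        # 3. Combine pipeline ID, stage type (without "$"), and hash
        stage_type = next(iter(stage)).lstrip("$")
        base = f"temp_{self.pipeline_id}_{stage_type}_{digest}{name_suffix}"
        # 4. Track usage so identical stages in one pipeline still get distinct names
        count = self.name_counts.get(base, 0)
        self.name_counts[base] = count + 1
        return base if count == 0 else f"{base}_{count}"
```

Sorting keys in step 1 is what makes {"x": 1, "y": 2} and {"y": 2, "x": 1} hash identically.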

neosqlite.collection.temporary_table_aggregation.aggregation_pipeline_context(db_connection, pipeline_id: str | None = None)

Context manager for temporary aggregation tables with automatic cleanup.

This context manager provides a clean and safe way to work with temporary tables during aggregation pipeline processing. It handles:

Creating a savepoint for atomicity of the entire pipeline
Generating deterministic temporary table names
Providing a function to create temporary tables with proper naming
Automatic cleanup of all temporary tables and savepoint on exit

The context manager supports both the newer deterministic naming scheme (using stage dictionaries) and a backward-compatible scheme (using string suffixes) for temporary tables.

Parameters:

db_connection – The database connection object
pipeline_id (str | None) – A unique identifier for the pipeline. If None, a default ID is generated for backward compatibility.

Yields:

Callable – A function to create temporary tables, with the signature:

create_temp_table(stage_or_suffix, query, params=None, name_suffix="")

where:

stage_or_suffix: Either a stage dict (new approach) or a string (backward compatibility)
query: The SQL query to populate the temporary table
params: Optional parameters for the SQL query
name_suffix: Optional suffix for backward-compatible naming

Raises:

Exception – Any exception that occurs during pipeline processing is re-raised after cleanup operations
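The savepoint-plus-cleanup pattern described above can be sketched with contextlib.contextmanager and sqlite3. This hypothetical temp_table_context is not NeoSQLite's code: it accepts only string suffixes (not stage dicts), and the agg_<id> savepoint name and temp_<id>_<suffix> table names are assumptions. On error it rolls back to the savepoint, then always drops the tables it created and releases the savepoint before re-raising.

```python
from __future__ import annotations

import sqlite3
from contextlib import contextmanager
from typing import Any, Callable, Iterator


@contextmanager
def temp_table_context(
    conn: sqlite3.Connection, pipeline_id: str = "default"
) -> Iterator[Callable[..., str]]:
    created: list[str] = []
    savepoint = f"agg_{pipeline_id}"
    conn.execute(f"SAVEPOINT {savepoint}")  # starts a transaction if none is open

    def create_temp_table(suffix: str, query: str, params: list[Any] | None = None) -> str:
        name = f"temp_{pipeline_id}_{suffix}"
        conn.execute(f"CREATE TEMP TABLE {name} AS {query}", params or [])
        created.append(name)
        return name

    try:
        yield create_temp_table
    except Exception:
        # Undo everything done since the savepoint, then re-raise after cleanup
        conn.execute(f"ROLLBACK TO SAVEPOINT {savepoint}")
        raise
    finally:
        for name in reversed(created):
            conn.execute(f"DROP TABLE IF EXISTS {name}")
        conn.execute(f"RELEASE SAVEPOINT {savepoint}")
```

Usage: `with temp_table_context(conn, "p1") as create: t = create("s0", "SELECT * FROM docs WHERE id = ?", [1])`.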

neosqlite.collection.temporary_table_aggregation.can_process_with_temporary_tables(pipeline: list[dict[str, Any]]) → bool

Determine if a pipeline can be processed with temporary tables.

This function checks if all stages in an aggregation pipeline are supported by the temporary table processing approach. It verifies that each stage in the pipeline is one of the supported stage types.

Additionally, it handles special cases for text search operations:

Pure text search operations are supported with hybrid processing
Text search combined with simple unwind operations is supported (Python text search is applied to the temporary tables)
Complex nested unwinds (multiple unwinds or dotted paths) fall back to Python processing

Parameters:

pipeline (list[dict[str, Any]]) – List of aggregation pipeline stages to check

Returns:

True if all stages in the pipeline are supported and can be processed with temporary tables, False otherwise

Return type:

bool
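A check of this kind can be sketched as below. This is a hypothetical approximation, not the real function: the stage set is copied from the process_pipeline documentation (the actual module may accept a different set), only a top-level "$text" key in $match is detected, and the unwind restrictions are applied only when text search is present, which is one reading of the rules listed above.

```python
from typing import Any

# Assumed stage set, taken from the process_pipeline documentation
SUPPORTED_STAGES = {"$match", "$unwind", "$lookup", "$sort", "$skip", "$limit", "$addFields", "$count"}


def can_process_sketch(pipeline: list[dict[str, Any]]) -> bool:
    # Every stage must contain exactly one operator, and it must be supported
    for stage in pipeline:
        if len(stage) != 1 or next(iter(stage)) not in SUPPORTED_STAGES:
            return False
    has_text = any("$text" in stage.get("$match", {}) for stage in pipeline)
    if not has_text:
        return True
    # With text search, allow at most one unwind, and only on a non-dotted path
    unwinds = [stage["$unwind"] for stage in pipeline if "$unwind" in stage]
    if len(unwinds) > 1:
        return False
    for unwind in unwinds:
        path = unwind if isinstance(unwind, str) else unwind.get("path", "")
        if "." in path:
            return False
    return True
```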

neosqlite.collection.temporary_table_aggregation.execute_2nd_tier_aggregation(query_engine, pipeline: list[dict[str, Any]], batch_size: int = 101) → list[dict[str, Any]]

Execute aggregation pipeline using temporary table approach for complex pipelines.

This function is designed to be called as the second tier in a three-tier processing system:

1. First tier (QueryEngine): Try existing SQL optimization for simple pipelines
2. Second tier (this function): Try the temporary table approach for complex pipelines
3. Third tier (QueryEngine): Fall back to the Python implementation for unsupported operations

This function focuses specifically on processing complex pipelines that the current NeoSQLite SQL optimization cannot handle efficiently, using temporary tables for better performance.
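The three-tier dispatch can be sketched as a plain function that takes each tier as a callable. All names here are hypothetical, and two details are assumptions: tier 1 signals "not optimizable" by returning None, and a NotImplementedError from tier 2 drops through to tier 3.

```python
from __future__ import annotations

from typing import Any, Callable

Pipeline = list[dict[str, Any]]
Docs = list[dict[str, Any]]


def aggregate_three_tier(
    pipeline: Pipeline,
    try_sql: Callable[[Pipeline], Docs | None],
    can_use_temp_tables: Callable[[Pipeline], bool],
    run_temp_tables: Callable[[Pipeline], Docs],
    run_python: Callable[[Pipeline], Docs],
) -> Docs:
    # Tier 1: single-query SQL optimization; None means "could not optimize"
    result = try_sql(pipeline)
    if result is not None:
        return result
    # Tier 2: temporary-table processing for supported complex pipelines
    if can_use_temp_tables(pipeline):
        try:
            return run_temp_tables(pipeline)
        except NotImplementedError:
            pass  # fall through to tier 3
    # Tier 3: pure-Python implementation handles everything else
    return run_python(pipeline)
```

Passing the tiers in as callables keeps the ordering logic testable on its own, independent of any database.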

Parameters:

query_engine – The NeoSQLite QueryEngine instance to use for processing
pipeline (list[dict[str, Any]]) – List of aggregation pipeline stages to process
batch_size (int) – Batch size for fetching results from temporary tables

Returns:

List of result documents after processing the pipeline

Return type:

list[dict[str, Any]]
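Fetching from the final temporary table in batches of batch_size can be sketched with the DB-API cursor.fetchmany method. The function name fetch_in_batches and the id/data column layout are assumptions. Since the documented return type is a full list, batching here bounds the size of each fetch rather than total memory.

```python
import json
import sqlite3
from typing import Any


def fetch_in_batches(
    conn: sqlite3.Connection, table: str, batch_size: int = 101
) -> list[dict[str, Any]]:
    cursor = conn.execute(f"SELECT data FROM {table} ORDER BY id")
    results: list[dict[str, Any]] = []
    while True:
        batch = cursor.fetchmany(batch_size)  # at most batch_size rows per call
        if not batch:
            break
        results.extend(json.loads(data) for (data,) in batch)
    return results
```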

neosqlite.collection.temporary_table_aggregation._sanitize_params(params: list[Any] | None) → list[Any] | None

Sanitize SQL parameters by converting ObjectId instances to strings.

Python's sqlite3 module cannot bind ObjectId instances as query parameters, so they are converted to strings first.

Parameters:

params – List of parameters or None

Returns:

Sanitized parameters with ObjectId instances converted to strings
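The conversion can be sketched as below. The ObjectId class here is a minimal hypothetical stand-in, and using str() to obtain the string form is an assumption about how NeoSQLite's ObjectId serializes.

```python
from __future__ import annotations

from typing import Any


class ObjectId:
    """Hypothetical stand-in for NeoSQLite's ObjectId type."""

    def __init__(self, hex_id: str) -> None:
        self._hex = hex_id

    def __str__(self) -> str:
        return self._hex


def sanitize_params(params: list[Any] | None) -> list[Any] | None:
    if params is None:
        return None
    # Convert ObjectId instances to strings; leave bindable values untouched
    return [str(p) if isinstance(p, ObjectId) else p for p in params]
```

An alternative is sqlite3.register_adapter(ObjectId, str), which registers the conversion globally for the whole process; converting explicitly keeps the change local to these queries.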

neosqlite.collection.temporary_table_aggregation._json_extract_field_with_objectid_support(json_function_prefix: str, field_name: str, is_local_field: bool = True) → str

Generate SQL expression to extract a field value with ObjectId support.

When a field contains an ObjectId (stored as {"__neosqlite_objectid__": true, "id": "…"}), this extracts just the ID string instead of the full JSON object.

Parameters:

json_function_prefix – The JSON function prefix (json or jsonb)
field_name – The field name to extract
is_local_field – Whether this is a local field (True) or foreign field (False)

Returns:

SQL expression string
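One way to build such an expression is a SQL CASE over json_extract, sketched below. This is a hypothetical helper, not NeoSQLite's code: it assumes documents are in a column named data, omits the is_local_field distinction (whose exact effect is not described here), and interpolates field_name directly, so the field name must come from trusted input. With the json prefix, a JSON true extracts as the integer 1.

```python
import sqlite3


def objectid_aware_extract_sql(json_fn: str, column: str, field_name: str) -> str:
    path = f"$.{field_name}"
    # If the field is an ObjectId wrapper, return its "id"; otherwise the raw value
    return (
        f"CASE WHEN {json_fn}_extract({column}, '{path}.__neosqlite_objectid__') = 1 "
        f"THEN {json_fn}_extract({column}, '{path}.id') "
        f"ELSE {json_fn}_extract({column}, '{path}') END"
    )


def extract_from_json_text(conn: sqlite3.Connection, doc_json: str, field_name: str):
    # Evaluate the expression against a single JSON document bound as "data"
    expr = objectid_aware_extract_sql("json", "data", field_name)
    return conn.execute(f"SELECT {expr} FROM (SELECT ? AS data)", (doc_json,)).fetchone()[0]
```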

neosqlite.collection.temporary_table_aggregation._contains_text_search(match_spec: dict[str, Any]) → bool

Check if a match specification contains text search operations.

This function delegates to the centralized _contains_text_operator function to ensure consistent text search detection across all NeoSQLite components.

Parameters:

match_spec (dict[str, Any]) – The match specification to check for text search operations

Returns:

True if the match specification contains text search operations, False otherwise

Return type:

bool
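The internals of _contains_text_operator are not shown in this reference. A plausible detection strategy, sketched here under that assumption with a hypothetical function name, is a recursive walk that looks for a "$text" key at any depth, including inside lists such as $or and $and.

```python
from typing import Any


def contains_text_operator(spec: Any) -> bool:
    # Recursively search dicts and lists for a "$text" key at any nesting depth
    if isinstance(spec, dict):
        return any(key == "$text" or contains_text_operator(value) for key, value in spec.items())
    if isinstance(spec, list):
        return any(contains_text_operator(item) for item in spec)
    return False
```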