Query Planner Fails to Validate Valid ABFSS Path with Wildcard (**) #3014

@matteohexagon

Subject: Query Planner Fails to Validate Valid ABFSS Path with Wildcard (**)

Component: Storage - Azure

Apache Drill Version: 1.22.0

Summary:

A SELECT query against a specific directory path on Azure Blob Storage (using the ABFSS connector) fails during the validation phase with an "Object not found" error. However, Drill's own file listing tools (SHOW FILES) can see and list the contents of the exact same path, and a global wildcard query can read the data successfully.

The issue appears to be a bug in the query planner's path validation logic. The planner seems to develop a "stuck" or "corrupted" state for certain directories, refusing to acknowledge them in SELECT statements while other parts of Drill can access them without issue. The bug persists even after restarting the Drillbit and completely deleting and recreating the storage plugin.

Environment:

  • Storage Plugin: file
  • Connection Type: Azure Blob Storage (abfss://<container>@<account>.dfs.core.windows.net)
  • Authentication: SharedKey

Storage Plugin Configuration:

{
  "type": "file",
  "enabled": true,
  "connection": "abfss://<container>@<account>.dfs.core.windows.net",
  "config": {
    "fs.azure.account.auth.type": "SharedKey",
    "fs.azure.account.key.observercondenseddata.dfs.core.windows.net": "...",
    "fs.azure.createRemoteFileSystemDuringInitialization": "false",
    "fs.azure.io.list.recursive": "true"
  },
  "workspaces": {
    "root": {
      "location": "/",
      "writable": false,
      "allowRecursiveScan": true
    },
    "monthly": {
      "location": "/prod-condenser-logs-1-Month/",
      "writable": false,
      "allowRecursiveScan": true
    },
    "daily": {
      "location": "/prod-condenser-logs-1-day/",
      "writable": false,
      "allowRecursiveScan": true
    },
    "hourly": {
      "location": "/prod-condenser-logs-1-hour/",
      "writable": false,
      "allowRecursiveScan": true
    }
  },
  "formats": {
    "log": {
      "type": "logRegex",
      "extension": "log",
      "regex": "^(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3}) - (\\w+) - (.*)|^(.+)",
      "maxErrors": 100000,
      "schema": [
        {"fieldName": "log_timestamp", "fieldType": "TIMESTAMP", "format": "yyyy-MM-dd HH:mm:ss,SSS"},
        {"fieldName": "log_level"},
        {"fieldName": "structured_message"},
        {"fieldName": "unstructured_line"}
      ]
    }
  }
}

Directory Structure on Azure:

/
├── prod-condenser-logs-1-Month/
│   └── 2025/
│       └── 07/
├── prod-condenser-logs-1-day/
│   └── 2025/
│       ├── 07/
│       └── 08/
└── prod-condenser-logs-1-hour/
    └── 2025/
        └── ...

Steps to Reproduce:

  1. A query on a sibling directory works correctly: The following query against the ...-1-Month directory executes successfully every time.

    SELECT * FROM az.root.`prod-condenser-logs-1-Month/2025/**` LIMIT 10;

  2. An equivalent query on the target directory fails: The following query against the ...-1-day directory consistently fails.

    SELECT * FROM az.root.`prod-condenser-logs-1-day/2025/**` LIMIT 10;

  3. Drill's listing tools prove the path is visible: Contradicting the query failure, the SHOW FILES command can see and list the contents of the failing directory, proving the path is valid and accessible to Drill.

    -- This command SUCCEEDS and lists the '2025' directory inside it
    SHOW FILES FROM az.root.`prod-condenser-logs-1-day`;
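
To make the three steps above easier to re-run against a fresh Drillbit, here is a minimal sketch that submits the same statements through Drill's REST endpoint (POST /query.json). It assumes the web server is reachable on the default port 8047 on localhost and that authentication is disabled. The names DRILL_URL, QUERIES, build_payload and run_query are hypothetical helpers, not part of Drill. Trailing semicolons are left out of the statements on purpose.

```python
import json
import urllib.error
import urllib.request

# Assumption: a local Drillbit with the web UI on the default port and no auth.
DRILL_URL = "http://localhost:8047/query.json"

# The three statements from the reproduction steps, without trailing semicolons.
QUERIES = {
    "monthly_select": "SELECT * FROM az.root.`prod-condenser-logs-1-Month/2025/**` LIMIT 10",
    "daily_select": "SELECT * FROM az.root.`prod-condenser-logs-1-day/2025/**` LIMIT 10",
    "daily_show_files": "SHOW FILES FROM az.root.`prod-condenser-logs-1-day`",
}


def build_payload(sql):
    """Encode one SQL statement as the JSON body expected by /query.json."""
    return json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")


def run_query(sql, url=DRILL_URL, timeout=60):
    """Submit one statement and return (HTTP status, raw response text).

    The response body is returned unparsed so that no assumption is made
    about its exact shape; a validation failure is expected to show up as
    error text in the body.
    """
    request = urllib.request.Request(
        url,
        data=build_payload(sql),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # Non-2xx answers still carry a body that is worth recording.
        return err.code, err.read().decode("utf-8", errors="replace")
```

Calling run_query on each entry of QUERIES and saving the returned text gives a record of which statements pass and which fail, which can be attached to this report.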

Expected Behavior:

The SELECT query against az.root.`prod-condenser-logs-1-day/2025/**` should execute successfully, just as the query against the sibling ...-1-Month directory does.

Actual Behavior:

The query fails during the validation phase with the error:
VALIDATION ERROR: ... Object 'prod-condenser-logs-1-day/2025/**' not found within 'az.root'

Troubleshooting Steps Attempted (All Failed to Resolve the Issue):

  • Restarting the Drillbit: The issue persists immediately after a full restart.
  • Deleting and Recreating the Storage Plugin: The exact same behavior occurs after completely removing the az plugin and recreating it from the saved configuration.
  • Renaming/Duplicating the Source Directory: Renaming the directory in Azure to a new name (e.g., prod-condenser-logs-daily-new) and querying it results in the same "Object not found" error.
  • Using Defined Workspaces: Querying via the az.daily workspace (e.g., FROM az.daily.`2025/**`) also fails with the same error, even though SHOW FILES IN az.daily correctly lists the contents.
  • REFRESH TABLE METADATA: This command fails because Drill does not recognize the paths as tables.

Final Workaround Discovered:

The only reliable method to query the data in the affected directories is to use a global wildcard from the root (FROM az.root.`**`) and then filter the desired path using a WHERE clause. This proves the data is readable and the bug is specific to the planner's path validation.

-- This query WORKS and returns data from the '...-1-day' directory
SELECT *
FROM az.root.`**`
WHERE filepath LIKE '%/prod-condenser-logs-1-day/%'
LIMIT 10;

This workaround suggests the core data reading engine is functional, but the upfront query validation is failing on specific path strings.
