Skip to content

Fixes #1327 replacing py2neo with neo4j#1402

Open
dewardvide wants to merge 1 commit intoMISP:mainfrom
dewardvide:replace-py2neo-with-neo4j
Open

Fixes #1327 replacing py2neo with neo4j#1402
dewardvide wants to merge 1 commit intoMISP:mainfrom
dewardvide:replace-py2neo-with-neo4j

Conversation

@dewardvide
Copy link

Problem Statement

We need to migrate from py2neo to official neo4j or neomodels driver because py2neo has been deprecated since 2021 as pointed out in issue #1327.

Issue analysis & solution

  • The issue affects pymisp/tools/neo4j.py which migrates events into a Neo4j graph, establishing relationships between the events, attributes and attribute values.
  • Instead of a 1:1 port I made the following architectural changes:
    • I decided to use neo4j because of the simplicity of the script. Using it allows better performance and support because it is the official Neo4j lib
    • By choosing neo4j the Cypher queries were defined directly below is a summarized mapping of the old functions and the new ones:
Aspect py2neo (ORM) neo4j (Cypher)
Connection authenticate() + Graph() GraphDatabase.driver() with Bolt protocol
Node creation Python Node() objects Cypher CREATE (e:Event {$event.uuid})
Relationships Python Relationship() objects Cypher CREATE (a)-[:type]->(b)
Transactions tx.begin() / tx.commit() session.execute_write(callback)
Deduplication tx.merge(subgraph) Cypher MERGE keyword
  • Further implementation details are as follows:

    1. Fixed Protocol & Port: Changed from HTTP (7474) to Bolt (7687) — modern protocol for Neo4j
    2. Added Label Sanitization: MISP attribute types contain special characters (ip-srcdomain|ip). Neo4j label names only accept alphanumeric + underscores. Solution: re.sub(r'[^A-Za-z0-9_]', '_', a.type)
    3. Ensured Resource Safety: Used with context managers to guarantee session/connection cleanup
    4. Parameterized Queries: All parameters use $parameter syntax, preventing Cypher injection
    5. Added Debugging: Warning messages when attribute types are sanitized
    6. Added dependency declaration: Added neo4j = [ "neo4j (>=5.0)" ] to pyproject.toml under optional-dependencies

Testing

  • I created test/test_neo4j.py to test the new version of neo4j.py.
    • The test included the creation of a mock attribute using MISPEvent.add_attribute()
    • Connection to a local neo4j instance
    • Importing the events data using import_event() which is the main function
    • Verifying the event data and their relationships exist
    • Verifying the label sanitation
    • Cleaning up
image *test outcome*
  • All tests were successful!

Disclaimer : AI assisted me in analyzing the issue suggesting different approaches and streamlining testing.

Copy link
Member

@Rafiot Rafiot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, I did a very quick review of the code and I have a few notes.

Just making sure: you're using neo4j regularly and know what you're doing, right? I have not tested your code yet, this review is simply based on a quick read through.

def __init__(self, host: str='localhost', port: int=7687, username: str='neo4j', password: str='neo4j') -> None:
if not has_neo4j:
raise Exception('neo4j is required, please install: pip install neo4j')
self.driver = GraphDatabase.driver(f"bolt://{host}:{port}", auth=(username, password))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick googling and it seems you should use neo4j as a scheme, not bolt

raise Exception('neo4j is required, please install: pip install neo4j')
self.driver = GraphDatabase.driver(f"bolt://{host}:{port}", auth=(username, password))

def close(self) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requiring the user to manually call the closing method is prone to mistake and will often lead to never closing the driver.

Best practice is to make it a context, or opening/closing the driver when needed.

f"CREATE (attr:Attribute:{safe_type} {{uuid: $uuid, category: $category, name: $value}}) " # Create an Attribute node with the sanitized type as a label
f"CREATE (e)-[:`is member`]->(attr)", # Then create a relationship (is member) between the Event and Attribute
event_uuid=str(event.uuid), uuid=str(a.uuid),
category=a.category, value=a.value
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know anything about the query system used by neo4j, but is it safe to pass an arbitrary value like that?


event = MISPEvent()
event.info = "Test Event - Neo4j Import"
event.uuid = "test-event-uuid-001"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not a valid UUID, and that's automatically created by PyMISP.

import os


def test_neo4j():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not a real test, please use unittest.


brotli = [ "urllib3[broti] (>=2.6.3)" ]

neo4j = [ "neo4j (>=5.0)" ]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You also need to update the poetry.lock file.

with self.driver.session() as session:
session.execute_write(self._import_event_tx, event)

@staticmethod
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don´t get the point of a static method there, why not have that code in the import_event directly?

@dewardvide
Copy link
Author

Thank you @Rafiot for the valuable feedback. I will look into the comments and respond/make the necessary changes within the course of the week

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants