-
Notifications
You must be signed in to change notification settings - Fork 117
Optimization: iFUB algorithm for unweighted graph diameter #482
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimization: iFUB algorithm for unweighted graph diameter #482
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #482 +/- ##
==========================================
+ Coverage 97.16% 97.19% +0.02%
==========================================
Files 121 121
Lines 7020 7092 +72
==========================================
+ Hits 6821 6893 +72
Misses 199 199 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Thank you! On first look this looks great, but it will take a bit to get a proper review. In the meantime, we should probably have some more rigorous tests for it added. (1) Tests comparing against large graphs with known diameters (e.g. by construction); and (2) tests comparing against the old slow implementation for random graphs. Could you add such tests, maybe something that adds an additional 60 seconds to the tests, but with a "number of random samples" parameter that I can change locally and run for a few hours on my machine? |
|
Hello Krastanov, thanks for the comment. I have added (1) comparison against graphs with known diameters and the (2) comparison with old slow implementation for random graphs, under test/distance.jl. You can change NUM_SAMPLES there. Also, please note that it is my first time contributing to an existing Julia package, so please tell me if there is any changes to be done. |
|
Thanks, this looks great! I will try to review this in the next few days. One other thing that I need to check lately -- if you used an LLM-like tool, could you disclose that, which tool, and what the prompt was? I do not see anything in your PR that makes me think that was the case, and it is not like LLM tools are bad (they help a lot), but they do require approaching the review differently and having this extra info helps. |
|
Thanks for the feedback! I have used Gemini 3.0 for code generation. This optimization method was originally used in one of my researches, but I needed to change the style of the code to be similar to Graphs.jl. Therefore, I used Gemini to first change the code into a "Graphs.jl" format. Then I manually edited the codes again. |
|
Do you happen to have a reference of where this algorithm was introduced first / who named it iFUB, etc? Or a textbook about it? |
|
Pardon the formatter error -- it seems it is due to an incorrect security configuration on our end, not due to your code. |
|
Hello, sorry for the late response. The iFUB algorithm was first introduced in the following research papers. [Undirected Graphs] [Directed Graphs] Thank you. Please let me know if there is anything more needed. |
Krastanov
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is the review. Thanks again for submitting this, it is a great improvement.
I have a couple of minor documentation comments and a bigger comment about breaking changes in the current behavior in an edge case. After they are addressed this should be pretty straightforward to merge.
| ccs = connected_components(g) | ||
| largest_component = ccs[argmax(length.(ccs))] | ||
| g_lscc, _ = induced_subgraph(g, largest_component) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While we should keep these tests of largest connected components, we should also have a set of tests that do not strip to just the largest connected subgraph, because this hides an inconsistency with the current master's behavior. One can argue that your current implementation has a more natural behavior, but even if that is true, it is a breaking change.
On master we have:
julia> diameter(Graph(2)) # a disconnected graph
┌ Warning: Infinite path length detected for vertex 1
└ @ Graphs ~/.julia/packages/Graphs/lNprZ/src/distance.jl:78
┌ Warning: Infinite path length detected for vertex 2
└ @ Graphs ~/.julia/packages/Graphs/lNprZ/src/distance.jl:78
9223372036854775807 # this is typemax(indextype)
If you do not like this behavior on current master and you have an argument for why it needs to be changed, we can discuss it though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have edited the test/distance.jl file.
Please let me know if I have misunderstood something here, I believe I may have misunderstood something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good
test/formattttt.jl
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be an old dev script you added? Just make sure to remove it before merging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, this was my mistake. I have removed it.
src/distance.jl
Outdated
| if nv(g) <= 1 | ||
| return 0 | ||
| end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To keep things consistent with the current diameter (see the other comment for more details), we should change this to connectivity check and in absence of connectivity we should return typemax(eltype(g)).
If you consider the connectivity check too expensive, we can consider a keyword argument to skip it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I have noticed that it should have been typemax(). I have edited that, thanks for pointing this out.
src/distance.jl
Outdated
| eccentricities, return the maximum eccentricity of the graph. | ||
| For unweighted `SimpleGraph` and `SimpleDiGraph`, an optimized BFS algorithm | ||
| (iFUB) is used to avoid computing eccentricities for all vertices. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add nicely formatted references (the two papers you shared) to the docstring?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I added the references.
test/distance.jl
Outdated
| end | ||
| end | ||
|
|
||
| @testset "$(typeof(g))" for g in test_generic_graphs(a3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name of this testset is a bit cryptic to me. Could you elaborate it so that it is more immediately understandable if a test failure is reported?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I have edited the test name. Let me know if there is another issue that comes into your mind.
|
And one last thing: let's make sure |
Krastanov
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
I think there are only two minor things left:
- making sure this works on any graph, not just on specific graph types (after all other graph types might support the same API)
- making sure the doc links are to archival pages that probably will exist for multiple decades
src/distance.jl
Outdated
| Given a graph and optional distance matrix, or a vector of precomputed | ||
| eccentricities, return the maximum eccentricity of the graph. | ||
| An optimizied BFS algorithm (iFUB) is used for unweighted graphs, both [undirected](https://semeval.inria.fr/2012/printemps/theme1/teams/gang/62.pdf) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This links seems to be someone's personal page or some informal repository. Could you switch the link to a more official resources, something that has better chances of still existing in 20 years?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have changed the link to a one on ScienceDirect. I hope this link will have a better change of existing after decades.
src/distance.jl
Outdated
| """ | ||
| diameter(eccentricities) | ||
| diameter(g, distmx=weights(g)) | ||
| diameter(g::Union{SimpleGraph, SimpleDiGraph}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I am reading this correctly, your algorithm supports any graph that implements the Graphs.jl API. E.g. LEMONGraph/IGraph/NautyGraph/VNGraph are all supported, because they have the outneighbors/inneighbors/vertices functions.
We should not artificially exclude them (that is not really in the spirit of Multiple Dispatch languages like Julia).
For an unweighted graph we automatically have weights(g) === DefaultDistance(g).
So the new methods should probably be:
diameter(g, ::DefaultDistance)your implementationdiameter(g) = diameter(g, weights(g))diamgter(g, distmx)the already existing method but with the default=weights(g)removed so that the dispatch tree works
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have edited the codes, I hope it is alright now. Thanks for pointing this out, I missed this point.
| ccs = connected_components(g) | ||
| largest_component = ccs[argmax(length.(ccs))] | ||
| g_lscc, _ = induced_subgraph(g, largest_component) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good
Implement iFUB algorithm for unweighted diameter.
Added some more rigorous tests
Edited reference, edited diameters() for multiple dispatch.
Fixed code that it will pass tests.
6eacbef to
ffdbd05
Compare
|
Let me know if there's anything else needed from my side to get this merged. Thank you. |
|
Looks good! I pushed a couple of minor changes:
This should be merged and released shortly. Thank you for this contribution, it is very valuable! |
Implement iFUB algorithm for diameter calculation of unweighted graphs,
SimpleGraphandSimpleDiGraphBelow is a comparison with the original diameter calculation logic, ran on my desktop.