Floating point exception(core dump) while running ib_send_bw #14

@UntaggedRui

Problem

I'm reproducing FreeFlow on my own machines. When I run ib_send_bw in two containers located on the same physical machine, a "Floating point exception (core dumped)" error occurs. The same error occurs with two containers located on two different machines, even though these two machines can run ib_send_bw correctly outside of containers.
I am strictly following the quick start on GitHub. Here is my environment and how I run FreeFlow.

Environment

  • OS: Ubuntu 14.04.6 with Linux kernel 4.4.0-142-generic
  • RDMA NIC:
05:00.0 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
		Product Name: CX516A - ConnectX-5 QSFP28
05:00.1 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
		Product Name: CX516A - ConnectX-5 QSFP28
  • OFED version: MLNX_OFED_LINUX-4.0-2.0.0.1-ubuntu14.04-x86_64
  • Docker version: Docker version 1.13.0, build 49bf474
  • weave: 2.5.2
  • gcc: gcc-4.8.5, gcc-5.5.0, gcc-6.5.0, gcc-7.4.0

When I searched for this problem on Google, I found reports that this error is caused by a wrong gcc version. So I tried gcc-4.8.5, gcc-5.5.0, gcc-6.5.0, and gcc-7.4.0. However, the error occurs with every one of them.
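For context, on Linux the message "Floating point exception" corresponds to the SIGFPE signal, which on x86 is most commonly raised by an integer division (or modulo) by zero rather than by floating-point arithmetic. The sketch below only illustrates that class of fault and how a guard avoids it; `checked_div` is a hypothetical helper, not code from FreeFlow or perftest, and it is not a diagnosis of where this particular crash originates.

```cpp
#include <limits>

// Hypothetical helper (not part of FreeFlow or perftest).
// Performs integer division only when it cannot trap with SIGFPE.
// Returns true and stores the quotient in `out` on success,
// returns false and leaves `out` untouched otherwise.
bool checked_div(long numerator, long denominator, long& out) {
    // Division by zero raises SIGFPE on x86.
    if (denominator == 0) {
        return false;
    }
    // LONG_MIN / -1 overflows and also raises SIGFPE on x86.
    if (numerator == std::numeric_limits<long>::min() && denominator == -1) {
        return false;
    }
    out = numerator / denominator;
    return true;
}
```

A value that is unexpectedly zero at run time (for example a size or count that was never filled in) would trip the first check instead of aborting the process.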

host IPs and virtual IP to host IP mapping

I have two machines, 192.168.2.203 and 192.168.2.206, connected by a Weave overlay. I have modified the host IPs and the virtual-IP-to-host-IP mapping in my code. In ffrouter.h#L76,

const char HOST_LIST[HOST_NUM][16] = {
    "192.168.2.13",
    "192.168.2.15"
};

In ffrouter.cpp#L215,

    this->vip_map["10.47.128.0"] = "192.168.2.203";
    this->vip_map["10.47.0.5"] = "192.168.2.206";
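As posted, HOST_LIST contains 192.168.2.13 and 192.168.2.15, while the vip_map values point to 192.168.2.203 and 192.168.2.206; the container IP 10.47.0.8 used later is also not a key in vip_map. Whether this is related to the crash is not established here, but the two tables can be cross-checked mechanically. The following is a minimal sketch under that assumption; `hosts_missing_from_list` and `has_mapping` are hypothetical helpers and are not part of FreeFlow.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper (not part of FreeFlow).
// Returns every host IP that appears as a value in vip_map
// but is absent from host_list, without duplicates.
std::vector<std::string> hosts_missing_from_list(
    const std::vector<std::string>& host_list,
    const std::map<std::string, std::string>& vip_map) {
    std::vector<std::string> missing;
    for (const auto& entry : vip_map) {
        const std::string& host_ip = entry.second;
        const bool listed =
            std::find(host_list.begin(), host_list.end(), host_ip) != host_list.end();
        const bool recorded =
            std::find(missing.begin(), missing.end(), host_ip) != missing.end();
        if (!listed && !recorded) {
            missing.push_back(host_ip);
        }
    }
    return missing;
}

// Hypothetical helper: true if the container's virtual IP has a mapping.
bool has_mapping(const std::map<std::string, std::string>& vip_map,
                 const std::string& virtual_ip) {
    return vip_map.count(virtual_ip) != 0;
}
```

With the values quoted in this issue, both mapped hosts would be reported as missing from the host list, and `has_mapping(vip_map, "10.47.0.8")` would return false.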

Implementation

On host 203, I enter the FreeFlow router container and execute ./router router1, then start a container named node1, whose IP is 10.47.128.0, and run ib_send_bw in it.
On host 206, I enter the FreeFlow router container and execute ./router router1, then start a container named node2, whose IP is 10.47.0.8, and run ib_send_bw 10.47.128.0 in it.
Then the error occurs.
In the container that runs ib_send_bw, the log is:
[screenshot: error]
In the container that runs ib_send_bw 10.47.128.0, the log is:
[screenshot: error2]
How should I solve this? Also, could you tell me which gcc version you use?
