Commit ee44c3b
Limit the GPU threads per block
When nested_parallelism is enabled, ensure that the GPU thread count is
either 1024 or the user-provided `autoscheduler.parallelism` value,
whichever is smaller.
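
The clamp described above can be sketched as follows. This is a minimal illustration, not the code from this commit: the helper name `gpu_threads_per_block` and the constant `kMaxThreadsPerBlock` are hypothetical, and the sketch assumes the user-provided `autoscheduler.parallelism` value arrives as a positive integer.

```cpp
#include <algorithm>
#include <cstdint>

// Upper bound on GPU threads per block, as stated in the commit message.
// (Hypothetical constant name; not taken from the Halide source.)
constexpr std::int64_t kMaxThreadsPerBlock = 1024;

// Hypothetical helper: returns the thread count to use when nested
// parallelism is enabled, i.e. the smaller of 1024 and the user-provided
// parallelism value.
std::int64_t gpu_threads_per_block(std::int64_t parallelism) {
    return std::min(kMaxThreadsPerBlock, parallelism);
}
```

Under these assumptions, a parallelism value of 128 yields 128 threads per block, while a value of 4096 is capped at 1024.
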
The immediate effect is that, when `f.compute_at(g, xo)` is scheduled, the
allocated GPU shared memory is not multiplied by the nested-parallelism
factor, which would otherwise push it past Mullapudi2016's original optimal
cache size estimate.

1 parent a1aa557 commit ee44c3b
File tree
5 files changed, +48 -42 lines changed
- apps
  - bgu
  - lens_blur
  - local_laplacian
  - stencil_chain
- src/autoschedulers/mullapudi2016