
1. In MPK, each task is mapped to an individual SM. The amount of work handled by a task is similar to that of a thread block in the traditional kernel-per-operator approach.

2. TL;DR: MPK automatically analyzes inter-task dependencies by tracking the input and output tensors associated with each task. Longer version: MPK uses imap, omap, and fmap (see Section 2 of the Mirage paper) to determine each task's input and output tensors. A dependency is introduced between task A and task B if A produces any tensor elements that B consumes, that is, if A's outputs overlap with B's inputs.
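
To make the overlap rule concrete, here is a minimal Python sketch. It is not MPK's code and does not use imap/omap/fmap; it assumes each task's inputs and outputs have already been resolved to rectangular regions of named 2D tensors. The names Region, Task, and find_dependencies are hypothetical, made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Region:
    """A rectangular slice of a named 2D tensor (hypothetical helper type)."""
    tensor: str
    rows: Tuple[int, int]  # half-open range [start, stop)
    cols: Tuple[int, int]  # half-open range [start, stop)

    def overlaps(self, other: "Region") -> bool:
        # Regions of different tensors never overlap.
        if self.tensor != other.tensor:
            return False
        # Two half-open intervals overlap iff each starts before the other ends.
        return (self.rows[0] < other.rows[1] and other.rows[0] < self.rows[1]
                and self.cols[0] < other.cols[1] and other.cols[0] < self.cols[1])


@dataclass
class Task:
    name: str
    inputs: List[Region]
    outputs: List[Region]


def find_dependencies(tasks: List[Task]) -> List[Tuple[str, str]]:
    """Return (producer, consumer) edges: producer writes something consumer reads."""
    edges = []
    for producer in tasks:
        for consumer in tasks:
            if producer is consumer:
                continue
            if any(out.overlaps(inp)
                   for out in producer.outputs
                   for inp in consumer.inputs):
                edges.append((producer.name, consumer.name))
    return edges


# Example: A writes rows 0..64 of X; B reads rows 32..96, C reads rows 64..128.
a = Task("A", inputs=[], outputs=[Region("X", (0, 64), (0, 64))])
b = Task("B", inputs=[Region("X", (32, 96), (0, 64))], outputs=[])
c = Task("C", inputs=[Region("X", (64, 128), (0, 64))], outputs=[])
deps = find_dependencies([a, b, c])
```

Here B depends on A because the row ranges intersect, while C does not because the ranges only touch at row 64. The all-pairs loop is quadratic and only meant to show the rule; a real system would index regions per tensor.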

> Again taking matmul as an example: a given output tile requires the corresponding M_BLOCK rows of the A matrix. If the A matrix was itself an output of a prior matmul (+ nonlinearity), the dependees would be all of the output tile tasks corresponding to those M_BLOCK rows of the operator that produced A?

Exactly. In this case, every output tile task that consumes those M_BLOCK rows of A depends on all tasks in the previous operator that produce the corresponding parts of A.
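
As a rough illustration of that matmul case, the sketch below enumerates which producer tiles a consumer row tile would wait on. It is an assumption-laden toy, not MPK's logic: producer_tiles_needed and its parameters are hypothetical, tiles are indexed (row_tile, col_tile), and matrix sizes are assumed to be multiples of the block sizes.

```python
from typing import List, Tuple


def producer_tiles_needed(consumer_row_tile: int,
                          m_block: int,
                          prod_m_block: int,
                          prod_col_tiles: int) -> List[Tuple[int, int]]:
    """List producer (row_tile, col_tile) indices that write the rows of A
    read by one consumer row tile.

    The consumer reads rows [row_start, row_stop) of A across all columns,
    so every producer column tile in the matching row tiles is a dependency.
    """
    row_start = consumer_row_tile * m_block
    row_stop = row_start + m_block  # exclusive
    first = row_start // prod_m_block
    last = (row_stop - 1) // prod_m_block  # inclusive
    return [(p, q)
            for p in range(first, last + 1)
            for q in range(prod_col_tiles)]


# Consumer uses M_BLOCK = 128; the producer of A used 64-row tiles, 2 column tiles.
needed = producer_tiles_needed(consumer_row_tile=1, m_block=128,
                               prod_m_block=64, prod_col_tiles=2)
```

With these numbers the consumer tile reads rows 128..255 of A, which span producer row tiles 2 and 3, so it waits on four producer tasks. Every consumer tile in the same row of the output shares that same set, regardless of its column.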


