The Full-Attention Cross-modal Transformer (FACT) is a model that can mimic and understand dance motions and can even enhance a person’s ability to choreograph dance.