How can multimodality improve representations of proteins? Foundation models have shown promise in building powerful representations across many domains. Language models can draw on a vast body of human knowledge and perform limited reasoning over it. Protein models learn the evolutionary patterns in proteins, enabling prediction of protein structure and function. This talk will cover the development of protein foundation models, understanding the representations they build, and how they scale. Finally, it will cover incorporating modalities beyond protein sequences, and how additional data could be added to produce better representations in the future.