Predicting a single label (or a distribution over labels as shown here to indicate our confidence) for a given image.
Detecting instances of semantic objects of a certain class (such as humans, buildings, or cars) in digital images and videos.
Partitioning an image into semantically meaningful parts and classifying each part into one of the predetermined classes.
Also known as human pose estimation: detecting human figures in images and video to determine, for example, where someone’s elbow appears in an image.
WIP: Detecting participants' faces using object detection and checking whether each face is present.
Detecting facial landmarks such as the eyes, nose, and mouth, which can be used for a web-based try-on simulator in an online store.
Transferring the makeup style of a sample makeup image to a facial image to check how the selected makeup looks.
Generating higher-resolution image or video frames to prevent degradation of the perceived image or video quality.
Providing automatic image captioning that predicts explanatory words for presentation slides to improve accessibility.
Translating text into a different language.
Analysing and inferring emotion from input text, and displaying an emoji that represents the estimated emotion.
Generating a short version of a recorded video to reduce the amount of video data to be stored.
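
The first description above mentions predicting a distribution over labels to indicate confidence. As a minimal sketch of what that can mean, the following turns raw per-label scores into probabilities using softmax. Softmax is an assumption here (the text does not name a method), and the labels and scores are hypothetical values chosen only for illustration.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical labels and raw model scores (not taken from the source).
labels = ["cat", "dog", "car"]
scores = [2.0, 1.0, 0.1]

probs = softmax(scores)
distribution = dict(zip(labels, probs))

# The single predicted label is the one with the highest probability.
top_label = max(distribution, key=distribution.get)
```

Reporting the whole `distribution` rather than only `top_label` lets a caller see how confident the prediction is, for example to flag images where the top two labels are nearly tied.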