39-Issue 7
Browsing 39-Issue 7 by Issue Date
Now showing 1 - 20 of 54
Item: Monocular Human Pose and Shape Reconstruction using Part Differentiable Rendering (The Eurographics Association and John Wiley & Sons Ltd., 2020)
Authors: Wang, Min; Qiu, Feng; Liu, Wentao; Qian, Chen; Zhou, Xiaowei; Ma, Lizhuang. Editors: Eisemann, Elmar; Jacobson, Alec; Zhang, Fang-Lue.
Superior human pose and shape reconstruction from monocular images depends on removing the ambiguities caused by occlusions and shape variance. Recent regression-based methods succeed in estimating parametric models directly through a deep neural network supervised by 3D ground truth. However, 3D ground truth is neither abundant nor efficient to obtain. In this paper, we introduce body part segmentation as critical supervision. Part segmentation not only indicates the shape of each body part but also helps to infer the occlusions among parts. To improve the reconstruction with part segmentation, we propose a part-level differentiable renderer that enables part-based models to be supervised by part segmentation in neural networks or optimization loops. We also introduce a general parametric model engaged in the rendering pipeline as an intermediate representation between skeletons and detailed shapes, which consists of primitive geometries for better interpretability. The proposed approach combines parameter regression, body model optimization, and detailed model registration. Experimental results demonstrate that the proposed method achieves balanced evaluation on pose and shape, and outperforms state-of-the-art approaches on the Human3.6M, UP-3D and LSP datasets.
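A plausible form of the part-level supervision this entry describes, in our notation rather than the authors': let $R_k(\theta)$ be the differentiable renderer's mask for body part $k$ under model parameters $\theta$, and let $S_k$ be the ground-truth segmentation of that part. The supervision could then read

    $$\mathcal{L}_{\text{part}}(\theta) = \sum_{k=1}^{K} \big\lVert R_k(\theta) - S_k \big\rVert_2^2,$$

which penalizes each rendered part mask against its segment; occlusion relations are supervised implicitly, since an occluded part must not render over its occluder's segment.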
Item: Semi-Supervised 3D Shape Recognition via Multimodal Deep Co-training (The Eurographics Association and John Wiley & Sons Ltd., 2020)
Authors: Song, Mofei; Liu, Yu; Liu, Xiao Fan. Editors: Eisemann, Elmar; Jacobson, Alec; Zhang, Fang-Lue.
3D shape recognition has been actively investigated in the field of computer graphics. With the rapid development of deep learning, various deep models have been introduced and have achieved remarkable results. Most 3D shape recognition methods are supervised and learn only from large amounts of labeled shapes. However, it is expensive and time-consuming to obtain such a large training set. In contrast to these methods, this paper studies a semi-supervised learning framework for training a deep model for 3D shape recognition from both labeled and unlabeled shapes. Inspired by the co-training algorithm, our method iterates between a model training phase and a pseudo-label generation phase. In the model training phase, we train two deep networks based on the point cloud and multi-view representations simultaneously. In the pseudo-label generation phase, we generate pseudo-labels for the unlabeled shapes using the joint prediction of the two networks, which augments the labeled set for the next iteration. To extract more reliable consensus information from the multiple representations, we propose an uncertainty-aware consistency loss function that combines the two networks into a multimodal network. This not only encourages the two networks to give similar predictions on the unlabeled set, but also eliminates the negative influence of a large performance gap between them. Experiments on the ModelNet40 benchmark demonstrate that, with only 10% labeled training data, our approach achieves performance competitive with the results reported by supervised methods.

Item: SCGA-Net: Skip Connections Global Attention Network for Image Restoration (The Eurographics Association and John Wiley & Sons Ltd., 2020)
Authors: Ren, Dongdong; Li, Jinbao; Han, Meng; Shu, Minglei. Editors: Eisemann, Elmar; Jacobson, Alec; Zhang, Fang-Lue.
Deep convolutional neural networks (DCNN) have shown their advantages in image restoration tasks, but most existing DCNN-based methods still suffer from residual corruptions and coarse textures. In this paper, we propose a general framework, the ''Skip Connections Global Attention Network'', that focuses on delivering semantics from shallow layers to deep layers for low-level vision tasks including image dehazing, image denoising, and low-light image enhancement. First, by applying dense dilated convolution and a multi-scale feature fusion mechanism, we establish a novel encoder-decoder network framework that aggregates large-scale spatial context and enhances feature reuse. Second, our skip connections use an attention mechanism to constrain information, thereby enhancing the high-frequency details of feature maps and suppressing the output of corruptions. Finally, we present a novel attention module, dubbed global constraint attention, which effectively captures the relationships between pixels across the entire feature maps to obtain the subtle differences among pixels and produce overall optimal 3D attention maps. Extensive experiments demonstrate that the proposed method achieves significant improvements over state-of-the-art methods in image dehazing, image denoising, and low-light image enhancement.
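For the semi-supervised co-training entry above, a minimal sketch of one pseudo-label round and an uncertainty-aware consistency term; `point_net`, `view_net`, the confidence threshold, and the disagreement weighting are our assumptions, not the authors' exact code.

    import torch
    import torch.nn.functional as F

    def pseudo_label_round(point_net, view_net, unlabeled_loader, threshold=0.9):
        """Generate pseudo-labels from the joint prediction of both networks."""
        pseudo_set = []
        with torch.no_grad():
            for points, views in unlabeled_loader:
                p1 = F.softmax(point_net(points), dim=1)
                p2 = F.softmax(view_net(views), dim=1)
                joint = (p1 + p2) / 2                    # joint prediction of both views
                conf, label = joint.max(dim=1)
                keep = conf > threshold                  # keep only confident shapes
                pseudo_set.append((points[keep], views[keep], label[keep]))
        return pseudo_set

    def consistency_loss(logits1, logits2):
        """One plausible uncertainty-aware consistency: down-weight samples
        where the two networks disagree strongly."""
        p1, p2 = F.softmax(logits1, dim=1), F.softmax(logits2, dim=1)
        gap = (p1 - p2).abs().sum(dim=1).detach()        # per-sample disagreement
        w = torch.exp(-gap)                              # low weight for large gaps
        kl = F.kl_div(p1.log(), p2, reduction='none').sum(dim=1)
        return (w * kl).mean()

The pseudo-labeled shapes would then be merged into the labeled set before the next training phase, matching the iteration the abstract describes.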
Item: Two-stage Photograph Cartoonization via Line Tracing (The Eurographics Association and John Wiley & Sons Ltd., 2020)
Authors: Li, Simin; Wen, Qiang; Zhao, Shuang; Sun, Zixun; He, Shengfeng. Editors: Eisemann, Elmar; Jacobson, Alec; Zhang, Fang-Lue.
Cartoons are highly abstracted, with clear edges, which makes them unique among art forms. In this paper, we focus on the essential cartoon factors of abstraction and edges, aiming to cartoonize real-world photographs like an artist. To this end, we propose a two-stage network whose stages explicitly target producing abstracted shading and crisp edges, respectively. In the first, abstraction stage, we propose a novel unsupervised bilateral flattening loss, which allows generating high-quality smoothing results in a label-free manner. Together with two other semantic-aware losses, the abstraction stage imposes different forms of regularization for creating cartoon-like flattened images. In the second stage, we draw lines on the structural edges of the flattened cartoon with a fully supervised line drawing objective and an unsupervised edge augmenting loss. We collect a cartoon-line dataset with line tracing, which serves as the starting point for preparing abstraction and line drawing data. We have evaluated the proposed method on a large number of photographs by converting them to three different cartoon styles. Our method substantially outperforms state-of-the-art methods in visual quality, both quantitatively and qualitatively.

Item: Simulation of Arbitrarily-shaped Magnetic Objects (The Eurographics Association and John Wiley & Sons Ltd., 2020)
Authors: Kim, Seung-wook; Han, JungHyun. Editors: Eisemann, Elmar; Jacobson, Alec; Zhang, Fang-Lue.
We propose a novel method for simulating rigid magnets in a stable way. It is based on analytic solutions of the magnetic vector potential and flux density, which make the magnetic forces and torques computed from them seldom diverge. Therefore, our magnet simulations remain stable even when magnets are in close proximity or penetrate each other. Thanks to this stability, our method can simulate magnets of any shape. Another strength of our method is that the time complexities for computing the magnetic forces and torques are significantly reduced compared to previous methods. Our method is easily integrated with classic rigid-body simulators. The experimental results presented in this paper demonstrate the stability and efficiency of our method.

Item: Coarse to Fine: Weak Feature Boosting Network for Salient Object Detection (The Eurographics Association and John Wiley & Sons Ltd., 2020)
Authors: Zhang, Chenhao; Gao, Shanshan; Pan, Xiao; Wang, Yuting; Zhou, Yuanfeng. Editors: Eisemann, Elmar; Jacobson, Alec; Zhang, Fang-Lue.
Salient object detection identifies the objects or regions with maximum visual recognition in an image, which brings significant help and improvement to many computer vision tasks. Although many methods have been proposed for salient object detection, the problem is still not perfectly solved, especially when the background scene is complex or the salient object is small. In this paper, we propose a novel Weak Feature Boosting Network (WFBNet) for the salient object detection task. In the WFBNet, we extract the unpredictable (low-confidence) regions of the image via a polynomial function and enhance the features of these regions through a well-designed weak feature boosting module (WFBM). Starting from a coarse saliency map, we gradually refine it according to the boosted features to obtain the final saliency map, and our network does not need any post-processing step. We conduct extensive experiments on five benchmark datasets using comprehensive evaluation metrics. The results show that our algorithm has considerable advantages over existing state-of-the-art methods.
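A hedged sketch of how the WFBNet entry above could extract its low-confidence regions with a polynomial; the particular choice p*(1-p), which peaks at p = 0.5 (maximal uncertainty), is our assumption, not the authors' function.

    import torch

    def low_confidence_weight(coarse_saliency: torch.Tensor) -> torch.Tensor:
        """coarse_saliency: (B, 1, H, W) saliency probabilities in [0, 1].
        Returns a weight near 1 where the prediction is uncertain (p ~ 0.5)
        and near 0 where it is confident (p ~ 0 or p ~ 1)."""
        p = coarse_saliency.clamp(0.0, 1.0)
        return 4.0 * p * (1.0 - p)

    # The weight can then gate which features a boosting module enhances, e.g.:
    # boosted = features + low_confidence_weight(coarse) * wfbm(features)
    # where `wfbm` stands in for the paper's weak feature boosting module.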
Item: Diversifying Semantic Image Synthesis and Editing via Class- and Layer-wise VAEs (The Eurographics Association and John Wiley & Sons Ltd., 2020)
Authors: Endo, Yuki; Kanamori, Yoshihiro. Editors: Eisemann, Elmar; Jacobson, Alec; Zhang, Fang-Lue.
Semantic image synthesis is a process for generating photorealistic images from a single semantic mask. To enrich the diversity of multimodal image synthesis, previous methods have controlled the global appearance of an output image by learning a single latent space. However, a single latent code is often insufficient for capturing various object styles, because object appearance depends on multiple factors. To handle the individual factors that determine object styles, we propose a class- and layer-wise extension to the variational autoencoder (VAE) framework that allows flexible control over each object class at local to global levels by learning multiple latent spaces. Furthermore, through extensive experiments with real and synthetic datasets in three different domains, we demonstrate that our method generates images that are both plausible and more diverse than those of state-of-the-art methods. We also show that our method enables a wide range of applications in image synthesis and editing tasks.

Item: Learning Target-Adaptive Correlation Filters for Visual Tracking (The Eurographics Association and John Wiley & Sons Ltd., 2020)
Authors: She, Ying; Yi, Yang; Gu, Jialiang. Editors: Eisemann, Elmar; Jacobson, Alec; Zhang, Fang-Lue.
Correlation filters (CF) achieve excellent performance in visual tracking but suffer from undesired boundary effects. A significant number of approaches focus on enlarging search regions to make up for this shortcoming. However, this introduces excessive background noise and misleads the filter into learning from ambiguous information. In this paper, we propose a novel target-adaptive correlation filter (TACF) that incorporates context and spatial-temporal regularizations into the CF framework, thus learning a more robust appearance model in the case of large appearance variations. Besides, it can be effectively optimized via the alternating direction method of multipliers (ADMM), thus achieving a globally optimal solution. Finally, an adaptive updating strategy is presented to discriminate unreliable samples and alleviate the contamination of the training set. Extensive evaluations on the OTB-2013, OTB-2015, VOT-2016, VOT-2017 and TC-128 datasets demonstrate that our TACF is very promising for various challenging scenarios compared with several state-of-the-art trackers, with real-time performance of 20 frames per second (fps).

Item: A Graph-based One-Shot Learning Method for Point Cloud Recognition (The Eurographics Association and John Wiley & Sons Ltd., 2020)
Authors: Fan, Zhaoxin; Liu, Hongyan; He, Jun; Sun, Qi; Du, Xiaoyong. Editors: Eisemann, Elmar; Jacobson, Alec; Zhang, Fang-Lue.
Point cloud based 3D vision tasks, such as 3D object recognition, are critical to many real-world applications such as autonomous driving. Many point cloud processing models based on deep learning have recently been proposed by researchers. However, they are all large-sample dependent: a large amount of manually labelled training data is needed to train the model, resulting in huge labor costs. In this paper, to tackle this problem, we propose a one-shot learning model for point cloud recognition, namely OS-PCR. Different from previous methods, our method formulates a new setting in which the model needs to see only one sample per class at inference time in order to memorize and recognize new classes. To fulfill this task, we design three modules in the model: an Encoder Module, an Edge-conditioned Graph Convolutional Network Module, and a Query Module. To evaluate the performance of the proposed model, we build a one-shot learning benchmark dataset for 3D point cloud analysis. Comprehensive experiments on it demonstrate the effectiveness of our proposed model.
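For the TACF entry above, a hedged sketch of the kind of objective it describes, in our notation and in the spirit of STRCF-style spatial-temporal regularization (the exact terms are the authors'): with multi-channel features $x^d$, desired response $y$, a spatial weight $w$ suppressing boundary regions, and the previous filter $f_{t-1}$,

    $$\min_{f}\ \tfrac{1}{2}\Big\lVert \sum_{d=1}^{D} x^{d} \ast f^{d} - y \Big\rVert_2^2 \;+\; \tfrac{\lambda}{2}\sum_{d=1}^{D}\lVert w \odot f^{d}\rVert_2^2 \;+\; \tfrac{\mu}{2}\lVert f - f_{t-1}\rVert_2^2.$$

Each term is separable under ADMM, which is why the convex subproblems admit the globally optimal solution the abstract mentions.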
Item: Robust Computation of 3D Apollonius Diagrams (The Eurographics Association and John Wiley & Sons Ltd., 2020)
Authors: Wang, Peihui; Yuan, Na; Ma, Yuewen; Xin, Shiqing; He, Ying; Chen, Shuangmin; Xu, Jian; Wang, Wenping. Editors: Eisemann, Elmar; Jacobson, Alec; Zhang, Fang-Lue.
Apollonius diagrams, also known as additively weighted Voronoi diagrams, are an extension of Voronoi diagrams in which the weighted distance is defined as the Euclidean distance minus the weight. The bisectors of Apollonius diagrams have a hyperbolic form, which is fundamentally different from traditional Voronoi diagrams and power diagrams. Though robust solvers are available for computing 2D Apollonius diagrams, there is no practical approach for the 3D counterpart. In this paper, we systematically analyze the structural features of 3D Apollonius diagrams and then develop a fast algorithm for robustly computing them. Our algorithm consists of vertex location, edge tracing and face extraction, among which the key step is to adaptively subdivide the initial large box into a set of sufficiently small boxes such that each box contains at most one Apollonius vertex. Finally, we use centroidal Voronoi tessellation (CVT) to discretize the curved bisectors with well-tessellated triangle meshes. We validate the effectiveness and robustness of our algorithm through extensive evaluation and experiments. We also demonstrate an application to computing centroidal Apollonius diagrams.

Item: A Multi-Person Selfie System via Augmented Reality (The Eurographics Association and John Wiley & Sons Ltd., 2020)
Authors: Lin, Jie; Yang, Chuan-Kai. Editors: Eisemann, Elmar; Jacobson, Alec; Zhang, Fang-Lue.
Despite the prevalence of selfie sticks in recent years, their limited length always poses the problem of distortion in a selfie. We propose a technique, based on modifying existing augmented reality technology, to support selfies of multiple persons by properly aligning different photographing processes. Our technique helps avoid the common distortion drawback of using a selfie stick and facilitates the composition process of a group photo. It can also be used to create special effects, including the illusion of a person appearing multiple times.

Item: Personalized Hand Modeling from Multiple Postures with Multi-View Color Images (The Eurographics Association and John Wiley & Sons Ltd., 2020)
Authors: Wang, Yangang; Rao, Ruting; Zou, Changqing. Editors: Eisemann, Elmar; Jacobson, Alec; Zhang, Fang-Lue.
Personalized hand models can be used to synthesize high-quality hand datasets, provide more training data for deep learning, and improve the accuracy of hand pose estimation. In recent years, parameterized hand models, e.g., MANO, have been widely used for obtaining personalized hand models. However, due to the low resolution of existing parameterized hand models, it is still hard to obtain high-fidelity personalized hand models. In this paper, we propose a new method to estimate personalized hand models from multiple hand postures with multi-view color images. The personalized hand model is represented by a personalized neutral hand and multiple hand postures. We propose a novel optimization strategy to estimate the neutral hand from multiple hand postures. To demonstrate the performance of our method, we have built a multi-view system and captured more than 35 people, each with 30 hand postures. We hope the estimated hand models can boost research on high-fidelity parameterized hand modeling in the future. All the hand models are publicly available at www.yangangwang.com.
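For the 3D Apollonius diagram entry above, a minimal sketch of the additively weighted distance it defines and a helper for locating the nearest sites; the names and the NumPy formulation are ours, not the authors' implementation.

    import numpy as np

    def apollonius_dist(x, site, weight):
        """Additively weighted distance: Euclidean distance minus the weight."""
        return np.linalg.norm(x - site) - weight

    def nearest_sites(x, sites, weights):
        """Rank all sites by weighted distance to query point x.
        sites: (M, 3) array, weights: (M,) array."""
        d = np.linalg.norm(sites - x, axis=1) - weights
        order = np.argsort(d)
        return order, d[order]

    # An Apollonius vertex is equidistant (in weighted distance) from four
    # sites; the paper's key step subdivides the bounding box until each
    # small box can contain at most one such vertex.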
Item: Human Pose Transfer by Adaptive Hierarchical Deformation (The Eurographics Association and John Wiley & Sons Ltd., 2020)
Authors: Zhang, Jinsong; Liu, Xingzi; Li, Kun. Editors: Eisemann, Elmar; Jacobson, Alec; Zhang, Fang-Lue.
Human pose transfer, as a misaligned image generation task, is very challenging. Existing methods cannot effectively utilize the input information and often fail to preserve the style and shape of hair and clothes. In this paper, we propose an adaptive human pose transfer network with two hierarchical deformation levels. The first level generates human semantic parsing aligned with the target pose, and the second level generates the final textured person image in the target pose under this semantic guidance. To avoid the drawback of vanilla convolution, which treats all pixels as valid information, we use gated convolution at both levels to dynamically select the important features and adaptively deform the image layer by layer. Our model has very few parameters and converges quickly. Experimental results demonstrate that our model achieves better performance, with more consistent hair, faces and clothes, using fewer parameters than state-of-the-art methods. Furthermore, our method can be applied to clothing texture transfer. The code is available for research purposes at https://github.com/Zhangjinso/PINet_PG.

Item: Deep Separation of Direct and Global Components from a Single Photograph under Structured Lighting (The Eurographics Association and John Wiley & Sons Ltd., 2020)
Authors: Duan, Zhaoliang; Bieron, James; Peers, Pieter. Editors: Eisemann, Elmar; Jacobson, Alec; Zhang, Fang-Lue.
We present a deep learning based solution for separating the direct and global light transport components from a single photograph captured under high-frequency structured lighting with a co-axial projector-camera setup. We employ an architecture with one encoder and two decoders that shares information between the encoder and the decoders, as well as between both decoders, to ensure a consistent decomposition of the two light transport components. Furthermore, our deep learning separation approach does not require binary structured illumination, allowing us to utilize the full resolution capabilities of the projector. Consequently, our deep separation network is able to achieve high-fidelity decompositions for lighting-frequency-sensitive features such as subsurface scattering and specular reflections. We evaluate and demonstrate our direct and global separation method on a wide variety of synthetic and captured scenes.
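A hedged sketch of the gated convolution used in the pose-transfer entry above (in the style of Yu et al.'s free-form inpainting layer); the layer sizes are illustrative assumptions, not the authors' architecture.

    import torch
    import torch.nn as nn

    class GatedConv2d(nn.Module):
        def __init__(self, in_ch, out_ch, k=3, stride=1, padding=1):
            super().__init__()
            self.feature = nn.Conv2d(in_ch, out_ch, k, stride, padding)
            self.gate = nn.Conv2d(in_ch, out_ch, k, stride, padding)

        def forward(self, x):
            # The learned gate decides, per pixel and channel, how much of the
            # convolved feature passes through -- unlike vanilla convolution,
            # which treats every pixel as equally valid.
            return torch.sigmoid(self.gate(x)) * torch.relu(self.feature(x))

Stacking such layers lets the network suppress pixels that are invalid under the target pose while deforming the image layer by layer, as the abstract describes.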
Item: SRF-Net: Spatial Relationship Feature Network for Tooth Point Cloud Classification (The Eurographics Association and John Wiley & Sons Ltd., 2020)
Authors: Ma, Qian; Wei, Guangshun; Zhou, Yuanfeng; Pan, Xiao; Xin, Shiqing; Wang, Wenping. Editors: Eisemann, Elmar; Jacobson, Alec; Zhang, Fang-Lue.
3D scanned point cloud data of teeth is widely used in digital orthodontics. Classifying and semantically labelling the point cloud of each tooth is a key and challenging task for planning dental treatment. Utilizing the prior position information given by the ordered tooth arrangement, we propose an effective network for tooth model classification in this paper. The relative position and adjacency similarity feature vectors are calculated for each 3D tooth model, and the geometric features are combined into the fully connected layers of the classification training task. For the classification of dental anomalies, we present a dental anomaly processing method that improves the classification accuracy. We also use focal loss as the loss function to address the sample imbalance of wisdom teeth. The extensive evaluations, ablation studies and comparisons demonstrate that the proposed network classifies tooth models accurately and automatically and outperforms state-of-the-art point cloud classification methods.

Item: A Bayesian Inference Framework for Procedural Material Parameter Estimation (The Eurographics Association and John Wiley & Sons Ltd., 2020)
Authors: Guo, Yu; Hasan, Milos; Yan, Lingqi; Zhao, Shuang. Editors: Eisemann, Elmar; Jacobson, Alec; Zhang, Fang-Lue.
Procedural material models have been gaining traction in many applications thanks to their flexibility, compactness, and easy editability. We explore the inverse rendering problem of procedural material parameter estimation from photographs, presenting a unified view of the problem in a Bayesian framework. In addition to computing point estimates of the parameters by optimization, our framework uses a Markov chain Monte Carlo approach to sample the space of plausible material parameters, providing a collection of plausible matches that a user can choose from, and efficiently handling both discrete and continuous model parameters. To demonstrate the effectiveness of our framework, we fit procedural models of a range of materials (wall plaster, leather, wood, anisotropic brushed metals and layered metallic paints) to both synthetic and real target images.
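A minimal sketch of the Metropolis-Hastings sampling behind the Bayesian framework entry above; `log_posterior` (render the procedural model and compare to the photograph, plus priors), the Gaussian proposal, and the step size are our assumptions.

    import numpy as np

    def metropolis_hastings(log_posterior, theta0, n_steps=5000, step=0.05):
        """Sample plausible material parameters theta from the posterior."""
        rng = np.random.default_rng(0)
        theta, lp = theta0.copy(), log_posterior(theta0)
        samples = []
        for _ in range(n_steps):
            proposal = theta + step * rng.standard_normal(theta.shape)
            lp_new = log_posterior(proposal)
            if np.log(rng.uniform()) < lp_new - lp:   # accept with MH ratio
                theta, lp = proposal, lp_new
            samples.append(theta.copy())
        return np.array(samples)   # a collection of plausible matches

The returned chain is exactly the kind of "collection of plausible matches a user can choose from" that the abstract mentions; discrete parameters would need a discrete proposal instead of the Gaussian one sketched here.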
Item: Cosserat Rod with rh-Adaptive Discretization (The Eurographics Association and John Wiley & Sons Ltd., 2020)
Authors: Wen, Jiahao; Chen, Jiong; Umetani, Nobuyuki; Bao, Hujun; Huang, Jin. Editors: Eisemann, Elmar; Jacobson, Alec; Zhang, Fang-Lue.
Rod-like one-dimensional elastic objects often exhibit complex behaviors that pose great challenges to discretization methods seeking a faithful simulation. By moving only a small portion of the material points, the Eulerian-on-Lagrangian (EoL) method already shows great adaptivity in handling sharp contact, but it is still far from sufficient to reproduce the rich and complex geometric details arising in simulations. In this paper, we extend the discrete configuration space by unifying all Lagrangian and EoL nodes in the representation for even more adaptivity, with every sample assigned a dynamic material coordinate. However, this extension immediately introduces much more redundancy into the dynamic system. Therefore, we propose an additional energy to control the spatial distribution of the material points, seeking to space them equally with respect to a curvature-based density field serving as a monitor. This flexible approach effectively constrains the motion of material points to resolve numerical degeneracy, while simultaneously enabling them to slide notably inside the parametric domain to account for the shape parameterization. Besides, to respond accurately to sharp contact, our method can also insert or remove nodes online and adjust the energy stiffness to suppress jittering artifacts that could be excited in a stiff system. As a result of this hybrid rh-adaptation, our method is capable of reproducing many realistic rod dynamics, such as excessive bending, twisting and knotting, while using only a limited number of elements.

Item: Weakly Supervised Part-wise 3D Shape Reconstruction from Single-View RGB Images (The Eurographics Association and John Wiley & Sons Ltd., 2020)
Authors: Niu, Chengjie; Yu, Yang; Bian, Zhenwei; Li, Jun; Xu, Kai. Editors: Eisemann, Elmar; Jacobson, Alec; Zhang, Fang-Lue.
In order for deep learning models to truly understand 2D images for 3D geometry recovery, we argue that single-view reconstruction should be learned in a part-aware and weakly supervised manner. Such models lead to a more profound interpretation of 2D images, in which part-based parsing and assembling are involved. To this end, we learn a deep neural network that takes a single-view RGB image as input and outputs a 3D shape in parts, represented by 3D point clouds, with an array of 3D part generators. In particular, we devise two levels of generative adversarial networks (GANs) to generate shapes with both correct part shapes and a reasonable overall structure. To enable self-taught network training, we devise a differentiable projection module along with a self-projection loss measuring the error between the shape projection and the input image. The training data in our method is unpaired between the 2D images and the 3D shapes with part decomposition. Through qualitative and quantitative evaluations on public datasets, we show that our method achieves good performance in part-wise single-view reconstruction.
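A hedged sketch of a differentiable projection and self-projection loss of the kind the part-wise reconstruction entry above relies on; the Gaussian soft-splatting and the simple affine camera are our assumptions, not the authors' exact module.

    import torch

    def soft_silhouette(points2d, H=64, W=64, sigma=1.0):
        """points2d: (N, 2) projected (row, col) coordinates in pixel units.
        Returns a differentiable (H, W) silhouette: a pixel is bright if any
        projected point lands near it."""
        ys = torch.arange(H, dtype=torch.float32)
        xs = torch.arange(W, dtype=torch.float32)
        grid = torch.stack(torch.meshgrid(ys, xs, indexing='ij'), dim=-1)  # (H, W, 2)
        d2 = ((grid.view(-1, 1, 2) - points2d.view(1, -1, 2)) ** 2).sum(-1)  # (HW, N)
        return torch.exp(-d2.min(dim=1).values / (2 * sigma ** 2)).view(H, W)

    def self_projection_loss(points3d, cam, mask):
        """Project the generated 3D points with an assumed (2, 3) affine
        camera and compare against the input image's object mask."""
        points2d = points3d @ cam.T
        return ((soft_silhouette(points2d) - mask) ** 2).mean()

Because the silhouette depends smoothly on the projected point locations, gradients flow back to the part generators, which is what makes the self-taught training the abstract describes possible.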
Item: InstanceFusion: Real-time Instance-level 3D Reconstruction Using a Single RGBD Camera (The Eurographics Association and John Wiley & Sons Ltd., 2020)
Authors: Lu, Feixiang; Peng, Haotian; Wu, Hongyu; Yang, Jun; Yang, Xinhang; Cao, Ruizhi; Zhang, Liangjun; Yang, Ruigang; Zhou, Bin. Editors: Eisemann, Elmar; Jacobson, Alec; Zhang, Fang-Lue.
We present InstanceFusion, a robust real-time system to detect, segment, and reconstruct instance-level 3D objects of indoor scenes with a hand-held RGBD camera. It combines the strengths of deep learning and traditional SLAM techniques to produce visually compelling 3D semantic models. The key to its success is our novel segmentation scheme and efficient instance-level data fusion, both implemented on the GPU. Specifically, for each incoming RGBD frame, we take advantage of the RGBD features, the 3D point cloud, and the reconstructed model to perform instance-level segmentation. The corresponding RGBD data, along with the instance ID, are then fused into the surfel-based models. To store and update these data efficiently, we design and implement a new data structure using the OpenGL Shading Language. Experimental results show that our method advances state-of-the-art (SOTA) methods in instance segmentation and data fusion by a large margin. In addition, our instance segmentation improves the precision of 3D reconstruction, especially at loop closures. InstanceFusion runs at 20.5 Hz on a consumer-level GPU, which supports a number of augmented reality (AR) applications (e.g., 3D model registration, virtual interaction, AR maps) and robot applications (e.g., navigation, manipulation, grasping). To facilitate future research and make our system easier to reproduce, the source code, data, and trained model are released on GitHub: https://github.com/Fancomi2017/InstanceFusion.

Item: Not All Areas Are Equal: A Novel Separation-Restoration-Fusion Network for Image Raindrop Removal (The Eurographics Association and John Wiley & Sons Ltd., 2020)
Authors: Ren, Dongdong; Li, Jinbao; Han, Meng; Shu, Minglei. Editors: Eisemann, Elmar; Jacobson, Alec; Zhang, Fang-Lue.
Detecting and removing raindrops from an image while preserving high-quality image details has attracted tremendous study, but remains a challenging task due to the inhomogeneity of the degraded regions and the complexity of the degradation intensity. In this paper, we move beyond the dependence of deep learning on image-to-image translation and propose a separation-restoration-fusion network for raindrop removal. Our key idea is to recover regions of different damage levels individually, so that each region achieves its optimal recovery result, and finally to fuse the recovered regions. In the region restoration module, to restore a specific region, we propose a multi-scale feature fusion global information aggregation attention network that achieves global-to-local information aggregation. Besides, we design an inside-and-outside dense connection dilated network to ensure the fusion of the separated regions and the fine restoration of the image. Qualitative and quantitative evaluations compare our method with the latest existing methods. The results demonstrate that our method outperforms state-of-the-art methods by a large margin on benchmark datasets in extensive experiments.
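One plausible reading of the final fusion step in the raindrop-removal entry above, in our notation rather than the authors': with soft region masks $M_k$ from the separation stage and a restoration branch $\mathcal{R}_k$ per damage level, the output could be recomposed as $\hat{I} = \sum_k M_k \odot \mathcal{R}_k(I)$, so that each region is recovered by the branch specialized for its damage level before the fused result is refined.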