Research Experience

  • 9/2015 – Present

    Ph.D. Student

    University of Alcalá, RobeSafe Research Group, Department of Electronics

  • 9/2016 – 12/2016

    Visiting Ph.D. Student

    NICTA/CSIRO (Data61), Canberra (Australia)

  • 9/2014 – 9/2015

    Researcher

    University of Alcalá, RobeSafe Research Group, Department of Electronics

  • 10/2013 – 9/2014

    Research Assistant

    Fraunhofer IOSB, Karlsruhe (Germany)

Education

  • Ph.D., present

    Ph.D. student in Computer Vision

    University of Alcalá (UAH), Spain

  • M.Sc., July 2015

    Master in Electronics: "Master in Advanced Electronic Systems. Intelligent Systems"

    University of Alcalá (UAH), Spain

  • Erasmus year, 2013-2014

    Completed the final year and Final Project of my Telecommunications degree

    Karlsruhe Institute of Technology (KIT), Karlsruhe (Germany)

  • B.Sc. + M.Sc., Sept 2014

    5-year degree in Telecommunications Engineering (Ingeniería Superior en Telecomunicaciones)

    University of Alcalá (UAH), Spain

Honors, Awards and Grants

  • June 2017
    Best Student Paper Award (1st Prize), IV 2017
    IEEE Intelligent Vehicles Symposium (IV 2017)
  • November 2015
    Best Master Thesis on Intelligent Transportation Systems - Second Prize
    IEEE Intelligent Transportation Systems Society (ITSS), Spanish Chapter
  • July 2015
    Master Thesis with Honors
    University of Alcalá (UAH), Madrid, Spain
  • March 2015
    4-year "FPI" grant to perform my Ph.D.
    image University of Alcalá (UAH), Madrid, Spain
  • 2013-2014
    Erasmus grant to study in Germany

Related links / Colleagues

Roberto Arroyo

Ph.D. Student and colleague

Web

Luis Miguel Bergasa

Full Professor and Director of RobeSafe group

Web

RobeSafe group

Robotics and eSafety Research Group

Web

University of Alcalá (UAH)

My University

Web

Publications

Efficient ConvNet for Real-time Semantic Segmentation

E. Romera, J. M. Alvarez, L. M. Bergasa and R. Arroyo
Conference Paper: IEEE Intelligent Vehicles Symposium (IV), pp. 1789-1794, Redondo Beach (California, USA), June 2017. Best Student Paper Award.

Abstract

Semantic segmentation is a task that covers most of the perception needs of intelligent vehicles in a unified way. ConvNets excel at this task, as they can be trained end-to-end to accurately classify multiple object categories in an image at the pixel level. However, current approaches normally involve complex architectures that are expensive in terms of computational resources and are not feasible for ITS applications. In this paper, we propose a deep architecture that is able to run in real time while providing accurate semantic segmentation. The core of our ConvNet is a novel layer that uses residual connections and factorized convolutions in order to remain highly efficient while still retaining remarkable performance. Our network is able to run at 83 FPS on a single Titan X, and at more than 7 FPS on a Jetson TX1 (embedded GPU). A comprehensive set of experiments demonstrates that our system, trained from scratch on the challenging Cityscapes dataset, achieves a classification performance that is among the state of the art, while being orders of magnitude faster to compute than other architectures that achieve top precision. This makes our model an ideal approach for scene understanding in intelligent vehicle applications.
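
To illustrate the kind of layer the abstract describes, here is a minimal PyTorch sketch of a residual block with factorized 1D convolutions. The channel sizes, dilation handling and names are illustrative assumptions, not the paper's exact implementation.

    import torch
    import torch.nn as nn

    class FactorizedResidualBlock(nn.Module):
        """Residual block whose 3x3 convolutions are factorized into 3x1 + 1x3
        pairs, cutting parameters and multiply-adds while keeping the
        receptive field."""

        def __init__(self, channels, dilation=1):
            super().__init__()
            self.conv3x1_1 = nn.Conv2d(channels, channels, (3, 1), padding=(1, 0))
            self.conv1x3_1 = nn.Conv2d(channels, channels, (1, 3), padding=(0, 1))
            self.bn1 = nn.BatchNorm2d(channels)
            # The second pair can be dilated to enlarge context cheaply.
            self.conv3x1_2 = nn.Conv2d(channels, channels, (3, 1),
                                       padding=(dilation, 0), dilation=(dilation, 1))
            self.conv1x3_2 = nn.Conv2d(channels, channels, (1, 3),
                                       padding=(0, dilation), dilation=(1, dilation))
            self.bn2 = nn.BatchNorm2d(channels)
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):
            out = self.relu(self.conv3x1_1(x))
            out = self.relu(self.bn1(self.conv1x3_1(out)))
            out = self.relu(self.conv3x1_2(out))
            out = self.bn2(self.conv1x3_2(out))
            return self.relu(out + x)  # residual connection

    # Toy usage: one block over a 128-channel feature map; shape is preserved.
    block = FactorizedResidualBlock(128, dilation=2)
    y = block(torch.randn(1, 128, 64, 128))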

A Multi-Sensorial Simultaneous Localization and Mapping (SLAM) System for Low-Cost Micro Aerial Vehicles in GPS-Denied Environments

E. López, S. García, R. Barea, L. M. Bergasa, E. J. Molinos, R. Arroyo, E. Romera and S. Pardo
Journal Paper: Sensors, vol. 17, no. 4, p. 802, April 2017

Abstract

One of the main challenges of aerial robot navigation in indoor or GPS-denied environments is position estimation using only the available onboard sensors. This paper presents a Simultaneous Localization and Mapping (SLAM) system that remotely calculates the pose and environment map of different low-cost commercial aerial platforms, whose onboard computing capacity is usually limited. The proposed system adapts to the sensory configuration of the aerial robot by integrating different state-of-the-art SLAM methods based on vision, laser and/or inertial measurements using an Extended Kalman Filter (EKF). To do this, a minimum onboard sensory configuration is assumed, consisting of a monocular camera, an Inertial Measurement Unit (IMU) and an altimeter. This allows the system to improve the results of well-known monocular visual SLAM methods (LSD-SLAM and ORB-SLAM are tested and compared in this work) by resolving scale ambiguity and providing additional information to the EKF. When payload and computational capabilities permit, a 2D laser sensor can be easily incorporated into the SLAM system, obtaining a local 2.5D map and a footprint estimation of the robot position that improves the 6D pose estimation through the EKF. We present experimental results with two different commercial platforms, and validate the system by applying it to their position control.
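
As an illustration of the fusion scheme the abstract outlines, below is a minimal EKF predict/update sketch in Python in which an altimeter-style scalar measurement corrects a predicted state. The state layout, models and noise values are toy assumptions, not the paper's actual filter.

    import numpy as np

    def ekf_predict(x, P, F, Q):
        """Propagate state x and covariance P through (linearized) dynamics F."""
        return F @ x, F @ P @ F.T + Q

    def ekf_update(x, P, z, H, R):
        """Correct the prediction with measurement z (model: z = H x + noise)."""
        S = H @ P @ H.T + R                    # innovation covariance
        K = P @ H.T @ np.linalg.inv(S)         # Kalman gain
        x = x + K @ (z - H @ x)
        P = (np.eye(len(x)) - K @ H) @ P
        return x, P

    # Toy example: state = [altitude, vertical velocity]; altimeter measures altitude.
    dt = 0.05
    x, P = np.array([0.0, 0.0]), np.eye(2)
    F, Q = np.array([[1, dt], [0, 1]]), 1e-3 * np.eye(2)
    H, R = np.array([[1.0, 0.0]]), np.array([[0.1]])
    x, P = ekf_predict(x, P, F, Q)
    x, P = ekf_update(x, P, np.array([1.2]), H, R)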

Fusion and binarization of CNN features for robust topological localization across seasons

R. Arroyo, P. F. Alcantarilla, L. M. Bergasa and E. Romera
Conference Paper: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 4656-4663, Daejeon (Korea), October 2016

Abstract

The extreme variability in the appearance of a place across the four seasons of the year is one of the most challenging problems in life-long visual topological localization for mobile robotic systems and intelligent vehicles. Traditional solutions are typically based on describing images with manually hand-crafted features, which have proven not to be completely reliable against these seasonal changes. In this paper, we present a new proposal focused on robust, automatically learned features, which are processed by means of a revolutionary concept recently popularized in the computer vision community: Convolutional Neural Networks (CNNs). Deep learning commonly involves a high consumption of resources and computational cost. For this reason, we contribute our CNN-VTL architecture, adapted to the conditions of our place recognition system with the aim of optimizing efficiency while maintaining effectiveness. The final CNN features are also reduced as much as possible using compression techniques and binarized for fast matching based on the Hamming distance. A wide set of results is discussed, confirming the outstanding performance of our method against the main state-of-the-art algorithms and over varied long-term datasets recorded across seasons.
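
The binarization-and-matching step can be sketched in a few lines of Python: threshold a real-valued CNN feature vector into bits, pack them, and compare descriptors with XOR plus popcount. The feature size and thresholding rule below are illustrative assumptions.

    import numpy as np

    def binarize(features):
        """Threshold each dimension at the vector's median and pack the bits."""
        bits = (features > np.median(features)).astype(np.uint8)
        return np.packbits(bits)

    def hamming(a, b):
        """Hamming distance between two packed binary descriptors."""
        return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

    # Toy usage with random stand-ins for CNN activations of two images.
    rng = np.random.default_rng(0)
    desc_summer = binarize(rng.normal(size=4096))
    desc_winter = binarize(rng.normal(size=4096))
    print(hamming(desc_summer, desc_winter))  # small distance => likely same place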

Need Data for Driver Behaviour Analysis? Presenting the Public UAH-DriveSet

E. Romera, L.M. Bergasa and R. Arroyo
Conference Paper: IEEE International Conference on Intelligent Transportation Systems (ITSC), pp. 387-392, Rio de Janeiro (Brazil), November 2016

Abstract

Driving analysis is a recent topic of interest due to growing safety concerns in vehicles. However, the lack of publicly available driving data currently limits progress in this field. Machine learning techniques could greatly enhance research, but they rely on large amounts of data that are difficult and very costly to obtain through Naturalistic Driving Studies (NDSs), resulting in limited accessibility for the general research community. Additionally, the proliferation of smartphones has provided a cheap and easy-to-deploy platform for driver behavior sensing, but existing applications do not provide open access to their data. For these reasons, this paper presents the UAH-DriveSet, a public dataset that enables in-depth driving analysis by providing a large amount of data captured by our driving monitoring app DriveSafe. The application was run by 6 different drivers and vehicles, performing 3 different behaviors (normal, drowsy and aggressive) on two types of roads (motorway and secondary road), resulting in more than 500 minutes of naturalistic driving with its associated raw data and processed semantic information, together with the video recordings of the trips. This work also introduces a tool that plots the data and displays the trip videos simultaneously, in order to ease data analytics. The UAH-DriveSet is available at: http://www.robesafe.com/personal/eduardo.romera/uah-driveset
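
A hedged sketch of how one might load and plot a raw-data stream from a dataset of this kind is shown below; the file name and column layout are hypothetical, so consult the dataset's documentation for the actual formats.

    import pandas as pd
    import matplotlib.pyplot as plt

    # Hypothetical path and column layout for one trip's accelerometer log.
    columns = ["timestamp", "acc_x", "acc_y", "acc_z"]
    df = pd.read_csv("trip01/raw_accelerometer.txt", sep=r"\s+", names=columns)

    # Plot the three inertial axes over time for a quick visual inspection.
    df.plot(x="timestamp", y=["acc_x", "acc_y", "acc_z"],
            title="Raw accelerometer signals for one trip")
    plt.xlabel("time (s)")
    plt.show()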

OpenABLE: An Open-Source Toolbox for Application in Life-Long Visual Localization of Autonomous Vehicles

R. Arroyo, P. F. Alcantarilla, L. M. Bergasa and E. Romera
Conference Paper: IEEE International Conference on Intelligent Transportation Systems (ITSC), pp. 965-970, Rio de Janeiro (Brazil), November 2016


Adaptive Fuzzy Classifier to Detect Driving Events from the Inertial Sensors of a Smartphone

C. Arroyo, L. M. Bergasa and E. Romera
Conference Paper: IEEE International Conference on Intelligent Transportation Systems (ITSC), pp. 1896-1901, Rio de Janeiro (Brazil), November 2016

Abstract

In recent years there has been rising interest in monitoring driver behavior using smartphones, due to their increasing market penetration. Inertial sensors embedded in these devices are key to carrying out this task. Most state-of-the-art apps use fixed thresholds to detect driving events from the inertial sensors. However, sensor output values can differ depending on many parameters. In this paper we present an Adaptive Fuzzy Classifier to identify sudden driving events (acceleration, steering, braking) and road bumps from the inertial and GPS sensors. An on-line calibration method is proposed to adjust the decision thresholds of the Membership Functions (MFs) to the specific phone pose and vehicle dynamics. To validate our method, we use the UAH-DriveSet database, which includes more than 500 minutes of naturalistic driving, and we compare results with our previous DriveSafe app version, based on fixed thresholds. Results show a notable improvement in event detection over our previous version.
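
The adaptive-threshold idea can be sketched as a piecewise-linear membership function whose decision thresholds are re-estimated online; the functional form, constants and update rule below are illustrative assumptions, not the paper's calibration method.

    import numpy as np

    def braking_membership(acc_long, t_soft, t_hard):
        """Degree in [0, 1] to which a longitudinal deceleration is 'sudden'."""
        decel = max(-acc_long, 0.0)        # braking = negative longitudinal accel.
        return float(np.clip((decel - t_soft) / (t_hard - t_soft), 0.0, 1.0))

    class OnlineCalibrator:
        """Adapts decision thresholds to the observed phone pose / vehicle dynamics."""

        def __init__(self, t_soft=1.5, t_hard=3.0, rate=0.01):
            self.t_soft, self.t_hard, self.rate = t_soft, t_hard, rate

        def update(self, acc_long):
            decel = max(-acc_long, 0.0)
            # Slowly track the typical braking magnitude; thresholds scale with it.
            self.t_soft += self.rate * (1.5 * decel - self.t_soft)
            self.t_hard = 2.0 * self.t_soft

    calib = OnlineCalibrator()
    for a in (-0.4, -1.2, -3.5, -0.8):     # toy accelerometer samples (m/s^2)
        calib.update(a)
        print(braking_membership(a, calib.t_soft, calib.t_hard))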

Can we unify monocular detectors for autonomous driving by using the pixel-wise semantic segmentation of CNNs?

E. Romera, L.M. Bergasa and R. Arroyo
Workshop Paper: IEEE Intelligent Vehicles Symposium (IV), Gothenburg (Sweden), June 2016. Workshop: "DeepDriving: Learning Representations for Intelligent Vehicles"

Abstract

Autonomous driving is a challenging topic that requires complex solutions in perception tasks such as recognition of roads, lanes, traffic signs or lights, vehicles and pedestrians. Through years of research, computer vision has grown capable of tackling these tasks with monocular detectors that can provide remarkable detection rates with relatively low processing times. However, the recent appearance of Convolutional Neural Networks (CNNs) has revolutionized the computer vision field and has made it possible to perform full pixel-wise semantic segmentation at near real-time rates (even on hardware that can be carried on a vehicle). In this paper, we propose full image segmentation as an approach to simplify and unify most of the detection tasks required in the perception module of an autonomous vehicle, analyzing major concerns such as computation time and detection performance.
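
The unification argument can be made concrete with a short sketch: once a network outputs a per-pixel class map, each detector reduces to a mask lookup plus connected-component analysis. The class ids and minimum-size filter below are illustrative assumptions.

    import numpy as np
    from scipy import ndimage

    CLASSES = {"road": 0, "pedestrian": 11, "vehicle": 13}   # assumed label ids

    def detections_from_segmentation(class_map, class_name, min_pixels=50):
        """Return bounding boxes (x0, y0, x1, y1) of blobs of one semantic class."""
        mask = class_map == CLASSES[class_name]
        labeled, n_blobs = ndimage.label(mask)
        boxes = []
        for i, obj in enumerate(ndimage.find_objects(labeled)):
            if (labeled[obj] == i + 1).sum() >= min_pixels:  # drop tiny blobs
                ys, xs = obj
                boxes.append((xs.start, ys.start, xs.stop, ys.stop))
        return boxes

    # Toy usage: random labels standing in for a network's per-pixel argmax output.
    seg = np.random.default_rng(0).integers(0, 20, size=(512, 1024))
    print(len(detections_from_segmentation(seg, "vehicle")))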

A Real-time Multi-scale Vehicle Detection and Tracking Approach for Smartphones

E. Romera, L.M. Bergasa and R. Arroyo
Conference Paper: IEEE International Conference on Intelligent Transportation Systems (ITSC), pp. 1298-1303, Las Palmas, Canary Islands (Spain), September 2015

Abstract

Automated vehicle detection is a research field in constant evolution due to new technological advances and the safety requirements demanded by current intelligent transportation systems. For these reasons, in this paper we present a vision-based vehicle detection and tracking pipeline that is able to run on an iPhone in real time. An approach based on smartphone cameras offers a versatile solution and an alternative to other expensive and complex sensors on the vehicle, such as LiDAR or other range-based methods. A multi-scale proposal and simple road-geometry considerations based on the vanishing point are combined to overcome the computational constraints. Our algorithm is tested on a publicly available road dataset, demonstrating its real applicability to ADAS or autonomous driving.
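
The vanishing-point prior can be sketched as follows: under a flat-road pinhole model, the expected image height of a vehicle grows linearly with the distance of its base row below the vanishing point, so each row needs only a narrow band of window scales. The camera constants below are illustrative assumptions.

    def expected_vehicle_height_px(base_row, vp_row, cam_height_m=1.2,
                                   vehicle_height_m=1.5):
        """Approximate image height (px) of a vehicle whose base sits at base_row.

        Flat-road pinhole model: a ground point at distance Z projects to row
        vp_row + f * H / Z, so the image height f * h / Z equals
        (h / H) * (base_row - vp_row), with no focal length needed."""
        if base_row <= vp_row:
            return 0.0                  # above the horizon: no road vehicles
        return (vehicle_height_m / cam_height_m) * (base_row - vp_row)

    # Toy usage: pick window scales per row for a vanishing point at row 240.
    for row in (260, 320, 440):
        h = expected_vehicle_height_px(row, vp_row=240)
        print(f"row {row}: search windows of ~{h:.0f} px height")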

Towards Life-Long Visual Localization using an Efficient Matching of Binary Sequences from Images

R. Arroyo, P. F. Alcantarilla, L. M. Bergasa and E. Romera
Conference Paper: IEEE International Conference on Robotics and Automation (ICRA), pp. 6328-6335, Seattle, Washington (United States), May 2015

Abstract

Life-long visual localization has been one of the most challenging topics in robotics over the last few years. The difficulty of this task lies in the strong appearance changes that a place suffers due to dynamic elements, illumination, weather or seasons. In this paper, we propose a novel method (ABLE-M) to cope with the main problems of carrying out robust visual topological localization over time. The novelty of our approach resides in describing sequences of monocular images as binary codes, which are extracted from a global LDB descriptor and efficiently matched using FLANN for fast nearest-neighbor search. In addition, an illumination-invariant technique is applied. The proposed binary description and matching method provides a reduction in memory and computational costs, which is necessary for long-term performance. Our proposal is evaluated in different life-long navigation scenarios, where ABLE-M outperforms some of the main state-of-the-art algorithms, such as WI-SURF, BRIEF-Gist, FAB-MAP or SeqSLAM. Tests are presented on four public datasets where the same route is traversed at different times of day or night, across months, or across all four seasons.
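
The sequence-matching idea can be sketched by concatenating per-image binary descriptors into one sequence code and comparing codes by Hamming distance; the random descriptors below stand in for LDB, and a real implementation would use an approximate nearest-neighbor index such as FLANN rather than the brute-force comparison shown.

    import numpy as np

    def sequence_code(packed_descriptors, d_length=8):
        """Concatenate the last d_length per-image binary descriptors into one code."""
        return np.concatenate(packed_descriptors[-d_length:])

    def hamming(a, b):
        """Hamming distance between two packed binary codes (XOR + popcount)."""
        return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

    # Toy usage: random 256-bit descriptors standing in for per-image LDB codes.
    rng = np.random.default_rng(0)
    route_a = [np.packbits(rng.integers(0, 2, 256).astype(np.uint8)) for _ in range(8)]
    route_b = [np.packbits(rng.integers(0, 2, 256).astype(np.uint8)) for _ in range(8)]
    print(hamming(sequence_code(route_a), sequence_code(route_b)))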
