Research Experience

  • 9/2015 – Present

    Ph.D. Student

    University of Alcalá, RobeSafe Research Group, Department of Electronics

  • 9/2016 – 12/2016

    Visiting Ph.D. Student

    NICTA/CSIRO (Data61), Canberra (Australia)

  • 9/2014 – 9/2015

    Researcher

    University of Alcalá, RobeSafe Research Group, Department of Electronics

  • 10/2013 – 9/2014

    Research Assistant

    Fraunhofer IOSB, Karlsruhe (Germany)

Education

  • Ph.D., present

    Ph.D. student in Computer Vision

    University of Alcalá (UAH), Spain

  • M.Sc., July 2015

    Master in Electronics: "Master in Advanced Electronic Systems. Intelligent Systems"

    University of Alcalá (UAH), Spain

  • Erasmus year, 2013-2014

    Completed the final year and Final Project of my Telecommunications degree

    Karlsruhe Institute of Technology (KIT), Karlsruhe (Germany)

  • B.Sc. + M.Sc., Sept 2014

    5-year degree in Telecommunications Engineering (Ingeniería Superior en Telecomunicaciones)

    University of Alcalá (UAH), Spain

Honors, Awards and Grants

  • June 2017
    Best Student Paper Award (1st Prize), IV 2017
    IEEE Intelligent Vehicles Symposium (IV 2017)
  • November 2015
    Best Master Thesis on Intelligent Transportation Systems - Second Prize
    IEEE Intelligent Transportation Systems Society (ITSS), Spanish Chapter
  • July 2015
    Master Thesis with Honors
    University of Alcalá (UAH), Madrid, Spain
  • March 2015
    4-year "FPI" grant to perform my Ph.D.
    image University of Alcalá (UAH), Madrid, Spain
  • 2013-2014
    Erasmus grant to study in Germany

Related links / Colleagues

Roberto Arroyo

Ph.D. Student and colleague

Web

Luis Miguel Bergasa

Full Professor and Director of RobeSafe group

Web

RobeSafe group

Robotics and eSafety Research Group

Web

University of Alcalá (UAH)

My University

Web

Publications

Efficient ConvNet for Real-time Semantic Segmentation

E. Romera, J. M. Alvarez, L. M. Bergasa and R. Arroyo
Conference Paper: IEEE Intelligent Vehicles Symposium (IV), pp. 1789-1794, Redondo Beach (California, USA), June 2017. Best Student Paper Award.

Abstract

Semantic segmentation is a task that covers most of the perception needs of intelligent vehicles in a unified way. ConvNets excel at this task, as they can be trained end-to-end to accurately classify multiple object categories in an image at the pixel level. However, current approaches normally involve complex architectures that are expensive in terms of computational resources and are not feasible for ITS applications. In this paper, we propose a deep architecture that is able to run in real time while providing accurate semantic segmentation. The core of our ConvNet is a novel layer that uses residual connections and factorized convolutions in order to remain highly efficient while still retaining remarkable performance. Our network is able to run at 83 FPS on a single Titan X, and at more than 7 FPS on a Jetson TX1 (embedded GPU). A comprehensive set of experiments demonstrates that our system, trained from scratch on the challenging Cityscapes dataset, achieves a classification performance that is among the state of the art, while being orders of magnitude faster to compute than other architectures that achieve top precision. This makes our model an ideal approach for scene understanding in intelligent vehicle applications.
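
To illustrate the kind of layer the abstract describes, here is a minimal PyTorch sketch of a residual block with factorized 1D convolutions. The channel sizes, dilation handling and names are illustrative assumptions, not the paper's exact implementation.

    import torch
    import torch.nn as nn

    class FactorizedResidualBlock(nn.Module):
        """Residual block whose 3x3 convolutions are factorized into 3x1 + 1x3
        pairs, cutting parameters and multiply-adds while keeping the
        receptive field."""

        def __init__(self, channels, dilation=1):
            super().__init__()
            self.conv3x1_1 = nn.Conv2d(channels, channels, (3, 1), padding=(1, 0))
            self.conv1x3_1 = nn.Conv2d(channels, channels, (1, 3), padding=(0, 1))
            self.bn1 = nn.BatchNorm2d(channels)
            # The second pair can be dilated to enlarge context cheaply.
            self.conv3x1_2 = nn.Conv2d(channels, channels, (3, 1),
                                       padding=(dilation, 0), dilation=(dilation, 1))
            self.conv1x3_2 = nn.Conv2d(channels, channels, (1, 3),
                                       padding=(0, dilation), dilation=(1, dilation))
            self.bn2 = nn.BatchNorm2d(channels)
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):
            out = self.relu(self.conv3x1_1(x))
            out = self.relu(self.bn1(self.conv1x3_1(out)))
            out = self.relu(self.conv3x1_2(out))
            out = self.bn2(self.conv1x3_2(out))
            return self.relu(out + x)  # residual connection

    # Toy usage: one block over a 128-channel feature map; shape is preserved.
    block = FactorizedResidualBlock(128, dilation=2)
    y = block(torch.randn(1, 128, 64, 128))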

A Multi-Sensorial Simultaneous Localization and Mapping (SLAM) System for Low-Cost Micro Aerial Vehicles in GPS-Denied Environments

E. López, S. García, R. Barea, L. M. Bergasa, E. J. Molinos, R. Arroyo, E. Romera and S. Pardo
Journal Paper: Sensors, vol. 17, no. 4, p. 802, April 2017

Abstract

One of the main challenges of aerial robot navigation in indoor or GPS-denied environments is position estimation using only the available onboard sensors. This paper presents a Simultaneous Localization and Mapping (SLAM) system that remotely calculates the pose and environment map of different low-cost commercial aerial platforms, whose onboard computing capacity is usually limited. The proposed system adapts to the sensory configuration of the aerial robot by integrating different state-of-the-art SLAM methods based on vision, laser and/or inertial measurements using an Extended Kalman Filter (EKF). To do this, a minimum onboard sensory configuration is assumed, consisting of a monocular camera, an Inertial Measurement Unit (IMU) and an altimeter. This allows the system to improve the results of well-known monocular visual SLAM methods (LSD-SLAM and ORB-SLAM are tested and compared in this work) by resolving scale ambiguity and providing additional information to the EKF. When payload and computational capabilities permit, a 2D laser sensor can be easily incorporated into the SLAM system, obtaining a local 2.5D map and a footprint estimation of the robot position that improves the 6D pose estimation through the EKF. We present experimental results with two different commercial platforms, and validate the system by applying it to their position control.
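
As an illustration of the fusion scheme the abstract outlines, below is a minimal EKF predict/update sketch in Python in which an altimeter-style scalar measurement corrects a predicted state. The state layout, models and noise values are toy assumptions, not the paper's actual filter.

    import numpy as np

    def ekf_predict(x, P, F, Q):
        """Propagate state x and covariance P through (linearized) dynamics F."""
        return F @ x, F @ P @ F.T + Q

    def ekf_update(x, P, z, H, R):
        """Correct the prediction with measurement z (model: z = H x + noise)."""
        S = H @ P @ H.T + R                    # innovation covariance
        K = P @ H.T @ np.linalg.inv(S)         # Kalman gain
        x = x + K @ (z - H @ x)
        P = (np.eye(len(x)) - K @ H) @ P
        return x, P

    # Toy example: state = [altitude, vertical velocity]; altimeter measures altitude.
    dt = 0.05
    x, P = np.array([0.0, 0.0]), np.eye(2)
    F, Q = np.array([[1, dt], [0, 1]]), 1e-3 * np.eye(2)
    H, R = np.array([[1.0, 0.0]]), np.array([[0.1]])
    x, P = ekf_predict(x, P, F, Q)
    x, P = ekf_update(x, P, np.array([1.2]), H, R)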

Fusion and binarization of CNN features for robust topological localization across seasons

R. Arroyo, P. F. Alcantarilla, L. M. Bergasa and E. Romera
Conference Paper: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 4656-4663, Daejeon (Korea), October 2016

Abstract

The extreme variability in the appearance of a place across the four seasons of the year is one of the most challenging problems in life-long visual topological localization for mobile robotic systems and intelligent vehicles. Traditional solutions are typically based on describing images with manually hand-crafted features, which have proven not to be completely reliable against these seasonal changes. In this paper, we present a new proposal focused on robust, automatically learned features, which are processed by means of a revolutionary concept recently popularized in the computer vision community: Convolutional Neural Networks (CNNs). Deep learning commonly involves a high consumption of resources and computational cost. For this reason, we contribute our CNN-VTL architecture, adapted to the conditions of our place recognition system with the aim of optimizing efficiency while maintaining effectiveness. The final CNN features are also reduced as much as possible using compression techniques and binarized for fast matching based on the Hamming distance. A wide set of results is discussed, confirming the outstanding performance of our method against the main state-of-the-art algorithms and over varied long-term datasets recorded across seasons.
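
The binarization-and-matching step can be sketched in a few lines of Python: threshold a real-valued CNN feature vector into bits, pack them, and compare descriptors with XOR plus popcount. The feature size and thresholding rule below are illustrative assumptions.

    import numpy as np

    def binarize(features):
        """Threshold each dimension at the vector's median and pack the bits."""
        bits = (features > np.median(features)).astype(np.uint8)
        return np.packbits(bits)

    def hamming(a, b):
        """Hamming distance between two packed binary descriptors."""
        return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

    # Toy usage with random stand-ins for CNN activations of two images.
    rng = np.random.default_rng(0)
    desc_summer = binarize(rng.normal(size=4096))
    desc_winter = binarize(rng.normal(size=4096))
    print(hamming(desc_summer, desc_winter))  # small distance => likely same place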

Need Data for Driver Behaviour Analysis? Presenting the Public UAH-DriveSet

E. Romera, L.M. Bergasa and R. Arroyo
Conference Paper: IEEE International Conference on Intelligent Transportation Systems (ITSC), pp. 387-392, Rio de Janeiro (Brazil), November 2016

Abstract

Driving analysis is a recent topic of interest due to growing safety concerns in vehicles. However, the lack of publicly available driving data currently limits progress in this field. Machine learning techniques could greatly enhance research, but they rely on large amounts of data that are difficult and very costly to obtain through Naturalistic Driving Studies (NDSs), resulting in limited accessibility for the general research community. Additionally, the proliferation of smartphones has provided a cheap and easy-to-deploy platform for driver behavior sensing, but existing applications do not provide open access to their data. For these reasons, this paper presents the UAH-DriveSet, a public dataset that enables in-depth driving analysis by providing a large amount of data captured by our driving monitoring app DriveSafe. The application was run by 6 different drivers and vehicles, performing 3 different behaviors (normal, drowsy and aggressive) on two types of roads (motorway and secondary road), resulting in more than 500 minutes of naturalistic driving with its associated raw data and processed semantic information, together with the video recordings of the trips. This work also introduces a tool that plots the data and displays the trip videos simultaneously, in order to ease data analytics. The UAH-DriveSet is available at: http://www.robesafe.com/personal/eduardo.romera/uah-driveset
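
A hedged sketch of how one might load and plot a raw-data stream from a dataset of this kind is shown below; the file name and column layout are hypothetical, so consult the dataset's documentation for the actual formats.

    import pandas as pd
    import matplotlib.pyplot as plt

    # Hypothetical path and column layout for one trip's accelerometer log.
    columns = ["timestamp", "acc_x", "acc_y", "acc_z"]
    df = pd.read_csv("trip01/raw_accelerometer.txt", sep=r"\s+", names=columns)

    # Plot the three inertial axes over time for a quick visual inspection.
    df.plot(x="timestamp", y=["acc_x", "acc_y", "acc_z"],
            title="Raw accelerometer signals for one trip")
    plt.xlabel("time (s)")
    plt.show()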

OpenABLE: An Open-Source Toolbox for Application in Life-Long Visual Localization of Autonomous Vehicles

R. Arroyo, P. F. Alcantarilla, L. M. Bergasa and E. Romera
Conference Paper: IEEE International Conference on Intelligent Transportation Systems (ITSC), pp. 965-970, Rio de Janeiro (Brazil), November 2016


Adaptive Fuzzy Classifier to Detect Driving Events from the Inertial Sensors of a Smartphone

C. Arroyo, L. M. Bergasa and E. Romera
Conference Paper: IEEE International Conference on Intelligent Transportation Systems (ITSC), pp. 1896-1901, Rio de Janeiro (Brazil), November 2016

Abstract

In recent years there has been rising interest in monitoring driver behavior using smartphones, due to their increasing market penetration. Inertial sensors embedded in these devices are key to carrying out this task. Most state-of-the-art apps use fixed thresholds to detect driving events from the inertial sensors. However, sensor output values can differ depending on many parameters. In this paper we present an Adaptive Fuzzy Classifier to identify sudden driving events (acceleration, steering, braking) and road bumps from the inertial and GPS sensors. An on-line calibration method is proposed to adjust the decision thresholds of the Membership Functions (MFs) to the specific phone pose and vehicle dynamics. To validate our method, we use the UAH-DriveSet database, which includes more than 500 minutes of naturalistic driving, and we compare results with our previous DriveSafe app version, based on fixed thresholds. Results show a notable improvement in event detection over our previous version.
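
The adaptive-threshold idea can be sketched as a piecewise-linear membership function whose decision thresholds are re-estimated online; the functional form, constants and update rule below are illustrative assumptions, not the paper's calibration method.

    import numpy as np

    def braking_membership(acc_long, t_soft, t_hard):
        """Degree in [0, 1] to which a longitudinal deceleration is 'sudden'."""
        decel = max(-acc_long, 0.0)        # braking = negative longitudinal accel.
        return float(np.clip((decel - t_soft) / (t_hard - t_soft), 0.0, 1.0))

    class OnlineCalibrator:
        """Adapts decision thresholds to the observed phone pose / vehicle dynamics."""

        def __init__(self, t_soft=1.5, t_hard=3.0, rate=0.01):
            self.t_soft, self.t_hard, self.rate = t_soft, t_hard, rate

        def update(self, acc_long):
            decel = max(-acc_long, 0.0)
            # Slowly track the typical braking magnitude; thresholds scale with it.
            self.t_soft += self.rate * (1.5 * decel - self.t_soft)
            self.t_hard = 2.0 * self.t_soft

    calib = OnlineCalibrator()
    for a in (-0.4, -1.2, -3.5, -0.8):     # toy accelerometer samples (m/s^2)
        calib.update(a)
        print(braking_membership(a, calib.t_soft, calib.t_hard))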

Can we unify monocular detectors for autonomous driving by using the pixel-wise semantic segmentation of CNNs?

E. Romera, L.M. Bergasa and R. Arroyo
Workshop Paper: IEEE Intelligent Vehicles Symposium (IV), Gothenburg (Sweden), June 2016. Workshop: "DeepDriving: Learning Representations for Intelligent Vehicles"

Abstract

Autonomous driving is a challenging topic that requires complex solutions in perception tasks such as recognition of roads, lanes, traffic signs or lights, vehicles and pedestrians. Through years of research, computer vision has grown capable of tackling these tasks with monocular detectors that can provide remarkable detection rates with relatively low processing times. However, the recent appearance of Convolutional Neural Networks (CNNs) has revolutionized the computer vision field and has made it possible to perform full pixel-wise semantic segmentation at near real-time rates (even on hardware that can be carried on a vehicle). In this paper, we propose full image segmentation as an approach to simplify and unify most of the detection tasks required in the perception module of an autonomous vehicle, analyzing major concerns such as computation time and detection performance.
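
The unification argument can be made concrete with a short sketch: once a network outputs a per-pixel class map, each detector reduces to a mask lookup plus connected-component analysis. The class ids and minimum-size filter below are illustrative assumptions.

    import numpy as np
    from scipy import ndimage

    CLASSES = {"road": 0, "pedestrian": 11, "vehicle": 13}   # assumed label ids

    def detections_from_segmentation(class_map, class_name, min_pixels=50):
        """Return bounding boxes (x0, y0, x1, y1) of blobs of one semantic class."""
        mask = class_map == CLASSES[class_name]
        labeled, n_blobs = ndimage.label(mask)
        boxes = []
        for i, obj in enumerate(ndimage.find_objects(labeled)):
            if (labeled[obj] == i + 1).sum() >= min_pixels:  # drop tiny blobs
                ys, xs = obj
                boxes.append((xs.start, ys.start, xs.stop, ys.stop))
        return boxes

    # Toy usage: random labels standing in for a network's per-pixel argmax output.
    seg = np.random.default_rng(0).integers(0, 20, size=(512, 1024))
    print(len(detections_from_segmentation(seg, "vehicle")))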

A Real-time Multi-scale Vehicle Detection and Tracking Approach for Smartphones

E. Romera, L.M. Bergasa and R. Arroyo
Conference Paper: IEEE International Conference on Intelligent Transportation Systems (ITSC), pp. 1298-1303, Las Palmas, Canary Islands (Spain), September 2015

Abstract

Automated vehicle detection is a research field in constant evolution due to new technological advances and the safety requirements demanded by current intelligent transportation systems. For these reasons, in this paper we present a vision-based vehicle detection and tracking pipeline that is able to run on an iPhone in real time. An approach based on smartphone cameras offers a versatile solution and an alternative to other expensive and complex sensors on the vehicle, such as LiDAR or other range-based methods. A multi-scale proposal and simple road-geometry considerations based on the vanishing point are combined to overcome the computational constraints. Our algorithm is tested on a publicly available road dataset, demonstrating its real applicability to ADAS or autonomous driving.
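
The vanishing-point prior can be sketched as follows: under a flat-road pinhole model, the expected image height of a vehicle grows linearly with the distance of its base row below the vanishing point, so each row needs only a narrow band of window scales. The camera constants below are illustrative assumptions.

    def expected_vehicle_height_px(base_row, vp_row, cam_height_m=1.2,
                                   vehicle_height_m=1.5):
        """Approximate image height (px) of a vehicle whose base sits at base_row.

        Flat-road pinhole model: a ground point at distance Z projects to row
        vp_row + f * H / Z, so the image height f * h / Z equals
        (h / H) * (base_row - vp_row), with no focal length needed."""
        if base_row <= vp_row:
            return 0.0                  # above the horizon: no road vehicles
        return (vehicle_height_m / cam_height_m) * (base_row - vp_row)

    # Toy usage: pick window scales per row for a vanishing point at row 240.
    for row in (260, 320, 440):
        h = expected_vehicle_height_px(row, vp_row=240)
        print(f"row {row}: search windows of ~{h:.0f} px height")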

Towards Life-Long Visual Localization using an Efficient Matching of Binary Sequences from Images

R. Arroyo, P. F. Alcantarilla, L. M. Bergasa and E. Romera
Conference Paper: IEEE International Conference on Robotics and Automation (ICRA), pp. 6328-6335, Seattle, Washington (United States), May 2015

Abstract

Life-long visual localization has been one of the most challenging topics in robotics over the last few years. The difficulty of this task lies in the strong appearance changes that a place suffers due to dynamic elements, illumination, weather or seasons. In this paper, we propose a novel method (ABLE-M) to cope with the main problems of carrying out robust visual topological localization over time. The novelty of our approach resides in describing sequences of monocular images as binary codes, which are extracted from a global LDB descriptor and efficiently matched using FLANN for fast nearest-neighbor search. In addition, an illumination-invariant technique is applied. The proposed binary description and matching method provides a reduction in memory and computational costs, which is necessary for long-term performance. Our proposal is evaluated in different life-long navigation scenarios, where ABLE-M outperforms some of the main state-of-the-art algorithms, such as WI-SURF, BRIEF-Gist, FAB-MAP or SeqSLAM. Tests are presented on four public datasets where the same route is traversed at different times of day or night, across months, or across all four seasons.
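
The sequence-matching idea can be sketched by concatenating per-image binary descriptors into one sequence code and comparing codes by Hamming distance; the random descriptors below stand in for LDB, and a real implementation would use an approximate nearest-neighbor index such as FLANN rather than the brute-force comparison shown.

    import numpy as np

    def sequence_code(packed_descriptors, d_length=8):
        """Concatenate the last d_length per-image binary descriptors into one code."""
        return np.concatenate(packed_descriptors[-d_length:])

    def hamming(a, b):
        """Hamming distance between two packed binary codes (XOR + popcount)."""
        return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

    # Toy usage: random 256-bit descriptors standing in for per-image LDB codes.
    rng = np.random.default_rng(0)
    route_a = [np.packbits(rng.integers(0, 2, 256).astype(np.uint8)) for _ in range(8)]
    route_b = [np.packbits(rng.integers(0, 2, 256).astype(np.uint8)) for _ in range(8)]
    print(hamming(sequence_code(route_a), sequence_code(route_b)))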
