In complex real-world tasks such as robotic manipulation and autonomous driving, collecting expert demonstrations is often more straightforward than specifying precise learning objectives and task descriptions. Learning from expert data can be achieved through behavioral cloning or by learning a reward function, i.e., inverse reinforcement learning. The latter allows for training with additional data outside the training distribution, guided by the inferred reward function. We propose a novel approach to construct compact and transparent reward models from automatically selected state features. These inferred rewards have an explicit form and enable the learning of policies that closely match expert behavior by training standard reinforcement learning algorithms from scratch. We validate our method’s performance in various robotic environments with continuous and high-dimensional state spaces.
Pre-defined manipulation primitives are widely used for cloth manipulation. However, cloth properties such as stiffness or density can strongly affect the performance of these primitives. Although existing solutions have tackled the parameterisation of pick and place locations, the effect of factors such as the velocity or trajectory of quasi-static and dynamic manipulation primitives has been neglected. Choosing appropriate values for these parameters is crucial to cope with the range of materials present in household cloth objects. To address this challenge, we introduce the Quasi-Dynamic Parameterisable (QDP) method, which optimises parameters such as the motion velocity in addition to the pick and place positions of quasi-static and dynamic manipulation primitives. In this work, we leverage the framework of Sequential Reinforcement Learning to sequentially decouple the parameters that compose the primitives. To evaluate the effectiveness of the method, we focus on the task of cloth unfolding with a robotic arm in simulation and real-world experiments. Our results in simulation show that selecting optimal parameters for the primitives can improve performance by 20% compared to sub-optimal ones. Real-world results demonstrate the advantage of modifying the velocity and height of manipulation primitives for cloths with different mass, stiffness, shape and size.
Safe operation of systems such as robots requires them to plan and execute trajectories subject to safety constraints. When those systems are subject to uncertainties in their dynamics, it is challenging to ensure that the constraints are not violated. In this letter, we propose Safe-CDDP, a safe trajectory optimization and control approach for systems under additive uncertainties and nonlinear safety constraints based on constrained differential dynamic programming (DDP). The safety of the robot during its motion is formulated as chance constraints with user-chosen probabilities of constraint satisfaction. The chance constraints are transformed into deterministic ones in DDP formulation by constraint tightening. To avoid over-conservatism during constraint tightening, linear control gains of the feedback policy derived from the constrained DDP are used in the approximation of closed-loop uncertainty propagation in prediction. The proposed algorithm is empirically evaluated on three different robot dynamics with up to 12 degrees of freedom in simulation. The computational feasibility and applicability of the approach are demonstrated with a physical hardware implementation.
@article{alcan2022differential, title={Differential dynamic programming with nonlinear safety constraints under system uncertainties}, author={Alcan, Gokhan and Kyrki, Ville}, journal={IEEE Robotics and Automation Letters}, volume={7}, number={2}, pages={1760--1767}, year={2022}, publisher={IEEE} }
Robotic manipulation of cloth is a challenging task due to the high dimensionality of the configuration space and the complexity of dynamics affected by various material properties. The effect of the complex dynamics is even more pronounced in dynamic folding, for example, when a square piece of fabric is folded in two by a single manipulator. To account for the complexity and uncertainties, feedback of the cloth state using e.g. vision is typically needed. However, construction of visual feedback policies for dynamic cloth folding is an open problem. In this paper, we present a solution that learns policies in simulation using Reinforcement Learning (RL) and transfers the learned policies directly to the real world. In addition, to learn a single policy that manipulates multiple materials, we randomize the material properties in simulation. We evaluate the contributions of visual feedback and material randomization in real-world experiments. The experimental results demonstrate that the proposed solution can successfully fold different fabric types using dynamic manipulation in the real world.
@inproceedings{hietala2022learning, title={Learning visual feedback control for dynamic cloth folding}, author={Hietala, Julius and Blanco-Mulero, David and Alcan, Gokhan and Kyrki, Ville}, booktitle={2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)}, pages={1455--1462}, year={2022}, organization={IEEE} }
Magnetic manipulation of particles in close vicinity is a challenging task. In this paper, we propose simultaneous and independent manipulation of two identical particles in close vicinity using two mobile robotic electromagnetic needles. We developed a neural network that can predict the magnetic flux density gradient for any given needle positions. Using the neural network, we developed a control algorithm to solve for the optimal needle positions that generate the forces in the required directions while keeping a safe distance between the two needles and particles. We applied our method in five typical cases of simultaneous and independent microparticle manipulation, with the closest particle separation of 30 µm.
@inproceedings{isitman2022simultaneous, title={Simultaneous and Independent Micromanipulation of Two Identical Particles with Robotic Electromagnetic Needles}, author={Isitman, Ogulcan and Kandemir, Hakan and Alcan, Gokhan and Cenev, Zoran and Zhou, Quan}, booktitle={2022 International Conference on Manipulation, Automation and Robotics at Small Scales (MARSS)}, pages={1--6}, year={2022}, organization={IEEE} }
Overtaking is one of the most challenging tasks in driving, and the current solutions to autonomous overtaking are limited to simple and static scenarios. In this paper, we present a method for behaviour and trajectory planning for safe autonomous overtaking. The proposed method optimizes the trajectory by simultaneously enforcing safety and minimizing intrusion onto the adjacent lane. Furthermore, the method allows the overtaking to be aborted, enabling the autonomous vehicle to merge back into its lane if safety is compromised, e.g., because of traffic in the opposing direction appearing during the maneuver execution. A finite state machine is used to select an appropriate maneuver at each time, and a combination of safe and reachable sets is used to iteratively generate intermediate reference targets based on the current maneuver. A nonlinear model predictive controller then plans dynamically feasible and collision-free trajectories to these intermediate reference targets. Simulation experiments demonstrate that the combination of intermediate reference generation and model predictive control is able to handle multiple behaviors, including following a lead vehicle, overtaking and aborting the overtake, within a single framework.
@inproceedings{palatti2021planning, title={Planning for safe abortable overtaking maneuvers in autonomous driving}, author={Palatti, Jiyo and Aksjonov, Andrei and Alcan, Gokhan and Kyrki, Ville}, booktitle={2021 IEEE International Intelligent Transportation Systems Conference (ITSC)}, pages={508--514}, year={2021}, organization={IEEE} }
In this paper, we present a real-time driver evaluation system for heavy-duty vehicles by focusing on the classification of risky acceleration and braking behaviors. We utilize an improved version of our previous Long Short Term Memory (LSTM) based acceleration behavior model [10] to evaluate varying acceleration behaviors of a truck driver in small time periods. This model continuously classifies a driver as one of six driver classes with specified longitudinal-lateral aggression levels, using driving signals as time-series inputs. The driver gets acceleration score updates based on assigned classes and the geometry of driven road sections. To evaluate the braking behaviors of a truck driver, we propose a braking behavior model, which uses a novel approach to analyze deceleration patterns formed during brake operations. The braking score of a driver is updated for each brake event based on the pattern, magnitude, and frequency evaluations. The proposed driver evaluation system has achieved significant results in both the classification and evaluation of acceleration and braking behaviors.
@inproceedings{mumcuoglu2020driver, title={Driver evaluation in heavy duty vehicles based on acceleration and braking behaviors}, author={Mumcuoglu, Mehmet Emin and Alcan, Gokhan and Unel, Mustafa and Cicek, Onur and Mutluergil, Mehmet and Yilmaz, Metin and Koprubasi, Kerem}, booktitle={IECON 2020 The 46th Annual Conference of the IEEE Industrial Electronics Society}, pages={447--452}, year={2020}, organization={IEEE} }
In this paper, a new data-driven model of diesel engine soot emission formation using gated recurrent unit (GRU) networks is proposed. Unlike traditional time series prediction methods such as the nonlinear autoregressive with exogenous input (NARX) approach, the GRU structure does not require the determination of the pure time delay between the inputs and the output, and the number of regressors does not have to be chosen beforehand. Gates in a GRU network enable it to capture such dependencies on the past input values without any prior knowledge. As a design of experiment, 30 different points in the engine speed – injected fuel quantity plane are determined and the rest of the input channels, i.e., rail pressure, main start of injection, equivalence ratio, and intake oxygen concentration, are excited with chirp signals in the intended regions of operation. Experimental results show that the prediction performances of GRU based soot models are quite satisfactory, with 77% training and 57% validation fit accuracies and normalized root mean square error (NRMSE) values below 0.038 and 0.069, respectively. GRU soot models surpass the traditional NARX based soot models in both steady-state and transient cycles.
@article{alcan2019estimating, title={Estimating Soot Emission in Diesel Engines Using Gated Recurrent Unit Network}, author={Alcan, Gokhan and Yilmaz, Emre and Unel, Mustafa and Aran, Volkan and Yilmaz, Metin and Gurel, Cetin and Koprubasi, Kerem}, journal={IFAC-PapersOnLine}, volume={52}, number={5}, pages={544--549}, year={2019}, publisher={Elsevier} }
Researchers in the automotive industry aim to enhance the performance, safety and energy management of intelligent vehicles with driver assistance systems. The performance of such systems can be improved with a better understanding of driving behaviors. In this paper, a driving behavior recognition algorithm is developed with a Long Short Term Memory (LSTM) Network using driver models of IPG’s TruckMaker. Six driver models are designed based on longitudinal and lateral acceleration limits. The proposed algorithm is trained with driving signals of those drivers controlling a realistic truck model with five different trailer loads on an artificial training road. This training road is designed to cover possible road curves that can be seen in freeways and rural highways. Finally, the algorithm is tested with driving signals that are collected with the same method on a realistic road. Results show that the LSTM structure has a substantial capability to recognize dynamic relations between driving signals even in small time periods.
@inproceedings{mumcuoglu2019driving, title={Driving Behavior Classification Using Long Short Term Memory Networks}, author={Mumcuoglu, Mehmet Emin and Alcan, Gokhan and Unel, Mustafa and Cicek, Onur and Mutluergil, Mehmet and Yilmaz, Metin and Koprubasi, Kerem}, booktitle={2019 AEIT International Conference of Electrical and Electronic Technologies for Automotive (AEIT AUTOMOTIVE)}, pages={1--6}, year={2019}, organization={IEEE} }
In this paper, NOx emissions from a diesel engine are modeled with a nonlinear autoregressive with exogenous input (NARX) model. Airpath and fuelpath channels are excited by chirp signals where the frequency profile of each channel is generated by increasing the number of sweeps. Past values of the output are employed only in linear prediction with all input regressors, and the most significant input regressors are selected for the nonlinear prediction by the orthogonal least squares (OLS) algorithm and the error reduction ratio. Experimental results show that NOx emissions can be modeled with high validation performance and that models obtained using a reduced set of regressors perform better in terms of stability and robustness.
@article{alcan2018diesel, title={Diesel engine NOx emission modeling using a new experiment design and reduced set of regressors}, author={Alcan, Gokhan and Unel, Mustafa and Aran, Volkan and Yilmaz, Metin and Gurel, Cetin and Koprubasi, Kerem}, journal={IFAC-PapersOnLine}, volume={51}, number={15}, pages={168--173}, year={2018}, publisher={Elsevier} }