Optimal control of wind farms to maximize power is a challenging task because the wake interaction between turbines is a highly nonlinear phenomenon. In recent years, the field of Reinforcement Learning has made great contributions to nonlinear control problems and has been successfully applied to control and optimization in 2D laminar flows. In this work, Reinforcement Learning is applied to wind farm control for the first time, to the best of the authors’ knowledge. To demonstrate the optimization abilities of the newly developed framework, the parameters of an existing control strategy, the helix approach, are tuned to optimize the total power production of a small wind farm. This also includes an extension of the helix approach to multiple turbines. Furthermore, an attempt is made to develop novel control strategies based on control of the generator torque. The results are analysed, and difficulties in the setup with regard to Reinforcement Learning are discussed. The tuned helix approach yields a total power increase of 6.8% on average for the investigated case, while the generator torque controller does not yield an increase in total power. Finally, an alternative setup is proposed to improve the design of the problem.