Fredrick Chew
Camille Girabawe
Joanna Li
Hameesh Manadath
Alexander Schaefer
28.07.2017
System/Agent
Environment
OBSERVATIONS
ACTIONS
Environment State:
Action Space:
Goal/Rewards:
HOT
COLD
Workload: 1, 2, 0, 4, 3, 5 ... 1, 2, 0, 4 ...
fast access
reward = +10
slow access
reward = -10
data movement
reward = -1
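The reward scheme above can be sketched as a toy environment. This is a minimal illustration, not the emulator from the talk: the class name HotColdEnv, the step signature, the unbounded HOT store, and the rule that data movement is charged on top of the access reward are all assumptions made for the example. The workload is only the visible prefix of the sequence shown on the slide.

```python
from dataclasses import dataclass, field

# Reward values taken from the slide.
FAST_ACCESS_REWARD = 10     # item was served from HOT
SLOW_ACCESS_REWARD = -10    # item was served from COLD
DATA_MOVEMENT_REWARD = -1   # item was moved between stores


@dataclass
class HotColdEnv:
    """Hypothetical toy environment: items live in a HOT or a COLD store."""
    hot: set = field(default_factory=set)  # assumption: HOT has no capacity limit

    def step(self, item: int, move_to_hot: bool) -> int:
        """Access `item`, optionally promote it to HOT, and return the reward."""
        reward = FAST_ACCESS_REWARD if item in self.hot else SLOW_ACCESS_REWARD
        if move_to_hot and item not in self.hot:
            self.hot.add(item)
            reward += DATA_MOVEMENT_REWARD  # assumption: movement cost is additive
        return reward


env = HotColdEnv()
workload = [1, 2, 0, 4, 3, 5, 1, 2, 0, 4]  # visible prefix of the slide's workload
total = sum(env.step(item, move_to_hot=True) for item in workload)
print(total)
```

With an "always promote" policy the first pass over the items is expensive (slow access plus movement), and repeated accesses then earn the fast-access reward. A learning agent would have to discover which promotions pay off.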
Environment State:
Emulator inspired by Data-Aging
Single Query Template: analytical query that is not cached
Adjusting HANA's max_concurrency based on predicted workload as a POC
Environment State:
Action Space:
Goal/Rewards:
Dopamine: the reward signal for the human brain
DQN Formula:
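The formula shown on the slide did not survive extraction. For reference, the loss commonly given for DQN in the 2015 Nature paper is reproduced here; whether the slide used exactly this form is an assumption. Here \theta denotes the online network weights, \theta^{-} the periodically updated target network weights, \gamma the discount factor, and (s, a, r, s') a transition sampled from the replay memory D.

```latex
L(\theta) = \mathbb{E}_{(s, a, r, s') \sim D}
  \left[ \left( r + \gamma \max_{a'} Q(s', a'; \theta^{-}) - Q(s, a; \theta) \right)^{2} \right]
```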
Key takeaway: We formalize "learning through failures" via reward signals
Be careful how you reward!
Unintended consequences may happen. The next game shows this effect.
Epsilon: Exploration vs Exploitation
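The exploration/exploitation trade-off is commonly handled with an epsilon-greedy rule: with probability epsilon the agent tries a random action, otherwise it takes the action with the highest estimated value. A minimal sketch follows; the function name and the plain list of Q-values are assumptions made for illustration, not code from the demos.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return an action index: random with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        # Explore: pick any action uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: pick the action with the highest estimated value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Hypothetical Q-value estimates for three actions.
q = [0.1, 0.9, 0.3]
greedy_action = epsilon_greedy(q, epsilon=0.0)   # never explores
random_action = epsilon_greedy(q, epsilon=1.0)   # always explores
```

In practice epsilon is often started high and decayed over training, so the agent explores early and exploits what it has learned later.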
Google DeepMind published a paper in Nature in 2015:
Human-level control through deep reinforcement learning
Google bought DeepMind for 500 million
Environment State:
Action Space:
Goal/Rewards:
The agent learned a strategy that maximizes reward with the least movement
Pong
BreakOut
What2Cache
ReinforceJS
fredrick.chew@sap.com
Demos: