Building Reflection-Gemma-2

Author(s): Arthur Lagacherie Originally published on Towards AI. image original from google blog mofied with Adobe Firefly In the world of artificial intelligence, the ability of models to think and correct their mistakes is crucial. This article explores how Gemma 2–2b, can learn to reflect and improve its problem-solving skills. How does reflection work? During the inference: First, the model generates a reasoning inside the <thinking> tags. Then if the model finds an error, it corrects the mistake in the <reflection> tags as many times as necessary. To finish the model generate the output in the <output> tags. Example: ### QuestionJungkook is the 5th place. Find the number of people who crossed the finish line faster than Jungkook.### Answer<thinking>The question asks how many people finished before Jungkook. If he is in 5th place, it means 4 people finished ahead of him.</thinking><reflection>It's important to understand that place numbers start from the first person to finish. So, 5th place means 4 people finished before him.</reflection><output>4 people crossed the finish line faster than Jungkook.</output> Reflection-tuning We first need some datasets to train Gemma. I choose four datasets: mahiatlinux/Reflection-Dataset-v2 Harshkmr/orca-math-word-reflection gabrielmbmb/distilabel-reflection-tuning Replete-AI/Replete-Reflection The first one is composed of 9.17k rows organized into two columns, the prompt and the responses. The question is like “Explain the concept of…”. image of the dataset The second is composed of 2.44k rows of mathematical problems. image of the dataset Etc… First, I began by downloading them. from datasets import load_datasetdataset1 = load_dataset("mahiatlinux/Reflection-Dataset-ShareGPT-v2")["train"]dataset2 = load_dataset("gabrielmbmb/distilabel-reflection-tuning")["train"]dataset3 = load_dataset("Replete-AI/Replete-Reflection")["train"]dataset4 = load_dataset("mahiatlinux/Reflection-Dataset-v2")["train"]dataset5 = load_dataset("Harshkmr/orca-math-word-reflection")["train"] image from author And now to be able to train our model we need to put the prompt and the answer in the same row, following the prompt template of Gemma2. <bos><start_of_turn>userWrite a hello world program<end_of_turn><start_of_turn>model```pythonprint("Hello World")```<end_of_turn><eos> import osimport pandas as pddata = {"text":[]}### mahiatlinux/Reflection-Dataset-ShareGPT-v2for i in range(len(dataset1)): user = f"""<bos><start_of_turn>user\n{dataset1[1]["conversations"][0]["value"]}<end_of_turn>""" llm = f"""<start_of_turn>model\n{dataset1[1]["conversations"][1]["value"]}<end_of_turn><eos>""" data["text"].append(user+llm) ### gabrielmbmb/distilabel-reflection-tuningprompts = dataset2["instruction"]responses = dataset2["generation"]for i in range(len(prompts)): user = f"""<bos><start_of_turn>user\n{prompts[i]}<end_of_turn>""" llm = f"""<start_of_turn>model\n{responses[i]}<end_of_turn><eos>""" data["text"].append(user+llm) ### Replete-AI/Replete-Reflectionprompts = dataset3["input"]responses = dataset3["output"]for i in range(int(len(prompts)/3)): user = f"""<bos><start_of_turn>user\n{prompts[i]}<end_of_turn>""" llm = f"""<start_of_turn>model\n{responses[i]}<end_of_turn><eos>""" data["text"].append(user+llm)### mahiatlinux/Reflection-Dataset-v2prompts = dataset4["prompt"]responses = dataset4["response"]for i in range(int(len(prompts)/3)): user = f"""<bos><start_of_turn>user\n{prompts[i]}<end_of_turn>""" llm = f"""<start_of_turn>model\n{responses[i]}<end_of_turn><eos>""" data["text"].append(user+llm)### Harshkmr/orca-math-word-reflectionprompts = dataset5["question"]responses = dataset5["Reflection"]for i in range(len(prompts)): user = f"""<bos><start_of_turn>user\n{prompts[i]}<end_of_turn>""" llm = f"""<start_of_turn>model\n{responses[i]}<end_of_turn><eos>""" data["text"].append(user+llm) ### save as csvdata = pd.DataFrame.from_dict(data)print(data)try: os.mkdir("/kaggle/working/data")except: passdata.to_csv('data/dataset.csv', index=False, escapechar='\\') image from author Now we have the dataset we can install AutoTrain which is the library I used to train the model. It manages the training of any Huggingface model with just one command. !pip install -U autotrain-advanced!autotrain setup After the installation, we can set up the settings. import osproject_name = 'Reflection-Gemma-2-2b' # @param {type:"string"}model_name = 'Arthur-LAGACHERIE/Gemma-2-2b-4bit' # @param {type:"string"}push_to_hub = True # @param ["False", "True"] {type:"raw"}hf_token = "hf_token" #@param {type:"string"}hf_username = "Arthur-LAGACHERIE" #@param {type:"string"}unsloth = False # @param ["False", "True"] {type:"raw"}learning_rate = 1e-4 # @param {type:"number"}num_epochs = 1 #@param {type:"number"}batch_size = 2 # @param {type:"slider", min:1, max:32, step:1}block_size = 1124 # @param {type:"number"}trainer = "sft" # @param ["generic", "sft"] {type:"string"}warmup_ratio = 0.1 # @param {type:"number"}weight_decay = 0.01 # @param {type:"number"}gradient_accumulation = 4 # @param {type:"number"}mixed_precision = "fp16" # @param ["fp16", "bf16", "none"] {type:"string"}peft = True # @param ["False", "True"] {type:"raw"}quantization = "int4" # @param ["int4", "int8", "none"] {type:"string"}lora_r = 16 #@param {type:"number"}lora_alpha = 32 #@param {type:"number"}lora_dropout = 0.05 #@param {type:"number"}merge_adapter = "true"os.environ["HF_TOKEN"] = hf_tokenos.environ["HF_USERNAME"] = hf_usernameconf = f"""task: llm-{trainer}base_model: {model_name}project_name: {project_name}log: tensorboardbackend: localdata: path: data/ train_split: train valid_split: null chat_template: null column_mapping: text_column: textparams: block_size: {block_size} lr: {learning_rate} warmup_ratio: {warmup_ratio} weight_decay: {weight_decay} epochs: {num_epochs} batch_size: {batch_size} gradient_accumulation: {gradient_accumulation} mixed_precision: {mixed_precision} peft: {peft} quantization: {quantization} lora_r: {lora_r} lora_alpha: {lora_alpha} lora_dropout: {lora_dropout} unsloth: {unsloth} merge_adapter: {merge_adapter}hub: username: ${{HF_USERNAME}} token: ${{HF_TOKEN}} push_to_hub: {push_to_hub}"""with open("conf.yaml", "w") as f: f.write(conf) And now we just have to run this simple command to train our model. !autotrain --config conf.yaml And 4 hours later, it’s ready our model is fine-tuned. 👍 image from author Inference Now we have our fine-tuned model we can test it to solve problems. !pip install -q bitsandbytesfrom transformers import AutoModelForCausalLM, AutoTokenizer, pipelineimport torchfrom transformers import TextIteratorStreamerfrom threading import Threaddevice = 'cuda' if torch.cuda.is_available() else 'cpu'model_id = "Arthur-LAGACHERIE/Reflection-Gemma-2-2b"tokenizer = AutoTokenizer.from_pretrained(model_id)model = AutoModelForCausalLM.from_pretrained(model_id)prompt = """### SystemYou are a world-class AI system, capable of complex reasoning and reflection. Reason through the query inside <thinking> tags, and then provide your final response inside <output> tags. If you detect that you made a mistake in your reasoning at any point, correct yourself inside <reflection> tags.Try an answer and see if it's correct before generate the ouput. But don't forget to think very carefully.### QuestionThe question here."""chat = [ { "role": "user", "content": prompt},]question = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)question = tokenizer(question, return_tensors="pt").to(device)streamer = TextIteratorStreamer(tokenizer, skip_prompt=True)generation_kwargs = dict(question, streamer=streamer, max_new_tokens=4000)thread = Thread(target=model.generate, kwargs=generation_kwargs)# generatethread.start()for new_text in streamer: print(new_text, end="") Problems Let’s see if it can solve some problems. 1. The length of one span of Jinseo is about 12 centimeters (cm). When Jinseo measured the length of the shorter side of the bookshelf, it was about two spans. How many centimeters (cm) is the short side of the bookshelf? <thinking> To find the length of the shorter side of the bookshelf, we need to figure out how many spans are in the shorter side. We know that one span is 12 cm long. The shorter side is two spans, so we need to multiply the length of one span by 2. </thinking><output> The shorter side of the bookshelf is 2 * 12 cm = 24 cm. </output> 👌 Ok first passed with success. (but it was very simple) 2. How many diagonals can you draw in a decagon? <thinking> A decagon has 10 sides. A diagonal is a line segment that connects two non-adjacent vertices. To find the number of diagonals, we can use the formula for the number of diagonals in a polygon. The formula is n(n-3)/2, where n is the number of sides. So, for a decagon with 10 sides, the number of diagonals would be 10(10-3)/2. Let's calculate that. </thinking><output> The number of diagonals in a decagon is 10(10-3)/2 = 10(7)/2 = 35 diagonals. </output> ✅ Also […]