Author(s): Han Qi

Originally published on Towards AI.

Photo by Hert Niks on Unsplash

    import os
    from multiprocessing import Pool
    import time
    from functools import wraps

    import heartrate

    port_base = 10000


    def initialize_worker():
        # This function runs only in the worker processes
        process_id = os.getpid()
        # Derive a per-process port from the PID (PIDs that are equal modulo 10000 would collide)
        port = port_base + process_id % 10000
        print(f"Tracing on port {port} for process {process_id}")
        heartrate.trace(browser=True, port=port)


    def track_execution_time(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start_time = time.time()
            print(f"Starting task at {start_time}")
            result = func(*args, **kwargs)
            end_time = time.time()
            print(f"Ending task at {end_time}")
            print(f"Task duration: {end_time - start_time}")
            return result

        return wrapper


    @track_execution_time
    def meaningless_task(dummy_text_partial):
        words = dummy_text_partial.split()
        lorem_count = sum(1 for word in words if word.lower() == "lorem")
        # Simulate slow work so the tracing has something to show
        for i in range(5):
            time.sleep(1)
        return lorem_count


    def main():
        dummy_text = """
        Lorem ipsum dolor sit amet, consectetur Lorem adipiscing Lorem elit
        """
        with Pool(processes=2, initializer=initialize_worker) as pool:
            results = pool.map(meaningless_task, dummy_text.split(","))
            print("Word count results:", results)
            pool.close()


    if __name__ == "__main__":
        main()

The code above uses 2 workers in a multiprocessing pool to count how many times the string lorem appears in each clause produced by splitting a sentence on commas. The worker logic is not the point of this article, but it returns Word count results: [1, 2] because Lorem ipsum dolor sit amet has 1 lorem and consectetur Lorem adipiscing Lorem elit has 2.

Heartrate (https://github.com/alexmojaki/heartrate) is needed only if you want fancy execution tracking; install it with pip install heartrate. Otherwise, delete import heartrate, delete the whole initialize_worker function, and remove initializer=initialize_worker from Pool.
The problem

If you comment out the line @wraps(func), you should get

    AttributeError: Can't pickle local object 'track_execution_time.<locals>.wrapper'

Why is this a problem

Multiprocessing requires pickling the worker function (meaningless_task in the above example).

Pickling a function requires being able to find the function at the global scope of the module.

Decorators wrap functions and return another function, which is bound to the same name when the @ syntax is used.

These wrapper functions (def wrapper) are defined inside a decorating function (def track_execution_time). The local name wrapper goes out of scope once the decorating function returns, so the function cannot be found under that name in global scope.

Photo by Val Vesa on Unsplash

How does functools.wraps solve the problem?

wraps copies attributes from the raw function to the wrapper function, so pickle can get what it needs.

From https://docs.python.org/3/library/functools.html#functools.update_wrapper, wraps copies the attributes listed in WRAPPER_ASSIGNMENTS (__module__, __name__, __qualname__, __annotations__, __type_params__, and __doc__).

Which attribute does pickle need?

From https://docs.python.org/3/library/pickle.html#what-can-be-pickled-and-unpickled:

"Note that functions (built-in and user-defined) are pickled by fully qualified name, not by value. This means that only the function name is pickled, along with the name of the containing module and classes. Neither the function's code, nor any of its function attributes are pickled. Thus the defining module must be importable in the unpickling environment, and the module must contain the named object, otherwise an exception will be raised."

Pickle needs the __qualname__ of the function being pickled to be a globally accessible name. track_execution_time.<locals>.wrapper in the AttributeError above describes a path starting from the module's global scope, but the <locals> step cannot be followed from outside the function, so wrapper is not reachable by that name.
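The difference can be reproduced without multiprocessing. The following is a minimal sketch, assuming two hypothetical decorators (plain_decorator and wrapping_decorator) and toy tasks that are not part of the original example. It prints the __qualname__ each wrapper ends up with and tries to pickle both. The exception type raised for the failing case differs between Python versions, so the sketch catches both AttributeError and pickle.PicklingError.

```python
import pickle
from functools import wraps


def plain_decorator(func):
    # Hypothetical decorator WITHOUT wraps: the wrapper keeps its local qualname.
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)

    return wrapper


def wrapping_decorator(func):
    # Hypothetical decorator WITH wraps: the wrapper takes over func's metadata.
    @wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)

    return wrapper


@plain_decorator
def task_plain(x):
    return x + 1


@wrapping_decorator
def task_wrapped(x):
    return x + 1


def try_pickle(obj):
    """Return True if obj can be pickled, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (AttributeError, pickle.PicklingError):
        return False


print(task_plain.__qualname__)    # plain_decorator.<locals>.wrapper
print(task_wrapped.__qualname__)  # task_wrapped
print(try_pickle(task_plain))     # False
print(try_pickle(task_wrapped))   # True
```

Pickling task_wrapped succeeds because the name task_wrapped at module level is bound to exactly the wrapper object being pickled, so the lookup by qualified name finds it.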
Why use wraps

You don't need to, but it's nice to have wraps copy the other useful attributes in case you want to use them, like __doc__ to show documentation.

    def track_execution_time(func):
        # @wraps(func)
        def wrapper(*args, **kwargs):
            start_time = time.time()
            print(f"Starting task at {start_time}")
            result = func(*args, **kwargs)
            end_time = time.time()
            print(f"Ending task at {end_time}")
            print(f"Task duration: {end_time - start_time}")
            return result

        # Copy only the attribute pickle needs for the lookup
        wrapper.__qualname__ = func.__qualname__
        return wrapper

You could have removed @wraps(func) and set wrapper.__qualname__ = func.__qualname__ as above. After that assignment both functions have the same __qualname__, meaningless_task. This is enough here because the decorator and the task live in the same module; wraps also copies __module__, which matters when the decorator is imported from another module.

Why does pickle need __qualname__

Pickle needs the name of the object being pickled, so it can use that name to find the definition when unpickling. It's time to go down the rabbit hole to another example to learn pickling fundamentals.

Photo by Tine Ivanič on Unsplash

    import pickle


    def add(x, y):
        return x + y


    # with open("func.pkl", "wb") as f:
    #     pickle.dump(add, f)
    pickled = pickle.dumps(add)

    # del globals()["add"]
    # globals()["add"] = lambda x, y: x * y

    # with open("func.pkl", "rb") as f:
    #     loaded_add = pickle.load(f)
    loaded_add = pickle.loads(pickled)
    print(loaded_add(2, 3))

The above code should pickle and unpickle successfully, and print 5 after adding 2 + 3.

Deleting the pickled object between pickling and unpickling

If you uncomment del globals()["add"], you should see

    AttributeError: Can't get attribute 'add' on <module '__main__' from '/home/hanqi/code/pickling/test_pickle.py'>

That means unpickling failed. Pickle requires the __qualname__ of the pickled object to be globally accessible. Once the name has been artificially deleted, the unpickling step is unable to find it.

This is a slightly different problem from before. Previously, we could not even pickle. Here we can pickle but cannot unpickle.
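To see why only the name matters, it can help to look at what the pickle actually contains. The sketch below reuses the add function from the listing above; the checks around it are illustrative additions. It suggests that the payload holds the module name and the qualified name, not the function body.

```python
import pickle
import pickletools


def add(x, y):
    return x + y


pickled = pickle.dumps(add)

# The payload is tiny: it holds the module name and the qualified name only.
print(pickled)
pickletools.dis(pickled)  # human-readable dump of the pickle opcodes

# The names are stored as text inside the payload ...
print(add.__module__.encode() in pickled)    # True
print(add.__qualname__.encode() in pickled)  # True
# ... while the compiled bytecode of the function is not.
print(add.__code__.co_code in pickled)       # False

# Unpickling looks the name up again and returns the very same object.
print(pickle.loads(pickled) is add)          # True
```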
However, this unpickling failure indirectly explains why __qualname__ must be correctly specified in the heartrate example: in both cases pickle has to resolve a name from the module's global scope.

Inserting fake implementation to mess with unpickling

If you uncomment globals()["add"] = lambda x, y: x * y, you will see output 6 instead of 5 (add = lambda x, y: x * y works too), because addition changed to multiplication (2 * 3 = 6). This code overwrites the def add previously defined.

This shows that pickle does not care about the implementation of the code object that was pickled initially. Any code at runtime has an opportunity to change the unpickled implementation as long as it binds the same name seen during pickling. Pickle uses the __qualname__ (add in this case) to look up whatever add is bound to in the OS process that is unpickling, such as the injected wrong implementation that multiplies instead of adding.

You can even assign arbitrary constants like add = 2 before unpickling and get

    TypeError: 'int' object is not callable

The above example uses a single process. In reality, pickle is more commonly used to pass objects across different files or even machines. For example, a machine learning model is trained and pickled on a training machine with development libraries, then quantized and deployed on another machine with different hardware characteristics more suited for inference.

You can play with the commented code of pickle interfacing with files, […]
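The file-based variant that is commented out in the earlier listing can be sketched as follows. This assumes a temporary directory instead of a fixed func.pkl in the working directory (an illustrative choice, not the author's) and runs both steps in one process, so the name add is still bound when the file is read back.

```python
import os
import pickle
import tempfile


def add(x, y):
    return x + y


with tempfile.TemporaryDirectory() as tmp_dir:
    # Illustrative path; the original listing writes func.pkl to the working directory.
    path = os.path.join(tmp_dir, "func.pkl")

    with open(path, "wb") as f:
        pickle.dump(add, f)  # stores a reference to add by name

    with open(path, "rb") as f:
        loaded_add = pickle.load(f)  # resolves that name in this process

print(loaded_add(2, 3))   # 5
print(loaded_add is add)  # True: same object, found by name
```

Loading the same file from another process only works if that process can resolve the same module and name; otherwise it fails with the AttributeError shown earlier.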