=====================
Reloading Source Code
=====================

This document contains information about mechanisms available in mod_wsgi
for automatic reloading of source code when an application is changed and
any issues related to those mechanisms.

Embedded Mode Vs Daemon Mode
----------------------------

What is achievable in the way of automatic source code reloading depends on
which mode your WSGI application is running.

If your WSGI application is running in embedded mode then what happens when
you make code changes is largely dictated by how Apache works, as it
controls the processes handling requests. In general, if using embedded
mode you will have no choice but to manually restart Apache in order for code
changes to be used.

If using daemon mode, because mod_wsgi manages directly the processes
handling requests and in which your WSGI application runs, there is more
avenue for performing automatic source code reloading.

As a consequence, it is important to understand what mode your WSGI
application is running in.

If you are running on Windows, or have not used
WSGIDaemonProcess/WSGIProcessGroup directives to delegate your WSGI
application to a mod_wsgi daemon mode process, then you will be using
embedded mode. Note that ``mod_wsgi-express`` always runs in daemon
mode by default, so applications served via ``mod_wsgi-express
start-server`` benefit from the daemon-mode reloading behaviour
described below.

If you are not sure whether you are using embedded mode or daemon mode,
then substitute your WSGI application entry point with::

    def application(environ, start_response):
        status = '200 OK'

        if not environ['mod_wsgi.process_group']:
          output = b'EMBEDDED MODE'
        else:
          output = b'DAEMON MODE'

        response_headers = [('Content-Type', 'text/plain'),
                            ('Content-Length', str(len(output)))]

        start_response(status, response_headers)

        return [output]

If your WSGI application is running in embedded mode, this will output to
the browser 'EMBEDDED MODE'. If your WSGI application is running in daemon
mode, this will output to the browser 'DAEMON MODE'.

Reloading In Embedded Mode
--------------------------

However you have configured Apache to mount your WSGI application, you will
have a script file which contains the entry point for the WSGI application.
This script file is not treated exactly like a normal Python module and
need not even use a '.py' extension. It is even preferred that a '.py'
extension not be used for reasons described below.

For embedded mode, one of the properties of the script file is that by
default it will be reloaded whenever the file is changed. The primary
intent with the file being reloaded is to provide a second chance at
getting any configuration in it and the mapping to the application correct.
If the script weren't reloaded in this way, you would need to restart
Apache even for a trivial change to the script file. This reload
behaviour is governed by the
:doc:`../configuration-directives/WSGIScriptReloading` directive.

Do note though that this script reloading mechanism is not intended as a
general purpose code reloading mechanism. Only the script file itself is
reloaded, no other Python modules are reloaded. This means that if modifying
normal Python code files which are used by your WSGI application, you will
need to trigger a restart of Apache. For example, if you are using Django
in embedded mode and needed to change your 'settings.py' file, you would
still need to restart Apache.

That only the script file and not the whole process is reloaded also has a
number of implications and imposes certain restrictions on what code in the
script file can do or how it should be implemented.

The first issue is that when the script file is imported, if the code makes
modifications to ``sys.path`` or other global data structures and the
changes are additive, checks should first be made to ensure that the change
has not already been made, else duplicate data will be added every time the
script file is reloaded.

This means that when updating ``sys.path``, instead of using::

    import sys
    sys.path.append('/usr/local/wsgi/modules')

the more correct way would be to use::

    import sys
    path = '/usr/local/wsgi/modules'
    if path not in sys.path:
        sys.path.append(path)

This will ensure that the path doesn't get added multiple times.

Even where the script file is named so as to have a '.py' extension, that
the script file is not treated like a normal module means that you should
never try to import the file from another code file using the 'import'
statement or any other import mechanism. The easiest way to avoid this is
not use the '.py' extension on script files or never place script files in
a directory which is located on the standard module search path, nor add
the directory containing the script into ``sys.path`` explicitly.

If an attempt is made to import the script file as a module the result will
be that it will be loaded a second time as an independent module. This is
because script files are loaded under a module name which is keyed to the
full absolute path for the script file and not just the basename of the
file. Importing the script file directly and accessing it will therefore
not result in the same data being accessed as exists in the script file
when loaded.

Because the script file is not treated like a normal Python module also has
implications when it comes to using the "pickle" module in conjunction
with objects contained within the script file.

In practice what this means is that neither function objects, class objects
or instances of classes which are defined in the script file should be
stored using the "pickle" module.

The technical reasons for the limitations on the use of the "pickle" module
in conjunction with objects defined in the script file are further
discussed in the document :doc:`../user-guides/issues-with-pickle-module`.

The act of reloading script files also means that any data previously held
by the module corresponding to the script file will be deleted. If such
data constituted handles to database connections, and the connections are
not able to clean up themselves when deleted, it may result in resource
leakage.

One should therefore be cautious of what data is kept in a script file.
Preferably the script file should only act as a bridge to code and data
residing in a normal Python module imported from an entirely different
directory.

Restarting Apache Processes
---------------------------

As explained above, the only facility that mod_wsgi provides for
reloading source code files in embedded mode is the reloading of the
script file itself. There is no embedded-mode mechanism for reloading
the Python modules the script imports without restarting the process.

The strong recommendation is to switch to daemon mode for any
deployment where automatic code reloading matters. The daemon-mode
behaviour described below — touch the script file, daemon process
recycles, new code picked up — does not have an equivalent in
embedded mode.

If switching to daemon mode is not possible (for example on Windows,
where daemon mode is not available), one workaround is to set the
Apache ``MaxRequestsPerChild`` directive to ``1``::

    MaxRequestsPerChild 1

This causes the Apache child process to be recycled after every
request, which means each request is served by a fresh Python
interpreter that imports the latest code. The cost is high: the
recycle happens after *every* request, not just requests that hit
your WSGI application, so static files and any other content served
by the same Apache instance pay the same overhead. It is suitable
only as a development convenience, not for production.

Reloading In Daemon Mode
------------------------

If using mod_wsgi daemon mode, what happens when the script file is changed
is different to what happens in embedded mode. In daemon mode, if the
script file changed, rather than just the script file being reloaded, the
daemon process which contains the application will be shutdown and
restarted automatically.

Detection of the change in the script file will occur at the time of the
first request to arrive after the change has been made. The way that the
restart is performed does not affect the handling of the request, with it
still being processed once the daemon process has been restarted.

In the case of there being multiple daemon processes in the process group,
then a cascade effect will occur, with successive processes being restarted
until the request is again routed to one of the newly restarted processes.

In this way, restarting of a WSGI application when a change has been made
to the code is a simple matter of touching the script file if daemon mode
is being used. Any daemon processes will then automatically restart without
the need to restart the whole of Apache.

So, if you are using Django in daemon mode and needed to change your
'settings.py' file, once you have made the required change, also touch the
script file containing the WSGI application entry point. Having done that,
on the next request the process will be restarted and your Django
application reloaded.

Apart from script-file modification, daemon processes can also be
recycled by various ``WSGIDaemonProcess`` options including
``maximum-requests``, ``restart-interval``, ``inactivity-timeout``
and ``cpu-time-limit``. Those options exist for operational reasons
(memory pressure, leaks, periodic refresh) rather than for source-code
reloading, but in practice any one of them can result in the latest
on-disk code being picked up the next time a process is recycled.
See :doc:`../configuration-directives/WSGIDaemonProcess` for the full
set of options.

Restarting Daemon Processes
---------------------------

If you are using daemon mode of mod_wsgi, restarting of processes can to a
degree also be controlled by a user, or by the WSGI application itself,
without restarting the whole of Apache.

To force a daemon process to be restarted, if you are using a single daemon
process with many threads for the application, then you can embed a page in
your application (password protected hopefully), that sends an appropriate
signal to itself.

This should only be done for daemon processes and not within the Apache
child processes, as sending such a signal within a child process may
interfere with the operation of Apache. That the code is executing within a
daemon process can be determined by checking the 'mod_wsgi.process_group'
variable in the WSGI environment passed to the application. The value will
be non empty if a daemon process::

    if environ['mod_wsgi.process_group'] != '':
        import signal, os
        os.kill(os.getpid(), signal.SIGINT)

This will cause the daemon process your application is in to shutdown. The
Apache process supervisor will then automatically restart your process
ready for subsequent requests. On the restart it will pick up your new
code. This way you can control a reload from your application through some
special web page specifically for that purpose.

The same signal can also be sent from outside the application — for
example from a shell script, deployment tool, or operator command —
in which case the harder part is identifying which processes to
target. If the daemon process group is configured to run as a
different user or group from Apache itself, and each application is
running as its own user, you can simply look for the Apache
(``httpd``) processes owned by that user (as opposed to the Apache
user) and signal them all.

If the daemon process is running as the same user as Apache or there are
distinct applications running in different daemon processes but as the same
user, knowing which daemon processes to send the signal may be harder to
determine.

Either way, to make it easier to identify which processes belong to
a daemon process group, you can use the ``display-name`` option to
``WSGIDaemonProcess`` to name the process. By default the daemon
processes retain Apache's own ``argv[0]``, so they are
indistinguishable from the rest of the Apache process tree in ``ps``
output. With ``display-name`` set, that custom name appears in
``ps`` output instead on most platforms, which is what makes
external identification practical.

Once daemon processes are nameable in this way, ``pkill`` can be
used to send the signal directly. For example, with::

    WSGIDaemonProcess myapp display-name=%{GROUP}

the daemon processes will appear in ``ps`` output as
``(wsgi:myapp)``, and they can be signalled with::

    pkill -INT -f 'wsgi:myapp'

The important caveat is that ``pkill -f`` matches against the full
command line as a regular expression, so the chosen display name
must be specific enough that no unrelated processes match. Generic
names like ``wsgi`` or ``app`` will match too widely; daemon-group
names should be unique per application within the host. The
``%{GROUP}`` form above is the safest pattern, since the
``WSGIDaemonProcess`` group name is already required to be unique
within the Apache configuration and the ``wsgi:`` prefix is
distinctive enough not to collide with anything else in normal
process listings.

Always sanity-check the pattern before sending the signal by
listing the matching processes first::

    pgrep -fl 'wsgi:myapp'

This prints the PID and command line of every process the same
pattern would target, so any unintended matches are visible before
``pkill`` actually delivers the signal.

Monitoring For Code Changes
---------------------------

The use of signals to restart a daemon process could also be employed in a
mechanism which automatically detects changes to any Python modules or
dependent files. This could be achieved by creating a thread at startup
which periodically looks to see if file timestamps have changed and trigger
a restart if they have.

Example code for such an automatic restart mechanism which is compatible
with how mod_wsgi works is shown below::

    import os
    import sys
    import signal
    import threading
    import queue

    _interval = 1.0
    _times = {}
    _files = []

    _running = False
    _queue = queue.Queue()
    _lock = threading.Lock()

    def _restart(path):
        _queue.put(True)
        prefix = f'monitor (pid={os.getpid()}):'
        print(f'{prefix} Change detected to {path!r}.', file=sys.stderr)
        print(f'{prefix} Triggering process restart.', file=sys.stderr)
        os.kill(os.getpid(), signal.SIGINT)

    def _modified(path):
        try:
            # If path doesn't denote a file and were previously
            # tracking it, then it has been removed or the file type
            # has changed so force a restart. If not previously
            # tracking the file then we can ignore it as probably
            # pseudo reference such as when file extracted from a
            # collection of modules contained in a zip file.

            if not os.path.isfile(path):
                return path in _times

            # Check for when file last modified.

            mtime = os.stat(path).st_mtime
            if path not in _times:
                _times[path] = mtime

            # Force restart when modification time has changed, even
            # if time now older, as that could indicate older file
            # has been restored.

            if mtime != _times[path]:
                return True
        except Exception:
            # If any exception occurred, likely that file has been
            # removed just before stat(), so force a restart.

            return True

        return False

    def _monitor():
        while True:
            # Check modification times on all files in sys.modules.

            for module in list(sys.modules.values()):
                if not hasattr(module, '__file__'):
                    continue
                path = getattr(module, '__file__')
                if not path:
                    continue
                if os.path.splitext(path)[1] in ['.pyc', '.pyo', '.pyd']:
                    path = path[:-1]
                if _modified(path):
                    return _restart(path)

            # Check modification times on files which have
            # specifically been registered for monitoring.

            for path in _files:
                if _modified(path):
                    return _restart(path)

            # Go to sleep for specified interval.

            try:
                return _queue.get(timeout=_interval)
            except queue.Empty:
                pass

    _thread = threading.Thread(target=_monitor, daemon=True)

    def track(path):
        if path not in _files:
            _files.append(path)

    def start(interval=1.0):
        global _interval
        if interval < _interval:
            _interval = interval

        global _running
        with _lock:
            if not _running:
                prefix = f'monitor (pid={os.getpid()}):'
                print(f'{prefix} Starting change monitor.', file=sys.stderr)
                _running = True
                _thread.start()

This would be used by importing into the script file the Python module
containing the above code, starting the monitoring system and adding any
additional non Python files which should be tracked::

    import os

    import monitor
    monitor.start(interval=1.0)
    monitor.track(os.path.join(os.path.dirname(__file__), 'site.cf'))

    def application(environ, start_response):
        ...

Where needing to add many non Python files in a directory hierarchy,
such as template files which would otherwise be cached within the
running process, the ``os.walk()`` function can be used to traverse
all files and add required files based on extension or other criteria
using the ``track()`` function.

This mechanism would generally work adequately where a single daemon
process is used within a process group. You would need to be careful
however when multiple daemon processes are used. This is because it may not
be possible to synchronise the checks exactly across all of the daemon
processes. As a result you may end up with the daemon processes running a
mixture of old and new code until they all synchronise with the new code
base. This problem can be minimised by defining a short interval time
between scans, however that will increase the overhead of the checks.

Using such an approach may in some cases be useful if using mod_wsgi as a
development platform. It certainly would not be recommended you use this
mechanism for a production system.

The reasons for not using it on a production system is due to the
additional overhead and chance that daemon processes are restarted when you
are not expecting them to be. For example, in a production environment
where requests are coming in all the time, you do not want a restart
triggered when you are part way through making a set of changes which cover
multiple files as likely then that an inconsistent set of code will be
loaded and the application will fail.

Note that you should also not use this mechanism on a system where you have
configured mod_wsgi to preload your WSGI application as soon as the daemon
process has started. If you do that, then the monitor thread will be recreated
immediately and so for every single code change on a preloaded file you
make, the daemon process will be restarted, even if there is no intervening
request.

If preloading was really required, the example code would need to be
modified so as to not use signals to restart the daemon process, but reset
to zero the variable saved away in the WSGI script file that records the
modification time of the script file. This will have the affect of delaying
the restart until the next request has arrived. Because that variable holding
the modification time is an internal implementation detail of mod_wsgi and
not strictly part of its published API or behaviour, you should only use
that approach if it is warranted.

Restarting Windows Apache
-------------------------

On the Windows platform there is no daemon mode only embedded mode. The MPM
used on Apache is the 'winnt' MPM. This MPM is like the worker MPM on UNIX
systems except that there is only one process.

Being embedded mode, modifying the WSGI script file only results in the WSGI
script file itself being reloaded, the process as a whole is not reloaded.
Thus there is no way normally through modifying the WSGI script file or any
other Python code file used by the application, of having the whole
application reloaded automatically.

The recipe in the previous section can be used with daemon mode on UNIX
systems to implement an automated scheme for restarting the daemon
processes when any code change is made, but because Windows lacks the
'fork()' system call daemon mode isn't supported in the first place.

Thus, the only way one can have code changes picked up on Windows is to
restart Apache as a whole. Although a full restart is required, Apache on
Windows only uses a single child server process and so the impact isn't as
significant as on UNIX platforms, where many processes may need to be
shutdown and restarted.

With that in mind, it is actually possible to modify the prior recipe for
restarting a daemon process to restart Apache itself. To achieve this slight
of hand, it is necessary to use the Python 'ctypes' module to get access to
a special internal Apache function which is available in the Windows version
of Apache called 'ap_signal_parent()'.

The required change to get this to work is to replace the restart
function in the previous code with the following::

    def _restart(path):
        _queue.put(True)
        prefix = 'monitor (pid=%d):' % os.getpid()
        print('%s Change detected to \'%s\'.' % (prefix, path), file=sys.stderr)
        print('%s Triggering Apache restart.' % prefix, file=sys.stderr)
        import ctypes
        ctypes.windll.libhttpd.ap_signal_parent(1)

Other than that, the prior code would be used exactly as before. Now when
any change is made to Python code used by the application or any other
monitored files, Apache will be restarted automatically for you.

As before, probably recommended that this only be used during development
and not on a production system.