====================
Debugging Techniques
====================

Be it when first setting up mod_wsgi, or later during development or use
of your WSGI application, you are bound to get some sort of unexpected
Python error. By default, usually all you will see as evidence of this is
an HTTP 500 "Internal Server Error" response displayed in the browser
window, and little else.

The purpose of this document is to explain where to look for more details
on what caused the error, to describe techniques you can use to have your
WSGI application generate more useful debug information, and to describe
mechanisms that can be employed to interactively debug your application.

Note that although this document is intended to deal with techniques which
can be used when using mod_wsgi, many of the techniques are also directly
transferable or adaptable to other web hosting mechanisms for WSGI
applications.

Apache Error Log Files
----------------------

When using mod_wsgi, unless you or the web framework you are using takes
specific action to catch exceptions and present the details in an alternate
manner, the only place that details of uncaught exceptions will be recorded
is in the Apache error log files. The Apache error log files are therefore
your prime source of information when things go wrong.

Do note though that log messages generated by mod_wsgi are logged with
various severity levels and which ones will be output to the Apache error
log files will depend on how Apache has been configured. The standard
configuration for Apache has the ``LogLevel`` directive being set to 'warn'.
With this setting any important error messages will be output, but
informational messages generated by mod_wsgi which can assist in working
out what it is doing are not. Thus, if new to mod_wsgi or trying to debug a
problem, it is worthwhile setting the Apache configuration to use 'info'
log level instead::

    LogLevel info

If you don't want to turn up log verbosity for the whole server, you can
also set the log level for just the mod_wsgi module::

    LogLevel warn wsgi:info

If your Apache web server is only providing services for one host, it is
likely that you will only have one error log file. If however the Apache
web server is configured for multiple virtual hosts, then it is possible
that there will be multiple error log files, one corresponding to the main
server host and an additional error log file for each virtual host. Such a
virtual host specific error log, if one is being used, would have been
configured by placing the Apache ``ErrorLog`` directive within the context
of the ``VirtualHost`` container.

Although your WSGI application may be hosted within a particular virtual
host and that virtual host has its own error log file, some error and
informational messages will still go to the main server host error log
file. Thus you may still need to consult both error log files when using
virtual hosts.

Messages of note that will end up in the main server host error log file
include notifications in regard to initialisation of Python and the
creation and destruction of Python sub interpreters, plus any errors which
occur when doing this.

Messages of note that would end up in the virtual host error log file, if
it exists, include details of uncaught Python exceptions which occur when
the WSGI application script is being loaded, or when the WSGI application
callable object is being executed.

Messages that are logged by a WSGI application via the 'wsgi.errors' object
passed through to the application in the WSGI environment are also logged.
These will go to the virtual host error log file if it exists, or the main
error log file if the virtual host is not setup with its own error log file.
Thus, if you want to add debugging messages to your WSGI application code,
you can pass 'wsgi.errors' as the ``file`` argument of the ``print()``
function as shown below::

    def application(environ, start_response):
        status = '200 OK'
        output = b'Hello World!'

        print("application debug #1", file=environ['wsgi.errors'])

        response_headers = [('Content-type', 'text/plain'),
                            ('Content-Length', str(len(output)))]
        start_response(status, response_headers)

        print("application debug #2", file=environ['wsgi.errors'])

        return [output]

If ``wsgi.errors`` is not available to the code which needs to output
log messages, it should explicitly direct output from ``print`` to
``sys.stderr``::

    import sys

    def function():
        print("application debug #3", file=sys.stderr)
        ...

If ``sys.stderr`` or ``sys.stdout`` is used directly the messages will
end up in the main server host error log file and not the virtual host
log file, unless the WSGI application is running in a daemon process
specifically associated with the virtual host.

A portable WSGI application should not write to ``sys.stdout`` or use
``print`` without redirecting it to an alternate stream, since some
ways of hosting WSGI use ``sys.stdout`` as the response channel back
to the web server (CGI being the canonical example). Under mod_wsgi
the default behaviour is to redirect anything written to
``sys.stdout`` to the Apache error log, so writes do not fail —
but ``sys.stderr`` should still be preferred. See
:doc:`../configuration-directives/WSGIRestrictStdout` if the older
strict behaviour (raise on ``sys.stdout`` use) is required.

In general, a WSGI application should always endeavour to log messages
via the ``wsgi.errors`` object passed through in the WSGI environment.
This is the only logging path with any guarantee of ending up in a log
file the operator can reach on a shared server.

An application shouldn't however cache 'wsgi.errors' and try to use it
outside of the context of a request. If this is done an exception will be
raised indicating that the request has expired and the error log object
is now invalid.

That messages output via ``sys.stderr`` and ``sys.stdout`` end up in
the Apache error logs at all is provided as a convenience but there is no
requirement in the WSGI specification that they are valid means of a WSGI
application logging messages.

Displaying Request Environment
------------------------------

When a WSGI application is invoked, the request headers are passed as CGI
variables in the WSGI request environment. The dictionary used for this
also holds information about the WSGI execution environment and mod_wsgi.
This includes mod_wsgi specific variables indicating the name of the
process and application groups within which the WSGI application is
executing.

Knowing the values of the process and application group variables can be
important when needing to validate that your Apache configuration is doing
what you intended as far as ensuring your WSGI application is running in
daemon mode or otherwise.

A simple way of validating such details or getting access to any of the
other WSGI request environment variables is to substitute your existing
WSGI application with one which echoes back the details to your browser.
Such a task can be achieved with the following test application. The
application could be extended as necessary to display other information as
well, with process ID, user ID and group ID being shown as examples::

    import io
    import os

    def application(environ, start_response):
        headers = [('Content-Type', 'text/plain')]
        start_response('200 OK', headers)

        output = io.StringIO()

        print(f'PID: {os.getpid()}', file=output)
        print(f'UID: {os.getuid()}', file=output)
        print(f'GID: {os.getgid()}', file=output)
        print(file=output)

        for key in sorted(environ.keys()):
            print(f'{key}: {environ[key]!r}', file=output)
        print(file=output)

        # CONTENT_LENGTH may be missing or empty when there is no body.
        body = environ['wsgi.input'].read(
            int(environ.get('CONTENT_LENGTH') or 0))
        if body:
            print(body.decode('latin-1', errors='replace'), file=output)

        return [output.getvalue().encode('utf-8')]

For the case of the process group as recorded by the
'mod_wsgi.process_group' variable in the WSGI request environment, if the
value is an empty string then the WSGI application is running in embedded
mode. For any other value it will be running in daemon mode with the process
group named by the variable's value.

Note that by default WSGI applications run in embedded mode, which means
within the Apache server child processes which accept the original requests.
Daemon mode processes would only be used through appropriate use of the
WSGIDaemonProcess and WSGIProcessGroup directives to delegate the WSGI
application to a named daemon process group.

For the case of the application group as recorded by the
'mod_wsgi.application_group' variable in the WSGI request environment, if the
value is an empty string then the WSGI application is running in the main
Python interpreter — the one Python creates at process startup, before any
sub-interpreters are spawned. For any other value it indicates it is
running in the named Python sub interpreter.

Note that by default WSGI applications would always run in a sub
interpreter rather than the main interpreter. The name of this sub
interpreter would be automatically constructed from the name of the server
or virtual host, the URL mount point of the WSGI application and the number
of the listener port when it is other than ports 80 or 443.

To run a WSGI application in the main Python interpreter, the
WSGIApplicationGroup directive must be used with the value '%{GLOBAL}'.
Although the value is '%{GLOBAL}', this translates to the empty string
seen for the value of 'mod_wsgi.application_group' within the WSGI request
environment.

The WSGIApplicationGroup directive could also be used to designate a
specific named sub interpreter rather than that selected automatically.

For newcomers this can all be a bit confusing, which is where the test
application comes in: you can use it to confirm that your WSGI application
is running where you intended it to run.

The set of WSGI request environment variables will also show the
``wsgi.multithread`` and ``wsgi.multiprocess`` variables, indicating
whether the process is multithreaded and whether the process group is
multiprocess. For a more complete explanation of what that means see
documentation of :doc:`../user-guides/processes-and-threading`.
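The same checks can also be made from inside an existing application,
without swapping in the test application. The following is a minimal
sketch; ``describe_placement`` is a hypothetical helper name, and it relies
only on the ``mod_wsgi.process_group`` and ``mod_wsgi.application_group``
variables described above plus the standard ``wsgi.multithread`` and
``wsgi.multiprocess`` keys.

```python
def describe_placement(environ):
    """Summarise where mod_wsgi is running this request.

    describe_placement is a hypothetical helper, not part of mod_wsgi.
    """
    process_group = environ.get('mod_wsgi.process_group', '')
    application_group = environ.get('mod_wsgi.application_group', '')

    # An empty process group means embedded mode (Apache child process).
    if process_group:
        mode = 'daemon mode (process group %r)' % process_group
    else:
        mode = 'embedded mode'

    # An empty application group means the main interpreter, which is
    # what WSGIApplicationGroup %{GLOBAL} selects.
    if application_group:
        interpreter = 'sub interpreter %r' % application_group
    else:
        interpreter = 'main interpreter'

    return '%s, %s, multithread=%s, multiprocess=%s' % (
        mode, interpreter,
        environ.get('wsgi.multithread'), environ.get('wsgi.multiprocess'))


def application(environ, start_response):
    # Log the placement on every request, then respond as normal.
    print(describe_placement(environ), file=environ['wsgi.errors'])

    output = b'Hello World!'
    start_response('200 OK', [('Content-Type', 'text/plain'),
                              ('Content-Length', str(len(output)))])
    return [output]
```

Because the output goes to ``wsgi.errors``, it lands in the same error log
as any other messages for the request, so it can be left in place briefly
while checking a configuration change.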

Tracking Request and Response
-----------------------------

Although the above test application can be used to display the request
environment, it replaces your original WSGI application. Rather than
replace your existing application, you can wrap it in a WSGI middleware
which logs the details to the Apache error log instead::

    # Original WSGI application.

    def application(environ, start_response):
        ...

    # Logging WSGI middleware.

    import pprint

    class LoggingMiddleware:

        def __init__(self, application):
            self.__application = application

        def __call__(self, environ, start_response):
            errors = environ['wsgi.errors']
            pprint.pprint(('REQUEST', environ), stream=errors)

            def _start_response(status, headers, *args):
                pprint.pprint(('RESPONSE', status, headers), stream=errors)
                return start_response(status, headers, *args)

            return self.__application(environ, _start_response)

    application = LoggingMiddleware(application)

The output from the middleware would end up in the Apache error log for the
virtual host, or if no virtual host specific error log file, in the main
Apache error log file.
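Middleware such as this can also be exercised without Apache, which is a
quick way of confirming what it logs before deploying it. A minimal sketch;
``call_application`` is a hypothetical helper name, and the environment
built by the standard library's ``wsgiref.util.setup_testing_defaults()``
only approximates a real mod_wsgi request (there are no ``mod_wsgi.*``
keys, for instance).

```python
import io
from wsgiref.util import setup_testing_defaults


def call_application(application, path='/'):
    """Invoke a WSGI callable in-process and capture its output.

    Returns (status, headers, body, errors), where errors is whatever
    the application or middleware wrote to wsgi.errors.
    """
    environ = {'SCRIPT_NAME': '', 'PATH_INFO': path}
    setup_testing_defaults(environ)

    # Use our own buffer for wsgi.errors so its contents can be returned.
    errors = io.StringIO()
    environ['wsgi.errors'] = errors

    captured = {}
    chunks = []

    def start_response(status, headers, exc_info=None):
        captured['status'] = status
        captured['headers'] = headers
        return chunks.append  # legacy write() callable collects data too

    result = application(environ, start_response)
    try:
        for data in result:
            chunks.append(data)
    finally:
        # Honour the WSGI requirement to call close() if it exists.
        if hasattr(result, 'close'):
            result.close()

    return (captured['status'], captured['headers'],
            b''.join(chunks), errors.getvalue())


def hello(environ, start_response):
    print('hello called', file=environ['wsgi.errors'])
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'Hello World!']
```

Passing ``LoggingMiddleware(hello)`` instead of ``hello`` would return the
pretty-printed request and response details in the ``errors`` string.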

For more complicated problems it may also be necessary to track the request
and response content. A more complex middleware which logs this content, as
well as header information, to the file system is as follows::

    # Original WSGI application.

    def application(environ, start_response):
        ...

    # Logging WSGI middleware.

    import threading
    import pprint
    import time
    import os

    class LoggingInstance:
        def __init__(self, start_response, oheaders, ocontent):
            self.__start_response = start_response
            self.__oheaders = oheaders
            self.__ocontent = ocontent

        def __call__(self, status, headers, *args):
            pprint.pprint((status, headers)+args, stream=self.__oheaders)
            self.__oheaders.close()

            self.__write = self.__start_response(status, headers, *args)
            return self.write

        def __iter__(self):
            return self

        def write(self, data):
            self.__ocontent.write(data)
            self.__ocontent.flush()
            return self.__write(data)

        def __next__(self):
            data = next(self.__iterable)
            self.__ocontent.write(data)
            self.__ocontent.flush()
            return data

        def close(self):
            if hasattr(self.__iterable, 'close'):
                self.__iterable.close()
            self.__ocontent.close()

        def link(self, iterable):
            self.__iterable = iter(iterable)

    class LoggingMiddleware:

        def __init__(self, application, savedir):
            self.__application = application
            self.__savedir = savedir
            self.__lock = threading.Lock()
            self.__pid = os.getpid()
            self.__count = 0

        def __call__(self, environ, start_response):
            with self.__lock:
                self.__count += 1
                count = self.__count

            key = "%s-%s-%s" % (time.time(), self.__pid, count)

            iheaders = os.path.join(self.__savedir, key + ".iheaders")
            iheaders_fp = open(iheaders, 'w')

            icontent = os.path.join(self.__savedir, key + ".icontent")
            icontent_fp = open(icontent, 'w+b')

            oheaders = os.path.join(self.__savedir, key + ".oheaders")
            oheaders_fp = open(oheaders, 'w')

            ocontent = os.path.join(self.__savedir, key + ".ocontent")
            ocontent_fp = open(ocontent, 'w+b')

            errors = environ['wsgi.errors']
            pprint.pprint(environ, stream=iheaders_fp)
            iheaders_fp.close()

            length = int(environ.get('CONTENT_LENGTH') or 0)
            input = environ['wsgi.input']
            while length != 0:
                data = input.read(min(4096, length))
                if data:
                    icontent_fp.write(data)
                    length -= len(data)
                else:
                    length = 0
            icontent_fp.flush()
            icontent_fp.seek(0, os.SEEK_SET)
            environ['wsgi.input'] = icontent_fp

            iterable = LoggingInstance(start_response, oheaders_fp, ocontent_fp)
            iterable.link(self.__application(environ, iterable))
            return iterable

    application = LoggingMiddleware(application, '/tmp/wsgi')

For this middleware, the second argument to the constructor should be a
preexisting directory. For each request four files will be saved. These
correspond to input headers, input content, response status and headers,
and response content.
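Because each ``.oheaders`` file holds the ``pprint`` output of the
``(status, headers)`` tuple, it can usually be loaded back into Python for
inspection. A minimal sketch; ``load_response_headers`` and ``summarise``
are hypothetical helper names, and this assumes ``start_response()`` was
called without ``exc_info`` (an exception object in the tuple would not
survive ``ast.literal_eval()``). The ``.iheaders`` files are less
amenable, since the environ dictionary includes object reprs such as that
of the ``wsgi.input`` stream.

```python
import ast
import glob
import os


def load_response_headers(path):
    """Parse a .oheaders file written by LoggingMiddleware."""
    with open(path) as fp:
        # The file contains a literal tuple: (status, headers, *args).
        status, headers = ast.literal_eval(fp.read())[:2]
    return status, headers


def summarise(savedir):
    # Print the status line of every recorded response, in filename
    # order (filenames begin with the time.time() value of the request).
    for path in sorted(glob.glob(os.path.join(savedir, '*.oheaders'))):
        status, headers = load_response_headers(path)
        print(os.path.basename(path), status)
```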

Under ``mod_wsgi-express`` the equivalent capability is
built in via ``--enable-recorder``, which wraps the
application in a middleware functionally equivalent to the
``LoggingMiddleware`` shown above. Per-request files use
the same ``.iheaders`` / ``.icontent`` / ``.oheaders`` /
``.ocontent`` naming, plus ``.oaexcept`` / ``.orexcept`` /
``.ofexcept`` capturing exceptions raised during the
application call, response iteration, and ``close()``
respectively. The headers files contain a
``pprint``-formatted Python repr (inspectable in any text
editor; round-trip with ``ast.literal_eval()``). The
output directory defaults to ``<server-root>/archive/``
and is overridable with ``--recorder-directory``.
``--enable-recorder`` implies ``--debug-mode`` (single
process, single-threaded) so output from concurrent
workers does not interleave.

Poorly Performing Code
----------------------

The WSGI specification allows any iterable object to be returned as
the response, so long as the iterable yields ``bytes``. This makes it
all too easy to return an object which is iterable but which either
performs badly or does not actually yield ``bytes`` at all.

The worst case of this is where, instead of returning a list
containing one ``bytes`` value, the ``bytes`` value itself is
returned. Iterating over a ``bytes`` value yields a single integer
each time, not a one-byte ``bytes`` object, so the items do not
satisfy the WSGI requirement. Rather than producing a slow response,
mod_wsgi rejects such items with a ``TypeError`` (logged along the
lines of "sequence of byte string values expected, value of type int
found") and the client typically receives a 500 response.

Another case which can cause problems is returning a file-like object.
When iterating over a file-like object, typically a single line within
the file is returned each time. If the file is a line oriented text file
where each line is of a reasonable length, this may be okay, but if the
file is a binary file there may not actually be line breaks within the
file.

Where the file contains many short lines, throughput suffers because each
line is written out separately. Where the file is just binary data, the
complete file may be read in on the first iteration. If the file is large,
this could cause a large transient spike in memory usage. Once that memory
is allocated, it will then usually be retained by the process, albeit that
it may be reused by the process at a later point.

Because of the problems described above, both these cases should be
avoided. A ``bytes`` value should be returned wrapped in a single element
list. For a file-like object, the ``wsgi.file_wrapper`` extension should be
used, or a wrapper which breaks the response into suitably sized chunks.

In order to identify where code may be inadvertently returning such iterable
types, the following code can be used::

    import io
    import socket

    # Response types which should not be returned directly. isinstance()
    # is used so that subclasses such as io.BufferedReader also match.
    BAD_ITERABLES = (
        bytes,
        socket.socket,
        io.IOBase,
    )

    class ValidatingMiddleware:

        def __init__(self, application):
            self.__application = application

        def __call__(self, environ, start_response):
            errors = environ['wsgi.errors']

            result = self.__application(environ, start_response)

            if isinstance(result, BAD_ITERABLES):
                print('BAD ITERABLE RETURNED: URL=%s TYPE=%s' % (
                      environ.get('REQUEST_URI'), type(result)), file=errors)

            return result

    def application(environ, start_response):
        ...

    application = ValidatingMiddleware(application)

Profiling Code Execution
------------------------

When a WSGI application is slower than expected and the
cause is not obvious from the patterns covered above,
profiling is the next step. The standard library
``cProfile`` module measures call counts and total /
inline time per function for the duration the profiler is
running.

The pattern that works for an Apache + mod_wsgi
deployment is to start the profiler at WSGI script load
time and dump its data on process shutdown. A minimal
version that can be added to any WSGI script::

    import os
    import time

    import mod_wsgi
    from cProfile import Profile

    _profiler = Profile()
    _profiler.enable()

    def _dump_profile(*args, **kwargs):
        _profiler.disable()
        output = '%d-%d.pstats' % (
                int(time.time() * 1000000), os.getpid())
        _profiler.dump_stats(os.path.join('/tmp/pstats', output))

    mod_wsgi.subscribe_shutdown(_dump_profile)

This produces a ``.pstats`` file in ``/tmp/pstats/``
(which must already exist) every time the daemon process
shuts down. The filename includes a microsecond timestamp
and the process ID so concurrent workers do not collide.
``mod_wsgi.subscribe_shutdown()`` is preferred over
``atexit`` for dumping data on shutdown because it fires
reliably regardless of interpreter-shutdown ordering; see
:doc:`registering-cleanup-code`.

Inspect the data with the standard library's ``pstats``
module, or with ``snakeviz`` (browser-rendered icicle /
sunburst) or ``gprof2dot`` plus Graphviz for a
call-graph visualisation.
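As a starting point with ``pstats``, the following sketch merges every
dump in a directory and prints the most expensive functions. The function
name ``report`` and the default ``/tmp/pstats`` directory (matching the
recipe above) are assumptions; ``pstats.Stats`` accepts several filenames
and combines their data, which is convenient when the daemon has been
restarted a few times.

```python
import glob
import os
import pstats


def report(directory='/tmp/pstats', limit=20):
    """Merge every .pstats dump in directory and print the hot spots."""
    paths = sorted(glob.glob(os.path.join(directory, '*.pstats')))
    if not paths:
        raise FileNotFoundError('no .pstats files in %s' % directory)

    # Stats() loads and merges the data from all the given files.
    stats = pstats.Stats(*paths)
    stats.strip_dirs()

    # Time including callees first, then time spent in each function itself.
    stats.sort_stats('cumulative').print_stats(limit)
    stats.sort_stats('tottime').print_stats(limit)
    return stats
```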

For meaningful results the profiler should observe a
single process; with multiple daemon processes the
measurements split across them. Restrict the daemon to
one process before profiling::

    WSGIDaemonProcess myapp processes=1 threads=N

Under ``mod_wsgi-express`` the equivalent of the manual
recipe is automated by ``--enable-profiler``. It implies
``--debug-mode`` (single process, single-threaded), and
writes ``.pstats`` files under ``<server-root>/pstats/``
by default; override with ``--profiler-directory``::

    mod_wsgi-express start-server wsgi.py \
        --enable-profiler \
        --profiler-directory /tmp/pstats

For profiling under realistic concurrency the manual
recipe above is preferable, since ``--enable-profiler``
forces single-process / single-threaded operation.

Measuring Code Coverage
-----------------------

Code coverage measurement records which Python source
lines actually execute during a run. The third-party
``coverage`` package (installable from PyPI) plugs into
mod_wsgi the same way profiling does: start it at WSGI
script load time and emit its report on process
shutdown::

    import mod_wsgi
    from coverage import Coverage

    _cov = Coverage()
    _cov.start()

    def _dump_coverage(*args, **kwargs):
        _cov.stop()
        _cov.html_report(directory='/tmp/htmlcov')

    mod_wsgi.subscribe_shutdown(_dump_coverage)

This writes an HTML coverage report under
``/tmp/htmlcov/`` (created on demand) every time the
daemon process shuts down; open ``index.html`` in a
browser to see per-line coverage with red / green
highlighting. As with the profiler recipe above,
``mod_wsgi.subscribe_shutdown()`` is the recommended way
to register the dump callback rather than
``atexit.register()``; see :doc:`registering-cleanup-code`.

As with profiling, multi-process operation produces one
report per process. Restrict the daemon to a single
process before measuring::

    WSGIDaemonProcess myapp processes=1 threads=N

Under ``mod_wsgi-express`` the equivalent is
``--enable-coverage``, which implies ``--debug-mode``.
The output directory defaults to
``<server-root>/htmlcov/`` and is overridable with
``--coverage-directory``::

    mod_wsgi-express start-server wsgi.py \
        --enable-coverage \
        --coverage-directory /tmp/htmlcov

The ``coverage`` package must be installed in the same
environment as ``mod_wsgi-express`` for the express form
to work.

Error Catching Middleware
-------------------------

Because mod_wsgi only logs details of uncaught exceptions to the
Apache error log and returns a generic HTTP 500 "Internal Server
Error" response, if you want the details of any exception to be
displayed in the browser, you will need to use a WSGI error-catching
middleware component or rely on the framework's own debug page.

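If you are using a Python web framework, the simplest path is usually the
framework's built-in debug mode, for example Django with ``DEBUG = True``
in ``settings.py``, or the equivalent setting in other frameworks. These
render exception tracebacks back to the browser, sometimes with
interactive introspection. Be aware that Flask's interactive debugger is
provided by Werkzeug and is only attached automatically by Flask's own
development server (``app.run(debug=True)``); setting
``app.config['DEBUG'] = True`` on an application hosted under mod_wsgi
does not enable it, so a Flask application needs wrapping with
``DebuggedApplication`` in the same way as a bare WSGI application.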

If you are running a bare WSGI application without a framework, the
canonical generic option is Werkzeug's ``DebuggedApplication``::

    def application(environ, start_response):
        status = '200 OK'
        output = b'Hello World!\n'

        response_headers = [('Content-type', 'text/plain'),
                            ('Content-Length', str(len(output)))]
        start_response(status, response_headers)

        return [output]

    from werkzeug.debug import DebuggedApplication
    application = DebuggedApplication(application, evalex=True)

This wraps the application so that any uncaught exception produces an
interactive traceback in the browser, with clickable frames and an
inline Python REPL bound to the selected frame's locals when
``evalex=True``.

Note that error-catching middleware is of no use for capturing errors
that occur at global scope when the WSGI application script is being
imported. Details of those errors will only be captured in the Apache
error log. As much as possible you should avoid performing complicated
tasks at import time and defer such actions to the first request
instead, so that error-catching middleware can capture them.
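One way to follow that advice, sketched here with the hypothetical names
``create_application`` and ``LazyApplication``, is to move the expensive
set-up into a factory that is only called on the first request. An
exception raised by the factory then occurs within a request, so
error-catching middleware wrapped around the ``LazyApplication`` instance
can report it.

```python
import threading


def create_application():
    """Hypothetical factory for work currently done at import time.

    Replace the body with whatever the WSGI script does at global scope
    (reading configuration, opening connections, building the app).
    """
    def real_application(environ, start_response):
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return [b'Hello World!']
    return real_application


class LazyApplication:
    """Defer creating the real WSGI application until the first request."""

    def __init__(self, factory):
        self.__factory = factory
        self.__application = None
        self.__lock = threading.Lock()

    def __call__(self, environ, start_response):
        if self.__application is None:
            with self.__lock:
                # Re-check under the lock so only one thread runs the factory.
                if self.__application is None:
                    # An exception here propagates within the request. The
                    # attribute stays None, so the next request retries.
                    self.__application = self.__factory()
        return self.__application(environ, start_response)


application = LazyApplication(create_application)
```

The result could then be wrapped as before, for example
``DebuggedApplication(LazyApplication(create_application), evalex=True)``.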

The debug modes of all the options above must only be used during
development and never in production. The traceback page exposes source
code and local variables, and with ``evalex=True`` it also provides an
arbitrary Python REPL, so anyone who can reach the URL can run code in
your application process.

Python Interactive Debugger
---------------------------

Python debuggers such as the one implemented by the 'pdb' module can
sometimes be useful in debugging Python applications, especially where
there is a need to single step through code and analyse application state
at each point. Use of such debuggers in web applications is trickier than
in normal applications though, and especially so with mod_wsgi.

The problem with mod_wsgi is that the Apache web server can create multiple
child processes to respond to requests. Partly because of this, but also
just to prevent problems in general, Apache closes off standard input at
startup. Thus there is no actual way to interact with the Python debugger
module if it were used.

To get around this requires having complete control of the Apache web
server that you are using to host your WSGI application. In particular, it
will be necessary to shut down the web server and then start the 'httpd'
process directly in single process debug mode (the ``-X`` option), rather
than starting it through the 'apachectl' management script::

    $ apachectl stop
    $ httpd -X

If Apache is normally started as the 'root' user, this also will need to be
run as the 'root' user otherwise the Apache web server will not have the
required permissions to write to its log directories etc.

The result of starting the 'httpd' process in this way will be that the
Apache web server will run everything in one process rather than using
multiple processes. Further, it will not close off standard input thus
allowing the Python debugger to be used.

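Do note though that one cannot be using the ability of mod_wsgi to run
your application in a daemon process when doing this. The WSGI application
must be running in embedded mode, within the single Apache process. In
addition, mod_wsgi restricts access to ``sys.stdin`` by default
(``WSGIRestrictStdin On``), so that directive will need to be set to
``Off`` for the debugger to be able to read commands from the terminal.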

To trigger the Python debugger for any call within your code, the following
customised wrapper for the 'Pdb' class should be used::

    class Debugger:

        def __init__(self, object):
            self.__object = object

        def __call__(self, *args, **kwargs):
            import pdb, sys
            debugger = pdb.Pdb()
            debugger.use_rawinput = 0
            debugger.reset()
            sys.settrace(debugger.trace_dispatch)

            try:
                return self.__object(*args, **kwargs)
            finally:
                debugger.quitting = 1
                sys.settrace(None)

This might for example be used to wrap the actual WSGI application callable
object::

    def application(environ, start_response):
        status = '200 OK'
        output = b'Hello World!\n'

        response_headers = [('Content-type', 'text/plain'),
                            ('Content-Length', str(len(output)))]
        start_response(status, response_headers)

        return [output]

    application = Debugger(application)

When a request is now received, the Python debugger will be triggered and
you can interactively debug your application from the window you ran the
'httpd' process. For example::

    > /usr/local/wsgi/scripts/hello.py(21)application()
    -> status = '200 OK'

    (Pdb) list
     16             finally:
     17                 debugger.quitting = 1
     18                 sys.settrace(None)
     19
     20     def application(environ, start_response):
     21  ->     status = '200 OK'
     22         output = b'Hello World!\n'
     23
     24         response_headers = [('Content-type', 'text/plain'),
     25                             ('Content-Length', str(len(output)))]
     26         start_response(status, response_headers)

    (Pdb) p start_response
    <built-in method start_response of mod_wsgi.Adapter object at 0x1160180>

    (Pdb) cont

When wishing to allow the request to complete, issue the 'cont' command. If
wishing to cause the request to abort, issue the 'quit' command. This will
result in a 'BdbQuit' exception being raised and an HTTP 500 "Internal
Server Error" response being returned to the client. To kill off the whole
'httpd' process, after having issued 'cont' or 'quit' to exit the debugger,
interrupt the process using 'CTRL-C'.

To see what commands the Python debugger accepts, issue the 'help' command
and also consult the documentation for the 'pdb' module on the Python web
site.

Note that the Python debugger expects to be able to write to
``sys.stdout`` to display information to the terminal. If a web
framework or middleware replaces ``sys.stdout`` (capturing or
redirecting it), the Python debugger will not be usable until that
behaviour is disabled.

Under ``mod_wsgi-express`` three flags provide capabilities
similar to the manual techniques above, with the debugger
flags being narrower than the per-call ``settrace`` wrapper:

* ``--debug-mode`` runs Apache in single-process mode with
  stdin/stdout attached to the terminal. This is the
  equivalent of the ``apachectl stop`` plus ``httpd -X``
  flow described above; the express form does not require
  manual control of the underlying Apache.

* ``--enable-debugger`` wraps the WSGI application in a
  *post-mortem* ``pdb`` handler. Rather than stepping
  through every line, it activates ``pdb`` only when an
  exception propagates out of the application. The
  controlling terminal then drops into
  ``pdb.Pdb().interaction(None, traceback)`` so the
  failing frame can be inspected. ``c`` or ``q`` lets the
  request finish with a 500 response and the server keeps
  running.

* ``--debugger-startup`` pairs with ``--enable-debugger``
  and drops into ``pdb`` once *immediately* after the WSGI
  script has been imported, before any request is served.
  Use this to set breakpoints (``b file.py:NN``) on
  application code, then ``c`` to resume into normal
  request handling.

``--enable-debugger`` and ``--debugger-startup`` both
imply ``--debug-mode``. For the per-line ``settrace``
pattern the manual ``Debugger`` wrapper above is still
the right tool.
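For the manual ``httpd -X`` setup, a post-mortem handler similar in spirit
to ``--enable-debugger`` can be built on the standard library's
``pdb.post_mortem()``. This is a sketch, not the code ``mod_wsgi-express``
itself uses; the class name ``PostMortemDebugger`` is hypothetical, the
same requirements on ``sys.stdin`` and ``sys.stdout`` described above
apply, and exceptions raised later, while the response iterable is being
consumed (for example inside a generator), are not caught by it.

```python
import pdb
import sys


class PostMortemDebugger:
    """Drop into pdb only when the wrapped WSGI application raises."""

    def __init__(self, application):
        self.__application = application

    def __call__(self, environ, start_response):
        try:
            return self.__application(environ, start_response)
        except Exception:
            # Inspect the frame where the exception was raised. Once the
            # debugger session ends the exception is re-raised, so mod_wsgi
            # still logs it and returns a 500 response.
            pdb.post_mortem(sys.exc_info()[2])
            raise
```

It is applied in the same way as the ``Debugger`` wrapper, for example
``application = PostMortemDebugger(application)``.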

Browser Based Debugger
----------------------

In order to use the Python debugger modules you need direct access
to the host and the Apache web server running your WSGI application.
If your only access to the system is via your web browser, the full
``pdb`` debugger described above is not usable.

The browser-based equivalent is Werkzeug's ``DebuggedApplication``
already described in `Error Catching Middleware`_, with the
``evalex=True`` option enabled::

    from werkzeug.debug import DebuggedApplication
    application = DebuggedApplication(application, evalex=True)

When an unexpected exception occurs, the resulting traceback page
allows you to inspect the local variables in each stack frame and to
type Python code which is evaluated against the locals of the
selected frame. This gives the same investigative capability as
``pdb`` without needing terminal access.

For this to work reliably, subsequent requests from the traceback
page must reach the same process where the error originally
occurred, because the debugger's record of the failed request lives
in that process's memory. mod_wsgi can spread requests across
multiple processes, so the application must be configured to use
only one.

The standard way to ensure this is to put the WSGI application in a
mod_wsgi daemon process group with a single process::

    WSGIDaemonProcess mydebug processes=1 threads=N
    WSGIProcessGroup mydebug

(Choose ``threads`` to suit the application; running multiple threads is fine.)

If embedded mode is unavoidable (for example on Windows, where
daemon mode is not available), restrict Apache itself to a single
child process by adding the following at global scope to the main
Apache configuration::

    StartServers 1
    ServerLimit 1

Bear in mind this affects the whole Apache instance, not just the
WSGI application. With prefork MPM that means only one request at a
time is handled, which can break AJAX-heavy pages that rely on
parallel requests; with worker or event MPM other requests can still
be handled by additional threads in the single child process.
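To check that follow-up requests really do land in the same process,
one option is to have every response report the PID that served it,
which is then visible in the browser's developer tools. The following
is a minimal sketch. The ``pid_header_middleware`` name and the
``X-Debug-Pid`` header are hypothetical choices for illustration, not
part of mod_wsgi or Werkzeug::

    import os

    def pid_header_middleware(application, header_name='X-Debug-Pid'):
        """Wrap a WSGI application so each response reports the serving PID.

        Hypothetical helper; the header name is an arbitrary choice.
        """

        def wrapper(environ, start_response):
            def _start_response(status, headers, exc_info=None):
                # Append the PID of the process that handled this request.
                headers = list(headers) + [(header_name, str(os.getpid()))]
                return start_response(status, headers, exc_info)

            return application(environ, _start_response)

        return wrapper

Applying it outermost, for example with
``application = pid_header_middleware(application)`` placed after the
``DebuggedApplication`` line, means the debugger's own responses carry
the header too. A value that changes between requests indicates they
are being spread across processes.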

Whichever configuration is used, the browser-based interactive
debugger must only be used on a development system. It must never be
deployed to a production environment or any shared host: the
``evalex=True`` REPL allows arbitrary Python code execution by
anyone who can reach the URL.
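Because the debugger must never be active in production, it may help
to make enabling it an explicit opt-in. Below is a minimal sketch,
assuming a hypothetical ``MYAPP_BROWSER_DEBUG`` variable in the
process environment. Note that values set with Apache's ``SetEnv``
are passed in the per-request WSGI ``environ`` and do not appear in
``os.environ``, so the variable has to be present in the environment
Apache itself was started with::

    import os

    def maybe_enable_browser_debugger(application, flag='MYAPP_BROWSER_DEBUG'):
        """Wrap with Werkzeug's debugger only if the process env opts in.

        The environment variable name is a hypothetical example.
        """
        # Anything other than an explicit "1" fails safe and leaves the
        # application unwrapped.
        if os.environ.get(flag) != '1':
            return application

        # Imported lazily so hosts without the flag need not install Werkzeug.
        from werkzeug.debug import DebuggedApplication

        return DebuggedApplication(application, evalex=True)

The WSGI script would then end with
``application = maybe_enable_browser_debugger(application)``.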

Debugging Crashes With GDB
--------------------------

In cases where Apache itself crashes for no apparent reason, the above
techniques are not always particularly useful. This is especially the case
where the crash occurs in non-Python code outside of your WSGI application.

The most common cause of Apache crashing, besides any still-latent
bugs in mod_wsgi, is shared library version mismatch. The other
major cause is third-party C extension modules that are not
compatible with running in a Python sub interpreter (rather than the
main interpreter), or that cache references to one interpreter and
then misbehave when used from another.

A classic shared-library mismatch is between mod_ssl statically
linked into Apache against one OpenSSL version, and the Python
``ssl`` module dynamically linked against a different OpenSSL
version. The two copies of the OpenSSL state collide, and the
process can segfault or behave nonsensically. Background and
workaround are covered under "Anaconda Python Conflicting With
System Shared Libraries" in
:doc:`../user-guides/installation-issues`. The same class of
problem can occur with any other library that gets linked into both
Apache (or another Apache module) and a Python C extension at
incompatible versions.

For the C-extensions-and-sub-interpreters case, the prominent
modern examples are NumPy, SciPy and modules built on top of them.
Symptoms range from import errors to deadlocks to crashes. The
workaround is to force the affected WSGI application to run in the
main Python interpreter by setting ``WSGIApplicationGroup
%{GLOBAL}``. Failure modes and trade-offs are discussed under
"WSGIApplicationGroup and C extension modules" in
:doc:`../user-guides/configuration-issues` and "Multiple Python Sub
Interpreters" in :doc:`../user-guides/application-issues`.
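To confirm which interpreter the application is actually running in,
you can inspect the ``mod_wsgi.application_group`` key that mod_wsgi
adds to the WSGI ``environ``. Under ``WSGIApplicationGroup %{GLOBAL}``
the application group name is the empty string. The wrapper below is
a hypothetical sketch that logs a one-off warning when requests arrive
in a named sub interpreter. ``warn_if_sub_interpreter`` is an
illustrative name, not a mod_wsgi API::

    import sys

    def _log_to_stderr(message):
        # Under mod_wsgi, sys.stderr ends up in the Apache error log.
        print(message, file=sys.stderr)

    def warn_if_sub_interpreter(application, log=_log_to_stderr):
        """Log once if requests are handled outside the main interpreter."""
        warned = False

        def wrapper(environ, start_response):
            nonlocal warned
            # With WSGIApplicationGroup %{GLOBAL} this is an empty string,
            # meaning the main Python interpreter.
            group = environ.get('mod_wsgi.application_group')
            if group and not warned:
                # Unsynchronised on purpose: a rare duplicate warning from
                # concurrent threads is harmless for a diagnostic.
                warned = True
                log(f'running in sub interpreter {group!r}; C extensions '
                    f'such as NumPy may need WSGIApplicationGroup %{{GLOBAL}}')
            return application(environ, start_response)

        return wrapper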

Whatever the cause, in some cases the only way to determine why Apache or
Python is crashing is to use a C code debugger such as 'gdb'. Although it
is possible to attach 'gdb' to a running process, the preferred method for
using 'gdb' with Apache is to run Apache in single-process debug mode from
within 'gdb'.

To do this, first shut down Apache. Then start 'gdb' against the 'httpd'
executable and launch the server from inside 'gdb' with the ``-X`` flag,
which selects single-process debug mode::

    $ /usr/local/apache/bin/apachectl stop
    $ sudo gdb /usr/local/apache/bin/httpd
    GNU gdb 6.1-20040303 (Apple version gdb-384) (Mon Mar 21 00:05:26 GMT 2005)
    Copyright 2004 Free Software Foundation, Inc.
    GDB is free software, covered by the GNU General Public License, and you are
    welcome to change it and/or distribute copies of it under certain conditions.
    Type "show copying" to see the conditions.
    There is absolutely no warranty for GDB.  Type "show warranty" for details.
    This GDB was configured as "powerpc-apple-darwin"...Reading symbols for shared
    libraries ........ done

    (gdb) run -X
    Starting program: /usr/local/apache/bin/httpd -X
    Reading symbols for shared libraries .+++ done
    Reading symbols for shared libraries ..................... done

If Apache is normally started as the 'root' user, 'gdb' will also need to
be run as the 'root' user. Otherwise, the Apache web server will not have
the permissions it needs to write to its log directories and so on.

If Apache was crashing on startup, you should encounter the error
immediately. Otherwise, use your web browser to access the URL that causes
the crash, and then start investigating why the crash is occurring.

Make sure that you have not assigned your WSGI application to run in a
mod_wsgi daemon process using the WSGIDaemonProcess and WSGIProcessGroup
directives, because the above procedure only catches crashes that occur
while the application is running in embedded mode. If it turns out that
the application only crashes when run in mod_wsgi daemon mode, a different
method of using 'gdb' is required.

Under ``mod_wsgi-express`` the embedded-mode flow above
(``apachectl stop`` plus ``gdb httpd`` plus ``run -X``) is
automated by the ``--enable-gdb`` flag. The generated
``apachectl`` wrapper substitutes ``gdb`` for the direct
``httpd`` exec, with a small ``gdb.cmds`` script
supplying ``run`` plus the single-process Apache argv.
``--gdb-executable FILE-PATH`` overrides the path to the
``gdb`` binary if needed. ``--enable-gdb`` implies
``--debug-mode`` and so only catches crashes occurring in
embedded mode; the daemon-mode attach-by-pid procedure
described next has no equivalent express flag.

If the application only crashes in daemon mode, run Apache as normal, but
ensure that you create only one mod_wsgi daemon process and that it uses
only a single thread::

    WSGIDaemonProcess debug threads=1
    WSGIProcessGroup debug

Unless the daemon process runs as a distinct user that makes it easy
to identify, you will also need to ensure that the Apache ``LogLevel``
is set high enough for mod_wsgi's startup messages to be emitted
(``LogLevel warn wsgi:info`` is sufficient). This is needed so that
information about the daemon processes created by mod_wsgi appears in
the Apache error log. Consult the error log to determine the process
ID of the daemon process created for that daemon process group. The
relevant entry looks like::

    [Wed Apr 29 12:34:56.123456 2026] [wsgi:info] [pid 666:tid 0x...] Starting process 'debug' with threads=1.

The PID is supplied by Apache's standard log framing (``[pid 666:tid
...]``).

Knowing the process ID, you should then run 'gdb', telling it to attach
directly to the daemon process::

    $ sudo gdb /usr/local/apache/bin/httpd 666
    GNU gdb 6.1-20040303 (Apple version gdb-384) (Mon Mar 21 00:05:26 GMT 2005)
    Copyright 2004 Free Software Foundation, Inc.
    GDB is free software, covered by the GNU General Public License, and you are
    welcome to change it and/or distribute copies of it under certain conditions.
    Type "show copying" to see the conditions.
    There is absolutely no warranty for GDB.  Type "show warranty" for details.
    This GDB was configured as "powerpc-apple-darwin"...Reading symbols for shared
    libraries ........ done

    /Users/grahamd/666: No such file or directory.
    Attaching to program: `/usr/local/apache/bin/httpd', process 666.
    Reading symbols for shared libraries .+++..................... done
    0x900c7060 in sigwait ()
    (gdb) cont
    Continuing.

Once 'gdb' has been started and attached to the process, make the request
to the URL that causes the application to crash.

Attaching to the running daemon process can also be useful where a single
request or the whole process appears to hang. In this case you can force a
stack trace to be output for all running threads to try to determine what
code is getting stuck. The appropriate gdb command in this instance is
'thread apply all bt'::

    $ sudo gdb /usr/local/apache-2.2/bin/httpd 666
    GNU gdb 6.3.50-20050815 (Apple version gdb-477) (Sun Apr 30 20:06:22 GMT 2006)
    Copyright 2004 Free Software Foundation, Inc.
    GDB is free software, covered by the GNU General Public License, and you are
    welcome to change it and/or distribute copies of it under certain conditions.
    Type "show copying" to see the conditions.
    There is absolutely no warranty for GDB.  Type "show warranty" for details.
    This GDB was configured as "powerpc-apple-darwin"...Reading symbols
    for shared libraries ....... done

    /Users/grahamd/666: No such file or directory.
    Attaching to program: `/usr/local/apache/bin/httpd', process 666.
    Reading symbols for shared libraries .+++..................... done
    0x900c7060 in sigwait ()
    (gdb) thread apply all bt

    Thread 4 (process 666 thread 0xd03):
    #0  0x9001f7ac in select ()
    #1  0x004189b4 in apr_pollset_poll (pollset=0x1894650,
        timeout=-1146117585187099488, num=0xf0182d98, descriptors=0xf0182d9c)
        at poll/unix/select.c:363
    #2  0x002a57f0 in wsgi_daemon_thread (thd=0x1889660, data=0x18895e8)
        at mod_wsgi.c:6980
    #3  0x9002bc28 in _pthread_body ()

    Thread 3 (process 666 thread 0xc03):
    #0  0x9001f7ac in select ()
    #1  0x0041d224 in apr_sleep (t=1000000) at time/unix/time.c:246
    #2  0x002a2b10 in wsgi_deadlock_thread (thd=0x0, data=0x2aee68) at
        mod_wsgi.c:7119
    #3  0x9002bc28 in _pthread_body ()

    Thread 2 (process 666 thread 0xb03):
    #0  0x9001f7ac in select ()
    #1  0x0041d224 in apr_sleep (t=299970002) at time/unix/time.c:246
    #2  0x002a2dec in wsgi_monitor_thread (thd=0x0, data=0x18890e8) at
        mod_wsgi.c:7197
    #3  0x9002bc28 in _pthread_body ()

    Thread 1 (process 666 thread 0x203):
    #0  0x900c7060 in sigwait ()
    #1  0x0041ba9c in apr_signal_thread (signal_handler=0x2a29a0
        <wsgi_check_signal>) at threadproc/unix/signals.c:383
    #2  0x002a3728 in wsgi_start_process (p=0x1806418, daemon=0x18890e8)
        at mod_wsgi.c:7311
    #3  0x002a6a4c in wsgi_hook_init (pconf=0x1806418, ptemp=0x0,
        plog=0xc8, s=0x18be8d4) at mod_wsgi.c:7716
    #4  0x0000a5b0 in ap_run_post_config (pconf=0x1806418, plog=0x1844418,
        ptemp=0x180e418, s=0x180da78) at config.c:91
    #5  0x000033d4 in main (argc=3, argv=0xbffffa8c) at main.c:706

When trying to debug such issues, it is suggested that the daemon process
be made to run with only a single thread, as this reduces the number of
stack traces one needs to analyse.

If you are running with multiple processes within the daemon process group
and all requests are hanging, you will need to get a snapshot of what is
happening in all processes in the daemon process group. Because doing this
by hand will be tedious, it is better to automate it.

To automate capturing the stack traces, first create a file called 'gdb.cmds'
which contains the following::

    set pagination 0
    thread apply all bt
    detach
    quit

This can then be used in conjunction with 'gdb' to avoid needing to enter
the commands manually. For example::

    sudo gdb /usr/local/apache-2.2/bin/httpd -x gdb.cmds -p 666

To automate this further and apply it to every process in a daemon process
group, first ensure that daemon processes are named in 'ps' output by using
the 'display-name' option to the WSGIDaemonProcess directive.

For example, to apply the default naming strategy implemented by
mod_wsgi, use::

    WSGIDaemonProcess xxx display-name=%{GROUP}

In the output of a BSD-derived ``ps`` command, this will now show
the process as being named ``(wsgi:xxx)``::

    $ ps -cxo command,pid | grep wsgi
    (wsgi:xxx)        666

Note that the name may be truncated, because the new name cannot be longer
than the original executable path for Apache. You may therefore prefer to
name the process explicitly::

    WSGIDaemonProcess xxx display-name=(wsgi:xxx)

Having named the processes in the daemon process group, we can now parse the
output of 'ps' to identify the process and apply the 'gdb' command script to
each::

    for pid in $(ps -cxo command,pid | awk '{ if ($0 ~ /wsgi:xxx/ && $1 !~ /grep/) print $NF }'); do
        sudo gdb -x gdb.cmds -p $pid
    done

In this command line, replace 'wsgi:xxx' with the actual name given to the
daemon process group using the 'display-name' option.
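If you want to process the results further, the same loop can be
written in Python. This is a sketch under the same assumptions as the
shell version: a BSD-style ``ps`` that supports ``-c``, and a
``gdb.cmds`` file in the current directory. ``find_pids`` and
``dump_all`` are hypothetical names::

    import subprocess

    def find_pids(ps_output, display_name):
        """Return PIDs from 'ps -cxo command,pid' output matching a name."""
        pids = []
        for line in ps_output.splitlines():
            fields = line.split()
            # The last column is the PID; skip the header and blank lines.
            if len(fields) < 2 or not fields[-1].isdigit():
                continue
            command = ' '.join(fields[:-1])
            if display_name in command:
                pids.append(int(fields[-1]))
        return pids

    def dump_all(display_name, gdb_cmds='gdb.cmds'):
        """Run the gdb command script against every matching process."""
        ps_output = subprocess.run(
            ['ps', '-cxo', 'command,pid'],
            capture_output=True, text=True, check=True,
        ).stdout
        for pid in find_pids(ps_output, display_name):
            # Equivalent to: sudo gdb -x gdb.cmds -p <pid>
            subprocess.run(['sudo', 'gdb', '-x', gdb_cmds, '-p', str(pid)])

For example, ``dump_all('wsgi:xxx')`` corresponds to the shell loop
above.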

If you are having problems with processes in daemon process groups hanging,
you might consider implementing a monitoring system that automatically
detects when the processes are no longer responding to requests, and then
triggers this dump of the stack traces before restarting the daemon process
group or Apache.
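Below is a minimal sketch of such a monitor. It assumes a hypothetical
probe URL served by the affected daemon process group and a
caller-supplied ``on_hang`` callback that runs the stack dump and, if
desired, the restart. ``is_responding`` and ``watch`` are
illustrative names::

    import http.client
    import time
    import urllib.error
    import urllib.request

    # Bypass any configured HTTP proxy so the probe talks to Apache directly.
    _opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

    def is_responding(url, timeout=5.0):
        """Return True if url produces any HTTP response within timeout."""
        try:
            with _opener.open(url, timeout=timeout):
                return True
        except urllib.error.HTTPError:
            # An error status (404, 500, ...) still proves a process answered.
            return True
        except (OSError, http.client.HTTPException):
            # URLError, refused connections and timeouts are OSError
            # subclasses; HTTPException covers malformed responses.
            return False

    def watch(url, on_hang, interval=30.0, failures_needed=3):
        """Poll url forever, calling on_hang() after repeated failures."""
        failures = 0
        while True:
            failures = 0 if is_responding(url) else failures + 1
            if failures >= failures_needed:
                on_hang()
                failures = 0
            time.sleep(interval)

Since a hung Apache cannot run its own watchdog, ``watch()`` would
typically live in a separate supervisor process. Requiring several
consecutive failures avoids dumping and restarting on a single slow
request.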

Extracting Python Stack Traces
------------------------------

Using 'gdb' to get stack traces as described above only gives you
information about what is happening at the C code level. It does not tell
you where execution was in the actual Python code. Your only clue will be
any call being made out to some distinct C function in a Python C
extension module.

The following small helper walks the stack of every active Python thread
and writes it to ``sys.stderr``, which under mod_wsgi ends up in the
Apache error log::

    import os
    import sys
    import traceback

    def dump_stack_traces():
        code = []
        for thread_id, stack in sys._current_frames().items():
            code.append(f'\n# ProcessId: {os.getpid()}')
            code.append(f'# ThreadID: {thread_id}')
            for filename, lineno, name, line in traceback.extract_stack(stack):
                code.append(f'File: "{filename}", line {lineno}, in {name}')
                if line:
                    code.append(f'  {line.strip()}')

        for line in code:
            print(line, file=sys.stderr)

The caveat here obviously is that the process has to still be running. There
is also the issue of how you trigger that function to dump stack traces for
executing Python threads.

If the problem is that some request handler threads are stuck, either
blocked or in an infinite loop, and you want to know what they are doing,
you could trigger the dump from a request handler mapped to a specific
URL. This works as long as some handler threads are still free and the
application is still responding to requests.

This, though, depends on running your application in a single process. As
soon as there are multiple processes, there is no guarantee that a request
will go to the process you want to debug.
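A URL trigger of this kind might be sketched as the following
middleware. It returns the stacks to the browser rather than the error
log, and it only shows threads in the process that served the request,
which is why the single-process caveat matters. The
``/_debug/stacks`` path and the ``stack_dump_endpoint`` name are
hypothetical. Like the browser debugger, such an endpoint exposes
application internals and should never be reachable on a production
host::

    import sys
    import traceback

    def stack_dump_endpoint(application, path='/_debug/stacks'):
        """Serve every Python thread's stack as text at a hypothetical URL."""

        def wrapper(environ, start_response):
            if environ.get('PATH_INFO') != path:
                return application(environ, start_response)

            parts = []
            for thread_id, frame in sys._current_frames().items():
                parts.append(f'# ThreadID: {thread_id}\n')
                # format_stack() returns newline-terminated, formatted lines.
                parts.extend(traceback.format_stack(frame))
                parts.append('\n')
            body = ''.join(parts).encode('utf-8')

            start_response('200 OK', [
                ('Content-Type', 'text/plain; charset=utf-8'),
                ('Content-Length', str(len(body))),
            ])
            return [body]

        return wrapper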

The two patterns below show the recommended triggers for daemon mode and
embedded mode respectively. Both invoke the same ``dump_stack_traces()``
helper defined above.

In daemon mode the supported trigger is to register a ``SIGUSR2``
subscriber via ``mod_wsgi.subscribe_signals``. The signal is delivered
to each daemon process independently, so a stack dump fires in whichever
process the operator sent the signal to, and all processes can be dumped
at once by sending the signal to every PID in the daemon group. See
:doc:`subscribing-to-events` for the full subscription API. Add the
following to your WSGI script file::

    import mod_wsgi

    @mod_wsgi.subscribe_signals
    def _on_signal(name, *, signame, **event):
        if signame == 'SIGUSR2':
            dump_stack_traces()

For service-script daemons (``WSGIDaemonProcess threads=0``)
``mod_wsgi.subscribe_signals`` is inert (the dispatcher
infrastructure is not started when there are no worker
threads). Use Python's ``signal`` module directly in the
service script instead, since signal-handler registration is
not intercepted for service scripts::

    import signal

    signal.signal(signal.SIGUSR2, lambda signum, frame: dump_stack_traces())

See the service-script notes in :doc:`subscribing-to-events`.

Sending ``SIGUSR2`` to a daemon process will then cause stack traces for
active Python threads to be written to the Apache error log::

    kill -USR2 <daemon-pid>

The daemon PIDs are visible in the Apache error log near startup
(``LogLevel wsgi:info`` or higher) and can also be listed via ``ps`` if
the daemon group is configured with ``display-name=`` (see
:doc:`../configuration-directives/WSGIDaemonProcess`).
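When there are several daemon processes, you can send the signal to
each of them from Python as well as with ``kill``. Below is a minimal
sketch, with ``signal_daemons`` a hypothetical helper. The PIDs come
from the error log or ``ps`` as described above, and the caller needs
permission to signal the daemon processes (typically root or the
daemon's own user)::

    import os
    import signal

    def signal_daemons(pids, sig=signal.SIGUSR2):
        """Send sig to each PID and return the PIDs that received it."""
        delivered = []
        for pid in pids:
            try:
                os.kill(pid, sig)
            except ProcessLookupError:
                # The process has exited (for example after a restart)
                # since its PID was noted; skip it.
                continue
            # PermissionError is deliberately not caught: it means the
            # caller lacks the rights to signal the daemon process.
            delivered.append(pid)
        return delivered

Passing ``sig=0`` sends no signal but still checks that each PID
exists, which can serve as a dry run.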

In embedded mode ``mod_wsgi.subscribe_signals`` cannot deliver callbacks
(see "Signal delivery is daemon-mode only" in
:doc:`subscribing-to-events`), so the trigger has to come from inside the
process. A long-lived background thread that polls the modification time
of a sentinel file is the standard fallback: touching the file from the
shell makes the next poll fire ``dump_stack_traces()``. Add the
following to your WSGI script file::

    import atexit
    import os
    import sys
    import threading

    from queue import Queue, Empty

    SENTINEL_FILE = '/tmp/dump-stack-traces.txt'

    _interval = 1.0
    _queue = Queue()
    _lock = threading.Lock()
    _running = False

    def _monitor():
        try:
            mtime = os.path.getmtime(SENTINEL_FILE)
        except OSError:
            mtime = None

        while True:
            try:
                current = os.path.getmtime(SENTINEL_FILE)
            except OSError:
                current = None

            # Any change, including creation or removal, triggers a dump.
            if current != mtime:
                mtime = current
                dump_stack_traces()

            # Wait for the poll interval, but exit as soon as _exiting()
            # posts to the queue at interpreter shutdown.
            try:
                return _queue.get(timeout=_interval)
            except Empty:
                pass

    def _exiting():
        _queue.put(True)

    def _start():
        global _running
        with _lock:
            if not _running:
                print(f'monitor (pid={os.getpid()}):'
                      f' Starting stack trace monitor.',
                      file=sys.stderr)
                _running = True
                threading.Thread(target=_monitor, daemon=True).start()
                atexit.register(_exiting)

    _start()

Once your WSGI script file has been loaded, touching the file
``/tmp/dump-stack-traces.txt`` will cause every Apache child process that
has the script imported to write its active Python stack traces to the
error log.

For either trigger, if you have multiple processes serving the
application and send the trigger to all of them at once, the dumps will
interleave in the Apache error log. ``# ProcessId:`` lines in each dump
identify which process the trace came from. For more separable output,
modify ``dump_stack_traces()`` to write each dump to a distinct file
under a chosen directory, with filenames that include the process ID and
a date/time.
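Below is a sketch of that variant, assuming a hypothetical dump
directory of ``/tmp/stack-dumps`` that is writable by the user the
daemon process runs as. ``dump_stack_traces_to_file`` is an
illustrative name. The output format matches ``dump_stack_traces()``,
except that the process ID is written once at the top of each file::

    import os
    import sys
    import time
    import traceback

    def dump_stack_traces_to_file(directory='/tmp/stack-dumps'):
        """Write a dump to <directory>/stacks-<pid>-<timestamp>.txt."""
        os.makedirs(directory, exist_ok=True)
        stamp = time.strftime('%Y%m%d-%H%M%S')
        path = os.path.join(directory, f'stacks-{os.getpid()}-{stamp}.txt')

        # Append mode: two dumps within the same second share one file
        # instead of overwriting each other.
        with open(path, 'a', encoding='utf-8') as fp:
            fp.write(f'# ProcessId: {os.getpid()}\n')
            for thread_id, stack in sys._current_frames().items():
                fp.write(f'\n# ThreadID: {thread_id}\n')
                for filename, lineno, name, line in traceback.extract_stack(stack):
                    fp.write(f'File: "{filename}", line {lineno}, in {name}\n')
                    if line:
                        fp.write(f'  {line.strip()}\n')
        return path

Either trigger can call this function in place of
``dump_stack_traces()``.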

An example of what one might expect to see from the above code is as
follows::

    # ProcessId: 666
    # ThreadID: 140234567890432
    File: "/usr/lib/python3.12/threading.py", line 1075, in _bootstrap
      self._bootstrap_inner()
    File: "/usr/lib/python3.12/threading.py", line 1115, in _bootstrap_inner
      self.run()
    File: "/usr/lib/python3.12/threading.py", line 1052, in run
      self._target(*self._args, **self._kwargs)
    File: "/srv/www/myproject/wsgi.py", line 72, in _monitor
      dump_stack_traces()
    File: "/srv/www/myproject/wsgi.py", line 47, in dump_stack_traces
      for filename, lineno, name, line in traceback.extract_stack(stack):

    # ProcessId: 666
    # ThreadID: 140234567890123
    File: "/srv/www/myproject/wsgi.py", line 21, in application
      return _application(environ, start_response)
    File: "/srv/www/myproject/venv/lib/python3.12/site-packages/django/core/handlers/wsgi.py", line 145, in __call__
      response = self.get_response(request)
    File: "/srv/www/myproject/venv/lib/python3.12/site-packages/django/contrib/sessions/middleware.py", line 60, in process_response
      request.session.save()
    File: "/srv/www/myproject/venv/lib/python3.12/site-packages/django/contrib/sessions/backends/db.py", line 80, in save
      obj.save(force_insert=must_create, using=using)
    File: "/srv/www/myproject/venv/lib/python3.12/site-packages/django/db/models/base.py", line 812, in save
      self.save_base(using=using, force_insert=force_insert, force_update=force_update)
    File: "/srv/www/myproject/venv/lib/python3.12/site-packages/django/db/models/base.py", line 880, in save_base
      updated = self._save_table(force_insert=force_insert, using=using)
    File: "/srv/www/myproject/venv/lib/python3.12/site-packages/django/db/models/base.py", line 1015, in _save_table
      results = self._do_insert(cls._base_manager, using, fields, returning_fields, raw)
    File: "/srv/www/myproject/venv/lib/python3.12/site-packages/django/db/models/sql/compiler.py", line 1517, in execute_sql
      cursor.execute(sql, params)
    File: "/srv/www/myproject/venv/lib/python3.12/site-packages/django/db/backends/utils.py", line 89, in execute
      return super().execute(sql, params)
    File: "/srv/www/myproject/venv/lib/python3.12/site-packages/debug_toolbar/panels/sql/tracking.py", line 230, in execute
      return self._record(self.cursor.execute, sql, params)
    File: "/srv/www/myproject/venv/lib/python3.12/site-packages/debug_toolbar/panels/sql/tracking.py", line 158, in _record
      stacktrace = tidy_stacktrace(reversed(get_stack()))
    File: "/srv/www/myproject/venv/lib/python3.12/site-packages/debug_toolbar/panels/sql/tracking.py", line 39, in tidy_stacktrace
      s_path = os.path.realpath(s[0])
    File: "/usr/lib/python3.12/posixpath.py", line 396, in realpath
      path, ok = _joinrealpath(filename[:0], filename, strict, {})
    File: "/usr/lib/python3.12/posixpath.py", line 426, in _joinrealpath
      st = os.lstat(newpath)

Note that one of the displayed threads is the thread doing the dumping
(here, the monitor thread). Its stack trace can be ignored.

The above recipe could be extended further with a WSGI middleware that
records details of each request from the WSGI environment, so that the
dump can also show the URL being handled by each thread. This may help
in working out whether problems are related to a specific URL.
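Below is a minimal sketch of that idea, where ``track_requests`` and
``format_stack_traces_with_urls`` are hypothetical names. It relies on
the keys of ``sys._current_frames()`` being the same thread
identifiers that ``threading.get_ident()`` returns::

    import sys
    import threading
    import traceback

    # Maps thread identifier -> URL path that thread is currently handling.
    _active_requests = {}

    def track_requests(application):
        """Record the URL each thread handles while inside application."""

        def wrapper(environ, start_response):
            thread_id = threading.get_ident()
            _active_requests[thread_id] = (
                environ.get('SCRIPT_NAME', '') + environ.get('PATH_INFO', ''))
            try:
                return application(environ, start_response)
            finally:
                # Cleared when the callable returns, so time the server spends
                # iterating a streamed response body is not covered.
                _active_requests.pop(thread_id, None)

        return wrapper

    def format_stack_traces_with_urls():
        """Like dump_stack_traces(), but label each thread with its URL."""
        lines = []
        for thread_id, stack in sys._current_frames().items():
            url = _active_requests.get(thread_id, '-')
            lines.append(f'\n# ThreadID: {thread_id} URL: {url}')
            for filename, lineno, name, line in traceback.extract_stack(stack):
                lines.append(f'File: "{filename}", line {lineno}, in {name}')
                if line:
                    lines.append(f'  {line.strip()}')
        return '\n'.join(lines)

The signal handler or sentinel-file monitor would then print
``format_stack_traces_with_urls()`` to ``sys.stderr`` instead of
calling ``dump_stack_traces()``. Only ``SCRIPT_NAME`` and
``PATH_INFO`` are recorded. The query string is left out in case it
carries sensitive values.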