diff options
Diffstat (limited to 'Documentation/admin-guide/reporting-issues.rst')
-rw-r--r-- | Documentation/admin-guide/reporting-issues.rst | 1631 |
1 files changed, 1631 insertions, 0 deletions
diff --git a/Documentation/admin-guide/reporting-issues.rst b/Documentation/admin-guide/reporting-issues.rst new file mode 100644 index 000000000000..07879d01fe68 --- /dev/null +++ b/Documentation/admin-guide/reporting-issues.rst @@ -0,0 +1,1631 @@ +.. SPDX-License-Identifier: (GPL-2.0+ OR CC-BY-4.0) +.. + If you want to distribute this text under CC-BY-4.0 only, please use 'The + Linux kernel developers' for author attribution and link this as source: + https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/Documentation/admin-guide/reporting-issues.rst +.. + Note: Only the content of this RST file as found in the Linux kernel sources + is available under CC-BY-4.0, as versions of this text that were processed + (for example by the kernel's build system) might contain content taken from + files which use a more restrictive license. + +.. important:: + + This document is being prepared to replace + 'Documentation/admin-guide/reporting-bugs.rst'. The main work is done and + you are already free to follow its instructions when reporting issues to the + Linux kernel developers. But keep in mind, below text still needs a few + finishing touches and review. It was merged to the Linux kernel sources at + this stage to make this process easier and increase the text's visibility. + + Any improvements for the text or other feedback is thus very much welcome. + Please send it to 'Thorsten Leemhuis <linux@leemhuis.info>' and 'Jonathan + Corbet <corbet@lwn.net>', ideally with 'Linux kernel mailing list (LKML) + <linux-kernel@vger.kernel.org>' and the 'Linux Kernel Documentation List + <linux-doc@vger.kernel.org>' in CC. + + Areas in the text that still need work or discussion contain a hint like this + which point out the remaining issues; all of them start with the word "FIXME" + to make them easy to find. + + +Reporting issues +++++++++++++++++ + + +The short guide (aka TL;DR) +=========================== + +If you're facing multiple issues with the Linux kernel at once, report each +separately to its developers. Try your best guess which kernel part might be +causing the issue. Check the :ref:`MAINTAINERS <maintainers>` file for how its +developers expect to be told about issues. Note, it's rarely +`bugzilla.kernel.org <https://bugzilla.kernel.org/>`_, as in almost all cases +the report needs to be sent by email! + +Check the destination thoroughly for existing reports; also search the LKML +archives and the web. Join existing discussion if you find matches. If you +don't find any, install `the latest Linux mainline kernel +<https://kernel.org/>`_. Make sure it's vanilla, thus is not patched or using +add-on kernel modules. Also ensure the kernel is running in a healthy +environment and is not already tainted before the issue occurs. + +If you can reproduce your issue with the mainline kernel, send a report to the +destination you determined earlier. Make sure it includes all relevant +information, which in case of a regression should mention the change that's +causing it which can often can be found with a bisection. Also ensure the +report reaches all people that need to know about it, for example the security +team, the stable maintainers or the developers of the patch that causes a +regression. Once the report is out, answer any questions that might be raised +and help where you can. That includes keeping the ball rolling: every time a +new rc1 mainline kernel is released, check if the issue is still happening +there and attach a status update to your initial report. + +If you can not reproduce the issue with the mainline kernel, consider sticking +with it; if you'd like to use an older version line and want to see it fixed +there, first make sure it's still supported. Install its latest release as +vanilla kernel. If you cannot reproduce the issue there, try to find the commit +that fixed it in mainline or any discussion preceding it: those will often +mention if backporting is planed or considered too complex. If backporting was +not discussed, ask if it's in the cards. In case you don't find any commits or +a preceding discussion, see the Linux-stable mailing list archives for existing +reports, as it might be a regression specific to the version line. If it is, +report it like you would report a problem in mainline (including the +bisection). + +If you reached this point without a solution, ask for advice one the +subsystem's mailing list. + + +Step-by-step guide how to report issues to the kernel maintainers +================================================================= + +The above TL;DR outlines roughly how to report issues to the Linux kernel +developers. It might be all that's needed for people already familiar with +reporting issues to Free/Libre & Open Source Software (FLOSS) projects. For +everyone else there is this section. It is more detailed and uses a +step-by-step approach. It still tries to be brief for readability and leaves +out a lot of details; those are described below the step-by-step guide in a +reference section, which explains each of the steps in more detail. + +Note: this section covers a few more aspects than the TL;DR and does things in +a slightly different order. That's in your interest, to make sure you notice +early if an issue that looks like a Linux kernel problem is actually caused by +something else. These steps thus help to ensure the time you invest in this +process won't feel wasted in the end: + + * Stop reading this document and report the problem to your vendor instead, + unless you are running the latest mainline kernel already or are willing to + install it. This kernel must not be modified or enhanced in any way, and + thus be considered 'vanilla'. + + * See if the issue you are dealing with qualifies as regression, security + issue, or a really severe problem: those are 'issues of high priority' that + need special handling in some steps that are about to follow. + + * Check if your kernel was 'tainted' when the issue occurred, as the event + that made the kernel set this flag might be causing the issue you face. + + * Locate the driver or kernel subsystem that seems to be causing the issue. + Find out how and where its developers expect reports. Note: most of the + time this won't be bugzilla.kernel.org, as issues typically need to be sent + by mail to a maintainer and a public mailing list. + + * Search the archives of the bug tracker or mailing list in question + thoroughly for reports that might match your issue. Also check if you find + something with your favorite internet search engine or in the Linux Kernel + Mailing List (LKML) archives. If you find anything, join the discussion + instead of sending a new report. + + * Create a fresh backup and put system repair and restore tools at hand. + + * Ensure your system does not enhance its kernels by building additional + kernel modules on-the-fly, which solutions like DKMS might be doing locally + without your knowledge. + + * Make sure it's not the kernel's surroundings that are causing the issue + you face. + + * Write down coarsely how to reproduce the issue. If you deal with multiple + issues at once, create separate notes for each of them and make sure they + work independently on a freshly booted system. That's needed, as each issue + needs to get reported to the kernel developers separately, unless they are + strongly entangled. + +After these preparations you'll now enter the main part: + + * Install the latest Linux mainline kernel: that's where all issues get + fixed first, because it's the version line the kernel developers mainly + care about. Testing and reporting with the latest Linux stable kernel can + be an acceptable alternative in some situations, for example during the + merge window; but during that period you might want to suspend your efforts + till its end anyway. + + * Ensure the kernel you just installed does not 'taint' itself when + running. + + * Reproduce the issue with the kernel you just installed. If it doesn't show + up there, head over to the instructions for issues only happening with + stable and longterm kernels. + + * Optimize your notes: try to find and write the most straightforward way to + reproduce your issue. Make sure the end result has all the important + details, and at the same time is easy to read and understand for others + that hear about it for the first time. And if you learned something in this + process, consider searching again for existing reports about the issue. + + * If the failure includes a stack dump, like an Oops does, consider decoding + it to find the offending line of code. + + * If your problem is a regression, try to narrow down when the issue was + introduced as much as possible. + + * Start to compile the report by writing a detailed description about the + issue. Always mention a few things: the latest kernel version you installed + for reproducing, the Linux Distribution used, and your notes on how to + reproduce the issue. Ideally, make the kernel's build configuration + (.config) and the output from ``dmesg`` available somewhere on the net and + link to it. Include or upload all other information that might be relevant, + like the output/screenshot of an Oops or the output from ``lspci``. Once + you wrote this main part, insert a normal length paragraph on top of it + outlining the issue and the impact quickly. On top of this add one sentence + that briefly describes the problem and gets people to read on. Now give the + thing a descriptive title or subject that yet again is shorter. Then you're + ready to send or file the report like the MAINTAINERS file told you, unless + you are dealing with one of those 'issues of high priority': they need + special care which is explained in 'Special handling for high priority + issues' below. + + * Wait for reactions and keep the thing rolling until you can accept the + outcome in one way or the other. Thus react publicly and in a timely manner + to any inquiries. Test proposed fixes. Do proactive testing: retest with at + least every first release candidate (RC) of a new mainline version and + report your results. Send friendly reminders if things stall. And try to + help yourself, if you don't get any help or if it's unsatisfying. + + +Reporting issues only occurring in older kernel version lines +------------------------------------------------------------- + +This section is for you, if you tried the latest mainline kernel as outlined +above, but failed to reproduce your issue there; at the same time you want to +see the issue fixed in older version lines or a vendor kernel that's regularly +rebased on new stable or longterm releases. If that case follow these steps: + + * Prepare yourself for the possibility that going through the next few steps + might not get the issue solved in older releases: the fix might be too big + or risky to get backported there. + + * Check if the kernel developers still maintain the Linux kernel version + line you care about: go to the front page of kernel.org and make sure it + mentions the latest release of the particular version line without an + '[EOL]' tag. + + * Check the archives of the Linux stable mailing list for existing reports. + + * Install the latest release from the particular version line as a vanilla + kernel. Ensure this kernel is not tainted and still shows the problem, as + the issue might have already been fixed there. + + * Search the Linux kernel version control system for the change that fixed + the issue in mainline, as its commit message might tell you if the fix is + scheduled for backporting already. If you don't find anything that way, + search the appropriate mailing lists for posts that discuss such an issue + or peer-review possible fixes; then check the discussions if the fix was + deemed unsuitable for backporting. If backporting was not considered at + all, join the newest discussion, asking if it's in the cards. + + * Check if you're dealing with a regression that was never present in + mainline by installing the first release of the version line you care + about. If the issue doesn't show up with it, you basically need to report + the issue with this version like you would report a problem with mainline + (see above). This ideally includes a bisection followed by a search for + existing reports on the net; with the help of the subject and the two + relevant commit-ids. If that doesn't turn up anything, write the report; CC + or forward the report to the stable maintainers, the stable mailing list, + and those who authored the change. Include the shortened commit-id if you + found the change that causes it. + + * One of the former steps should lead to a solution. If that doesn't work + out, ask the maintainers for the subsystem that seems to be causing the + issue for advice; CC the mailing list for the particular subsystem as well + as the stable mailing list. + + +Reference section: Reporting issues to the kernel maintainers +============================================================= + +The detailed guides above outline all the major steps in brief fashion, which +should be enough for most people. But sometimes there are situations where even +experienced users might wonder how to actually do one of those steps. That's +what this section is for, as it will provide a lot more details on each of the +above steps. Consider this as reference documentation: it's possible to read it +from top to bottom. But it's mainly meant to skim over and a place to look up +details how to actually perform those steps. + +A few words of general advice before digging into the details: + + * The Linux kernel developers are well aware this process is complicated and + demands more than other FLOSS projects. We'd love to make it simpler. But + that would require work in various places as well as some infrastructure, + which would need constant maintenance; nobody has stepped up to do that + work, so that's just how things are for now. + + * A warranty or support contract with some vendor doesn't entitle you to + request fixes from developers in the upstream Linux kernel community: such + contracts are completely outside the scope of the Linux kernel, its + development community, and this document. That's why you can't demand + anything such a contract guarantees in this context, not even if the + developer handling the issue works for the vendor in question. If you want + to claim your rights, use the vendor's support channel instead. When doing + so, you might want to mention you'd like to see the issue fixed in the + upstream Linux kernel; motivate them by saying it's the only way to ensure + the fix in the end will get incorporated in all Linux distributions. + + * If you never reported an issue to a FLOSS project before you should consider + reading `How to Report Bugs Effectively + <https://www.chiark.greenend.org.uk/~sgtatham/bugs.html>`_, `How To Ask + Questions The Smart Way + <http://www.catb.org/esr/faqs/smart-questions.html>`_, and `How to ask good + questions <https://jvns.ca/blog/good-questions/>`_. + +With that off the table, find below the details on how to properly report +issues to the Linux kernel developers. + + +Make sure you're using the upstream Linux kernel +------------------------------------------------ + + *Stop reading this document and report the problem to your vendor instead, + unless you are running the latest mainline kernel already or are willing to + install it. This kernel must not be modified or enhanced in any way, and + thus be considered 'vanilla'.* + +Like most programmers, Linux kernel developers don't like to spend time dealing +with reports for issues that don't even happen with the source code they +maintain: it's just a waste everybody's time, yours included. That's why you +later will have to test your issue with the latest 'vanilla' kernel: a kernel +that was build using the Linux sources taken straight from `kernel.org +<https://kernel.org/>`_ and not modified or enhanced in any way. + +Almost all kernels used in devices (Computers, Laptops, Smartphones, Routers, +…) and most kernels shipped by Linux distributors are ancient from the point of +kernel development and heavily modified. They thus do not qualify for reporting +an issue to the Linux kernel developers: the issue you face with such a kernel +might be fixed already or caused by the changes or additions, even if they look +small or totally unrelated. That's why issues with such kernels need to be +reported to the vendor that distributed it. Its developers should look into the +report and, in case it turns out to be an upstream issue, fix it directly +upstream or report it there. In practice that sometimes does not work out. If +that the case, you might want to circumvent the vendor by installing the latest +mainline kernel yourself and reporting the issue as outlined in this document; +just make sure to use really fresh kernel (see below). + + +.. note:: + + FIXME: Should we accept reports for issues with kernel images that are pretty + close to vanilla? But when are they close enough and how to put that line in + words? Maybe something like this? + + *Note: Some Linux kernel developers accept reports from vendor kernels that + are known to be close to upstream. That for example is often the case for + the kernels that Debian GNU/Linux Sid or Fedora Rawhide ship, which are + normally following mainline closely and carry only a few patches. So a + report with one of these might be accepted by the developers that need to + handle it. But if they do, depends heavily on the individual developers and + the issue at hand. That's why installing a mainline vanilla kernel is the + safe bet.* + + *Arch Linux, other Fedora releases, and openSUSE Tumbleweed often use quite + recent stable kernels that are pretty close to upstream, too. Some + developers accept bugs from them as well. But note that you normally should + avoid stable kernels for reporting issues and use a mainline kernel instead + (see below).* + + Are there any other major Linux distributions that should be mentioned here? + + +Issue of high priority? +----------------------- + + *See if the issue you are dealing with qualifies as regression, security + issue, or a really severe problem: those are 'issues of high priority' that + need special handling in some steps that are about to follow.* + +Linus Torvalds and the leading Linux kernel developers want to see some issues +fixed as soon as possible, hence there are 'issues of high priority' that get +handled slightly differently in the reporting process. Three type of cases +qualify: regressions, security issues, and really severe problems. + +You deal with a 'regression' if something that worked with an older version of +the Linux kernel does not work with a newer one or somehow works worse with it. +It thus is a regression when a WiFi driver that did a fine job with Linux 5.7 +somehow misbehaves with 5.8 or doesn't work at all. It's also a regression if +an application shows erratic behavior with a newer kernel, which might happen +due to incompatible changes in the interface between the kernel and the +userland (like procfs and sysfs). Significantly reduced performance or +increased power consumption also qualify as regression. But keep in mind: the +new kernel needs to be built with a configuration that is similar to the one +from the old kernel (see below how to achieve that). That's because the kernel +developers sometimes can not avoid incompatibilities when implementing new +features; but to avoid regressions such features have to be enabled explicitly +during build time configuration. + +What qualifies as security issue is left to your judgment. Consider reading +'Documentation/admin-guide/security-bugs.rst' before proceeding, as it +provides additional details how to best handle security issues. + +An issue is a 'really severe problem' when something totally unacceptably bad +happens. That's for example the case when a Linux kernel corrupts the data it's +handling or damages hardware it's running on. You're also dealing with a severe +issue when the kernel suddenly stops working with an error message ('kernel +panic') or without any farewell note at all. Note: do not confuse a 'panic' (a +fatal error where the kernel stop itself) with a 'Oops' (a recoverable error), +as the kernel remains running after the latter. + + +Check 'taint' flag +------------------ + + *Check if your kernel was 'tainted' when the issue occurred, as the event + that made the kernel set this flag might be causing the issue you face.* + +The kernel marks itself with a 'taint' flag when something happens that might +lead to follow-up errors that look totally unrelated. The issue you face might +be such an error if your kernel is tainted. That's why it's in your interest to +rule this out early before investing more time into this process. This is the +only reason why this step is here, as this process later will tell you to +install the latest mainline kernel; you will need to check the taint flag again +then, as that's when it matters because it's the kernel the report will focus +on. + +On a running system is easy to check if the kernel tainted itself: if ``cat +/proc/sys/kernel/tainted`` returns '0' then the kernel is not tainted and +everything is fine. Checking that file is impossible in some situations; that's +why the kernel also mentions the taint status when it reports an internal +problem (a 'kernel bug'), a recoverable error (a 'kernel Oops') or a +non-recoverable error before halting operation (a 'kernel panic'). Look near +the top of the error messages printed when one of these occurs and search for a +line starting with 'CPU:'. It should end with 'Not tainted' if the kernel was +not tainted when it noticed the problem; it was tainted if you see 'Tainted:' +followed by a few spaces and some letters. + +If your kernel is tainted, study 'Documentation/admin-guide/tainted-kernels.rst' +to find out why. Try to eliminate the reason. Often it's caused by one these +three things: + + 1. A recoverable error (a 'kernel Oops') occurred and the kernel tainted + itself, as the kernel knows it might misbehave in strange ways after that + point. In that case check your kernel or system log and look for a section + that starts with this:: + + Oops: 0000 [#1] SMP + + That's the first Oops since boot-up, as the '#1' between the brackets shows. + Every Oops and any other problem that happens after that point might be a + follow-up problem to that first Oops, even if both look totally unrelated. + Rule this out by getting rid of the cause for the first Oops and reproducing + the issue afterwards. Sometimes simply restarting will be enough, sometimes + a change to the configuration followed by a reboot can eliminate the Oops. + But don't invest too much time into this at this point of the process, as + the cause for the Oops might already be fixed in the newer Linux kernel + version you are going to install later in this process. + + 2. Your system uses a software that installs its own kernel modules, for + example Nvidia's proprietary graphics driver or VirtualBox. The kernel + taints itself when it loads such module from external sources (even if + they are Open Source): they sometimes cause errors in unrelated kernel + areas and thus might be causing the issue you face. You therefore have to + prevent those modules from loading when you want to report an issue to the + Linux kernel developers. Most of the time the easiest way to do that is: + temporarily uninstall such software including any modules they might have + installed. Afterwards reboot. + + 3. The kernel also taints itself when it's loading a module that resides in + the staging tree of the Linux kernel source. That's a special area for + code (mostly drivers) that does not yet fulfill the normal Linux kernel + quality standards. When you report an issue with such a module it's + obviously okay if the kernel is tainted; just make sure the module in + question is the only reason for the taint. If the issue happens in an + unrelated area reboot and temporarily block the module from being loaded + by specifying ``foo.blacklist=1`` as kernel parameter (replace 'foo' with + the name of the module in question). + + +Locate kernel area that causes the issue +---------------------------------------- + + *Locate the driver or kernel subsystem that seems to be causing the issue. + Find out how and where its developers expect reports. Note: most of the + time this won't be bugzilla.kernel.org, as issues typically need to be sent + by mail to a maintainer and a public mailing list.* + +It's crucial to send your report to the right people, as the Linux kernel is a +big project and most of its developers are only familiar with a small subset of +it. Quite a few programmers for example only care for just one driver, for +example one for a WiFi chip; its developer likely will only have small or no +knowledge about the internals of remote or unrelated "subsystems", like the TCP +stack, the PCIe/PCI subsystem, memory management or file systems. + +Problem is: the Linux kernel lacks a central bug tracker where you can simply +file your issue and make it reach the developers that need to know about it. +That's why you have to find the right place and way to report issues yourself. +You can do that with the help of a script (see below), but it mainly targets +kernel developers and experts. For everybody else the MAINTAINERS file is the +better place. + +How to read the MAINTAINERS file +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +To illustrate how to use the :ref:`MAINTAINERS <maintainers>` file, lets assume +the WiFi in your Laptop suddenly misbehaves after updating the kernel. In that +case it's likely an issue in the WiFi driver. Obviously it could also be some +code it builds upon, but unless you suspect something like that stick to the +driver. If it's really something else, the driver's developers will get the +right people involved. + +Sadly, there is no way to check which code is driving a particular hardware +component that is both universal and easy. + +In case of a problem with the WiFi driver you for example might want to look at +the output of ``lspci -k``, as it lists devices on the PCI/PCIe bus and the +kernel module driving it:: + + [user@something ~]$ lspci -k + [...] + 3a:00.0 Network controller: Qualcomm Atheros QCA6174 802.11ac Wireless Network Adapter (rev 32) + Subsystem: Bigfoot Networks, Inc. Device 1535 + Kernel driver in use: ath10k_pci + Kernel modules: ath10k_pci + [...] + +But this approach won't work if your WiFi chip is connected over USB or some +other internal bus. In those cases you might want to check your WiFi manager or +the output of ``ip link``. Look for the name of the problematic network +interface, which might be something like 'wlp58s0'. This name can be used like +this to find the module driving it:: + + [user@something ~]$ realpath --relative-to=/sys/module/ /sys/class/net/wlp58s0/device/driver/module + ath10k_pci + +In case tricks like these don't bring you any further, try to search the +internet on how to narrow down the driver or subsystem in question. And if you +are unsure which it is: just try your best guess, somebody will help you if you +guessed poorly. + +Once you know the driver or subsystem, you want to search for it in the +MAINTAINERS file. In the case of 'ath10k_pci' you won't find anything, as the +name is too specific. Sometimes you will need to search on the net for help; +but before doing so, try a somewhat shorted or modified name when searching the +MAINTAINERS file, as then you might find something like this:: + + QUALCOMM ATHEROS ATH10K WIRELESS DRIVER + Mail: A. Some Human <shuman@example.com> + Mailing list: ath10k@lists.infradead.org + Status: Supported + Web-page: https://wireless.wiki.kernel.org/en/users/Drivers/ath10k + SCM: git git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git + Files: drivers/net/wireless/ath/ath10k/ + +Note: the line description will be abbreviations, if you read the plain +MAINTAINERS file found in the root of the Linux source tree. 'Mail:' for +example will be 'M:', 'Mailing list:' will be 'L', and 'Status:' will be 'S:'. +A section near the top of the file explains these and other abbreviations. + +First look at the line 'Status'. Ideally it should be 'Supported' or +'Maintained'. If it states 'Obsolete' then you are using some outdated approach +that was replaced by a newer solution you need to switch to. Sometimes the code +only has someone who provides 'Odd Fixes' when feeling motivated. And with +'Orphan' you are totally out of luck, as nobody takes care of the code anymore. +That only leaves these options: arrange yourself to live with the issue, fix it +yourself, or find a programmer somewhere willing to fix it. + +After checking the status, look for a line starting with 'bugs:': it will tell +you where to find a subsystem specific bug tracker to file your issue. The +example above does not have such a line. That is the case for most sections, as +Linux kernel development is completely driven by mail. Very few subsystems use +a bug tracker, and only some of those rely on bugzilla.kernel.org. + + +.. note:: + + FIXME: The old text took a totally different approach to bugzilla.kernel.org, + as it mentions it as the place to file issue for people that don't known how + to contact the appropriate people. The new one mentions it rarely; and when + it does like here, it warns users that it's often the wrong place to go. + + This approach was chosen as the main author of this document noticed quite a + few users (or even a lot?) get no reply to the bugs they file in bugzilla. + That's kind of expected, as quite a few (many? most?) of the maintainers + don't even get notified when reports for their subsystem get filed there. And + not getting a single reply to report is something that is just annoying for + users and might make them angry. Improving bugzilla.k.o would be an option, + but on the kernel and maintainers summit 2017 it was agreed on to first go + this route (sorry it took so long): it's easier to achieve and less + controversial, as putting additional burden on already overworked maintainers + is unlikely to get well received. + + +In this and many other cases you thus have to look for lines starting with +'Mail:' instead. Those mention the name and the email addresses for the +maintainers of the particular code. Also look for a line starting with 'Mailing +list:', which tells you the public mailing list where the code is developed. +Your report later needs to go by mail to those addresses. Additionally, for all +issue reports sent by email, make sure to add the Linux Kernel Mailing List +(LKML) <linux-kernel@vger.kernel.org> to CC. Don't omit either of the mailing +lists when sending your issue report by mail later! Maintainers are busy people +and might leave some work for other developers on the subsystem specific list; +and LKML is important to have one place where all issue reports can be found. + + +.. note:: + + FIXME: Above section tells users to always CC LKML. These days it's a kind of + "catch-all" list anyway, which nearly nobody seems to follow closely. So it + seems appropriate to go "all in" and make people send their reports here, + too, as everything (reports, fixes, ...) then can be found in one place (at + least for all reports sent by mail and all subsystems that CC LKML). + + Related: Should we create mailing list like 'linux-issues@vger.kernel.org' + and tell users above to always CC it when reporting issues? Then there would + be one central place reporters could search for existing reports (at least + for issues reported by mail) without getting regular LKML traffic mixed into + the results. + + +Finding the maintainers with the help of a script +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +For people that have the Linux sources at hand there is a second option to find +the proper place to report: the script 'scripts/get_maintainer.pl' which tries +to find all people to contact. It queries the MAINTAINERS file and needs to be +called with a path to the source code in question. For drivers compiled as +module if often can be found with a command like this:: + + $ modinfo ath10k_pci | grep filename | sed 's!/lib/modules/.*/kernel/!!; s!filename:!!; s!\.ko\(\|\.xz\)!!' + drivers/net/wireless/ath/ath10k/ath10k_pci.ko + +Pass parts of this to the script:: + + $ ./scripts/get_maintainer.pl -f drivers/net/wireless/ath/ath10k* + Some Human <shuman@example.com> (supporter:QUALCOMM ATHEROS ATH10K WIRELESS DRIVER) + Another S. Human <asomehuman@example.com> (maintainer:NETWORKING DRIVERS) + ath10k@lists.infradead.org (open list:QUALCOMM ATHEROS ATH10K WIRELESS DRIVER) + linux-wireless@vger.kernel.org (open list:NETWORKING DRIVERS (WIRELESS)) + netdev@vger.kernel.org (open list:NETWORKING DRIVERS) + linux-kernel@vger.kernel.org (open list) + +Don't sent your report to all of them. Send it to the maintainers, which the +script calls "supporter:"; additionally CC the most specific mailing list for +the code as well as the Linux Kernel Mailing List (LKML). In this case you thus +would need to send the report to 'Some Human <shuman@example.com>' with +'ath10k@lists.infradead.org' and 'linux-kernel@vger.kernel.org' in CC. + +Note: in case you cloned the Linux sources with git you might want to call +``get_maintainer.pl`` a second time with ``--git``. The script then will look +at the commit history to find which people recently worked on the code in +question, as they might be able to help. But use these results with care, as it +can easily send you in a wrong direction. That for example happens quickly in +areas rarely changed (like old or unmaintained drivers): sometimes such code is +modified during tree-wide cleanups by developers that do not care about the +particular driver at all. + + +Search for existing reports +--------------------------- + + *Search the archives of the bug tracker or mailing list in question + thoroughly for reports that might match your issue. Also check if you find + something with your favorite internet search engine or in the Linux Kernel + Mailing List (LKML) archives. If you find anything, join the discussion + instead of sending a new report.* + +Reporting an issue that someone else already brought forward is often a waste +of time for everyone involved, especially you as the reporter. So it's in your +own interest to thoroughly check if somebody reported the issue already. Thus +do not hurry with this step of the reporting process. Spending 30 to 60 minutes +or even more time can save you and others quite a lot of time and trouble. + +The best place to search is the bug tracker or the mailing list where your +report needs to be filed. You'll find quite a few of those lists on +`lore.kernel.org <https://lore.kernel.org/>`_, but some are hosted in +different places. That for example is the case for the ath10k WiFi driver used +as example in the previous step. But you'll often find the archives for these +lists easily on the net. Searching for 'archive ath10k@lists.infradead.org' for +example will quickly lead you to the `Info page for the ath10k mailing list +<https://lists.infradead.org/mailman/listinfo/ath10k>`_, which at the top links +to its `list archives <https://lists.infradead.org/pipermail/ath10k/>`_. + +Sadly this and quite a few other lists miss a way to search the archives. In +those cases use a regular internet search engine and add something like +'site:lists.infradead.org/pipermail/ath10k/' to your search terms, which limits +the results to the archives at that URL. + +Additionally, search the internet and the `Linux Kernel Mailing List (LKML) +archives <https://lore.kernel.org/lkml/>`_, as maybe the real culprit might be +in some other subsystem. Searching in `bugzilla.kernel.org +<https://bugzilla.kernel.org/>`_ might also be a good idea, but if you find +anything there keep in mind: most subsystems expect reports in different +places, hence those you find there might have not even reached the people +responsible for the subsystem in question. Nevertheless, the data there might +provide valuable insights. + +If you get flooded with results consider telling your search engine to limit +search timeframe to the past month or year. And wherever you search, make sure +to use good search terms; vary them a few times, too. While doing so try to +look at the issue from the perspective of someone else: that will help you to +come up with other words to use as search terms. Also make sure not to use too +many search terms at once. Remember to search with and without information like +the name of the kernel driver or the name of the affected hardware component. +But its exact brand name (say 'ASUS Red Devil Radeon RX 5700 XT Gaming OC') +often is not much helpful, as it is too specific. Instead try search terms like +the model line (Radeon 5700 or Radeon 5000) and the code name of the main chip +('Navi' or 'Navi10') with and without its manufacturer ('AMD'). + +In case you find an existing report about your issue, join the discussion, as +you might be able to provide valuable additional information. That can be +important even when a fix is prepared or in its final stages already, as +developers might look for people that can provide additional information or +test a proposed fix. Jump to the section 'Duties after the report went out' for +details on how to get properly involved. + + +Prepare for emergencies +----------------------- + + *Create a fresh backup and put system repair and restore tools at hand.* + +Reminder, you are dealing with computers, which sometimes do unexpected things, +especially if you fiddle with crucial parts like the kernel of its operating +system. That's what you are about to do in this process. Thus, make sure to +create a fresh backup; also ensure you have all tools at hand to repair or +reinstall the operating system as well as everything you need to restore the +backup. + + +Make sure your kernel doesn't get enhanced +------------------------------------------ + + *Ensure your system does not enhance its kernels by building additional + kernel modules on-the-fly, which solutions like DKMS might be doing locally + without your knowledge.* + +Your kernel must be 'vanilla' when reporting an issue, but stops being pure as +soon as it loads a kernel module not built from the sources used to compile the +kernel image itself. That's why you need to ensure your Linux kernel stays +vanilla by removing or disabling mechanisms like akmods and DKMS: those might +build additional kernel modules automatically, for example when your boot into +a newly installed Linux kernel the first time. Reboot after removing them and +any modules they installed. + +Note, you might not be aware that your system is using one of these solutions: +they often get set up silently when you install Nvidia's proprietary graphics +driver, VirtualBox, or other software that requires a some support from a +module not part of the Linux kernel. That why your might need to uninstall the +packages with such software to get rid of any 3rd party kernel module. + + +Ensure a healthy environment +---------------------------- + + *Make sure it's not the kernel's surroundings that are causing the issue + you face.* + +Problems that look a lot like a kernel issue are sometimes caused by build or +runtime environment. It's hard to rule out that problem completely, but you +should minimize it: + + * Use proven tools when building your kernel, as bugs in the compiler or the + binutils can cause the resulting kernel to misbehave. + + * Ensure your computer components run within their design specifications; + that's especially important for the main processor, the main memory, and the + motherboard. Therefore, stop undervolting or overclocking when facing a + potential kernel issue. + + * Try to make sure it's not faulty hardware that is causing your issue. Bad + main memory for example can result in a multitude of issues that will + manifest itself in problems looking like kernel issues. + + * If you're dealing with a filesystem issue, you might want to check the file + system in question with ``fsck``, as it might be damaged in a way that leads + to unexpected kernel behavior. + + * When dealing with a regression, make sure it's not something else that + changed in parallel to updating the kernel. The problem for example might be + caused by other software that was updated at the same time. It can also + happen that a hardware component coincidentally just broke when you rebooted + into a new kernel for the first time. Updating the systems BIOS or changing + something in the BIOS Setup can also lead to problems that on look a lot + like a kernel regression. + + +Document how to reproduce issue +------------------------------- + + *Write down coarsely how to reproduce the issue. If you deal with multiple + issues at once, create separate notes for each of them and make sure they + work independently on a freshly booted system. That's needed, as each issue + needs to get reported to the kernel developers separately, unless they are + strongly entangled.* + +If you deal with multiple issues at once, you'll have to report each of them +separately, as they might be handled by different developers. Describing +various issues in one report also makes it quite difficult for others to tear +it apart. Hence, only combine issues in one report if they are very strongly +entangled. + +Additionally, during the reporting process you will have to test if the issue +happens with other kernel versions. Therefore, it will make your work easier if +you know exactly how to reproduce an issue quickly on a freshly booted system. + +Note: it's often fruitless to report issues that only happened once, as they +might be caused by a bit flip due to cosmic radiation. That's why you should +try to rule that out by reproducing the issue before going further. Feel free +to ignore this advice if you are |