| Age | Commit message (Collapse) | Author | Files | Lines |
|
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of small fixes. It became a bit larger than wished, but
all of them are device-specific small fixes, and it should be still
fairly safe to take at the last minute.
Included are a few quirks and fixes for Intel, AMD, HD-audio, and
USB-audio, as well as a race fix in aloop driver and corrections of
Cirrus firmware kunit test"
* tag 'sound-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/realtek: Enable headset mic for Acer Nitro 5
ASoC: fsl_xcvr: fix missing lock in fsl_xcvr_mode_put()
ASoC: dt-bindings: ti,tlv320aic3x: Add compatible string ti,tlv320aic23
ASoC: amd: fix memory leak in acp3x pdm dma ops
ALSA: usb-audio: fix broken logic in snd_audigy2nx_led_update()
ALSA: aloop: Fix racy access at PCM trigger
ASoC: rt1320: fix intermittent no-sound issue
ASoC: SOF: Intel: use hdev->info.link_mask directly
firmware: cs_dsp: rate-limit log messages in KUnit builds
ASoC: amd: yc: Add quirk for HP 200 G2a 16
ASoC: cs42l43: Correct handling of 3-pole jack load detection
ASoC: Intel: sof_es8336: Add DMI quirk for Huawei BOD-WXX9
ASoC: sof_sdw: Add a quirk for Lenovo laptop using sidecar amps with cs42l43
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab fix from Vlastimil Babka:
"A stable fix for memory allocation profiling tag not being cleared
when aborting an allocation due to memcg charge failure (Hao Ge)"
* tag 'slab-for-6.19-rc8-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
mm/slab: Add alloc_tagging_slab_free_hook for memcg_alloc_abort_single
|
|
Larysa Zaremba says:
====================
Fix some corner cases in xskxceiver
While working on XDP and AF_XDP support for ixgbevf driver,
I came across two distinct problems that caused tests to fail
when they shouldn't have.
====================
Link: https://patch.msgid.link/20260203155103.2305816-1-larysa.zaremba@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The issue occurs in TOO_MANY_FRAGS test case when xdp_zc_max_segs is set to
an odd number.
TOO_MANY_FRAGS test case contains an invalid packet consisting of
(xdp_zc_max_segs) frags. Every frag, even the last one has XDP_PKT_CONTD
flag set. This packet is expected to be dropped. After that, there is a
valid linear packet, which is expected to be received back.
Once (xdp_zc_max_segs) is an odd number, the last packet cannot be
received, if packet forwarding between Rx and Tx interfaces relies on
the ethernet header, e.g. checks for ETH_P_LOOPBACK. Packet is malformed,
if all traffic is looped.
Turns out, sending function processes multiple invalid frags as if they
were in 2-frag packets. So once the invalid mbuf packet contains an odd
number of those, the valid packet after gets paired with the previous
invalid descriptor, and hence does not get an ethernet header generated, so
it is either dropped or malformed.
Make invalid packets in verbatim mode always have only a single frag. For
such packets, number of frags is otherwise meaningless, as descriptor flags
are pre-configured in verbatim mode and packet data is not generated for
invalid descriptors.
Fixes: 697604492b64 ("selftests/xsk: add invalid descriptor test for multi-buffer")
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Link: https://lore.kernel.org/r/20260203155103.2305816-3-larysa.zaremba@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Referenced commit reduced the scope of the variable pkt, so now it has to
be reinitialized via pkt_stream_get_next_rx_pkt(), which also increments
some counters. When the packet is interrupted by the batch ending, pkt
stream therefore proceeds to the next packet, while xsk ring still contains
the previous one, this results in a pkt_nb mismatch.
Decrement the affected counters when packet is interrupted.
Fixes: 8913e653e9b8 ("selftests/xsk: Iterate over all the sockets in the receive pkts function")
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Link: https://lore.kernel.org/r/20260203155103.2305816-2-larysa.zaremba@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The dma-coherent property is used to indicate a device is capable of
coherent DMA operations. On i.MX952, one of EDMA devices support such
feature, in order to support the EDMA device, the memory needs to be
allocated from the DMA device.
Make this driver to support both non dma-coherent and dma-coherent dma
engine.
Remove dma coerce_mask_and coherent() because DMA provider already set it
according to its capability.
Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260206014805.3897764-5-shengjiu.wang@nxp.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Add a compatible string, clock mapping table and enable the option
'start_before_dma' to support ASRC on the i.MX952 platform.
The clock mapping table is to map the clock sources on i.MX952 to the
clock ids in the driver, the clock ids are for all the clock sources on
all supported platforms.
Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
Reviewed-by: Daniel Baluta <daniel.baluta@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260206014805.3897764-4-shengjiu.wang@nxp.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
There is a limitation on i.MX952 that dma request is not cleared at the
end of conversion with dma slave mode. Which causes sample is dropped
from the input fifo on the second time if dma is triggered before the
client device and EDMA may copy wrong data from output fifo as the output
fifo is not ready in the beginning.
The solution is to trigger asrc before dma on i.MX952, and add delay to
wait output data is generated then start the EDMA for output, otherwise
the m2m function has noise issues.
So add an option to start ASRC first for M2M before ASRC is enabled on
i.MX952.
Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
Reviewed-by: Daniel Baluta <daniel.baluta@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260206014805.3897764-3-shengjiu.wang@nxp.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Add new compatible string 'fsl,imx952-asrc' for i.MX952 platform,
below are the differences that make this ASRC not fallback compatible
with other platforms.
1) There is a power domain on i.MX952 for the wakeupmix system where
ASRC is in. But it is enabled by default, ASRC device don't need
to enable it, so it is optional for i.MX952.
2) The clock sources of ASRC are different on i.MX952.
3) There is a limitation on i.MX952 that DMA request is not cleared at the
end of conversion with dma slave mode. Which causes sample is dropped from
the input fifo on the second time if DMA is triggered before the client
device and DMA may copy wrong data from output fifo as the output fifo is
not ready in the beginning. So there is specially handling in the driver.
Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
Reviewed-by: Daniel Baluta <daniel.baluta@nxp.com>
Link: https://patch.msgid.link/20260206014805.3897764-2-shengjiu.wang@nxp.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Pull ARM fix from Russell King:
"Just one fix for memset64() on big endian 32-bit ARM systems"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux:
ARM: 9468/1: fix memset64() on big-endian
|
|
Add comprehensive documentation for the ``blockers`` field format
in AUDIT_LANDLOCK_ACCESS records, including all possible prefixes
(fs., net., scope.) and their meanings.
Also fix a typo and update the documentation date to reflect these
changes.
Signed-off-by: Samasth Norway Ananda <samasth.norway.ananda@oracle.com>
Link: https://lore.kernel.org/r/20260128031814.2945394-4-samasth.norway.ananda@oracle.com
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Add errata section with code examples for querying errata and a warning
that most applications should not check errata. Use kernel-doc directives
to include errata descriptions from the header files instead of manual
links.
Also enhance existing DOC sections in security/landlock/errata/abi-*.h
files with Impact sections, and update the code comment in syscalls.c
to remind developers to update errata documentation when applicable.
This addresses the gap where the kernel implements errata tracking
but provides no user-facing documentation on how to use it, while
improving the existing technical documentation in-place rather than
duplicating it.
Signed-off-by: Samasth Norway Ananda <samasth.norway.ananda@oracle.com>
Reviewed-by: Günther Noack <gnoack3000@gmail.com>
Link: https://lore.kernel.org/r/20260128031814.2945394-3-samasth.norway.ananda@oracle.com
[mic: Cosmetic fix]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Add backwards compatibility handling for the restrict flags introduced
in ABI version 7. This is shown as a separate code block (similar to
the ruleset_attr handling in the switch statement) because restrict flags
are passed to landlock_restrict_self() rather than being part of the
ruleset attributes.
Also fix misleading description of the /usr rule which incorrectly
stated it "only allow[s] reading" when the code actually allows both
reading and executing (LANDLOCK_ACCESS_FS_EXECUTE is included in
allowed_access).
Signed-off-by: Samasth Norway Ananda <samasth.norway.ananda@oracle.com>
Reviewed-by: Günther Noack <gnoack3000@gmail.com>
Link: https://lore.kernel.org/r/20260128031814.2945394-2-samasth.norway.ananda@oracle.com
[mic: Rebased and fixed conflict]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Move the socket type check earlier, so that we will later be able to add
elseifs for other types. Ordering of checks (socket is of a type we
enforce restrictions on) / (current creds have Landlock restrictions)
should not change anything.
Signed-off-by: Matthieu Buffet <matthieu@buffet.re>
Link: https://lore.kernel.org/r/20251212163704.142301-3-matthieu@buffet.re
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
- Move ABI requirement next to each access right to prepare adding more
access rights;
- Mention the possibility to remove the random component of a socket's
ephemeral port choice within the netns-wide ephemeral port range,
since it allows choosing the "random" ephemeral port.
Signed-off-by: Matthieu Buffet <matthieu@buffet.re>
Link: https://lore.kernel.org/r/20251212163704.142301-2-matthieu@buffet.re
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Add documentation for LANDLOCK_RESTRICT_SELF_TSYNC. It does not need to go
into the main example, but it has a section in the ABI compatibility notes.
In the HTML rendering, the main reference is the system call documentation,
which is included from the landlock.h header file.
Cc: Andrew G. Morgan <morgan@kernel.org>
Cc: John Johansen <john.johansen@canonical.com>
Cc: Paul Moore <paul@paul-moore.com>
Signed-off-by: Günther Noack <gnoack@google.com>
Link: https://lore.kernel.org/r/20251127115136.3064948-4-gnoack@google.com
[mic: Update date]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Exercise various scenarios where Landlock domains are enforced across
all of a processes' threads.
Test coverage for security/landlock is 91.6% of 2130 lines according to
LLVM 21.
Cc: Andrew G. Morgan <morgan@kernel.org>
Cc: John Johansen <john.johansen@canonical.com>
Cc: Paul Moore <paul@paul-moore.com>
Signed-off-by: Günther Noack <gnoack@google.com>
Link: https://lore.kernel.org/r/20251127115136.3064948-3-gnoack@google.com
[mic: Fix subject, use EXPECT_EQ(close()), make helpers static, add test
coverage]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Introduce the LANDLOCK_RESTRICT_SELF_TSYNC flag. With this flag, a
given Landlock ruleset is applied to all threads of the calling
process, instead of only the current one.
Without this flag, multithreaded userspace programs currently resort
to using the nptl(7)/libpsx hack for multithreaded policy enforcement,
which is also used by libcap and for setuid(2). Using this
userspace-based scheme, the threads of a process enforce the same
Landlock policy, but the resulting Landlock domains are still
separate. The domains being separate causes multiple problems:
* When using Landlock's "scoped" access rights, the domain identity is
used to determine whether an operation is permitted. As a result,
when using LANLDOCK_SCOPE_SIGNAL, signaling between sibling threads
stops working. This is a problem for programming languages and
frameworks which are inherently multithreaded (e.g. Go).
* In audit logging, the domains of separate threads in a process will
get logged with different domain IDs, even when they are based on
the same ruleset FD, which might confuse users.
Cc: Andrew G. Morgan <morgan@kernel.org>
Cc: John Johansen <john.johansen@canonical.com>
Cc: Paul Moore <paul@paul-moore.com>
Suggested-by: Jann Horn <jannh@google.com>
Signed-off-by: Günther Noack <gnoack@google.com>
Link: https://lore.kernel.org/r/20251127115136.3064948-2-gnoack@google.com
[mic: Fix restrict_self_flags test, clean up Makefile, allign comments,
reduce local variable scope, add missing includes]
Closes: https://github.com/landlock-lsm/linux/issues/2
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
This reverts commit 62eb557580eb2177cf16c3fd2b6efadff297b29a.
The revocable implementation uses two separate abstractions, struct
revocable_provider and struct revocable, in order to store the SRCU read
lock index which must be passed unaltered to srcu_read_unlock() in the
same context when a resource is no longer needed.
With the merged revocable API, multiple threads could however share the
same struct revocable and therefore potentially overwrite the SRCU index
of another thread which can cause the SRCU synchronisation in
revocable_provider_revoke() to never complete. [1]
An example revocable conversion of the gpiolib code also turned out to
be fundamentally flawed and could lead to use-after-free. [2]
An attempt to address both issues was quickly put together and merged,
but revocable is still fundamentally broken. [3]
Specifically, the latest design relies on RCU for storing a pointer to
the revocable provider, but since the resource can be shared by value
(e.g. as in the now reverted selftests) this does not work at all and
can also lead to use-after-free:
static void revocable_provider_release(struct kref *kref)
{
struct revocable_provider *rp = container_of(kref,
struct revocable_provider, kref);
cleanup_srcu_struct(&rp->srcu);
kfree_rcu(rp, rcu);
}
void revocable_provider_revoke(struct revocable_provider __rcu **rp_ptr)
{
struct revocable_provider *rp;
rp = rcu_replace_pointer(*rp_ptr, NULL, 1);
...
kref_put(&rp->kref, revocable_provider_release);
}
int revocable_init(struct revocable_provider __rcu *_rp,
struct revocable *rev)
{
struct revocable_provider *rp;
...
scoped_guard(rcu) {
rp = rcu_dereference(_rp);
if (!rp)
return -ENODEV;
if (!kref_get_unless_zero(&rp->kref))
return -ENODEV;
}
...
}
producer:
priv->rp = revocable_provider_alloc(&priv->res);
// pass priv->rp by value to consumer
revocable_provider_revoke(&priv->rp);
consumer:
struct revocable_provider __rcu *rp = filp->private_data;
struct revocable *rev;
revocable_init(rp, &rev);
as _rp would still be non-NULL in revocable_init() regardless of whether
the producer has revoked the resource and set its pointer to NULL.
Essentially revocable still relies on having a pointer to reference
counted driver data which holds the revocable provider, which makes all
the RCU protection unnecessary along with most of the current revocable
design and implementation.
As the above shows, and as has been pointed out repeatedly elsewhere,
these kind of issues are not something that should be addressed
incrementally. [4]
Revert the revocable implementation until a redesign has been proposed
and evaluated properly.
Link: https://lore.kernel.org/all/20260124170535.11756-4-johan@kernel.org/ [1]
Link: https://lore.kernel.org/all/aXT45B6vLf9R3Pbf@hovoldconsulting.com/ [2]
Link: https://lore.kernel.org/all/20260129143733.45618-1-tzungbi@kernel.org/ [3]
Link: https://lore.kernel.org/all/aXobzoeooJqxMkEj@hovoldconsulting.com/ [4]
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260204142849.22055-4-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
This reverts commit cd7693419bb5abd91ad2f407dab69c480e417a61.
The new revocable functionality is fundamentally broken and at a minimum
needs to be redesigned.
Drop the revocable Kunit tests to allow the implementation to be reverted.
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260204142849.22055-3-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
This reverts commit 9d4502fef00fa7a798d3c0806d4da4466a7ffc6f.
The new revocable functionality is fundamentally broken and at a minimum
needs to be redesigned.
Drop the revocable selftests to allow the implementation to be reverted.
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260204142849.22055-2-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Currently io_uring supports restricting operations on a per-ring basis.
To use those, the ring must be setup in a disabled state by setting
IORING_SETUP_R_DISABLED. Then restrictions can be set for the ring, and
the ring can then be enabled.
This commit adds support for IORING_REGISTER_RESTRICTIONS with ring_fd
== -1, like the other "blind" register opcodes which work on the task
rather than a specific ring. This allows registration of the same kind
of restrictions as can been done on a specific ring, but with the task
itself. Once done, any ring created will inherit these restrictions.
If a restriction filter is registered with a task, then it's inherited
on fork for its children. Children may only further restrict operations,
not extend them.
Inheriting restrictions include both the classic
IORING_REGISTER_RESTRICTIONS based restrictions, as well as the BPF
filters that have been registered with the task via
IORING_REGISTER_BPF_FILTER.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Called when copy_process() is called to copy state to a new child.
Right now this is just a stub, but will be used shortly to properly
handle fork'ing of task based io_uring restrictions.
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This was converted to a _scoped() loop but this of_node_put() was
accidentally left behind which is a double free.
Fixes: aa8cb72c2018 ("mtd: spi-nor: hisi-sfc: Simplify with scoped for each OF child loop")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Pratyush Yadav <pratyush@kernel.org>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
|
|
The commit 1b35a57b1c178 ("sparc32: Kill off software 32-bit multiply/divide
routines") removed the last usage of strtab in funtion module_frob_arch_sections
Therefore, it can be removed now.
Reported-by: kernel test robot <lkp@intel.com>
Cc: sparclinux@vger.kernel.org
Cc: David S. Miller <davem@davemloft.net>
Cc: Andreas Larsson <andreas@gaisler.com>
Signed-off-by: Alex Shi <alexs@kernel.org>
Reviewed-by: Andreas Larsson <andreas@gaisler.com>
Signed-off-by: Andreas Larsson <andreas@gaisler.com>
|
|
arch/sparc/mm/init_64.c: In function 'arch_hugetlb_valid_size':
arch/sparc/mm/init_64.c:361:24: warning: variable 'hv_pgsz_idx'
set but not used [-Wunused-but-set-variable]
361 | unsigned short hv_pgsz_idx;
| ^~~~~~~~~~~
Reported-by: kernel test robot <lkp@intel.com>
Cc: sparclinux@vger.kernel.org
CC: Nitin Gupta <nitin.m.gupta@oracle.com>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Alex Shi <alexs@kernel.org>
Reviewed-by: Andreas Larsson <andreas@gaisler.com>
Signed-off-by: Andreas Larsson <andreas@gaisler.com>
|
|
Similar in nature to commit ab107276607a ("powerpc: Fix struct termio related ioctl macros").
glibc-2.42 drops the legacy termio struct, but the ioctls.h header still
defines some TC* constants in terms of termio (via sizeof). Hardcode the
values instead.
This fixes building Python for example, which falls over like:
./Modules/termios.c:1119:16: error: invalid application of 'sizeof' to incomplete type 'struct termio'
Link: https://bugs.gentoo.org/961769
Link: https://bugs.gentoo.org/962600
Signed-off-by: Sam James <sam@gentoo.org>
Reviewed-by: Andreas Larsson <andreas@gaisler.com>
Signed-off-by: Andreas Larsson <andreas@gaisler.com>
|
|
Replace snprintf("%s", ...) with the faster and more direct strscpy().
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: Andreas Larsson <andreas@gaisler.com>
Signed-off-by: Andreas Larsson <andreas@gaisler.com>
|
|
Add support for the clone3 system call to the SPARC architectures.
The implementation follows the pattern of the original clone syscall.
However, instead of explicitly calling kernel_clone, the clone3
handler calls the generic sys_clone3 handler in kernel/fork.
In case no stack is provided, the parents stack is reused.
The return value convention for clone3 follows the regular kernel return
value convention (in contrast to the original clone/fork on SPARC).
Closes: https://github.com/sparclinux/issues/issues/10
Signed-off-by: Ludwig Rydberg <ludwig.rydberg@gaisler.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Andreas Larsson <andreas@gaisler.com>
Tested-by: Andreas Larsson <andreas@gaisler.com>
Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Link: https://lore.kernel.org/r/20260119144753.27945-3-ludwig.rydberg@gaisler.com
Signed-off-by: Andreas Larsson <andreas@gaisler.com>
|
|
Flush all uncommitted user windows before calling the generic syscall
handlers for clone, fork, and vfork.
Prior to entering the arch common handlers sparc_{clone|fork|vfork}, the
arch-specific syscall wrappers for these syscalls will attempt to flush
all windows (including user windows).
In the window overflow trap handlers on both SPARC{32|64},
if the window can't be stored (i.e due to MMU related faults) the routine
backups the user window and increments a thread counter (wsaved).
By adding a synchronization point after the flush attempt, when fault
handling is enabled, any uncommitted user windows will be flushed.
Link: https://sourceware.org/bugzilla/show_bug.cgi?id=31394
Closes: https://lore.kernel.org/sparclinux/fe5cc47167430007560501aabb28ba154985b661.camel@physik.fu-berlin.de/
Signed-off-by: Andreas Larsson <andreas@gaisler.com>
Signed-off-by: Ludwig Rydberg <ludwig.rydberg@gaisler.com>
Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Link: https://lore.kernel.org/r/20260119144753.27945-2-ludwig.rydberg@gaisler.com
Signed-off-by: Andreas Larsson <andreas@gaisler.com>
|
|
No need to assign "uctl" to itself. Delete it.
Fixes: 55f98ece9939 ("ALSA: oss: Relax __free() variable declarations")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://patch.msgid.link/aYXvm2YoV2yRimhk@stanley.mountain
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
Currently, KVM assumes the minimum of implemented HGEIE bits and
"BIT(gc->guest_index_bits) - 1" as the number of guest files available
across all CPUs. This will not work when CPUs have different number
of guest files because KVM may incorrectly allocate a guest file on a
CPU with fewer guest files.
To address above, during initialization, calculate the number of
available guest interrupt files according to MMIO resources and
constrain the number of guest interrupt files that can be allocated
by KVM.
Signed-off-by: Xu Lu <luxu.kernel@bytedance.com>
Reviewed-by: Nutty Liu <nutty.liu@hotmail.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Acked-by: Thomas Gleixner <tglx@kernel.org>
Link: https://lore.kernel.org/r/20260104133457.57742-1-luxu.kernel@bytedance.com
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
Use block mapping if backed by a THP, as implemented in architectures
like ARM and x86_64.
Signed-off-by: Jessica Liu <liu.xuemei1@zte.com.cn>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20251127165137780QbUOVPKPAfWSGAFl5qtRy@zte.com.cn
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
The KVM RISC-V allows Zalasr extensions for Guest/VM so add this
extension to get-reg-list test.
Signed-off-by: Xu Lu <luxu.kernel@bytedance.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20251020042904.32096-1-luxu.kernel@bytedance.com
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
Extend the KVM ISA extension ONE_REG interface to allow KVM user space
to detect and enable Zalasr extensions for Guest/VM.
Signed-off-by: Xu Lu <luxu.kernel@bytedance.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20251020042457.30915-5-luxu.kernel@bytedance.com
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
Current vm modes cannot represent riscv guest modes precisely, here add
all 9 combinations of P(56,40,41) x V(57,48,39). Also the default vm
mode is detected on runtime instead of hardcoded one, which might not be
supported on specific machine.
Signed-off-by: Wu Fei <wu.fei9@sanechips.com.cn>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Nutty Liu <nutty.liu@hotmail.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20251105151442.28767-1-wu.fei9@sanechips.com.cn
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
The KVM RISC-V allows Zilsd and Zclsd extensions for Guest/VM so add
this extension to get-reg-list test.
Signed-off-by: Pincheng Wang <pincheng.plct@isrc.iscas.ac.cn>
Reviewed-by: Nutty Liu <nutty.liu@hotmail.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20250826162939.1494021-6-pincheng.plct@isrc.iscas.ac.cn
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
Extend the KVM ISA extension ONE_REG interface to allow KVM user space
to detect and enable Zilsd and Zclsd extensions for Guest/VM.
Signed-off-by: Pincheng Wang <pincheng.plct@isrc.iscas.ac.cn>
Reviewed-by: Nutty Liu <nutty.liu@hotmail.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20250826162939.1494021-5-pincheng.plct@isrc.iscas.ac.cn
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
kvm_riscv_vcpu_aia_imsic_update() assumes that the vCPU IMSIC state has
already been initialized and unconditionally accesses imsic->vsfile_lock.
However, in fuzzed ioctl sequences, the AIA device may be initialized at
the VM level while the per-vCPU IMSIC state is still NULL.
This leads to invalid access when entering the vCPU run loop before
IMSIC initialization has completed.
The crash manifests as:
Unable to handle kernel paging request at virtual address
dfffffff00000006
...
kvm_riscv_vcpu_aia_imsic_update arch/riscv/kvm/aia_imsic.c:801
kvm_riscv_vcpu_aia_update arch/riscv/kvm/aia_device.c:493
kvm_arch_vcpu_ioctl_run arch/riscv/kvm/vcpu.c:927
...
Add a guard to skip the IMSIC update path when imsic_state is NULL. This
allows the vCPU run loop to continue safely.
This issue was discovered during fuzzing of RISC-V KVM code.
Fixes: db8b7e97d6137a ("RISC-V: KVM: Add in-kernel virtualization of AIA IMSIC")
Signed-off-by: Jiakai Xu <xujiakai2025@iscas.ac.cn>
Signed-off-by: Jiakai Xu <jiakaiPeanut@gmail.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20260127084313.3496485-1-xujiakai2025@iscas.ac.cn
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
Add a null pointer check for imsic_state before dereferencing it in
kvm_riscv_aia_imsic_rw_attr(). While the function checks that the
vcpu exists, it doesn't verify that the vcpu's imsic_state has been
initialized, leading to a null pointer dereference when accessed.
The crash manifests as:
Unable to handle kernel paging request at virtual address
dfffffff00000006
...
kvm_riscv_aia_imsic_rw_attr+0x2d8/0x854 arch/riscv/kvm/aia_imsic.c:958
aia_set_attr+0x2ee/0x1726 arch/riscv/kvm/aia_device.c:354
kvm_device_ioctl_attr virt/kvm/kvm_main.c:4744 [inline]
kvm_device_ioctl+0x296/0x374 virt/kvm/kvm_main.c:4761
vfs_ioctl fs/ioctl.c:51 [inline]
...
The fix adds a check to return -ENODEV if imsic_state is NULL and moves
isel assignment after imsic_state NULL check.
Fixes: 5463091a51cfaa ("RISC-V: KVM: Expose IMSIC registers as attributes of AIA irqchip")
Signed-off-by: Jiakai Xu <xujiakai2025@iscas.ac.cn>
Signed-off-by: Jiakai Xu <jiakaiPeanut@gmail.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20260127072219.3366607-1-xujiakai2025@iscas.ac.cn
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
Add a null pointer check for imsic_state before dereferencing it in
kvm_riscv_aia_imsic_has_attr(). While the function checks that the
vcpu exists, it doesn't verify that the vcpu's imsic_state has been
initialized, leading to a null pointer dereference when accessed.
This issue was discovered during fuzzing of RISC-V KVM code. The
crash occurs when userspace calls KVM_HAS_DEVICE_ATTR ioctl on an
AIA IMSIC device before the IMSIC state has been fully initialized
for a vcpu.
The crash manifests as:
Unable to handle kernel paging request at virtual address
dfffffff00000001
...
epc : kvm_riscv_aia_imsic_has_attr+0x464/0x50e
arch/riscv/kvm/aia_imsic.c:998
...
kvm_riscv_aia_imsic_has_attr+0x464/0x50e arch/riscv/kvm/aia_imsic.c:998
aia_has_attr+0x128/0x2bc arch/riscv/kvm/aia_device.c:471
kvm_device_ioctl_attr virt/kvm/kvm_main.c:4722 [inline]
kvm_device_ioctl+0x296/0x374 virt/kvm/kvm_main.c:4739
...
The fix adds a check to return -ENODEV if imsic_state is NULL, which
is consistent with other error handling in the function and prevents
the null pointer dereference.
Fixes: 5463091a51cf ("RISC-V: KVM: Expose IMSIC registers as attributes of AIA irqchip")
Signed-off-by: Jiakai Xu <xujiakai2025@iscas.ac.cn>
Signed-off-by: Jiakai Xu <jiakaiPeanut@gmail.com>
Reviewed-by: Nutty Liu <nutty.liu@hotmail.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20260125143344.2515451-1-xujiakai2025@iscas.ac.cn
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
If execution reaches "ret = 0" assignment in
kvm_riscv_vcpu_pmu_event_info() then it means
kvm_vcpu_write_guest() returned 0 hence ret is
already zero and does not need to be assigned 0.
Fixes: e309fd113b9f ("RISC-V: KVM: Implement get event info function")
Signed-off-by: Qiang Ma <maqianga@uniontech.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20251229072530.3075496-1-maqianga@uniontech.com
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
Open intervals do not have an end element, in particular an open
interval at the end of the set is hard to validate because of it is
lacking the end element, and interval validation relies on such end
element to perform the checks.
This patch adds a new flag field to struct nft_set_elem, this is not an
issue because this is a temporary object that is allocated in the stack
from the insert/deactivate path. This flag field is used to specify that
this is the last element in this add/delete command.
The last flag is used, in combination with the start element cookie, to
check if there is a partial overlap, eg.
Already exists: 255.255.255.0-255.255.255.254
Add interval: 255.255.255.0-255.255.255.255
~~~~~~~~~~~~~
start element overlap
Basically, the idea is to check for an existing end element in the set
if there is an overlap with an existing start element.
However, the last open interval can come in any position in the add
command, the corner case can get a bit more complicated:
Already exists: 255.255.255.0-255.255.255.254
Add intervals: 255.255.255.0-255.255.255.255,255.255.255.0-255.255.255.254
~~~~~~~~~~~~~
start element overlap
To catch this overlap, annotate that the new start element is a possible
overlap, then report the overlap if the next element is another start
element that confirms that previous element in an open interval at the
end of the set.
For deletions, do not update the start cookie when deleting an open
interval, otherwise this can trigger spurious EEXIST when adding new
elements.
Unfortunately, there is no NFT_SET_ELEM_INTERVAL_OPEN flag which would
make easier to detect open interval overlaps.
Fixes: 7c84d41416d8 ("netfilter: nft_set_rbtree: Detect partial overlaps on insertion")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
The existing partial overlap detection does not check if the elements
belong to the interval, eg.
add element inet x y { 1.1.1.1-2.2.2.2, 4.4.4.4-5.5.5.5 }
add element inet x y { 1.1.1.1-5.5.5.5 } => this should fail: ENOENT
Similar situation occurs with deletions:
add element inet x y { 1.1.1.1-2.2.2.2, 4.4.4.4-5.5.5.5}
delete element inet x y { 1.1.1.1-5.5.5.5 } => this should fail: ENOENT
This currently works via mitigation by nft in userspace, which is
performing the overlap detection before sending the elements to the
kernel. This requires a previous netlink dump of the set content which
slows down incremental updates on interval sets, because a netlink set
content dump is needed.
This patch extends the existing overlap detection to track the most
recent start element that already exists. The pointer to the existing
start element is stored as a cookie (no pointer dereference is ever
possible). If the end element is added and it already exists, then
check that the existing end element is adjacent to the already existing
start element. Similar logic applies to element deactivation.
This patch also annotates the timestamp to identify if start cookie
comes from an older batch, in such case reset it. Otherwise, a failing
create element command leaves the start cookie in place, resulting in
bogus error reporting.
There is still a few more corner cases of overlap detection related to
the open interval that are addressed in follow up patches.
This is address an early design mistake where an interval is expressed
as two elements, using the NFT_SET_ELEM_INTERVAL_END flag, instead of
the more recent NFTA_SET_ELEM_KEY_END attribute that pipapo already
uses.
Fixes: 7c84d41416d8 ("netfilter: nft_set_rbtree: Detect partial overlaps on insertion")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
Userspace provides an optimized representation in case intervals are
adjacent, where the end element is omitted.
The existing partial overlap detection logic skips anonymous set checks
on start elements for this reason.
However, it is possible to add intervals that overlap to this anonymous
where two start elements with the same, eg. A-B, A-C where C < B.
start end
A B
start end
A C
Restore the check on overlapping start elements to report an overlap.
Fixes: c9e6978e2725 ("netfilter: nft_set_rbtree: Switch to node list walk for overlap detection")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
Userspace adds a non-matching null element to the kernel for historical
reasons. This null element is added when the set is populated with
elements. Inclusion of this element is conditional, therefore,
userspace needs to dump the set content to check for its presence.
If the NLM_F_CREATE flag is turned on, this becomes an issue because
kernel bogusly reports EEXIST.
Add special case to ignore NLM_F_CREATE in this case, therefore,
re-adding the nul-element never fails.
Fixes: c016c7e45ddf ("netfilter: nf_tables: honor NLM_F_EXCL flag in set element insertion")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
nft_counter_reset() calls u64_stats_add() with a negative value to reset
the counter. This will work on 64bit archs, hence the negative value
added will wrap as a 64bit value which then can wrap the stat counter as
well.
On 32bit archs, the added negative value will wrap as a 32bit value and
_not_ wrapping the stat counter properly. In most cases, this would just
lead to a very large 32bit value being added to the stat counter.
Fix by introducing u64_stats_sub().
Fixes: 4a1d3acd6ea8 ("netfilter: nft_counter: Use u64_stats_t for statistic.")
Signed-off-by: Anders Grahn <anders.grahn@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
tests/shell/testcases/packetpath/set_match_nomatch_hash_fast
fails on big endian with:
Error: Could not process rule: No such file or directory
reset element ip test s { 244.147.90.126 }
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Fatal: Cannot fetch element "244.147.90.126"
... because the wrong bucket is searched, jhash() and jhash1_word are
not interchangeable on big endian.
Fixes: 3b02b0adc242 ("netfilter: nft_set_hash: fix lookups with fixed size hash on big endian")
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
The script now requires IPV6 tunnel support, enable this.
This should have caught by CI, but as the config option is missing,
the tunnel interface isn't added. This results in an error cascade
that ends with "route change default" failure.
That in turn means the "ipv6 tunnel" test re-uses the previous
test setup so the "ip6ip6" test passes and script returns 0.
Make sure to catch such bugs, set ret=1 if device cannot be added
and delete the old default route before installing the new one.
After this change, IPV6_TUNNEL=n kernel builds fail with the expected
FAIL: flow offload for ns1/ns2 with IP6IP6 tunnel
... while builds with IPV6_TUNNEL=m pass as before.
Fixes: 5e5180352193 ("selftests: netfilter: nft_flowtable.sh: Add IP6IP6 flowtable selftest")
Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
|