pyscan.
max_halfplane
(net, mpts, bpts, disc_f)¶Scans the set of halfplanes defined by the points in the net and maximizes the function disc_f with respect to the fraction of points contained in mpts and bpts. Function takes \(O(ns\log n)\) where n is the net size and s is the size of the mpts and bts.
Parameters: |
|
---|---|
Return type: | Tuple of the found halfplane and the maximum disc_f value |
pyscan.
max_halfplane_fast
(r, mpts, bpts, disc_f)¶HERE BE DRAGONS!!! This method is still experimental and somewhat broken. This method uses a different method to scan the set of potential halfplanes with respect to the fraction of points contained in mpts and bpts. This code could potentially be much faster than the standard max_halfplane method.
Parameters: |
|
---|---|
Return type: | Tuple of the found halfplane and the maximum disc_f value |
pyscan.
max_halfspace
(net, mpts, bpts, disc_f)¶Scans the set of halfspaces defined by the points in the net and maximizes the function disc_f with respect to the fraction of points contained in mpts and bpts. Function takes \(O(n^2s\log n)\) where n is the net size and s is the size of the mpts and bts.
Parameters: |
|
---|---|
Return type: | Tuple of the found halfspace and the maximum disc_f value |
pyscan.
max_disk
(net, mpts, bpts, disc_f)¶Scans the set of disks defined by the points in the net and maximizes the function disc_f with respect to the fraction of points contained in mpts and bpts. Function takes \(O(n^2s\log n)\) where n is the net size and s is the size of the mpts and bts.
Parameters: |
|
---|---|
Return type: | Tuple of the found disk and the maximum disc_f value |
pyscan.
max_disk_scale
(net, mpts, bpts, min_res, disc_f)¶Scans the set of disks defined by the points in the net, between the min_res and 2 * min_res and maximizes the function disc_f with respect to the fraction of points contained in mpts and bpts. Internally we use the disk radius restriction to ignore far away points and prune out many potential disks. This can significantly decrease the amount of time the method takes to find the best region. In worst case this method will still take \(O(n^2s\log n)\) where n is the net size and s is the size of the mpts and bts, but usually runtime will be much less.
Parameters: |
|
---|---|
Return type: | Tuple of the found disk and the maximum disc_f value |
pyscan.
max_subgrid
(grid, disc_f)¶Finds the maximum subgrid exactly by enumerating all the subgrids and computing disc_f on each subgrid. Takes \(O(n^4)\) where the grid is \(n \times n\).
Parameters: |
|
---|---|
Return type: | The maximum subgrid found. |
pyscan.
max_subgrid_linear
(grid, x, y)¶Finds the maximum subgrid with respect to the linear function \(f(m, b) = xm + yb\) exactly by using the Kadane algorithm. Takes \(O(n^3)\) where the grid is \(n \times n\).
Parameters: |
|
---|---|
Return type: | The maximum subgrid found. |
pyscan.
max_subgrid_linear_theory
(grid, r, x, y)¶Finds an approximate maximum subgrid with respect to the linear function \(f(m, b) = xm + yb\) using the algorithm from [MP18a]. Takes \(O(n^2 + r^2 \log r)\) where the grid is \(n \times n\). :param grid: A Grid object :param r: Determines how frequently to approximate the subgrids. See [MP18a] for details. :param x: Double defines linear function. :param y: Double defines linear function. :rtype: The maximum subgrid found.
pyscan.
max_subgrid_convex
(grid, eps, disc_f)¶Approximately finds the maximum subgrid of the grid with error eps by successively computing the maximum subgrid with respect to various linear function using the Kadane algorithm. Takes \(O(\frac{1}{\sqrt{\varepsilon}} n^3)\) where the grid is \(n \times n\).
Parameters: |
|
---|---|
Return type: | The maximum subgrid found. |
pyscan.
max_subgrid_convex_theory
(grid, eps, disc_f)¶Approximately finds the maximum subgrid of the grid with error eps by successively computing the maximum subgrid with respect to various linear function using the algorithm from [MP18a]. Takes \(O(\frac{1}{\sqrt{\varepsilon}} n^2 \log n)\) where the grid is \(n \times n\). This method is depreciated and will be replaced shortly.
Parameters: |
|
---|---|
Return type: | The maximum subgrid found. |
pyscan.
max_rectangle
(mpts, bpts, eps, x, y)¶pyscan.
max_rectangle_heap
(mpts, bpts, x, y)¶Implements the algorithm from [APV06][DGM95] to scan the data set in \(s^2 \log s\) time. We use a Treap based rotation scheme to keep the tree balanced.
Parameters: |
|
---|---|
Return type: | The maximum rectangle found. |
pyscan.
ham_tree_sample
(pts, size)¶Takes the list of pts and computes a ham tree sample [MP18b]. This method can shrink the size of the sample significantly while preserving the error with respect to halfplanes. This serves as a useful preprocessing step for speeding up halfplane scanning. In theory sample sizes go from \(O(\frac{1}{\varepsilon^2}\) to \(O(\frac{1}{\varepsilon^{1.53..}}\log^{.766}\frac{1}{\varepsilon})\) where \(\varepsilon\) is the absolute fraction of misplaced points in a halfplane.
Parameters: | pts – List of WPoints |
---|---|
Return type: | List of WPoints |
pyscan.
max_halfplane_labeled
(net, lmpts, lbpts, disc_f)¶Scans the set of halfplanes defined by the points in the net and maximizes the function disc_f with respect to the labeled sets of points lmpts and lbpts. If two points with the same label are in the same region then the two points only contribute one of their weights to the region. This algorithm runs in time \(O(ns \log n)\) where n is the net size and s is the size of the lmpts and lbpts.
Parameters: |
|
---|---|
Return type: | Tuple of the found halfplane and the maximum disc_f value |
pyscan.
max_halfspace_labeled
(net, lmpts, lbpts, disc_f)¶Scans the set of halfspaces defined by the points in the net and maximizes the function disc_f with respect to the labeled sets of points lmpts and lbpts. If two points with the same label are in the same region then the two points only contribute one of their weights to the region. This algorithm runs in time \(O(n^2s \log n)\) where n is the net size and s is the size of the lmpts and lbpts.
Parameters: |
|
---|---|
Return type: | Tuple of the found halfspace and the maximum disc_f value |
pyscan.
max_disk_labeled
(net, lmpts, lbpts, disc_f)¶Scans the set of disks defined by the points in the net and maximizes the function disc_f with respect to the fraction of points contained in lmpts and lbpts. Points with identical labels are not double counted if they are contained in the same region. In worst case this method will still take \(O(n^2s\log n)\) where n is the net size and s is the size of the lmpts and lbts.
Parameters: |
|
---|---|
Return type: | Tuple of the found disk and the maximum disc_f value |
pyscan.
max_disk_scale_labeled
(net, lmpts, lbpts, compress, min_res, disc_f)¶Scans the set of disks defined by the points in the net and maximizes the function disc_f with respect to the fraction of points contained in lmpts and lbpts. Only considers disks with radii between min_res and 2 * min_res. This method internally maps the disk scanning problem to halfplane scanning. Using this mapping we can compress sets of points with the same labels. This compression step can significantly improve the runtime and doesn’t incurr any extra error. In worst case this method will still take \(O(n^2s\log n)\) where n is the net size and s is the size of the lmpts and lbts, but in practice this method should be much faster than this.
Parameters: |
|
---|---|
Return type: | Tuple of the found disk and the maximum disc_f value |
pyscan.
max_disk_scale_labeled_alt
(net, lmpts, lbpts, min_res, disc_f)¶Scans the set of disks defined by the points in the net and maximizes the function disc_f with respect to the fraction of points contained in lmpts and lbpts. Only considers disks with radii between min_res and 2 * min_res. This method internally directly scans disks and lacks the compression step found in the previous algorihtm, so it is slighlty slower in practice then the previous method with compression turned off. In worst case this method will still take \(O(n^2s\log n)\) where n is the net size and s is the size of the lmpts and lbts, but in practice this method should be much faster than this.
Parameters: |
|
---|---|
Return type: | Tuple of the found disk and the maximum disc_f value py::def(“max_rect_labeled”, &pyscan::max_rect_labeled); py::def(“max_rect_labeled_scale”, pyscan::max_rect_labeled_scale); |
pyscan.
max_rect_labeled
(r, max_width, lmpts, lbpts, disc_f)¶Scans an approximate set of rectangles smaller than max_width in lmpts and lbpts by using a grid where each row and column of the grid can have at most \(1/r\) fraction of the total weight.
Parameters: |
|
---|---|
Return type: | Tuple of the found rectangle and the maximum disc_f value |
pyscan.
max_rect_labeled_scale
(r, max_width, alpha, lmpts, lbpts, disc_f)¶Creates a regular sparse grid of size \(1/alpha \times 1/alpha\) and then considers subgrids of max_width size containing a large enough fraction of points to be worth scanning. On these small subgrids this method constructs a new grid such that each row and column of this grid can have at most \(1/r\) fraction of the total weight and then scans ever subgrid of this new grid. This method is usually faster than the previous method, but introduces some spatial error \(\alpha\).
Parameters: |
|
---|---|
Return type: | Tuple of the found rectangle and the maximum disc_f value |
These methods use different scan statistics than the purely combinatorial versions.
pyscan.
gaussian_kernel
(bandwidth)¶Constructs a gaussian kernel to be used with the smooth scanning function of form \(1/\sqrt{2 \pi} \exp(-u^2/(2 b^2))\) where b is bandwidth.
Parameters: | bandwidth – double (kernel bandwidth) |
---|
pyscan.
Bernouli_K
(kernel)¶Constructs a discrepancy object that can be used to scan in the smooth scanning functions. The kernel parameter defines how distance is measured and how the bandwidth is determined.
Parameters: | kernel – The kernel used to determine distance. |
---|
pyscan.
max_annuli
(net, mpts, bpts, radii, disc_f)¶Scans the set of annuli with radii defined by radii and the points in the net. For each annuli we maximizes the function disc_f with respect to the mpts and bpts.
Parameters: |
|
---|---|
Return type: | Tuple of a disk of radius equal to radii[-1] centered at the annuli and the maximum disc_f value |
pyscan.
max_annuli_scale
(net, mpts, bpts, radii, disc_f)¶Scans the set of annuli with radii defined by radii and the points in the net. For each annuli we maximizes the function disc_f with respect to the mpts and bpts. This version ignores far away points, so in theory can be much faster than the previous version.
Parameters: |
|
---|---|
Return type: | Tuple of a disk of radius equal to radii[-1] centered at the annuli and the maximum disc_f value |
pyscan.
max_annuli_scale_multi
(net, mpts, bpts, res_scales, max_radii, disc_f)¶Scans the set of annuli with radii less than max_radii. The algorithm uses the res_scales to define levels in the annuli for approximating the kernel values. For each annuli we maximizes the function disc_f with respect to the mpts and bpts. This version ignores far away points and uses a more efficient method for scanning all the annuli. In theory the runtime of this method should be as good or even better than the previous algorithm, but in practice is probably slightly slower.
Parameters: |
|
---|---|
Return type: | Tuple of a disk of radius equal to radii[-1] centered at the annuli and the maximum disc_f value |