Scanning

pyscan.max_halfplane(net, mpts, bpts, disc_f)

Scans the set of halfplanes defined by the points in the net and maximizes the function disc_f with respect to the fraction of points contained in mpts and bpts. Function takes \(O(ns\log n)\) where n is the net size and s is the size of the mpts and bts.

Parameters:
  • net – List of Points
  • mpts – List of WPoints
  • bpts – List of WPoints
  • disc_f – Discrepancy function to maximize over.
Return type:

Tuple of the found halfplane and the maximum disc_f value

pyscan.max_halfplane_fast(r, mpts, bpts, disc_f)

HERE BE DRAGONS!!! This method is still experimental and somewhat broken. This method uses a different method to scan the set of potential halfplanes with respect to the fraction of points contained in mpts and bpts. This code could potentially be much faster than the standard max_halfplane method.

Parameters:
  • r – Number of regions asymptopically to consider.
  • mpts – List of WPoints
  • bpts – List of WPoints
  • disc_f – Discrepancy function to maximize over.
Return type:

Tuple of the found halfplane and the maximum disc_f value

pyscan.max_halfspace(net, mpts, bpts, disc_f)

Scans the set of halfspaces defined by the points in the net and maximizes the function disc_f with respect to the fraction of points contained in mpts and bpts. Function takes \(O(n^2s\log n)\) where n is the net size and s is the size of the mpts and bts.

Parameters:
  • net – List of Point3s
  • mpts – List of WPoint3s
  • bpts – List of WPoint3s
  • disc_f – Discrepancy function to maximize over.
Return type:

Tuple of the found halfspace and the maximum disc_f value

pyscan.max_disk(net, mpts, bpts, disc_f)

Scans the set of disks defined by the points in the net and maximizes the function disc_f with respect to the fraction of points contained in mpts and bpts. Function takes \(O(n^2s\log n)\) where n is the net size and s is the size of the mpts and bts.

Parameters:
  • net – List of Points
  • mpts – List of WPoints
  • bpts – List of WPoints
  • disc_f – Discrepancy function to maximize over.
Return type:

Tuple of the found disk and the maximum disc_f value

pyscan.max_disk_scale(net, mpts, bpts, min_res, disc_f)

Scans the set of disks defined by the points in the net, between the min_res and 2 * min_res and maximizes the function disc_f with respect to the fraction of points contained in mpts and bpts. Internally we use the disk radius restriction to ignore far away points and prune out many potential disks. This can significantly decrease the amount of time the method takes to find the best region. In worst case this method will still take \(O(n^2s\log n)\) where n is the net size and s is the size of the mpts and bts, but usually runtime will be much less.

Parameters:
  • net – List of Points
  • mpts – List of WPoints
  • bpts – List of WPoints
  • min_res – A float that corresponds to minimum radius to consider.
  • disc_f – Discrepancy function to maximize over.
Return type:

Tuple of the found disk and the maximum disc_f value

pyscan.max_subgrid(grid, disc_f)

Finds the maximum subgrid exactly by enumerating all the subgrids and computing disc_f on each subgrid. Takes \(O(n^4)\) where the grid is \(n \times n\).

Parameters:
  • grid – A Grid object
  • disc_f – Discrepancy function to maximize over.
Return type:

The maximum subgrid found.

pyscan.max_subgrid_linear(grid, x, y)

Finds the maximum subgrid with respect to the linear function \(f(m, b) = xm + yb\) exactly by using the Kadane algorithm. Takes \(O(n^3)\) where the grid is \(n \times n\).

Parameters:
  • grid – A Grid object
  • x – Double defines linear function.
  • y – Double defines linear function.
Return type:

The maximum subgrid found.

pyscan.max_subgrid_linear_theory(grid, r, x, y)

Finds an approximate maximum subgrid with respect to the linear function \(f(m, b) = xm + yb\) using the algorithm from [MP18a]. Takes \(O(n^2 + r^2 \log r)\) where the grid is \(n \times n\). :param grid: A Grid object :param r: Determines how frequently to approximate the subgrids. See [MP18a] for details. :param x: Double defines linear function. :param y: Double defines linear function. :rtype: The maximum subgrid found.

pyscan.max_subgrid_convex(grid, eps, disc_f)

Approximately finds the maximum subgrid of the grid with error eps by successively computing the maximum subgrid with respect to various linear function using the Kadane algorithm. Takes \(O(\frac{1}{\sqrt{\varepsilon}} n^3)\) where the grid is \(n \times n\).

Parameters:
  • grid – A Grid object
  • eps – The absolute error to incur.
  • disc_f – Discrepancy function to maximize over.
Return type:

The maximum subgrid found.

pyscan.max_subgrid_convex_theory(grid, eps, disc_f)

Approximately finds the maximum subgrid of the grid with error eps by successively computing the maximum subgrid with respect to various linear function using the algorithm from [MP18a]. Takes \(O(\frac{1}{\sqrt{\varepsilon}} n^2 \log n)\) where the grid is \(n \times n\). This method is depreciated and will be replaced shortly.

Parameters:
  • grid – A Grid object
  • eps – The absolute error to incur.
  • disc_f – Discrepancy function to maximize over.
Return type:

The maximum subgrid found.

pyscan.max_rectangle(mpts, bpts, eps, x, y)
pyscan.max_rectangle_heap(mpts, bpts, x, y)

Implements the algorithm from [APV06][DGM95] to scan the data set in \(s^2 \log s\) time. We use a Treap based rotation scheme to keep the tree balanced.

Parameters:
  • mpts – List of WPoints.
  • bpts – List of WPoints.
  • x – Linear parameter.
  • y – Linear parameter.
Return type:

The maximum rectangle found.

pyscan.ham_tree_sample(pts, size)

Takes the list of pts and computes a ham tree sample [MP18b]. This method can shrink the size of the sample significantly while preserving the error with respect to halfplanes. This serves as a useful preprocessing step for speeding up halfplane scanning. In theory sample sizes go from \(O(\frac{1}{\varepsilon^2}\) to \(O(\frac{1}{\varepsilon^{1.53..}}\log^{.766}\frac{1}{\varepsilon})\) where \(\varepsilon\) is the absolute fraction of misplaced points in a halfplane.

Parameters:pts – List of WPoints
Return type:List of WPoints

Labeled Scanning

pyscan.max_halfplane_labeled(net, lmpts, lbpts, disc_f)

Scans the set of halfplanes defined by the points in the net and maximizes the function disc_f with respect to the labeled sets of points lmpts and lbpts. If two points with the same label are in the same region then the two points only contribute one of their weights to the region. This algorithm runs in time \(O(ns \log n)\) where n is the net size and s is the size of the lmpts and lbpts.

Parameters:
  • net – List of Points
  • mpts – List of LPoints
  • bpts – List of LPoints
  • disc_f – Discrepancy function to maximize over.
Return type:

Tuple of the found halfplane and the maximum disc_f value

pyscan.max_halfspace_labeled(net, lmpts, lbpts, disc_f)

Scans the set of halfspaces defined by the points in the net and maximizes the function disc_f with respect to the labeled sets of points lmpts and lbpts. If two points with the same label are in the same region then the two points only contribute one of their weights to the region. This algorithm runs in time \(O(n^2s \log n)\) where n is the net size and s is the size of the lmpts and lbpts.

Parameters:
  • net – List of Point3s
  • mpts – List of LPoint3s
  • bpts – List of LPoint3s
  • disc_f – Discrepancy function to maximize over.
Return type:

Tuple of the found halfspace and the maximum disc_f value

pyscan.max_disk_labeled(net, lmpts, lbpts, disc_f)

Scans the set of disks defined by the points in the net and maximizes the function disc_f with respect to the fraction of points contained in lmpts and lbpts. Points with identical labels are not double counted if they are contained in the same region. In worst case this method will still take \(O(n^2s\log n)\) where n is the net size and s is the size of the lmpts and lbts.

Parameters:
  • net – List of Points
  • lmpts – List of LPoints
  • lbpts – List of LPoints
  • disc_f – Discrepancy function to maximize over.
Return type:

Tuple of the found disk and the maximum disc_f value

pyscan.max_disk_scale_labeled(net, lmpts, lbpts, compress, min_res, disc_f)

Scans the set of disks defined by the points in the net and maximizes the function disc_f with respect to the fraction of points contained in lmpts and lbpts. Only considers disks with radii between min_res and 2 * min_res. This method internally maps the disk scanning problem to halfplane scanning. Using this mapping we can compress sets of points with the same labels. This compression step can significantly improve the runtime and doesn’t incurr any extra error. In worst case this method will still take \(O(n^2s\log n)\) where n is the net size and s is the size of the lmpts and lbts, but in practice this method should be much faster than this.

Parameters:
  • net – List of Points
  • lmpts – List of LPoints
  • lbpts – List of LPoints
  • compress – Turns the compression step on or off.
  • min_res – Defines the radius range to consider.
  • disc_f – Discrepancy function to maximize over.
Return type:

Tuple of the found disk and the maximum disc_f value

pyscan.max_disk_scale_labeled_alt(net, lmpts, lbpts, min_res, disc_f)

Scans the set of disks defined by the points in the net and maximizes the function disc_f with respect to the fraction of points contained in lmpts and lbpts. Only considers disks with radii between min_res and 2 * min_res. This method internally directly scans disks and lacks the compression step found in the previous algorihtm, so it is slighlty slower in practice then the previous method with compression turned off. In worst case this method will still take \(O(n^2s\log n)\) where n is the net size and s is the size of the lmpts and lbts, but in practice this method should be much faster than this.

Parameters:
  • net – List of Points
  • lmpts – List of LPoints
  • lbpts – List of LPoints
  • min_res – Defines the radius range to consider.
  • disc_f – Discrepancy function to maximize over.
Return type:

Tuple of the found disk and the maximum disc_f value

py::def(“max_rect_labeled”, &pyscan::max_rect_labeled);

py::def(“max_rect_labeled_scale”, pyscan::max_rect_labeled_scale);

pyscan.max_rect_labeled(r, max_width, lmpts, lbpts, disc_f)

Scans an approximate set of rectangles smaller than max_width in lmpts and lbpts by using a grid where each row and column of the grid can have at most \(1/r\) fraction of the total weight.

Parameters:
  • r – integer
  • max_width – double
  • lmpts – List of LPoints
  • lbpts – List of LPoints
  • disc_f – Discrepancy function to maximize over.
Return type:

Tuple of the found rectangle and the maximum disc_f value

pyscan.max_rect_labeled_scale(r, max_width, alpha, lmpts, lbpts, disc_f)

Creates a regular sparse grid of size \(1/alpha \times 1/alpha\) and then considers subgrids of max_width size containing a large enough fraction of points to be worth scanning. On these small subgrids this method constructs a new grid such that each row and column of this grid can have at most \(1/r\) fraction of the total weight and then scans ever subgrid of this new grid. This method is usually faster than the previous method, but introduces some spatial error \(\alpha\).

Parameters:
  • r – integer
  • alpha – double
  • max_width – double
  • lmpts – List of LPoints
  • lbpts – List of LPoints
  • disc_f – Discrepancy function to maximize over.
Return type:

Tuple of the found rectangle and the maximum disc_f value

Kernelized Scanning

These methods use different scan statistics than the purely combinatorial versions.

pyscan.gaussian_kernel(bandwidth)

Constructs a gaussian kernel to be used with the smooth scanning function of form \(1/\sqrt{2 \pi} \exp(-u^2/(2 b^2))\) where b is bandwidth.

Parameters:bandwidth – double (kernel bandwidth)
pyscan.Bernouli_K(kernel)

Constructs a discrepancy object that can be used to scan in the smooth scanning functions. The kernel parameter defines how distance is measured and how the bandwidth is determined.

Parameters:kernel – The kernel used to determine distance.
pyscan.max_annuli(net, mpts, bpts, radii, disc_f)

Scans the set of annuli with radii defined by radii and the points in the net. For each annuli we maximizes the function disc_f with respect to the mpts and bpts.

Parameters:
  • net – List of Points
  • mpts – List of WPoints
  • bpts – List of WPoints
  • min_res – A float that corresponds to minimum radius to consider.
  • disc_f – KDisc object to maximize over.
Return type:

Tuple of a disk of radius equal to radii[-1] centered at the annuli and the maximum disc_f value

pyscan.max_annuli_scale(net, mpts, bpts, radii, disc_f)

Scans the set of annuli with radii defined by radii and the points in the net. For each annuli we maximizes the function disc_f with respect to the mpts and bpts. This version ignores far away points, so in theory can be much faster than the previous version.

Parameters:
  • net – List of Points
  • mpts – List of WPoints
  • bpts – List of WPoints
  • min_res – A float that corresponds to the radius to consider.
  • disc_f – KDisc object to maximize over.
Return type:

Tuple of a disk of radius equal to radii[-1] centered at the annuli and the maximum disc_f value

pyscan.max_annuli_scale_multi(net, mpts, bpts, res_scales, max_radii, disc_f)

Scans the set of annuli with radii less than max_radii. The algorithm uses the res_scales to define levels in the annuli for approximating the kernel values. For each annuli we maximizes the function disc_f with respect to the mpts and bpts. This version ignores far away points and uses a more efficient method for scanning all the annuli. In theory the runtime of this method should be as good or even better than the previous algorithm, but in practice is probably slightly slower.

Parameters:
  • net – List of Points
  • mpts – List of WPoints
  • bpts – List of WPoints
  • res_scales – A list of floats that correspond to the relative sizes of the different annuli.
  • max_radii – A float that corresponds to the maximum radius to consider. This will relate to the bandwidth of the kernel used.
  • disc_f – KDisc object to maximize over.
Return type:

Tuple of a disk of radius equal to radii[-1] centered at the annuli and the maximum disc_f value